This document discusses techniques for recommender systems including multi-armed bandit (MAB), Thompson sampling, user clustering, and using item features. It provides examples of how MAB works using the ε-greedy approach and explores the tradeoff between exploration and exploitation. User clustering is presented as a way to group users based on click-through rate to improve targeting. Finally, it suggests using different item features like images, text, and collaborative filtering data as inputs to recommendation models.
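To make the exploration/exploitation tradeoff concrete, here is a minimal ε-greedy bandit sketch in Python; the arm count, simulated click-through rates, and ε value are illustrative assumptions, not figures from the source.

import random

def epsilon_greedy(n_arms=3, epsilon=0.1, n_rounds=1000, true_ctrs=(0.02, 0.05, 0.03)):
    # true_ctrs simulates the unknown per-item click-through rates (assumed values)
    counts = [0] * n_arms    # pulls per arm
    values = [0.0] * n_arms  # running mean reward per arm
    for _ in range(n_rounds):
        if random.random() < epsilon:
            arm = random.randrange(n_arms)                     # explore: pick a random arm
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit: pick the best arm so far
        reward = 1.0 if random.random() < true_ctrs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]    # incremental mean update
    return values, counts

With more rounds the estimated values converge toward the true rates, while ε keeps a small stream of traffic on the other arms.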
The document contains log data from user activities on a platform. There are three columns: user_id, event, and event_date. It logs the activities of 5 users over several days, covering events such as logins, posts, comments, and views. It also includes some aggregated data on unique events and totals by user.
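As a hedged sketch of how such per-user aggregates could be computed, here is a pandas example; the column names follow the summary, while the sample rows are invented for illustration.

import pandas as pd

log = pd.DataFrame({
    "user_id":    [1, 1, 2, 2, 3],
    "event":      ["login", "post", "login", "view", "comment"],
    "event_date": ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02", "2020-01-03"],
})

# total events and distinct event types per user, mirroring the aggregates described
agg = log.groupby("user_id")["event"].agg(total="count", unique_events="nunique")
print(agg)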
This document discusses using BigQuery and Dataflow for ETL processes. It explains loading raw data from databases into BigQuery, transforming the data with Dataflow, and writing the results. It also mentions BigQuery's on-demand query price of $5 per terabyte scanned and notes that Dataflow workers provide virtual CPUs and RAM. Finally, it includes a link about performing ETL from relational databases to BigQuery.
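A minimal sketch of the load-then-transform flow with the google-cloud-bigquery client; the project, dataset, table, and bucket names are placeholders, and plain SQL stands in for the Dataflow transformation step.

from google.cloud import bigquery

client = bigquery.Client()

# 1. Load raw CSV data from Cloud Storage into a staging table (placeholder names)
job_config = bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.CSV, autodetect=True)
load_job = client.load_table_from_uri(
    "gs://my-bucket/raw/events.csv", "my_project.staging.events", job_config=job_config
)
load_job.result()  # block until the load finishes

# 2. Transform and write the results (SQL stand-in for the Dataflow step)
query = """
    CREATE OR REPLACE TABLE my_project.mart.daily_events AS
    SELECT user_id, DATE(event_ts) AS day, COUNT(*) AS n_events
    FROM my_project.staging.events
    GROUP BY user_id, day
"""
client.query(query).result()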
The document discusses deep learning paper reading roadmaps and lists several GitHub repositories that aggregate deep learning papers. It also discusses developing mobile applications that use machine learning and the differences between developing for iOS and Android. Lastly, it recommends continuing to learn through practice and experimentation with deep learning techniques.
Vectorized Processing in a Nutshell (in Korean).
Presented by Hyoungjun Kim, Gruter CTO and Apache Tajo committer, at DeView 2014, Sep. 30, Seoul, Korea.
The document discusses various machine learning clustering algorithms like K-means clustering, DBSCAN, and EM clustering. It also discusses neural network architectures like LSTM, bi-LSTM, and convolutional neural networks. Finally, it presents results from evaluating different chatbot models on various metrics like validation score.
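Of the clustering algorithms named, K-means is the easiest to show end to end; below is a minimal numpy sketch (the sample data and k are illustrative assumptions).

import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # initialize centers from data points
    for _ in range(n_iters):
        # assign each point to its nearest center
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        # move each center to the mean of its assigned points
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])  # two synthetic blobs
labels, centers = kmeans(X, k=2)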
The document discusses challenges with using reinforcement learning for robotics. While simulations allow fast training of agents, there is often a "reality gap" when transferring learning to real robots. Other approaches like imitation learning and self-supervised learning can be safer alternatives that don't require trial-and-error. To better apply reinforcement learning, robots may need model-based approaches that learn forward models of the world, as well as techniques like active localization that allow robots to gather targeted information through interactive perception. Closing the reality gap will require finding ways to better match simulations to reality or allow robots to learn from real-world experiences.
[243] Deep Learning to help students' Deep Learning (NAVER D2)
This document describes research on using deep learning to predict student performance in massive open online courses (MOOCs). It introduces GritNet, a model that takes raw student activity data as input and predicts outcomes like course graduation without feature engineering. GritNet outperforms baselines by more than 5% in predicting graduation. The document also describes how GritNet can be adapted in an unsupervised way to new courses using pseudo-labels, improving predictions in the first few weeks. Overall, GritNet is presented as the state-of-the-art for student prediction and can be transferred across courses without labels.
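The unsupervised adaptation described amounts to pseudo-labeling: a model trained on a labeled course scores a new course, and its confident predictions become training targets. A generic sketch with a scikit-learn stand-in for GritNet (the model choice, confidence threshold, and feature arrays are assumptions, not details from the source):

import numpy as np
from sklearn.linear_model import LogisticRegression

def adapt_with_pseudo_labels(X_src, y_src, X_new, confidence=0.9):
    model = LogisticRegression().fit(X_src, y_src)      # train on the labeled source course
    proba = model.predict_proba(X_new)[:, 1]            # score the unlabeled new course
    mask = (proba > confidence) | (proba < 1 - confidence)  # keep only confident predictions
    pseudo_y = (proba[mask] > 0.5).astype(int)          # confident predictions become labels
    # retrain on source labels plus pseudo-labels from the new course
    return LogisticRegression().fit(
        np.vstack([X_src, X_new[mask]]), np.concatenate([y_src, pseudo_y])
    )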
[234] Fast & Accurate Data Annotation Pipeline for AI Applications (NAVER D2)
This document provides a summary of new datasets and papers related to computer vision tasks including object detection, image matting, person pose estimation, pedestrian detection, and person instance segmentation. A total of 8 papers and their associated datasets are listed with brief descriptions of the core contributions or techniques developed in each.
[226] NAVER deep click prediction (NAVER D2)
This document presents a formula for calculating the loss function J(θ) in machine learning models. The formula averages the negative log likelihood of the predicted probabilities being correct over all samples S, and includes a regularization term λ that penalizes predicted embeddings being dissimilar from actual embeddings. It also defines the cosine similarity term used in the regularization.
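A plausible LaTeX rendering of the described loss (symbol names are assumed: \hat{p}_i is the predicted probability of the correct label, e_i and \hat{e}_i the actual and predicted embeddings):

J(\theta) = -\frac{1}{|S|} \sum_{i \in S} \log \hat{p}_i \;+\; \lambda \sum_{i \in S} \bigl(1 - \cos(e_i, \hat{e}_i)\bigr),
\qquad
\cos(e, \hat{e}) = \frac{e \cdot \hat{e}}{\lVert e \rVert \, \lVert \hat{e} \rVert}

The first term is the averaged negative log likelihood; the second penalizes predicted embeddings that are dissimilar (low cosine similarity) to the actual ones.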
The document discusses running a TensorFlow Serving (TFS) container using Docker. It shows commands to:
1. Pull the TFS Docker image from a repository
2. Define a script to configure and run the TFS container, specifying the model path, name, and port mapping
3. Run the script to start the TFS container exposing port 13377
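A sketch of those three steps as a Python script; the model path and model name are placeholders, and the host port follows the 13377 mentioned above, mapped here to TFS's default REST port 8501.

import subprocess

MODEL_PATH = "/models/my_model"   # placeholder: directory containing versioned SavedModels
MODEL_NAME = "my_model"           # placeholder model name

# 1. Pull the TensorFlow Serving image
subprocess.run(["docker", "pull", "tensorflow/serving"], check=True)

# 2-3. Run the container, mounting the model and exposing host port 13377
subprocess.run([
    "docker", "run", "-d",
    "-p", "13377:8501",                        # host 13377 -> container REST port 8501
    "-v", f"{MODEL_PATH}:/models/{MODEL_NAME}",
    "-e", f"MODEL_NAME={MODEL_NAME}",
    "tensorflow/serving",
], check=True)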
The document discusses linear algebra concepts including:
- Representing a system of linear equations as a matrix equation Ax = b where A is a coefficient matrix, x is a vector of unknowns, and b is a vector of constants.
- Solving for the vector x that satisfies the matrix equation using linear algebra techniques such as row reduction.
- Examples of matrix equations and their component vectors are shown.
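A small worked instance: for an assumed 2x2 system, numpy solves Ax = b directly (the numbers are illustrative).

import numpy as np

# 2x + y = 5
#  x - y = 1   written as Ax = b
A = np.array([[2.0,  1.0],
              [1.0, -1.0]])
b = np.array([5.0, 1.0])

x = np.linalg.solve(A, b)   # exact solve via LU factorization (row reduction)
print(x)                    # [2. 1.]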
This document describes the steps to convert a TensorFlow model to a TensorRT engine for inference. It includes steps to parse the model, optimize it, generate a runtime engine, serialize and deserialize the engine, as well as perform inference using the engine. It also provides code snippets for a PReLU plugin implementation in C++.
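Independent of the TensorRT C++ plugin API, the function a PReLU plugin computes is just max(0, x) + a*min(0, x) with a learned slope a; a minimal numpy sketch of that forward pass (the slope value is illustrative):

import numpy as np

def prelu(x, alpha=0.25):
    # PReLU: identity for x > 0, slope alpha for x <= 0
    return np.maximum(0.0, x) + alpha * np.minimum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(prelu(x))  # [-0.5 -0.125 0. 1.5]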
The document discusses machine reading comprehension (MRC) techniques for question answering (QA) systems, comparing search-based and natural language processing (NLP)-based approaches. It covers key milestones in the development of extractive QA models using NLP, from early sentence-level models to current state-of-the-art techniques like cross-attention, self-attention, and transfer learning. It notes the speed and scalability benefits of combining search and reading methods for QA.
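For reference, the cross- and self-attention those models build on is scaled dot-product attention; a minimal numpy sketch (shapes and inputs are illustrative):

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # similarity of queries to keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over the keys
    return weights @ V                                    # weighted sum of values

Q = np.random.randn(4, 8)   # 4 query positions, dim 8
K = np.random.randn(6, 8)   # 6 key positions
V = np.random.randn(6, 8)
out = scaled_dot_product_attention(Q, K, V)  # shape (4, 8)

Self-attention uses Q, K, and V from the same sequence; cross-attention takes Q from the question and K, V from the passage.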
69. Coping with a high write TPS
save minhash(110) [1 DB write] = load minhash(100) [1 DB read] x compute minhash(101~110) [input buffer]
(diagram: old minhash + new items -> local minhash in memory storage -> updated minhash)
Rather than writing to the DB on every incoming item, new items are buffered and folded into the stored signature as a micro batch, so a burst of updates costs one read and one write.
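A hedged sketch of that incremental update: the minhash signature of items 1..110 is the elementwise minimum of the stored signature for 1..100 and a signature computed over the buffered items 101..110 (the hash scheme and signature length are illustrative).

import hashlib

NUM_HASHES = 16  # illustrative signature length

def h(i, item):
    # i-th hash function: salt the item with the hash index (illustrative scheme)
    return int(hashlib.md5(f"{i}:{item}".encode()).hexdigest(), 16)

def compute_minhash(items):
    return [min(h(i, x) for x in items) for i in range(NUM_HASHES)]

def merge(sig_a, sig_b):
    # minhash of a union = elementwise min of the two signatures
    return [min(a, b) for a, b in zip(sig_a, sig_b)]

stored = compute_minhash(range(1, 101))       # load minhash(100): 1 DB read
buffered = compute_minhash(range(101, 111))   # compute minhash(101~110): input buffer
updated = merge(stored, buffered)             # save minhash(110): 1 DB write
assert updated == compute_minhash(range(1, 111))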
78. Redis data structure candidates
Strings - plain K/V
Sets - support add, remove, union, intersection
Strings vs Sets: which should hold the signatures?
79. Even set-shaped data can be serialized and stored as a plain string.
sig45 -> Tom, Jerry, Robert, Jack
stored as one string value: "[Tom, Jerry, Robert, Jack]"
json.dumps(data) [write]
json.loads(data_str) [read]
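A sketch of that write/read round trip with redis-py (the host and key are placeholders):

import json
import redis

r = redis.Redis(host="localhost", port=6379)

data = ["Tom", "Jerry", "Robert", "Jack"]
r.set("sig45", json.dumps(data))        # [write] the whole set as one string value

members = json.loads(r.get("sig45"))    # [read] back into a Python list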
80. Verdict: store the signatures as Strings. Why?
Load complexity: O(1) for a string GET vs O(N) for reading a set's members
In [24]: %time gs.load_benchmark('user','key')
CPU times: user 0.32 s, sys: 0.03 s, total: 0.34 s
Wall time: 0.42 s
In [25]: %time gs.load_benchmark('user','set')
CPU times: user 32.34 s, sys: 0.13 s, total: 32.47 s
Wall time: 33.88 s
1) Loading a String is roughly 80x faster than loading a Set (0.42 s vs 33.88 s wall time above).
81. 'redis string' also supports the mget command
(multiple get at once)
N round-trip calls -> 1 call
In [9]: %timeit [redis.get(s) for s in sigs]
100 loops, best of 3: 9.99 ms per loop
In [10]: %timeit redis.mget(sigs)
1000 loops, best of 3: 759 us per loop
2) mget collapses N network round trips into one call, cutting latency from ~10 ms to ~0.76 ms in the benchmark above.