Spark Machine Learning and Deep Learning Deep Dive.
Scenarios that use Spark in combination with other data analytics tools (MS R on Spark, TensorFlow/Keras with Spark, scikit-learn with Spark, etc.)
Running GA4 without gtag.js using ssGTM and elbwalker? - Markus Baersch
Walker.js is an open source event tracking library that can be used to feed data into Google Analytics 4 (GA4) without using gtag.js. It provides more control over session handling and sends events to configurable destinations. Events can be sent to a self-hosted Google Tag Manager (GTM) container that forwards them to GA4, or to a custom endpoint. Walker.js parses data attributes to automatically construct analytics events or events can be manually pushed via JavaScript. It handles consent to determine where data can be sent. The library provides flexibility in how data is collected and routed compared to relying solely on gtag.js.
Bitmovin AV1/VVC Presentation, Streaming Media East - Christian Feldmann, Bitmovin Inc
This document provides an overview and update on AV1 and VVC video coding standards. It summarizes the novel technical features of each, including AV1 improvements like overlapped block motion compensation and affine motion, and VVC features like triangle partitioning and decoder-side motion refinement. Performance results show VVC provides around 30% better compression than HM 16.20, while AV1 is around 20% better. Both standards are still in development and neither has wide adoption yet. VVC licensing is unknown while AV1 remains free and open.
GDC 14: Bringing Unreal Engine 4 to OpenGL - Changehee Lee
This document discusses bringing Unreal Engine 4 to OpenGL by porting its render hardware interface (RHI) to support OpenGL. It describes mapping the D3D11-based RHI to OpenGL, developing a cross-compiler to compile HLSL shaders to GLSL, addressing differences between D3D and OpenGL like texture coordinate systems, and optimizations to achieve performance parity with D3D11. It also covers bringing UE4 to Android by leveraging an NVIDIA Tegra K1 mobile chip's full OpenGL 4.4 support.
eMetrics London - The AB Testing Hype Cycle - Craig Sullivan
The document discusses best practices for A/B testing, including:
1. Performing analytics health checks and modelling to understand user flows before testing.
2. Testing in areas informed by analytics rather than copying competitors, and accounting for device mix.
3. Doing user research like surveys, interviews and usability testing to inform testing.
4. Prioritizing high opportunity, low cost tests and creating a money model to estimate potential returns.
5. Conducting pre-flight checks to ensure tests are functioning properly across devices before launching.
1. Q-learning is a type of reinforcement learning algorithm that seeks to learn the optimal policy for an agent to take actions in an environment to maximize rewards.
2. The algorithm works by maintaining a Q-table that contains Q-values representing the expected rewards for state-action pairs, which are updated using the Bellman equation as the agent interacts with the environment.
3. Over time, the Q-values converge and the agent learns the optimal policy to take the best actions under different states to maximize long-term rewards without requiring a model of the environment.
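The tabular Q-learning loop described above can be sketched in a few lines of Python. This is a minimal illustration on a toy 1-D corridor environment (states 0 to 4, with a reward at state 4); the environment, hyperparameters, and episode count are all illustrative choices, not part of the original summary.

```python
import random

# Toy corridor: states 0..4, reaching state 4 yields reward 1 and ends the episode.
N_STATES, ACTIONS = 5, [0, 1]          # action 0 = left, 1 = right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # illustrative hyperparameters

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Environment dynamics: move left/right (clamped), reward 1 at the goal."""
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

random.seed(0)
for _ in range(500):                    # episodes
    s = 0
    while True:
        # epsilon-greedy action selection
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r, done = step(s, a)
        # Bellman update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        best_next = max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2
        if done:
            break

# The learned greedy policy should move right from every interior state.
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)}
```

Note that the agent never needed a model of the corridor's dynamics: the Q-table alone, updated from experience, is enough to recover the optimal policy.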
Recommendation systems today are widely used across many applications such as multimedia content platforms, social networks, and ecommerce, to provide suggestions that are most likely to fulfill users' needs, thereby improving the user experience. Academic research, to date, largely focuses on the performance of recommendation models in terms of ranking quality or accuracy measures, which often don't directly translate into real-world improvements. In this talk, we present some of the most interesting challenges that we face in the personalization efforts at Netflix. The goal of this talk is to highlight challenging research problems in industrial recommendation systems and start a conversation about exciting areas of future research.
This document discusses techniques for lighting and tonemapping in 3D graphics to better simulate the human visual system. It covers gamma correction, which accounts for how monitors display light intensities non-linearly. It also discusses filmic tonemapping, which produces crisp blacks, saturated dark tones, and soft highlights similar to film, by applying a tone curve modeled after photographic film. This provides advantages over other tonemapping operators like Reinhard for reproducing accurate colors across a high dynamic range.
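The gamma-correction and filmic-tonemapping pipeline described above can be sketched concisely. The filmic operator used here is John Hable's well-known "Uncharted 2" curve, chosen as a representative example; the talk's exact curve and constants may differ.

```python
# Hable's filmic curve constants: shoulder strength, linear strength,
# linear angle, toe strength, toe numerator, toe denominator.
A, B, C, D, E, F = 0.15, 0.50, 0.10, 0.20, 0.02, 0.30
W = 11.2  # linear white point

def hable(x):
    """Hable's filmic tone curve (unnormalized)."""
    return ((x * (A * x + C * B) + D * E) / (x * (A * x + B) + D * F)) - E / F

def filmic_tonemap(hdr, exposure=2.0):
    """Map a linear HDR value into [0, 1], normalized so W maps to white."""
    return hable(hdr * exposure) / hable(W)

def gamma_encode(linear, gamma=2.2):
    """Encode a linear [0, 1] value for a display with the given gamma."""
    return linear ** (1.0 / gamma)

# Typical usage: tonemap in linear space first, then gamma-encode for display.
pixel_out = gamma_encode(filmic_tonemap(1.0))
```

The curve's toe term (D, E, F) is what produces the crisp blacks and saturated dark tones mentioned above, while the shoulder term (A) rolls highlights off softly instead of clipping them as Reinhard-style operators tend to.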
A Bizarre Way to do Real-Time Lighting - Steven Tovey
This document provides a 10 step guide for implementing real-time lighting on the PlayStation 3 (PS3) using its parallel architecture of 6 Synergistic Processing Units (SPUs). It discusses rendering a pre-pass to extract normals and depth, calculating lighting in a tile-based parallel manner on the SPUs, and compositing the final lighting texture. Special techniques like using atomics, striping data across SPUs, and maintaining pipeline balance are needed to optimize performance on the PS3's parallel architecture. The goal is to achieve real-time lighting for a game with 20 cars racing at night, while preserving picture quality and reducing frame latency to acceptable levels.
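The tile-based step at the heart of the approach above can be illustrated on the CPU: the screen is split into tiles, each light is conservatively tested against each tile, and per-pixel shading then only considers the lights binned to its tile. Tile size, screen size, and the light representation below are illustrative assumptions, not details from the talk.

```python
TILE = 32                      # tile edge in pixels
WIDTH, HEIGHT = 128, 96        # toy framebuffer

def bin_lights(lights):
    """lights: list of (cx, cy, radius) in screen space -> {tile: [light ids]}."""
    tiles_x, tiles_y = WIDTH // TILE, HEIGHT // TILE
    bins = {(tx, ty): [] for ty in range(tiles_y) for tx in range(tiles_x)}
    for i, (cx, cy, r) in enumerate(lights):
        for (tx, ty), ids in bins.items():
            # Conservative overlap test: clamp the light centre to the tile's
            # rectangle and compare the squared distance against the radius.
            x0, y0 = tx * TILE, ty * TILE
            nearest_x = min(max(cx, x0), x0 + TILE)
            nearest_y = min(max(cy, y0), y0 + TILE)
            if (cx - nearest_x) ** 2 + (cy - nearest_y) ** 2 <= r * r:
                ids.append(i)
    return bins

bins = bin_lights([(16, 16, 10), (100, 80, 20)])
```

On the PS3 each SPU would process a stripe of these tiles in parallel; the binning above is the serial essence of that work distribution.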
Data.Monks: sGTM is a universal endpoint - Doug Hall
Server-side, serverless, server-to-server; scalable from zero to enterprise; cookieless, with ITP-compliant cookies; open source; automated data collection and auditing; personalisation data flows for 1 Euro/day.
The document summarizes collaborative filtering in cloud computing. It discusses key concepts like cloud computing, collaborative filtering, and Hadoop. It then covers different types of collaborative filtering like user-based, item-based, memory-based, and model-based approaches. Specific algorithms like nearest neighbor and top-N recommendations are explained. Challenges like cold start, ratings, interfaces and metrics to evaluate performance are also summarized.
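The item-based, nearest-neighbour flavour of collaborative filtering summarized above can be sketched briefly: score each item the user has not seen by its similarity to items they have rated, then return the top-N. The ratings matrix and similarity choice (cosine) below are illustrative, not from the document.

```python
from math import sqrt

# Toy user -> {item: rating} matrix for illustration.
ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob":   {"a": 4, "b": 4, "d": 2},
    "carol": {"b": 2, "c": 5, "d": 4},
    "dave":  {"c": 3, "d": 5, "e": 4},
}

def item_vector(item):
    """An item viewed as a user -> rating mapping."""
    return {u: r[item] for u, r in ratings.items() if item in r}

def cosine(v1, v2):
    common = set(v1) & set(v2)
    if not common:
        return 0.0
    dot = sum(v1[u] * v2[u] for u in common)
    n1 = sqrt(sum(x * x for x in v1.values()))
    n2 = sqrt(sum(x * x for x in v2.values()))
    return dot / (n1 * n2)

def top_n(user, n=2):
    """Rank unseen items by similarity-weighted sums of the user's own ratings."""
    seen = ratings[user]
    items = {i for r in ratings.values() for i in r}
    scores = {
        cand: sum(cosine(item_vector(cand), item_vector(i)) * r
                  for i, r in seen.items())
        for cand in items - set(seen)
    }
    return sorted(scores, key=scores.get, reverse=True)[:n]

recs = top_n("alice")
```

The cold-start challenge mentioned above is visible even here: a brand-new item shares no raters with anything, so every cosine similarity (and hence its score) is zero.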
Presentation at KGC2012 about the Architecture and Optimization techniques used to make the Relic FX System as used in the Company of Heroes and Dawn Of War Series, and many other Relic Games.
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library... - Spark Summit
With the rapid growth of available datasets, it is imperative to have good tools for extracting insight from big data. The Spark ML library has excellent support for performing at-scale data processing and machine learning experiments, but more often than not, Data Scientists find themselves struggling with issues such as: low level data manipulation, lack of support for image processing, text analytics and deep learning, as well as the inability to use Spark alongside other popular machine learning libraries. To address these pain points, Microsoft recently released The Microsoft Machine Learning Library for Apache Spark (MMLSpark), an open-source machine learning library built on top of SparkML that seeks to simplify the data science process and integrate SparkML Pipelines with deep learning and computer vision libraries such as the Microsoft Cognitive Toolkit (CNTK) and OpenCV. With MMLSpark, Data Scientists can build models with 1/10th of the code through Pipeline objects that compose seamlessly with other parts of the SparkML ecosystem. In this session, we explore some of the main lessons learned from building MMLSpark. Join us if you would like to know how to extend Pipelines to ensure seamless integration with SparkML, how to auto-generate Python and R wrappers from Scala Transformers and Estimators, how to integrate and use previously non-distributed libraries in a distributed manner and how to efficiently deploy a Spark library across multiple platforms.
Deep Learning Text NLP and Spark Collaboration - Hoondong Kim
These slides explain deep learning text NLP for the Korean language, and discuss extending the deep learning approach to BigData-scale data using Spark.
Apache Kafka Streams + Machine Learning / Deep Learning - Kai Wähner
This document discusses applying machine learning models to real-time stream processing using Apache Kafka. It covers building analytic models from historical data, applying those models to real-time streams without redevelopment, and techniques for online training of models. Live demos are presented using open source tools like Kafka Streams, Kafka Connect, and H2O to apply machine learning to streaming use cases like flight delay prediction. The key takeaway is that streaming platforms can leverage pre-built machine learning models to power real-time analytics and actions.
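The core pattern above, training a model offline on historical data and then applying it to a live stream without redevelopment, can be sketched with plain Python generators standing in for the Kafka Streams API. The "model" here is a toy threshold rule playing the role of, say, a flight-delay classifier; everything in the snippet is an illustrative assumption.

```python
def train_model(history):
    """Offline step: learn a mean departure-delay threshold from history."""
    threshold = sum(history) / len(history)
    return lambda delay_minutes: delay_minutes > threshold

def process_stream(events, model):
    """Online step: map each incoming event to an enriched prediction record."""
    for event in events:
        yield {"delay": event, "predicted_late": model(event)}

model = train_model([5, 10, 15, 30])   # historical delays, mean threshold = 15
stream = [2, 40, 15, 60]               # simulated real-time events
predictions = list(process_stream(stream, model))
```

In a real deployment, `process_stream` would be a Kafka Streams topology (or a Kafka consumer/producer pair) and the model an H2O or TensorFlow artifact loaded at startup; the separation between the offline training step and the per-record scoring step is the point being illustrated.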
Deep Learning with Apache Spark and GPUs with Pierce Spitler - Databricks
Apache Spark is a powerful, scalable real-time data analytics engine that is fast becoming the de facto hub for data science and big data. However, in parallel, GPU clusters are fast becoming the default way to quickly develop and train deep learning models. As data science teams and data savvy companies mature, they will need to invest in both platforms if they intend to leverage both big data and artificial intelligence for competitive advantage.
This session will cover:
- How to leverage Spark and TensorFlow for hyperparameter tuning and for deploying trained models
- DeepLearning4J, CaffeOnSpark, IBM's SystemML and Intel's BigDL
- Sidecar GPU cluster architecture and Spark-GPU data reading patterns
- The pros, cons and performance characteristics of various approaches
You'll leave the session better informed about the available architectures for Spark and deep learning, and Spark with and without GPUs for deep learning. You'll also learn about the pros and cons of deep learning software frameworks for various use cases, and discover a practical, applied methodology and technical examples for tackling big data deep learning.
Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf AT... - MLconf
Convolutional Neural Networks at scale in Spark MLlib:
Jeremy Nixon will focus on the engineering and applications of a new algorithm built on top of MLlib. The presentation will focus on the methods the algorithm uses to automatically generate features to capture nonlinear structure in data, as well as the process by which it's trained. Major aspects of that include compositional transformations over the data, convolution, and distributed backpropagation via SGD with adaptive gradients and an adaptive learning rate. Applications will look into how to use convolutional neural networks to model data in computer vision, natural language and signal processing. Details around optimal preprocessing, the type of structure that can be learned, and managing its ability to generalize will inform developers looking to apply nonlinear modeling tools to problems that they face.
Build a Deep Learning Pipeline on Apache Spark for Ads Optimization - Craig Chao
This document discusses building deep learning pipelines on Apache Spark for ad optimization. It begins by discussing how data has become a new form of colonialism. It then explains why deep learning should be done on Apache Spark rather than just TensorFlow. The remainder of the document discusses machine learning pipelines on Apache Spark, how machine learning and deep learning can be used for ad optimization, and various approaches to deep learning on Apache Spark using tools like MMLSpark, Databricks, DL4J, BigDL, and SystemML.
1. PyData is a community for users and developers of open-source data tools in Python including NumPy, Pandas, SciPy, scikit-learn, IPython, and Jupyter.
2. Pandas is a software library written for data manipulation and analysis in Python, built on top of NumPy and SciPy. It provides data structures and operations for working with relational or labeled data and time series.
3. Jupyter Notebook is an open-source web application that allows users to create and share documents that contain live code, equations, visualizations and explanatory text. It supports over 40 programming languages including Python, R and Julia.
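A minimal pandas example of the labeled-data operations described above: build a DataFrame, then aggregate by label. The column names and values are illustrative.

```python
import pandas as pd

# A small labeled dataset: one row per transaction.
df = pd.DataFrame(
    {"city": ["London", "Paris", "London"], "sales": [100, 80, 120]}
)

# Label-based aggregation: total sales per city.
total_by_city = df.groupby("city")["sales"].sum()
```

In a Jupyter notebook, evaluating `total_by_city` in a cell renders the resulting Series inline, which is the live-code-plus-output workflow described above.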
Deep Learning Streaming Platform with Kafka Streams, TensorFlow, DeepLearning... - Kai Wähner
Talk from JavaOne 2017: Apache Kafka + Kafka Streams for Scalable, Mission Critical Deep Learning.
Intelligent real time applications are a game changer in any industry. Deep Learning is one of the hottest buzzwords in this area. New technologies like GPUs combined with elastic cloud infrastructure enable the sophisticated usage of artificial neural networks to add business value in real world scenarios. Tech giants use it e.g. for image recognition and speech translation. This session discusses some real-world scenarios from different industries to explain when and how traditional companies can leverage deep learning in real time applications.
This session shows how to deploy Deep Learning models into real time applications to do predictions on new events. Apache Kafka will be used to execute analytic models in a highly scalable and performant way.
The first part introduces the use cases and concepts behind Deep Learning. It discusses how to build Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN) and Autoencoders leveraging open source frameworks like TensorFlow, DeepLearning4J or H2O.
The second part shows how to deploy the built analytic models to real time applications leveraging Apache Kafka as streaming platform and Apache Kafka's Streams API to embed the intelligent business logic into any external application or microservice.
Some further material around Apache Kafka and Machine Learning:
- Blog Post: How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka: https://www.confluent.io/blog/build-deploy-scalable-machine-learning-production-apache-kafka/
- Video: Build and Deploy Analytic Models with H2O.ai and Apache Kafka: https://www.youtube.com/watch?v=-q7CyIExBKM&feature=youtu.be
- Code: Github Examples using Apache Kafka, TensorFlow, H2O, DeepLearning4J: https://github.com/kaiwaehner/kafka-streams-machine-learning-examples
100% Serverless BigData Scale Production Deep Learning System - Hoondong Kim
- BigData Scale Deep Learning Training System (with GPU Docker PaaS on Azure Batch AI)
- Deep Learning Serving Layer (with Auto Scale Out Mode on Web App for Linux Docker)
- BigDL, Keras, TensorFlow, Horovod, TensorflowOnAzure
- E-commerce BigData Scale AI Journey
- BigData Scale Deep Learning Production System Use Case
- Deep Learning, Cloud PaaS, Microservices, DevOps, etc.
- E-Commerce AI Production System Strategy