In this talk, you will learn how to use or create Deep Learning architectures for image recognition and other neural network computations in Apache Spark. Alex, Tim and Sujee will begin with an introduction to Deep Learning using BigDL. They will then explain and demonstrate how image recognition works using step-by-step diagrams and code, which will give you a fundamental understanding of how to perform image recognition tasks within Apache Spark. Next, they will give a quick overview of how to perform image recognition on a much larger dataset using the Inception architecture. BigDL was created specifically for Spark and takes advantage of Spark's ability to distribute data processing workloads across many nodes. As an attendee of this session, you will learn how to run the demos on your laptop, on your own cluster, or with the BigDL AMI in the AWS Marketplace. Either way, you will walk away with a much better understanding of how to run deep learning workloads using Apache Spark with BigDL. Presentation by Alex Kalinin, Tim Fox, Sujee Maniyam & Dave Nielsen at AWS re:Invent.
BigDL Deep Learning in Apache Spark - AWS re:invent 2017
1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS re:INVENT
Large Scale Deep Learning with BigDL
2. Large Scale Deep Learning with BigDL
Tim Fox | Big Data and Machine Learning Consultant | Elephant Scale
3. ABOUT ME
Tim Fox, Principal @ Elephant Scale
Practitioner and Trainer in Data Engineering and Data Science
Author of Data Science in Python on LinkedIn Learning
tim@elephantscale.com
LinkedIn: tim-fox-0063541
4. ABOUT Elephant Scale
Training in Big Data and AI technologies
Big Data: Spark, Hadoop, Cloud, NoSQL, Streaming
AI: Machine Learning, Deep Learning, BigDL, TensorFlow
BigDL training available!
Public and private trainings available
BigDL Sandbox: elephantscale.com/sandbox
Elephantscale.com
info@elephantscale.com
5. Quick Roundup of AI / Machine Learning / Deep Learning
6. AI / MACHINE LEARNING / DEEP LEARNING
Artificial Intelligence (AI):
The broader concept of machines being able to carry out 'smart' tasks
Machine Learning:
A type of AI that allows software to learn from data without being explicitly programmed
Deep Learning:
Using neural networks to solve some hard problems
(Diagram labels: Artificial Intelligence, Machine Learning, Deep Learning)
7. DEEP LEARNING APPLICATIONS
Self-Driving Cars
ML system using image recognition
Where is the edge of the road / the road sign / the car in front?
Face recognition
Facebook images
System learns from manually tagged images and then automatically detects faces in uploaded photos
8. DEEP LEARNING HISTORY
Early attempts at Deep Learning did not succeed.
Compute power was insufficient at the time.
Training datasets were too small for good results.
We lacked the ability to parallelize our work.
In the modern era, Deep Learning has been successful.
'Big Data': we now have so much data to train our models
'Big Data ecosystem': excellent big data platforms (Hadoop, Spark, NoSQL) are available as open source
'Big Compute': cloud platforms have significantly lowered the barrier to massive compute power
$1 buys you a 16-core, 128 GB, 10 Gigabit machine for 1 hr on AWS!
So running a 100-node cluster for 5 hrs costs about $500
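The cost estimate above is simple multiplication, and a short sketch makes it easy to try other cluster sizes. The function name `cluster_cost` and its parameters are hypothetical, and the $1 per node-hour rate is the slide's 2017 figure, not a current AWS price.

```python
def cluster_cost(nodes: int, hours: float, rate_per_node_hour: float = 1.0) -> float:
    """Return the total cost in dollars of running `nodes` machines for `hours`.

    The default rate of $1 per node-hour is the slide's rough figure,
    used here only as an assumption.
    """
    return nodes * hours * rate_per_node_hour


# The slide's example: 100 nodes for 5 hours at about $1 per node-hour.
print(cluster_cost(nodes=100, hours=5))  # 500.0
```

Changing `rate_per_node_hour` shows how sensitive the total is to the instance price.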
9. AI SOFTWARE ECOSYSTEM
Language / Platform | Machine Learning | Deep Learning
Java | Weka, Mahout | DeepLearning4J
Python | SciKit | TensorFlow, Theano, Caffe
R | Many libraries | Deepnet, Darch
Distributed | H2O, Spark | H2O, Spark, BigDL
Cloud | AWS | AWS
10. MACHINE LEARNING AND BIG DATA
Until recently, most machine learning was done on a single computer (with lots of memory, 100s of GBs)
Most R/Python/Java libraries are single-node based
Now Big Data tools make it possible to run machine learning algorithms at massive scale, distributed across a cluster
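As a rough illustration of running a learning algorithm "distributed across a cluster", the sketch below splits a dataset into partitions, computes a partial gradient on each partition in parallel, and combines the results on a driver. It uses only the Python standard library, with threads standing in for cluster nodes. It is a toy under those assumptions, not how Spark or BigDL is implemented, and all names in it are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, n_parts):
    """Split data into n_parts chunks (stand-ins for cluster partitions)."""
    return [data[i::n_parts] for i in range(n_parts)]


def partial_gradient(chunk, w):
    """Gradient of the summed squared error for the model y = w * x on one chunk."""
    grad = sum(2 * (w * x - y) * x for x, y in chunk)
    return grad, len(chunk)


def distributed_fit(data, n_parts=4, lr=0.01, steps=200):
    """Fit y = w * x by gradient descent, one partial gradient per partition."""
    parts = partition(data, n_parts)
    w = 0.0
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        for _ in range(steps):
            # Each worker computes a gradient on its own partition only.
            results = list(pool.map(lambda c: partial_gradient(c, w), parts))
            # The driver combines the partial results into one update.
            total_grad = sum(g for g, _ in results)
            total_n = sum(n for _, n in results)
            w -= lr * total_grad / total_n
    return w


# Synthetic data generated from y = 3x, so the fitted weight should approach 3.
data = [(x, 3.0 * x) for x in range(1, 11)]
w = distributed_fit(data)
print(round(w, 4))
```

Only the small partial gradients travel back to the driver, never the partitions themselves, which is what lets the same pattern scale to data that does not fit on one machine.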
11. MODERN DEEP LEARNING FRAMEWORKS
12. TOOLS FOR SCALABLE MACHINE LEARNING
Apache Spark ML
Runs on top of the popular Spark framework
Massively scalable
Can use memory (caching) effectively for iterative algorithms
Language support: Scala, Java, Python, R
BigDL
Built for Apache Spark and optimized for Intel Xeon
Language support: Scala, Java, Python
TensorFlow
Based on data flow graphs
Language support: Python, C++
https://www.tensorflow.org/
13. TOOLS FOR SCALABLE CLOUD MACHINE LEARNING
Amazon Machine Learning
Ready-to-go algorithms
Visualization tools
Wizards to guide
Scalable on Amazon Cloud
14. BigDL
15. WHAT IS BIGDL
A distributed deep learning library for Apache Spark
Feature parity with popular deep learning frameworks
Caffe, Torch, TensorFlow
High Performance
Powered by Intel Math Kernel Library (MKL) and multi-threaded programming
Can scale to huge datasets
Using Apache Spark for scale
Open source! (Dec 2016)
Active Development
16. PRODUCTION ML/DL SYSTEMS ARE COMPLEX!
Actual ML/DL is only a small portion of a massive production system
BigDL running on a scalable platform like Spark helps simplify the complexity
17. BIGDL FILLS THE 'GAP' IN BIG DATA + DEEP LEARNING
Follows proven design patterns for dealing with Big Data
Sends 'compute to data' rather than reading massive data over the network.
Uses 'data locality' of HDFS (Hadoop Distributed File System)
Utilizes 'cluster managers' like YARN / Mesos
Automatically handles hardware/software failures
Elasticity and resource sharing in a cluster
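The 'compute to data' idea can be sketched with a toy scheduler that compares how much data crosses the network when tasks run where the blocks already live and when every block is pulled to one central worker. This is an illustration only: the function `schedule`, the node names, and the 128 MB block size are assumptions, and the real HDFS, YARN, and Spark schedulers are far more involved.

```python
def schedule(block_locations, block_size_mb, compute_to_data=True):
    """Assign one task per block; return (assignments, MB sent over the network).

    block_locations maps a block id to the list of nodes that store a replica.
    """
    assignments = {}
    network_mb = 0
    for block, nodes in block_locations.items():
        if compute_to_data:
            # Run the task on a node that already stores the block: nothing moves.
            assignments[block] = nodes[0]
        else:
            # Pull the block to one central worker: the whole block crosses the network.
            assignments[block] = "central-worker"
            network_mb += block_size_mb
    return assignments, network_mb


# Hypothetical cluster: three blocks, each replicated on two of three nodes.
blocks = {
    "block-0": ["node-a", "node-b"],
    "block-1": ["node-b", "node-c"],
    "block-2": ["node-c", "node-a"],
}
local_plan, local_mb = schedule(blocks, block_size_mb=128, compute_to_data=True)
central_plan, central_mb = schedule(blocks, block_size_mb=128, compute_to_data=False)
print(local_mb, central_mb)  # 0 384
```

With locality-aware placement the input data never leaves the node that stores it; only the much smaller task code and results need to move.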
18. BIGDL & SPARK
Run BigDL applications as Spark applications
Scala, Java, and Python support
Use Spark's other features
In-memory compute
Integrate with Spark ML and Streaming
Easy development with Jupyter Notebook
BIGDL VS TENSORFLOW
| | BigDL | TensorFlow |
|---|---|---|
| Runtime | Scala engine with Python front-end | C++ engine with Python front-end |
| Hadoop compatibility | Can run natively on Spark & Hadoop | Accesses Hadoop data as a client only |
| Distributed operation | Scalable with Apache Spark for massive scale out of the box | Does not support massive distribution out of the box |
| Runs TensorFlow models | Yes | Yes |
| Acceleration | CPU w/MKL | CPU/GPU |
| Summary | Excellent for distributing deep-learning models to massive scale on big data. Great TCO value. | Excellent library for small-to-medium scale data, although GPU hardware costs can be significant. |
BIGDL: BIG COMPUTE PLUS BIG DATA
BigDL helps us in balancing our needs
  Big Compute: fast linear algebra, Intel MKL library
  Optimized for Intel Xeon
  Big Data: I/O parallelized to run on many CPUs
BigDL allows massive scalability
  Natively designed to run on Spark
  Works with the Hadoop ecosystem (via Spark)
  Hadoop is THE Big Data platform for on-premise deployments
Plays nicely with other deep learning frameworks
  Use existing TensorFlow or Caffe models at scale in BigDL
  Train new models based on existing TF / Caffe models
BIGDL USE CASES
Fraud detection
Sentiment analysis
Image recognition
Find more at: https://github.com/intel-analytics/analytics-zoo/
GPUs and CPUs
GPUS (GRAPHICS PROCESSING UNITS)
GPUs have addressed past issues in training performance
  Example: TensorFlow is optimized to run well on GPUs.
In the past, CPUs were not vectorized for parallel compute
  This meant that GPUs were much faster for deep learning
Modern Intel Xeon CPUs have vectorized linear algebra
  Properly optimized, they approach the speed of GPUs
  CPUs are now a credible alternative to running on GPUs
  Cost advantage and scalability
INTEL MATH KERNEL LIBRARY (MKL)
Features highly optimized, threaded, and vectorized math functions that maximize performance on each processor family.
Utilizes industry-standard C and Fortran APIs for compatibility with popular BLAS, LAPACK, and FFTW functions; no code changes required.
Dispatches optimized code for each processor automatically without the need to branch code.
Provides priority support, connecting you directly to Intel engineers for confidential answers to technical questions.
INTEL MKL PERFORMANCE
CPU VERSUS GPU FOR BIG DATA
CPU offers higher scalability at lower cost versus GPU
Optimized software and libraries on CPU allow single-node performance to approach GPU performance.
GPU plus CPU architectures can be effective for a smaller number of nodes, when cost is not a concern.
Big Compute versus Big Data
Running BigDL
RUNNING BIGDL
Developing:
Use the following to develop your BigDL apps effortlessly
  Docker
  VM Sandbox
Deploying:
Cloud-ready deployment
  Amazon AMI
DEMO: GETTING STARTED WITH BIGDL
We will provide:
  Docker
  Sandbox VM
  AWS Marketplace AMI
BigDL Summary
BigDL offers outstanding scalability and performance
BigDL optimizes TCO by being tuned and optimized for Intel Xeon processors
BigDL brings deep learning to Spark clusters and Hadoop datasets
BigDL can be used to deploy TensorFlow and Caffe models to big data.
IMAGE RECOGNITION WITH APACHE SPARK AND BIGDL
Alex Kalinin | VP, AI/Machine Learning | Sizmek
ABOUT ME
Alex Kalinin
VP, AI/Machine Learning | Sizmek
alex.kalinin@sizmek.com
LinkedIn: https://www.linkedin.com/in/alexkalinin/
AI-POWERED MARKETING AND OPTIMIZATION
100,000,000,000 requests per day
70,000,000/minute
1,200,000/sec
PBs of training data
FEED-FORWARD NETWORK
[Figure sequence: forward pass through a small feed-forward network, computed one neuron at a time; each value is an activation applied to a weighted sum]
Inputs: 43, 37, 45, 40
First hidden neuron: connection values -0.53, 0.01, -0.17, 0.70, 0.51; weighted sum -1.56; output after activation 0
Second hidden neuron: connection values -0.12, 0.13, 0.21, -0.07, -0.05; output 11.9
Third hidden neuron: weighted sum -0.11; output after activation 0
Fourth hidden neuron: output 0.15
Output layer: first output -0.67 before activation, 0 after; second output 0.52
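The first neuron of the walkthrough above can be reproduced in a few lines. This is a minimal sketch, not the presenters' code. It assumes the five numbers shown for the first hidden neuron are four weights followed by a bias, and that the activation is ReLU. The formula on the slides did not survive extraction, but a weighted sum of -1.56 turning into 0 is consistent with ReLU. The function names are illustrative.

```python
def weighted_sum(inputs, weights, bias):
    """Return w . x + b for a single neuron."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def relu(z):
    """Rectified linear unit: negative values become 0."""
    return max(0.0, z)


# Values read from the slides; their roles (weights vs. bias) are an assumption.
inputs = [43, 37, 45, 40]
weights = [-0.53, 0.01, -0.17, 0.70]
bias = 0.51

z = weighted_sum(inputs, weights, bias)   # pre-activation value
a = relu(z)                               # neuron output

print(round(z, 2), a)
```

Under these assumptions the weighted sum rounds to -1.56 and the output is 0.0, matching the first hidden neuron on the slides.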
FULLY CONNECTED
Input size: 200 x 200 = 40,000
Connections: 40,000 x 40,000 = 1,600,000,000
10 layers: 16 billion
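The connection counts on this slide follow from simple multiplication. The sketch below assumes a 200 x 200 pixel input and that every fully connected layer has as many units as the input has pixels, which is what the 1.6 billion figure implies. Biases are ignored and the helper name is illustrative.

```python
def dense_connections(n_inputs, n_outputs):
    """Number of weights in a fully connected layer (biases ignored)."""
    return n_inputs * n_outputs


side = 200                                          # image is side x side pixels
n_pixels = side * side                              # one input per pixel
per_layer = dense_connections(n_pixels, n_pixels)   # every pixel to every unit
ten_layers = 10 * per_layer

print(f"{n_pixels:,} inputs")
print(f"{per_layer:,} connections per layer")
print(f"{ten_layers:,} connections for 10 layers")
```

The totals grow with the square of the pixel count, which is why fully connected layers become impractical for images of even modest size.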
INVENTION OF CONVOLUTIONAL NEURAL NETWORKS
LeNet-5 network developed in 1998 by Yann LeCun
David Hubel and Torsten Wiesel
HUBEL & WIESEL
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1363130/
HIERARCHICAL & LOCAL VISUAL CORTEX
[Figure: stages of visual processing: lines and dots -> orientation and movement -> high-level shapes]
KEY FEATURES OF CONVOLUTIONAL NETWORK
Convolution
Pooling
FULLY CONNECTED
CONVOLUTION
Only four weights
Filter (2 x 2):
  0.10 -0.06
  0.24  0.17
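To show how a filter with only four weights covers a whole image, here is a minimal pure-Python sketch of a 2D convolution with no padding and a stride of 1. The filter values are the ones from the slide. The 3 x 3 input image and the function name are hypothetical, chosen only for illustration.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the
    grid of weighted sums. As in most deep learning libraries, the kernel
    is not flipped, so strictly speaking this is cross-correlation."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                kernel[i][j] * image[r + i][c + j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


kernel = [[0.10, -0.06],
          [0.24, 0.17]]      # the four weights from the slide

image = [[1, 2, 0],
         [0, 1, 3],
         [4, 0, 1]]          # hypothetical 3 x 3 input

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print([round(v, 2) for v in row])
```

The same four weights are reused at every position, so the number of parameters does not depend on the image size.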
POOLING
Source: https://cs231n.github.io/convolutional-networks/
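The slide gives only a title and a link, so the following is a sketch under stated assumptions: max pooling with a 2 x 2 window and a stride of 2, which is a common choice. The input grid and the function name are illustrative.

```python
def max_pool2d(image, size=2, stride=2):
    """Keep the largest value in each size x size window."""
    rows = range(0, len(image) - size + 1, stride)
    cols = range(0, len(image[0]) - size + 1, stride)
    return [
        [
            max(image[r + i][c + j] for i in range(size) for j in range(size))
            for c in cols
        ]
        for r in rows
    ]


# Illustrative 4 x 4 feature map
feature_map = [[1, 1, 2, 4],
               [5, 6, 7, 8],
               [3, 2, 1, 0],
               [1, 2, 3, 4]]

pooled = max_pool2d(feature_map)
print(pooled)
```

Each 2 x 2 block collapses to its maximum, so the 4 x 4 grid becomes 2 x 2 while the strongest responses are kept.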
CONVOLUTIONAL NETWORK
Input -> Convolution -> Pooling -> Convolution -> Pooling -> FC -> FC
Output over digits 0-9: 0 0 0 0 1 0 0 0 0 0
Source: https://www.clarifai.com/technology
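To make the pipeline above concrete, the sketch below traces how the feature-map size shrinks through the convolution and pooling stages, then decodes the output vector shown on the slide. The slide does not state layer sizes, so the numbers here are an assumption borrowed from the classic LeNet-5 layout (32 x 32 input, 5 x 5 filters, 2 x 2 pooling, 6 and 16 feature maps). All names are illustrative.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


# Assumed LeNet-5 style settings; the slide does not state them.
size = 32                                   # 32 x 32 input image
shapes = [("input", 1, size)]
for name, kernel, stride, channels in [
    ("conv1", 5, 1, 6),
    ("pool1", 2, 2, 6),
    ("conv2", 5, 1, 16),
    ("pool2", 2, 2, 16),
]:
    size = conv_out(size, kernel, stride)
    shapes.append((name, channels, size))

for name, channels, s in shapes:
    print(f"{name}: {channels} x {s} x {s}")

# Number of values fed into the first fully connected (FC) layer
flattened = shapes[-1][1] * shapes[-1][2] ** 2

# Decode the output vector from the slide: the index of the largest
# value is the predicted digit.
output = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
predicted_digit = max(range(len(output)), key=output.__getitem__)
print("flattened inputs:", flattened, "predicted digit:", predicted_digit)
```

With these assumed sizes the spatial resolution goes 32, 28, 14, 10, 5, and the output vector decodes to the digit 4.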
http://scs.ryerson.ca/~aharley/vis/conv/flat.html
DEMO
GitHub: https://github.com/alex-kalinin/lenet-bigdl
Getting Started With BigDL
ABOUT ME
Sujee Maniyam
Founder / Principal @ Elephant Scale
Practitioner and trainer in data engineering and data science
Author
- "Hadoop and Spark" video training on O'Reilly Media
- "HBase Design Patterns"
- "Hadoop Illuminated"
sujee@elephantscale.com
LinkedIn: linkedin.com/in/sujeemaniyam
RUNNING BIGDL
Developing:
Use the following to develop your BigDL apps effortlessly
  Docker
  VM Sandbox
  Amazon AMI
Deploying:
Cloud-ready deployment
  Amazon AMI
GETTING STARTED WITH BIGDL
We will demonstrate:
  Docker
  Sandbox VM
  AWS Marketplace AMI
Docker
Step 1: Install Docker on your laptop
Step 2: Get the Docker image
  docker pull elephantscale/bigdl-sandbox
Step 3: Download the tutorials
  git clone https://github.com/elephantscale/bigdl-tutorials
Step 4: Launch Docker
  cd bigdl-tutorials
  ./run-bigdl-docker.sh elephantscale/bigdl-sandbox
Step 5: Go to the Jupyter notebook
VM-Sandbox
Step 1: Install VMware Player or VirtualBox on your laptop
Step 2: Download the BigDL-Sandbox image
  http://elephantscale.com/sandbox/
Step 3: (In Sandbox) Download the tutorials
  git clone https://github.com/elephantscale/bigdl-tutorials
Step 4: (In Sandbox) Run BigDL natively
  cd bigdl-tutorials
  ./run-bigdl-native.sh
Step 5: (In Sandbox) Go to the Jupyter notebook
Docker on AWS
Step 1: Spin up an AMI (Ubuntu recommended)
Step 2: Install Docker on the instance
Step 3: Get the Docker image
  docker pull elephantscale/bigdl-sandbox
Step 4: Download the tutorials
  git clone https://github.com/elephantscale/bigdl-tutorials
Step 5: Launch Docker
  cd bigdl-tutorials
  ./run-bigdl-docker.sh elephantscale/bigdl-sandbox
Step 6: Go to the Jupyter notebook
AMI on AWS
Step 1: Spin up the BigDL AMI
Step 2: Download the tutorials
  git clone https://github.com/elephantscale/bigdl-tutorials
Step 3: Run BigDL
  cd bigdl-tutorials
  ./run-bigdl-native.sh
Step 4: Go to the Jupyter notebook
QUESTIONS
GitHub: https://github.com/alex-kalinin/lenet-bigdl
LinkedIn: https://www.linkedin.com/in/alexkalinin/
Notebooks and Resources
BigDL: software.intel.com/bigdl
Tutorials: github.com/dnielsen/bigdl-resources
Sandbox: elephantscale.com/sandbox
BigDL AMI: aws.amazon.com/marketplace/
Training: elephantscale.com
Slides: slideshare.net/dcnielsen/
Tim Fox (Elephant Scale), Sujee Maniyam (Elephant Scale), Alex Kalinin (Sizmek), Dave Nielsen (Intel Software)
THANK YOU!