Cameras and Cats
  
Why computer vision
is changing the way
you see the world
!1
@_changxu
Image: Pexels
!2
Computer Vision?
You might have heard of this buzz phrase
and think that it's some futuristic thing,
but it's in your life already.
!3
iPhone FaceID
Image: Wired
Authenticate into your iPhone with your face
Front-facing
camera
!4
Nanit baby camera
Image: Nanit
A daily report on how well
your baby slept last night
Camera
above your
baby's crib
!5
Waldo photos
Find your photo needles in album haystacks
Waldo's image recognition
picks out just the photos
including your daughter
Teachers take photos of
campers and upload to Waldo
Image: Pexels
!6 Image: Engadget
Osmo game system
Hands-on play beyond the screen
Camera sees
what's on the
table
!7
Ring video doorbell
Image: Direct Electric Company
Video feed outside your door
Camera sees
what's outside
your door
!8
Density people counter
Image: Density
Gather space utilization
data across all rooms
Camera counts
people as they enter
and exit a room
!9
Lane departure warnings
Image: Montgomery Rennie Jonson
Beeps if you deviate from your lane
Camera behind the
rear-view mirror
sees lane markings
in front
!10 Image: The Spoon
Self-checkout store
Pick up what you want and put it in your bag
!11 Image: Arstechnica
Self-checkout store ceiling
Cameras on the ceiling and in the shelves
identify you and the items you pick up
Cameras
!12
Camera-based growing system
Image: iUNU
Monitor crops to detect discoloration on the leaves that may
indicate certain diseases and alert the farmer immediately
Cameras
!13 Image: DigitalGlobe via Satellite Today
Using satellite images to count cars
Count cars in retailers' parking
lots over time. Hedge funds use
this data to predict
how a retailer is doing.
!14 Image: DigitalGlobe via Satellite Today
Observe sponsors' logos on-screen
and quantify their visual impact
GumGum sports sponsorship measurement
Everything you are doing uses computer vision.
!15
You just haven't thought about it in any of these ways.
One reason computer vision is so useful is
that it is pervasive.
What is computer vision?
Computer vision is being able to interpret the
physical world through cameras and sensors.
!16
!17 Image: Freepik
In the past, we had to manually input instructions to tell machines to do one
action and then another.
!18
Image: Freepik
Then, computers gained the ability to infer the world through data such as
Tweets, posts and purchases. But they only understood numbers and words.
!19 Image: Freepik
Now, computers can understand what they are seeing and sensing. They take inputs
that are images and videos, which allows them to directly observe the physical world.
So we can have many cameras
that humans don't need to look at.
!20 Image: Disney (Big Hero 6)
Using cameras as their eyes,
computers will become more
knowledgeable and aware of
the physical world and be able
to directly interact with us.
Baymax must
have great
computer vision
hookups.
So, why is computer vision
interesting now?
!21
!22 Image: Pexels
Because the internet
is full of cat pictures.
And why not?
To answer that question, we're going to talk about cats.
!23
1. Cameras and sensors
2. Connectivity
3. Data
4. Intelligence
5. Computing power
Image: Pexels
There are five reasons that computer vision is exploding now.
!24 Images: Ring, Mobileye, DJI, Drishti Robotics
1. Cameras and sensors
Cameras are being deployed everywhere because
they are better, smaller, and cheaper. Computers are getting more
and more inputs from the physical world.
At home
On the factory floor
On drones
In your car
!25 Image: The Verge
1. Cameras and sensors
We have also benefited hugely from the rise of smartphones
over the past decade, which put a camera in everyone's
pocket. That massive scale also made cameras a lot cheaper.
!26
2. Connectivity
Image: Freepik
These cameras are connected. They take photos at the edge
and then upload to the cloud to aggregate data. This helps
them to constantly improve how they understand the world.
!27
2. Connectivity
Image: inVia Robotics
inVia Robotics makes robots for e-commerce fulfillment
warehouses. The robots navigate autonomously by reading
QR codes along the shelves and on the bins. They function as
a swarm by batching orders and optimizing routes together.
Cameras
!28
3. Data
300+ hours of video uploaded every minute
50+ million photos posted every day
300+ million photos posted every day
Sources: Merchdope, Statistic Brain Research Institute, Gizmodo, Automated Insights
There is more and more visual data being shared over the
web, powered by cameras in everyone's pocket and the many
connected and distributed cameras.
!29
3. Data
Image: Google Driverless Car Project
We have also benefitted from the autonomous car movement,
through which we have gathered a huge amount of data
about road conditions, cars, pedestrians, and road signs.
!30 Image: Pexels
But just having many pictures of cats won't necessarily tell a computer what a cat is.
You look at this photo for a fraction
of a second and you know it's a cat.
But how does a computer know?
!31 Image: Pexels
How does it know that this is a cat in front of a door,
not a tiger and a sunset? Same colors, same stripes!
!32 Image: Pexels
How does it know that this line delineates the chest of this cat against a door,
when the image it sees is completely flat?
!33 Image: Todd Peterson
Plus, the computer has never seen this particular cat in this pose with this
background, because you just took this photo.
!34 Image: Pexels
That is, if you're lucky enough to get the whole cat. You might just get part of a cat.
How does a computer know that this is a cat?
!35
Understanding images is
incredibly hard. We take it for
granted because our phone
opens when it sees our face.
Image: Freepik
!36 Image: Pexels
(210, 179, 172)
R G B
(127, 0, 0)
(246, 171, 97)
To a computer, an image is a
collection of pixels, where each one
is represented by three numbers.
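As a rough illustration of the "three numbers per pixel" idea, here is a minimal Python sketch. The three colors are the sample values from the slide; the fourth pixel, the 2x2 layout, and the helper names `count_numbers` and `brightness` are made up for this example.

```python
# A tiny 2x2 "image": each pixel is an (R, G, B) triple, each value 0-255.
# Three of the colors come from the slide; the black pixel and the layout
# are assumptions made for illustration.
image = [
    [(210, 179, 172), (127, 0, 0)],
    [(246, 171, 97), (0, 0, 0)],
]


def count_numbers(img):
    """Count every channel value stored for the image."""
    return sum(len(pixel) for row in img for pixel in row)


def brightness(pixel):
    """Plain average of the three channels, a crude grey level."""
    r, g, b = pixel
    return (r + g + b) / 3


print(count_numbers(image))     # 4 pixels x 3 channels
print(brightness(image[0][0]))  # grey level of the top-left pixel
```

Nothing here "sees" a cat yet: the computer only has a grid of numbers to work with.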
!37 Image: Pexels
Since each photo on your iPhone X has
12 megapixels = 12 million pixels
!38 Image: Pexels
12 megapixels = 12 million pixels
→ 36 million numbers
So to look at an image, a computer needs to analyze 36 million numbers.
!39 Image: Pexels
12 megapixels = 12 million pixels
→ 36 million numbers
→ trillions of relationships
But it's even more complicated. Groups of pixels together give you
the eyes, ears, and whiskers. This means that computers need to
analyze trillions of pixel-to-pixel relationships to look at a single image.
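The arithmetic behind these figures can be checked in a few lines. The slide does not say how "trillions of relationships" was counted; the sketch below assumes a relationship is one unordered pair of pixels, which is only one plausible reading.

```python
# Back-of-the-envelope check of the numbers on the slide.
pixels = 12_000_000              # 12 megapixels
channels = 3                     # one number each for R, G, B
numbers = pixels * channels      # values the computer has to read

# Assumption: a "relationship" is an unordered pair of pixels,
# so there are n * (n - 1) / 2 of them.
pairs = pixels * (pixels - 1) // 2

print(f"{numbers:,} numbers")
print(f"{pairs:,} pixel pairs")
```

Under that assumption the pair count comes to roughly 72 trillion for a single photo, which is why comparing every pixel with every other pixel is not practical.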
!40
4. Intelligence
Image: Pexels
Understanding what is in an
image is staggeringly complex.
We have developed algorithms
to simplify the computation so
that it doesn't take days to tell
you if you took a photo of a cat.
!41
I'm going to explain one algorithmic innovation
to you that is critical to understanding images:
Convolution
This is the most technical part of this talk,
but it is foundational to image recognition.
(And I will explain it without math.)
4. Intelligence
!42
4. Intelligence
Image: Pexels
If you look at an image, you'll notice two things:
1) You look at each area separately. The
door ledge on the bottom right is
not relevant to cat ears on the top.
!43
4. Intelligence
Image: Pexels
2) If you know what cat ears look like,
then you can find all occurrences of
cat ears in the entire image.
!44
Convolution is a method that allows you to easily and quickly do
both of those things to images.
Otherwise, you might look at how each pixel is related to every
other pixel in the entire image, or decipher each time whether
you're looking at cat ears. Using convolution saves you a lot of work.
4. Intelligence
!45
4. Intelligence
Image: Pexels
Convolution
I convolve all the pixels in this small box
to arrive at a new number. Now this
number has the information from itself
and the eight pixels surrounding it, but
it doesn't have any information from
the pixels that are far away, which
makes this an efficient operation.
I convolve the pixels and it tells me that
I'm looking at a line at a certain angle.
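For readers who do want to see the mechanics, here is a minimal pure-Python sketch of the "small box" step: one output number is a weighted sum of a pixel and its eight neighbors. The 5x5 grayscale image and the vertical-edge kernel are made-up illustrations, not taken from the talk. As in most CNN libraries, the kernel is not flipped, so strictly speaking this is cross-correlation.

```python
def convolve3x3(image, kernel):
    """Slide a 3x3 kernel over a grayscale image (a list of rows).

    Each output value combines one pixel with its eight neighbors and
    nothing farther away. Border pixels without a full neighborhood are
    skipped, so the output is 2 rows and 2 columns smaller than the input.
    """
    height, width = len(image), len(image[0])
    output = []
    for y in range(1, height - 1):
        row = []
        for x in range(1, width - 1):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    total += image[y + dy][x + dx] * kernel[dy + 1][dx + 1]
            row.append(total)
        output.append(row)
    return output


# Made-up 5x5 image: dark (0) on the left, bright (9) on the right,
# i.e. a vertical line between the second and third columns.
image = [[0, 0, 9, 9, 9] for _ in range(5)]

# Made-up kernel that responds to dark-to-bright changes from left to right.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

response = convolve3x3(image, vertical_edge)
for row in response:
    print(row)
```

The response is large where the box straddles the line and zero where the box sees only uniform brightness. Because the same nine weights are reused at every position, the same pattern is found wherever it occurs in the image.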
!46
4. Intelligence
Image: Pexels
Convolution
I convolve over larger and larger areas
and I see two lines at these angles with
a furry texture in the middle. This tells
me that I'm looking at the ear of a cat.
!47
4. Intelligence
Image: Pexels
I convolve all over the image and now I
see that there are two cat ears, eyes,
whiskers, and paws. Now I have pretty
good confidence that I'm looking at a cat.
This is why you often hear the words
Convolutional Neural Networks, or CNNs,
or ConvNets, when people talk about
computer vision.
!48
Convolution is just one algorithm. Researchers have
developed many other algorithms to reduce the
complexity and help computers recognize what's in an
image accurately and quickly. Most are esoteric and
related to the inner workings of statistics and models.
I'll give two more examples that are easier to grasp.
4. Intelligence
!49
4. Intelligence
Image: Pexels
Mirroring
Cropping
Rotating
Shifting colors
Data augmentation is distorting images slightly so that you
have more data to train your model.
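The four distortions named on this slide can be sketched in plain Python on a tiny image stored as rows of (R, G, B) tuples. The function names, the 2x2 sample image, and the shift amounts are assumptions for illustration; real pipelines typically use an image library and randomize the parameters.

```python
def mirror(img):
    """Flip the image left to right."""
    return [row[::-1] for row in img]


def crop(img, top, left, height, width):
    """Keep only a height x width window starting at (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]


def rotate90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def shift_colors(img, dr, dg, db):
    """Add an offset to each channel, clamped to the 0-255 range."""
    def clamp(value):
        return max(0, min(255, value))
    return [[(clamp(r + dr), clamp(g + dg), clamp(b + db)) for r, g, b in row]
            for row in img]


# Made-up 2x2 image used only to exercise the functions.
original = [
    [(210, 179, 172), (127, 0, 0)],
    [(246, 171, 97), (0, 0, 0)],
]

# One original photo becomes five training examples.
training_set = [
    original,
    mirror(original),
    crop(original, 0, 0, 1, 2),
    rotate90(original),
    shift_colors(original, 20, 0, -10),
]
print(len(training_set))
```

Each variant still shows the same subject, so it can keep the same label while giving the model a slightly different view to learn from.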
!50
4. Intelligence
Image: Pexels
Transfer learning is taking a model that is already trained and
applying it to your problem. You still need to tweak it, because
the original model may have been trained on, say, professional photos,
whereas you are using your phone camera. This is much easier than starting from scratch.
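A toy sketch of the idea, under heavy assumptions: `pretrained_features` is a hypothetical stand-in for the already-trained model and is kept frozen, and the only thing trained on "your" data is a tiny final layer (a perceptron). A real system would reuse a deep network and might also fine-tune some of its layers; the warm-versus-cool color task and all names here are invented for illustration.

```python
def pretrained_features(pixel):
    """Hypothetical frozen model: maps an RGB pixel to two features.

    In practice this would be a deep network trained by someone else.
    """
    r, g, b = pixel
    return [(r + g + b) / 765, (r - b) / 255]  # brightness, warmth


def score(pixel, weights, bias):
    feats = pretrained_features(pixel)
    return sum(w * f for w, f in zip(weights, feats)) + bias


def predict(pixel, weights, bias):
    """Return 1 for 'warm', 0 for 'cool'."""
    return 1 if score(pixel, weights, bias) > 0 else 0


def train_head(samples, labels, epochs=20, lr=0.1):
    """Fit only the small final layer; the feature extractor is untouched."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for pixel, label in zip(samples, labels):
            error = label - predict(pixel, weights, bias)
            feats = pretrained_features(pixel)
            weights = [w + lr * error * f for w, f in zip(weights, feats)]
            bias += lr * error
    return weights, bias


# Made-up "phone camera" data: two warm colors (1) and two cool colors (0).
samples = [(246, 171, 97), (97, 171, 246), (127, 0, 0), (0, 0, 127)]
labels = [1, 0, 1, 0]

weights, bias = train_head(samples, labels)
print([predict(p, weights, bias) for p in samples])
```

Only three numbers are learned here (two weights and a bias), which is the point: the expensive part of the model is reused, and the small part is adapted to the new data.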
!51
5. Computing power
Image: Scio Info Tech
CPUs
• Used in traditional computing
• Great at taking a sequential list of instructions and executing them quickly
• Have multiple cores
GPUs
• Have thousands of cores that can operate in parallel and can perform a multitude of identical simple jobs simultaneously
• Developed for video games in order to render images on the screen efficiently
• Similarly, image recognition calls for applying convolution quickly across the image
Computer vision is better suited to GPUs, which is why the
GPU market has gotten a lot of attention of late.
!52
5. Computing power
Note: Figures include capital leases; AMZN Capex spend represents total consolidated capex across all businesses
Source: Company data, Goldman Sachs Global Investment Research
Capex spend by public cloud vendor
Major cloud providers have been adding GPUs and making
serious investments in machine-learning specific offerings.
No one publicly discloses
how many Nvidia GPUs they
are buying, but overall capex
spend is a directionally correct proxy.
!53
5. Computing power
Public cloud market size and share
Note: Market size based on Gartner Estimates; Company data based on GS Estimates
Source: Company data, Goldman Sachs Global Investment Research
Why? The public cloud market is growing rapidly. We always
need more computing power to recognize cats more quickly.
!54 Image: Pexels
1. Cameras and sensors
2. Connectivity
3. Data
4. Intelligence
5. Computing power
This is a virtuous cycle that gives better predictions and makes
computer vision better and better and more pervasive in our world.
So what are cat photos to you?
!55
Computer vision is breaking down another wall, allowing us to
interact both digitally and physically. It fundamentally changes
how businesses operate in the physical world.
What challenges do you have that computer vision can solve?
I have a few ideas.
!56 Image: Pexels
Identity and security: Can your product benefit from seeing who it is
interfacing with? It could be a digital product like FaceID or a physical product like a door.
!57 Image: Pexels
E-commerce: Why am I still flipping through online catalogs and imagining
clothes on me, only to have them invariably fit terribly when they arrive?
!58 Image: Pexels
Change detection: So much of our work is keeping an eye on something.
When you're driving and you want to switch lanes, you need to remember to
check your blind spots. Why do we still have blind spots?
!59 Image: Pexels
The police walk around the city all day to give tickets to illegally parked cars.
Why not put cameras around the city and ticket cars automatically?
!60
Or when you're in a factory and making lots of electronic widgets, a machine
could be out of alignment and start to make defective products. Why not
have a camera stare at it and alert you if it sees something different?
Image: ADDitude
!61 Image: Pexels
A lot of diseases start with changes that are imperceptible to the untrained
eye, such as slight tremors in your fingers that might indicate Parkinson's. Why not
have a camera as an observer in your home with a direct line to your doctor?
!62 Image: Pexels
Computer vision is changing the way we interact with the world.
We'd be hard-pressed to find a business for which this is not relevant.
Upfront portfolio companies using computer vision
!63
*Ring was a portfolio company until it was sold to Amazon in 2018
!64
CHANG XU
Principal, Upfront Ventures
Find me at:
twitter.com/_changxu
medium.com/@changxu
linkedin.com/in/changx
Image: Pexels
