The informatic challenges of 2013 and beyond are bigger than any one company. This presentation surveys recent, successful crowd-sourced and community-driven applications that combine Big Data approaches with community involvement. The speaker dives into the numbers and specific details of Factual's approach to large-scale, multi-authored data collection and aggregation, and shows how the company's data ethos and business positioning shape both its technology and its vision of large-scale, collective data ecosystems.
In the next 10 years, smart machines will augment humans in many tasks like assisting doctors, fighting in battles, manufacturing, and assisting in various professions. While machines will replace humans in some routine tasks, the partnership between humans and machines will build on our respective strengths. Humans have advantages in tasks requiring thinking, creativity, social/emotional skills, and improvisation, while machines are better suited for repetitive, dangerous, large/small-scale, and data-driven tasks. An optimal partnership is emerging where humans and machines collaborate to achieve more than either could alone.
Big Data gives us superpowers: keep that in mind when thinking about how to use it (Skender Kollcaku)
We will be producing more data every year than in the previous 100,000 years, and the pace keeps accelerating. That data will be processed mostly by machines, a great many machines.
Data will generate more data (think of engines, self-driving cars, sensors, wearables, unstructured data). New types of data require a new interdisciplinary approach: how do you index a person's mood or sense of humor?
This presentation was for Social Media Week Berlin on Tuesday, 24th September. It was targeted at NGOs, NPOs, activist organisations and charities who have important key messages to share with the community. The event will combine elements of a presentation and workshop. We will examine case studies of campaigns that have successfully used data visualisation in tandem with social media and content marketing techniques to spread information and ideas, and to counteract prevailing myths about climate change and renewable energy technology. We will then allow time for participants to split up into small working groups. Structured discussion tasks and group feedback will allow participants to investigate how these strategies can apply to their own organisation or issue. Participants will learn practical steps for identifying important messages, researching and developing content, incorporating data visualisation in a powerful and meaningful way, and promoting their data visualisation campaigns through social media and email outreach. In particular, the event will focus on developing powerful stories that will attract the support of influential sharers and thought leaders from a range of backgrounds, from activism through to industry, so as to maximise the campaign's reach and impact.
Will the machines save us or kill us all? That is the question. While many are thrilled by the latest AI breakthroughs and dream of a shining AI-powered world, others, including Bill Gates, Elon Musk, Steve Wozniak and the late, legendary Stephen Hawking, have expressed concerns about the evolution of the machines and warned of an apocalyptic future.
http://www.altitude.com/
Dedupe, Merge and Purge: The Art of Normalization (Tyler Bell)
This presentation stresses the importance of entity resolution within a business context and provides real-world examples and pragmatic insight into the process of canonicalization.
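Canonicalization is easiest to see in code. The following is a minimal, illustrative sketch of the dedupe-and-merge idea, not Factual's or any vendor's actual pipeline; the record fields and normalization rules are assumptions for the example. Each record is reduced to a canonical key, and records sharing a key are merged, keeping the most complete field values.

```python
# Illustrative dedupe-and-merge (entity resolution) sketch; the fields and
# normalization rules here are assumptions, not a production schema.
import re
from collections import defaultdict

def canonical_key(record):
    """Build a blocking key from a normalized business name and postal code."""
    name = re.sub(r"[^a-z0-9 ]", "", record["name"].lower())
    name = re.sub(r"\b(inc|llc|ltd|corp)\b", "", name).strip()
    return (re.sub(r"\s+", " ", name), record.get("zip", ""))

def merge(records):
    """Group records by canonical key; keep the first non-empty value per field."""
    groups = defaultdict(list)
    for r in records:
        groups[canonical_key(r)].append(r)
    merged = []
    for group in groups.values():
        out = {}
        for r in group:
            for k, v in r.items():
                if v and not out.get(k):
                    out[k] = v
        merged.append(out)
    return merged

records = [
    {"name": "Joe's Pizza, Inc.", "zip": "10012", "phone": ""},
    {"name": "joes pizza", "zip": "10012", "phone": "212-555-0100"},
    {"name": "Blue Bottle Coffee", "zip": "94612", "phone": ""},
]
result = merge(records)  # two canonical entities; Joe's gains the phone number
```

Real-world entity resolution adds fuzzy matching and confidence scoring on top of this blocking-and-merging skeleton, but the shape of the problem is the same.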
The document discusses the nature of data driven service innovation. Some key points made include:
- Data driven service innovation aims to create new services by finding innovative uses of data, but this process is messy and experimental as requirements are poorly defined initially.
- Big data projects resemble research more than production, requiring agility to combine conventional project management with an ability to fail fast and learn from mistakes.
- The complexity of modern ICT systems makes perfect causal understanding impossible. We must acknowledge our ignorance and use a probe-sense-analyze-act approach.
- Developing services is challenging as the customer experience is co-created and hard to define formally. Operational staff closest to customers provide important insights.
Big Data and the Future of Journalism: Futurist Keynote (Gerd Leonhard)
This is a slightly edited version of my slides presented in London on June 7, 2013 at the Reuters Institute; see https://reutersinstitute.politics.ox.ac.uk/research/conferences/forthcoming-conferences/big-data-big-ideas-for-media.html
BTW: You can download ALL of my slideshows, free books and other stuff at http://futuristgerd.com/downloads/
"Data stockpiles are growing exponentially...consumer profiles, media content usage patterns, Twitter and Facebook posts, online purchases, public records, real-time media user behavior and much more. The Big Ideas conference speakers will inspire tactics and strategies to harness these data.
The media industry's leading edge experts from journalism and business disciplines will detail their own case studies, outlining their challenges and triumphs using tools to understand complex data sets. They will outline how these experiences have paved the way to prize-winning journalism, audience insights and growing revenues..."
Host(s):
Amanda Miller (https://www.linkedin.com/in/amanda-c-miller-a2b9808/)
Brandy Farlow (https://www.linkedin.com/in/brandy-farlow-4520057b/)
Also, thanks to: Steve Fiore (https://www.linkedin.com/in/stephen-fiore-8087305/)
Host Organization: RENCI ACTS - https://renci.org/team-science/
Date: 2024-09-19
Title: The role of AI as a team member in scientific research; AI Teammate: Need for Episodic Memory and GTD (Generate-Test-Debug) Architectures
Speaker: Jim Spohrer
Abstract: After reviewing some of the history of artificial intelligence, and the challenges of keeping up with accelerating change, we will explore possible future roles for AI as a team member in scientific research. As the marginal cost of computing gets closer to zero, fixing the so-called "hallucination" problem will likely require adding an episodic memory and GTD (Generate-Test-Debug) architecture to existing AI systems. Fixing the "energy consumption" problem for AI tools will also be a major challenge. However, even with these largely technical challenges solved, who owns and controls the evolution of the AI tools used for team science? Who owns and controls the training data and development processes used to create the tools? Would you prefer using a vendor tool, a tool provided by your company or university, a tool you created, a digital twin of you, or a nation-state owned AI tool? Or will you be using all of these types of AI tools and more? Learning to invest wisely in these changes and other changes (e.g., UN Sustainable Development Goals) will require significant advances in the science of team science. It will also require advances in adjacent disciplines, including game theory, economics, and emerging transdisciplines such as service science that depend on better models of the world, ourselves and each other, and our organizations and tools to achieve trust and win-win outcomes.
Takeaways:
- A range of technical and social challenges must be addressed as AI fills the role of team member in scientific research
- Episodic memory and GTD architectures are an approach to the "hallucination" problem
- Ultimately, our AI digital twins will evolve from tool to assistant to collaborator to coach to mediator
- Learning to invest wisely in change will require transdisciplinary advances
Very brief bio (72 words): Jim Spohrer is a retired industry executive (IBM, Apple) based in the Bay Area, California. He serves on the Board of Directors of the non-profit International Society of Service Innovation Professionals (ISSIP) and ServCollab ("Serving Humanity Through Collaboration"), and is also a UIDP (University-Industry Demonstration Program) Senior Fellow. He has over 90 publications and 9 patents. He has a PhD from Yale in Computer Science/Artificial Intelligence and a BS in Physics from MIT.
Machine Learning: Understanding the Invisible Force Changing Our World (Ken Tabor)
This document discusses the rise of machine learning and artificial intelligence. It provides quotes from industry leaders about the potential for AI to improve lives and build a better society. The text then explains what machine learning is, how it works through supervised, unsupervised and reinforcement learning, and some of the business applications of AI like product recommendations, fraud detection and machine translation. It also discusses the increasing investment in and priority placed on AI by companies, governments and researchers. The document encourages readers to consider the ethical implications of AI and ensure it is developed and applied in a way that benefits all of humanity.
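As a concrete taste of the supervised learning the document describes, here is a minimal, stdlib-only example: fit a line to labeled (x, y) pairs by least squares, then predict on unseen input. The data and the underlying rule are invented for illustration.

```python
# Minimal supervised learning: fit y = a*x + b by least squares from labeled
# examples, then predict. Data is invented for illustration.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least squares for one feature.
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

# Labeled training data: the "supervision" is the known y for each x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]          # underlying rule: y = 2x + 1
a, b = fit_line(xs, ys)
prediction = a * 5 + b     # the model generalizes to the unseen input x = 5
```

Unsupervised and reinforcement learning differ only in what signal replaces the labeled ys: structure in the data itself, or a reward from the environment.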
This document discusses Microsoft's efforts in artificial intelligence and machine learning. It provides context on the current state of AI, highlighting how machine learning has progressed from addressing specific tasks to becoming more general. It outlines Microsoft's investments in AI, including forming a new 5,000-person division and making AI pervasive across its products. The document also discusses challenges around developing machine learning programs and ensuring AI is developed in a responsible, trustworthy manner.
Disrupting technologies like Data Science and Knowledge Automation are projected to have an economic impact of trillions of dollars in the next decade.
This presentation was given at the Dallas Tableau User Group on Oct 29, 2013.
Origins of the Marketing Intelligence Engine, SXSW 2015 (PR 20/20)
Marketing automation platforms save time, improve efficiency and increase productivity. They give companies an unprecedented ability to understand buyers, identify opportunities, track campaign performance and link marketing activities to business outcomes.
But, they do not provide insight into the billions of bits of data being created as consumers move from screen to screen and interact online and offline with brands, and they do not recommend actions to improve performance.
Humans are limited by their biases, beliefs, education, experiences, knowledge and brainpower. All of these things contribute to our finite ability to process information, build strategies and achieve performance potential.
Algorithms, in contrast, have an almost infinite ability to process information. They possess the power to understand natural language queries, identify patterns and anomalies, and parse massive data sets to deliver recommendations better, faster and cheaper than people can.
What inevitably comes next are marketing intelligence engines that process data and recommend actions to improve performance based on probabilities of success.
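One simple way an engine can "identify patterns and anomalies" in campaign data is a z-score check over a metric's history. The sketch below is illustrative only; the metric, sample data, and threshold are assumptions, not any marketing platform's actual method.

```python
# Illustrative anomaly detection over a marketing metric: flag points that
# deviate more than `threshold` standard deviations from the mean.
from statistics import mean, pstdev

def anomalies(series, threshold=2.0):
    mu, sigma = mean(series), pstdev(series)
    return [i for i, v in enumerate(series)
            if sigma and abs(v - mu) / sigma > threshold]

daily_clicks = [100, 98, 103, 101, 99, 350, 102, 97]  # day 5 is a spike
flagged = anomalies(daily_clicks)  # indices of anomalous days
```

A production engine would layer seasonality adjustment and recommended actions on top, but the core step, separating signal from noise statistically, is this small.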
Strata Conference NYC 2013, Full Version (Taewook Eom)
The document provides an overview of topics discussed at the Strata Conference in 2013, including keynotes, sessions, and speakers. It discusses big data technologies like Hadoop, NoSQL, data science, and real-time stream processing. Some highlights include discussions on defining data science roles, predicting human vs machine performance, organizing data-centric companies, and the future of Hadoop.
This document discusses the evolution of big data analytics from descriptive to predictive and prescriptive capabilities. It notes that while descriptive analytics has seen the most success, predictive and prescriptive analytics using machine learning techniques open up new opportunities to create value from large and diverse data sources. However, big data analytics also faces challenges in data preparation, integration, and making the insights intuitive for users. While artificial narrow intelligence has achieved successes in games like chess and Go, general artificial intelligence that matches human-level thinking across domains has not yet been achieved.
Data Science in the Real World: Making a Difference (Srinath Perera)
We use the terms Big Data and Data Science for the use of data processing to make sense of the world around us. Spanning many fields, Big Data brings together technologies like Distributed Systems, Machine Learning, Statistics, and the Internet of Things. It is a multi-billion-dollar industry with use cases like targeted advertising, fraud detection, product recommendations, and market surveys. With new technologies like the Internet of Things (IoT), these use cases are expanding into scenarios like Smart Cities, Smart Health, and Smart Agriculture.
These use cases rely on basic analytics, advanced statistical methods, and predictive technologies like Machine Learning. However, it is not just about crunching the data. Some use cases, like urban planning, are slow-moving, leaving ample time to process the data. In use cases like traffic, patient monitoring, and surveillance, however, the value of results degrades quickly with time, and results are needed within milliseconds to seconds. Collecting data from many sources, cleaning it up, processing it on computation clusters, and doing all of this fast is a major challenge.
This talk will discuss the motivation behind big data and data science and how it can make a difference. It will then cover the challenges, systems, and methodologies for implementing and sustaining a data science pipeline.
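For the fast-decaying use cases above, a common building block is a sliding time window over the event stream: only the last N seconds of data contribute to each result, so answers stay fresh as older readings lose value. This is a minimal illustrative sketch, not any particular streaming system's API.

```python
# Illustrative sliding-window aggregation for low-latency stream processing:
# keep only events from the last `window_seconds` and compute a rolling average.
from collections import deque

class SlidingWindow:
    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()  # (timestamp, value) pairs in arrival order

    def add(self, timestamp, value):
        self.events.append((timestamp, value))
        # Evict events older than the window; O(1) amortized per event.
        while self.events and self.events[0][0] <= timestamp - self.window:
            self.events.popleft()

    def average(self):
        if not self.events:
            return 0.0
        return sum(v for _, v in self.events) / len(self.events)

w = SlidingWindow(window_seconds=10)
w.add(0, 50)    # e.g. traffic sensor speed readings
w.add(5, 60)
w.add(12, 70)   # the t=0 reading has now aged out of the 10-second window
avg = w.average()
```

Real stream processors distribute this pattern across a cluster, but the eviction logic is exactly why their results can arrive within milliseconds rather than after a batch job completes.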
Big Data v. Small Data: Rules of Thumb for 2015 (Visart)
Open data, big data, small data - what's the difference? Do you work with data? Small and medium sized businesses are pressured to transform traditional practices into data-driven models. In this presentation, CEO Ugur Kadakal explains big data v. small data and the insights we can pull from each for better business intelligence.
Do you work with data, or just like learning about it? Check out our blog on www.Visart.io for data stories and other resources.
Big data and predictive analytics will transform accounting work and require accountants to develop new skills. By 2018, there will be a shortage of 30,000 data-savvy managers in Australia who can make effective decisions based on big data analysis. Accountants will need to shift from reactive to proactive roles by leveraging accounting data and predictive tools to find patterns, gain insights, and predict client scenarios in order to maximize opportunities and minimize risks for their clients. The "predictive accountant" who adopts these new data-focused skills will be well-positioned for the future of the profession.
In this deck from the HPC User Forum in Tucson, Steve Conway from Hyperion Research presents: The Need for Deep Learning Transparency.
"We humans don't fully understand how humans think. When it comes to deep learning, humans also don't understand yet how computers think. That's a big problem when we're entrusting our lives to self-driving vehicles or to computers that diagnose serious diseases, or to computers installed to protect national security. We need to find a way to make these black box computers transparent."
"We help IT professionals, business executives, and the investment community make fact-based decisions on technology purchases and business strategy. Our industry experts are the former IDC high performance computing (HPC) analyst team, which remains intact and continues all of its global activities. The group is comprised of the world's most respected HPC industry analysts who have worked together for more than 25 years."
Watch the video: https://wp.me/p3RLHQ-it7
Learn more: http://hyperionresearch.com/
and
http://hpcuserforum.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Jim from IBM discusses the future of AI. He notes that while AI is currently hyped, pattern recognition using deep learning only works because of the large amounts of data and computing power now available. True AI requiring commonsense reasoning is still 5-10 years away. He outlines a timeline for solving different AI problems and notes IBM's $240 million partnership with MIT to advance AI. The benefits of AI include access to expertise and improved productivity, but risks include job loss and potential issues with superintelligence. Other technologies like augmented reality may have a larger impact. Stakeholders in AI include individuals, organizations, governments, and industries.
Rao Mikkilineni discusses the emergence of cognitive computing models and a new cognitive infrastructure. He argues that increasing data volumes and the need for real-time insights are driving the need for intelligent, sentient, and resilient systems. The new cognitive infrastructure will include a cognitive and infrastructure agnostic control overlay, composable services, and cognitive deep learning integration. It will enable a post-hypervisor cognitive computing era with intelligent, distributed systems.
The document discusses emerging technologies and their potential impacts. It provides timelines of predicted technological advances from 2020 to 2036 including flying cars, human missions to Mars, widespread use of robots, cures for cancer and poverty, longevity treatments, and more. It examines current and soon-to-be present developments in areas like biotech, neurotech, green tech, and digital tech. It also discusses technologies like nanotechnology, quantum computing, augmented and virtual reality, robotics, digital ecosystems, advanced AI, and more. The document emphasizes experimenting with emerging technologies through "sandboxes" and considering both the promises and risks of technological change.
Designing AI for Humanity at dmi:Design Leadership Conference in Boston (Carol Smith)
As design leaders we must enable our teams with the skills and knowledge to take on the new and exciting opportunities that building powerful AI systems brings. Dynamic systems require transparency regarding data provenance, bias, training methods, and more, to gain users' trust. Carol will cover these topics and challenge us, as design leaders, to represent our fellow humans by provoking conversations regarding critical ethical and safety needs.
Presented at dmi:Design Leadership Conference in Boston in October 2018.
This is a power point presentation on Hadoop and Big Data. This covers the essential knowledge one should have when stepping into the world of Big Data.
This course is available on hadoop-skills.com for free!
This course builds a fundamental understanding of Big Data problems and Hadoop as a solution. It takes you through:
- An understanding of Big Data problems, with easy-to-understand examples and illustrations
- The history and advent of Hadoop, right from when Hadoop wasn't even named Hadoop and was called Nutch
- What the Hadoop "magic" is that makes it so unique and powerful
- The difference between data science and data engineering, one of the big points of confusion when selecting a career or understanding a job role
- And most importantly, demystifying Hadoop vendors like Cloudera, MapR and Hortonworks by learning about them
This course is available for free on hadoop-skills.com
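The canonical first Hadoop program is word count, and its map, shuffle, and reduce phases can be mimicked in plain Python to show the idea. The sketch below is illustrative only and requires no Hadoop cluster; in the real framework each phase runs distributed across many machines.

```python
# Word count sketched as MapReduce's three phases (illustrative, single-machine).
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as Hadoop does between
    # the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: combine each word's list of counts into a total.
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["big data big ideas", "big data tools"]
counts = reduce_phase(shuffle(map_phase(lines)))
```

Because the map and reduce functions only see local data, Hadoop can scale this exact pattern from two lines of text to petabytes.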
The document discusses the nature of data driven service innovation. Some key points made include:
- Data driven service innovation aims to create new services by finding innovative uses of data, but this process is messy and experimental as requirements are poorly defined initially.
- Big data projects resemble research more than production, requiring agility to combine conventional project management with an ability to fail fast and learn from mistakes.
- The complexity of modern ICT systems makes perfect causal understanding impossible. We must acknowledge our ignorance and use a probe-sense-analyze-act approach.
- Developing services is challenging as the customer experience is co-created and hard to define formally. Operational staff closest to customers provide important insights.
Big Data and the Future of Journalism (Futurist Keynote Speaker Gerd Leonhard...Gerd Leonhard
油
This is a slightly edited version of my slides presented in London on June 7, 2013 and the Reuters Institute see https://reutersinstitute.politics.ox.ac.uk/research/conferences/forthcoming-conferences/big-data-big-ideas-for-media.html
BTW: You can download ALL of my slideshows, free books and other stuff at http://futuristgerd.com/downloads/
"Data stockpiles are growing exponentially...consumer profiles, media content usage patterns, Twitter and Facebook posts, online purchases, public records, real-time media user behavior and much more. The Big Ideas conference speakers will inspire tactics and strategies to harness these data.
The media industry's leading edge experts from journalism and business disciplines will detail their own case studies, outlining their challenges and triumphs using tools to understand complex data sets. They will outline how these experiences have paved the way to prize-winning journalism, audience insights and growing revenues..."
Host(s):
Amanda Miller (https://www.linkedin.com/in/amanda-c-miller-a2b9808/)
Brandy Farlow (https://www.linkedin.com/in/brandy-farlow-4520057b/)
Also, thanks to: Steve Fiore (https://www.linkedin.com/in/stephen-fiore-8087305/)
Host Organization: RENCI ACTS - https://renci.org/team-science/
20240919
Title: The role of AI as a team member in scientific research; AI Teammate: Need for Episodic Memory and GTD (Generate-Test-Debug) Architectures
Speaker: Jim Spohrer
Abstract: After reviewing some of the history of artificial intelligence, and the challenges of keeping up with accelerating change, we will explore possible future roles for AI as a team member in scientific research. As the marginal cost of computing gets closer to zero, fixing the so-called "hallucination" problem will likely require adding an episodic memory and GTD (Generate-Test-Debug) architecture to existing AI systems. Fixing the "energy consumption" problem for AI tools will also be a major challenge. However, even with these largely technical challenges solved, who owns and controls the evolution of the AI tools used for team science? Who owns and controls the training data and development processes used to create the tools? Would you prefer using a vendor tool, a tool provided by your company or university, a tool you created, a digital twin of you, or a nation-state owned AI tool? Or will you be using all of these types of AI tools and more? Learning to invest wisely in these changes and other changes (e.g., UN Sustainable Development Goals) will require significant advances in the science of team science. It will also require advances in adjacent disciplines, including game theory, economics, and emerging transdisciplines such as service science that depend on better models of the world, ourselves and each other, and our organizations and tools to achieve trust and win-win outcomes.
Takeaways:
- A range of technical and social challenges must be addressed as AI fills the role of team member in scientific research
- Episodic memory and GTD architectures are an approach to the "hallucination" problem
- Ultimately, our AI digital twins of ourselves will evolve from tool to assistant to collaborator to coach to mediator - Learning to invest wisely in change will require transdisciplinary advances.
Very brief bio (72 words): Jim Spohrer is a retired industry executive (IBM, Apple) based in the Bay Area California. He serves on the Board of Directors of the non-profit International Society of Service Innovation Professionals (ISSIP) and ServCollab ("Serving Humanity Through Collaboration), and also a UIDP (University-Industry Demonstration Program) Senior Fellow. He has over 90 publications and 9 patents. He has a PhD from Yale in Computer Science/Artificial Intelligence and a BS in Physics from MIT.
Machine Learning: Understanding the Invisible Force Changing Our WorldKen Tabor
油
This document discusses the rise of machine learning and artificial intelligence. It provides quotes from industry leaders about the potential for AI to improve lives and build a better society. The text then explains what machine learning is, how it works through supervised, unsupervised and reinforcement learning, and some of the business applications of AI like product recommendations, fraud detection and machine translation. It also discusses the increasing investment in and priority placed on AI by companies, governments and researchers. The document encourages readers to consider the ethical implications of AI and ensure it is developed and applied in a way that benefits all of humanity.
This document discusses Microsoft's efforts in artificial intelligence and machine learning. It provides context on the current state of AI, highlighting how machine learning has progressed from addressing specific tasks to becoming more general. It outlines Microsoft's investments in AI, including forming a new 5,000-person division and making AI pervasive across its products. The document also discusses challenges around developing machine learning programs and ensuring AI is developed in a responsible, trustworthy manner.
Disrupting technologies like Data Science and Knowledge Automation are projected to have an economic impact of trillions of dollars in the next decade.
This presentation was given at the Dallas Tableau User Group on Oct 29, 2103 and
Origins of the Marketing Intelligence Engine (SXSW 2015)PR 20/20
油
Marketing automation platforms save time, improve efficiency and increase productivity. They give companies an unprecedented ability to understand buyers, identify opportunities, track campaign performance and link marketing activities to business outcomes.
But, they do not provide insight into the billions of bits of data being created as consumers move from screen to screen and interact online and offline with brands, and they do not recommend actions to improve performance.
Humans are limited by their biases, beliefs, education, experiences, knowledge and brainpower. All of these things contribute to our finite ability to process information, build strategies and achieve performance potential.
Algorithms, in contrast, have an almost infinite ability to process information. They possess the power to understand natural language queries, identify patterns and anomalies, and parse massive data sets to deliver recommendations better, faster and cheaper than people can.
What inevitably comes next are marketing intelligence engines that process data and recommend actions to improve performance based on probabilities of success.
Strata Conference NYC 2013 Full VersionTaewook Eom
油
The document provides an overview of topics discussed at the Strata Conference in 2013, including keynotes, sessions, and speakers. It discusses big data technologies like Hadoop, NoSQL, data science, and real-time stream processing. Some highlights include discussions on defining data science roles, predicting human vs machine performance, organizing data-centric companies, and the future of Hadoop.
This document discusses the evolution of big data analytics from descriptive to predictive and prescriptive capabilities. It notes that while descriptive analytics has seen the most success, predictive and prescriptive analytics using machine learning techniques open up new opportunities to create value from large and diverse data sources. However, big data analytics also faces challenges in data preparation, integration, and making the insights intuitive for users. While artificial narrow intelligence has achieved successes in games like chess and Go, general artificial intelligence that matches human-level thinking across domains has not yet been achieved.
Data Science in the Real World: Making a Difference Srinath Perera
油
We use the terms Big Data and Data Science for use of data processing to make sense of the world around us. Spanning many fields, Big Data brings together technologies like Distributed Systems, Machine Learning, Statistics, and Internet of Things together. It is a multi-billion-dollar industry including use cases like targeted advertising, fraud detection, product recommendations, and market surveys. With new technologies like Internet of Things (IoT), these use cases are expanding to scenarios like Smart Cities, Smart health, and Smart Agriculture.
These usecases use basic analytics, advanced statistical methods, and predictive technologies like Machine Learning. However, it is not just about crunching the data. Some usecases like Urban Planning can be slow, and there is enough time to process the data. However, with use cases like traffic, patient monitoring, surveillance the the value of results degrades much faster with time and needs results within milliseconds to seconds. Collecting data from many sources, cleaning them up, processing them using computation clusters, and doing all these fast is a major challenge.
This talk will discuss motivation behind big data and data science and how it can make a difference. Then it will discuss the challenges, systems, and methodologies for implementing and sustaining a data science pipeline.
Big Data v. Small data - Rules to thumb for 2015Visart
油
Open data, big data, small data - what's the difference? Do you work with data? Small and medium sized businesses are pressured to transform traditional practices into data-driven models. In this presentation, CEO, Ugur Kadakal explains the big data v. small data and the insights we can pull from each for better business intelligence.
Do you work with data, or just like learning about it? Check out our blog on www.Visart.io for data stories and other resources.
Big data and predictive analytics will transform accounting work and require accountants to develop new skills. By 2018, there will be a shortage of 30,000 data-savvy managers in Australia who can make effective decisions based on big data analysis. Accountants will need to shift from reactive to proactive roles by leveraging accounting data and predictive tools to find patterns, gain insights, and predict client scenarios in order to maximize opportunities and minimize risks for their clients. The "predictive accountant" who adopts these new data-focused skills will be well-positioned for the future of the profession.
In this deck from the HPC User Forum in Tucson, Steve Conway from Hyperion Research presents: The Need for Deep Learning Transparency.
"We humans dont fully understand how humans think. When it comes to deep learning, humans also dont understand yet how computers think. Thats a big problem when were entrusting our lives to self-driving vehicles or to computers that diagnose serious diseases, or to computers installed to protect national security. We need to find a way to make these black box computers transparent."
"We help IT professionals, business executives, and the investment community make fact-based decisions on technology purchases and business strategy. Our industry experts are the former IDC high performance computing (HPC) analyst team, which remains intact and continues all of its global activities. The group is comprised of the worlds most respected HPC industry analysts who have worked together for more than 25 years."
Watch the video: https://wp.me/p3RLHQ-it7
Learn more: http://hyperionresearch.com/
and
http://hpcuserforum.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Jim from IBM discusses the future of AI. He notes that while AI is currently hyped, pattern recognition using deep learning only works because of the large amounts of data and computing power now available. True AI requiring commonsense reasoning is still 5-10 years away. He outlines a timeline for solving different AI problems and notes IBM's $240 million partnership with MIT to advance AI. The benefits of AI include access to expertise and improved productivity, but risks include job loss and potential issues with superintelligence. Other technologies like augmented reality may have a larger impact. Stakeholders in AI include individuals, organizations, governments, and industries. [END SUMMARY]
Rao Mikkilineni discusses the emergence of cognitive computing models and a new cognitive infrastructure. He argues that increasing data volumes and the need for real-time insights are driving the need for intelligent, sentient, and resilient systems. The new cognitive infrastructure will include a cognitive and infrastructure agnostic control overlay, composable services, and cognitive deep learning integration. It will enable a post-hypervisor cognitive computing era with intelligent, distributed systems.
The document discusses emerging technologies and their potential impacts. It provides timelines of predicted technological advances from 2020 to 2036 including flying cars, human missions to Mars, widespread use of robots, cures for cancer and poverty, longevity treatments, and more. It examines current and soon-to-be present developments in areas like biotech, neurotech, green tech, and digital tech. It also discusses technologies like nanotechnology, quantum computing, augmented and virtual reality, robotics, digital ecosystems, advanced AI, and more. The document emphasizes experimenting with emerging technologies through "sandboxes" and considering both the promises and risks of technological change.
Designing AI for Humanity, by Carol Smith, at the dmi:Design Leadership Conference in Boston
As design leaders we must enable our teams with the skills and knowledge to take on the new and exciting opportunities that building powerful AI systems brings. Dynamic systems require transparency regarding data provenance, bias, training methods, and more, to gain users' trust. Carol will cover these topics and challenge us, as design leaders, to represent our fellow humans by provoking conversations regarding critical ethical and safety needs.
Presented at dmi:Design Leadership Conference in Boston in October 2018.
This is a power point presentation on Hadoop and Big Data. This covers the essential knowledge one should have when stepping into the world of Big Data.
This course is available on hadoop-skills.com for free!
This course builds a basic, fundamental understanding of Big Data problems and Hadoop as a solution. This course takes you through:
An understanding of Big Data problems, with easy-to-understand examples and illustrations
The history and advent of Hadoop, right from when Hadoop wasn't even named Hadoop and was called Nutch
What the 'Hadoop magic' is that makes it so unique and powerful
The difference between data science and data engineering, which is one of the big points of confusion when choosing a career or understanding a job role
And most importantly, demystifying Hadoop vendors like Cloudera, MapR and Hortonworks
This course is available for free on hadoop-skills.com
Bigger than Any One: Solving Large Scale Data Problems with People and Machines
1. Bigger Than Any One
Solving Large Scale Data Problems with People and Machines
Tyler Bell: @twbell
2. "You just want to survive long enough to have a long run, to prove that your product has value. And in the short term, human solutions require much less work. Worry about scaling when you need to"
Get 'Data Jujitsu' free at: http://oreilly.com/data/radarreports/data-jujitsu.csp
10. A Good Body of Literature Exists:
CrowdDB: "harnessing Human Computation" to address the Closed World Assumptions of SQL
Crowdsourced Databases: "Query Processing with People"
Deco: Declarative Crowdsourcing "Human Computation"
Mob Data Sourcing: a good state-of-the-art survey
14. "We've built a real-time human computation engine to help us identify search queries as soon as they're trending, send these queries to real humans to be judged, and then incorporate the human annotations into our back-end models."
"We use a small custom pool of Mechanical Turk judges to ensure high quality."
"Clockwork Raven steps in to do what algorithms cannot: it sends your data analysis tasks to real people and gets fast, cheap and accurate results."
Edwin Chen & Alpa Jain, Twitter Blog, 8 Jan 2013: http://bit.ly/SixU2q
Clockwork Raven: http://bit.ly/RkhxkW
15. Factual's matching pipeline:
Gold Data → Search for Candidate Matches → Resolve Candidates → Compare Attributes
Candidates fall into: Likely Matches | Unsure | Likely Mismatches
Unsure candidates are Sent to MTurk for Evaluation, which sorts them into Likely Matches and Likely Mismatches
MTurk is also used for QA, to Compute an Accuracy Score
16. Factual Moderation Queue
~60% automatically accepted or rejected based on preset rules/filters
~30% can be parsed, processed by attribute, and validated by MTurkers
~10% handled by in-house moderators or 'Trusted Turkers'
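The tiered queue above is effectively a cascade: cheap preset rules first, attribute-level parsing with MTurk validation next, in-house humans as the fallback. A hedged Python sketch; the rule and parser interfaces (`reject_empty`, `parse_address`) are hypothetical examples, not Factual's API:

```python
def moderate(submission, preset_rules, parse_by_attribute):
    """Dispatch one moderation-queue item through the three tiers:
    preset rules/filters (~60% of traffic), attribute parsing plus
    MTurk validation (~30%), in-house moderators (~10%)."""
    for rule in preset_rules:
        verdict = rule(submission)  # "accept", "reject", or None
        if verdict is not None:
            return verdict          # auto-accepted or auto-rejected
    if parse_by_attribute(submission) is not None:
        return "validate_with_mturk"
    return "in_house_moderation"

def reject_empty(s):
    """Example preset rule: reject submissions with no name."""
    return "reject" if not s.get("name") else None

def parse_address(s):
    """Example attribute parser: only address-bearing records parse."""
    return s if "address" in s else None

print(moderate({"name": ""}, [reject_empty], parse_address))      # reject
print(moderate({"name": "Cafe"}, [reject_empty], parse_address))  # in_house_moderation
```

The ordering matters: each tier is strictly cheaper than the one after it, so most traffic never reaches a human.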
18. The Data Science Debate 2012
Peter Skomoroch (LinkedIn), Michael Driscoll (Metamarkets), DJ Patil (Greylock Partners), Toby Segaran (Google), Pete Warden (Jetpac), Amy Heineike (Quid)
"In data science, domain expertise is more important than
machine learning skill"
2012 Strata Debate: http://bit.ly/A78ASK
Mike Loukides, 'The Unreasonable Necessity of Subject Experts' : http://oreil.ly/GAJHD9
19. "We've discovered that creative data scientists can solve problems in every field better than experts in those fields can."
"The expertise you need is strategy expertise in answering these questions."
Interview with Jeremy Howard, Chief Scientist, Kaggle: http://slate.me/UM0wQ7
20. "Data struggles with context. Human decisions are not discrete events. They are embedded in sequences and contexts. The human brain has evolved to account for this reality [...] Data analysis is pretty bad at narrative and emergent thinking, and it cannot match the explanatory suppleness of even a mediocre novel."
"Data inherently has all of the foibles of being human, [it] is not a magic force in society; it's an extension of us."
NYT Article 24 Feb 2013: http://nyti.ms/15cMV80
David Brooks NYT Op-Ed 18 Feb 2013: http://nyti.ms/12EQmpQ
Google Flu Trends: http://www.google.org/flutrends/us/#US
21. "People from different places and different backgrounds tend to produce different sorts of information. And so we risk ignoring a lot of important nuance if relying on big data as a social/economic/political mirror."
"Big data is undoubtedly useful for addressing and overcoming many important issues faced by society. But we need to ensure that we aren't seduced by the promises of big data to render theory unnecessary."
Mark Graham, 'Big Data and the End of Theory', The Guardian, 9 Mar 2012: http://bit.ly/yq87yW
Chris Anderson, 'The End of Theory', Wired (2008): http://bit.ly/LOU8
22. "Without [cleansing and harmonisation] they're not data markets; they're data jumble sales."
Edd Dumbill, 'Data Markets Compared', 7 March 2012: http://bit.ly/Ogsh30
InfoChimps: http://bit.ly/yIzY8x
Paul Miller, 'Discussing Data Markets in New York City', 13 March 2013: http://bit.ly/TFjNnj
24. "It's just dumb that a 100mil+ people carry GPS device in their pockets and we have to buy expensive proprietary data to find out about the shape of where we live."
"Humans don't tend to think in floating point pairs."
"It is equal parts sad-making and hate-making that we're all still stuck suffering the lack of a comprehensive and open dataset for places."
"Seriously, if you're presenting people geographic data, ask them sometime if you're getting it right."
Aaron Cope: http://www.aaronland.info/weblog/2013/02/03/reality/#youarehere
Kellan Elliott-McCrea: http://laughingmeme.org/2013/02/03/are-you-here-a-feature-on-the-side/
26. "So far InfoArmy has paid $146K to researchers (averaging $23/report), and some researchers have made over $5,000 individually. We are very happy that many hard working researchers have made money. However, without a clear path to revenue this model is unsustainable."
Single report owners
High prices
Low quality
Unpopularity of updates
Report Abandonment
Sold 44 reports total
Info Army: https://www.infoarmy.com/
Tech Crunch report: http://tcrn.ch/VsXnCv
27. Apple Maps' 'Report a Problem'
Image: http://www.macrumors.com/2012/09/24/how-to-report-a-problem-with-ios-6-maps-data/
28. 40 million users
65,000 Waze users made 500 million edits.
Users fixed 70% of system-detected map problems and 100% of all user-reported map problems.
Almost all user-reported map problems were taken care of within a week.
Users added more than 50,000 gas stations on the map in the first month.
Users spend 7 hours a month on the platform.
'Waze Now Lets Users Instantly Map Closed Roads, Acts Like Google Maps On Steroids': http://bit.ly/15iHyEo
29. Factual
provides both 'Flag' and 'Write' interfaces
(this is good)
Factual Write API: http://developer.factual.com/display/docs/Core+API+-+Writes
Flag API: http://developer.factual.com/display/docs/Core+API+-+Flag
30. With No Means to Follow-Up
(this is bad. But we're on it)
33. Play ball with the competitors: "Think instead about collaborating with all the other players and getting the network effects of a larger audience and a bigger pool of offers. [...] The individual companies behind the consortium still compete to acquire subscribers, but they now collaborate with a common platform for monetizing those subscribers through marketing."
Alistair Goodman, CEO Placecast, All Things D, 7 Feb 2012 http://bit.ly/YA8Nq4
34. "Seeds [...] are one of the original information storage devices, it's almost hard to understand why libraries haven't always included seeds"
Long Now Foundation, 'Seeds are the new books': http://blog.longnow.org/02013/02/26/seeds-are-the-new-books/
35. Data: In game theory and economic theory, a zero-sum game is a mathematical representation of a situation in which a participant's gain (or loss) of utility is exactly balanced by the losses (or gains) of the utility of the other participant(s). If the total gains of the participants are added up and the total losses are subtracted, they will sum to zero. Thus cutting a cake, where taking a larger piece reduces the amount of cake available for others, is a zero-sum game if all participants value each unit of cake equally (see marginal utility). In contrast, non-zero-sum describes a situation in which the interacting parties' aggregate gains and losses are either less than or more than zero. A zero-sum game is also called a strictly competitive game, while non-zero-sum games can be either competitive or non-competitive. Zero-sum games are most often solved with the minimax theorem, which is closely related to linear programming duality, or with Nash equilibrium.
http://en.wikipedia.org/wiki/Zero_sum
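The zero-sum condition is mechanical to check: for every outcome, the players' payoffs must sum to (approximately) zero. A small Python sketch; the payoff-table representation is illustrative:

```python
def is_zero_sum(payoffs, tol=1e-9):
    """True if, for every outcome, the players' payoffs sum to zero."""
    return all(abs(sum(p)) <= tol for p in payoffs.values())

# Matching Pennies: one player's gain is exactly the other's loss.
matching_pennies = {
    ("H", "H"): (1, -1),
    ("H", "T"): (-1, 1),
    ("T", "H"): (-1, 1),
    ("T", "T"): (1, -1),
}
print(is_zero_sum(matching_pennies))  # True

# A cake split where the parties value the pieces unequally is
# non-zero-sum: aggregate gains need not cancel out.
print(is_zero_sum({("split",): (0.6, 0.5)}))  # False
```

This is the point of the slide for data ecosystems: if sharing data makes the aggregate payoff positive, the game is non-zero-sum and collaboration can beat pure competition.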