Data mining (lecture 1 & 2) concepts and techniques
Saif Ullah
This document provides an overview of data mining concepts from Chapter 1 of the textbook "Data Mining: Concepts and Techniques". It discusses the motivation for data mining due to increasing data collection, defines data mining as the extraction of useful patterns from large datasets, and outlines some common applications like market analysis, risk management, and fraud detection. It also introduces the key steps in a typical data mining process including data selection, cleaning, mining, and evaluation.
The document introduces data mining and knowledge discovery in databases. It discusses why data mining is needed due to large datasets that cannot be analyzed manually. It also covers the data mining process, common data mining techniques like association rules and decision trees, applications of data mining in various domains, and some popular data mining tools.
This talk will cover various medical applications of deep learning, including tumor segmentation in histology slides, MRI, CT, and X-ray data. It will also cover more complicated tasks such as cell counting, where the challenge is to count how many objects are in an image, as well as generative adversarial networks and how they can be used for medical applications. This presentation is accessible to non-doctors and non-computer scientists.
Adaptive Machine Learning for Credit Card Fraud Detection
Andrea Dal Pozzolo
This document discusses machine learning techniques for credit card fraud detection. It addresses challenges like concept drift, imbalanced data, and limited supervised data. The author proposes contributions in learning from imbalanced and evolving data streams, a prototype fraud detection system using all supervised information, and a software package/dataset. Methods discussed include resampling techniques, concept drift handling, and a "racing" algorithm to efficiently select the best strategy for unbalanced classification on a given dataset. Evaluation measures the ability to accurately rank transactions by fraud risk.
This document provides an overview of data mining. It defines data mining as the process of analyzing data from different perspectives to extract useful information. The document then explains that data mining is used by companies to analyze large amounts of data and discover relationships that can help increase revenue or cut costs. It provides examples of how data mining has been used in various industries and lists the basic steps and types of data mining.
As the world grows more connected through the internet, mobile devices such as smartphones and tablets, and the wide use of wireless technologies, information security risks have increased. Both individuals and organizations are under regular attack for commercial or non-commercial gain. The objective of such attacks may be to take revenge, to malign the reputation of a competitor organization, to learn a competitor's strategies and sensitive information, or simply to have fun exploiting vulnerabilities. Hence, the need to protect information assets and ensure information security deserves adequate attention.
In this session, I will discuss how AI and Machine Learning can be applied in detecting, predicting and preventing cyber security/information security vulnerabilities and what are the benefits of using Machine Learning and AI. We also touch upon some of the tools available to perform the same.
Social media analytics powered by data science
Navin Manaswi
The document discusses social media analytics using big data and data science. It begins by defining social media and its importance for businesses, as well as big data analytics. It then explains how data science can be leveraged in social media analytics to gain powerful insights through techniques like sentiment analysis, social network analysis, and identifying top influencers. Specific use cases are presented for industries like finance and opportunities discussed for applying these techniques globally. Examples are provided of analyzing social media data for companies like banks and Legoland park.
This document provides an overview of a course on data warehousing, filtering, and mining. The course is being taught in Fall 2004 at Temple University. The document includes the course syllabus which outlines topics like data warehousing, OLAP technology, data preprocessing, mining association rules, classification, cluster analysis, and mining complex data types. Grading will be based on assignments, quizzes, a presentation, individual project, and final exam. The document also provides introductory material on data mining including definitions and examples.
Fake News Detection using Passive Aggressive and Naïve Bayes
IRJET Journal
This document describes research on detecting fake news using machine learning algorithms. The researchers collected a dataset of labeled real and fake news articles and extracted features using bag-of-words and TF-IDF. They then trained two classification models, Naive Bayes and Passive-Aggressive, on the dataset. The Passive-Aggressive model achieved a higher accuracy of 93% compared to 89% for the Naive Bayes model. The researchers concluded the Passive-Aggressive algorithm is better able to differentiate between real and fake news, but note that further improving datasets and incorporating other media could enhance fake news detection.
This document discusses data mining and its applications. It defines data mining as using algorithms to discover patterns in large data sets beyond simple analysis. It then provides examples of data mining applications, including market basket analysis, education, manufacturing, customer relationship management, fraud detection, research analysis, criminal investigation, and bioinformatics. The document also outlines the typical stages of the data mining process: data understanding, data preparation, modeling, evaluation, and deployment.
This lecture gives various definitions of data mining and explains why data mining is required. It gives various examples of classification, clustering, and association rules.
Fraud analytics combines analytic technology and fraud analytics techniques with human interaction to help detect possibly improper transactions, such as fraud or bribery, either before or after the transaction is completed.
The document discusses explainability and bias in machine learning/AI models. It covers several topics:
1. Why explainability of models is important, including for laypeople using models and potential legal needs for explanations of decisions.
2. Methods for explainability including using interpretable models directly and post-hoc explainability methods like LIME and SHAP which provide feature attributions.
3. Issues with bias in machine learning models and different definitions of fairness. It also discusses techniques for measuring and mitigating bias, such as reweighting data or using adversarial learning.
The document outlines a data science roadmap that covers fundamental concepts, statistics, programming, machine learning, text mining, data visualization, big data, data ingestion, data munging, and tools. It provides the percentage of time that should be spent on each topic, and lists specific techniques in each area, such as linear regression, decision trees, and MapReduce in big data.
Provides a brief overview of what machine learning is, how it works (theory), how to prepare data for a machine learning problem, an example case study, and additional resources.
My presentation at The Richmond Data Science Community (Jan 2018). The slides are slightly different than what I had presented last year at The Data Intelligence Conference.
There are 100,000 applicants for loans. Who is likely to default? How can loans be offered effectively?
There are 100,000 consumers. Who is likely to buy my product? How can I market my product effectively?
There are more than 1,000,000,000 transactions in a day. How can the fraudulent transactions be identified?
There are 1,000,000 claims every year. How can the fake claims be identified?
Tools and techniques adopted for big data analytics
JOSEPH FRANCIS
This document discusses tools and techniques for big data analytics. It begins by defining big data and explaining why big data analysis is important for businesses. It then outlines the characteristics and history of big data, as well as the challenges and phases of big data analysis. The document proceeds to describe several tools and techniques used for big data analytics, including machine learning, natural language processing, and visualization. It provides examples of how these tools and techniques have been applied through case studies of Indian elections, AirBnB, and Shoppers Stop.
The document provides an introduction to data mining. It states that 40 zettabytes of data will be created in 2019 and 90% of existing data has been created in the last two years. It defines data, information, and knowledge and explains what data mining is and some of its applications. The document discusses different types of data and data analysis techniques like classification, clustering, regression, and association rule mining. It provides examples of how these techniques can be applied to problems in business, science, and other domains.
This document is a project report submitted by four students to fulfill the requirements for a Bachelor of Technology degree in Information Technology. The report discusses steganography, which is hiding secret information within other information. Specifically, the report focuses on digital image steganography, where secret messages are hidden within digital images. The report provides an introduction to steganography, a literature review on related topics like cryptography, an analysis of requirements, descriptions of how image steganography works and algorithms used, system design diagrams, implementation details, applications of the system, and directions for future work.
details about brain tumor
literature survey on many reference papers related to brain tumor detection using various techniques
our proposed novel methodology for brain tumor detection
HEALTH PREDICTION ANALYSIS USING DATA MINING
Ashish Salve
Data mining techniques are used for a variety of applications. In the healthcare industry, data mining plays an important role in predicting diseases. Detecting a disease normally requires a number of tests on the patient, but data mining techniques can reduce the number of tests needed. This reduction matters for both time and performance. This report analyses data mining techniques that can be used to predict different types of diseases, and reviews research papers that mainly concentrate on predicting various diseases.
Big Data Sources PowerPoint Presentation Slides
SlideTeam
Manage big data efficiently using our big data sources PowerPoint deck. This 25-slide presentation deck lets you highlight information on a highly complex topic in the manner you want. The deck was designed by our creative and proficient designers, who have strong design skills and knowledge of big data concepts. It helps your audience understand where data comes from and where it is stored. Everyone today uses one device or another to accomplish day-to-day tasks, but only a few know how to protect their data and how it is saved in cloud storage. If you need to make a presentation about big data sources, you can use this deck. Key slides included in the deck cover cloud, web, internet of things, media, databases, data warehouse appliances, etc. Disprove exaggerated claims with our Big Data Sources PowerPoint Presentation Slides. Force the boastful to accept their error.
This document discusses anomaly and fraud detection using machine learning. It outlines different applications of anomaly detection such as cybersecurity and fraud detection. It compares supervised versus unsupervised learning approaches for financial sector applications. Specific algorithms discussed for unsupervised anomaly detection include isolation forest, DBSCAN, HDBSCAN, local outlier factor, and Gaussian mixture models.
This slide will try to communicate via pictures, instead of going into technical mumbo-jumbo. We might go somewhere but the slide is full of pictures. If you don't understand any part of it, let me know.
This document provides an overview of a data science course. It discusses topics like big data, data science components, use cases, Hadoop, R, and machine learning. The course objectives are to understand big data challenges, implement big data solutions, learn about data science components and prospects, analyze use cases using R and Hadoop, and understand machine learning concepts. The document outlines the topics that will be covered each day of the course including big data scenarios, introduction to data science, types of data scientists, and more.
How to become a data scientist - Thanks for the slides go to Paolo Pellegrini, Senior Consultant at P4I (Partners4Innovation) and lead for all projects related to Data Science and Big Data Analytics. Owner of the first group in Italy dedicated to data scientists.
Big Data and Business Intelligence. Talk by Prof. Pozzan at the open day organized by the Fondazione ITS Kennedy in Pordenone, an event held on 13 September 2014 at which the topics for the courses starting in November 2014 were presented.
The document describes a 10 module data science course covering topics such as introduction to data science, machine learning techniques using R, Hadoop architecture, and Mahout algorithms. The course includes live online classes, recorded lectures, quizzes, projects, and a certificate. Each module covers specific data science topics and techniques. The document provides details on the course content, objectives, and topics covered in module 1 which includes an introduction to data science, its components, use cases, and how to integrate R and Hadoop. Examples of data science applications in various domains like healthcare, retail, and social media are also presented.
Action plan - The plan describes how the socioeconomic strength of the Rijk van Nijmegen will be mapped, what the intended result of the process is, and who is and will be involved.
S-Coin
Virtual currencies are becoming increasingly well known and widely used, and they now represent the future of the payment instruments segment. Scoin is the currency that coinspace is preparing to launch on the cryptocurrency market. This group is intended to provide information about this opportunity and to present it.
Network Marketing Digital 3.0
- You do not need to know how to sell
- There is no obligation to pay every month
- Guaranteed income from renting miners
- Performance increases thanks to an annual renewal
The opportunities of Big Data - Palazzolo Digital Festival 2013 (PDF13)
Vincenzo Manzoni
Social networks and the traditional channels for acquiring information are the main sources of data accumulation. Appropriate use of data and its application in business processes increase business value. But what can actually be achieved with big data? A field experience.
Slides shown during the Palazzolo Digital Festival 2013 (PDF13). For more information about the event: http://www.palazzolodigitalfestival.it/
- the use of big data in support of statistics -
Talk by Daniela Fusco - Istat Campania
at the seminar of #23maggio 2019 (23 May 2019) held at
the Università degli Studi #UniParthenope in Naples
Department of Business and Quantitative Studies
Via Generale Parisi, 13
Abstract:
"New data and new sources: experimental statistics and big data" is the seminar organized by the Istat regional office for Campania and Basilicata and the Università degli Studi di Napoli Parthenope, with the aim of consolidating knowledge of the new data sources for statistical use employed by Istat. In this event in particular, the focus is on experimental statistics and big data.
The initiative is aimed at researchers and university students who intend to use, or deepen their knowledge of, Istat's main dissemination systems and the ways of querying the main Open Data provided by the Institute.
Link: https://lnkd.in/dqAmRRW
Digital Transformation: Big Data, User Targeting and Ethics - Project Work Mast...
Free Your Talent
Digital Transformation: Big Data, User Targeting and Ethics - Project work by the students of the ISTUD Master in Marketing Management: Alex Caruso, Federica Ferrara, and Riccardo Pavesi
A brief overview of Big Data, for those who have only heard of it but are not quite sure what it is.
The presentation is not intended for a technical audience and follows this agenda:
1. definition of Big Data via the 3 Vs
2. examples of projects actually carried out
3. technologies
4. assorted reflections
Talk by the President of the Istituto Nazionale di Statistica (the Italian National Institute of Statistics) - The value of data in the era of Big Data - Università di Napoli Federico II, Department of Political Science, Naples, 2 March 2016
Massimo Rosso - Social Media and TV Products: experiences of "Extended Audience"...
Cultura Digitale
As part of the discussion on Big Data and the analysis of cross-media conversations related to television events, it is of great interest to examine the opportunities, dangers, and returns of the so-called "return channel" enabled by social media on the internet. After the information flow from decoders (the devices that characterized the transition to digital terrestrial television) failed to take hold, modern Italian media companies are now assessing the potential returns from accessing, processing, and analyzing the information published, first and foremost, on social networking platforms. Overseeing the new "return channel" defined in this way promises to enable innovative marketing analyses and highly precise and reliable assessments of product performance; in a word, competitive advantage. The proposed talk will illustrate, from the point of view of RAI's ICT Department, recently conducted experiences (for example, trials of "extended audience" tools during the Festival di San Remo) and will examine the main elements to consider for appropriate management of the potential risks.
Big Data, Open Data and Open Information:
- Identifying, analyzing, and managing them: benefits and advantages;
- Open Data and Open Information: definitions and regulatory framework;
- How to start the process of opening up data;
130 FN 90 February 2017 - Round Table "The long-awaited analysis" - Fieldbus & ...
Cristian Randieri PhD
HERE WE LOOK AT THE ADVANTAGES THAT CAN BE GAINED FROM BIG DATA ANALYSIS, AS WELL AS THE TOOLS AVAILABLE AND THE WAYS OF TURNING DATA INTO DECISIONS USEFUL TO THE BUSINESS
We asked some of the leading players in the world of industrial automation to shed light on the broad topic of big data analysis, starting from its meaning and then learning which applications have been deployed by the companies they represent.
For Cristian Randieri, president and CEO of Intellisystem Technologies (www.intellisystem.it), big data refers to a heterogeneous collection of raw data that has no value in itself unless it is analyzed and then reprocessed using the most modern techniques, better known by the term data mining. This technique can be defined as the activity of extracting information from a mine of raw data. To understand this concept better, it helps to look more closely at the meaning of a few words. A datum is the basic, potentially informative element whose characteristics are known but not yet organized or classified, since it consists of symbols that must be processed before they can be understood. Information is the result of processing multiple data, yielding a set of data aggregated and organized in a meaningful way. Knowledge is a body of information that, aggregated together, makes it possible to spread learning, understanding, culture, or experience. Consequently, any big data analysis operation consists of all the activities aimed at extracting information from an indefinite quantity of data, that is, everything that through research, analysis, and organization generates learning or knowledge from unstructured data. It is a set of techniques and methodologies very similar to statistics, but with one big difference: statistics is used to capture a snapshot of the state of the data at a point in time, while data mining is used more to look for correlations between variables for predictive purposes.
Data Science in manufacturing: the Tenaris experience
Vincenzo Manzoni
The talk took place as part of the event "Industry 4.0: what evolution for professions, skills, and training?" organized by the Osservatorio Industria 4.0 of the School of Management of the Politecnico di Milano on 16 February 2017.
http://www.osservatori.net/it_it/eventi/jobs-skills-4-0
Introduction to Big Data and data science - Big Data
Vincenzo Manzoni
Lesson 5 of the data analysis course held at the Palazzolo Digital Hub (Palazzolo sull'Oglio, Brescia) in 2014. This fifth and final lesson introduces Big Data technologies.
Introduction to Big Data and data science - Recommender systems
Vincenzo Manzoni
Lesson 4 of the data analysis course held at the Palazzolo Digital Hub (Palazzolo sull'Oglio, Brescia) in 2014. This fourth lesson introduces recommender systems.
Introduction to Big Data and data science - Exploratory Data Analysis
Vincenzo Manzoni
Lesson 2 of the data analysis course held at the Palazzolo Digital Hub (Palazzolo sull'Oglio, Brescia) in 2014. The second lesson is dedicated to Exploratory Data Analysis.
Introduction to Big Data and data science - Machine Learning
Vincenzo Manzoni
Lesson 3 of the data analysis course held at the Palazzolo Digital Hub (Palazzolo sull'Oglio, Brescia) in 2014. This third lesson introduces some machine learning algorithms.
Introduction to Big Data and data science - Data formats
Vincenzo Manzoni
Lesson 1 of the data analysis course held at the Palazzolo Digital Hub (Palazzolo sull'Oglio, Brescia) in 2014. The topic of this first lesson is data formats.
6. Big Data
n. Computing data of a very large size, typically to the extent that its manipulation and
management present significant logistical challenges; (also) the branch of computing
involving such data.
Oxford English Dictionary, 2013
7. THE ORIGIN OF THE TERM
First used in 2008 in the
Computing Community Consortium.
Italy
United States
9. VOLUME
Volume
Velocity
Variety
Information produced in one day
2.5 million TB
(20% of all human knowledge in 1999!)
532,000,000 DVDs
If stacked, 640 km!
In a year, they would reach 60% of the Earth-Moon distance
Source: Harvard Business Review, "Big Data: The Management Revolution", October 2012.
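The DVD comparison on this slide can be checked with simple arithmetic. The sketch below is only a rough check, and it relies on assumptions the slide does not state: decimal units (1 TB = 10^12 bytes), a 4.7 GB single-layer DVD, a disc thickness of 1.2 mm, and a mean Earth-Moon distance of 384,400 km. The function name `dvd_stack` is made up for illustration.

```python
# Back-of-the-envelope check of the slide's figures.
# Assumptions (not stated on the slide): decimal units, a 4.7 GB
# single-layer DVD, a 1.2 mm disc thickness, and a mean Earth-Moon
# distance of 384,400 km.

DAILY_DATA_TB = 2.5e6        # "2.5 million TB" produced in one day (from the slide)
BYTES_PER_TB = 1e12          # assumed decimal terabyte
DVD_CAPACITY_BYTES = 4.7e9   # assumed single-layer DVD capacity
DVD_THICKNESS_MM = 1.2       # assumed disc thickness
EARTH_MOON_KM = 384_400      # assumed mean Earth-Moon distance


def dvd_stack(daily_tb=DAILY_DATA_TB):
    """Return (DVDs per day, stack height in km, fraction of Earth-Moon distance per year)."""
    dvds = daily_tb * BYTES_PER_TB / DVD_CAPACITY_BYTES
    stack_km = dvds * DVD_THICKNESS_MM / 1e6  # mm -> km
    yearly_fraction = stack_km * 365 / EARTH_MOON_KM
    return dvds, stack_km, yearly_fraction


dvds, stack_km, fraction = dvd_stack()
print(f"{dvds:,.0f} DVDs per day")
print(f"{stack_km:,.0f} km of stacked DVDs per day")
print(f"{fraction:.0%} of the Earth-Moon distance after one year")
```

Under these assumptions the results land close to the slide's rounded figures (about 532 million DVDs, roughly 640 km, and around 60% of the Earth-Moon distance in a year), which suggests the slide used similar constants.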
17. WHAT DO THEY CONSIST OF?
ACCORDING TO THE EXECUTIVES INTERVIEWED BY IBM
Better information: 18%
New data analyses: 16%
Real-time information: 15%
Data influx from new technologies: 13%
Non-traditional forms of media: 13%
Large volumes of data: 10%
The latest fad: 8%
Social media: 7%
Source: IBM, "Analytics: The real-world use of big data", 2012.
18. THE SOURCES
ACCORDING TO THE EXECUTIVES INTERVIEWED BY IBM
Transactions: 88%
Logs: 73%
Events: 59%
E-mail: 57%
Social networks: 43%
Sensors: 42%
RFID and POS: 41%
Free text: 41%
Geographic data: 40%
Audio: 38%
Photos / video: 24%
Source: IBM, "Analytics: The real-world use of big data", 2012.
26. THE OPPORTUNITIES
1. Big data applied to healthcare could save the United States $300 B through efficiency.
2. Europe could save $149 B in administration and government costs.
3. In the United States alone, 1.5+ M data scientists and data managers will be needed in the short term.
28. DATA PRODUCTS
Things we know
Things we don't know
Questions we ask
Questions we don't ask
Business intelligence
Data Discovery
Data analyst
Data Scientist
38. "[...] Renzi explained that he has no plans to increase taxation and instead wants to fight tax evasion, including through digital innovation and the cross-referencing of data"
Il Corriere della Sera Online, 21 March 2014
46. MANUFACTURING: TENARIS
PHASE 2: USE OF THE DATA
Sudden, unjustified increase in a product quality parameter
The process has gone out of control
An intervention is carried out and the process returns to control, and with it the quality parameter.
54. THE ROI OF BIG DATA
Return per 1 invested: 0.55
Expected return in 3-5 years: 3.50
Source: Wikibon, "Enterprise struggling to derive maximum value from Big Data", 2013.
57. THE FAILURE CASES
1. Lack of people with the necessary professional skills.
2. Use of raw, immature technologies.
3. Lack of a specific business case!
58. THE SUCCESS CASES
1. Projects sponsored not by IT but by line-of-business departments such as marketing or logistics.
2. Focus on a small but strategic use case.
3. Iterate and grow on the basis of previous results.
59. THE ALGORITHM FOR SUCCESS
Well-defined business context
Right questions
Answers
Valuable data sources
63. INTRODUCTION TO BIG DATA AND DATA SCIENCE
Ordine degli Ingegneri di Como, 30 January 2016
Ing. Vincenzo Manzoni, PhD
me@vincenzomanzoni.com
Slides available here: http://www.vincenzomanzoni.com/corsi/