Data Storytelling with the
DIKW Pyramid
Angelica Lo Duca, PhD
Researcher @ IIT-CNR
● Researcher at the Institute of Informatics and
Telematics of the National Research Council in Italy
○ Data Science, with a focus on Data Storytelling
○ Web Applications
○ Data Engineering
● Professor of Data Journalism at the University of Pisa
● Author of the books
○ Data Storytelling with Python Altair and
Generative AI, Manning
○ Comet for Data Science, Packt Ltd.
○ Learning and Operating Presto, O’Reilly Media
● Contributor at Towards Data Science, KDNuggets,
and DataTalks.club
Angelica Lo Duca
angelica.loduca@iit.cnr.it
Data storytelling is building stories
supported by data,
allowing analysts and data scientists
to present and share their insights
to engage the audience and inspire them
to make decisions.
In this talk,
we will focus on
how to transform a data visualization chart
into a data story.
Consider the message to communicate
Country Year Life Expectancy
Aruba 1960 65.56936585
Afghanistan 1960 32.3285122
Angola 1960 32.98482927
Albania 1960 62.25436585
Andorra 1960
Arab World 1960 46.85277979
United Arab Emirates 1960 52.24321951
Argentina 1960 65.21553659
Armenia 1960 65.86346342
…
Consider the audience
DATA (numbers, texts, …) -> DATA ANALYSIS -> INSIGHT (discovery of new patterns in data)
Insight IS NOT data.
Insight IS NOT intuition.
DATA (numbers, texts, …) -> DATA ANALYSIS -> INSIGHT (discovery of new patterns in data) -> COMMUNICATION -> IDEA (thought to create something new)
Communicating IS NOT informing.
Data Storytelling is communicating insights in a way that
inspires the audience to act.
DATA -> DATA ANALYSIS -> INSIGHT -> COMMUNICATION -> IDEA
Communicating IS NOT informing.
INFORMING: Speaker, Audience
* Effective Data Storytelling by Brent Dykes, 2020, Wiley
COMMUNICATING: Speaker, Audience
DATA (numbers, texts, …)
-> DATA ANALYSIS ->
INSIGHT (discovery of new patterns in data)
-> COMMUNICATION ->
IDEA (thought to create something new)
-> ACTION ->
NEW VALUE (offer something new to the public)
From Data to Value?
The DIKW Pyramid
* Introduction to Data Visualization & Storytelling: A Guide For The Data Scientist
by Jose Berengueres and Marybeth Sandell. Independently published, 2019
DATA: raw
INFORMATION: meaning
KNOWLEDGE: context (background)
WISDOM: action (next steps)
Story structure:
1. Background
2. Main Event
3. Resolution
Python Altair
Altair
The Vega-Altair library (Altair, for short) is a declarative Python library for statistical visualization based on the Vega-Lite visualization grammar.
Declarative libraries specify what we want to see in a chart: we specify the data and the type of visualization we want, and the library creates the visualization for us automatically.
Imperative libraries (e.g. Matplotlib) focus on building a visualization manually, for example by specifying the desired axes, size, legend, and labels.
Altair: Python interface -> Vega-Lite JSON
Altair parameters
● Marks: define the type of chart we want to build (e.g. bar chart, line chart, …)
● Channels: visual properties to represent, such as axes, colors, size, …
● Encodings: mapping of channels to data columns in the DataFrame
pip install altair
A first example
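import pandas as pd
import altair as alt

# A small DataFrame with one numeric and one categorical column
df = pd.DataFrame({
    'value': [3, 2, 4],
    'category': ['M', 'N', 'O']
})

# Bar chart: 'value' (quantitative) on x, 'category' (nominal) on y
chart = alt.Chart(df).mark_bar().encode(
    x='value:Q',
    y='category:N'
)

# In a notebook, a bare 'chart' on the last line renders the chart
chart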
Case Study
Scenario
Imagine that you work for a humanitarian organization that wants to apply for funding from a foundation to help reduce homelessness in Italy.
Humanitarian interventions can be applied to up to four Italian regions.
The call for funds requires a data visualization chart that motivates the choice of regions and explains why the proposal should be funded.
Your boss asks you to run a study to figure out which regions to invest in and to motivate your choice.
ITTER107  Region         Sex  Age    Citizenship  Value
ITC1      Piemonte       M    TOTAL  ITL          4218
ITC1      Piemonte       F    TOTAL  ITL          1496
ITC1      Piemonte       T    TOTAL  ITL          5714
ITC2      Valle d'Aosta  M    TOTAL  ITL          41
ITC2      Valle d'Aosta  F    TOTAL  ITL          17
http://dati-censimentipermanenti.istat.it/Index.aspx?DataSetCode=DCSS_SENZA_TETTO
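A first pass over this table could look like the sketch below. It is not the author's code: it uses only the five sample rows visible on the slide and assumes that Sex = 'T' is the precomputed total of M and F (consistent with 4218 + 1496 = 5714 for Piemonte). With the full ISTAT file the same steps would rank all regions.

```python
import pandas as pd

# Only the five sample rows shown on the slide; the real dataset has more.
rows = [
    ("ITC1", "Piemonte", "M", "TOTAL", "ITL", 4218),
    ("ITC1", "Piemonte", "F", "TOTAL", "ITL", 1496),
    ("ITC1", "Piemonte", "T", "TOTAL", "ITL", 5714),
    ("ITC2", "Valle d'Aosta", "M", "TOTAL", "ITL", 41),
    ("ITC2", "Valle d'Aosta", "F", "TOTAL", "ITL", 17),
]
columns = ["ITTER107", "Region", "Sex", "Age", "Citizenship", "Value"]
df = pd.DataFrame(rows, columns=columns)

# Keep only M and F rows so the precomputed 'T' rows are not counted twice
# (assumption: 'T' means total of both sexes), then sum per region.
by_region = (
    df[df["Sex"].isin(["M", "F"])]
    .groupby("Region", as_index=False)["Value"]
    .sum()
    .sort_values("Value", ascending=False)
)

# The call allows up to four regions, so keep at most the four largest.
top = by_region.head(4)
print(top)
```

Raw counts favor large regions; whether to normalize by population is a separate modeling choice that the slides do not settle.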
WHAT YOUR BOSS EXPECTED TO SEE: a specific answer
Region A, Region B, Region C, Region D
Choose these four regions because…
WHAT YOUR BOSS ACTUALLY SAW: a generic answer
Homelessness is more concentrated in the North-West, in Lazio, and in some southern regions.
The eastern side is less affected, with the exception of Puglia.
BACKGROUND: A brief description of the homelessness problem in your community.
MAIN POINT: There are four regions where the problem is urgent to solve.
NEXT STEPS: A list of objectives that will help you achieve your goal.
Data Storytelling with Python Altair and Generative AI
Discount code (35%): au35duc
https://www.manning.com/books/data-storytelling-with-python-altair-and-generative-ai
Let’s look at the code
https://github.com/alod83/Data-Storytelling-with-Python-Altair-and-Generative-AI/tree/main
Thanks for your Attention!
https://alod83.medium.com/
https://www.linkedin.com/in/angelicaloduca/
