DataKind SG sharing on our first DataDive with Humanitarian Organization for Migration Economics (HOME) and Earth Hour.
Know of other non-profits we can help? Reach out to singapore@datakind.org or drop me a note =)
2. Agenda
1. DataKind Singapore updates
2. DataDive Overview
3. DataDive Sharing: Earth Hour
4. DataDive Sharing: HOME
5. Small group discussions on data handling best practices (if time allows)
5. Updates
Project Accelerator coming up on 5 Sept!
- Please help spread the word to any social change organizations you know who do good work.
- Sign up form here: http://goo.gl/forms/0TbDySVFi7
- Sign up by Friday, Aug 14.
6. Other data4good stuff
http://unitedwehack.com/
- August 15-16: a 24-hour hackathon organized with UN Women
- A hackathon to promote financial literacy and economic empowerment for women around the globe
- Access to partner APIs
http://blog.datalook.io/openimpact/
- Until August 31
- DataLook is a directory of reusable data-driven projects for social good. Join our replication marathon and bring a project to your city.
9. DataDive Overview
- Get to know the participating organizations
- Select and learn about a problem and the data
- Determine a specific task or team that you can work on
- Data Dive in!
- Repeat!
Data Ambassadors & Project Managers coordinate with the teams; participants contribute code, analysis, and the final presentation of results.
10. DataDive Retrospective
- Took place over the weekend of 23-25 Apr
- More than 70 participants
- 2 non-profit organizations:
  - Earth Hour
  - HOME
- Intro to the orgs and socializing on Friday, working through Saturday, final presentations on Sunday at 1pm
11. DataDive Key Learnings
- Full involvement from partner orgs is important, and we need to emphasize this from the very beginning
- Trello will be mandatory to avoid duplication of effort
- Grant access to data on Friday night and help participants to start setting up data and tools, so that they can start right away on Saturday morning
- Remind people that final presentations will be from Google Presentation and set a hard deadline for getting content in, so that there is time to vet
13. Tweet Data Analysis - EARTH HOUR
Main Objectives
- Identify influencers on Twitter
- Sentiment Analysis
- Word Cloud Analysis
14. Tweets Analysis Prep - EARTH HOUR
1. Understanding the data
anon_id | created_at | Text No Comma | coordinates | lang | RT_count | fav_count | reply_to_user_id | place
83937561 | 27/03/2015 | RT @JimHarris: SHAME: Canada Ranks LAST Among OECD Countries for #ClimateChange Performance #cdnpoli #climate #gls15 http://t.co/0DD7S6oy7h | | en | 152 | 0 | |
84233936 | 28/03/2015 | RT @earthhour: It's not about which country you're from; it's about what planet we're from. Join us for #EarthHour! 28th March; 8.30pm. | | en | 360 | 493 | 84233936 |
80696055 | 27/03/2015 | @nenshi will you be joining the #earthhour party? Plz retweet to encourage lights out? #earthhourcalgary #yourpower http://t.co/68TblYiW2Y | -114.0664914;.... | en | 9 | 4 | | [-114.059111818;....
15. Tweet Data Analysis Prep - EARTH HOUR
2. Identify preliminary tasks to support other analysis
- Identify which tweets are retweets
- Identify which tweets contain which EH hashtags
- For retweets, identify which user is being retweeted
16. Tweets Analysis Prep - EARTH HOUR
3. Creating additional variables (I)
- Turn data into a table (tblTweets)
- Create binary variable to identify retweets
... | Text No Comma | ... | is_retweet
... | RT @JimHarris: SHAME: Canada Ranks LAST Among OECD Countries for #ClimateChange Performance #cdnpoli #climate #gls15 http://t.co/0DD7S6oy7h | ... | =IF(ISNUMBER(SEARCH("RT @",[@[Text No Comma]])),1,0)
What does the formula do?
Check whether "RT @" is found in the tweet text.
Case 1: String is found
1. SEARCH returns the start position of the string
2. ISNUMBER evaluates to TRUE, as SEARCH returned a number
3. IF returns 1, as ISNUMBER is TRUE
Case 2: String is not found
1. SEARCH returns a #VALUE! error
2. ISNUMBER evaluates to FALSE, as SEARCH did not return a number
3. IF returns 0, as ISNUMBER is FALSE
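The same check translates directly to Python. A minimal stand-in for the Excel formula (not the code used at the dive), with SEARCH's case-insensitivity reproduced by lowercasing:

```python
def is_retweet(text):
    # Mirrors the Excel formula: 1 if "RT @" occurs anywhere in the
    # tweet text, else 0. Excel's SEARCH is case-insensitive, so we
    # lowercase before matching.
    return 1 if "rt @" in text.lower() else 0
```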
17. Tweets Analysis Prep - EARTH HOUR
4. Secondary use for the is_retweet variable
- Understand lasting impact of campaign and event
18. Tweets Analysis Prep - EARTH HOUR
5. Creating additional variables (II)
- The motto of EH was "use your power to change climate change"
- EH # crawled: #EarthHour, #climatechange, #yourpower, #useyourpower
- Create binary variables for each hashtag
... | Text No Comma | ... | EarthHour | ...
... | RT @JimHarris: SHAME: Canada Ranks LAST Among OECD Countries for #ClimateChange Performance #cdnpoli #climate #gls15 http://t.co/0DD7S6oy7h | ... | =IF(ISNUMBER(SEARCH("#"&tblTweets[[#Headers],[EarthHour]],[@[Text No Comma]])),1,0) | ...
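The per-hashtag flags could be sketched in Python as follows (illustrative only; HASHTAGS holds the four crawled hashtags from the slide):

```python
HASHTAGS = ["EarthHour", "climatechange", "yourpower", "useyourpower"]

def hashtag_flags(text):
    # One binary flag per crawled hashtag, matching case-insensitively
    # the way Excel's SEARCH does.
    low = text.lower()
    return {tag: 1 if "#" + tag.lower() in low else 0 for tag in HASHTAGS}
```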
20. Tweets Analysis Prep - EARTH HOUR
Main Takeaways
- Coding knowledge (e.g. R / Python) is not required to
contribute during a Data Dive
- Preparatory tasks can yield useful insights as well
- Excel can be helpful but may not be the most suitable
tool for large data files
21. Identify Influencers - EARTH HOUR
Problem
Identify influencers
Solution
Analyse tweet data to identify most retweeted users
22. Identify influencers - EARTH HOUR
1. Creating additional variables
- extract which user is being retweeted
... | Text No Comma | ... | is_retweet | original_tweeter
... | RT @earthhour: It's not about which country you're from; it's about what planet we're from. Join us for #EarthHour! 28th March; 8.30pm. | ... | 1 | =IF([@[is_retweet]]=0,"original",RIGHT(LEFT([@[Text No Comma]],FIND(":",[@[Text No Comma]]&":")-1),LEN(LEFT([@[Text No Comma]],FIND(":",[@[Text No Comma]]&":")-1))-3)) → @earthhour
23. Identify influencers - EARTH HOUR
1. Creating additional variables
What does the formula do in this example?
=IF([@[is_retweet]]=0,"original",RIGHT(LEFT([@[Text No Comma]],FIND(":",[@[Text No Comma]]&":")-1),LEN(LEFT([@[Text No Comma]],FIND(":",[@[Text No Comma]]&":")-1))-3))
[@[is_retweet]] = 1
[@[Text No Comma]] = RT @earthhour: It's not about which country you're from; it's about what planet we're from. Join us for #EarthHour! 28th March; 8.30pm.
The formula can be broken down into a few parts:
1. Check if it is a retweet - if it is, go to Point 2; otherwise mark it as "original"
2. Find the first occurrence of ":" in the text and return its position minus 1
3. Starting from the left of the tweet text, keep the first [Point 2] characters
4. From the length of the string in [Point 3], subtract 3 (for the leading "RT ")
5. Starting from the right of the string in [Point 3], keep [Point 4] characters
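The five steps above can be sketched in Python (a stand-in for the Excel formula, not the dive's code):

```python
def original_tweeter(text, is_retweet):
    # For retweets of the form "RT @user: ...", return "@user";
    # otherwise mark the tweet as "original".
    if not is_retweet:
        return "original"
    head = text.split(":", 1)[0]  # text up to the first colon, e.g. "RT @earthhour"
    return head[3:]               # drop the leading "RT "
```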
24. Identify influencers - EARTH HOUR
1. Create additional variables
2. Select all tweets in the dataset with retweet count >= 500
3. Check for extraction errors (if retweets > followers, investigate manually)
4. Check for parsing errors (if length of text >= 180 characters, mark as error)
5. From the remaining set: for users who were retweeted, assess profile information, number of followers, and country where available
25. Identify influencers - EARTH HOUR
6. Outcome: Users who were most retweeted
User who is retweeted | Nr. RT >= 500 | Total RT count | User info | Nr. of followers | Country/Region
@earthhour | 5 | 46,947 | EarthHour | 143,000 | Global
@LeoDiCaprio | 5 | 7,627 | Leonardo DiCaprio - Actor, WWF Ambassador | 12,800,000 | US
@AstroSamantha | 2 | 2,750 | Sam Cristoforetti - Italian astronaut on the ISS | 510,000 | Italy
26. Sentiment Analysis - Tweets
WHAT IS SENTIMENT ANALYSIS
Sentiment analysis aims to determine the attitude of a speaker or a writer
with respect to some topic or the overall contextual polarity of a
document.
WHAT WE USED FOR SENTIMENT ANALYSIS
We used the Python package VADER, a lexicon and rule-based sentiment
analysis tool that is specifically attuned to sentiments expressed in social
media, and works well on texts from other domains.
More information on VADER can be found at:
http://comp.social.gatech.edu/papers/icwsm14.vader.hutto.pdf
https://github.com/cjhutto/vaderSentiment
VADER was installed through pip.
27. Sentiment Analysis - Tweets
HOW TO USE VADER IN YOUR CODE
from vaderSentiment import sentiment as vaderSentiment

sentences = [
    "VADER is smart, handsome, and funny.",  # positive sentence example
    "VADER is smart, handsome, and funny!",  # punctuation emphasis handled correctly (sentiment intensity adjusted)
]
for sentence in sentences:
    print sentence,
    vs = vaderSentiment(sentence)
    print "\n\t" + str(vs)
28. Sentiment Analysis - Tweets
HOW VADER ANALYZES SOME OF THE INPUTS
VADER is smart, handsome, and funny.
{'neg': 0.0, 'neu': 0.254, 'pos': 0.746, 'compound': 0.8316}
VADER is smart, handsome, and funny!
{'neg': 0.0, 'neu': 0.248, 'pos': 0.752, 'compound': 0.8439}
VADER is VERY SMART, really handsome, and INCREDIBLY FUNNY!!!
{'neg': 0.0, 'neu': 0.294, 'pos': 0.706, 'compound': 0.9469}
A really bad, horrible book.
{'neg': 0.791, 'neu': 0.209, 'pos': 0.0, 'compound': -0.8211}
29. Sentiment Analysis - Tweets
HOW VADER ANALYZES SOME OF THE INPUTS
At least it isn't a horrible book.
{'neg': 0.0, 'neu': 0.637, 'pos': 0.363, 'compound': 0.431}
:) and :D
{'neg': 0.0, 'neu': 0.124, 'pos': 0.876, 'compound': 0.7925}
"" (empty input)
{'neg': 0.0, 'neu': 0.0, 'pos': 0.0, 'compound': 0.0}
Today sux
{'neg': 0.714, 'neu': 0.286, 'pos': 0.0, 'compound': -0.3612}
Today SUX!
{'neg': 0.779, 'neu': 0.221, 'pos': 0.0, 'compound': -0.5461}
30. Sentiment Analysis - Tweets
HOW WE PERFORMED SENTIMENT ANALYSIS ON EARTH HOUR DATA
1. Define ranges for the compound value to categorize the sentiment of a tweet: very negative, negative, neutral, positive, and very positive
2. Open the file
3. Read a record
4. Parse and extract the tweet
5. Pass the tweet text to the VADER method
31. Sentiment Analysis - Tweets
HOW WE PERFORMED SENTIMENT ANALYSIS ON EARTH HOUR DATA
6. Parse the output to extract the compound value
7. Based on the compound value, determine the sentiment of the tweet
8. Store the categorical value of the sentiment in a variable
9. Add a dummy variable with a value of 1
10. Go to step 3 until EOF
11. Write the output to a file
32. Sentiment Analysis - Tweets
HOW WE PREPARED CHARTS FOR SENTIMENT ANALYSIS
1. Open the output file created after applying VADER
2. Read records into pandas DataFrames (pandas is a powerful Python data analysis toolkit)
3. Perform grouping (think GROUP BY in SQL) and summarize the dummy variable
4. Present the output as pie charts (using the Python package matplotlib)
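Since the dummy variable is always 1, summing it per sentiment group is just counting; the grouping step can be mimicked without pandas using collections.Counter (the labels below are hypothetical examples, not dive output):

```python
from collections import Counter

# Hypothetical per-tweet sentiment labels produced by the VADER step.
labels = ["Positive", "Neutral", "Positive", "Very negative", "Neutral", "Positive"]

# Equivalent of GROUP BY sentiment, SUM(dummy): the dummy is always 1,
# so summing it per group is simply counting tweets per sentiment.
counts = Counter(labels)
```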
33. Sentiment Analysis - Tweets
1) Using all valid records
- Apply Python package: vaderSentiment
- Categorize sentiment scores (compound value):
  - Very positive: > 0.55
  - Positive: 0.10 to 0.54
  - Neutral: -0.10 to 0.09
  - Negative: -0.55 to -0.11
  - Very negative: <= -0.56
Input: Cleaned Hashtag Tweets
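Treating the slide's bucket boundaries as continuous cut-offs, the categorization can be sketched as:

```python
def categorize(compound):
    # Map a VADER compound score to the five buckets used on the slide.
    if compound > 0.55:
        return "Very positive"
    if compound >= 0.10:
        return "Positive"
    if compound >= -0.10:
        return "Neutral"
    if compound >= -0.55:
        return "Negative"
    return "Very negative"
```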
35. Questions?
1. Which keywords are most retweeted?
2. Is there a relation among the topics containing each of the four
keywords of Earth Hour?
3. Which words represent different sentiments?
37. Create WordCloud
library(wordcloud)
library(tm)
# import the data
lords <- Corpus(DirSource("./Wordcloud_Input/"))
# transform and prepare the data for the word cloud
lords <- tm_map(lords, stripWhitespace)
lords <- tm_map(lords, content_transformer(tolower))
lords <- tm_map(lords, removeWords, c("amp"))
lords <- tm_map(lords, stemDocument)
# Word cloud graph
wordcloud(lords, scale=c(3,0.3), max.words=100, random.order=FALSE,
          rot.per=0.35, use.r.layout=FALSE, colors=brewer.pal(8, "Dark2"))
38. Word Cloud Analysis
Using Unique Tweets Only
- Remove the four EH hashtags
- Apply R package: wordcloud
- This package works on word-stems
Input: Cleaned Hashtag Tweets
41. Country assignment to subscribers - EARTH HOUR
Problem
Find out the country of ActiveCampaign subscribers
Solution
Merge information from different columns, such as city, country, and lat/long
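A minimal sketch of the column-merging idea: take the first column that actually carries a value (a hypothetical helper for illustration, not the dive's code):

```python
def first_nonempty(*fields):
    # Return the first field with a real value, else "".
    for f in fields:
        if f and f.strip():
            return f.strip()
    return ""

# E.g. a subscriber with empty city/state but a filled country1 column:
country = first_nonempty("", "  ", "Netherlands")
```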
42. Parallelized human cleaning process (PHCP)
"id","email","sdate","cdate","city","state_province","zip","country1","country2","country3","location","language","address","age","discover"
"100011","email@0accounts.com","2014-01-26 15:35:28","2014-01-26 02:35:28","","","","Netherlands","","","","","","",""
"100012","email@163.com","2012-11-29 02:01:46","2012-11-28 13:01:46","","","","China","","","","","","",""
"100013","email@student.hcmus.edu.vn","2015-03-21 20:32:55","2015-03-21 07:32:55","ho chi minh","","","Viet Nam","","","","","","",""
"100014","email@gmail.com","2014-03-14 14:59:15","2014-03-14 01:59:15","Coimbatore","","","","","","","","","",""
"100015","email@","2013-09-27 10:25:29","2013-09-26 21:25:29","","","","China, Guangdong Province, Foshan City","","","","","","",""
"100016","email@yahoo.com.tw","2012-11-29 02:41:50","2012-11-28 13:41:50","","","","Taipei","","","","","","",""
"100017","email@gmail.com","2013-03-12 11:36:39","2013-03-11 22:36:39","","","","12`3123","","","","","","",""
ActiveCampaign crawler and conversion from JSON
Result
# of subscribers: 321,704
# of subs. with country: 188,462 (58.6%)
43. Campaign response rate comparison of users - EARTH HOUR
Problem
Rank users based on historical campaign response data
Solution
Use expected open rates to compare users who received different numbers of campaigns
46. Campaign counts
Users' chances of engagement vary; they receive different numbers of emails
[Timeline figure: emails received by Bridget and Alan, Jan 2014 to Jan 2015]
47. Who is better?
Response rate and campaign count together define how interested a user is
Alan: response rate = 50%, campaign count = 2
Bridget: response rate = 40%, campaign count = 5
48. Uncertainty
To make users comparable, suppose both users receive the same count
Alan: response rate = 20-80%, campaign count = 5
Bridget: response rate = 40%, campaign count = 5
50. Using evidence of open count
We already know Alan opened one e-mail and did not open another:
OpenCount >= 1
UnopenCount >= 1, where UnopenCount = CampaignCount - OpenCount
51. Generalize to N campaigns
The expected open rate is a weighted average of the conditional expected values, where:
- M is the maximum campaign count any user received in the dataset
- j is the number of campaigns that Alan received
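One common way to realise an "expected open rate" under uncertainty is the posterior mean of a Beta prior on the open probability; this is an illustrative assumption, not necessarily the formula used on the slide:

```python
def expected_open_rate(opens, campaigns, prior_opens=1.0, prior_unopens=1.0):
    # Posterior mean of a Beta(prior_opens, prior_unopens) prior updated
    # with the observed opens/unopens. With few campaigns the estimate is
    # pulled toward the prior mean, capturing Alan's wide 20-80% range.
    return (opens + prior_opens) / (campaigns + prior_opens + prior_unopens)

# Alan: 1 open out of 2 campaigns; Bridget: 2 opens out of 5.
alan = expected_open_rate(1, 2)     # = 0.5
bridget = expected_open_rate(2, 5)  # = 3/7
```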
53. Final scoring
Final score = 0.75 * click-through score + 0.25 * open score
[Figure: distribution of scores, with the best user marked]
54. Globe Visualisation - EARTH HOUR
Problem
Make a sexy visualisation for eye candy
Solution
Pre-canned javascript package!
55. Globe Visualisation
http://datakind-sg.github.io/
- Height of each bar is related to number of Twitter followers of Earth Hour,
and color is related to temperature change from 1970s to 1990s
- WebGL Globe is used: https://www.chromeexperiments.com/globe
- Essentially, you just need to pass in a JSON-formatted array of the following form:
  [lat1, long1, height1, colour1, lat2, long2, height2, colour2, ...]
- You can use the code as a template: https://github.com/DataKind-SG/datakind-sg.github.io
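Building that flat array from per-location records can be sketched in Python (a hypothetical helper; WebGL Globe itself consumes the JSON on the JavaScript side):

```python
import json

def to_globe_series(records):
    # Flatten (lat, lng, height, colour) records into the flat array
    # WebGL Globe expects: [lat1, lng1, height1, colour1, lat2, ...].
    flat = []
    for lat, lng, height, colour in records:
        flat.extend([lat, lng, height, colour])
    return flat

payload = json.dumps(to_globe_series([(1.35, 103.82, 0.8, 2.1)]))
```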
56. Globe Visualisation - Colour
- The temperature data is available from the IPCC: http://www.ipcc-data.org/cgi-bin/ddc_nav/
- This is a (bad) proxy for degree of climate change at each lat/long, and
Earth Hour suggested a broader measure.
- The temp difference between the 1970s and 1990s was scaled to be
between 0 and 3 so that blue corresponds to the biggest decrease in
temp during that period, and red corresponds to the biggest increase in
temp.
- As homework for you: change the color map so that there isn't green in the middle.
57. Globe Visualisation - Height
- The height of each bar is ~ log(1 + n), where n is the number of Twitter
followers at the lat/long (with the exact locations rounded to the nearest
quarter degree in order to bucket).
- So the difficult part is finding the lat/long.
- Twitter profiles have free text for the location, and this needs to be
converted into a lat/long.
- Geocoding hack: try to match with a given list of
  cities (http://download.geonames.org/export/dump/)
  or countries (https://developers.google.com/public-data/docs/canonical/countries_csv)
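The quarter-degree bucketing and log-scaled height described above can be sketched as:

```python
import math

def bucket(lat, lng):
    # Round to the nearest quarter degree so nearby followers share a bar.
    return round(lat * 4) / 4, round(lng * 4) / 4

def bar_height(n_followers):
    # Height ~ log(1 + n), as described on the slide.
    return math.log1p(n_followers)
```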
58. Geocoding hack
- Code is here: https://github.com/oliverxchen/geovis; quasi-pseudocode below.
- We'll look at what happens to a few examples of inputs:
- "t: 10.253083,-67.585859"
- "01.350750086, 103.951196586"
- "Bristol, UK"
- "between sky and earth"
- "CALAMBA, laguna, philippines"
- "Singapura"
- "Xalapa, Veracruz, Mexico"
59. Geocoding hack
The program is basically just a big loop through all of the free text locations and
applying the following in sequence.
A) standardize the string (change to lower case, replace multiple spaces with single spaces)
- "t: 10.253083,-67.585859"
- "01.350750086, 103.951196586"
- "between sky and earth"
- "bristol, uk"
- "calamba, laguna, philippines"
- "singapura"
- "xalapa, veracruz, mexico"
60. Geocoding hack
B) if the string starts with "t:", usually followed by an actual lat/long which can be
directly used
In the example strings,
"t: 10.253083,-67.585859" is mapped to [10.253083,-67.585859]
C) split remaining strings by commas
- ["01.350750086", "103.951196586"]
- ["between sky and earth"]
- ["bristol", "uk"]
- ["calamba", "laguna", "philippines"]
- ["singapura"]
- ["xalapa", "veracruz", "mexico"]
61. Geocoding hack
D) if there is a single string after the split and there's no match yet, try to match it with the country list
- ["singapura"] is matched to "singapore"
- ["between sky and earth"] is not mapped
E) if there are two strings after the split and there's no match yet, try to parse them as a lat/long:
- ["01.350750086", "103.951196586"] is mapped to [1.350750086, 103.951196586]
- ["bristol", "uk"]: float parse fails
62. Geocoding hack (cont.)
F) if there isn't a match yet, try to match the zeroth string to the list of cities
- ["bristol", "uk"] is mapped to "bristol, united states" (Whoops!)
- ["between sky and earth"] is not mapped
- ["calamba", "laguna", "philippines"] is mapped to "calamba, philippines"
- ["xalapa", "veracruz", "mexico"] is not mapped
63. Geocoding hack (cont.)
G) if there still isn't a match, try to match the last string to the list of countries
- ["between sky and earth"] is not mapped
- ["xalapa", "veracruz", "mexico"] is mapped to "mexico"
H) if there is still no match, you're out of luck
- ["between sky and earth"]
64. Geocoding hack (cont.)
- To deal with typos and interesting spellings, matching to cities and countries is done using the Jaro-Winkler measure (similar to Levenshtein edit distance, but with higher weight on letters early in the word).
- The largest Jaro-Winkler value is used, and it needs to be above a threshold to be considered a match.
- The Python package used is python-Levenshtein
- Other logic reuses previous results when standardized strings match
- Many improvements are possible! Eg:
- non-uniqueness of city names is not handled
- splitting on not just commas
- etc.
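Steps A through H can be condensed into a runnable sketch. The city/country tables, their coordinates, and the threshold are toy assumptions, and difflib's ratio stands in for the Jaro-Winkler measure so no extra package is needed:

```python
import difflib

# Toy stand-ins for the GeoNames city list and the Google countries CSV;
# only enough entries (with assumed lat/longs) to run the slide's examples.
COUNTRIES = {"singapore": (1.35, 103.82), "mexico": (23.63, -102.55)}
CITIES = {"calamba": (14.21, 121.17)}

def similarity(a, b):
    # Stand-in for the Jaro-Winkler measure used at the dive (via
    # python-Levenshtein); difflib avoids the extra dependency here.
    return difflib.SequenceMatcher(None, a, b).ratio()

def geocode(free_text, threshold=0.75):
    s = " ".join(free_text.lower().split())      # A) standardize
    if s.startswith("t:"):                       # B) "t:" prefixed lat/long
        lat, lng = s[2:].split(",")
        return float(lat), float(lng)
    parts = [p.strip() for p in s.split(",")]    # C) split on commas
    if len(parts) == 2:                          # E) bare lat/long pair
        try:
            return float(parts[0]), float(parts[1])
        except ValueError:
            pass
    for city, coords in CITIES.items():          # F) zeroth string vs cities
        if similarity(parts[0], city) > threshold:
            return coords
    for country, coords in COUNTRIES.items():    # D/G) last string vs countries
        if similarity(parts[-1], country) > threshold:
            return coords
    return None                                  # H) out of luck
```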
66. How many cases does HOME receive each year, segmented by domestic and non-domestic workers?
- Problems faced
  - Dates had to be reformatted while reading into R
  - Some outlier cases had to be removed (e.g., year > 2016)
- Demo of how the plot was created
67. Across nationalities, what's the proportion of workers placed by each agency?
- Problems faced
  - Some country data was in free text and had to be integrated
- Demo of how the plot was created
68. HOME: Worker Salary Data
What's the salary range for each nationality?
Overall
Domestic Workers
Non-domestic Workers
69. HOME: Worker Salary Data
Key Challenges:
Missing salary
Inconsistent units (per hour/day/week/month) or missing units
Different currencies (SGD, USD, MYR)
Ranges given (e.g. 6-8)
Converting hours/days/weeks to months
How it was handled:
Using Excel, divide into two columns: values and units
Standardize to monthly salary in third column
Discovered on the second day that a filter hadn't been applied
correctly, so the columns became misaligned; a quick fix was
applied, but this should be checked.
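The value/unit standardization could be sketched as below; the per-day and per-hour factors (6-day week, 44-hour week) and the weeks-per-month ratio are assumptions for illustration, and currency conversion is left out:

```python
# Assumed unit conversions: 52 weeks / 12 months, 6-day work week,
# 44-hour work week. The actual factors used at the dive are not recorded.
PER_MONTH = {
    "month": 1.0,
    "week": 52 / 12,
    "day": 52 * 6 / 12,
    "hour": 52 * 44 / 12,
}

def monthly_salary(value, unit):
    # Standardize a (value, unit) pair to a monthly figure. A range like
    # "6-8" should be resolved (e.g. to its midpoint) before calling this.
    return value * PER_MONTH[unit]
```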
71. HOME: Worker Salary Data
Key Lessons
One graph may not be sufficient to convey
the intended information.
Be careful of unknowingly comparing
apples to oranges.
72. Postal Code Problem - HOME
- Problem
- Postal Codes were not present in every record.
- Solution
- Use the Google Maps, OneMap, and OpenStreetMap APIs to map addresses to postal codes
73. Postal Code Problem - HOME
- Method
- Retrieve postal codes using the 3 APIs.
- Each API returned more than one postal code per address, as each address could map to different postal codes. E.g.:
  {"postalcode": ["089057", "088542", "079716", "088541", "079911", "079120"], "address": "fuji xerox"}
  {"postalcode": ["039801", "039594", "039797", "039596", "039799"], "address": "raffles"}
  {"postalcode": ["310031", "310035"], "address": "toa payoh"}
77. Postal Code Problem - HOME
- Problem
  - The 3 APIs may not return the same set of postal codes.
- Solution
  - Use a polling (voting) method to decide which postal code to pick.
78. Polling/Voting Algorithm
1. Collect all zips across all data sources.
2. Weight each zip by the number of data sources in which it appears.
3. Select the highest-weighted zip.
4. Select randomly if there are multiple highest-weighted zips.
5. (This is essentially sensor integration.)
79. Polling/Voting Algorithm (Precisely)
1. Let A := ∪_i A_i be the set of all unique zips, where A_i is the set of zips from data source i.
2. For each a ∈ A, compute the weight w_a = Σ_i 1_{A_i}(a), i.e. the number of sources containing a.
3. Select the zip a* such that w_{a*} ≥ w_a for all a.
Example:
1. A_1 := (44, 34, 26, 17), A_2 := (34, 45, 17), A_3 := (17)
2. A = (44, 34, 26, 17, 45)
3. w_44 = 1, w_34 = 2, w_45 = 1, w_17 = 3
4. a* = 17
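The voting scheme above is small enough to implement directly (a sketch; ties here are broken arbitrarily rather than randomly):

```python
from collections import Counter

def vote(sources):
    # Weight each zip by the number of data sources that report it and
    # return a zip with the highest weight.
    weights = Counter()
    for zips in sources:
        weights.update(set(zips))  # each source counts a zip at most once
    return weights.most_common(1)[0][0]

# The slide's example: the three sources agree most often on 17.
best = vote([[44, 34, 26, 17], [34, 45, 17], [17]])
```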
80. Issues
1. Due to privacy concerns, we never saw the full address.
2. This means that we have no feel for the data.
3. Potential for errors.
81. Postal Code Problem - HOME
- Map-based visualization
- Problem: couldn't find geo data to highlight districts in Singapore.
- Solution: find the centre lat/long of each district and show the results with a marker.
- Tools: used leaflet.js for map visualization.
  - Geo data for the map came from OpenStreetMap.
82. Postal Code Problem - HOME
source - https://www.ura.gov.sg/realEstateIIWeb/resources/misc/list_of_postal_districts.htm
84. Example DC.js plot.
Interactive Data Visualization
Problem:
HOME may need to do analysis in
future to see if the situation has
improved/changed
Solution:
Build an interactive data visualization tool to support self-serviced investigations
Tools:
Used DC.js for data visualization
87. Data anonymization
Problems:
- a lot of sensitive data:
  - first order: name, home address, passport number, birthday, contact number, FIN
  - second order: current/previous employer info, case created by, agency contact info
- HOME data had a lot of free-text fields with various levels of private information:
  - "Do you want me to treat you like <NAME>?!"
  - "On <exact date>..."
  - "His friend, <NAME>, application."
88. Data anonymization
Real anonymization:
- un-anonymized data should not leave its usual working environment
- un-anonymized data should only be handled by authorized users
- this requires a highly portable and easy-to-use utility:
  - Python: what about Windows?
  - R: don't get me started
  - a compiled CLI utility: so many things can go wrong (apart from which OS and architecture)
- browsers are pretty bloated software products; you can do video editing with one.
https://github.com/DataKind-SG/HOME
91. Data handling best practices
Break up into small groups to discuss the topics below, and appoint a spokesperson to share your group's thoughts with everyone.
- Excel is a versatile tool that many people can use, but it has its drawbacks. Should future DataLearns cover coding to replace Excel? Should we discourage using Excel, or would that discourage some people from participating?
- In the heat of a DataDive, it's easy to forget to document the steps taken in data cleaning and transformations. Any ideas on the fastest, most painless way to document?
- Open to any suggestions!