蠖蠑碁 一危 企
Week3
OpenRefine
Chapter1
Getting started with OpenRefine
OpenRefine
1 Diving into OpenRefine
OpenRefine
Mac
1. http://openrefine.org/download.html
2. Download the Mac kit
3. Open the dmg file and drag the OpenRefine icon to the Applications folder
4. Run
ZIP kit
1. http://openrefine.org/download.html
2. Download the ZIP file
3. Run OpenRefine
OpenRefine
Run OpenRefine in Google Chrome
Internet Explorer: X
OpenRefine
2.Introducing OpenRefine
OpenRefine
覦 一危磯ゼ る蠍一 螻殊 蟲
Free & OpenSource
Desktop based
Facet朱 一危一  れ螳  螳
れ input & output 一危 襷
一危 覲 
API襯 牛 一危 覲 螳
OpenRefine
Comparison with similar tools

Google Refine
1. Batch editing of rows and columns is possible.
2. Used for exploring and transforming data.
3. No schema required.
4. Data is always visible at each step of editing.
5. More interactive and visual.

Spreadsheets
1. Editing of one cell at a time.
2. Used for entering data and performing calculations and functions.
3. No schema required.
4. Data is always visible.
5. Visuals are not impressive.

Databases
1. Schema and programming language required for editing.
2. Data is out of sight unless a script is run to view it.
OpenRefine
2.Creating & Setting a new project
襦 襦 襷り鍵
OpenRefine
Download sample_data.zip from the Facebook Group
OpenRefine
Create Project: choose the file, then click Next
Practice file: RatesSweden.csv
OpenRefine
一危    誤 螻
CSV, TSV れ螻 覓語 語
∬係襴螻 襦 語蟆 れ
襦碁 れ
 Main setting
OpenRefine
 Check List
 Row 螳 
  伎 覈 螳 row襯 覲 蟆語?
 previous / next 伎 れ
 Open Project
 Export
OpenRefine
3 Correct bad formatting
replace(original value, text to replace, new text)
To Number
Numeric Facet
Practice file: RatesSweden.csv
OpenRefine
In the rate column, decimals are separated by commas rather than periods.
How can all the commas be changed to periods at once?
Edit cells - Transform
replace(value, ",", ".")
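The same clean-up can be sketched outside OpenRefine. The Python below is a minimal illustration, not OpenRefine's own code: it mimics the replace step followed by To Number. The sample rate strings and the helper name `comma_decimal_to_number` are made up for the example and are not taken from RatesSweden.csv.

```python
def comma_decimal_to_number(value: str) -> float:
    """Swap the decimal comma for a period, then parse the text as a number."""
    return float(value.strip().replace(",", "."))


# Hypothetical cell values written with a decimal comma.
rates = ["9,35", "10,2", " 8,0 "]
numbers = [comma_decimal_to_number(r) for r in rates]
print(numbers)
```

Once the values are numeric they can be sorted and filtered by range, which is what the Numeric Facet step relies on.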
OpenRefine
Edit Cells - Common Transforms - To number
∬螳 蟆 轟朱 覲伎 襦 語
OpenRefine
Use a Numeric Facet to inspect the distribution of the data
Adjust the scale to select a range of values and sort
OpenRefine
4 Correct Misspellings
Trim
To Uppercase or To lowercase
Clustering
Merge columns
Practice file: slategundeath.csv
OpenRefine
CSV朱 伎
OpenRefine
Facet - Text facet
Use the facet to inspect the city data
OpenRefine
The same value is recognized as different data
because of stray whitespace
It needs to be made to match as the same value (trim)
OpenRefine
city - Edit cells
value.trim()
OpenRefine
city - Edit cells - value.trim()
Albuquerque襯 觜襦 伎郁鍵螳  螳れ 襴 
Albuquerque 螳螳 2,071螳 -> 2,063襦 覦
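As a rough illustration of why trimming changes the facet counts, the sketch below counts city values before and after stripping surrounding whitespace. The list of cities is invented for the example. Python's `str.strip()` stands in for `value.trim()` here.

```python
from collections import Counter

# Hypothetical city cells; two of them carry stray whitespace.
cities = ["Albuquerque", "Albuquerque ", " Albuquerque", "Tulsa"]

before = Counter(cities)                    # each spelling counted separately
after = Counter(c.strip() for c in cities)  # counted after trimming

print(len(before), "distinct values before trimming")
print(len(after), "distinct values after trimming")
```

After trimming, the three Albuquerque variants fall into a single facet choice.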
OpenRefine
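Does the US have 54 states?
Counting Washington D.C., there should be 51
The same state is recognized as different data because of upper/lower case
So the state values must be made consistent in case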
OpenRefine
Edit cells - Common transforms - To uppercase
Unify the values to uppercase
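The effect of To uppercase on the number of distinct values can be sketched as follows. The state codes are invented sample data, not rows from slategundeath.csv.

```python
# Hypothetical state cells with inconsistent casing.
states = ["NM", "nm", "Tx", "TX", "dc"]

distinct_before = sorted(set(states))
normalized = [s.upper() for s in states]  # same idea as To uppercase
distinct_after = sorted(set(normalized))

print(distinct_before)
print(distinct_after)
```

Values that differ only in case now count as one state.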
OpenRefine
51螳 譯朱
OpenRefine
Oklahoma City vs. Oklahoma city
Cluster the values through the Cluster and edit menu
OpenRefine groups similar values using its own algorithms
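To give a feel for how such grouping can work, here is a simplified sketch of a fingerprint-style key-collision approach: each value is reduced to a normalized key, and values that share a key form a cluster. This is a reduced approximation written for illustration. OpenRefine's own fingerprint method performs more normalization steps than this, and the function names and sample cities are assumptions.

```python
import string
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key: trim, lowercase, drop punctuation, sort unique tokens."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values whose fingerprints collide; keep groups with more than one spelling."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]


# Hypothetical city cells.
cities = ["Oklahoma City", "Oklahoma city", "oklahoma  city", "Tulsa"]
clusters = cluster(cities)
print(clusters)
```

Each cluster is only a suggestion; as in the Cluster and edit dialog, a person still decides which spelling to keep.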
OpenRefine
企ろ磯 牛 螳れ 覲伎譴
燕,  煙 覿覦覯朱 覿
蟆 一危一   , 覲伎朱
OpenRefine
覩瑚記 譯朱 豐蠍一蟇伎 觜蟲
9 貊朱る 72蟇
∬骸 蠏碁願?
OpenRefine
Cities share the name Columbus, but their states differ: OH, GA, IN
讀  襷讌  豐蠍一蟇伎螳 豺伎危  蟆朱   
So a column that combines city + state has to be created
OpenRefine
City - Edit cells - Transform
value + ", " + cells["state"].value
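The reason for concatenating city and state can be shown with a small sketch. The rows below are invented for the example; only the column names city and state come from the slides.

```python
from collections import Counter

# Hypothetical rows: three different cities that share the name Columbus.
rows = [
    {"city": "Columbus", "state": "OH"},
    {"city": "Columbus", "state": "OH"},
    {"city": "Columbus", "state": "GA"},
    {"city": "Columbus", "state": "IN"},
]

by_city = Counter(row["city"] for row in rows)

# Same idea as the transform: city value, a comma and space, then the state value.
by_city_state = Counter(row["city"] + ", " + row["state"] for row in rows)

print(by_city)
print(by_city_state)
```

Counting by city alone lumps the three places together; counting by the combined key keeps them apart.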
OpenRefine
Text facet朱 れ 覿
 蟆郁骸 襴  覿
OpenRefine
5 Invalid values and duplicates
讌  螳螻 譴覲糾
Practice file: (re)titanic.csv
一危一  危企 蠏狩蠍
る一危一  伎 覿螻
OpenRefine
Titanic 轟 覲 一危 る 蟾?
First, inspect the data with a facet on sex
OpenRefine
炎骸 煙 覃?
∬鍵覲語朱 一危 る手 豢
OpenRefine
煙 企  Miss
煙朱 螳譯
OpenRefine
female企手
OpenRefine
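Facet - Numeric facet
Adjust the scale range to find anomalous data
490 is an anomalous value
It was probably meant to be 49 and was mistyped
Change it to 49, then set the type to number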
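Narrowing a numeric facet to an implausible range amounts to a range check. The sketch below flags values outside an assumed plausible interval; the sample values, the bounds 0 to 120, and the helper name `find_outliers` are illustrative assumptions, not details taken from the slides or from (re)titanic.csv.

```python
def find_outliers(values, low=0.0, high=120.0):
    """Return (index, number) pairs for values outside the assumed range [low, high]."""
    outliers = []
    for index, text in enumerate(values):
        number = float(text)
        if number < low or number > high:
            outliers.append((index, number))
    return outliers


# Hypothetical cells stored as text, one of them mistyped.
ages = ["22", "38", "490", "26"]
outliers = find_outliers(ages)
print(outliers)

# Correct the flagged cell by hand, then convert the column to numbers.
ages[2] = "49"
ages_as_numbers = [float(a) for a in ages]
```

The check only locates the suspicious cell. Deciding that 490 should be 49 remains a human judgment.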
OpenRefine
6 Exporting a project
襦 企慨願鍵
OpenRefine
TSV
CSV
HTML table
Excel
MQL Write
OpenRefine
Practice
Clean up the hokuk.xls file with OpenRefine
Use Concatenation, value.round(), trim(), and so on
OpenRefine
[Week3] clean & correct data with OpenRefine