Grabbing Director Data
Scoda openrefine-directordata
forEach(value.parseJson()['results']['company']['officers'], v, [v.officer.id, v.officer.name, v.officer.position, v.officer.start_date, v.officer.end_date].join('::')).join('||')
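For readers more at home in Python than GREL, the logic of the expression above can be sketched roughly as follows. This is an illustration only, not the OpenRefine implementation: the function name `officers_to_string` and the sample payload are assumptions, the payload is shaped after the path used in the expression, and missing values are written as the text `null` to match the output shown in the notes.

```python
import json

def officers_to_string(response_text):
    """Flatten the officers list of one company response into a single string.

    Fields of one officer are joined with '::'; officers are joined with '||'.
    """
    data = json.loads(response_text)
    officers = data["results"]["company"]["officers"]
    rows = []
    for item in officers:
        o = item["officer"]
        fields = [o["id"], o["name"], o["position"], o["start_date"], o["end_date"]]
        # Assumption: render missing values as the text "null".
        rows.append("::".join("null" if f is None else str(f) for f in fields))
    return "||".join(rows)

# Hypothetical response body, trimmed to the fields the expression reads.
sample = json.dumps({
    "results": {"company": {"officers": [
        {"officer": {"id": 32866743, "name": "SIMON ALAN CONSTANT-GLEMAS",
                     "position": "director", "start_date": "2010-04-07",
                     "end_date": None}},
        {"officer": {"id": 32866744, "name": "KARIN JACQUELINE HAWKINS",
                     "position": "director", "start_date": "2006-01-17",
                     "end_date": "2012-02-22"}},
    ]}}
})

print(officers_to_string(sample))
```

Using two different delimiters keeps the two levels of structure (officers, and fields within an officer) separable later on.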
(It would probably make sense to rename the newly created columns.)
Starting with the first row, Fill down will fill blank rows in a column with the value in the preceding row

(so we can fill down company names and ID columns for each corresponding director)
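The Fill down behaviour described above can be sketched in Python as a rough illustration. The function name `fill_down` and the sample company names are assumptions, and OpenRefine's own handling of edge cases may differ.

```python
def fill_down(values):
    """Replace blank cells with the most recent non-blank value above them."""
    filled = []
    last = None  # nothing to copy until the first non-blank cell is seen
    for v in values:
        if v is None or v == "":
            filled.append(last)
        else:
            filled.append(v)
            last = v
    return filled

# Hypothetical company-name column after splitting directors into rows.
names = ["ACME LTD", "", "", "BETA PLC", ""]
print(fill_down(names))
```

A blank cell at the very top of the column has nothing above it to copy, so in this sketch it stays empty (`None`).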
SchoolOfData.org


Editor's Notes

  • #2: A recipe for grabbing director information from OpenCorporates using OpenRefine, given an OpenCorporates company ID or OpenCorporates company page URL. For more information, contact: schoolOfData.org
  • #3: Here's the sort of thing we're starting with: a list of companies.
  • #4: Here's the sort of thing we want: lists of directors associated with each company (where that information is available).
  • #5: The first step is to create a web address/URL to call the OpenCorporates API and ask it for data about a particular company. OpenRefine can create a new column populated with the contents of calls made to a URL contained in, or generated from, another column.
  • #6: The URLs should take the form: http://api.opencorporates.com/companies/JURISDICTION/COMPANY_ID. If you already have company page URLs in a column, add a column based on that column using: value.replace("http://","http://api."). If you have JURISDICTION/COMPANY_ID in a column, use the formula: "http://api.opencorporates.com/companies/"+value
  • #7: The data comes back as JSON, which we will need to process. Each JSON result contains the data for a single company. The data relating to the directors can be found as a list down the path value.parseJson()['results']['company']['officers']
  • #8: Let's parse the JSON data and put the directors' information into another column.
  • #9: What we are aiming for is a contrivance based on the form:
      32866743::SIMON ALAN CONSTANT-GLEMAS::director::2010-04-07::null
      32866744::KARIN JACQUELINE HAWKINS::director::2006-01-17::2012-02-22
      32866745::ANDREW WILLIAM LONGDEN::director::2003-11-03::null
    where we list director ID, name, position, appointment date, termination date.
  • #10: This function will parse the data into a string with the form:
      32866743::SIMON ALAN CONSTANT-GLEMAS::director::2010-04-07::null||32866744::KARIN JACQUELINE HAWKINS::director::2006-01-17::2012-02-22||32866745::ANDREW WILLIAM LONGDEN::director::2003-11-03::null
    The function reads as follows: for each officer, join their ID, name, position, start date and end date with ::, then join each of these director descriptions using ||. The use of two different and hopefully unique delimiters means we can split the data on each delimiter type separately.
  • #11: The parsed data is put into a new column in this combined list form.
  • #12: We can then split the data so that we create a new row for each director using the delimiter we defined: ||
  • #13: Note that values from the other columns will not be copied into any newly created rows; we will have to do that ourselves, either now or later.
  • #14: For each director, we now want to split their details out across several columns, one for each data field (ID, name, position, appointment date, termination date).
  • #15: We can do this by splitting on the other separator type we used: ::
  • #16: The newly created columns are labeled with automatically generated names. It would probably make sense to rename them to something slightly more convenient.
  • #17: Finally, we can do a little more tidying. For any columns we want to export, such as company name or company ID, we can Fill down using the corresponding values from the original row the directors' information was pulled from.
  • #18: If you want to know more, contact us
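
As a rough companion to notes #6, #12 and #15, the URL-building step and the two splitting steps can be sketched in Python. This is a minimal sketch under stated assumptions: the function names, the `FIELDS` labels, the company number `gb/01234567` and the form of the company page URL are hypothetical, and no request is actually sent to the API.

```python
FIELDS = ["id", "name", "position", "start_date", "end_date"]

def api_url_from_page_url(page_url):
    """Turn a company page URL into an API URL by inserting the 'api.' prefix."""
    return page_url.replace("http://", "http://api.", 1)

def api_url_from_company_path(path):
    """Build the API URL from a JURISDICTION/COMPANY_ID string."""
    return "http://api.opencorporates.com/companies/" + path

def split_directors(combined):
    """Split on '||' into one record per director, then on '::' into fields."""
    rows = [r for r in combined.split("||") if r]  # ignore empty trailing pieces
    return [dict(zip(FIELDS, r.split("::"))) for r in rows]

# Hypothetical inputs for illustration.
print(api_url_from_page_url("http://opencorporates.com/companies/gb/01234567"))
print(api_url_from_company_path("gb/01234567"))

combined = ("32866743::SIMON ALAN CONSTANT-GLEMAS::director::2010-04-07::null||"
            "32866744::KARIN JACQUELINE HAWKINS::director::2006-01-17::2012-02-22")
for director in split_directors(combined):
    print(director)
```

The split only works cleanly if neither delimiter occurs inside a field value, which is why the notes call the delimiters "hopefully unique".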