Library Carpentry: OpenRefine
- OpenRefine is ‘a tool for working with messy data’
- OpenRefine works best with data in a simple tabular format
- OpenRefine can help you split data up into more granular parts
- OpenRefine can help you match local data up to other data sets
- OpenRefine can help you enhance a data set with data from other
sources
- Use the Create Project option to import data
- You can control how data is imported using options on the import
screen
- Several file types can be imported into OpenRefine
- OpenRefine uses rows and columns to display data
- Most options to work with data in OpenRefine are accessed through a
drop-down menu at the top of a data column
- When you select an option in a particular column (e.g. to make a
change to the data), it will affect all the cells in that column
- OpenRefine has a Records mode which links together multiple rows
into a single record
- Split and join multi-valued cells to modify the individual values
within them
- When creating multi-valued cells in your data, choose a separator
that will not appear in the data values
- You can use facets and filters to explore your data
- You can use facets and filters to work with a subset of data in
OpenRefine
- You can correct common data issues from a Facet
- Clustering is a way of finding variant forms of the same piece of
data within a dataset (e.g. different spellings of a name)
- There are a number of different Clustering algorithms that work in
different ways and will produce different results
- The best clustering algorithm to use will depend on the data
- Using clustering you can replace varying forms of the same data with
a single consistent value
- You can reorder, rename and remove columns in OpenRefine
- Sorting in OpenRefine always sorts all rows
- The original order of rows in OpenRefine is maintained during a sort
until you use the option to Reorder Rows Permanently from the Sort
drop-down menu
- You can use Undo and Redo to retrace your steps
- You can save and apply a set of steps to a new set of data using the
‘Extract’ and ‘Apply’ features
- OpenRefine can look up custom URLs to fetch data based on what’s in
an OpenRefine project
- You can build such API calls yourself, or use existing
Reconciliation services to enrich data
- OpenRefine can be further enhanced by installing extensions
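The Records mode point above can be illustrated with a small Python sketch. It assumes that a non-blank value in the first (key) column starts a new record and that following rows with a blank key belong to that record; the rows and DOI-style keys are invented for the example, and this is not OpenRefine's own code.

```python
# Minimal sketch of how Records mode groups rows into records.
# Assumption: a non-blank key starts a new record; blank keys continue it.
# The sample rows are hypothetical.
rows = [
    ("doi:1", "Smith, J."),
    ("", "Jones, A."),
    ("doi:2", "Lee, K."),
    ("", "Park, S."),
    ("", "Chen, L."),
]

def group_records(rows):
    """Group (key, value) rows into records keyed by the first column."""
    records = []
    for key, value in rows:
        if key:
            # A non-blank key column begins a new record.
            records.append({"key": key, "values": [value]})
        elif records:
            # A blank key column continues the current record.
            records[-1]["values"].append(value)
    return records

for record in group_records(rows):
    print(record["key"], record["values"])
```

Five rows become two records here, which mirrors the way the row and record counts differ in OpenRefine's display.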
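The split/join points, and the advice to choose a separator that never appears inside a value, can be sketched in Python. This illustrates the idea only. It does not use OpenRefine's interface, and both the `|` separator and the author names are assumptions made for the example.

```python
# Minimal sketch of splitting and joining multi-valued cells.
# Assumption: "|" is the chosen separator; the sample names are hypothetical.
SEPARATOR = "|"

def split_cell(cell, sep=SEPARATOR):
    """Split one multi-valued cell into its individual values."""
    return cell.split(sep)

def join_values(values, sep=SEPARATOR):
    """Join individual values back into a single multi-valued cell."""
    return sep.join(values)

cell = "Smith, J.|Jones, A.|Lee, K."
values = split_cell(cell)
print(values)                       # ['Smith, J.', 'Jones, A.', 'Lee, K.']
print(join_values(values) == cell)  # True

# A separator that also occurs inside the values breaks the split:
# "," cuts each "Surname, Initial" name apart.
bad = split_cell(cell, sep=",")
print(len(bad))                     # 4 pieces instead of 3 names
```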
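The facet points can be pictured as counting the distinct values in a column, then filtering or editing by value. The Python sketch below is an analogy under stated assumptions: the "language" column and its values are made up, and OpenRefine itself does this through the facet panel, not through code like this.

```python
from collections import Counter

# Hypothetical "language" column; the values are invented for illustration.
languages = ["eng", "eng", "fre", "English", "eng", "fre", ""]

# A text facet lists each distinct value with the number of matching rows.
facet = Counter(languages)
for value, count in sorted(facet.items()):
    print(repr(value), count)

# Selecting a value in the facet acts as a filter: only matching rows stay in view.
french_rows = [i for i, v in enumerate(languages) if v == "fre"]
print(french_rows)  # [2, 5]

# Editing a value from the facet changes every cell with that value at once.
languages = ["eng" if v == "English" else v for v in languages]
print(Counter(languages)["eng"])  # 4
```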
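The clustering points can be illustrated with a simplified key-collision method. Each value is reduced to a normalized "fingerprint" key, and values that share a key are grouped together. The steps below approximate the fingerprint keying described in OpenRefine's documentation: trim, lowercase, fold accents to ASCII, drop punctuation, and sort the unique tokens. They are not its exact implementation. The sample names are hypothetical, and so is the choice of the first variant as the merged value, since in OpenRefine you pick or type the new value yourself.

```python
import re
import unicodedata
from collections import defaultdict

def fingerprint(value):
    """Simplified fingerprint key (an approximation of OpenRefine's method)."""
    value = value.strip().lower()
    # Fold accented characters to plain ASCII, e.g. "é" -> "e".
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Remove punctuation (anything that is not a word character or whitespace).
    value = re.sub(r"[^\w\s]", "", value)
    # Sort unique tokens so word order and repeated words do not matter.
    return " ".join(sorted(set(value.split())))

def key_collision_clusters(values):
    """Group values whose fingerprint keys are identical."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [members for members in groups.values() if len(set(members)) > 1]

# Hypothetical column containing spelling variants of one institution.
names = ["Université de Montréal", "Montreal, Universite de",
         "universite de montreal", "McGill University"]
clusters = key_collision_clusters(names)
print(clusters)

# Replace every variant in a cluster with one value (here, arbitrarily, the first).
mapping = {v: cluster[0] for cluster in clusters for v in cluster}
cleaned = [mapping.get(v, v) for v in names]
print(cleaned)
```

Different keying functions group values differently, so another method may find clusters this one misses, or merge values that should stay apart.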