Introduction to OpenRefine
- OpenRefine is ‘a tool for working with messy data’
- OpenRefine works best with data in a simple tabular format
- OpenRefine can help you split data up into more granular parts
- OpenRefine can help you match local data up to other data sets
- OpenRefine can help you enhance a data set with data from other sources
Importing data into OpenRefine
- Use the Create Project option to import data
- You can control how data imports using options on the import screen
- Several file types can be imported into OpenRefine
Layout of OpenRefine, Rows vs Records
- OpenRefine uses rows and columns to display data
- Most options to work with data in OpenRefine are accessed through a drop down menu at the top of a data column
- When you select an option in a particular column (e.g. to make a change to the data), it will affect all the cells in that column
- OpenRefine has a Records mode which links together multiple rows into a single record
- Split and join multi-valued cells to modify the individual values within them
- When creating multi-valued cells in your data, choose a separator that will not appear in the data values
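The advice on separators can be illustrated outside OpenRefine. The following is a minimal Python sketch, not OpenRefine's own behaviour: the function names `join_values` and `split_values` and the sample author names are hypothetical, chosen only to show why a separator that also occurs in the data makes a multi-valued cell ambiguous.

```python
def join_values(values, sep="|"):
    """Join values into one multi-valued cell, refusing an ambiguous separator."""
    clashes = [v for v in values if sep in v]
    if clashes:
        raise ValueError(f"separator {sep!r} appears in {clashes}")
    return sep.join(values)


def split_values(cell, sep="|"):
    """Split a multi-valued cell back into its individual values."""
    return cell.split(sep)


# Hypothetical sample data: two author names that themselves contain commas
authors = ["Smith, J.", "Jones, A."]

cell = join_values(authors, sep="|")
print(cell)                     # Smith, J.|Jones, A.
print(split_values(cell, "|"))  # ['Smith, J.', 'Jones, A.']

# With ", " as the separator, the two names fall apart into four pieces
print(len(", ".join(authors).split(", ")))  # 4
```

Because the pipe character does not occur in the names, splitting recovers exactly the values that were joined.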
Faceting and filtering
- You can use facets and filters to explore your data
- You can use facets and filters to work with a subset of data in OpenRefine
- You can correct common data issues from a Facet
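Conceptually, a text facet is a count of each distinct value in a column. The sketch below imitates that idea in Python, assuming a hypothetical list of publisher names. It illustrates the concept only and does not show how OpenRefine implements facets.

```python
from collections import Counter

# Hypothetical sample column; in OpenRefine this would be one column of a project
publishers = ["Elsevier", "elsevier", "Wiley", "Elsevier", "Wiley "]

# A text facet lists each distinct value with the number of rows that contain it
facet = Counter(publishers)

# Selecting a facet value narrows the working set to the matching rows
selected = [i for i, v in enumerate(publishers) if v == "elsevier"]

# Editing a value in the facet rewrites every matching cell at once
corrections = {"elsevier": "Elsevier", "Wiley ": "Wiley"}
cleaned = [corrections.get(v, v) for v in publishers]

print(facet)
print(selected)
print(Counter(cleaned))
```

The counts make stray variants such as a lower-case spelling or a trailing space easy to spot, which is why a facet is a convenient place to correct them.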
Clustering
- Clustering is a way of finding variant forms of the same piece of data within a dataset (e.g. different spellings of a name)
- There are a number of different Clustering algorithms that work in different ways and will produce different results
- The best clustering algorithm to use will depend on the data
- Using clustering you can replace varying forms of the same data with a single consistent value
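One family of clustering methods groups values that reduce to the same normalised key. The sketch below is loosely modelled on that "fingerprint" idea and is an assumption-laden illustration, not OpenRefine's implementation: the normalisation steps, the function names and the sample names are all chosen for the example.

```python
import string
import unicodedata
from collections import Counter, defaultdict


def fingerprint(value):
    """Build a normalised key: trim, lowercase, drop ASCII punctuation,
    fold accents to ASCII, then sort the unique whitespace-separated tokens."""
    text = value.strip().lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values sharing a fingerprint; keep groups with more than one form."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(set(g)) > 1]


def merge(values):
    """Replace every variant in a cluster with that cluster's most frequent form."""
    replacement = {}
    for group in cluster(values):
        canonical = Counter(group).most_common(1)[0][0]
        for v in group:
            replacement[v] = canonical
    return [replacement.get(v, v) for v in values]


# Hypothetical sample data
names = ["Joe Bloggs", "Bloggs, Joe", "Joe Bloggs", "joe bloggs", "Jane Doe"]
print(cluster(names))
print(merge(names))
```

A different key function would group the same values differently, which is one way to see why the choice of algorithm depends on the data.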
Working with columns and sorting
- You can reorder, rename and remove columns in OpenRefine
- Sorting in OpenRefine always sorts all rows
- The original order of rows in OpenRefine is maintained during a sort until you use the option to Reorder Rows Permanently from the Sort drop-down menu
Introduction to Transformations
- Common transformations are available through the column drop-down menu, under Edit cells > Common transforms
Writing Transformations
- You can alter data in OpenRefine based on specific instructions
- You can preview the results of your GREL expression
Transformations - Undo and Redo
- You can use Undo and Redo to retrace your steps
- You can save and apply a set of steps to a new set of data using the ‘Extract’ and ‘Apply’ features
Transforming Strings, Numbers, Dates and Booleans
- You can alter data in OpenRefine based on specific instructions
- You can extend the data editing functions built into OpenRefine by writing your own
Transformations - Handling Arrays
- Arrays cannot appear directly in an OpenRefine cell
- Arrays can be used in many ways using GREL expressions
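Since a cell cannot hold an array, an expression that creates one has to end by turning it back into a string or a number. The Python sketch below shows that round trip. It roughly mirrors chaining GREL's `split`, `uniques`, `sort` and `join`, but the function names and the sample cell are hypothetical and the code is an analogy rather than GREL itself.

```python
def sort_unique_values(cell, sep="|"):
    """Split a cell into an array, tidy it, and join it back into a string."""
    parts = cell.split(sep)       # string -> array
    parts = sorted(set(parts))    # work on the array: de-duplicate and sort
    return sep.join(parts)        # array -> string, which a cell can store


def count_values(cell, sep="|"):
    """Return how many values a multi-valued cell holds (a number, not an array)."""
    return len(cell.split(sep))


# Hypothetical sample cell
subjects = "mystery|crime|mystery"
print(sort_unique_values(subjects))  # crime|mystery
print(count_values(subjects))        # 3
```

In both functions the array exists only in the middle of the expression, and the result is a value that can be written to a cell.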
Exporting data
- You can export your data in a variety of formats
Looking Up Data
- OpenRefine can look up custom URLs to fetch data based on what’s in an OpenRefine project
- Such API calls can be custom built, or one can use existing Reconciliation services to enrich data
- OpenRefine can be further enhanced by installing extensions
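Looking up data by URL comes down to two steps: building a request URL from a cell value, then picking the wanted field out of the response. The sketch below shows both in Python without making a network call. The endpoint `https://api.example.org/lookup`, the `q` parameter and the JSON response shape are all hypothetical, so a real service's documentation would need to be consulted for the actual URL pattern and fields.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.example.org/lookup"  # hypothetical endpoint


def build_lookup_url(cell_value, base_url=BASE_URL):
    """Build a request URL from a cell value, escaping unsafe characters."""
    return f"{base_url}?{urlencode({'q': cell_value})}"


def extract_issn(response_text):
    """Pull one field out of a JSON response (the response shape is assumed)."""
    data = json.loads(response_text)
    return data["results"][0]["issn"]


url = build_lookup_url("Journal of Tea & Biscuits")
print(url)  # https://api.example.org/lookup?q=Journal+of+Tea+%26+Biscuits

# Hypothetical response body, standing in for what the service might return
sample_response = '{"results": [{"issn": "1234-5678"}]}'
print(extract_issn(sample_response))  # 1234-5678
```

Escaping the cell value matters because characters such as spaces and ampersands would otherwise change the meaning of the URL.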