Transforming Strings, Numbers, Dates and Booleans
Last updated on 2023-05-26 | Edit this page
Estimated time: 20 minutes
Overview
Questions
- How do I use transformations to programmatically edit my data?
- How do I transform the various data types?
Objectives
- Name and describe 4 types of data - String, Number, Date and Boolean
- Transform dates for further analysis
- Use Boolean to identify information recorded in a different format
- Create and run transformations based on Boolean Values
Data types
Understanding data types can help you write a wider variety of transformations using GREL.
Data types in OpenRefine
Every piece of data in OpenRefine has a ‘type’. The most common ‘type’ is a ‘string’ - that is a piece of text. However there are other data types available and transformations let you convert data from one type to another where appropriate. The data types supported are:
- String
- Number
- Date
- Boolean
- Array (covered in the next lesson)
Dates and Numbers
So far we’ve been looking only at ‘String’ type data. Much of the time it is possible to treat numbers and dates as strings. For example in the Date column we have the date of publication represented as a String. However, some operations and transformations only work on ‘number’ or ‘date’ typed data, such as sorting values in numeric or date order. To carry out these functions we need to convert the values to a date or number first.
Reformat the Date
- Make sure you remove all Facets and Filters
- On the Date column use the dropdown menu to select
Edit cells -> Transform
- In the ‘Expression’ box type the GREL expression
value.toDate("dd/MM/yyyy")
and press OK. - Note how the values are now displayed in green and follow a standard convention for their display format (ISO 8601) - this indicates they are now stored as date data types in OpenRefine. We can now carry out functions that are specific to Dates
- On the Date column dropdown select
Edit column->Add column based on this column
. Using this function you can create a new column, while preserving the old column - In the ‘New column name’ type “Formatted-Date”
- In the ‘Expression’ box type the GREL expression
value.toString("dd MMMM yyyy")
Specifying Date Formatting in GREL Expressions
GREL allow us to specify date and time using
pattern strings
, which are letters that have some specific
representation in the function call.
Pattern strings are case sensitive, therefore capital and lower case letters have a different meaning and usage.
The table below shows letters related to date and time representation.
Letter | Date or Time Representation |
---|---|
y |
Year |
M |
Month in year |
D |
Day in year |
d |
Day in month |
F |
Day of week in month |
E |
Day name in week |
u |
Day number of week |
a |
AM/PM marker |
The table below presents examples on how to use the patterns as input and the obtained output.
Date and Time Pattern Input | Output |
---|---|
"yyyy-MM-dd" |
2022-06-05 |
"dd MMM yyyy" |
05 Jun 2022 |
"EEE, MMM d, ''yy" |
Mon, Jun 5, ’22 |
"yyyy.MMMM.dd hh:mm a" |
2022.June.05 12:10 PM |
"EEE, d MMM yyyy HH:mm:ss" |
Mon, 5 Jun 2022 12:10:10 |
For a more detailed explanation checkout OpenRefine Documentation.
Booleans
A ‘Boolean’ is a binary value that can either be ‘true’ or ‘false’. Boolean values can be used directly in OpenRefine cells, but are more often used in transformations as part of a GREL expression. For example the GREL expression
value.contains("test")
generates a boolean value of either ‘true’ or ‘false’ depending on whether the current value in the cell contains the text ‘test’ anywhere.
Such tests can be combined with other GREL expressions to create more
complex transformations. For example, to carry out a further
transformation only if a test is successful. The GREL transformation
if(value.contains("test"),"Test data",value)
replaces a
cell value with the words “Test data” only if the value in the
cell contains the string “test” anywhere.
Key Points
- You can alter data in OpenRefine based on specific instructions
- You can expand the data editing functions that are built-in into OpenRefine by building your own