Project:OpenRefine

From CODECS Wikibase

Reconciliation

  • To create new items, the first column must uniquely identify each new item. Use: Reconcile > Actions > Create a new item for each cell
  • Do not forget to 'fill down' before exporting data. Otherwise additional rows will be ignored.
Gotchas
  • A column reconciled against Wikidata (used for Property:P5) still exports the underlying text values rather than the Wikidata identifiers, which are what we need. To get the identifiers, use Reconcile > Add entity identifiers column.
  • This extra step is not necessary for columns reconciled against our target Wikibase, which translate correctly to item identifiers in the QuickStatements output.
  • The autocomplete feature may have issues finding the right match for you
    • For reasons which are unclear, it may help to do a partial match rather than a full match.
    • There may be too many options to choose from, and the number of matches shown in the dropdown is restricted, e.g. for Ballynakill. In some cases it may help to use a different language, e.g. An Choill Naofa for Hollywood, or an alternative label.
    • You can also search Wikidata directly, copy the item number (beginning with Q) and paste it into the autocomplete box. OpenRefine should be able to recognise it and show you the appropriate label/description.
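
When the dropdown is too crowded, searching Wikidata directly can also be done through its search API. The following is only a rough sketch in Python: it builds a request URL for the MediaWiki action wbsearchentities, which searches labels and aliases in a chosen language. The helper name build_search_url and the default limit are my own assumptions, and the request itself is not sent here.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_url(label, language="en", limit=20):
    """Return a wbsearchentities URL for `label` (hypothetical helper)."""
    params = {
        "action": "wbsearchentities",
        "search": label,
        "language": language,  # language of the labels/aliases to search
        "type": "item",        # only items, i.e. identifiers beginning with Q
        "limit": limit,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


# Same place, searched by its English and by its Irish label.
url_en = build_search_url("Ballynakill")
url_ga = build_search_url("An Choill Naofa", language="ga")
print(url_en)
print(url_ga)
```

Opening either URL in a browser returns JSON in which each hit carries an identifier that can be pasted into the autocomplete box.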

Working with arrays

If the cells of a column contain text with delimiters, how can we make sure they are transformed into proper arrays and, if necessary, allow each value to be reconciled separately?

Prepare
  • Go to the column options, "Edit cells" and "Split multi-valued cells". Pick your separator and clean up any trailing whitespace.
  • Each row with multi-valued cells will add new rows below it.
  • OpenRefine has a "rows" and "records" view. Click "records" and check that those additional rows are grouped together under a single record.
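
To make the effect of the split concrete, here is a minimal Python sketch of what the "Prepare" steps do to the data. It is an illustration only, not OpenRefine's own code: the function name, the sample column names and the use of empty strings for blank cells are assumptions of mine.

```python
def split_multi_valued(rows, column, separator=";"):
    """Split `column` on `separator`; extra values go on new rows below,
    with all other cells left blank (as in a multi-row record)."""
    out = []
    for row in rows:
        values = [v.strip() for v in (row[column] or "").split(separator)]
        values = [v for v in values if v] or [""]
        for i, value in enumerate(values):
            # The first value stays on the original row; later values get
            # a new row in which every other cell is blank.
            new_row = dict(row) if i == 0 else {key: "" for key in row}
            new_row[column] = value
            out.append(new_row)
    return out


rows = [
    {"id": "A", "places": "Ballynakill; Hollywood "},
    {"id": "B", "places": "Kildare"},
]
records = split_multi_valued(rows, "places")
for r in records:
    print(r)
```

The second output row has a blank "id" cell. In "records" view it belongs to the record started by the row above it.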
Export
  • We are not ready to export to QuickStatements just yet: if we hit 'Export to QuickStatements', all additional rows are simply ignored! According to this comment from Wikiversity, we need to use "Rows" view and "click the arrow above the column that contains your identifier → edit cells → Fill down. This will ensure that all values can be correctly connected to the corresponding items."
  • Click "Export to QuickStatements"
  • Since I don't like what the 'Fill down' option does to my data - the duplication makes for a messier workflow - I am inclined to undo the change after running the export.
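
The reason the extra rows are ignored is that their identifier cell is blank, so the export has nothing to attach the value to. A small Python sketch of what 'Fill down' does, under the assumption that blank cells are represented as empty strings (function and column names are hypothetical):

```python
def fill_down(rows, column):
    """Copy the last non-blank value of `column` into the blank cells below it."""
    out = []
    last = ""
    for row in rows:
        new_row = dict(row)
        if new_row[column] in ("", None):
            new_row[column] = last      # blank: inherit the value from above
        else:
            last = new_row[column]      # non-blank: remember it for later rows
        out.append(new_row)
    return out


records = [
    {"id": "A", "places": "Ballynakill"},
    {"id": "", "places": "Hollywood"},
    {"id": "B", "places": "Kildare"},
]
filled = fill_down(records, "id")
for r in filled:
    print(r)
```

After filling down, every row carries its own identifier, which is also why the data looks duplicated.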

GREL

OpenRefine supports the use of GREL to transform and clean up text. Here are some examples I considered worth recording for future reuse. For tips and tricks, see /GREL.

Schema

Multi-line records with dedicated qualifiers

How do we match each qualifier to the right statement?

Example represented schematically:

P:Has reference :: ÓRiain2010
 [with qualifier] location: 20-32

P:Has reference :: Márkus1999
 [again] location: 67, 89

etc.
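
One possible way to think about the pairing, sketched in Python: if each reference and its location sit on the same row of the record, a statement can be built per row and the qualifier stays attached to the right value. This is only a sketch under that assumption. The column names, the sample item and the output structure are hypothetical, and it does not produce QuickStatements syntax.

```python
def statements_from_rows(rows):
    """Build one statement per row, attaching the qualifier found on that row."""
    statements = []
    for row in rows:
        if not row["reference"]:
            continue  # no statement value on this row, nothing to qualify
        statement = {
            "item": row["item"],
            "property": "Has reference",
            "value": row["reference"],
            "qualifiers": {},
        }
        if row["location"]:
            statement["qualifiers"]["location"] = row["location"]
        statements.append(statement)
    return statements


# "Item A" is a placeholder; the two rows form one multi-line record.
rows = [
    {"item": "Item A", "reference": "ÓRiain2010", "location": "20-32"},
    {"item": "Item A", "reference": "Márkus1999", "location": "67, 89"},
]
statements = statements_from_rows(rows)
for s in statements:
    print(s)
```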

Some useful links:

OpenRefine-Wikibase