...
For now, the main interest is in reusing the technologies underpinning Frictionless. The list of these expanded significantly between June and November 2017, with libraries added for all major programming languages. The ones we might use or borrow code from are:
Data Curator
Desktop CSV editor to help describe, validate and share usable open data.
goodtables-py
Validate and process tabular data in Python.
Stenci.la (coming soon)
The office suite for reproducible research.
Import for Google Spreadsheets (experimental)
Import Tabular Data Packages into Google Spreadsheets.
Data Package Pipelines
Framework for processing data packages in pipelines of modular components.
datapackage-py/js/...
A library for working with Data Packages.
tableschema-py/js/...
A library for working with Table Schema.
tabulator-py
Consistent interface for stream reading and writing tabular data (csv/xls/json/etc).
Where to use Frictionless
There are a few areas where Frictionless can be used.
HDX Utilities library
tabulator-py is already used in the HDX Utilities library and, through it, in the HDX Python API (for uploading to the HDX datastore) and in the Chatham House project.
HXL Proxy
tabulator-py could also replace the stream reading code in the HXL Proxy. The advantages are a reduction in the amount of code to be maintained and that improvements coded by others would automatically become available to the HXL Proxy. The main disadvantages are the time needed to refactor the HXL Proxy to use it and to identify any missing features needed.
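To make the idea of a single stream reading interface concrete, here is a minimal sketch using only the Python standard library. It is a stand-in for illustration, not tabulator-py's API: the helper name `iter_rows`, the supported formats and the sample data are all assumptions.

```python
import csv
import io
import json
from typing import Iterator, List


def iter_rows(text: str, fmt: str) -> Iterator[List[str]]:
    """Yield rows as lists of strings, whatever the input format (hypothetical helper)."""
    if fmt == "csv":
        for row in csv.reader(io.StringIO(text)):
            yield row
    elif fmt == "json":
        # Assumes a JSON array of arrays. json.loads parses the whole document
        # at once, so this branch is not truly streamed.
        for row in json.loads(text):
            yield [str(cell) for cell in row]
    else:
        raise ValueError(f"unsupported format: {fmt}")


# Made-up sample data: the same table in two formats gives the same rows.
csv_rows = list(iter_rows("#country,#affected\nMali,120\n", "csv"))
json_rows = list(iter_rows('[["#country", "#affected"], ["Mali", 120]]', "json"))
```

Code downstream of such a function does not need to know which format it was given, which is the part of the HXL Proxy that a shared library would take over.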
Migration Tool for Organisations
Import for Google Spreadsheets could help organisations move easily from local Excel spreadsheets to Google Spreadsheets, in which we can embed a trigger that detects whether the data has changed, for freshness purposes.
HDX UI
datapackage-js could be used to export HDX datasets as Frictionless data packages, should the standard take off.
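As a rough illustration of what such an export would produce, the sketch below builds a minimal Data Package descriptor (the content of a `datapackage.json` file) by hand. It is written in Python with the standard library only and does not use the datapackage-js or datapackage-py API. The function name `build_descriptor`, the dataset name and the sample CSV are assumptions, and every field is typed as a string because type inference is left out.

```python
import csv
import io
import json


def build_descriptor(name: str, csv_path: str, csv_text: str) -> dict:
    """Build a minimal descriptor for a package with one CSV resource (hypothetical helper)."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return {
        "name": name,
        "resources": [
            {
                "name": name,
                "path": csv_path,
                "schema": {
                    # One Table Schema field per CSV column, all typed as string.
                    "fields": [{"name": column, "type": "string"} for column in header]
                },
            }
        ],
    }


# Made-up sample dataset.
descriptor = build_descriptor("sample-dataset", "data.csv", "country,affected\nMali,120\n")
print(json.dumps(descriptor, indent=2))
```

In practice the Frictionless libraries would also infer field types and validate the descriptor, which is the main reason to use them rather than hand-rolled code like this.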
Data Check
goodtables-py, Data Curator and Stenci.la (which looks like a cross between Word and Pandas) could provide code and ideas for this tool. This would be the most significant of the areas presented here for using Frictionless in HDX, and it shapes how much effort should be put into further prototyping. The decision that needs to be made is whether to make improvements to the HXL Proxy or to use, and probably contribute to, the Frictionless libraries.
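For a sense of the kind of check such a tool would run, here is a minimal sketch that validates CSV rows against a list of Table Schema style fields. It uses only the Python standard library and is not the goodtables-py API. The function name `check_rows`, the small set of supported types and the sample data are assumptions.

```python
import csv
import io
from typing import Dict, List

# Casting functions for a small subset of Table Schema field types.
CASTERS = {"string": str, "integer": int, "number": float}


def check_rows(csv_text: str, fields: List[Dict[str, str]]) -> List[str]:
    """Return human-readable errors; an empty list means the data passed."""
    errors = []
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    expected = [field["name"] for field in fields]
    if header != expected:
        # Stop early: cell checks are meaningless if the columns do not line up.
        return [f"header {header} does not match schema {expected}"]
    for row_number, row in enumerate(reader, start=2):
        if len(row) != len(fields):
            errors.append(f"row {row_number}: expected {len(fields)} cells, got {len(row)}")
            continue
        for field, cell in zip(fields, row):
            try:
                CASTERS[field["type"]](cell)
            except ValueError:
                errors.append(
                    f"row {row_number}: {cell!r} is not a valid {field['type']} for {field['name']}"
                )
    return errors


# Made-up sample: the second data row has a non-numeric value.
schema_fields = [{"name": "country", "type": "string"}, {"name": "affected", "type": "integer"}]
errors = check_rows("country,affected\nMali,120\nChad,many\n", schema_fields)
```

A library such as goodtables-py covers far more cases (missing values, constraints, duplicate rows), so a sketch like this mainly helps to estimate what building the same checks in house would involve.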
Advantages to using HXL Proxy
- Familiarity
- In-house knowledge
- Speed in the beginning