There are various enhancements to HDX that we can consider to improve the user experience, simplify the quality assurance work of Data Partnerships and support the development of dashboards and other visualisations. I want to document these enhancements here so that we can think about if and how they might fit into our development plans for HDX.
Problems
Issues that were already identified
The following issues have been identified:
archiving of "old" datasets (where data is from a long time ago, but is as up to date as it can be eg. Ebola 2013) - this is done
tag cleanup - dataset tags should not be freeform; they should come from a fixed list, with a facility to request additions - some of this is done
fixing data URLs so that charts, reports etc. don't break - USAID have asked for this and there is a Jira Epic for it (the Fixed Data URLs Idea goes further than what USAID have asked for)
a workflow that tries to alert a contributor when an update to a resource they are making has unexpected field names, data type changes, etc. - USAID have asked for this and there is a Jira Epic for it
a system whereby automated users (and maybe normal users as well) can register to receive important information about a dataset they are using eg. a breaking change to the format, no longer being updated etc. - USAID have asked for this and there is a Jira Epic for it
we need to be able to distinguish data resources from auxiliary ones - helpful to DP's work on QAing datasets
distinguishing API resources from others
resources can't keep growing indefinitely - we need a way to split a resource once it grows beyond a certain size
keeping a history of data (versioning) - newly added data may contain errors so it may be helpful to be able to fall back to a previous version of the data eg. if a dashboard cannot load latest/xxx.csv, it could try 1/xxx.csv
a service whereby if a contributor uploads data in a particular format/structure that we specify, then the data is served out disaggregated in multiple ways
finding data by API needs to be simpler. Currently the limiting factor is the capabilities of the CKAN API search. Adding more metadata to the dataset, for example the list of fields and HXL tags in the data, would help
more generally, a search tailored to what users want to find eg. if a user types "wheat price kandahar", they would like to get back that price
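As an illustration of the resource update check listed above (alerting a contributor to unexpected field names or data type changes), here is a minimal sketch in Python. It assumes resources are CSV text with a single header row and uses a deliberately crude type inference (number, text or empty). The function names `infer_type`, `describe` and `compare` are hypothetical and not part of HDX or CKAN.

```python
import csv
import io


def infer_type(values):
    """Return 'number' if every non-empty value parses as a float, else 'text'."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "empty"
    try:
        for v in non_empty:
            float(v)
    except ValueError:
        return "text"
    return "number"


def describe(csv_text):
    """Map each column name to its inferred type."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fields = reader.fieldnames or []
    # Short rows yield None for missing cells, so fall back to an empty string
    return {name: infer_type([row[name] or "" for row in rows]) for name in fields}


def compare(old_csv, new_csv):
    """List differences a contributor should be warned about before updating."""
    old, new = describe(old_csv), describe(new_csv)
    warnings = []
    for name in old:
        if name not in new:
            warnings.append(f"missing column: {name}")
        elif old[name] != new[name]:
            warnings.append(f"type change in {name}: {old[name]} -> {new[name]}")
    for name in new:
        if name not in old:
            warnings.append(f"unexpected column: {name}")
    return warnings
```

A check like this would only warn, not block: the contributor may have changed the structure on purpose, in which case the notification idea above becomes relevant.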
How data is currently structured
...
The data cube should be available from the HDX add dataset dialog, perhaps as a third type?
Aggregated data should be provided in standardised form with HXL hashtags (to be defined)
The service potentially generates multiple datasets on HDX
General metadata that will be used for all generated datasets should be added using a UI similar to the add public dataset UI on HDX
The contributor can choose to have the full aggregated dataset put into HDX?
The service looks at the HXL hashtags
It determines the columns which are suitable for disaggregation by looking at the HXL hashtags
It offers them as suggestions to the contributor?
If the contributor selects #country, or if we simply disaggregate along every suitable column we detect:
The service splits the dataset by country, creating a dataset per country in HDX that points back to the cube data
The metadata for each dataset is based on the general metadata
It will need to add country information into the dataset title etc.
A similar process can be applied for #indicator etc.
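The steps above could be sketched roughly as follows. This is only an illustration under several assumptions: the cube is CSV text with a header row followed by an HXL hashtag row, the set of hashtags considered suitable for disaggregation (`SPLITTABLE_TAGS`) is a placeholder for the standard still to be defined, and the function names and the dictionary layout of the generated datasets are hypothetical rather than anything in HDX.

```python
import csv
import io
from collections import defaultdict

# Assumption: which hashtags count as suitable for disaggregation is still to be defined
SPLITTABLE_TAGS = {"#country", "#indicator"}


def parse_hxl(csv_text):
    """Split CSV text into header row, HXL hashtag row and data rows."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    return rows[0], rows[1], rows[2:]


def suggest_columns(hashtags):
    """Return indexes of columns whose base hashtag (attributes ignored) is splittable."""
    return [
        i for i, tag in enumerate(hashtags)
        if tag.split("+")[0].strip() in SPLITTABLE_TAGS
    ]


def split_by_column(headers, hashtags, data, index, general_metadata):
    """Build one dataset per distinct value in the chosen column."""
    groups = defaultdict(list)
    for row in data:
        groups[row[index]].append(row)
    datasets = {}
    for value, rows in groups.items():
        # Start from the general metadata and add the split value to the title
        metadata = dict(general_metadata)
        metadata["title"] = f"{general_metadata['title']} - {value}"
        datasets[value] = {"metadata": metadata, "rows": [headers, hashtags] + rows}
    return datasets
```

For example, calling `split_by_column` with the index of the `#country` column and general metadata of `{"title": "Food prices"}` would produce one entry per country, each titled along the lines of "Food prices - AFG". The same call with the `#indicator` column index would split by indicator instead.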