This document is a collection of procedures and resources to guide the curation of Data Completeness instances (henceforth, Data Grids), which a sysadmin can activate for any location page on HDX. This document, and others linked from it, should evolve to capture best practices and any other useful information learned as the data grid curators do their work.
Once activated for a given location page, the Data Grid appears and uses a default recipe (based on tags) to fill the data grid. However, tags are seldom enough to accurately gauge whether a dataset meets the requirements of a given data grid. Curation, then, is the process of customizing a specific location's data grid so that the datasets included in it meet the defined requirements for each subcategory. That customization is done by editing the recipe YAML file (YAML is a format that is friendly to both humans and machines).
Resources
- Procedure document (this document)
- Data Completeness Definitions Document
- Quality Checklist (below)
- YAML editing examples (below)
- GitHub Repository
Process Overview
The basic curation process is outlined below:
Data Grid Instances to be Curated
There may be more on the feature server for testing purposes, but the ones listed below should be the only active ones on the production server.
Quality Checks Process
Each dataset that is a candidate for a data grid has to be evaluated to determine whether it fully meets the requirements to be included, partially meets them, or does not meet them at all. The outcome determines what actions have to be taken in the YAML file to include or exclude it, and what comments should be recorded so that users understand where the dataset falls short. Below the process diagram, you will find more details on each quality check.
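As a rough illustration of how answers to the quality checks might map onto the three outcomes, here is a minimal Python sketch. The `QualityChecks` fields, the `classify` function, and the mapping rules themselves are assumptions made for illustration only; the process diagram and the Data Completeness Definitions Document remain the authority on the actual decision path.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    FULL = "fully meets the requirements"
    PARTIAL = "partially meets the requirements"
    NOT_MET = "does not meet the requirements"


@dataclass
class QualityChecks:
    # Hypothetical field names, one per quality check question.
    required_info_share: float     # share of required information present, 0.0 to 1.0
    acceptable_format: bool        # resource is in an acceptable format
    geographically_complete: bool  # complete, or "as complete as possible"
    location_refs_usable: bool     # location references explicit or joinable


def classify(checks: QualityChecks) -> Outcome:
    """Map check answers to an outcome (assumed rules, not the official ones)."""
    # Assumption: no required information or an unusable format rules the dataset out.
    if checks.required_info_share <= 0 or not checks.acceptable_format:
        return Outcome.NOT_MET
    # Assumption: every check must pass for the dataset to count as fully meeting.
    if (
        checks.required_info_share >= 1
        and checks.geographically_complete
        and checks.location_refs_usable
    ):
        return Outcome.FULL
    # Anything in between is treated as a partial match that needs a comment.
    return Outcome.PARTIAL


def shortfalls(checks: QualityChecks) -> list:
    """List the failed checks, as raw material for a curator's comment."""
    notes = []
    if checks.required_info_share < 1:
        notes.append("missing some required information")
    if not checks.acceptable_format:
        notes.append("format not acceptable")
    if not checks.geographically_complete:
        notes.append("not geographically complete")
    if not checks.location_refs_usable:
        notes.append("location references not explicit or joinable")
    return notes
```

A curator could call `classify(QualityChecks(0.5, True, True, True))` to get a partial outcome and use `shortfalls(...)` to draft the accompanying comment. Keeping the outcome and the list of shortfalls separate mirrors the split between the include/exclude action and the comments recorded for users.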
Details on each quality check
How much of the required information specified in the definition does the resource contain?
Is the resource in an acceptable format?
Is the resource geographically complete or “as complete as possible”?
Are location references explicit in the resource or joinable to an available location reference that also appears in the data grid?
For example, does a listing of health facilities explicitly claim to be complete or as complete as possible (then yes), or does it have declared gaps noted in the metadata, or is its completeness ambiguous (then no)?
For an indicator broken down by admin units, are all admin units represented (then yes), or if not, is the meaning of their absence explicitly explained in the metadata (then no