Introduction
...
Four tables are presented below. The first covers quality issues common to both community-generated and curated datasets, the second covers issues unique to community-generated datasets, the third covers issues unique to curated datasets, and the fourth lists the potential data quality issues that are checked as part of the HDX Quality Assurance Framework (QAF).
A mindmap graphic visualizes these issues:
...
Table 1. Common Dataset Quality Assurance Procedures
ID | Potential Quality Issue | When to check | How to check | Corrective action |
101 | Unrelated item in related items tab | When a related item is added | Using a script, identify datasets with a related item and evaluate the related item for relevance | Delete related item; engage responsible user |
102 | Dataset metadata missing or incomplete | When dataset is made public; when dataset is revised; during routine data QA process | Manual or scripted evaluation of the dataset | Engage org admin |
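The scripted evaluation mentioned for issue 102 could look roughly like the sketch below. It assumes each dataset is available as a plain dictionary of metadata; the field names in `REQUIRED_FIELDS` and the function name `missing_metadata` are hypothetical placeholders, since the list of required metadata fields is not given here.

```python
from typing import Any, Dict, List, Sequence

# Hypothetical list of required metadata fields (an assumption for
# illustration; substitute the fields your QA process actually requires).
REQUIRED_FIELDS = ("title", "notes", "dataset_source", "license_id", "methodology")


def missing_metadata(
    dataset: Dict[str, Any], required: Sequence[str] = REQUIRED_FIELDS
) -> List[str]:
    """Issue 102: return the required fields that are absent or empty."""
    missing = []
    for field in required:
        value = dataset.get(field)
        if value is None:
            missing.append(field)
        elif isinstance(value, str) and not value.strip():
            # Blank or whitespace-only strings count as missing.
            missing.append(field)
        elif isinstance(value, (list, dict)) and not value:
            # Empty containers count as missing too.
            missing.append(field)
    return missing
```

A non-empty result would flag the dataset for follow-up with the org admin; an empty list means every field in the assumed list is filled in.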
ID | Potential Quality Issue | When to check | How to check | Corrective action |
201 | Dataset has no resources (files or links) | When dataset is made public; during routine data QA process | Manual or scripted count of the number of resources in the dataset | Make dataset private; engage org admin |
202 | Dataset has a broken resource link | When dataset is made public; during routine data QA process | Manual or scripted check of resource link | Engage org admin; check for issue 201 |
203 | Dataset contains no relevant humanitarian data | When dataset is made public | Manual evaluation of the data | Make dataset private; engage org admin |
204 | Dataset contains test data | When dataset is made public | Manual evaluation of the data | Make dataset private; engage org admin |
205 | Dataset contains sensitive data (PII, DII, CII) | When dataset is made public | Manual evaluation of the data | Make dataset private or remove dataset from platform; engage org admin; refer for management review |
206 | Dataset contains inappropriate or otherwise objectionable content | When dataset is made public | Manual evaluation of the data | Make dataset private; revoke user editing privileges; refer for management review; engage org admin |
207 | Dataset contains the COD tag | When dataset is made public | Manual or scripted check for the COD tag in the data | Remove the tag; engage the data provider |
208 | Dataset contains individual survey data | When dataset is made public | Manual evaluation of the data, with a special check for PII, DII or CII; look for a data dictionary | Make dataset private; engage the data provider |
209 | Dataset contains a PDF resource that is not considered metadata | When dataset is made public | Manual or scripted check for PDF document | Remove PDF, or make dataset private if the dataset is no longer viable; engage data provider |
210 | Dataset source is nonexistent or unclear | When dataset is made public | Manual evaluation of the dataset | Make dataset private; engage data provider |
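Several of the checks above are described as scriptable (201, 207 and 209). The following is a minimal sketch of how they might be combined, assuming each dataset is a dictionary with a `resources` list and a `tags` list, loosely modelled on a CKAN-style record. The field names, the tag spelling `cod`, and all function names are assumptions for illustration, not a description of the actual HDX tooling.

```python
from typing import Any, Dict, List

# Assumed record shape: {"resources": [{"name", "format", "url"}, ...],
#                        "tags": [{"name": ...}, ...]}
Dataset = Dict[str, Any]


def has_no_resources(dataset: Dataset) -> bool:
    """Issue 201: the dataset has no files or links attached."""
    return len(dataset.get("resources") or []) == 0


def has_cod_tag(dataset: Dataset, cod_tag: str = "cod") -> bool:
    """Issue 207: the dataset carries the COD tag (tag spelling is assumed)."""
    tags = dataset.get("tags") or []
    return any(str(tag.get("name", "")).strip().lower() == cod_tag for tag in tags)


def pdf_resources(dataset: Dataset) -> List[str]:
    """Issue 209: names of resources that look like PDFs.

    Whether a PDF counts as metadata still needs a manual decision.
    """
    found = []
    for resource in dataset.get("resources") or []:
        fmt = str(resource.get("format", "")).strip().lower()
        url = str(resource.get("url", "")).strip().lower()
        if fmt == "pdf" or url.endswith(".pdf"):
            found.append(resource.get("name", "<unnamed>"))
    return found


def scripted_flags(dataset: Dataset) -> List[int]:
    """Return the IDs of the scripted issues that this dataset triggers."""
    flags = []
    if has_no_resources(dataset):
        flags.append(201)
    if has_cod_tag(dataset):
        flags.append(207)
    if pdf_resources(dataset):
        flags.append(209)
    return flags
```

The script only raises flags. The corrective actions in the table, such as making a dataset private or engaging the data provider, remain manual decisions.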
...