Introduction

...

Four tables are presented below. The first covers quality issues common to both community-generated and curated datasets, the second covers issues unique to community-generated datasets, the third covers issues unique to curated datasets, and the fourth lists the potential data quality issues that are checked as part of the HDX Quality Assurance Framework (QAF).

A mindmap graphic visualizes these issues:

Table 1. Common Dataset Quality Assurance

...

Procedures 


| ID | Potential Quality Issue | When to check | How to check | Corrective action |
| --- | --- | --- | --- | --- |
| 101 | Unrelated item in related items tab | When a related item is added | Using a script, identify datasets with a related item and evaluate the related item for relevance | Delete related item; engage responsible user |
| 102 | Dataset metadata missing or incomplete | When dataset is made public; when dataset is revised; during routine data QA process | Manual or scripted evaluation of the dataset | Engage org admin |
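The scripted evaluation for issue 102 could look something like the sketch below. It assumes each dataset is available as a Python dictionary of metadata; the list of required field names is purely illustrative and is not taken from this page.

```python
# Hypothetical list of required metadata fields; adjust to the real schema.
REQUIRED_FIELDS = ("title", "notes", "dataset_source", "license_id", "methodology")


def missing_metadata(dataset, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank in a dataset dict."""
    missing = []
    for field in required:
        value = dataset.get(field)
        is_blank_text = isinstance(value, str) and not value.strip()
        if value is None or is_blank_text or value == []:
            missing.append(field)
    return missing


if __name__ == "__main__":
    example = {"title": "Example dataset", "notes": "   "}
    # Any non-empty result would trigger the corrective action for issue 102.
    print(missing_metadata(example))
```

A non-empty result means the dataset would be referred to the org admin, as described in the corrective action for 102.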

...



| ID | Potential Quality Issue | When to check | How to check | Corrective action |
| --- | --- | --- | --- | --- |
| 201 | Dataset has no resources (files or links) | When dataset is made public; during routine data QA process | Manual or scripted count of the number of resources in the dataset | Make dataset private; engage org admin |
| 202 | Dataset has a broken resource link | When dataset is made public; during routine data QA process | Manual or scripted check of resource link | Engage org admin; check for 201 |
| 203 | Dataset contains no relevant humanitarian data | When dataset is made public | Manual evaluation of the data | Make dataset private; engage org admin |
| 204 | Dataset contains test data | When dataset is made public | Manual evaluation of the data | Make dataset private; engage org admin |
| 205 | Dataset contains sensitive data (PII, DII, CII) | When dataset is made public | Manual evaluation of the data | Make dataset private or remove dataset from platform; engage org admin; refer for management review |
| 206 | Dataset contains inappropriate or otherwise objectionable content | When dataset is made public | Manual evaluation of the data | Make dataset private; revoke user editing privileges; refer for management review; engage org admin |
| 207 | Dataset contains the COD tag | When dataset is made public | Manual or scripted check for the COD tag in the data | Remove the tag; engage the data provider |
| 208 | Dataset contains individual survey data | When dataset is made public | Manual evaluation of the data, with a special check for PII, DII or CII; look for data dictionary | Make dataset private; engage the data provider |
| 209 | Dataset contains a PDF resource that is not considered metadata | When dataset is made public | Manual or scripted check for PDF document | Remove PDF, or make dataset private if dataset is no longer viable; engage data provider |
| 210 | Dataset source is nonexistent or unclear | When dataset is made public | Manual evaluation of the dataset | Make dataset private; engage data provider |
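Several of the checks above (201, 202, 207 and 209) are described as scriptable. The following is a minimal sketch of how they might be combined. It assumes a dataset is a dictionary with a `resources` list (each entry having `url` and `format`) and a `tags` list (each entry having `name`); that shape, the function names and the tag spelling `cod` are assumptions made for illustration.

```python
import urllib.request


def scan_dataset(dataset):
    """Return the IDs of the scriptable issues (201, 207, 209) found in a dataset dict."""
    issues = []
    resources = dataset.get("resources") or []
    if not resources:
        issues.append(201)  # no files or links at all

    tag_names = {(tag.get("name") or "").lower() for tag in dataset.get("tags") or []}
    if "cod" in tag_names:
        issues.append(207)  # COD tag present

    def looks_like_pdf(resource):
        fmt = (resource.get("format") or "").lower()
        url = (resource.get("url") or "").lower()
        return fmt == "pdf" or url.endswith(".pdf")

    # A script can only flag that a PDF exists; deciding whether it counts
    # as metadata still needs a manual look.
    if any(looks_like_pdf(resource) for resource in resources):
        issues.append(209)
    return issues


def link_is_broken(url, opener=urllib.request.urlopen, timeout=10):
    """Check 202: return True if the resource URL cannot be fetched."""
    try:
        with opener(url, timeout=timeout) as response:
            status = getattr(response, "status", None)
            return status is not None and status >= 400
    except (OSError, ValueError):
        # URLError and HTTPError are OSError subclasses; ValueError covers malformed URLs.
        return True
```

The `opener` parameter exists only so that the link check can be exercised without network access. In routine use, `link_is_broken(resource["url"])` would be called for each resource of a dataset that did not already fail check 201.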

Table 3. Quality Assurance Procedures (for Curated Datasets)

| ID | Potential Quality Issue | When to check | How to check | Corrective action |
| --- | --- | --- | --- | --- |
| 301 | No data in XLSX/CSV file of curated indicator-centric dataset | When the curated data for the affected indicator is known to have been updated; during routine data QA process | Manual or scripted check for data on each affected indicator-centric dataset | Make the dataset private; investigate data and take appropriate action |
| 302 | No data in XLSX/CSV file of curated country-centric dataset | When the curated data for any baseline/fts/rw etc. indicator is known to have been updated; during routine data QA process | Manual or scripted check for data on each affected country-centric dataset | Make the dataset private; investigate data and take appropriate action |
| 303 | All observations in XLSX/CSV file of curated indicator-centric or country-centric dataset are zero-valued (similar to no data) | When the curated data for any baseline/fts/rw etc. indicator is known to have been updated; during routine data QA process | Manual or scripted check for at least one non-zero observation on each affected indicator-centric or country-centric dataset | Make the dataset private; investigate data and take appropriate action |
| 304 | Missing indicator-centric dataset (may have been deleted or hidden) | When the curated data for any baseline/fts/rw etc. indicator is known to have been updated; during routine data QA process | Manual or scripted comparison of the list of indicators that have curated data with the list of indicator-centric datasets | Create the missing indicator-centric dataset, or make it public if the dataset was private |
| 305 | Missing country-centric dataset (may have been deleted or hidden) | When a curated data update is made that may have made data available for a country where it was not previously available | Manual or scripted comparison of the countries listed in the indicator-centric dataset data with the country-centric datasets for the baseline/fts/rw files | Create the missing country-centric dataset, or make it public if the dataset was private |
| 306 | Missing indicator in XLSX/CSV file of curated baseline/fts/rw country-centric dataset | When a new indicator (baseline/fts/rw etc.) is curated; during routine data QA process | Manual or scripted identification of the use of each indicator that has data in at least one country-centric baseline/rw/fts dataset | Refer to development team |
| 307 | Spelling error | When dataset is made public; when dataset is revised | Manual evaluation of the dataset | Correct identified spelling error |
| 308 | Missing sub-national data from country-centric dataset | During routine data QA process | Manual evaluation of the dataset | Refer to development team |
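As an illustration of the scripted checks for 301 to 304, the sketch below classifies the contents of a CSV export and compares two indicator lists. The column name `value`, the return labels and the function names are assumptions for the example and do not come from the actual export format.

```python
import csv
import io


def classify_observations(csv_text, value_column="value"):
    """Return 'no_data', 'all_zero' or 'ok' for the observations in a CSV export."""
    values = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            values.append(float(row.get(value_column) or ""))
        except ValueError:
            continue  # blank or non-numeric cells are not counted as observations
    if not values:
        return "no_data"      # issues 301 / 302
    if all(value == 0 for value in values):
        return "all_zero"     # issue 303
    return "ok"


def missing_datasets(curated_indicators, published_indicators):
    """Issue 304: indicators that have curated data but no indicator-centric dataset."""
    return sorted(set(curated_indicators) - set(published_indicators))


if __name__ == "__main__":
    print(classify_observations("indicator,value\nA,0\nB,0.0\n"))
    print(missing_datasets(["PSP010", "PSP080"], ["PSP010"]))  # indicator codes are made up
```

The same set-difference approach could be applied to country lists for issue 305.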

Table 4. Quality Assurance Procedures (HDX QAF)

| ID | Potential Quality Issue | When to check | How to check | Corrective action |
| --- | --- | --- | --- | --- |
| 401 | Dataset contains data that is out of range (affects accuracy) | Before importing the curated data | Automatic checking against the pre-defined range for the data series | Investigate data and take appropriate action |
| 402 | Dataset contains data that changes dramatically from period to period (affects accuracy) | Before importing the curated data | Check data for significant period-to-period changes | Investigate data and take appropriate action |
| 403 | Dataset contains data with an unexpected data type (affects accuracy) | Before importing the curated data | Automatic checking against the pre-defined data type for the data series | Investigate data and take appropriate action |
| 404 | Dataset contains data that participates in a part/whole relationship and the sum of the parts does not add up to the whole (affects accuracy) | Before importing the curated data | Automatic checking of indicators participating in a part/whole relationship by adding up the parts and comparing the sum to the whole | Investigate data and take appropriate action |
| 405 | Dataset contains data that is inconsistent among different sources (affects comparability) | During the curation decision-making process | Manual comparison of data series from different sources | Investigate data and take appropriate action |
| 406 | Dataset contains data that is sparse or incomplete (affects accuracy) | Before importing the curated data | Automatic checking of the number of observations in the data against the expected number of observations for the data series | Investigate data and take appropriate action |
| 407 | Dataset contains a data series that duplicates another data series (affects accuracy) | Before importing the curated data | Automatic checking for data series with correlations close to unity | Investigate data and take appropriate action |
| 408 | Dataset contains outdated data (affects timeliness) | During routine data QA process | Compare the dataset date of the observations in the data series with the latest data from the source | Update the data |
| 409 | Dataset contains data that is not relevant for humanitarians (affects relevance) | During the curation decision-making process | Check whether the indicator has been defined in the Common Humanitarian Dataset | Remove the data |
| 410 | Dataset contains data without metadata (affects interpretability) | During routine data QA process | Check that all predefined metadata fields are filled in | Set dataset to private until metadata is completed |
| 411 | Dataset contains data that is hard to get (affects accessibility) | During the curation decision-making process | Check whether the importing process is lengthy | Develop an API or automated process |
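The automatic checks for 401, 402, 404, 406 and 407 can each be expressed in a few lines. The sketch below is one possible reading of them, operating on plain lists of numbers. All thresholds (`max_ratio`, `tolerance`, `threshold`) and function names are placeholders chosen for the example, not values defined by the QAF.

```python
from math import sqrt


def out_of_range(values, low, high):
    """401: return the observations outside the pre-defined [low, high] range."""
    return [value for value in values if not low <= value <= high]


def dramatic_changes(values, max_ratio=0.5):
    """402: return (index, previous, current) where the relative change exceeds max_ratio."""
    flagged = []
    for i in range(1, len(values)):
        previous, current = values[i - 1], values[i]
        # A relative change is undefined when the previous value is zero, so skip it.
        if previous != 0 and abs(current - previous) / abs(previous) > max_ratio:
            flagged.append((i, previous, current))
    return flagged


def parts_match_whole(parts, whole, tolerance=0.01):
    """404: True when the parts sum to the whole within a relative tolerance."""
    return abs(sum(parts) - whole) <= tolerance * abs(whole)


def is_sparse(values, expected_count):
    """406: True when there are fewer observations than expected for the series."""
    return len(values) < expected_count


def pearson(xs, ys):
    """Pearson correlation, or None when it is undefined for the inputs."""
    if len(xs) != len(ys) or len(xs) < 2:
        return None
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    denominator = sqrt(sum(d * d for d in dx) * sum(d * d for d in dy))
    if denominator == 0:
        return None  # at least one series is constant
    return sum(a * b for a, b in zip(dx, dy)) / denominator


def likely_duplicates(series_a, series_b, threshold=0.999):
    """407: True when the two series have a correlation close to unity."""
    r = pearson(series_a, series_b)
    return r is not None and r >= threshold
```

A correlation close to unity also flags series that are scaled copies of one another, so a positive result from `likely_duplicates` is a prompt to investigate rather than proof of duplication.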