...
Details on each quality check
Is the dataset subnational?
This one is straightforward. National level statistics are by definition excluded from data grid. The extent to which it needs to be subnational is handled further down this list.
How much of the required information specified in the definition does the resource contain?
Look at the sub-category definition and determine if the definition is met fully, partially, or not at all. If a lack of clear field names and/or a data dictionary makes it hard to be sure, the dataset can be excluded or included as partially meeting the requirements.
Note: if complete coverage can be obtained by combining several datasets (for example: several different 3Ws, one for each cluster, with all clusters being covered), then all the datasets can be included but marked as "incomplete" with the same comment. The logic here is that someone should be combining these datasets.
Suggested comment language for a partial fit:
- The dataset contains data about A but does not include the expected information about B.
- The dataset appears to contain information about A but the units are not clear from the field name or metadata.
- The dataset contains data about all active clusters except Protection.
Is the resource in an acceptable format?
For tabular data, the dataset should be tidy in the sense that field names and data rows should be easy to determine. There shouldn't be subtotal rows interspersed with data rows. For a format like xls or xlsx, the required data for a single sub-category should be on the same tab, and if not, this should be noted in the comments. For tabular data with coordinates, the x and y values (usually longitude and latitude) should be in decimal degree format and separated into two columns, and if not, this should be noted in the comments.
For geographic data, the data should be a zipped shapefile, geojson, or geodatabase. Other somewhat common formats (kml, kmz, but not raster formats) could be accepted, with a comment.
Suggested comment language for a partial fit:
- The required fields are present, but not consolidated into a single tab.
- Non-data rows (sub-totals) are included within the data.
- The latitude and longitude coordinates are combined in a single field.
- The latitude and longitude coordinates are not presented in decimal degree format.
- The geopackage format is not a preferred format.
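The tabular checks above (no subtotal rows, coordinates in decimal degrees) can be roughly sketched in code. This is a minimal illustration, not an official tool: the field names `latitude` and `longitude`, the `SUBTOTAL_MARKERS` keywords, and the function names are all assumptions, and the subtotal detection is a heuristic that will need manual review.

```python
# Hypothetical keywords that often mark subtotal rows; adjust per dataset.
SUBTOTAL_MARKERS = ("total", "subtotal", "sub-total")


def is_decimal_degrees(lat, lon):
    """Return True if lat and lon parse as decimal degrees in valid ranges."""
    try:
        lat_f, lon_f = float(lat), float(lon)
    except (TypeError, ValueError):
        # Values such as "12°30'N" or a combined "12.5, 45.1" are not plain decimals.
        return False
    return -90 <= lat_f <= 90 and -180 <= lon_f <= 180


def looks_like_subtotal(row):
    """Heuristic: any cell whose text starts with a subtotal keyword."""
    return any(
        str(value).strip().lower().startswith(SUBTOTAL_MARKERS)
        for value in row.values()
    )


def format_comments(rows, lat_field="latitude", lon_field="longitude"):
    """Return suggested comments for a list of dict rows (one dict per row)."""
    comments = []
    data_rows = [row for row in rows if not looks_like_subtotal(row)]
    if len(data_rows) < len(rows):
        comments.append("Non-data rows (sub-totals) are included within the data.")
    if any(
        not is_decimal_degrees(row.get(lat_field), row.get(lon_field))
        for row in data_rows
    ):
        comments.append(
            "The latitude and longitude coordinates are not presented "
            "in decimal degree format."
        )
    return comments
```

An empty list returned by `format_comments` only means that these two heuristics found nothing; it does not replace looking at the file.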
Is the resource geographically complete or “as complete as possible”?
This one is trickier than it sounds. There are two ways to assess completeness:
- If there is a comprehensive list of locations to compare against (such as admin units), does the resource provide the necessary information for all admin units at whatever levels are being covered? Missing values for one or more admin divisions can be acceptable if the meaning of a missing value is defined somewhere in the metadata. For example, "districts where cholera has never been recorded are assumed to be 0 and not included in the data". Here "missing" is defined as having the value 0, so the data is complete. However, something like "districts not included in the data are still being evaluated" would indicate no value for the missing districts. Such a dataset should be included as "incomplete", with the gap noted as a comment.
- If there is not a comprehensive list of locations to compare against (such as a list of health facilities or security incidents), does the dataset claim to be complete, or at least "as complete as possible"? If so, the dataset can be included with no comment. However, if no such claim is made, or significant caveats are given about incompleteness, then the dataset should be included as "incomplete" with a comment. For example, any OSM extract will almost certainly be considered "incomplete" for this test, since OSM does not usually make claims about completeness (though there could be exceptions for some locations where concerted efforts have been made by groups like HOT).
Suggested comment language for a partial fit:
- The data does not appear to cover all admin X units and is therefore assumed to be incomplete.
- Some "no data" values occur, but the meaning of these values is not defined in the metadata.
- It is not clear from metadata if this dataset attempts comprehensive coverage and is therefore assumed to be incomplete.
- This dataset is not considered complete by its contributor.
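For the first case, where a comprehensive list of admin units exists, the comparison amounts to a set difference between the reference list and the units found in the resource. The sketch below assumes both are available as lists of p-codes; the function names, the `missing_defined_in_metadata` flag (set by the reviewer after reading the metadata), and the placeholder codes in the usage note are hypothetical.

```python
def missing_admin_units(reference_pcodes, dataset_pcodes):
    """Return the reference p-codes that do not appear in the dataset, sorted."""
    return sorted(set(reference_pcodes) - set(dataset_pcodes))


def completeness_comment(
    reference_pcodes,
    dataset_pcodes,
    admin_level,
    missing_defined_in_metadata=False,
):
    """Return a suggested comment, or None if the dataset counts as complete.

    missing_defined_in_metadata is a manual judgment: True when the metadata
    explains what an absent admin unit means (for example, "assumed to be 0").
    """
    missing = missing_admin_units(reference_pcodes, dataset_pcodes)
    if not missing or missing_defined_in_metadata:
        return None
    return (
        f"The data does not appear to cover all admin {admin_level} units "
        "and is therefore assumed to be incomplete."
    )
```

For example, with placeholder codes `["XX01", "XX02", "XX03"]` as the reference and `["XX01", "XX03"]` in the dataset, `missing_admin_units` returns `["XX02"]`, and a comment is suggested unless the metadata defines what the gap means.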
Are location references explicit in the resource or joinable to an available location reference that also appears in the data grid?
For example, does a listing of health facilities explicitly claim to try to be complete or as complete as possible (then yes), or does it have declared gaps noted in the metadata or is its completeness ambiguous (then no)?
For an indicator broken down by admin units, are all admin units represented (then yes), or if not, is the meaning of their absence explicitly explained in the metadata (then no ...)
If a dataset contains references to location, are those locations defined in the dataset (such as latitude and longitude columns)? If not, then do p-codes or some other identifier make it possible to join this dataset to a location reference that is available in data grid (such as a COD admin boundary or a facilities list)? Datasets with partially successful joins should be included in data grid as incomplete with a comment.
Suggested comment language for a partial fit:
- The data contains references to admin level X, however not all rows successfully join to the available dataset for that admin level.
- The data contains the names of individual locations, but no corresponding dataset defining those locations is available.
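A join check of this kind can be sketched as follows. It assumes the resource has been read into a list of dict rows and that the location reference (for example a COD admin boundary) supplies a list of p-codes. The field name `pcode`, the function name, and the whitespace and case normalization are assumptions for illustration, not a prescribed method.

```python
def join_report(rows, reference_pcodes, pcode_field="pcode"):
    """Count how many rows join to the reference list by p-code.

    Codes are compared after stripping whitespace and upper-casing, which is
    an assumed normalization; drop it if exact matching is required.
    """
    reference = {str(code).strip().upper() for code in reference_pcodes}
    unmatched = [
        row
        for row in rows
        if str(row.get(pcode_field, "")).strip().upper() not in reference
    ]
    return {
        "total": len(rows),
        "matched": len(rows) - len(unmatched),
        "unmatched": unmatched,
    }
```

If `matched` is lower than `total` but greater than zero, the join is partially successful, and the dataset would be included as incomplete with a comment along the lines suggested above.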
If disaggregated by administrative division, does it use the lowest-used level?
If most of the data for the country uses admin level 3, but the dataset in question is only disaggregated to admin level 1, the dataset can be included as "incomplete" with a comment.
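One way to make "most of the data" concrete is to take the most common admin level among the country's other datasets and compare the dataset in question against it. The sketch below does this under stated assumptions: "most" is read as the most frequent level, ties are resolved toward the more detailed level, the list of levels is non-empty, and the function names are hypothetical.

```python
from collections import Counter


def prevailing_admin_level(levels):
    """Return the most frequently used admin level from a non-empty list.

    On a tie, the higher number (more detailed level) is chosen; this
    tie-break is an assumption, not part of the guidance.
    """
    counts = Counter(levels)
    top_count = max(counts.values())
    return max(level for level, n in counts.items() if n == top_count)


def is_coarser_than_prevailing(dataset_level, other_levels):
    """True if the dataset is disaggregated less finely than the prevailing level."""
    return dataset_level < prevailing_admin_level(other_levels)
```

With other datasets at levels `[3, 3, 2, 3, 1]`, the prevailing level is 3, so a dataset at admin level 1 would be flagged for inclusion as "incomplete" with a comment.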