HDX Resource Date Coverages, Timelines and Fixed URLs

Consider how a timeline and resource date coverages would work for the following datasets (taken from HDX Enhancements):

  1. Dataset containing data in xlsx and csv formats as separate resources eg. https://data.humdata.org/dataset/afghan-voluntary-repatriation
    We need to encourage the removal of 2019 from the dataset title. How? 
    The 2 resources would have the same coverage date range (2019).
    The latest url would work, as the 2 resources have different formats, so:
    https://data.humdata.org/dataset/afghan-voluntary-repatriation/latest/download/data.xlsx
    https://data.humdata.org/dataset/afghan-voluntary-repatriation/latest/download/data.csv
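The reasoning that latest works "because there are 2 formats" can be shown with a minimal sketch. It assumes, hypothetically, that a latest url picks the resource whose format matches the requested file extension and gives up when the match is ambiguous. `Resource` and `resolve_latest` are illustrative names and are not part of the HDX or CKAN API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    # Hypothetical minimal resource record: title and file format only.
    title: str
    format: str


def resolve_latest(resources: List[Resource], filename: str) -> Optional[Resource]:
    """Return the only resource whose format matches the requested file extension.

    Returns None if nothing matches or if more than one resource shares the format.
    """
    extension = filename.rsplit(".", 1)[-1].lower()
    matches = [r for r in resources if r.format.lower() == extension]
    return matches[0] if len(matches) == 1 else None


resources = [Resource("Afghan voluntary repatriation", "XLSX"),
             Resource("Afghan voluntary repatriation", "CSV")]
print(resolve_latest(resources, "data.xlsx").format)  # XLSX
print(resolve_latest(resources, "data.csv").format)   # CSV
```

With two resources sharing the JSON format, as in case 6, this rule would return None, which is the ambiguity described there.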

  2. Dataset with rolling updates of a resource (ie. the dataset end date should be DATE) eg. https://data.humdata.org/dataset/inso-key-data-dashboard and https://data.humdata.org/dataset/indonesia-monthly-humanitarian-update
    The resource will have a coverage from the start date to an end date of DATE
    The timeline would reflect the rolling end date.
    The latest url would work ok
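A rolling end date could be stored as an open end and resolved only when the timeline is drawn. The sketch below assumes that encoding, with `None` standing in for DATE; `Coverage` and `effective_end` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Coverage:
    start: date
    # None stands in for the rolling end date DATE (hypothetical encoding).
    end: Optional[date] = None

    def effective_end(self, today: date) -> date:
        """Return the fixed end date, or today when the end is rolling."""
        return self.end if self.end is not None else today


rolling = Coverage(start=date(2018, 1, 1))
print(rolling.effective_end(today=date(2019, 6, 30)))  # 2019-06-30
```

Passing `today` in explicitly, rather than calling `date.today()` inside, keeps the resolution easy to test.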

  3. Dataset with metadata in resource eg. https://data.humdata.org/dataset/global-airports
    Two of the resources will have a coverage from the start date to an end date of DATE as well as a latest url. 
    The timeline would reflect the rolling end date.
    One resource (metadata) won't have coverage and won't have a latest url.
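The dataset-level timeline for this case could be derived from the resources that have coverage, skipping the metadata resource. A sketch, assuming each resource carries an optional (start, end) pair with any rolling end already resolved; `dataset_span`, the resource titles and the dates are made up for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

Span = Tuple[date, date]


@dataclass
class Resource:
    title: str
    # (start, end) coverage; None for resources such as a metadata file.
    coverage: Optional[Span] = None


def dataset_span(resources: List[Resource]) -> Optional[Span]:
    """Earliest start and latest end over the resources that have coverage."""
    spans = [r.coverage for r in resources if r.coverage is not None]
    if not spans:
        return None
    return min(start for start, _ in spans), max(end for _, end in spans)


resources = [
    Resource("data (csv)", (date(2015, 1, 1), date(2019, 6, 30))),
    Resource("data (xlsx)", (date(2015, 1, 1), date(2019, 6, 30))),
    Resource("metadata"),  # no coverage, so it is ignored
]
print(dataset_span(resources))
```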

  4. Dataset with tiff in a zip: https://data.humdata.org/dataset/malawi_national_vulnerability_index_2015 (note the 2015 in the url is incorrect as it is current)
    This resource updates each year (it is overwritten). Do we:
    1. Rely on the contributor remembering to update the coverage end date when they update the resource each year (which doesn't work well at the moment)
    2. On overwriting a resource, prompt contributors to consider the date coverage period (however this won't help for remote urls that get updated)
    3. Have a value of DATE that is updated automatically but which, due to the annual nature of the data, means "up until the end of the previous year"
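Option 3 amounts to resolving DATE differently depending on how often the data is updated. A sketch of that rule follows; the frequency strings and `rolling_end` are hypothetical, not existing HDX fields or functions.

```python
from datetime import date, timedelta


def rolling_end(today: date, frequency: str) -> date:
    """Resolve DATE for data with a given (hypothetical) update frequency."""
    if frequency == "annual":
        # Data covers up until the end of the previous year.
        return date(today.year - 1, 12, 31)
    if frequency == "monthly":
        # Data covers up until the last day of the previous month.
        return today.replace(day=1) - timedelta(days=1)
    # Otherwise treat the data as current up to today.
    return today


print(rolling_end(date(2019, 3, 15), "annual"))  # 2018-12-31
```

For an annual dataset viewed on 15 March 2019, DATE would therefore resolve to 31 December 2018.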

  5. Dataset with pdfs, zips (on OneDrive and filestore), mbtiles, tiff: https://data.humdata.org/dataset/iom-npm-cox-bazar-uav-imagery
    There are many resources in this dataset.
    Many have dates in filenames - should we discourage this somehow (same problem at dataset level)?
    If the filename gets auto-populated, it may well include a date. Do we already need to consider changing the filename to the resource title?
    As many are maps, the coverage date is more like a date of validity as it is a single day.
    There will be many single day coverage periods in the timeline - how to make it easy to understand? 
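One possible way to keep many single-day coverage periods readable is to bucket them by month and show a count per bucket on the timeline. This is only a sketch of that idea; `group_single_days_by_month` and the sample dates are illustrative.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable


def group_single_days_by_month(days: Iterable[date]) -> Dict[str, int]:
    """Count single-day coverage periods per calendar month (key 'YYYY-MM')."""
    counts = Counter(d.strftime("%Y-%m") for d in days)
    return dict(sorted(counts.items()))


days = [date(2018, 3, 2), date(2018, 3, 9), date(2018, 4, 1)]
print(group_single_days_by_month(days))  # {'2018-03': 2, '2018-04': 1}
```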

  6. Dataset with JSON feed, HXLated JSON feed and xlsx (from automated output): https://data.humdata.org/dataset/migrant-deaths-by-month
    Resource coverage periods will be 2017. There should be new resources for 2018 and 2019.
    Latest url won't work as there are 2 JSON files.
    If we change the filename to the resource title, there is a greater chance of consistent naming, enabling latest to differentiate between resources.
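Deriving the filename from the resource title could look roughly like this. The slug rule (lowercase, with each run of other characters replaced by a hyphen) is an assumption for illustration, as are `filename_from_title` and the example titles.

```python
import re


def filename_from_title(title: str, extension: str) -> str:
    """Derive a stable filename from a resource title (hypothetical naming rule)."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{slug}.{extension.lower()}"


print(filename_from_title("Migrant deaths by month", "JSON"))
print(filename_from_title("Migrant deaths by month (HXL)", "JSON"))
```

Because the plain and HXLated JSON feeds would have different titles, they would get different filenames, giving latest something to tell them apart by.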

  7. Disaggregate by country into datasets and by indicator into resources eg. https://data.humdata.org/dataset/who-data-for-barbados
    Issue same as 4. Latest url is not relevant for this dataset.

  8. Disaggregate by date into datasets eg. https://data.humdata.org/dataset/syria-idp-flow-and-returnee-data-october-2018 and https://data.humdata.org/dataset/syria-idp-flow-and-returnee-data-september-2018
    These should now be in one dataset with new resources for each month (which is the date coverage). 
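When merging monthly datasets into one, each new resource's coverage would be the whole month. A small helper for that, using the standard library's `calendar.monthrange`; `month_coverage` is a hypothetical name.

```python
import calendar
from datetime import date
from typing import Tuple


def month_coverage(year: int, month: int) -> Tuple[date, date]:
    """Return the first and last day of a month, used as a resource's coverage period."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)


start, end = month_coverage(2018, 10)
print(start, end)  # 2018-10-01 2018-10-31
```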

  9. Disaggregate by date into resources within one dataset eg. https://data.humdata.org/dataset/nigeria-humanitarian-needs-overview
    Instead of dates in the filename, there will be coverage dates.
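Migrating existing resources could start by pulling a date out of the filename. The sketch assumes filenames contain dates written as YYYY-MM-DD, YYYY_MM_DD or YYYYMMDD; the pattern, `date_from_filename` and the example filename are hypothetical, and any real migration would still need checking by hand.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical conventions: YYYY-MM-DD, YYYY_MM_DD or YYYYMMDD somewhere in the name.
DATE_IN_NAME = re.compile(r"(\d{4})[-_]?(\d{2})[-_]?(\d{2})")


def date_from_filename(filename: str) -> Optional[date]:
    """Return the first valid calendar date found in a filename, or None."""
    for match in DATE_IN_NAME.finditer(filename):
        try:
            return date(*(int(g) for g in match.groups()))
        except ValueError:
            # Digits that look like a date but are not one, such as month 13.
            continue
    return None


print(date_from_filename("hno_2018-12-05.xlsx"))  # 2018-12-05
```

A filename with only a year, such as 2017, would need a separate rule mapping it to the whole year.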

  10. Disaggregate by indicator into datasets eg. https://data.humdata.org/dataset/gender-development-index-female-to-male-ratio-of-hdi
    Coverage date is 2013

  11. Disaggregate by country into datasets and by date and region into resources eg. https://data.humdata.org/dataset/drc-displacement-data-baseline-assessment-iom-dtm
    There are dates in the resource filenames which would become coverage dates.
    Latest url would correspond to a region rather than being the latest for all regions. Until we can group by region, nothing much can be done.
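If resources could carry a grouping field such as region, latest could be resolved per group instead of across the whole dataset. A sketch under that assumption; `region` is a hypothetical field, not an existing HDX resource field, and the titles and dates are made up.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class Resource:
    title: str
    region: str  # hypothetical grouping field
    coverage_end: date


def latest_per_region(resources: List[Resource]) -> Dict[str, Resource]:
    """For each region, keep the resource whose coverage ends last."""
    latest: Dict[str, Resource] = {}
    for resource in resources:
        current = latest.get(resource.region)
        if current is None or resource.coverage_end > current.coverage_end:
            latest[resource.region] = resource
    return latest


resources = [
    Resource("Region A, 2018-06", "A", date(2018, 6, 30)),
    Resource("Region A, 2018-09", "A", date(2018, 9, 30)),
    Resource("Region B, 2018-07", "B", date(2018, 7, 31)),
]
for region, resource in latest_per_region(resources).items():
    print(region, resource.title)
```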

  12. Disaggregate by country into datasets and by round into resources eg. https://data.humdata.org/dataset/nigeria-baseline-data-iom-dtm
    Each round corresponds to a date coverage period.
    Latest url is latest round.

  13. Disaggregate by country and emergency into datasets and by round into resources eg. https://data.humdata.org/dataset/indonesia-displacement-data-sulawesi-earthquake-site-assessment-iom-dtm
    No problems with this one

  14. Map data for a country at different admin levels for various dates eg. https://data.humdata.org/dataset/administrative-boundaries-of-bangladesh-as-of-2015 (note the 2015 in the url is incorrect as it is current)
    Issues same as 5

  15. Map and population data for a country with varying file formats and metadata in a pdf eg. https://data.humdata.org/dataset/bhutan-administrative-level-0-1-population-statistics
    Date should be removed from filename.
    Latest url should work because of different file types.

  16. National and subnational data per set of indicators per country eg. https://feature-data.humdata.org/dataset/dhs-data-for-democratic-republic-of-the-congo
    As with other scraper-made datasets, the scraper will need updating to try to make coverage dates per resource rather than calculating them per dataset.
    Latest url will be a problem for this dataset.
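For scraper-made datasets, the change amounts to computing coverage from each resource's own rows rather than from all rows in the dataset. A sketch, assuming the scraper can collect the dates of each resource's rows; `coverage_per_resource`, the resource names and the dates are made up.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple


def coverage_per_resource(
    dates_by_resource: Dict[str, List[date]],
) -> Dict[str, Optional[Tuple[date, date]]]:
    """Compute each resource's coverage from the dates in its own rows only."""
    coverage: Dict[str, Optional[Tuple[date, date]]] = {}
    for title, dates in dates_by_resource.items():
        coverage[title] = (min(dates), max(dates)) if dates else None
    return coverage


dates_by_resource = {
    "national": [date(2007, 1, 1), date(2013, 12, 31)],
    "subnational": [date(2013, 1, 1), date(2013, 12, 31)],
}
print(coverage_per_resource(dates_by_resource))
```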

