Important fields
Field | Description | Purpose |
---|---|---|
data_update_frequency | Dataset suggested update frequency | Shows how often the data is expected to be updated or at least checked to see if it needs updating |
revision_last_updated | Resource last modified date | Indicates the last time the resource was updated, irrespective of whether it was a major or minor change |
dataset_date | Dataset date | The date referred to by the data in the dataset. It changes when data for a new date comes to HDX, so it may not need to change for minor updates |
Approach
- Determine the scope of our problem by calculating how many datasets are locally and externally hosted. Ideally, HDX itself can be used to calculate these numbers.
- Collect the frequency of updates, based on the interns' work?
- Define the age of datasets by calculating: Today's date - last modified date
- Compare age with frequency and define the logic: how do we define an outdated dataset?
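The age calculation in the third step could be sketched as below. This is a minimal illustration, not HDX code: the function name `dataset_age_in_days` is hypothetical, and it assumes the last modified date is available as an ISO 8601 string that is in UTC when no timezone is given.

```python
from datetime import datetime, timezone


def dataset_age_in_days(last_modified, today=None):
    """Return the age in whole days: today's date minus the last modified date.

    last_modified is assumed to be an ISO 8601 string; a value without a
    timezone is treated as UTC (an assumption, not something HDX guarantees).
    """
    modified = datetime.fromisoformat(last_modified)
    if modified.tzinfo is None:
        modified = modified.replace(tzinfo=timezone.utc)
    if today is None:
        today = datetime.now(timezone.utc)
    # timedelta.days drops any partial day, so 13.9 days counts as 13
    return (today - modified).days
```

Passing `today` explicitly keeps the calculation reproducible, which helps when the same run has to be compared across many datasets.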
Determining if a Resource is Updated
Number of Files Locally and Externally Hosted
Type | Number of Resources | Percentage | Example |
---|---|---|---|
File Store | 2,102 | 22% | |
CPS | 2,459 | 26% | |
HXL Proxy | 2,584 | 27% | |
ScraperWiki | 162 | 2% | |
Others | 2,261 | 24% | |
Total | 9,568 | 100% | |
Classifying the Age of Datasets
The classification of dataset age has been considered previously. A review of that work shows that the statuses used (up to date, due, overdue and delinquent) and the formulae for determining those statuses are sound. Hence, using that work, and writing f for the update frequency in days, we have:
Dataset age state thresholds (how old a dataset must be for it to have each status):
Update Frequency | Up-to-date | Due | Overdue | Delinquent |
---|---|---|---|---|
Daily | 0 days old | 1 day old due_age = f | 2 days old overdue_age = f + 1 | 3 days old delinquent_age = f + 2 |
Weekly | 0 - 6 days old | 7 days old due_age = f | 14 days old overdue_age = f + 7 | 21 days old delinquent_age = f + 14 |
Fortnightly | 0 - 13 days old | 14 days old due_age = f | 21 days old overdue_age = f + 7 | 28 days old delinquent_age = f + 14 |
Monthly | 0 - 29 days old | 30 days old due_age = f | 44 days old overdue_age = f + 14 | 60 days old delinquent_age = f + 30 |
Quarterly | 0 - 89 days old | 90 days old due_age = f | 120 days old overdue_age = f + 30 | 150 days old delinquent_age = f + 60 |
Semiannually | 0 - 179 days old | 180 days old due_age = f | 210 days old overdue_age = f + 30 | 240 days old delinquent_age = f + 60 |
Annually | 0 - 364 days old | 365 days old due_age = f | 425 days old overdue_age = f + 60 | 455 days old delinquent_age = f + 90 |
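One way to apply the table is a simple lookup, sketched below. This is an illustration only: the names `THRESHOLDS` and `classify_age` are hypothetical, and the threshold values are copied from the "days old" figures in the table rather than from any existing HDX code.

```python
# (due, overdue, delinquent) ages in days for each update frequency,
# taken from the "days old" figures in the table above
THRESHOLDS = {
    "Daily": (1, 2, 3),
    "Weekly": (7, 14, 21),
    "Fortnightly": (14, 21, 28),
    "Monthly": (30, 44, 60),
    "Quarterly": (90, 120, 150),
    "Semiannually": (180, 210, 240),
    "Annually": (365, 425, 455),
}


def classify_age(update_frequency, age_in_days):
    """Return the status of a dataset given its update frequency and age."""
    due, overdue, delinquent = THRESHOLDS[update_frequency]
    # Check the oldest state first so each age maps to exactly one status
    if age_in_days >= delinquent:
        return "delinquent"
    if age_in_days >= overdue:
        return "overdue"
    if age_in_days >= due:
        return "due"
    return "up to date"
```

For example, `classify_age("Weekly", 10)` gives `"due"`, because 10 days is past the 7-day due threshold but short of the 14-day overdue threshold.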
Thoughts
...