Important fields
Field | Description | Purpose |
---|---|---|
data_update_frequency | Dataset suggested update frequency | Shows how often the data is expected to be updated or at least checked to see if it needs updating |
revision_last_updated | Resource last modified date | Indicates the last time the resource was updated, irrespective of whether it was a major or minor change |
dataset_date | Dataset date | The date referred to by the data in the dataset. It changes when data for a new date arrives on HDX, so it may not need to change for minor updates |
Approach
- Determine the scope of our problem by calculating how many datasets are locally and externally hosted. Ideally, we can use HDX itself to calculate this number.
- Collect the frequency of updates, possibly based on the interns' work?
- Define the age of datasets by calculating: Today's date - last modified date
- Compare age with frequency and define the logic: how do we define an outdated dataset?
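The age calculation in the approach above can be sketched as follows. This is a minimal illustration only: the function name `dataset_age_days` and the sample dates are hypothetical, and it assumes age is counted in whole calendar days.

```python
from datetime import date, datetime


def dataset_age_days(last_modified: datetime, today: date) -> int:
    """Age in whole days: today's date minus the last modified date."""
    # Assumption: only the calendar date matters, so the time of day is dropped.
    return (today - last_modified.date()).days


# Hypothetical sample values, for illustration only.
last_modified = datetime(2017, 1, 10, 15, 30)
today = date(2017, 1, 24)
age = dataset_age_days(last_modified, today)
print(age)  # 14
```

The resulting age is what would later be compared with the dataset's update frequency.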
Determining if a Resource is Updated
Number of Files Locally and Externally Hosted
Type | Number of Resources | Percentage | Example |
---|---|---|---|
File Store | 2,102 | 22% | |
CPS | 2,459 | 26% | |
HXL Proxy | 2,584 | 27% | |
ScraperWiki | 162 | 2% | |
Others | 2,261 | 24% | |
Total | 9,568 | 100% | |
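As a quick check of the figures in the table, the percentages can be recomputed from the resource counts. The counts are copied from the table; the variable names are illustrative only.

```python
# Resource counts per hosting type, copied from the table above.
counts = {
    "File Store": 2102,
    "CPS": 2459,
    "HXL Proxy": 2584,
    "ScraperWiki": 162,
    "Others": 2261,
}

total = sum(counts.values())

# Share of each hosting type, rounded to the nearest whole percent.
shares = {host: round(100 * n / total) for host, n in counts.items()}

for host, share in shares.items():
    print(f"{host}: {counts[host]:,} ({share}%)")
print(f"Total: {total:,}")
```

Because each share is rounded independently, the rounded percentages add up to 101 rather than exactly 100.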
Classifying the Age of Datasets
Thought has previously gone into classifying the age of datasets. Reviewing that work, the statuses used (up to date, due, overdue and delinquent) and the formulae for determining those statuses are sound, so we will build on that foundation.
Dataset age state thresholds (how old a dataset must be to have each status), where f is the update frequency in days:
Update Frequency | Up-to-date | Due | Overdue | Delinquent |
---|---|---|---|---|
Daily | 0 days old | 1 day old due_age = f | 2 days old overdue_age = f + 1 | 3 days old delinquent_age = f + 2 |
Weekly | 0 - 6 days old | 7 days old due_age = f | 14 days old overdue_age = f + 7 | 21 days old delinquent_age = f + 14 |
Fortnightly | 0 - 13 days old | 14 days old due_age = f | 21 days old overdue_age = f + 7 | 28 days old delinquent_age = f + 14 |
Monthly | 0 - 29 days old | 30 days old due_age = f | 44 days old overdue_age = f + 14 | 60 days old delinquent_age = f + 30 |
Quarterly | 0 - 89 days old | 90 days old due_age = f | 120 days old overdue_age = f + 30 | 150 days old delinquent_age = f + 60 |
Semiannually | 0 - 179 days old | 180 days old due_age = f | 210 days old overdue_age = f + 30 | 240 days old delinquent_age = f + 60 |
Annually | 0 - 364 days old | 365 days old due_age = f | 425 days old overdue_age = f + 60 | 455 days old delinquent_age = f + 90 |
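The thresholds in the table could be applied along the following lines. This is a sketch, not an agreed implementation: the names `THRESHOLDS` and `freshness_status` are hypothetical, and it assumes a dataset keeps a status until its age reaches the next threshold (for example, a weekly dataset stays "Due" from 7 to 13 days old).

```python
# (due, overdue, delinquent) ages in days, copied from the table above.
THRESHOLDS = {
    "Daily": (1, 2, 3),
    "Weekly": (7, 14, 21),
    "Fortnightly": (14, 21, 28),
    "Monthly": (30, 44, 60),
    "Quarterly": (90, 120, 150),
    "Semiannually": (180, 210, 240),
    "Annually": (365, 425, 455),
}


def freshness_status(update_frequency: str, age_days: int) -> str:
    """Return the age status of a dataset given its update frequency and age."""
    due, overdue, delinquent = THRESHOLDS[update_frequency]
    # Check the oldest threshold first so the most severe status wins.
    if age_days >= delinquent:
        return "Delinquent"
    if age_days >= overdue:
        return "Overdue"
    if age_days >= due:
        return "Due"
    return "Up-to-date"


print(freshness_status("Weekly", 6))     # Up-to-date
print(freshness_status("Monthly", 44))   # Overdue
print(freshness_status("Annually", 455)) # Delinquent
```

Keeping the thresholds in a lookup table rather than deriving them from f mirrors the table directly, since the offsets differ for each update frequency.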
Thoughts
...