This document outlines the roadmap for the Freshness project.

Q4 2017

1. Dev Team: CKAN API change to allow rapid touching of datasets

Because it was proposed that freshness directly alter the data in the HDX CKAN instance, it needs a way to touch resources. Touching resources with resource_patch is currently very slow. A new method, or a fix to resource_patch, is needed.

Jira: HDX-5579

2. Mike: Measure impact of the sending of overdue emails

Look at the datasets for which an overdue email has been sent, determine which of them have a subsequent update, and compare with the period before the emails were sent. This will give us a rough estimate of the impact of the overdue emails. See Number of Dataset Updates before and after introduction of Overdue email.
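The comparison described above could be sketched roughly as follows. This is only an illustration: the function names, the 30-day window and the input shape (one email timestamp plus a list of update timestamps per dataset) are assumptions, not part of the existing freshness code.

```python
from datetime import datetime, timedelta


def update_counts_around_email(email_sent, update_times, window_days=30):
    """Count dataset updates in equal windows before and after an overdue email."""
    window = timedelta(days=window_days)
    before = sum(1 for t in update_times if email_sent - window <= t < email_sent)
    after = sum(1 for t in update_times if email_sent < t <= email_sent + window)
    return before, after


def summarise_impact(records, window_days=30):
    """Aggregate counts over (email_sent, update_times) pairs, one pair per dataset."""
    summary = {
        "datasets": 0,
        "updates_before": 0,
        "updates_after": 0,
        "datasets_updated_after": 0,
    }
    for email_sent, update_times in records:
        before, after = update_counts_around_email(email_sent, update_times, window_days)
        summary["datasets"] += 1
        summary["updates_before"] += before
        summary["updates_after"] += after
        if after:
            summary["datasets_updated_after"] += 1
    return summary
```

Using equal windows on both sides of the email keeps the two counts comparable; the window length would need tuning against the datasets' expected update frequencies.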

3. CJ, Dev Team: Establish Mixpanel measures for freshness

Create events and a funnel in Mixpanel for tracking use of the freshness workflow.

4. Design, Dev Team: Expose per-dataset freshness info and tools to users via interface

Design and implement an indication of freshness in the HDX interface (and API). Key decisions:

...

  • What to expose? Only mark "fresh"? Overdue? By how much? Delinquent?
  • How to represent in the interface?
  • Sorting and/or filtering by freshness?
  • Add a "Make Fresh" button for data contributors? If we do this, we may want to implement it as an API call so that contributors could click a link in the overdue email to the same effect.
  • Related to this is an old Jira issue about making revision_last_updated visible in the UI: HDX-4894.

5. Data Partnerships Team: Define freshness workflow for handling delinquent datasets and ones with broken URLs

What is the policy for dealing with delinquent datasets? What tools are needed to support the DP team's work, such as:

  • a freshness dashboard listing delinquent datasets and tracking contacts
  • overall freshness metrics so we can monitor freshness as an OKR (% fresh, % overdue, % delinquent)
  • creating issues directly in Zoho
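As a rough illustration of the OKR metrics in the list above, the percentages could be computed along these lines. The function name is hypothetical, and how each dataset gets its status label is assumed to be decided elsewhere by the freshness process.

```python
from collections import Counter

# Status labels taken from the metrics listed above.
STATUSES = ("fresh", "overdue", "delinquent")


def freshness_percentages(dataset_statuses):
    """Return the percentage of datasets in each status, rounded to one decimal."""
    statuses = list(dataset_statuses)
    total = len(statuses)
    counts = Counter(statuses)
    if total == 0:
        # No datasets: report zero for every status rather than dividing by zero.
        return {status: 0.0 for status in STATUSES}
    return {status: round(100 * counts[status] / total, 1) for status in STATUSES}
```

For example, `freshness_percentages(["fresh", "fresh", "fresh", "overdue"])` gives 75.0% fresh, 25.0% overdue and 0.0% delinquent.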

6. Mike: Implement emailer for broken URLs

Construct SQL queries and write Python code for an emailer that reports broken URLs.
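One possible shape for the Python side of the emailer is sketched below. It assumes the SQL queries return one row per broken resource with hypothetical keys `contact_email`, `dataset` and `url`; the sender address and message wording are placeholders. The sketch only builds the messages and does not send them.

```python
from collections import defaultdict
from email.message import EmailMessage


def build_broken_url_emails(broken_resources, sender="hdx@example.org"):
    """Build one email per contact listing that contact's broken resource URLs."""
    # Group rows by contact so each person receives a single message.
    by_contact = defaultdict(list)
    for row in broken_resources:
        by_contact[row["contact_email"]].append(row)

    messages = []
    for contact, rows in sorted(by_contact.items()):
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = contact
        msg["Subject"] = "Broken resource URLs in your HDX datasets"
        lines = [f"- {row['dataset']}: {row['url']}" for row in rows]
        msg.set_content(
            "The following resource URLs could not be reached:\n" + "\n".join(lines)
        )
        messages.append(msg)
    return messages
```

Grouping by contact avoids sending several emails to a contributor who maintains many datasets with broken links.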

T1 2018

7. Expose high level freshness metrics to HDX users

Consider exposing:

  • Overall HDX freshness, e.g. "HDX is 73% fresh today"
  • Per-org freshness. We could show an overall metric for an org to all users or just to org members.

Further ideas

  • As data freshness collects a lot of metadata, it could be used for more general reporting. If needed, the list of metadata collected could be extended.
  • Even for datasets which have an update frequency of "never", there could be an argument for a very rare mail reminder just to confirm data really is static.
  • For the case where data is unchanged and we have sent an overdue email, we should give the option for contributors to respond directly to the automated mail to say so (perhaps by clicking a button in the message).
  • Another way is to provide guidance to data contributors so that, as they consider how to upload resources, we steer them towards a particular technological solution that is helpful to us, e.g. using a Google spreadsheet with our update trigger added. We could investigate a fuller integration between HDX and Google spreadsheets so that if a data provider clicks a button in HDX, it will create a resource pointing to a spreadsheet in Google Drive with the trigger set up that opens automatically once they enter their Google credentials. We may need to investigate other platforms, for example creating document alerts in OneDrive for Business and/or macros in Excel spreadsheets (although as noted earlier, this might create a support headache).