  1. New Microdata is Added to the Platform: When a user uploads a resource on the HDX platform, they are asked to indicate if the resource contains microdata. We also manually verify whether a resource contains microdata as part of the standard quality assurance process that we perform on all new resources added to the platform. 

  2. Perform Quality Assurance Checks: Next, we perform a set of quality assurance checks, which includes a review of the data to determine whether the dataset includes microdata or other potentially sensitive information. If the HDX team detects sensitive information, we mark the dataset ‘under review’ and perform a disclosure risk assessment. We also notify the contributor via HDX and email at this stage to let them know that their data will temporarily be unavailable for download while we complete our assessment.

  3. Assess Disclosure Risk: If we determine that there is potentially sensitive data in the file, the next step is the disclosure risk assessment, which we run using sdcMicro. Our team first develops disclosure scenarios, then selects key variables, and finally runs the disclosure risk assessment. If the dataset has a ‘global risk’ of less than three percent, it is deemed safe to share on HDX and will be taken out of review and made public on the platform. If the global risk is higher than three percent, we will talk to the contributor about how to proceed. It is important to note that there are a few different methods for quantifying disclosure risk; global risk is just one of them. While our threshold for sharing data on HDX is a global risk of under three percent, we also look at the individual risk scores to ensure that no individual has a particularly high risk of disclosure. You can learn more about these methods and details on all the steps of the process by following our Disclosure Risk Assessment Learning Path.

  4. Inform Contributor: If we determine that the global risk of re-identification is above our threshold, we will contact the contributor via email to share the results of the assessment and discuss how to reduce the disclosure risk. We may have recommendations for how they can use disclosure control techniques to reduce the disclosure risk, or, in some cases, may advise that they forgo the SDC process and instead share only the metadata and make the full dataset available ‘by request’.

  5. Applying SDC: There are a number of perturbative and non-perturbative methods that can be used to reduce the risk of disclosure in data. With perturbative methods, the data value is altered in order to create uncertainty about what the true value is. With non-perturbative methods, the goal is to reduce the detail of the data by suppressing individual values or combining values by creating intervals or brackets (e.g. age 19 → ‘18 - 25’).

  6. Re-Assessing Risk and Quantifying Information Loss: Applying these methods will necessarily result in information loss. Through this process, our goal is to find a balance between limiting risk and maximising the utility of the data. Therefore, after the techniques have been applied and we are sure that we have successfully lowered the risk, we also assess the data utility of the treated data by quantifying the information loss. In some cases, if the steps we need to take to reduce the disclosure risk of the data result in too much information loss, we may advise that you share the original data ‘by request’ rather than share the treated data publicly. 

  7. Sharing Data Via HDX Connect: Finally, if we determine that data cannot be shared publicly, we provide the option to share only the metadata and make the dataset available ‘by request’. This option allows data contributors to control whether and how to share their data.
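
The assess, treat, and re-assess loop in steps 3, 5 and 6 can be illustrated with a small sketch. This is not the sdcMicro implementation and not the HDX team's actual workflow. It assumes a deliberately naive risk measure (one divided by the number of records sharing the same key variable values, ignoring sampling weights), a synthetic dataset, fixed-width age brackets, and a crude information-loss proxy. All function names, variable names, and data are hypothetical. Only the three percent threshold comes from the text above.

```python
from collections import Counter

GLOBAL_RISK_THRESHOLD = 0.03  # three percent, as stated in step 3


def individual_risks(records, keys):
    """Naive per-record risk: 1 / number of records with the same key values.
    (Assumption: real tools such as sdcMicro use more refined estimators.)"""
    combos = [tuple(r[k] for k in keys) for r in records]
    counts = Counter(combos)
    return [1.0 / counts[c] for c in combos]


def global_risk(risks):
    """Mean individual risk, i.e. expected share of re-identified records."""
    return sum(risks) / len(risks)


def recode_age(age, width):
    """Non-perturbative treatment: replace an exact age with a bracket label."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def distinct_value_loss(before, after):
    """Crude information-loss proxy: share of distinct values lost by recoding."""
    return 1 - len(set(after)) / len(set(before))


# Synthetic microdata (hypothetical): 400 people, exact age and a district code.
records = [
    {"age": 18 + i % 40, "district": "A" if i % 2 == 0 else "B"}
    for i in range(400)
]
keys = ["age", "district"]  # key variables chosen for the disclosure scenario

# Step 3: assess risk on the original data.
risk_before = individual_risks(records, keys)

# Step 5: apply a non-perturbative method (20-year age brackets).
treated = [{**r, "age": recode_age(r["age"], 20)} for r in records]

# Step 6: re-assess risk and quantify information loss.
risk_after = individual_risks(treated, keys)
loss = distinct_value_loss(
    [r["age"] for r in records], [r["age"] for r in treated]
)

print("global risk before:", round(global_risk(risk_before), 4))
print("global risk after: ", round(global_risk(risk_after), 4))
print("max individual risk after:", round(max(risk_after), 4))
print("below threshold:", global_risk(risk_after) < GLOBAL_RISK_THRESHOLD)
print("distinct-value loss for age:", round(loss, 3))
```

In this synthetic example the recoding brings the global risk under the three percent threshold, yet a small group of records keeps a much higher individual risk than the rest. That is the situation the text describes when it says individual risk scores are checked in addition to the global figure. The loss figure shows the other side of the trade-off: most of the age detail is gone.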
