Data Structure - Third Analysis

IOM suggested looking at baseline rather than site assessment data.

DTM Cameroon Round III - Baseline data

File: DTM_CMR_Baseline_Analysis_Round3_V3_ED_AA v4.xlsx

The data is at an aggregated level. The workbook has multiple sheets; the data is in a main sheet called "dashboard", which has a single-line header:


Region | ADM1_Code | Department | ADM2_Code | Arrondissement | ADM3_Code | IDP | HH_IDP | Ind_IDP | Unregistered_Refugees | HH_Unregistered_Refugees | Ind_Unregistered_Refugees | Returnees | HH_Returnees | Ind_Returnees | Population_Left | HH_Population_Left | Ind_Population_Left | Spontaneous_Shelter | HH_Spontaneous_Shelter | Collective_Shelter | HH_Collective Shelter | Host Family_Shelter | HH_Host Family_Shelter | Open_Air_Shelter | HH_Open_Air_Shelter | Rented_Shelter | HH_Rented_Shelter | HH_Repatriated | Ind_Repatriated


IOM DTM - Mosul crisis Baseline data

File: EmergencyTracking_DTM_IOM_IDPs_Dataset.xls

The data is at an aggregated level. The workbook has just one sheet, named "sheet", with a single-line header:

Reporting Date | Governorate Of Displacement | District Of Displacement | Subdistrict Of Displacement | Location Name | Latitude | Longitude | Number of IDP Families | Update Date | Governorate of Origin | Distrit of Origin | Subdistrict of Origin | Private Settings | Critical Shelter Arrangements | Camps/Emergency Sites | Unkown Shelter Arrangements | Screening Sites



Libya - IOM DTM Dataset (June 2016) - Baseline data

File: rd4_DTM_Master_List_Jun2016.xlsx

The data is at an aggregated level. The workbook has multiple sheets; the main one, "1-DTM Round 4 Dataset", has a two-line header:

Shabiya_Name_EN | Baladiya_ID | Baladiya_Name_EN | Baladiya_Name_AR | Lat | Long | is area assessed by DTM? (Y,N) | IDP Households | IDP Individuals | IDP households displaced in 2011 | Type of Displacement 2011 | Baladiya of Origin 2011 | IDP households displaced 2012- mid-2014 | Type of Displacement 2012- mid-2014 | Baladiya of Origin 2012- mid-2014 | IDP households displaced after mid-2014 | Type of Displacement after mid-2014 | Baladiya of Origin after mid-2014 | Migrant Individuals in Baladiya | Migrant Individuals in Detention Centers in Baladiya | Migrant Individuals crossing Baladiya | Returnee Households | Returnee Individuals | households displaced by general violence reasons | households displaced by special security reasons | households displaced by economic Reasons | Area have IDPs in Rented_House_Paid | Area have IDPs in Rented_House_NotPaid | Area have IDPs with Host community - relatives | Area have IDPs with Host community - non-relatives | Area have IDPs in schools | Area have IDPs in Public_Building | Area have IDPs Squatting | Area have IDPs in Unfinished_Building | Area have IDPs in Abandoned_Resorts | Area have IDPs in Collective_NonFormal settlements | Area have IDPs where shelter type is unknown
ADM2_Shabiya_Name_EN | ADM3_Baladiya_ID | ADM3_Baladiya_Name_EN | ADM3_Baladiya_Name_AR | Latitude | Longitude | Area assessed by DTM | IDPs In Baladiya_HH | IDPs In Baladiya_IND | IDPs In Baladiya HH_2011 | Origin Type 2011 | Origin 2011 | IDPs In Baladiya_HH 2011_2014 | Origin Type 2011_2014 | Origin 2011_2014 | IDPs_In_Baladiya_HH 2014+ | Origin Type 2014+ | Origin 2014+ | Migrants in Baladiya | Migrants in Detention Center | Crossing Migrants | Returnees HH | Returnees Ind | Displacement for violence | Displacement for Security | Displacement for Economic | Rented accommodation (self-pay) | Rented accommodation (paid by others) | Host families who are relatives | Host families who are not relatives | Schools | Other public buildings | Squatting on other people’s properties (e.g. in farms, flats, houses) | In unfinished buildings | In deserted resorts | In Informal Settings (e.g. tents, caravans, makeshift shelters) | Unknown
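A two-line header means each column carries two labels, so a reader has to decide which one to treat as the column name. The following is only a sketch of pairing the two rows by position, assuming the first header line is the descriptive one and the second is the shorter one, as listed above. Only the first nine columns are reproduced, and the function name pair_header_rows is made up for this illustration.

```python
# Sketch: pair the two header rows of "1-DTM Round 4 Dataset" by position.
# Only the first nine columns from the header listing are reproduced here.
row_one = [
    "Shabiya_Name_EN",
    "Baladiya_ID",
    "Baladiya_Name_EN",
    "Baladiya_Name_AR",
    "Lat",
    "Long",
    "is area assessed by DTM? (Y,N)",
    "IDP Households",
    "IDP Individuals",
]
row_two = [
    "ADM2_Shabiya_Name_EN",
    "ADM3_Baladiya_ID",
    "ADM3_Baladiya_Name_EN",
    "ADM3_Baladiya_Name_AR",
    "Latitude",
    "Longitude",
    "Area assessed by DTM",
    "IDPs In Baladiya_HH",
    "IDPs In Baladiya_IND",
]


def pair_header_rows(first, second):
    """Return (first_label, second_label) pairs, one per column.

    Raises ValueError when the rows differ in length, since a silent
    mismatch would shift every later column by one.
    """
    if len(first) != len(second):
        raise ValueError(
            f"header rows differ in length: {len(first)} vs {len(second)}"
        )
    return list(zip(first, second))


header_pairs = pair_header_rows(row_one, row_two)
for first_label, second_label in header_pairs:
    print(f"{second_label} <- {first_label}")
```

Pairing by position only works if both rows have the same number of cells, which is why the sketch refuses rows of unequal length rather than truncating.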


Yemen - IOM TFPM DTM Dataset (April 2016) - Baseline data

File: r9_TFPM_Master_List_Apr2016.xlsx

The data is at an aggregated level. The workbook has multiple sheets, with the data split across two main sheets:

"IDP Location lvl", which has a single-line header:

Assessed Governorate | Assessed District | Site Name | Site Name A | Site PCode | Latitude | Longitude | Year of Displacement | Month of displacement (2015 - 2016) | Governorate of Orgin | District of Orgin | # Conflict IDP HHs | # Conflict IDP Individuals | # Natural Disaster IDP HHs | # Natural Disaster IDP Individuals | Total IDP HHs | Total IDP Individuals | Avg Family Size | Camps | Using Rented Accomodation | With Host Families Who are Relatives (no rent fee) | With Host Families Who are not Relatives (no rent fee) | Using Schools, Health Facilities, Religious Building | Using Private or Public Building | In Informal Settlement (Grouped Families) in Urban Areas | In Informal Settlement (Grouped Families) in Rural Areas | Out of Settlement (Isolated Families) | Main Need | Interview Date | Assessment Round | Source Of Info.

"Returnees Location lvl", which has a single-line header:

Governorate | GovernoratePCode | DistrictEN | DistrictPCode | OfficialPlaceEN | OfficialPlaceArabic | PCode | Latitude | Longitude | Returnees Conflict HHs | Returnees Conflict Individuals | Returnees Disaster HHs | Returnees Disaster Individuals | Total Returnees HHs | Total Returnees Individuals | Source Of info.


Common Fields

From the headers above, only a few fields are common to all of the spreadsheets. These are:

  1. Location name (Site Name, ADM3_Baladiya_Name_EN, etc.)
  2. Families/Households
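One way to use these common fields without guessing is an explicit alias table that records, per dataset, which column holds each field. The sketch below is only illustrative: the column names are copied from the header listings above, but the choice of which column stands for each common field is an assumption, the canonical names location_name and households are invented here, Cameroon is left out because its location column is not obvious from the header, and the sample row values are made up.

```python
# Hypothetical alias table: canonical field -> column name used in each dataset.
# Which column represents each field is an assumption made for illustration.
ALIASES = {
    "location_name": {
        "mosul": "Location Name",
        "libya": "ADM3_Baladiya_Name_EN",
        "yemen_idp": "Site Name",
    },
    "households": {
        "mosul": "Number of IDP Families",
        "libya": "IDPs In Baladiya_HH",
        "yemen_idp": "Total IDP HHs",
    },
}


def extract_common(dataset, row):
    """Pick the common fields out of one row (a dict keyed by column name).

    Raises KeyError if the dataset is not in the alias table or the row
    lacks the expected column, so a mismatch is never silently ignored.
    """
    common_fields = {}
    for field, per_dataset in ALIASES.items():
        column = per_dataset[dataset]
        common_fields[field] = row[column]
    return common_fields


# Invented sample row, shaped like a row of the Mosul sheet.
sample_row = {"Location Name": "Example site", "Number of IDP Families": 12}
common = extract_common("mosul", sample_row)
print(common)
```

An explicit table has to be maintained by hand for every new spreadsheet, but it fails loudly on an unknown column instead of matching the wrong one.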

Similar Fields

The fields below convey similar information but are not consistent across the spreadsheets:

  1. Shelter
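The shelter inconsistency can be illustrated with a naive keyword search over a few of the column names listed above. This is a sketch only: the selection of columns treated as shelter-related is a reading of the headers rather than something IOM has confirmed, and match_keyword is a made-up helper.

```python
# A few columns per dataset that appear to describe shelter type
# (selection is an assumption; names are copied from the header listings).
SHELTER_COLUMNS = {
    "cameroon": [
        "Spontaneous_Shelter",
        "Collective_Shelter",
        "Host Family_Shelter",
        "Open_Air_Shelter",
        "Rented_Shelter",
    ],
    "mosul": [
        "Private Settings",
        "Critical Shelter Arrangements",
        "Camps/Emergency Sites",
        "Unkown Shelter Arrangements",
        "Screening Sites",
    ],
    "libya": [
        "Rented accommodation (self-pay)",
        "Host families who are relatives",
        "Schools",
        "In unfinished buildings",
        "Unknown",
    ],
    "yemen_idp": [
        "Camps",
        "Using Rented Accomodation",
        "Using Private or Public Building",
        "Out of Settlement (Isolated Families)",
    ],
}


def match_keyword(columns, keyword):
    """Return the columns whose name contains the keyword, ignoring case."""
    needle = keyword.lower()
    return [column for column in columns if needle in column.lower()]


hits = {
    dataset: match_keyword(columns, "shelter")
    for dataset, columns in SHELTER_COLUMNS.items()
}
for dataset, matched in hits.items():
    print(f"{dataset}: {len(matched)} of {len(SHELTER_COLUMNS[dataset])} matched")
```

Under these assumptions the keyword finds every listed Cameroon column, only two of the Mosul columns, and none of the Libya or Yemen columns, even though all of them describe where displaced people are staying.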

Conclusion

Given the lack of consistency between the DTM Master List spreadsheets, writing a one-size-fits-all automated data checker and cleaner for all of them is challenging. It would mean building a great deal of "intelligence" into the cleaning program, with the risk that the cleaning itself introduces errors, for example by matching the wrong column heading when an algorithm has to cope with a very diverse range of names. It may be possible to write a cleaner per country, but the effort involved would be large.
A better approach is to encourage the different offices to use a similar template for their spreadsheets; from that starting point, writing a cleaner would not be too onerous. If such a template were introduced, it could be HXLated from day one. How easy it would be to design a template that covers the full range of needs is debatable, but it is likely to be much easier than trying to process greatly varying spreadsheet formats.
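As a rough illustration of what an HXLated template could look like, the sketch below writes a header row followed by a row of HXL hashtags, and checks whether a file's first two rows match. The column titles and the particular hashtags are illustrative choices, not the tags proposed by IOM, and the names TEMPLATE, write_template and matches_template are invented for this example.

```python
import csv
import io

# Hypothetical template: (column title, HXL hashtag) pairs.
# Titles and tag choices are illustrative assumptions only.
TEMPLATE = [
    ("Admin 1 name", "#adm1+name"),
    ("Admin 2 name", "#adm2+name"),
    ("Location name", "#loc+name"),
    ("Latitude", "#geo+lat"),
    ("Longitude", "#geo+lon"),
    ("IDP households", "#affected+idps+hh"),
    ("IDP individuals", "#affected+idps+ind"),
]


def write_template(stream):
    """Write the title row and the HXL hashtag row as CSV."""
    writer = csv.writer(stream)
    writer.writerow([title for title, _ in TEMPLATE])
    writer.writerow([tag for _, tag in TEMPLATE])


def matches_template(rows):
    """Return True when the first two rows equal the template rows."""
    if len(rows) < 2:
        return False
    titles = [title for title, _ in TEMPLATE]
    tags = [tag for _, tag in TEMPLATE]
    return rows[0] == titles and rows[1] == tags


buffer = io.StringIO()
write_template(buffer)
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
print(matches_template(rows))
```

With a fixed template, the check a cleaner needs reduces to an exact comparison of two rows, which is the main reason a shared template would make the cleaner much simpler to write.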

Appendix: IOM Global DTM information

The following files contain a data dictionary and partial sample for the in-progress global DTM, as supplied by IOM in Geneva.

  1. DTM_Data_Code_Mar_Oct_15_Cleaned_for-HDX.xlsx
  2. Indicators for HXL.xlsx

This Google Sheet contains proposed HXL hashtags for the global DTM: https://docs.google.com/spreadsheets/d/1gifTnrz9A2fZ8Tuwg-EClsFvu4QEbC4OUtdgb61dXGs/edit?usp=sharing