...
Data Structure - Third Analysis
IOM suggested looking at baseline rather than site assessment data.
DTM Cameroon Round III - Baseline data
File: DTM_CMR_Baseline_Analysis_Round3_V3_ED_AA v4.xlsx
This is at an aggregated level and has multiple sheets with data in a main sheet called "dashboard" that has a single line header:
Region | ADM1_Code | Department | ADM2_Code | Arrondissement | ADM3_Code | IDP | HH_IDP | Ind_IDP | Unregistered_Refugees | HH_Unregistered_Refugees | Ind_Unregistered_Refugees | Returnees | HH_Returnees | Ind_Returnees | Population_Left | HH_Population_Left | Ind_Population_Left | Spontaneous_Shelter | HH_Spontaneous_Shelter | Collective_Shelter | HH_Collective Shelter | Host Family_Shelter | HH_Host Family_Shelter | Open_Air_Shelter | HH_Open_Air_Shelter | Rented_Shelter | HH_Rented_Shelter | HH_Repatriated | Ind_Repatriated |
IOM DTM - Mosul crisis Baseline data
File: EmergencyTracking_DTM_IOM_IDPs_Dataset.xls
This is at an aggregated level and has just one sheet named "sheet" with single line header:
Reporting Date | Governorate Of Displacement | District Of Displacement | Subdistrict Of Displacement | Location Name | Latitude | Longitude | Number of IDP Families | Update Date | Governorate of Origin | Distrit of Origin | Subdistrict of Origin | Private Settings | Critical Shelter Arrangements | Camps/Emergency Sites | Unkown Shelter Arrangements | Screening Sites |
Libya - IOM DTM Dataset (June 2016) - Baseline data
File: rd4_DTM_Master_List_Jun2016.xlsx
This is at an aggregated level and has multiple sheets with the main one "1-DTM Round 4 Dataset" containing the two line header:
Shabiya_Name_EN | Baladiya_ID | Baladiya_Name_EN | Baladiya_Name_AR | Lat | Long | is area assessed by DTM? (Y,N) | IDP Households | IDP Individuals | IDP households displaced in 2011 | Type of Displacement 2011 | Baladiya of Origin 2011 | IDP households displaced 2012- mid-2014 | Type of Displacement 2012- mid-2014 | Baladiya of Origin 2012- mid-2014 | IDP households displaced after mid-2014 | Type of Displacement after mid-2014 | Baladiya of Origin after mid-2014 | Migrant Individuals in Baladiya | Migrant Individuals in Detention Centers in Baladiya | Migrant Individuals crossing Baladiya | Returnee Households | Returnee Individuals | households displaced by general violence reasons | households displaced by special security reasons | households displaced by economic Reasons | Area have IDPs in Rented_House_Paid | Area have IDPs in Rented_House_NotPaid | Area have IDPs with Host community - relatives | Area have IDPs with Host community - non-relatives | Area have IDPs in schools | Area have IDPs in Public_Building | Area have IDPs Squatting | Area have IDPs in Unfinished_Building | Area have IDPs in Abandoned_Resorts | Area have IDPs in Collective_NonFormal settlements | Area have IDPs where shelter type is unknown |
ADM2_Shabiya_Name_EN | ADM3_Baladiya_ID | ADM3_Baladiya_Name_EN | ADM3_Baladiya_Name_AR | Latitude | Longitude | Area assessed by DTM | IDPs In Baladiya_HH | IDPs In Baladiya_IND | IDPs In Baladiya HH_2011 | Origin Type 2011 | Origin 2011 | IDPs In Baladiya_HH 2011_2014 | Origin Type 2011_2014 | Origin 2011_2014 | IDPs_In_Baladiya_HH 2014+ | Origin Type 2014+ | Origin 2014+ | Migrants in Baladiya | Migrants in Detention Center | Crossing Migrants | Returnees HH | Returnees Ind | Displacement for violence | Displacement for Security | Displacement for Economic | Rented accommodation (self-pay) | Rented accommodation (paid by others) | Host families who are relatives | Host families who are not relatives | Schools | Other public buildings | Squatting on other people’s properties (e.g. in farms, flats, houses) | In unfinished buildings | In deserted resorts | In Informal Settings (e.g. tents, caravans, makeshift shelters) | Unknown |
Yemen - IOM TFPM DTM Dataset (April 2016) - Baseline data
File: r9_TFPM_Master_List_Apr2016.xlsx
This is at an aggregated level and has multiple sheets with data split across two main sheets:
"IDP Location lvl" containing the single line header:
Assessed Governorate | Assessed District | Site Name | Site Name A | Site PCode | Latitude | Longitude | Year of Displacement | Month of displacement (2015 - 2016) | Governorate of Orgin | District of Orgin | # Conflict IDP HHs | # Conflict IDP Individuals | # Natural Disaster IDP HHs | # Natural Disaster IDP Individuals | Total IDP HHs | Total IDP Individuals | Avg Family Size | Camps | Using Rented Accomodation | With Host Families Who are Relatives (no rent fee) | With Host Families Who are not Relatives (no rent fee) | Using Schools, Health Facilities, Religious Building | Using Private or Public Building | In Informal Settlement (Grouped Families) in Urban Areas | In Informal Settlement (Grouped Families) in Rural Areas | Out of Settlement (Isolated Families) | Main Need | Interview Date | Assessment Round | Source Of Info. |
"Returnees Location lvl" containing the single line header:
Governorate | GovernoratePCode | DistrictEN | DistrictPCode | OfficialPlaceEN | OfficialPlaceArabic | PCode | Latitude | Longitude | Returnees Conflict HHs | Returnees Conflict Individuals | Returnees Disaster HHs | Returnees Disaster Individuals | Total Returnees HHs | Total Returnees Individuals | Source Of info. |
Common Fields
From the headers above, there are certain fields which are common to all. These are:
- Location Name (site name, ADM3_Baladiya_Name_EN etc.)
- Families/Households
Similar Fields
The below fields communicate similar information but are not consistent between the spreadsheets:
- Shelter
Conclusion
Given the lack of any consistency between the DTM Master List spreadsheets, writing a one size fits all automated data checker and cleaner for all of them is challenging. It would involve placing a great deal of "intelligence" into the cleaning program with the possibility that errors are introduced during cleaning for example, by accidentally matching the wrong column heading when making an algorithm to match a very diverse range of names. It may be possible to write cleaners per country but the effort involved would be large.
A better approach is to try to encourage the different offices to use a similar template for their spreadsheets, as from this starting point, writing a cleaner would not be too onerous. If such a template were introduced, it could be HXLated from day one. How easy it would be to invent a template that covers the range of needs is debatable but it is likely to be much easier than trying to process greatly varying spreadsheet formats.
...