Overview
...
Spatial data require additional processing compared with tabular data. The following describes which data problems to look for and how to fix them. For specific information about CODs, please see: Common Operational Dataset Processing page, COD-PS Standards and Process, COD-AB Standards and Process
...
The six themes outlined below should be considered before using and disseminating geographic data. If the data do not meet the criteria defined by these themes, and/or the data cannot be cleaned to meet these criteria, the sources for these data should be reviewed. If there is no other option to correct the problems, these issues should be documented in the metadata.
- They have a known source: data should not be used if the source is unknown, because there is no guarantee that the data have been verified or that permission to use them has been granted.
- They are complete in geographic scope: data need to span the entire country(ies) or region(s) of interest. See the example of an incomplete dataset in Figure 3. In this case, more research may be needed to see whether data are available that span the entire country.
- They have complete and accurate attribute information: if data do not have information about each geographic feature, there is an increased risk that the data will be used incorrectly. See more specific details on how much attribute information is needed under the Data Cleaning topic.
- They have a known projection: unknown or incorrect coordinate reference systems (including datum and projection) can prevent the data from being overlaid properly with other sources of geographic information and can lead to incorrect spatial analysis. If the data’s coordinate reference system is unknown, refer to the source of the data to see if the original coordinate reference system can be determined.
- They are up-to-date and relevant to the current situation: the information associated with the data must be up-to-date OR useful to the situation for analysis. See example below of administrative boundaries not reflecting the current situation. However, if updated data are not available, out-of-date data are better than none, but the problem should be documented in the metadata record.
- They have correct topology: the spatial properties of the data must be accurate for the data to be used correctly. See the example of topological errors in a polygon file in Figure 3. Topology is checked differently for polygon, arc and point files.
Point Files: all points are generally in the correct location. Two examples of files that do not pass the topology check are 1) a file where a typing error in the latitude and/or longitude field(s) places a point in the wrong location and 2) a file where the location of a populated place is obviously incorrect (e.g. located in the ocean or in the wrong administrative unit).
Polygon and Arc Files: no gaps and/or overlaps between the lines that make up the arcs or polygons are present in the data.
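The point-file checks described above can be partly automated. The following is a minimal sketch, not a prescribed method: the function name, the tuple layout and the bounding box values are all hypothetical. It flags points whose coordinates are impossible (often a typing error) or that fall outside a rough bounding box of the area of interest. It cannot detect a point placed in the wrong administrative unit; that needs a comparison against the boundary polygons.

```python
def flag_suspect_points(points, bbox):
    """Return (point_id, reason) pairs for points that fail basic checks.

    points: iterable of (point_id, latitude, longitude) tuples (assumed layout).
    bbox: (min_lat, min_lon, max_lat, max_lon) of the area of interest.
    """
    min_lat, min_lon, max_lat, max_lon = bbox
    flagged = []
    for point_id, lat, lon in points:
        # Values outside these ranges can never be valid degrees.
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            flagged.append((point_id, "outside valid latitude/longitude range"))
        # Valid degrees, but not where the dataset is supposed to be.
        elif not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
            flagged.append((point_id, "outside area of interest"))
    return flagged


# Hypothetical data: P3 has a misplaced decimal point in its longitude.
example_bbox = (0.0, 30.0, 10.0, 40.0)
example_points = [("P1", 5.2, 35.1), ("P2", 52.0, 35.1), ("P3", 5.2, 351.0)]
for point_id, reason in flag_suspect_points(example_points, example_bbox):
    print(point_id, reason)
```

Flagged points still need to be reviewed manually against the source before they are corrected or removed.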
Data Cleaning
...
The following are the most common types of processing that need to be done. For more information about spatial data in the COD material, see the resource section.
...
The coordinate reference system (CRS), often referred to as the “projection” of a given dataset, is simply a defined set of values that describe how to interpret the X and Y coordinates stored in a dataset. For example, if we have a shapefile containing only one point and its coordinates are X=34.2 and Y=45.7, we have no idea if these are degrees of latitude and longitude, meters in UTM, or state plane feet. It is the coordinate reference system that specifies degrees, meters, or feet as well as other necessary parameters such as the origin or various projection parameters.
...
- Projection (or possibly unprojected in the case of latitude-longitude coordinates) which defines how the spherical coordinates are projected to planar coordinates.
- Datum which defines the mathematical model of the Earth’s shape that is used in the CRS. Commonly this is WGS84; however, in some regions, other systems may be more commonly used.
If the CRS definition is missing from the dataset, many GIS software packages will assume that the data are in geographic coordinates (lat/long) based on the WGS84 datum. If this assumption is wrong, the data will not align with other datasets that have correct CRS definitions.
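When the CRS definition is missing, a quick range test can at least rule out the lat/long assumption. The sketch below is illustrative only, and the function name is hypothetical. It returns False when any coordinate falls outside the valid range for degrees, which suggests projected units such as meters or feet. A True result does not confirm that the data are in degrees: the X=34.2, Y=45.7 example above passes the test but could still be in another unit, so the source of the data remains the authority.

```python
def could_be_lat_long(coords):
    """Return False if any (x, y) pair cannot be degrees of longitude/latitude.

    coords: iterable of (x, y) pairs, with x as longitude and y as latitude.
    """
    return all(-180 <= x <= 180 and -90 <= y <= 90 for x, y in coords)


# The single point from the text: plausible as degrees, but not proven.
print(could_be_lat_long([(34.2, 45.7)]))

# Hypothetical projected coordinates in meters: cannot be degrees.
print(could_be_lat_long([(500000.0, 4649776.0)]))
```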
...
External information is information that can be stored in a table and linked to the data using a unique ID. This information may be time sensitive (e.g. demographic data or a specific theme such as % food insecure population, security risk, etc.). The information should be stored in tables that share a unique ID with the geometry. Joins and relates can be used to link these data when needed. Keeping these data external to the geodataset avoids the inclusion of obsolete data in the geographic dataset.
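The idea of linking an external table to geometry through a shared unique ID can be sketched in a few lines. This is a simplified illustration using plain dictionaries rather than a GIS join; the field names (admin_id, name, population) and the values are hypothetical.

```python
def join_by_id(features, table, key):
    """Attach rows from an external table to features that share a unique ID.

    features: list of dicts describing geographic features.
    table: list of dicts holding the external (e.g. time-sensitive) values.
    key: name of the ID field present in both.
    Features without a matching row are returned unchanged.
    """
    lookup = {row[key]: row for row in table}
    joined = []
    for feature in features:
        extra = lookup.get(feature[key], {})
        # Build a new dict so the original feature records are not modified.
        joined.append({**feature, **extra})
    return joined


# Hypothetical records for illustration only.
admin_units = [
    {"admin_id": "A01", "name": "North"},
    {"admin_id": "A02", "name": "South"},
]
population_table = [{"admin_id": "A01", "population": 1200}]

for record in join_by_id(admin_units, population_table, "admin_id"):
    print(record)
```

Because the join produces new records and leaves the geographic records untouched, the external table can be replaced when newer figures arrive without editing the geodataset.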
...
Geodatabases allow for long attribute names, which is useful, but may become problematic when exporting to formats that do not support long attribute names. Ideally, attribute names should be limited to 10 characters and should not begin with numbers or contain spaces. The alias function in ArcCatalog can be used to assign a longer and more descriptive name to the attribute that will be visible in ArcGIS applications.
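The three naming rules above are easy to check before export. The following is a minimal sketch; the function name and the sample attribute names are hypothetical.

```python
MAX_LENGTH = 10  # limit recommended in the text above


def attribute_name_problems(name):
    """Return a list of the naming rules that an attribute name breaks."""
    problems = []
    if len(name) > MAX_LENGTH:
        problems.append("longer than 10 characters")
    if name[:1].isdigit():
        problems.append("begins with a number")
    if " " in name:
        problems.append("contains a space")
    return problems


# Hypothetical attribute names for illustration only.
for candidate in ["pop_total", "population_2020", "2020 pop"]:
    print(candidate, attribute_name_problems(candidate))
```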
...
Example: Populated place gazetteer
In many cases, there is a separate shapefile or feature class associated with the different types of populated places. Consider the following:
Shapefile/Feature Class 1: National Capital
Shapefile/Feature Class 2: Administrative Capitals
Shapefile/Feature Class 3: Cities with a population of 100,000 or greater
Shapefile/Feature Class 4: Cities with a population between 50,000 and 99,999
Shapefile/Feature Class 5: Cities with a population less than 50,000
Ideally, these separate layers are combined into one feature class for all populated places including small towns, large cities, administrative capitals, national capitals, etc. These files can easily be merged by creating an empty feature class with the following schema (essential attributes):
...
In this case, a controlled vocabulary for the feature type defines the type of populated place (e.g. national capital, administrative capital, etc.). OCHA does not have a defined schema for any particular dataset, but follows standards used by partners and the providers of the data.
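The merge described above can be illustrated with plain Python records. This is a sketch under stated assumptions, not OCHA's schema: the feature_type field, the vocabulary values and the place names are hypothetical. Each source layer is tagged with a feature type drawn from a controlled vocabulary, and all records are combined into one list.

```python
# Hypothetical controlled vocabulary for the feature type.
FEATURE_TYPES = {"national capital", "administrative capital", "city"}


def merge_populated_places(layers):
    """Combine separate layers into one list, tagging each record with its type.

    layers: dict mapping a feature type to the list of records in that layer.
    Raises ValueError if a feature type is not in the controlled vocabulary.
    """
    merged = []
    for feature_type, features in layers.items():
        if feature_type not in FEATURE_TYPES:
            raise ValueError(f"unknown feature type: {feature_type}")
        for feature in features:
            merged.append({**feature, "feature_type": feature_type})
    return merged


# Hypothetical layers for illustration only.
source_layers = {
    "national capital": [{"name": "Place A"}],
    "administrative capital": [{"name": "Place B"}, {"name": "Place C"}],
}
for place in merge_populated_places(source_layers):
    print(place)
```

Rejecting values outside the vocabulary keeps spelling variants such as "Capital" and "capital city" from creating unintended categories in the merged layer.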
...
In order to ensure data are used correctly and accurately in geospatial analysis and cartographic representation, topological errors should be fixed and features developed prior to use. Instructions on how to identify and repair common types of topological errors can be found in How to Check and Repair Topology using ArcGIS.
Create polyline from a polygon
It is recommended that all data repositories include polygons AND polylines of administrative and international boundaries. Boundaries should be represented with the polyline file, and background landmass and labels should be represented with polygons. Preparing data repositories with both of these files in advance allows for quick map creation and proper cartographic representation. Ideally, the line versions of the administrative boundaries should have the coastlines removed.
...
The polyline layer should always be created from the same source and data that will be used for the polygon layer. A simple method for creating a line file from a polygon file in ArcGIS is outlined below.
How to create polylines from polygons in ArcGIS
- Create the line file from the polygon in ArcCatalog
  - Open ArcCatalog
  - Right-click on the polygon layer and select Export > to coverage
  - Maximize the coverage to see the arc, label, polygon, region and tic. Right-click on the arc and select Export > to Shapefile (single) or geodatabase
- Remove the outer border in ArcMap
  - Open ArcMap
  - Add the arc shapefile created from the coverage
  - Go to Editor, and select > Start Editing
  - Go to Selection in the main panel, and select Select by Attributes
  - Set query “RIGHTPOLY” = 1
...
  - Select Apply and OK
  - You will see the outline highlighted. Now select Delete on your keyboard, and the outline is deleted
At any administrative level, boundaries that coincide with boundaries at a higher level in the hierarchy should be removed. The outer boundary is removed so that it does not conflict with the boundary of the administrative unit at a higher level. For example, first administrative level boundaries will be encompassed by international boundaries, and international boundaries will be encompassed by coastlines.
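The selection step in the procedure above amounts to a simple attribute filter. The sketch below mimics it on plain Python records and is not an ArcGIS script. It assumes, as the query in the steps implies, that arcs on the outer border carry RIGHTPOLY = 1; the arc records and the arc_id field are hypothetical.

```python
def remove_outer_border(arcs, outside_id=1):
    """Return only the arcs that are not part of the outer border.

    arcs: list of dicts, each with a "RIGHTPOLY" value.
    outside_id: RIGHTPOLY value that marks the outer border (assumed to be 1,
    following the query used in the steps above).
    """
    return [arc for arc in arcs if arc["RIGHTPOLY"] != outside_id]


# Hypothetical arc records for illustration only.
example_arcs = [
    {"arc_id": 1, "RIGHTPOLY": 1},  # outer border, to be removed
    {"arc_id": 2, "RIGHTPOLY": 3},  # internal boundary, kept
    {"arc_id": 3, "RIGHTPOLY": 2},  # internal boundary, kept
]
for arc in remove_outer_border(example_arcs):
    print(arc["arc_id"])
```

The result should be checked visually before it is saved, since the assumption about which value marks the outer border may not hold for every dataset.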
Formats
...
Consider the way in which spatial data are shared. The format in which they are shared may affect who can use them (e.g. non-GIS users can use tabular data). For details on how to change spatial data formats, see: Steps For Data Format Conversion
...