
Getting Started

Introduction

The HDX Python Library is designed to enable you to easily develop code that interacts with the Humanitarian Data Exchange (HDX) platform, which is built on top of the CKAN open-source data management system. The major goal of the library is to make pushing and pulling data from HDX as simple as possible for the end user. There are several ways this is achieved. It provides a simple interface that communicates with HDX using the CKAN Python API, a thin wrapper around the CKAN JSON API. The HDX objects, such as datasets and resources, are represented by Python classes. This should make the learning curve gentle and enable users to quickly get started with using HDX programmatically. The API documentation can be found here: http://mcarans.github.io/hdx-python-api/

You can jump to the Getting Started page or continue reading below about the purpose and design philosophy of the library.

...

Keeping it Simple

The library hides CKAN's idiosyncrasies and tries to match the HDX user interface experience. The user does not need to learn about CKAN, and the library makes it easier to understand what the result in HDX will be when calling a Python method.

The class structure of the library should be as logical as possible (within the restrictions of the CKAN API it relies on). In HDX, a dataset can contain zero or more resources and it can be in one or more showcases (which themselves can contain more than one dataset), so the library reflects this even though the showcase API comes from a plugin and is not part of the core CKAN API.

The UML diagram below shows the relationships between the major classes in the library.

[Diagram: Classes]
Datasets, resources and showcases can use dictionary methods like square brackets to handle metadata, which feels natural. (The HDXObject class extends UserDict.) eg.

dataset['name'] = 'My Dataset'
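
As a rough illustration of why square-bracket access works, here is a minimal sketch of a metadata object built on `UserDict`. The class name `MetadataObject` and its `missing_fields` helper are hypothetical and are not part of the library; only the use of `UserDict` comes from the text above.

```python
from collections import UserDict


class MetadataObject(UserDict):
    """Hypothetical stand-in for an HDX object; not the library's class."""

    def missing_fields(self, required):
        # Report which required metadata keys have not been set yet.
        return [key for key in required if key not in self.data]


dataset = MetadataObject({'title': 'My Dataset Title'})
dataset['name'] = 'my-dataset'  # plain dictionary syntax
missing = dataset.missing_fields(['name', 'title', 'dataset_date'])
```

Because the object behaves like a dictionary, the usual methods such as `get`, `update` and `in` are available without any extra code.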
Static metadata can be imported from a YAML file, recommended for being very human readable, or a JSON file eg.

dataset.update_yaml([path])
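
The idea of importing static metadata from a file might be sketched as follows. JSON is used here so that only the standard library is needed. The class `MetadataObject`, its `update_json` method and the metadata keys are illustrative assumptions, not the library's implementation.

```python
import json
import os
import tempfile
from collections import UserDict


class MetadataObject(UserDict):
    """Hypothetical stand-in for an HDX object; not the library's class."""

    def update_json(self, path):
        # Read static metadata from a JSON file and merge it into the
        # metadata already held, overwriting any keys that clash.
        with open(path, encoding='utf-8') as f:
            self.data.update(json.load(f))


# Write a small static metadata file so the example is self-contained.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'hdx_dataset_static.json')
    with open(path, 'w', encoding='utf-8') as f:
        json.dump({'license_id': 'cc-by', 'methodology': 'Census'}, f)

    dataset = MetadataObject({'name': 'my-dataset'})
    dataset.update_json(path)
```

Metadata that never changes between runs can then live in a file, while the code only sets the values that vary.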
Static metadata can be passed in as a dictionary on initialisation of a dataset, resource or showcase eg.

dataset = Dataset(configuration, {
    'name': slugified_name,
    'title': title,
    'dataset_date': dataset_date,  # has to be MM/DD/YYYY
    'groups': iso
})

There are functions to help with adding more complicated types like dates and date ranges, locations etc. eg.

dataset.set_date_of_dataset('START DATE', 'END DATE')
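
To give a feel for what such a date helper has to do, here is a minimal sketch that turns ISO dates into the MM/DD/YYYY form mentioned earlier. The function `format_date_range` is hypothetical, and the convention of joining a start and end date with a hyphen is an assumption rather than something this page confirms.

```python
from datetime import datetime


def format_date_range(start, end=None):
    """Hypothetical helper: build a date string in MM/DD/YYYY form.

    Assumes ISO (YYYY-MM-DD) inputs and that a range is written as
    two dates joined by a hyphen.
    """
    start_date = datetime.strptime(start, '%Y-%m-%d')
    if end is None:
        return start_date.strftime('%m/%d/%Y')
    end_date = datetime.strptime(end, '%Y-%m-%d')
    if end_date < start_date:
        raise ValueError('End date is before start date')
    return '{}-{}'.format(start_date.strftime('%m/%d/%Y'),
                          end_date.strftime('%m/%d/%Y'))


single = format_date_range('2016-06-07')
date_range = format_date_range('2016-01-01', '2016-06-07')
```

Validating and formatting in one place means callers cannot accidentally store a date in the wrong layout.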
There are separate country code and utility libraries that provide functions to handle converting between country codes, dictionary merging, loading multiple YAML or JSON files and a few other helpful tasks eg.

Country.get_iso3_country_code_fuzzy('Czech Rep.')
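
Fuzzy matching of country names could, for example, be approximated with the standard library's `difflib`. The sketch below is not how the country code library works internally; the function `fuzzy_iso3`, the tiny lookup table and the cutoff value are all assumptions made for illustration.

```python
import difflib

# A deliberately tiny lookup table; a real one would cover every country.
COUNTRY_NAMES_TO_ISO3 = {
    'czech republic': 'CZE',
    'chad': 'TCD',
    'central african republic': 'CAF',
    'croatia': 'HRV',
}


def fuzzy_iso3(name, cutoff=0.6):
    """Hypothetical helper: return the ISO3 code of the closest name, or None."""
    matches = difflib.get_close_matches(
        name.lower(), list(COUNTRY_NAMES_TO_ISO3), n=1, cutoff=cutoff)
    if not matches:
        return None
    return COUNTRY_NAMES_TO_ISO3[matches[0]]


code = fuzzy_iso3('Czech Rep.')
no_match = fuzzy_iso3('Zzz')
```

Returning `None` when nothing is close enough lets the caller decide whether an unmatched name should be skipped or treated as an error.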

Easy Configuration and Logging

Logging is something often neglected, so the library aims to make it a breeze to get going with logging and so avoid the spread of print statements. A few handlers are created in the default configuration:

console:
    class: logging.StreamHandler
    level: DEBUG
    formatter: color
    stream: ext://sys.stdout

error_file_handler:
    class: logging.FileHandler
    level: ERROR
    formatter: simple
    filename: errors.log
    encoding: utf8
    mode: w

If using the default logging configuration, then it is possible to also add the default email (SMTP) handler:

error_mail_handler:
    class: logging.handlers.SMTPHandler
    level: CRITICAL
    formatter: simple
    mailhost: localhost
    fromaddr: noreply@localhost

Configuration is made as simple as possible with a Configuration class that handles the HDX API key and the merging of configurations from multiple YAML or JSON files or dictionaries:

class Configuration(UserDict):
    """Configuration for HDX

    Args:
        **kwargs: See below
        hdx_key_file (Optional[str]): Path to HDX key file. Defaults to ~/.hdxkey
        hdx_config_dict (dict): HDX configuration dictionary OR
        hdx_config_json (str): Path to JSON HDX configuration OR
        hdx_config_yaml (str): Path to YAML HDX configuration. Defaults to library's internal hdx_configuration.yml.
        project_config_dict (dict): Project configuration dictionary OR
        project_config_json (str): Path to JSON Project configuration OR
        project_config_yaml (str): Path to YAML Project configuration. Defaults to config/project_configuration.yml.
    """

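The "dictionary OR JSON OR YAML" pattern in the docstring above might be implemented along the following lines. This is a simplified sketch, not the library's code: `SketchConfiguration` is a hypothetical class, YAML support is left out to avoid a third-party dependency, and the `hdx_site` key is only an illustrative value.

```python
import json
from collections import UserDict


class ConfigurationError(Exception):
    """Raised when the configuration arguments are inconsistent."""


class SketchConfiguration(UserDict):
    """Hypothetical, simplified configuration; not the library's class."""

    def __init__(self, hdx_config_dict=None, **kwargs):
        super().__init__()
        # Start from the HDX configuration, then layer the project one on top.
        self.data.update(hdx_config_dict or {})
        project_dict = kwargs.get('project_config_dict')
        project_json = kwargs.get('project_config_json')
        if project_dict is not None and project_json is not None:
            raise ConfigurationError(
                'More than one project configuration given!')
        if project_json is not None:
            with open(project_json, encoding='utf-8') as f:
                project_dict = json.load(f)
        self.data.update(project_dict or {})


configuration = SketchConfiguration(
    hdx_config_dict={'hdx_site': 'https://example.org'},
    project_config_dict={'MY_PARAMETER': 'MY_VALUE'})
```

Rejecting ambiguous input up front keeps it obvious which source a setting came from.
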

There are utility functions to handle dictionary merging, loading multiple YAML or JSON files and a few other helpful tasks eg.

def script_dir_plus_file(filename: str, pyobject: Any, follow_symlinks: Optional[bool] = True) -> str:
    """Get current script's directory and then append a filename

    Args:
        filename (str): Filename to append to directory path
        pyobject (Any): Any Python object in the script
        follow_symlinks (Optional[bool]): Follow symlinks or not. Defaults to True.

    Returns:
        str: Current script's directory with filename appended
    """

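One plausible body for a function with this signature uses `inspect` to find the file in which an object is defined. The body below is an assumption for illustration and may differ from what the library actually does; `json.loads` is passed in only because it is an object with a known source file.

```python
import inspect
import json
from os.path import dirname, join, realpath


def script_dir_plus_file(filename, pyobject, follow_symlinks=True):
    # Find the file in which pyobject is defined.
    path = inspect.getabsfile(pyobject)
    if follow_symlinks:
        # Resolve symbolic links to get the real location on disk.
        path = realpath(path)
    # Swap the script's own filename for the requested one.
    return join(dirname(path), filename)


path = script_dir_plus_file('hdx_dataset_static.yml', json.loads)
```
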

The library itself uses logging at appropriate levels to ensure that it is clear what operations are being performed eg.

WARNING - 2016-06-07 11:08:04 - hdx.data.dataset - Dataset exists. Updating acled-conflict-data-for-africa-realtime-2016

The library makes errors plain by throwing exceptions rather than returning a False or None (except where that would be more appropriate) eg.

hdx.configuration.ConfigurationError: More than one project configuration file given!

There are facades to simplify setup to which the project's main function is passed. They neatly cloak the setup of logging and one of them hides the required calls for pushing status into ScraperWiki (used internally in HDX) eg.

from datetime import datetime

from hdx.facades.scraperwiki import facade

def main(configuration):
    dataset = generate_dataset(configuration, datetime.now())
    ...

if __name__ == '__main__':
    facade(main)
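
Conceptually, a facade of this kind is a function that performs the setup and then calls the function it was given. The sketch below is a hypothetical simplification and not the library's code: it builds the configuration directly from keyword arguments, uses `logging.basicConfig` in place of the library's logging setup, and returns the result of the main function purely so that it is easy to inspect.

```python
import logging


def facade(projectmainfn, **kwargs):
    """Hypothetical sketch of a setup facade; not the library's code."""
    # Set up logging once so that the project does not have to.
    logging.basicConfig(level=logging.INFO)
    # Build the configuration (here just the keyword arguments) and
    # hand it to the project's main function.
    configuration = dict(kwargs)
    logging.getLogger(__name__).info('Starting project')
    return projectmainfn(configuration)


def main(configuration):
    return 'Running against {}'.format(configuration['hdx_site'])


result = facade(main, hdx_site='test')
```

Keeping the setup in one place means every project starts with the same logging and configuration behaviour.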

...

Documentation of the API

...

decide if the scraper will report status to ScraperWiki or not. Unless you are in the HDX team, you will use the simple wrapper (otherwise replace "simple" with "scraperwiki" in the code below):

from hdx.collector.simple import wrapper

def main(configuration):
    ***YOUR CODE HERE***

if __name__ == '__main__':
    wrapper(main)

The wrapper sets up both logging and the HDX configuration, which is passed to your main function in the "configuration" argument above.

The default configuration assumes an internal HDX configuration located within the library package, an API key file and a scraper configuration located at config/scraper_configuration.yml.

It is possible to pass configuration parameters in the wrapper call eg.

wrapper(main, hdx_key_file=LOCATION_OF_HDX_KEY_FILE, hdx_config_yaml=PATH_TO_HDX_YAML_CONFIGURATION,
        scraper_config_dict={'MY_PARAMETER': 'MY_VALUE'})

If you wish to change the logging configuration from the defaults

...

The code is very well documented. Detailed API documentation (generated from Google style docstrings using Sphinx) is available and mentioned in the Getting Started guide.

def load_from_hdx(self, id_or_name: str) -> bool:
    """Loads the dataset given by either id or name from HDX

    Args:
        id_or_name (str): Either id or name of dataset

    Returns:
        bool: True if loaded, False if not

    """

[Image]

IDEs can take advantage of the documentation eg.

[Image]

The method arguments and return parameter have type hints. (Although this is a feature of Python 3.5, it has been backported.) Type hints enable sophisticated IDEs like PyCharm to warn of any inconsistencies in using types, bringing one of the major benefits of statically typed languages to Python.

def merge_dictionaries(dicts: List[dict]) -> dict:

gives:

[Image]

Default parameters mean that there is a very easy default way to get set up and going eg.

def update_yaml(self, path: Optional[str] = join('config', 'hdx_dataset_static.yml')) -> None:
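
To show type hints on a complete function, here is a minimal sketch with the same signature as `merge_dictionaries` above. Only the signature comes from this page; the body, including the choice to merge nested dictionaries recursively and let later values win, is an assumption and may not match the library's behaviour.

```python
from typing import List


def merge_dictionaries(dicts: List[dict]) -> dict:
    """Merge dictionaries left to right.

    Later values win; nested dictionaries are merged recursively.
    """
    merged: dict = {}
    for current in dicts:
        for key, value in current.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                # Both sides hold a dictionary, so merge them key by key.
                merged[key] = merge_dictionaries([merged[key], value])
            else:
                merged[key] = value
    return merged


result = merge_dictionaries([
    {'a': 1, 'b': {'x': 1}},
    {'b': {'y': 2}, 'c': 3},
])
```

With the hints in place, an IDE or type checker can flag a call such as `merge_dictionaries('not a list')` before the code is run.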

Version	Old Version 17	New Version Current
Changes made by	Michael Rans	Michael Rans
Saved on	06 Jun 2016	09 Nov 2021
