Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Table of Contents

Number of Files Locally and Externally Hosted



Number of Resources

Percentage of Resources

Type09/201601/201704/201707/201710/2017
File Store

2102

22%

2472

24%

2771

26%

3036

17%

3651

9%

CPS

2459

26%

2449

24%

2449

23%

2449

14%

2449

6%

HXL Proxy

2584

27%

2584

25%

2584

25%

2584

15%

2584

6%

ScraperWiki

162

2%

158

2%

160

2%

160

1%

160

0%

Others

2261

24%

2544

25%

2537

24%

9203

53%

32578

79%

Total

9568

100%

10207

100%

10501

100%

17432

100%

41422

100%


The SQL queries are:

select count(*) from dbresources where run_number = xxx;
select count(*) from dbresources where run_number = xxx and url like '%data.humdata.org%';
select count(*) from dbresources where run_number = xxx and url like '%manage.hdx.rwlabs.org%';
select count(*) from dbresources where run_number = xxx and url like '%proxy.hxlstandard.org%';
select count(*) from dbresources where run_number = xxx and url like '%scraperwiki%';

Number of Dataset Updates before and after introduction of Overdue email

There is a 46% increase in updates happening after the overdue email was introduced, with many related to problems with automated systems eg. in HOTOSM.

The SQL queries are:

60 days of runs before overdue emails sent: 

SELECT DISTINCT a.id, c.title FROM dbdatasets a, dbdatasets b, dbinfodatasets c WHERE a.id = b.id AND a.run_number > b.run_number AND b.run_number > 89 AND b.run_number <= 149 AND a.fresh = 0 AND b.fresh = 2 AND a.id = c.id;

60 days of runs after overdue emails sent: 

SELECT DISTINCT a.id, c.title FROM dbdatasets a, dbdatasets b, dbinfodatasets c WHERE a.id = b.id AND a.run_number > b.run_number AND b.run_number > 149 AND b.run_number <= 209 AND a.fresh = 0 AND b.fresh = 2 AND a.id = c.id;

60 days of runs after overdue emails sent


BeforeAfterReason
HOTOSM1236Likely failed export
FTS51
Likely scraper failures during initial creation
WFP10
Wrong update frequency
IDMC
47Wrong update frequency
InterAction
36Wrong update frequency
Other2118
Total

94

137


Baseline Crisis Data and new/updated Data on Crisis Onset


CrisisDay0CountriesISO3BaselineUpdatesCreates
Rohingya Myanmar, BangladeshMMR, BGD981835
Irma/Maria Anguilla, Antigua and Barbuda, Barbados, Bahamas, Bonaire, Saint Eustatius and Saba (The Netherlands), British Virgin Islands, Cuba, Dominica, Dominican Republic, Guadeloupe (France), Haiti, Martinique (France), Montserrat, Puerto Rico (The United States of America), Saint Barthélemy (France), Saint Kitts and Nevis, Saint Lucia, Saint Martin (France), Saint Vincent and the Grenadines, Sint Maarten (The Netherlands), Turks and Caicos Islands, United States Virgin IslandsAIA, ATG, BHS, BRB, BES, VGB, CUB, DMA, DOM, GLP, HTI, MTQ, MSR, PRI, BLM, KNA, LCA, MAF, VCT, SXM, TCA, VIR19671114

The SQL queries for the Rohingya Crisis MMR and BGD (in the order Baseline, Number of Updates, List of Updates, Number of Creates, List of Creates) are:

select b.name, a.last_modified, b.location from dbdatasets a, dbinfodatasets b, dbruns c where a.id = b.id and a.run_number = c.run_number and b.location ~* 'mmr|bgd' and a.update_frequency != -1 and date(c.run_date) = '2017-08-24' and a.last_modified > '2016-02-24' order by a.last_modified;

with baseline as
(select a.id, b.name from dbdatasets a, dbinfodatasets b, dbruns c where a.id = b.id and a.run_number = c.run_number and b.location ~* 'mmr|bgd' and a.update_frequency != -1 and date(c.run_date) = '2017-08-24' and a.last_modified > '2016-02-24')
select e.name, min(d.last_modified) as last_modified from dbdatasets d, baseline e where d.id = e.id and d.last_modified > '2017-08-24' and d.last_modified < '2017-11-25' group by e.name;

with baseline as
(select a.id, b.name from dbdatasets a, dbinfodatasets b, dbruns c where a.id = b.id and a.run_number = c.run_number and b.location ~* 'mmr|bgd' and a.update_frequency != -1 and date(c.run_date) = '2017-08-24' and a.last_modified > '2016-02-24')
select distinct e.name, d.last_modified from dbdatasets d, baseline e where d.id = e.id and d.last_modified > '2017-08-24' and d.last_modified < '2017-11-25' order by e.name, d.last_modified;

select e.name, min(d.last_modified) as last_modified from dbdatasets d, dbinfodatasets e where e.location ~* 'mmr|bgd' and d.update_frequency != -1 and d.id not in (select distinct a.id from dbdatasets a, dbinfodatasets b, dbruns c where a.id = b.id and
a.run_number = c.run_number and b.location ~* 'mmr|bgd' and c.run_date < '2017-08-25') and d.id = e.id and d.last_modified > '2017-08-24' and d.last_modified < '2017-11-25' group by e.name;
select distinct e.name, d.last_modified from dbdatasets d, dbinfodatasets e where e.location ~* 'mmr|bgd' and d.update_frequency != -1 and d.id not in (select distinct a.id from dbdatasets a, dbinfodatasets b, dbruns c where a.id = b.id and 
a.run_number = c.run_number and b.location ~* 'mmr|bgd' and c.run_date < '2017-08-25') and d.id = e.id and d.last_modified > '2017-08-24' and d.last_modified < '2017-11-25' order by e.name, d.last_modified;


The SQL queries for the the Hurricane Irma and Maria Crises combined with affected countries taken from https://reliefweb.int/disaster/tc-2017-000125-dom and https://reliefweb.int/disaster/tc-2017-000136-atg (in the order Baseline, Updates, Creates):

select b.name, a.last_modified, b.location from dbdatasets a, dbinfodatasets b, dbruns c where a.id = b.id and a.run_number = c.run_number and b.location ~* 'aia|atg|bhs|brb|bes|vgb|cub|dma|dom|glp|hti|mtq|msr|pri|blm|kna|lca|maf|vct|sxm|tca|vir' and a.update_frequency != -1 and date(c.run_date) = '2017-09-02' and a.last_modified > '2016-03-02' order by a.last_modified;

with baseline as
(select a.id, b.name from dbdatasets a, dbinfodatasets b, dbruns c where a.id = b.id and a.run_number = c.run_number and b.location ~* 'aia|atg|bhs|brb|bes|vgb|cub|dma|dom|glp|hti|mtq|msr|pri|blm|kna|lca|maf|vct|sxm|tca|vir' and a.update_frequency != -1 and date(c.run_date) = '2017-09-02' and a.last_modified > '2016-03-02')
select e.name, min(d.last_modified) as last_modified from dbdatasets d, baseline e where d.id = e.id and d.last_modified > '2017-09-02' and d.last_modified < '2017-12-03' group by e.name;

with baseline as
(select a.id, b.name from dbdatasets a, dbinfodatasets b, dbruns c where a.id = b.id and a.run_number = c.run_number and b.location ~* 'aia|atg|bhs|brb|bes|vgb|cub|dma|dom|glp|hti|mtq|msr|pri|blm|kna|lca|maf|vct|sxm|tca|vir' and a.update_frequency != -1 and date(c.run_date) = '2017-09-02' and a.last_modified > '2016-03-02')
select distinct e.name, d.last_modified from dbdatasets d, baseline e where d.id = e.id and d.last_modified > '2017-09-02' and d.last_modified < '2017-12-03' order by e.name, d.last_modified;

select e.name, min(d.last_modified) as last_modified from dbdatasets d, dbinfodatasets e where e.location ~* 'aia|atg|bhs|brb|bes|vgb|cub|dma|dom|glp|hti|mtq|msr|pri|blm|kna|lca|maf|vct|sxm|tca|vir' and d.update_frequency != -1 and d.id not in (select distinct a.id from dbdatasets a, dbinfodatasets b, dbruns c where a.id = b.id and
a.run_number = c.run_number and b.location ~* 'aia|atg|bhs|brb|bes|vgb|cub|dma|dom|glp|hti|mtq|msr|pri|blm|kna|lca|maf|vct|sxm|tca|vir' and c.run_date < '2017-09-03') and d.id = e.id and d.last_modified > '2017-09-02' and d.last_modified < '2017-12-03' group by e.name;

select distinct e.name, d.last_modified from dbdatasets d, dbinfodatasets e where e.location ~* 'aia|atg|bhs|brb|bes|vgb|cub|dma|dom|glp|hti|mtq|msr|pri|blm|kna|lca|maf|vct|sxm|tca|vir' and d.update_frequency != -1 and d.id not in (select distinct a.id from dbdatasets a, dbinfodatasets b, dbruns c where a.id = b.id and
a.run_number = c.run_number and b.location ~* 'aia|atg|bhs|brb|bes|vgb|cub|dma|dom|glp|hti|mtq|msr|pri|blm|kna|lca|maf|vct|sxm|tca|vir' and c.run_date < '2017-09-03') and d.id = e.id and d.last_modified > '2017-09-02' and d.last_modified < '2017-12-03' order by e.name, d.last_modified;