Number of Files Locally and Externally Hosted
Number of Resources Percentage of Resources | |||||
Type | 09/2016 | 01/2017 | 04/2017 | 07/2017 | 10/2017 |
---|---|---|---|---|---|
File Store | 2102 22% | 2472 24% | 2771 26% | 3036 17% | 3651 9% |
CPS | 2459 26% | 2449 24% | 2449 23% | 2449 14% | 2449 6% |
HXL Proxy | 2584 27% | 2584 25% | 2584 25% | 2584 15% | 2584 6% |
ScraperWiki | 162 2% | 158 2% | 160 2% | 160 1% | 160 0% |
Others | 2261 24% | 2544 25% | 2537 24% | 9203 53% | 32578 79% |
Total | 9568 100% | 10207 100% | 10501 100% | 17432 100% | 41422 100% |
The SQL queries are:
select count(*) from dbresources where run_number = xxx;
select count(*) from dbresources where run_number = xxx and url like '%data.humdata.org%';
select count(*) from dbresources where run_number = xxx and url like '%manage.hdx.rwlabs.org%';
select count(*) from dbresources where run_number = xxx and url like '%proxy.hxlstandard.org%';
select count(*) from dbresources where run_number = xxx and url like '%scraperwiki%';
Number of Dataset Updates before and after introduction of Overdue email
There is a 46% increase in updates happening after the overdue email was introduced, with many related to problems with automated systems eg. in HOTOSM.
The SQL queries are:
60 days of runs before overdue emails sent:
SELECT DISTINCT a.id, c.title FROM dbdatasets a, dbdatasets b, dbinfodatasets c WHERE a.id = b.id AND a.run_number > b.run_number AND b.run_number > 89 AND b.run_number <= 149 AND a.fresh = 0 AND b.fresh = 2 AND a.id = c.id;
60 days of runs after overdue emails sent:
SELECT DISTINCT a.id, c.title FROM dbdatasets a, dbdatasets b, dbinfodatasets c WHERE a.id = b.id AND a.run_number > b.run_number AND b.run_number > 149 AND b.run_number <= 209 AND a.fresh = 0 AND b.fresh = 2 AND a.id = c.id;
60 days of runs after overdue emails sent
Before | After | Reason | |
---|---|---|---|
HOTOSM | 12 | 36 | Likely failed export |
FTS | 51 | Likely scraper failures during initial creation | |
WFP | 10 | Wrong update frequency | |
IDMC | 47 | Wrong update frequency | |
InterAction | 36 | Wrong update frequency | |
Other | 21 | 18 | |
Total | 94 | 137 |
Baseline Crisis Data and new/updated Data on Crisis Onset
Crisis | Day0 | Country | Baseline | Updates | Creates |
---|---|---|---|---|---|
Rohingya | MMR | 54 | 9 | 18 | |
Rohingya | BGD | 76 | 13 | 38 | |
Irma | |||||
Maria | |||||
The SQL queries for the Rohingya Crisis MMR and BGD (in the order Baseline, Updates, Creates) are:
select b.name, a.last_modified, b.location from dbdatasets a, dbinfodatasets b, dbruns c where a.id = b.id and a.run_number = c.run_number and b.location like '%mmr%' and a.update_frequency != -1 and date(c.run_date) = '2017-08-24' and a.last_modified > '2016-02-24' order by a.last_modified;
with baseline as
(select a.id, b.name from dbdatasets a, dbinfodatasets b, dbruns c where a.id = b.id and a.run_number = c.run_number and b.location like '%mmr%' and a.update_frequency != -1 and date(c.run_date) = '2017-08-24' and a.last_modified > '2016-02-24')
select e.name, max(d.last_modified) as last_modified from dbdatasets d, baseline e where d.id=e.id and d.last_modified > '2017-08-25' group by e.name;
select e.name, max(d.last_modified) as last_modified from dbdatasets d, dbinfodatasets e where e.location like '%mmr%' and d.update_frequency != -1 and d.id not in (select a.id from dbdatasets a, dbinfodatasets b, dbruns c where a.id = b.id and
a.run_number = c.run_number and b.location like '%mmr%' and a.update_frequency != -1 and date(c.run_date) = '2017-08-24' and a.last_modified > '2016-02-24') and d.id=e.id and d.last_modified > '2017-08-25' group by e.name;
The SQL queries for the the Hurricane Irma Crisis () (in the order Baseline, Updates, Creates) are:
select b.name, a.last_modified, b.location from dbdatasets a, dbinfodatasets b, dbruns c where a.id = b.id and a.run_number = c.run_number and b.location like '%%' and a.update_frequency != -1 and date(c.run_date) = '2017-09-02' and a.last_modified > '2016-03-02' order by a.last_modified;
with baseline as
(select a.id, b.name from dbdatasets a, dbinfodatasets b, dbruns c where a.id = b.id and a.run_number = c.run_number and b.location like '%%' and a.update_frequency != -1 and date(c.run_date) = '2017-09-02' and a.last_modified > '2016-03-02')
select e.name, max(d.last_modified) as last_modified from dbdatasets d, baseline e where d.id=e.id and d.last_modified > '2017-09-03' group by e.name;
select e.name, max(d.last_modified) as last_modified from dbdatasets d, dbinfodatasets e where e.location like '%%' and d.update_frequency != -1 and d.id not in (select a.id from dbdatasets a, dbinfodatasets b, dbruns c where a.id = b.id and
a.run_number = c.run_number and b.location like '%%' and a.update_frequency != -1 and date(c.run_date) = '2017-09-02' and a.last_modified > '2016-03-02') and d.id=e.id and d.last_modified > '2017-09-03' group by e.name;
The SQL queries for the the Hurricane Maria Crisis () (in the order Baseline, Updates, Creates) are:
select b.name, a.last_modified, b.location from dbdatasets a, dbinfodatasets b, dbruns c where a.id = b.id and a.run_number = c.run_number and b.location like '%%' and a.update_frequency != -1 and date(c.run_date) = '2017-09-15' and a.last_modified > '2016-03-15' order by a.last_modified;
with baseline as
(select a.id, b.name from dbdatasets a, dbinfodatasets b, dbruns c where a.id = b.id and a.run_number = c.run_number and b.location like '%%' and a.update_frequency != -1 and date(c.run_date) = '2017-09-15' and a.last_modified > '2016-03-15')
select e.name, max(d.last_modified) as last_modified from dbdatasets d, baseline e where d.id=e.id and d.last_modified > '2017-09-16' group by e.name;
select e.name, max(d.last_modified) as last_modified from dbdatasets d, dbinfodatasets e where e.location like '%%' and d.update_frequency != -1 and d.id not in (select a.id from dbdatasets a, dbinfodatasets b, dbruns c where a.id = b.id and
a.run_number = c.run_number and b.location like '%%' and a.update_frequency != -1 and date(c.run_date) = '2017-09-15' and a.last_modified > '2016-03-15') and d.id=e.id and d.last_modified > '2017-09-16' group by e.name;