title UK Importer data from HMRC (uk_importers_data) pending_draft_review
description Information on companies that import from non-EU countries. Data page and definitions at https://www.uktradeinfo.com/Statistics/Pages/DataDownloads.aspx
current run state not running
last run single run snapshot draft scrape succeeded on January 13, 2015 14:20
next run n/a
created by Chris Taggart
last reviewed by peter.evans
(no subject)
peter.evans commented about 9 years ago

Testing: trying to make this task disappear from the "awaiting a response from us" section of next admin actions.

Bot state update
commented about 9 years ago

A draft run succeeded; sending for review

Bot state update
commented about 9 years ago

A draft run started

Bot state update
commented about 9 years ago

The bot was pushed; scheduling a draft run

Bot state update
commented about 9 years ago

A run failed

Bot state update
commented about 9 years ago

A draft run started

Bot state update
commented about 9 years ago

The bot was pushed; scheduling a draft run

Bot state update
commented about 9 years ago

A run failed

Bot state update
commented about 9 years ago

A run started

Bot state update
commented about 9 years ago

An initial run was triggered manually by a moderator

Bot state update
commented over 9 years ago

Automatic bot state update. Was in state running, now in state to_be_scheduled

Re: Problems with bot (turbot bot [uk_importers_data])
Chris Taggart commented over 9 years ago

I'm happy to leave this to you. I've still got a lot of catching up to do, so
it wouldn't get fixed for at least a week.

Problems with bot
sebbacon commented over 9 years ago

Hi Chris
I have investigated this. There are one or two bugs in the bot: one led to all the most recent runs being empty, and the other meant we weren't processing historic archives.
1) See output at http://turbot.opencorporates.com/bots/uk_importers_data/runs/612/metadata. The problem is that we assume filenames definitely match this format, but there is a file at openc@morph1:~/sites/morph/releases/20140715085537/db/scrapers/data/uk_importers_data/data/hmrc_importers/SIAI11~1 which doesn't match it. I can't find a file matching that name in any of the downloaded archives, so perhaps this isn't a bug and we should just blow away the archives and try again. Because we had calculated that we needed 89 runs, and because there was a bug (now fixed) in reliably getting the exit code from a run, these were all assumed to be successful (and therefore empty) runs. In short: probably no action required, though potentially we could consider logging failures to parse rather than erroring. I think we should probably leave this as it is, though.
2) The historic archive files are unzipped correctly, but then the glob in files_for_processing assumes they start `SIAI11`. Actually the historic archive files appear to start `siai11`. Therefore we didn't process any of them.
If you end up fixing this today, give us a shout before pushing, as Peter is fixing up some morph stuff and it might be better to wait a bit first. We should also manually clear the old run data and ES data.
Thanks
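
The second issue in the comment above (the glob expects `SIAI11` but the historic archives use `siai11`) could be addressed with a case-insensitive filename match. The following is only a minimal sketch: the real `files_for_processing` is not shown in this thread, so the signature, the `dir` argument, and the `SIAI11*` pattern used here are assumptions for illustration.

```ruby
# Hypothetical stand-in for the bot's files_for_processing; the real method's
# signature and directory layout are not shown in this thread.
def files_for_processing(dir, pattern = "SIAI11*")
  Dir.children(dir)
     # File::FNM_CASEFOLD makes fnmatch? ignore case, so both
     # "SIAI11..." and "siai11..." filenames are selected.
     .select { |name| File.fnmatch?(pattern, name, File::FNM_CASEFOLD) }
     .sort
     .map { |name| File.join(dir, name) }
end
```

Listing the directory and filtering with `File.fnmatch?` keeps the case handling explicit, rather than relying on how the underlying filesystem treats case.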

Run history

event | metadata
single run snapshot 1 scrape failed on January 12, 2015 14:47 | 0 rows in less than a minute
single run snapshot draft scrape failed on January 12, 2015 23:20 | 0 rows in 4 minutes
single run snapshot draft scrape succeeded on January 13, 2015 14:20 | 2022 rows in 1 minute

Config

{
  "bot_id": "uk_importers_data",
  "title": "UK Importer data from HMRC",
  "description": "Information on companies that import from non-EU countries. Data page and definitions at https://www.uktradeinfo.com/Statistics/Pages/DataDownloads.aspx",
  "language": "ruby",
  "data_type": "primary data",
  "identifying_fields": [
    "name",
    "commodity_code"
  ],
  "files": [
    "scraper.rb",
    "doc",
    "lib",
    "config",
    "spec"
  ],
  "frequency": "daily"
}