Title: Financial Licences of Sweden (se-licences) [ingestion_failed]
Description: This bot scrapes licensing information from Finansinspektionen, the Swedish Financial Services Authority.
Current run state: not running
Last run: single run, snapshot 12 scrape, ingestion failed on June 14, 2016 01:22
Next run: n/a
Created by: dinotash (Tom Curtis)
Last reviewed by: peter.evans
Error reported from openc
morph commented almost 8 years ago

Exception (#<Elasticsearch::Transport::Transport::Errors::NotFound: [404] {"_scroll_id":"c2NhbjswOzE7dG90YWxfaGl0czo2OTU3ODs=","took":8,"timed_out":false,"_shards":{"total":8,"successful":0,"failed":8,"failures":[{"status":404,"reason":"RemoteTransportException[[search6][inet[/10.43.6.6:9300]][indices:data/read/search[phase/scan/scroll]]]; nested: SearchContextMissingException[No search context found for id [357423165]]; "},{"status":404,"reason":"RemoteTransportException[[search5][inet[/10.43.6.5:9300]][indices:data/read/search[phase/scan/scroll]]]; nested: SearchContextMissingException[No search context found for id [370408810]]; "},{"status":404,"reason":"SearchContextMissingException[No search context found for id [305055206]]"},{"status":404,"reason":"RemoteTransportException[[search3][inet[/10.43.6.3:9300]][indices:data/read/search[phase/scan/scroll]]]; nested: SearchContextMissingException[No search context found for id [206171534]]; "},{"status":404,"reason":"RemoteTransportException[[search3][inet[/10.43.6.3:9300]][indices:data/read/search[phase/scan/scroll]]]; nested: SearchContextMissingException[No search context found for id [206171535]]; "},{"status":404,"reason":"RemoteTransportException[[search3][inet[/10.43.6.3:9300]][indices:data/read/search[phase/scan/scroll]]]; nested: SearchContextMissingException[No search context found for id [206171536]]; "},{"status":404,"reason":"RemoteTransportException[[search2][inet[/10.43.6.2:9300]][indices:data/read/search[phase/scan/scroll]]]; nested: SearchContextMissingException[No search context found for id [328051902]]; "},{"status":404,"reason":"RemoteTransportException[[search5][inet[/10.43.6.5:9300]][indices:data/read/search[phase/scan/scroll]]]; nested: SearchContextMissingException[No search context found for id [370408807]]; "}]},"hits":{"total":69578,"max_score":0.0,"hits":[]}}>) raised processing snapshot 12 from se-licences
Backtrace:
/home/openc/.rvm/gems/ruby-2.2.2/gems/elasticsearch-transport-1.0.12/lib/elasticsearch/transport/transport/base.rb:135:in `__raise_transport_error'
/home/openc/.rvm/gems/ruby-2.2.2/gems/elasticsearch-transport-1.0.12/lib/elasticsearch/transport/transport/base.rb:227:in `perform_request'
/home/openc/.rvm/gems/ruby-2.2.2/gems/elasticsearch-transport-1.0.12/lib/elasticsearch/transport/transport/http/faraday.rb:20:in `perform_request'
/home/openc/.rvm/gems/ruby-2.2.2/gems/elasticsearch-transport-1.0.12/lib/elasticsearch/transport/client.rb:119:in `perform_request'
/home/openc/.rvm/gems/ruby-2.2.2/gems/elasticsearch-api-1.0.12/lib/elasticsearch/api/actions/scroll.rb:56:in `scroll'
/home/openc/app/lib/base_elasticsearch_client.rb:147:in `block in scroll_through'
/home/openc/app/lib/base_elasticsearch_client.rb:145:in `loop'
/home/openc/app/lib/base_elasticsearch_client.rb:145:in `scroll_through'
/home/openc/app/lib/data_pipeline/run_processor.rb:135:in `find_non_ingested_records_from_snapshot'
/home/openc/app/lib/data_pipeline/run_processor.rb:35:in `process_run_output'
/home/openc/app/lib/data_pipeline/run_processor.rb:14:in `perform'
/home/openc/.rvm/gems/ruby-2.2.2/gems/resque-1.25.2/lib/resque/job.rb:240:in `block (3 levels) in perform'
/home/openc/.rvm/gems/ruby-2.2.2/gems/newrelic_rpm-3.13.0.299/lib/new_relic/agent/instrumentation/resque.rb:41:in `block in around_perform_with_monitoring'
/home/openc/.rvm/gems/ruby-2.2.2/gems/newrelic_rpm-3.13.0.299/lib/new_relic/agent/instrumentation/controller_instrumentation.rb:362:in `perform_action_with_newrelic_trace'
/home/openc/.rvm/gems/ruby-2.2.2/gems/newrelic_rpm-3.13.0.299/lib/new_relic/agent/instrumentation/resque.rb:33:in `around_perform_with_monitoring'
/home/openc/.rvm/gems/ruby-2.2.2/gems/resque-1.25.2/lib/resque/job.rb:239:in `block (2 levels) in perform'
/home/openc/.rvm/gems/ruby-2.2.2/gems/resque-1.25.2/lib/resque/job.rb:247:in `call'
/home/openc/.rvm/gems/ruby-2.2.2/gems/resque-1.25.2/lib/resque/job.rb:247:in `perform'
/home/openc/.rvm/gems/ruby-2.2.2/gems/resque-1.25.2/lib/resque/worker.rb:250:in `perform'
/home/openc/.rvm/gems/ruby-2.2.2/gems/resque-1.25.2/lib/resque/worker.rb:189:in `block in work'
/home/openc/.rvm/gems/ruby-2.2.2/gems/resque-1.25.2/lib/resque/worker.rb:166:in `loop'
/home/openc/.rvm/gems/ruby-2.2.2/gems/resque-1.25.2/lib/resque/worker.rb:166:in `work'
/home/openc/app/lib/tasks/resque.rake:126:in `block (2 levels) in <top (required)>'
/home/openc/.rvm/gems/ruby-2.2.2/gems/rake-10.5.0/lib/rake/task.rb:240:in `call'
/home/openc/.rvm/gems/ruby-2.2.2/gems/rake-10.5.0/lib/rake/task.rb:240:in `block in execute'
/home/openc/.rvm/gems/ruby-2.2.2/gems/rake-10.5.0/lib/rake/task.rb:235:in `each'
/home/openc/.rvm/gems/ruby-2.2.2/gems/rake-10.5.0/lib/rake/task.rb:235:in `execute'
/home/openc/.rvm/gems/ruby-2.2.2/gems/rake-10.5.0/lib/rake/task.rb:179:in `block in invoke_with_call_chain'
/home/openc/.rvm/rubies/ruby-2.2.2/lib/ruby/2.2.0/monitor.rb:211:in `mon_synchronize'
/home/openc/.rvm/gems/ruby-2.2.2/gems/rake-10.5.0/lib/rake/task.rb:172:in `invoke_with_call_chain'
/home/openc/.rvm/gems/ruby-2.2.2/gems/rake-10.5.0/lib/rake/task.rb:165:in `invoke'
/home/openc/.rvm/gems/ruby-2.2.2/gems/rake-10.5.0/lib/rake/application.rb:150:in `invoke_task'
/home/openc/.rvm/gems/ruby-2.2.2/gems/rake-10.5.0/lib/rake/application.rb:106:in `block (2 levels) in top_level'
/home/openc/.rvm/gems/ruby-2.2.2/gems/rake-10.5.0/lib/rake/application.rb:106:in `each'
/home/openc/.rvm/gems/ruby-2.2.2/gems/rake-10.5.0/lib/rake/application.rb:106:in `block in top_level'
/home/openc/.rvm/gems/ruby-2.2.2/gems/rake-10.5.0/lib/rake/application.rb:115:in `run_with_threads'
/home/openc/.rvm/gems/ruby-2.2.2/gems/rake-10.5.0/lib/rake/application.rb:100:in `top_level'
/home/openc/.rvm/gems/ruby-2.2.2/gems/rake-10.5.0/lib/rake/application.rb:78:in `block in run'
/home/openc/.rvm/gems/ruby-2.2.2/gems/rake-10.5.0/lib/rake/application.rb:176:in `standard_exception_handling'
/home/openc/.rvm/gems/ruby-2.2.2/gems/rake-10.5.0/lib/rake/application.rb:75:in `run'
/home/openc/.rvm/gems/ruby-2.2.2/gems/rake-10.5.0/bin/rake:33:in `<top (required)>'
/home/openc/.rvm/gems/ruby-2.2.2/bin/rake:23:in `load'
/home/openc/.rvm/gems/ruby-2.2.2/bin/rake:23:in `<main>'
/home/openc/.rvm/gems/ruby-2.2.2/bin/ruby_executable_hooks:15:in `eval'
/home/openc/.rvm/gems/ruby-2.2.2/bin/ruby_executable_hooks:15:in `<main>'
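The exception above is Elasticsearch's `SearchContextMissingException`. It was returned by all 8 shards during the scan/scroll phase, which `scroll_through` in `base_elasticsearch_client.rb` was looping over. In Elasticsearch this generally means the scroll context named by the scroll ID had already been released. The usual cause is that the keep-alive given in the `scroll` parameter elapsed between two requests.

The sketch below is a minimal, library-free illustration of one common mitigation. It renews the keep-alive on every page and, if the context is lost anyway, restarts the search while skipping documents already yielded. It is not the openc implementation, which is Ruby and is not shown here. `FakeScrollClient`, `ScrollContextMissing` and `scroll_all` are hypothetical names, and the fake client only simulates a real search client.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple


class ScrollContextMissing(Exception):
    """Hypothetical stand-in for a 404 SearchContextMissingException."""


class FakeScrollClient:
    """Hypothetical in-memory search client.

    `expire_after` frees the scroll context on that numbered scroll call,
    imitating a keep-alive that lapsed between requests.
    """

    def __init__(self, docs: List[Dict], page_size: int, expire_after: Optional[int] = None):
        self.docs = docs
        self.page_size = page_size
        self.expire_after = expire_after
        self.scroll_calls = 0
        self.contexts: Dict[str, int] = {}  # scroll_id -> offset of the next page
        self._next_id = 0

    def search(self, keep_alive: str) -> Tuple[str, List[Dict]]:
        # keep_alive is accepted for realism but ignored by this fake.
        scroll_id = f"ctx{self._next_id}"
        self._next_id += 1
        page = self.docs[: self.page_size]
        self.contexts[scroll_id] = len(page)
        return scroll_id, page

    def scroll(self, scroll_id: str, keep_alive: str) -> List[Dict]:
        self.scroll_calls += 1
        if self.expire_after is not None and self.scroll_calls == self.expire_after:
            self.contexts.pop(scroll_id, None)  # simulate the context timing out
        if scroll_id not in self.contexts:
            raise ScrollContextMissing(scroll_id)
        start = self.contexts[scroll_id]
        page = self.docs[start : start + self.page_size]
        self.contexts[scroll_id] = start + len(page)
        return page


def scroll_all(client, keep_alive: str = "5m", max_restarts: int = 3) -> Iterator[Dict]:
    """Yield every document once, restarting the search if the context is lost."""
    seen: Set[str] = set()
    restarts = 0
    while True:
        scroll_id, page = client.search(keep_alive=keep_alive)
        try:
            while page:
                for doc in page:
                    if doc["_id"] not in seen:  # skip docs yielded before a restart
                        seen.add(doc["_id"])
                        yield doc
                # Each scroll request also renews the keep-alive for the next page.
                page = client.scroll(scroll_id, keep_alive=keep_alive)
            return
        except ScrollContextMissing:
            restarts += 1
            if restarts > max_restarts:
                raise


docs = [{"_id": str(i)} for i in range(10)]
client = FakeScrollClient(docs, page_size=3, expire_after=2)
print(len(list(scroll_all(client))))  # 10
```

A restart re-reads from the beginning, so the cost of this approach is re-fetching pages. The `seen` set is what keeps downstream ingestion from receiving duplicates.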

State changed to ingestion_failed for run #7456, snapshot 12
commented almost 8 years ago

Openc ingestion failed

State changed to ingesting_data for run #7456, snapshot 12
commented almost 8 years ago

The run's output is being ingested

State changed to storing_data for run #7456, snapshot 12
commented almost 8 years ago

The run's output is being stored

State changed to running for run #7456, snapshot 12
commented almost 8 years ago

A run started

State changed to scheduled
commented almost 8 years ago

For run #7456:
A snapshot completed; scheduling the first run of the next snapshot

Saved vars cleared
dinotash commented almost 8 years ago

State changed to ingesting_data
commented almost 8 years ago

For run #6909:
The run's output is being ingested

State changed to storing_data
commented almost 8 years ago

For run #6909:
The run's output is being stored

State changed to running
commented almost 8 years ago

For run #6909:
A run started

Bot state update
commented about 8 years ago

A snapshot completed; scheduling the first run of the next snapshot

Saved vars cleared
dinotash commented about 8 years ago

Bot state update
commented about 8 years ago

A run finished; its output is now being processed

Bot state update
commented about 8 years ago

A run started

Bot state update
commented about 8 years ago

A snapshot completed; scheduling the first run of the next snapshot

Saved vars cleared
dinotash commented about 8 years ago

Bot state update
commented about 8 years ago

A run started

Bot state update
commented about 8 years ago

A snapshot completed; scheduling the first run of the next snapshot

Saved vars cleared
dinotash commented about 8 years ago

Bot state update
commented about 8 years ago

A run started

Bot state update
commented over 8 years ago

A snapshot completed; scheduling the first run of the next snapshot

Saved vars cleared
dinotash commented over 8 years ago

Bot state update
commented over 8 years ago

A run started

Bot state update
commented over 8 years ago

A snapshot completed; scheduling the first run of the next snapshot

Saved vars cleared
dinotash commented over 8 years ago

Bot state update
commented over 8 years ago

A run started

Bot state update
commented over 8 years ago

A snapshot completed; scheduling the first run of the next snapshot

Saved vars cleared
dinotash commented over 8 years ago

Bot state update
commented over 8 years ago

A run started

Bot state update
commented over 8 years ago

A snapshot completed; scheduling the first run of the next snapshot

Saved vars cleared
dinotash commented over 8 years ago

Bot state update
commented over 8 years ago

A run started

Bot state update
commented over 8 years ago

A snapshot completed; scheduling the first run of the next snapshot

Bot state update
commented over 8 years ago

A run started

Bot state update
commented over 8 years ago

A snapshot completed; scheduling the first run of the next snapshot

Bot state update
commented over 8 years ago

A run started

Bot state update
commented almost 9 years ago

A run succeeded; scheduling the next run

Bot state update
peter.evans commented almost 9 years ago

The bot was accepted; starting run to ingest reviewed data

Bot state update
commented almost 9 years ago

A draft run succeeded; sending for final review

Bot state update
peter.evans commented almost 9 years ago

A moderator has approved the draft bot; running a full draft for final review

Bot state update
peter.evans commented almost 9 years ago

A moderator has started reviewing the draft bot

Bot state update
commented almost 9 years ago

Run succeeded; sending for draft review

Bot state update
dinotash commented almost 9 years ago

The bot was pushed; scheduling a draft run

Bot state update
commented almost 9 years ago

A final draft run failed

Bot state update
peter.evans commented almost 9 years ago

A moderator has approved the draft bot; running a full draft for final review

Bot state update
peter.evans commented almost 9 years ago

A moderator has started reviewing the draft bot

Bot state update
commented almost 9 years ago

Run succeeded; sending for draft review

Bot state update
dinotash commented almost 9 years ago

The bot was pushed; scheduling a draft run

Bot state update
commented almost 9 years ago

A final draft run failed

Bot state update
peter.evans commented almost 9 years ago

A moderator has approved the draft bot; running a full draft for final review

Bot state update
peter.evans commented almost 9 years ago

A moderator has started reviewing the draft bot

Bot state update
commented almost 9 years ago

Run succeeded; sending for draft review

Bot state update
dinotash commented almost 9 years ago

The bot was pushed; scheduling a draft run

Bot state update
commented almost 9 years ago

Run succeeded; sending for draft review

Bot state update
dinotash commented almost 9 years ago

The bot was pushed; scheduling a draft run

Bot state update
commented almost 9 years ago

A draft run failed

Bot state update
dinotash commented almost 9 years ago

The bot was pushed; scheduling a draft run

Bot state update
peter.evans commented almost 9 years ago

The bot needs more work

Bot state update
peter.evans commented almost 9 years ago

A moderator has started reviewing the draft bot

Bot state update
commented almost 9 years ago

Run succeeded; sending for draft review

Bot state update
dinotash commented almost 9 years ago

The bot was pushed; scheduling a draft run

Bot state update
commented almost 9 years ago

Run succeeded; sending for draft review

Bot state update
dinotash commented almost 9 years ago

The bot was pushed; scheduling a draft run

(no subject)
peter.evans commented almost 9 years ago

Hi Tom,
Yes, that's exactly right. I will aim to write and send some first-draft documentation to get us moving.
Best,
Peter

Re: (turbot bot [se-licences])
dinotash commented almost 9 years ago

Hi Peter
Do you mean writing a new transformer to turn the primary data into rich-licences instead of simple-licences?
If so, then yes - please send the docs!
Tom

(no subject)
peter.evans commented almost 9 years ago

Hi Tom,
Thank you for sending that on again. Very good to know about the attachments causing problems; I will pass this on.
Thank you for looking into the geolocation libraries/APIs; that's very useful information. I think we should shelve the issue for now, as it sounds like even paying for the APIs might not give the desired results (accurate reconciliation), so we're probably in a much better position to do this ourselves.
As you say, the only issue is that company_jurisdiction is a required field in the simple-licence schema. This is actually one of the problems we fixed in the rich licence schema, and I think it would be easiest if I quickly put some documentation together for that. The amount of information that we can transform and get onto opencorporates is then much more significant as well.
I'll get some first-draft documentation ready, and if that sounds good to you, we can move ahead with this bot (and improve the docs as we go). What do you think?
Thanks,
Peter

Re: (turbot bot [se-licences])
dinotash commented almost 9 years ago

Hi Peter
Here's my original message. I think Turbot doesn't like messages with attachments.
I think I need to flick this back to you guys - for both technical and business reasons.
I had a look at geopy, which is basically a neater way of using the various mapping APIs out there, like Google Maps, OpenStreetMap and Bing Maps. You simply tell geopy to use your favourite one of these services to translate the address in words into a geographic location, from which you can determine the country.
Firstly, the results are a little underwhelming on an initial test using the Google Maps API. Here is a list of addresses from the primary data and the countries Google placed them in: https://github.com/tomcurtis/opencorporates/blob/master/se-licences/geopy-countries-test.txt. It's not terrible, but it can't seem to work out that Stockholm or other Swedish cities are in Sweden. It can't even decide whether they're in the USA, Poland or the Netherlands. Obviously, that's a bit of a problem for a Swedish data set. I also saw it thought Lisbon and Amsterdam were in the US.
Secondly, the sites which run the mapping APIs know they have valuable data that is in demand. All of these APIs require you to sign up for an API key, with various limits on the rate at which you can use the service and on the maximum number of API calls per day.
As far as I can see, none of these services allows more than 2,500 requests per day on a free API key. My Swedish bot captured around 25,000 records for which we may need to find the country. In addition, some of the free versions of the APIs require you to display certain messages with the data (e.g. data copyright MapQuest), which doesn't seem very opencorporates.
I think this is one where you guys will have to evaluate whether you have sufficient demand across your other sources to investigate further, whether there is sufficient benefit to justify paying for a less restrictive licence, and which service to use.
How do you want to proceed with this bot? At the moment, I think the only thing holding it up is the fact that jurisdiction_classification is a mandatory field for the simple_licence schema.
Hope this helps
Tom
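The test Tom describes comes down to geocoding each address and reading the country out of the result. The sketch below covers only that last step and is a minimal version under one assumption: responses are shaped like the Google Geocoding API's JSON, with a `results` list whose entries carry `address_components`, each having a `types` list and a `short_name`.

`country_of` and `flag_non_swedish` are hypothetical helpers. The sample responses are hand-written for illustration and are not real API output. Flagging every result that does not come back as Sweden is one cheap way to surface misplacements like the Stockholm ones reported above.

```python
from typing import Dict, List, Optional


def country_of(response: Dict) -> Optional[str]:
    """Return the ISO country code of the first geocoding result, or None."""
    results = response.get("results") or []
    if not results:
        return None
    for component in results[0].get("address_components", []):
        if "country" in component.get("types", []):
            return component.get("short_name")
    return None


def flag_non_swedish(responses: Dict[str, Dict], expected: str = "SE") -> List[str]:
    """List addresses whose geocoded country is missing or differs from `expected`."""
    return [addr for addr, resp in responses.items() if country_of(resp) != expected]


# Hand-written, hypothetical responses (not real API output).
sample = {
    "Sveavägen 1, Stockholm": {"results": [{"address_components": [
        {"long_name": "Stockholm", "short_name": "Stockholm", "types": ["locality", "political"]},
        {"long_name": "Sweden", "short_name": "SE", "types": ["country", "political"]},
    ]}]},
    "Stockholm": {"results": [{"address_components": [
        {"long_name": "United States", "short_name": "US", "types": ["country", "political"]},
    ]}]},
    "Unknown address": {"results": []},
}
print(flag_non_swedish(sample))  # ['Stockholm', 'Unknown address']
```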

(no subject)
peter.evans commented almost 9 years ago

Hi Tom,
I had a reply to this bot thread but it only read:
"""
Hi Peter
"""
Possibly there was an issue delivering the message. Could you re-send it if that was the case?
Thanks,
Peter

Re: (turbot bot [se-licences])
dinotash commented almost 9 years ago

Hi Peter

Bot state update
peter.evans commented almost 9 years ago

The bot needs more work

Bot state update
peter.evans commented almost 9 years ago

A moderator has started reviewing the draft bot

(no subject)
peter.evans commented almost 9 years ago

Hi Tom,
I agree that an established library makes more sense; however, we don't yet have a working example with which to justify adding a library to the Turbot environment. How would you feel about trying out a library locally? Once it is up and running, we could see about adding it to the Turbot environment, and this bot could then be pushed and used as an example/test. What do you think?
Best,
Peter

Re: (turbot bot [se-licences])
dinotash commented about 9 years ago

Hi Peter
I'll hold off until after your Tuesday meeting then.
I think an established library would be preferable, as it helps with maintenance.
I still think the two options (a library or an external API) might end up being the same thing. When I had a look at geopy, it seemed to work by passing the request on to Google, OpenStreetMap or other services.
Tom

(no subject)
peter.evans commented about 9 years ago

Hi Tom,
If adding a geocoding library is a neater solution, then I'd say let's push for that. We have a meeting on Tuesday and I was planning to suggest we get one added quickly; I think the only outstanding question was which one to use. It would be interesting if we could do it without a library, though that would make the bot dependent on an external API.
Peter

Re: (turbot bot [se-licences])
dinotash commented about 9 years ago

Thanks Peter
It may not be necessary to add a geocoding library to Turbot after all.
Several services (including Google) have APIs which can be queried just by requesting the right URL. The geocoding libraries just add a neater front end for this, but don't seem to be necessary.
The question will be more about what services are available, whether there are any restrictions on using their data, and what the usage limits are. I think Google allows 2,500 queries per day for free, but that's no good for a bot with 25,000 entries!
I'll have a look at the options in the next week or two.
Tom
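Tom's point that the services can be queried "just by requesting the right URL" can be sketched with only the Python standard library, no geocoding library involved. The endpoint is Google's Geocoding web service (`https://maps.googleapis.com/maps/api/geocode/json`, with `address` and `key` query parameters). The API key is a placeholder. The 2,500-per-day limit is the free figure quoted in this thread and has not been checked against Google's current terms.

`geocode_url`, `days_needed` and `daily_batches` are hypothetical helpers. The arithmetic shows why the limit matters: about 25,000 records at 2,500 requests a day take ten days, a third of the bot's monthly cycle.

```python
import math
from typing import List
from urllib.parse import urlencode

# Google's Geocoding web service endpoint (JSON output).
GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def geocode_url(address: str, api_key: str) -> str:
    """Build the request URL for one address; no network call is made here."""
    return f"{GEOCODE_ENDPOINT}?{urlencode({'address': address, 'key': api_key})}"


def days_needed(records: int, daily_limit: int) -> int:
    """Whole days needed to geocode `records` addresses at `daily_limit` per day."""
    if daily_limit <= 0:
        raise ValueError("daily_limit must be positive")
    return math.ceil(records / daily_limit)


def daily_batches(addresses: List[str], daily_limit: int) -> List[List[str]]:
    """Split addresses into consecutive chunks that each fit in one day's quota."""
    return [addresses[i : i + daily_limit] for i in range(0, len(addresses), daily_limit)]


print(geocode_url("Brunnsgatan 3, Stockholm", "YOUR_API_KEY"))
print(days_needed(25_000, 2_500))  # 10
```

Fetching each URL (for example with `urllib.request.urlopen`) and parsing the JSON would be the remaining step. That step is where the bot would become dependent on an external API, as Peter notes above.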

(no subject)
peter.evans commented about 9 years ago

Hi Tom,
I've reviewed this bot and I think it's good to go. As soon as we can support a geocoding library, we can add a transformer and get this information added to openc.
Thanks,
Peter

Bot state update
commented about 9 years ago

A draft run succeeded; sending for review

Bot state update
commented about 9 years ago

A draft run started

Bot state update
dinotash commented about 9 years ago

The bot was pushed; scheduling a draft run

Run history

event metadata
single run snapshot draft scrape succeeded on March 02, 2015 12:36 24708 rows in about 6 hours
single run snapshot draft scrape succeeded on June 20, 2015 12:00 1687 rows in 22 minutes
single run snapshot draft scrape succeeded on June 21, 2015 02:05 24975 rows in about 9 hours
single run snapshot draft scrape failed on July 03, 2015 03:31 25141 rows in about 5 hours
single run snapshot draft scrape succeeded on July 04, 2015 03:51 25224 rows in about 6 hours
single run snapshot draft scrape succeeded on July 06, 2015 22:11 25159 rows in about 8 hours
single run snapshot final draft scrape failed on July 08, 2015 20:32 25189 rows in about 6 hours
single run snapshot draft scrape succeeded on July 09, 2015 21:22 24885 rows in about 11 hours
single run snapshot final draft scrape failed on July 10, 2015 23:31 24902 rows in about 10 hours
single run snapshot draft scrape succeeded on July 12, 2015 21:13 24900 rows in about 5 hours
single run snapshot final draft scrape succeeded on July 13, 2015 15:07 24940 rows in about 6 hours
single run snapshot 1 prescrape scrape succeeded on July 13, 2015 20:36 24940 rows in about 1 hour
single run snapshot 2 scrape succeeded on August 14, 2015 01:47 24951 rows in about 6 hours
single run snapshot 3 scrape succeeded on September 14, 2015 03:52 25254 rows in about 9 hours
single run snapshot 4 scrape succeeded on October 16, 2015 13:03 25137 rows in 3 days
single run snapshot 5 scrape succeeded on November 14, 2015 01:30 25220 rows in about 6 hours
single run snapshot 6 scrape succeeded on December 14, 2015 01:32 25309 rows in about 6 hours
single run snapshot 7 scrape succeeded on January 14, 2016 01:37 25428 rows in about 6 hours
single run snapshot 8 scrape succeeded on February 14, 2016 02:46 25850 rows in about 7 hours
single run snapshot 9 scrape succeeded on March 14, 2016 02:14 25925 rows in about 7 hours
single run snapshot 10 scrape succeeded on April 14, 2016 06:02 26012 rows in about 11 hours
single run snapshot 11 scrape succeeded on May 14, 2016 01:20 26063 rows in about 6 hours
single run snapshot 12 scrape ingestion failed on June 14, 2016 01:22 26033 rows in about 6 hours

Config

{
  "bot_id": "se-licences",
  "title": "Financial Licences of Sweden",
  "description": "This bot scrapes licensing information from Finansinspektionen, the Swedish Financial Services Authority",
  "language": "python",
  "data_type": "primary data",
  "identifying_fields": [
    "idx",
    "name"
  ],
  "files": [
    "scraper.py",
    "licence.py"
  ],
  "frequency": "monthly",
  "publisher": {
    "name": "Finansinspektionen",
    "url": "http://www.fi.se",
    "terms": "Per FI: You are allowed to use and download all the material from our website, as long as the source is mentioned.",
    "terms_url": "http://www.fi.se/Folder-EN/Startpage/About-FI/Our-website/Copyright/"
  },
  "transformers": [
    {
      "file": "licence.py",
      "data_type": "licence",
      "identifying_fields": [
        "licence_holder.entity_properties.name",
        "permissions",
        "jurisdiction_of_licence",
        "start_date"
      ]
    }
  ],
  "duplicates_allowed": "true"
}