Title: Iceland Supervised Entities Scraper (mission_609)
Status: running
Description: Scrapes Iceland Supervised Entities
Current run state: scraping or awaiting scrape
Last run: single run, snapshot 20, scrape succeeded on February 19, 2017 10:40
Next run: enqueued for a run
Created by: objectgroup (Lisa Evans)
Last reviewed by: peter.evans
State changed to running for run #10120, snapshot 21
commented over 3 years ago

A run started

State changed to scheduled for run #10120, snapshot 21
commented over 3 years ago

A snapshot completed; scheduling the first run of the next snapshot

Saved vars cleared
objectgroup commented over 3 years ago

State changed to ingesting_data for run #9944, snapshot 20
commented over 3 years ago

The run's output is being ingested

State changed to storing_data for run #9944, snapshot 20
commented over 3 years ago

The run's output is being stored

State changed to running for run #9944, snapshot 20
commented over 3 years ago

A run started

State changed to scheduled for run #9944, snapshot 20
commented over 3 years ago

A snapshot completed; scheduling the first run of the next snapshot

Saved vars cleared
objectgroup commented over 3 years ago

State changed to ingesting_data for run #9678, snapshot 19
commented over 3 years ago

The run's output is being ingested

State changed to storing_data for run #9678, snapshot 19
commented over 3 years ago

The run's output is being stored

State changed to running for run #9678, snapshot 19
commented over 3 years ago

A run started

State changed to scheduled for run #9678, snapshot 19
commented over 3 years ago

A snapshot completed; scheduling the first run of the next snapshot

Saved vars cleared
objectgroup commented over 3 years ago

State changed to ingesting_data for run #9401, snapshot 18
commented over 3 years ago

The run's output is being ingested

State changed to storing_data for run #9401, snapshot 18
commented over 3 years ago

The run's output is being stored

State changed to running for run #9401, snapshot 18
commented over 3 years ago

A run started

State changed to scheduled for run #9401, snapshot 18
commented over 3 years ago

A snapshot completed; scheduling the first run of the next snapshot

Saved vars cleared
objectgroup commented over 3 years ago

State changed to ingesting_data for run #9174, snapshot 17
commented over 3 years ago

The run's output is being ingested

State changed to storing_data for run #9174, snapshot 17
commented over 3 years ago

The run's output is being stored

State changed to running for run #9174, snapshot 17
commented over 3 years ago

A run started

State changed to scheduled for run #9174, snapshot 17
commented over 3 years ago

A snapshot completed; scheduling the first run of the next snapshot

Saved vars cleared
objectgroup commented over 3 years ago

State changed to ingesting_data for run #8909, snapshot 16
commented over 3 years ago

The run's output is being ingested

State changed to storing_data for run #8909, snapshot 16
commented over 3 years ago

The run's output is being stored

State changed to running for run #8909, snapshot 16
commented over 3 years ago

A run started

State changed to scheduled for run #8909, snapshot 16
commented almost 4 years ago

A snapshot completed; scheduling the first run of the next snapshot

Saved vars cleared
objectgroup commented almost 4 years ago

State changed to ingesting_data for run #8581, snapshot 15
commented almost 4 years ago

The run's output is being ingested

State changed to storing_data for run #8581, snapshot 15
commented almost 4 years ago

The run's output is being stored

State changed to running for run #8581, snapshot 15
commented almost 4 years ago

A run started

State changed to scheduled for run #8581, snapshot 15
commented almost 4 years ago

A snapshot completed; scheduling the first run of the next snapshot

Saved vars cleared
objectgroup commented almost 4 years ago

State changed to ingesting_data for run #8238, snapshot 14
commented almost 4 years ago

The run's output is being ingested

State changed to storing_data for run #8238, snapshot 14
commented almost 4 years ago

The run's output is being stored

State changed to running for run #8238, snapshot 14
commented almost 4 years ago

A run started

State changed to scheduled for run #8238, snapshot 14
commented almost 4 years ago

A snapshot completed; scheduling the first run of the next snapshot

Saved vars cleared
objectgroup commented almost 4 years ago

State changed to ingesting_data for run #7936, snapshot 13
commented almost 4 years ago

The run's output is being ingested

State changed to storing_data for run #7936, snapshot 13
commented almost 4 years ago

The run's output is being stored

State changed to running for run #7936, snapshot 13
commented almost 4 years ago

A run started

State changed to scheduled for run #7936, snapshot 13
commented about 4 years ago

A snapshot completed; scheduling the first run of the next snapshot

Saved vars cleared
objectgroup commented about 4 years ago

State changed to ingesting_data for run #7510, snapshot 12
commented about 4 years ago

The run's output is being ingested

State changed to storing_data for run #7510, snapshot 12
commented about 4 years ago

The run's output is being stored

State changed to running for run #7510, snapshot 12
commented about 4 years ago

A run started

State changed to scheduled for run #7510, snapshot 12
commented about 4 years ago

A snapshot completed; scheduling the first run of the next snapshot

Saved vars cleared
objectgroup commented about 4 years ago

State changed to ingesting_data for run #7024, snapshot 11
commented about 4 years ago

The run's output is being ingested

State changed to storing_data for run #7024, snapshot 11
commented about 4 years ago

The run's output is being stored

State changed to running
commented about 4 years ago

For run #7024:
A run started

Bot state update
commented about 4 years ago

A snapshot completed; scheduling the first run of the next snapshot

Saved vars cleared
objectgroup commented about 4 years ago

Bot state update
commented about 4 years ago

The run's output is being ingested

Bot state update
commented about 4 years ago

The run's output is being stored

Bot state update
Alex Skene commented about 4 years ago

A new snapshot was restarted by the moderator following a failed run

Bot state update
commented over 4 years ago

A run started

Bot state update
commented over 4 years ago

A snapshot completed; scheduling the first run of the next snapshot

Saved vars cleared
objectgroup commented over 4 years ago

Bot state update
commented over 4 years ago

A run started

Bot state update
commented over 4 years ago

A snapshot completed; scheduling the first run of the next snapshot

Saved vars cleared
objectgroup commented over 4 years ago

Bot state update
commented over 4 years ago

A run started

Bot state update
commented over 4 years ago

A snapshot completed; scheduling the first run of the next snapshot

Saved vars cleared
objectgroup commented over 4 years ago

Bot state update
commented over 4 years ago

A run started

Bot state update
commented over 4 years ago

A snapshot completed; scheduling the first run of the next snapshot

Saved vars cleared
objectgroup commented over 4 years ago

Bot state update
commented over 4 years ago

A run started

Bot state update
commented almost 5 years ago

A snapshot completed; scheduling the first run of the next snapshot

Bot state update
commented almost 5 years ago

A run started

Bot state update
commented almost 5 years ago

A snapshot completed; scheduling the first run of the next snapshot

Bot state update
commented almost 5 years ago

A run started

Bot state update
commented almost 5 years ago

A run succeeded; scheduling the next run

Bot state update
commented almost 5 years ago

A run started

Bot state update
commented about 5 years ago

A run succeeded; scheduling the next run

Bot state update
commented about 5 years ago

A run started

Bot state update
commented about 5 years ago

A run succeeded; scheduling the next run

Bot state update
peter.evans commented about 5 years ago

The bot was accepted; starting run to ingest reviewed data

Bot state update
commented about 5 years ago

A draft run succeeded; sending for final review

Bot state update
peter.evans commented about 5 years ago

A moderator has approved the draft bot; running a full draft for final review

Bot state update
peter.evans commented about 5 years ago

A moderator has started reviewing the draft bot

Bot state update
peter.evans commented about 5 years ago

A moderator has started reviewing the draft bot

(no subject)
peter.evans commented about 5 years ago

Hi Lisa,
No problem - that seems to be working perfectly now, thank you for the quick fix. I've given the scraper another review and it is ready to accept and import.
Thank you for working on this bot. Please do get in touch if you have any questions or if there is anything that I can help with.
Peter

Re: (turbot bot [mission_609])
objectgroup commented about 5 years ago

Thanks Peter, that is what I was missing. I have submitted the bot again.

Bot state update
commented about 5 years ago

Run succeeded; sending for draft review

Bot state update
objectgroup commented about 5 years ago

The bot was pushed; scheduling a draft run

Bot state update
commented about 5 years ago

A draft run failed

Bot state update
objectgroup commented about 5 years ago

The bot was pushed; scheduling a draft run

(no subject)
peter.evans commented about 5 years ago

Hi Lisa,
Thank you for pushing those changes. Any extra files to be included in the bot should be added to the "files" array in the manifest:
"files": [
  "scraper.py"
],
Hope that helps.
Thanks,
Peter
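
A quick check before pushing could catch a transformer file that is missing from the "files" array. The following is only a sketch: the function name `missing_files` is hypothetical, and the manifest keys it reads ("files", "transformers", "file") are taken from this bot's config rather than from the turbot documentation.

```python
import json


def missing_files(manifest):
    """Return transformer files that are not listed in the manifest's "files" array."""
    listed = set(manifest.get("files", []))
    wanted = [t["file"] for t in manifest.get("transformers", [])]
    return [name for name in wanted if name not in listed]


# Example manifest fragment: the transformer is declared but not listed in "files".
manifest = json.loads("""
{
  "files": ["scraper.py"],
  "transformers": [
    {"file": "licence_transformer.py", "data_type": "simple-licence"}
  ]
}
""")

print(missing_files(manifest))
```

An empty list means every declared transformer file is also listed in "files".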

Re: (turbot bot [mission_609])
objectgroup commented about 5 years ago

Hi Peter, I've made the changes you suggested in your last message (removed the super text and added a transformer) and resubmitted, but it seems turbot didn't pick up the licence_transformer.py file I added to the mission directory, and the bot failed. Do I need to do something extra to add the transformer file?
Best
Lisa

Bot state update
commented about 5 years ago

A draft run failed

Bot state update
objectgroup commented about 5 years ago

The bot was pushed; scheduling a draft run

Bot state update
commented about 5 years ago

A draft run failed

Bot state update
objectgroup commented about 5 years ago

The bot was pushed; scheduling a draft run

(no subject)
peter.evans commented over 5 years ago

Hi Lisa,
Thank you for your response. Apologies for saying that we weren’t scraping every record; I’ve checked again and that seems fine.
I’ve had another look, as it had been a while since I last looked at this. I found a couple of small things that you might want to update to make the output look perfect, and I’ve also included some advice about fields for the transformer.
One company name currently includes a reference notation (“1)”). We could optionally capture the text at the bottom of the page that this refers to for this record, but I think we should at least remove it from the company name. This is the one:
"VÖRSLUAÐILAR LÍFEYRISSPARNAÐAR - Pension Savings Depositories1)"
Regarding the transformer: although we don’t know what some of the headers mean, I think we know enough to write a transformer without any outside help. Have a look at the mappings below, with transformer fields on the left and your current primary data output headers on the right.
source_url => source_url
sample_date => sample_date
company_name => company_name
company_jurisdiction => “is”
licence_number => id
jurisdiction_classification => type
Let me know if that isn’t explained very well.
All the best,
Peter
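
The mapping and the name clean-up described in this message could be sketched roughly as follows. This is an illustration only: the function names `clean_company_name` and `to_simple_licence` are hypothetical, the output fields are those listed in the message and have not been checked against the simple-licence schema, and how turbot passes records to a transformer is not covered here. All values in the sample record except the company name and the publisher URL are made up.

```python
import json
import re


def clean_company_name(name):
    """Strip a trailing footnote marker such as "1)" from a company name.

    Caveat: a name that legitimately ends in digits followed by ")" would
    also be trimmed, so this pattern is an assumption about the source data.
    """
    return re.sub(r"\d+\)\s*$", "", name).rstrip()


def to_simple_licence(record):
    """Map one primary-data record to the licence fields listed in the message."""
    return {
        "source_url": record["source_url"],
        "sample_date": record["sample_date"],
        "company_name": clean_company_name(record["company_name"]),
        "company_jurisdiction": "is",  # fixed value, not read from the record
        "licence_number": record["id"],
        "jurisdiction_classification": record["type"],
    }


# Hypothetical input record for illustration.
record = {
    "source_url": "http://en.fme.is/supervision/supervised-entities/",
    "sample_date": "2015-05-18",
    "company_name": "VÖRSLUAÐILAR LÍFEYRISSPARNAÐAR - Pension Savings Depositories1)",
    "id": "example-id",
    "type": "example-type",
}

print(json.dumps(to_simple_licence(record), ensure_ascii=False))
```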

Re: (turbot bot [mission_609])
objectgroup commented over 5 years ago

Hi Peter,
Thank you for reviewing this mission for me. I have checked the results of the scraper. In the output file there are 121 entries found and on the source page there are 121 entries in the table, so I'm pretty sure the scraper is getting all the records. I think the confusion comes because the turbot page that displays the results of the scrape here http://turbot.opencorporates.com/bots/mission_609/runs/draft/data_types/primary%20data says it has 121 entries - but then it only displays 51 entries in total over the two pages of output. If the turbot web page is for a sample of the scraper output then that is fine.
I'm really happy to write a transformer but I think, to do that, I need more details from the Icelandic authority about what the columns represent in the table on their website. I've emailed them and will follow it up with a call as it has been a week now since I asked for the column headings and still no reply.
Thanks again and look forward to your reply,
Lisa

(no subject)
peter.evans commented over 5 years ago

Hi Lisa,
Thank you very much for writing this scraper for Mission 609. I've reviewed it and it looks to be in very good shape. There are a couple of things I spotted that we could do to make it even better. Firstly, I noticed that the very last item in the HTML list at the data source does not seem to be getting scraped. This could be intentional, or maybe the website has been updated, but it is worth checking. It is also very desirable to output the data as primary data and to output a second set of standardised data using a transformer (and the simple licence schema). Have you looked into transformers at all? I'll provide some links in case you feel like having a go, and if you need any assistance you just have to ask.
supported licence types: http://turbot.opencorporates.com/docs/supported_data_types
transformer examples: http://turbot.opencorporates.com/docs/examples#structured-bots
Thanks again for writing this bot, and if you need any assistance do feel free to get in touch. I'll also send an invite to our Slack group if you have not already been invited.
All the best,
Peter

Bot state update
commented over 5 years ago

A draft run succeeded; sending for review

Bot state update
commented over 5 years ago

A draft run started

Bot state update
objectgroup commented over 5 years ago

The bot was pushed; scheduling a draft run

Run history

event metadata
single run snapshot draft scrape succeeded on March 15, 2015 20:00 121 rows in less than a minute
single run snapshot draft scrape failed on May 17, 2015 18:30 119 rows in less than a minute
single run snapshot draft scrape failed on May 17, 2015 19:32 119 rows in less than a minute
single run snapshot draft scrape failed on May 18, 2015 09:55 119 rows in less than a minute
single run snapshot draft scrape succeeded on May 18, 2015 10:16 119 rows in less than a minute
single run snapshot final draft scrape succeeded on May 18, 2015 12:05 119 rows in less than a minute
single run snapshot 1 prescrape scrape succeeded on May 18, 2015 12:05 119 rows in less than a minute
single run snapshot 2 scrape succeeded on June 18, 2015 12:06 119 rows in less than a minute
single run snapshot 3 scrape succeeded on July 18, 2015 12:06 118 rows in less than a minute
single run snapshot 4 scrape succeeded on August 18, 2015 12:06 118 rows in less than a minute
single run snapshot 5 scrape succeeded on September 18, 2015 12:06 116 rows in less than a minute
single run snapshot 6 scrape succeeded on October 18, 2015 12:06 109 rows in 1 minute
single run snapshot 7 scrape succeeded on November 18, 2015 12:06 108 rows in less than a minute
single run snapshot 8 scrape succeeded on December 18, 2015 12:06 108 rows in less than a minute
single run snapshot 9 scrape succeeded on January 18, 2016 12:06 108 rows in less than a minute
single run snapshot 10 scrape errored on February 18, 2016 12:07 0 rows in 1 minute
single run snapshot 10 scrape succeeded on April 19, 2016 10:40 107 rows in less than a minute
single run snapshot 11 scrape succeeded on May 19, 2016 10:41 107 rows in less than a minute
single run snapshot 12 scrape succeeded on June 19, 2016 10:40 107 rows in less than a minute
single run snapshot 13 scrape succeeded on July 19, 2016 10:40 0 rows in less than a minute
single run snapshot 14 scrape succeeded on August 19, 2016 10:40 108 rows in less than a minute
single run snapshot 15 scrape succeeded on September 19, 2016 10:40 108 rows in less than a minute
single run snapshot 16 scrape succeeded on October 19, 2016 10:40 108 rows in less than a minute
single run snapshot 17 scrape succeeded on November 19, 2016 10:40 108 rows in less than a minute
single run snapshot 18 scrape succeeded on December 19, 2016 10:40 107 rows in less than a minute
single run snapshot 19 scrape succeeded on January 19, 2017 10:40 107 rows in less than a minute
single run snapshot 20 scrape succeeded on February 19, 2017 10:40 107 rows in less than a minute
single run snapshot 21 scrape scheduled on March 19, 2017 10:40 0 rows
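
The history above includes a run reported as succeeded with 0 rows (snapshot 13), sitting between runs of 107 and 108 rows. A check like the one below could flag such runs for a closer look. It is a sketch only: the regular expression assumes the line layout shown above, and `parse_runs`, `flag_suspect_runs` and the 50% threshold are hypothetical choices, not part of turbot.

```python
import re

# Matches lines such as:
# "single run snapshot 12 scrape succeeded on June 19, 2016 10:40 107 rows in less than a minute"
LINE = re.compile(
    r"snapshot (?P<snapshot>.+?) scrape (?P<status>\w+) on "
    r"(?P<when>\w+ \d{1,2}, \d{4} \d{2}:\d{2}) (?P<rows>\d+) rows"
)


def parse_runs(text):
    """Parse run-history lines into dicts with snapshot, status and row count."""
    runs = []
    for line in text.splitlines():
        m = LINE.search(line)
        if m:
            runs.append({
                "snapshot": m["snapshot"],
                "status": m["status"],
                "rows": int(m["rows"]),
            })
    return runs


def flag_suspect_runs(runs, min_ratio=0.5):
    """Return succeeded runs whose row count is below min_ratio of the last unflagged run."""
    suspects = []
    previous = None
    for run in runs:
        if run["status"] != "succeeded":
            continue
        if previous is not None and run["rows"] < previous * min_ratio:
            suspects.append(run)
        else:
            previous = run["rows"]
    return suspects


# Three lines copied from the run history above.
HISTORY = """\
single run snapshot 12 scrape succeeded on June 19, 2016 10:40 107 rows in less than a minute
single run snapshot 13 scrape succeeded on July 19, 2016 10:40 0 rows in less than a minute
single run snapshot 14 scrape succeeded on August 19, 2016 10:40 108 rows in less than a minute
"""

for run in flag_suspect_runs(parse_runs(HISTORY)):
    print("snapshot", run["snapshot"], "returned", run["rows"], "rows")
```

Comparing against the last unflagged run, rather than the immediately preceding one, keeps a single empty run from making the following normal run look like an anomaly.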

Config

{
  "bot_id": "mission_609",
  "title": "Iceland Supervised Entities Scraper",
  "description": "Scrapes Iceland Supervised Entities",
  "language": "python",
  "data_type": "primary data",
  "identifying_fields": [
    "type",
    "company_name"
  ],
  "company_fields": {
    "name": "company_name"
  },
  "files": [
    "scraper.py",
    "licence_transformer.py"
  ],
  "transformers": [
    {
      "file": "licence_transformer.py",
      "data_type": "simple-licence",
      "identifying_fields": [
        "jurisdiction_classification",
        "company_name"
      ]
    }
  ],
  "frequency": "monthly",
  "publisher": {
    "name": "The Financial Supervisory Authority, Iceland",
    "url": "http://en.fme.is/supervision/supervised-entities/",
    "terms": "",
    "terms_url": ""
  }
}