title: Bank of Oman (bank_of_oman), scheduled
description: Bot for scraping Bank of Oman data (http://www.cbo-oman.org/related.htm)
current run state: not running
last run: single run snapshot 8 scrape succeeded on March 16, 2016 10:23
next run: scheduled at March 13, 2016 08:47
created by: helenst (Helen ST)
last reviewed by: peter.evans
Bot update
commented over 4 years ago

Bot triggered error in framework

Bot state update
commented over 4 years ago

A snapshot completed; scheduling the first run of the next snapshot

Saved vars cleared
helenst commented over 4 years ago

Bot state update
commented over 4 years ago

A run started

Bot state update
commented over 4 years ago

A snapshot completed; scheduling the first run of the next snapshot

Saved vars cleared
helenst commented over 4 years ago

Bot state update
commented over 4 years ago

A run started

Bot state update
commented over 4 years ago

A snapshot completed; scheduling the first run of the next snapshot

Saved vars cleared
helenst commented over 4 years ago

Bot state update
commented over 4 years ago

A run started

Bot state update
commented over 4 years ago

A snapshot completed; scheduling the first run of the next snapshot

Saved vars cleared
helenst commented over 4 years ago

Manual morph trigger
sebbacon commented over 4 years ago

Manually retriggered run in morph. See https://app.asana.com/0/10487231230096/57400449427076

Bot state update
commented almost 5 years ago

A run started

Bot state update
commented almost 5 years ago

A snapshot completed; scheduling the first run of the next snapshot

Bot state update
commented almost 5 years ago

A run started

Bot state update
commented almost 5 years ago

A snapshot completed; scheduling the first run of the next snapshot

Bot state update
commented almost 5 years ago

A run started

Bot state update
commented about 5 years ago

A run succeeded; scheduling the next run

Bot state update
peter.evans commented about 5 years ago

The bot was accepted; starting run to ingest reviewed data

Bot state update
commented about 5 years ago

A draft run succeeded; sending for final review

Bot state update
peter.evans commented about 5 years ago

A moderator has approved the draft bot; running a full draft for final review

Bot state update
peter.evans commented about 5 years ago

A moderator has started reviewing the draft bot

Bot state update
commented about 5 years ago

Run succeeded; sending for draft review

Bot state update
helenst commented about 5 years ago

The bot was pushed; scheduling a draft run

Bot state update
peter.evans commented about 5 years ago

The bot needs more work

Bot state update
peter.evans commented about 5 years ago

A moderator has started reviewing the draft bot

Bot state update
commented about 5 years ago

Run succeeded; sending for draft review

Bot state update
helenst commented about 5 years ago

The bot was pushed; scheduling a draft run

Bot state update
commented about 5 years ago

Run succeeded; sending for draft review

Bot state update
helenst commented about 5 years ago

The bot was pushed; scheduling a draft run

(no subject)
peter.evans commented about 5 years ago

Hi Helen,
This is looking great and your interpretation of the licence schema looks absolutely fine. Would it have been helpful to have a "complete" example licence record along with a comment explaining each field? I was considering this as a next step for documenting the schema.
The only update that I can see from a quick review of the output is that we should include something for the permissions field. This field should indicate what the licence is for. It will probably have to be hard-coded for this bot, as the site seems to use images for these headings, but we can also work it out from the URLs. E.g. those on this page http://www.cbo-oman.org/related_forign.htm are "Foreign Banks - Commercial Banks", while those on this page http://www.cbo-oman.org/related_specialBanks.htm are "Specialized banks".
Permission type can be "operating", as in this random example:
"permissions": [
  {
    "activity_name": "Common Carrier - Single Destination",
    "permission_type": "operating"
  }
],
Hope I explained that clearly, please let me know if not. If we can get this sorted then I'll give the bot a final thorough review with a view to getting it into opencorporates.
All the best & Thank you
Peter
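
The URL-based approach described in the comment above could be sketched roughly as follows. This is a minimal illustration, not the bot's actual code: the function name `permissions_for` and the lookup table `ACTIVITY_BY_PAGE` are hypothetical, and only the two page names and activity labels quoted in the comment are taken from the thread.

```python
# Hypothetical lookup table: listing page -> activity name.
# Only these two entries come from the comment above; any other
# listing page would need its own entry.
ACTIVITY_BY_PAGE = {
    "related_forign.htm": "Foreign Banks - Commercial Banks",
    "related_specialBanks.htm": "Specialized banks",
}


def permissions_for(source_url):
    """Return a permissions list for a record scraped from source_url."""
    # The page name is the last path segment of the URL.
    page = source_url.rsplit("/", 1)[-1]
    activity = ACTIVITY_BY_PAGE.get(page)
    if activity is None:
        # Unknown page: leave permissions empty rather than guess.
        return []
    return [{"activity_name": activity, "permission_type": "operating"}]
```

Returning an empty list for unmapped pages is a design choice made for this sketch; a real bot might prefer to raise an error so that new pages are noticed.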

Bot state update
commented about 5 years ago

Run succeeded; sending for draft review

Bot state update
helenst commented about 5 years ago

The bot was pushed; scheduling a draft run

Bot state update
commented about 5 years ago

Run succeeded; sending for draft review

Bot state update
helenst commented about 5 years ago

The bot was pushed; scheduling a draft run

Bot state update
commented about 5 years ago

A draft run failed

Bot state update
helenst commented about 5 years ago

The bot was pushed; scheduling a draft run

Bot state update
commented about 5 years ago

A draft run failed

Bot state update
helenst commented about 5 years ago

The bot was pushed; scheduling a draft run

(no subject)
peter.evans commented over 5 years ago

Hi Helen,
Thanks for being in touch and carrying on the bot, glad you enjoyed #FlashHacks, feel free to join us again.
I think the current version of turbot is ‘turbot-gem/0.1.9 (x86_64-linux) ruby/1.9.3’, so I expect that is why the manifest was a little light. I don’t think it will cause a problem other than that, but here’s what Seb said about versions when I asked him about updating:
“””
Seb
It depends how they installed it in the first place. This is not a question I was looking forward to being asked! (But it's inevitable...) Going forward, we would want people to use the packaged installers, but historically this has not been the case - and in some cases, people will still want to use the gem. If they used the packaged installer, then we need to release a new package which they would then install. I could do that this week but not now. If they used a gem-based install process, they can do "gem update turbot"
“””
So if you let me know how you installed originally then I can follow up about upgrading, or if you installed via the gem then you can update yourself, by the sound of it.
There’s a bit of a glitch in the simple licence docs regarding required fields: licence_jurisdiction is actually optional. company_jurisdiction is the country where the company is registered, while licence_jurisdiction is where the company is licensed to operate. So for the MTB financial missions one might come across a licensed foreign branch, where licence_jurisdiction would be the current jurisdiction and company_jurisdiction would be the home country of that company.
You’re right in saying that the simple licence is quite restricted in what it can capture, we’re working on a rich licence which will be able to capture addresses and such, but for now a lot of jurisdictions will only be capturing basic fields like company_name, regulator, jurisdiction, and jurisdiction_classification. This is one of the main reasons why we output primary data as well as transformed data, so that we can easily add rich licences further down the line.
All the best and thanks again for the work you’re putting into the bot,
Peter
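
To illustrate the distinction between the two jurisdiction fields, here is a small sketch that builds a record with licence_jurisdiction treated as optional. The helper name `simple_licence` and all of the values in the usage example are hypothetical; only the field names company_name, company_jurisdiction, licence_jurisdiction, source_url and sample_date appear in the thread, and the full simple licence schema may require more than this.

```python
def simple_licence(company_name, company_jurisdiction, source_url,
                   sample_date, licence_jurisdiction=None):
    """Build a minimal licence-style record (illustrative only)."""
    record = {
        "company_name": company_name,
        # Country where the company is registered.
        "company_jurisdiction": company_jurisdiction,
        "source_url": source_url,
        "sample_date": sample_date,
    }
    # licence_jurisdiction is optional: include it only when known.
    if licence_jurisdiction is not None:
        # Where the company is licensed to operate.
        record["licence_jurisdiction"] = licence_jurisdiction
    return record


# Hypothetical foreign branch: registered in one country,
# licensed to operate in another.
branch = simple_licence(
    "Example Bank (Oman branch)",
    "United Kingdom",
    "http://www.cbo-oman.org/related_forign.htm",
    "2015-06-24",
    licence_jurisdiction="Oman",
)
```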

Re: (turbot bot [bank_of_oman])
helenst commented over 5 years ago

Hi Peter, good to meet you too - I really enjoyed the session.
My turbot version is: turbot-gem/0.1.1 (x86_64-linux) ruby/1.9.3 - is that out of date? I can add that field in anyway.
With regard to the supported schemas, it doesn't seem like there's a lot of overlap between what I've scraped and what's required / used by the official schemas! The scraped data is mostly contact details... as far as I can tell, we just have company_name, source_url and sample_date, plus the jurisdictions if I can figure those out. What's the difference between company_jurisdiction and licence_jurisdiction?
Helen

Bot state update
peter.evans commented over 5 years ago

The bot was not approved: it needs more work

(no subject)
peter.evans commented over 5 years ago

Hi Helen,
It was good to meet you at #FlashHacks. This was the first bot that I've helped someone get on the system in person, so it's great that we managed to get it submitted.
There are still a few steps before the scraped data is in a format that can be made open and used, but all of the hard work is done now.
The main thing to note is that there is a field missing from the manifest.json. I suspect this could be due to an older version of turbot. You could run 'turbot version' and I could check, but for now we can just add "company_fields": {"name": "name"}, to the manifest. Manifest docs here: http://turbot.opencorporates.com/docs/turbot_specification
Secondly, the data needs to be transformed to fit a supported schema before it can be used. The docs for doing this are here: http://turbot.opencorporates.com/docs/examples#structured-bots
We're hoping to improve the documentation for this (and generally) so if you have any questions about headers for the simple licence (or anything) then I'd be very happy to answer them.
Thanks again for working on this scraper.
Best,
Peter
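
The manifest change suggested above could be applied by hand, or with a few lines of Python such as the sketch below. The function name `add_company_fields` is hypothetical, and the snippet assumes the manifest is an ordinary JSON object; it is not part of turbot itself.

```python
import json


def add_company_fields(manifest):
    """Return a copy of the manifest with company_fields set if missing."""
    updated = dict(manifest)
    # Keep any existing value; otherwise use the mapping from the comment.
    updated.setdefault("company_fields", {"name": "name"})
    return updated


def update_manifest_file(path="manifest.json"):
    """Rewrite the manifest at `path` in place (assumed file location)."""
    with open(path) as f:
        manifest = json.load(f)
    with open(path, "w") as f:
        json.dump(add_company_fields(manifest), f, indent=2)
```

Calling `update_manifest_file()` from the bot's directory would rewrite the file; `add_company_fields` on its own leaves the input dictionary untouched.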

Bot state update
commented over 5 years ago

A moderator has started reviewing the bot

Bot state update
commented over 5 years ago

A draft run succeeded; sending for review

Bot state update
commented over 5 years ago

A draft run started

Bot state update
commented over 5 years ago

The bot was pushed; scheduling a draft run

Bot state update
commented over 5 years ago

A draft run succeeded; sending for review

Bot state update
commented over 5 years ago

A draft run started

Bot state update
commented over 5 years ago

The bot was pushed; scheduling a draft run

Run history

event | metadata
single run snapshot draft scrape succeeded on January 28, 2015 19:53 | 40 rows in less than a minute
single run snapshot draft scrape succeeded on January 28, 2015 19:58 | 40 rows in less than a minute
single run snapshot draft scrape failed on June 24, 2015 18:50 | 0 rows in less than a minute
single run snapshot draft scrape failed on June 24, 2015 19:18 | 40 rows in less than a minute
single run snapshot draft scrape succeeded on June 24, 2015 19:20 | 40 rows in less than a minute
single run snapshot draft scrape succeeded on July 06, 2015 19:57 | 40 rows in less than a minute
single run snapshot draft scrape succeeded on July 08, 2015 08:09 | 40 rows in less than a minute
single run snapshot draft scrape succeeded on July 08, 2015 17:21 | 40 rows in less than a minute
single run snapshot draft scrape succeeded on July 12, 2015 16:15 | 40 rows in less than a minute
single run snapshot final draft scrape succeeded on July 13, 2015 08:47 | 40 rows in less than a minute
single run snapshot 1 prescrape scrape succeeded on July 13, 2015 08:47 | 40 rows in less than a minute
single run snapshot 2 scrape succeeded on August 13, 2015 08:48 | 40 rows in less than a minute
single run snapshot 3 scrape succeeded on September 13, 2015 08:48 | 40 rows in less than a minute
single run snapshot 4 scrape succeeded on November 02, 2015 13:00 | 40 rows in less than a minute
single run snapshot 5 scrape succeeded on November 13, 2015 08:48 | 40 rows in less than a minute
single run snapshot 6 scrape succeeded on December 13, 2015 08:48 | 40 rows in 1 minute
single run snapshot 7 scrape succeeded on January 13, 2016 08:48 | 40 rows in less than a minute
single run snapshot 8 scrape succeeded on March 16, 2016 10:23 | 40 rows in less than a minute
single run snapshot 9 scrape scheduled on March 13, 2016 08:47 | 0 rows

Config

{
  "bot_id": "bank_of_oman",
  "title": "Bank of Oman",
  "description": "Bot for scraping Bank of Oman data (http://www.cbo-oman.org/related.htm)",
  "language": "python",
  "data_type": "primary data",
  "identifying_fields": [
    "name",
    "source_url"
  ],
  "company_fields": {
    "name": "name"
  },
  "files": [
    "scraper.py",
    "licence_transformer.py",
    "country.json"
  ],
  "transformers": [
    {
      "file": "licence_transformer.py",
      "data_type": "licence",
      "identifying_fields": [
        "licence_holder.entity_properties.name"
      ]
    }
  ],
  "frequency": "monthly",
  "publisher": {
    "name": "Central Bank of Oman",
    "url": "http://www.cbo-oman.org/",
    "terms": "n/a",
    "terms_url": "n/a"
  }
}
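
The config above names licence_transformer.py as the transformer, with licence_holder.entity_properties.name as its identifying field. The following is a rough sketch of what such a transformer might look like, not the bot's real code. It assumes the transformer receives one JSON object per line and emits one JSON object per line; the functions `transform` and `run` are hypothetical, and apart from the licence_holder.entity_properties.name path, the output shape is a guess rather than the official licence schema.

```python
import json


def transform(row):
    """Map one primary-data row to a nested licence-style record."""
    return {
        # This nested path matches the identifying field in the config.
        "licence_holder": {"entity_properties": {"name": row["name"]}},
        # Carried over from the primary data's identifying fields.
        "source_url": row["source_url"],
    }


def run(lines, out):
    """Transform each JSON line in `lines`, writing one JSON line per record."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        out.write(json.dumps(transform(json.loads(line))) + "\n")
```

Under the stated assumption about line-based input and output, a script could call `run(sys.stdin, sys.stdout)` after importing `sys`.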