title Argentinean financial institutions (ar-licences) pending_draft_review
description This bot extracts information on Argentinean banks, exchange companies, financial institutions in receivership, credit unions and financial trusts, from the Central Bank of the Republic of Argentina.
current run state not running
last run single run snapshot draft scrape succeeded on April 05, 2015 11:23
next run n/a
created by dinotash (Tom Curtis)
last reviewed by peter.evans
(no subject)
peter.evans commented almost 9 years ago

Hi Tom,
Thanks for letting me know about this. That is a real shame indeed - this bot was very impressive, particularly in the depth of the primary data. I don't think that the new website has info that is very easy to scrape - I suspect there may be some of what we had before buried deep within one of these PDF files but that doesn't look worth writing a bot for.
I'll have another look and see if I can spot anything.
Thanks,
Peter

Re: (turbot bot [ar-licences])
dinotash commented almost 9 years ago

Hi Peter
I’m afraid it’s back to the drawing board with this bot. It is extremely frustrating!
I’ve just had a look at the BCRA website, and they’ve completely overhauled it. The new English version doesn’t seem to contain any information about the institutions they regulate, let alone the previous level of detail. Trying to use the old URLs redirects to the new homepage. The new English site is at: http://www.bcra.gob.ar/Varios/vr090000.asp
It looks like the data is still available in Spanish, by following the menu to Sistema Financiero > Consulta Por Tipo de Entidades. However, there are two problems. Firstly, the format is different, both in the page layouts/structures and in some of the formatting information (e.g. text colour) I used to help interpret the financial data. Secondly, my rusty schoolboy Spanish is not up to the job of understanding what data is there to scrape.
I’m afraid you will need to ask someone else with better Spanish to have a go at this. Obviously, I’d be very happy for you to share my existing bot code with them.
Thanks
Tom

(no subject)
peter.evans commented about 9 years ago

Hi Tom,
Thanks for the quick response. It sounds like it might be worth checking out the pages that didn't scrape this time, but if it's just the website not being 100% stable then we should pick up those records as well as the bot runs over time. I'm actually not sure how data from an intermittent website is handled by the framework, so that might be something worth looking at.
Thanks,
Peter

Re: (turbot bot [ar-licences])
dinotash commented about 9 years ago

Hi Peter
I'll take another look at this. I'll admit that there was so much data that I was happy if the bot made it through to the end.
I'll take a look at why certain pages didn't scrape. It's set up to give an "unable to load" message and move onto the next page if anything happens that would have caused it to halt.
There may be legitimate reasons for not being able to load pages. I don't think the site is hosted on the most stable server. During development, I had a couple of times where pages just timed out.
There were also times when it looked like database problems on the other end. A page would sometimes show nothing but "Not presently available" despite working a minute before or after.
In other cases though it could be an error, so I will check.
Happy to change the adjective in the manifest.
Tom
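
The skip-and-continue behaviour described in the comment above, combined with a retry for the transient failures it mentions (timeouts and the "Not presently available" page), could be sketched roughly as follows. This is not the bot's actual code: `fetch`, `load_with_retry`, `scrape_all`, `PageUnavailable` and the retry parameters are all hypothetical names chosen for illustration, and `fetch` stands in for whatever HTTP call the scraper really uses.

```python
import time

PLACEHOLDER = "Not presently available"  # placeholder text quoted in the thread


class PageUnavailable(Exception):
    """Raised when a page cannot be loaded after all attempts (hypothetical)."""


def load_with_retry(url, fetch, max_attempts=3, delay_seconds=0.0):
    """Call fetch(url) up to max_attempts times.

    A response containing the placeholder text is treated as a failure,
    the same as an exception (for example a timeout) raised by fetch.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            html = fetch(url)
            if PLACEHOLDER not in html:
                return html
            last_error = "placeholder page returned"
        except Exception as exc:  # assumption: any fetch error is worth a retry
            last_error = repr(exc)
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    raise PageUnavailable(f"unable to load {url}: {last_error}")


def scrape_all(urls, fetch, **retry_options):
    """Return (pages, skipped): HTML keyed by URL, plus URLs that were given up on."""
    pages, skipped = {}, []
    for url in urls:
        try:
            pages[url] = load_with_retry(url, fetch, **retry_options)
        except PageUnavailable as exc:
            print(exc)  # report and move on rather than halting the run
            skipped.append(url)
    return pages, skipped
```

Keeping the skipped URLs in a list, rather than only printing a message, would make it easier to tell a page that failed once and recovered from one that never loaded during the run.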

(no subject)
peter.evans commented about 9 years ago

Hi Tom,
Thank you for being so thorough with this Mission. The amount of data captured is extremely impressive!
I’ve spent quite a lot of time looking at the extra data captured into primary data and it looks good - we can also have a closer look at that when it comes time to transform various parts of it.
The status reporting that you built in is very useful - it’s flagged up the areas where it looks like we’re not scraping everything: https://turbot.opencorporates.com/bots/ar-licences/runs/2348/metadata
Specifically it looks like it is only scraping the first 8 institutions in receivership. There’s also an ‘unable to load’ debugging message for financial trusts, which I don’t think are being scraped currently, unless I’m missing something. Otherwise it looks like all categories are being scraped.
There is also an ‘unable to load’ message for private banks, public banks, and financial institutions, but they seem to have scraped okay anyway.

The only other thing I can see to point out is that in the manifest we refer to “Argentinean” which I think should be “Argentinian” or even better “Argentine”.
Thanks again for working on this scraper - please let me know if I’ve misinterpreted anything in this review, it’s not unlikely given the complexity of the data :)
Best wishes,
Peter

Bot state update
commented about 9 years ago

A draft run succeeded; sending for review

Bot state update
commented about 9 years ago

A draft run started

Bot state update
dinotash commented about 9 years ago

The bot was pushed; scheduling a draft run

Bot state update
commented about 9 years ago

A run failed

Bot state update
commented about 9 years ago

A draft run started

Bot state update
dinotash commented about 9 years ago

The bot was pushed; scheduling a draft run

Bot state update
commented about 9 years ago

A run failed

Bot state update
commented about 9 years ago

A draft run started

Bot state update
dinotash commented about 9 years ago

The bot was pushed; scheduling a draft run

Bot state update
commented about 9 years ago

A run failed

Bot state update
commented about 9 years ago

A draft run started

Bot state update
dinotash commented about 9 years ago

The bot was pushed; scheduling a draft run

Bot state update
commented about 9 years ago

A run failed

Bot state update
commented about 9 years ago

A draft run started

Bot state update
dinotash commented about 9 years ago

The bot was pushed; scheduling a draft run

Bot state update
commented about 9 years ago

A run failed

Bot state update
commented about 9 years ago

A draft run started

Bot state update
dinotash commented about 9 years ago

The bot was pushed; scheduling a draft run

Bot state update
commented about 9 years ago

A run failed

Bot state update
commented about 9 years ago

A draft run started

Bot state update
dinotash commented about 9 years ago

The bot was pushed; scheduling a draft run

Run history

event metadata
single run snapshot draft scrape failed on April 04, 2015 21:57 4 rows in 8 minutes
single run snapshot draft scrape failed on April 04, 2015 23:19 4 rows in 10 minutes
single run snapshot draft scrape failed on April 05, 2015 07:58 4 rows in 8 minutes
single run snapshot draft scrape failed on April 05, 2015 08:17 4 rows in 7 minutes
single run snapshot draft scrape failed on April 05, 2015 09:32 137 rows in about 1 hour
single run snapshot draft scrape failed on April 05, 2015 10:01 24 rows in 21 minutes
single run snapshot draft scrape succeeded on April 05, 2015 11:23 200 rows in about 1 hour

Config

{
  "bot_id": "ar-licences",
  "title": "Argentinean financial institutions",
  "description": "This bot extracts information on Argentinean banks, exchange companies, financial institutions in receivership, credit unions and financial trusts, from the Central Bank of the Republic of Argentina.",
  "language": "python",
  "data_type": "primary data",
  "identifying_fields": [
    "name",
    "type_of_institution",
    "address"
  ],
  "files": [
    "scraper.py",
    "licence.py"
  ],
  "frequency": "monthly",
  "publisher": {
    "name": "Banco Central de la República Argentina",
    "url": "http://www.bcra.gov.ar/index_i.htm",
    "terms": "Users are allowed to view, copy or print, either wholly or in part, any data contained in this site, provided that the content of such data is used for personal, educational or professional purposes on a non-profitable basis, and the source from which such information arises is dully quoted. Redistribution, dissemination, retransmission or marketing of any data contained in this site under any title and modality is fully forbidden without the previous and express authorization from BCRA.",
    "terms_url": "http://www.bcra.gov.ar/varios/vr030000_i.asp"
  },
  "transformers": [
    {
      "file": "licence.py",
      "data_type": "simple-licence",
      "identifying_fields": [
        "company_name"
      ]
    }
  ]
}