Hi Peter
Here's my original message. I think turbot doesn't like messages with attachments.
I think I need to flick this back to you guys - for both technical and business reasons.
I had a look at geopy, which is basically a neater way of using the various mapping APIs out there, like Google Maps, OpenStreetMap and Bing Maps. You simply tell geopy to use your favourite one of these services to translate the address in words into a geographic location, from which you can determine the country.
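For reference, the lookup looks roughly like the sketch below. It assumes geopy's GoogleV3 geocoder and an API key of your own. The helper names (country_from_google_raw, country_of, make_google_geocode) are made up for illustration and are not part of geopy. It also assumes the raw result follows the Google Geocoding response layout, where the country sits in the "address_components" list.

```python
from typing import Any, Callable, Optional


def country_from_google_raw(raw: dict) -> Optional[str]:
    """Return the country short name (e.g. 'SE') from a Google-style raw result."""
    for component in raw.get("address_components", []):
        # Google tags the country-level component with the type "country".
        if "country" in component.get("types", []):
            return component.get("short_name")
    return None


def country_of(address: str, geocode: Callable[[str], Any]) -> Optional[str]:
    """Geocode a free-text address and return its country code, or None."""
    location = geocode(address)
    if location is None:  # the service found no match
        return None
    return country_from_google_raw(location.raw)


def make_google_geocode(api_key: str) -> Callable[[str], Any]:
    """Build a geocode function backed by Google (needs geopy and a real key)."""
    from geopy.geocoders import GoogleV3  # imported lazily; only needed here

    return GoogleV3(api_key=api_key).geocode
```

Usage would be something like country_of("Drottninggatan 1, Stockholm", make_google_geocode("YOUR_KEY")). Passing the geocode function in as an argument makes it easy to swap in a different service later.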
Firstly, the results are a little underwhelming on an initial test using the Google Maps API. Here is a list of addresses from the primary data and the countries in which Google thought they were: https://github.com/tomcurtis/opencorporates/blob/master/se-licences/geopy-countries-test.txt. It’s not terrible, but it can’t seem to work out that Stockholm and other Swedish cities are in Sweden. It can’t even decide whether they’re in the USA, Poland or the Netherlands. Obviously, that’s a bit of a problem for a Swedish data set. It also thought Lisbon and Amsterdam were in the US.
Secondly, the sites which run the mapping APIs know they have valuable data which is in demand. All of these APIs require you to sign up for an API key, with various limits on the rate at which you can use the service, and the maximum number of API calls per day.
As far as I can see, none of these services allows more than 2,500 requests per day on its free tier. My Swedish bot captured around 25,000 records for which we may need to find the country, so a single pass would take at least ten days at that rate. In addition, some of the free versions of the APIs require you to display certain messages with the data (e.g. "data copyright Mapquest"), which doesn’t seem very opencorporates.
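To put rough numbers on that, here is a small sketch of the quota arithmetic, plus one way to cut the number of calls by only looking up each distinct address once. The 25,000 and 2,500 figures are the ones above. Whether the records actually contain many repeated addresses is an assumption I haven't checked, and the function names are only illustrative.

```python
import math
from typing import Iterable, List


def days_needed(n_lookups: int, daily_limit: int) -> int:
    """Whole days required to make n_lookups calls under a daily quota."""
    return math.ceil(n_lookups / daily_limit)


def normalise(address: str) -> str:
    """Lower-case and collapse whitespace so trivial variants count as one address."""
    return " ".join(address.lower().split())


def plan_batches(addresses: Iterable[str], daily_limit: int) -> List[List[str]]:
    """Deduplicate addresses (keeping first-seen order) and split into daily batches."""
    unique = list(dict.fromkeys(normalise(a) for a in addresses))
    return [unique[i:i + daily_limit] for i in range(0, len(unique), daily_limit)]


print(days_needed(25_000, 2_500))  # 10 days if every record needs its own call
```

Each batch from plan_batches could then be run on a separate day, with the results saved so that nothing is looked up twice.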
I think this is one where you guys will have to evaluate whether you have sufficient demand across your other sources to investigate further and whether there is sufficient benefit to justify paying for a less restrictive licence, and weigh up which service to use.
How do you want to proceed with this bot? At the moment, I think the only thing holding it up is the fact that jurisdiction_classification is a mandatory field for the simple_licence schema.
Hope this helps
Tom