Turbot is a tool that lets members of the open data community write bots that harvest publicly available corporate data, so that it can be imported into the OpenCorporates database and made available as open data.
A bot is simply a computer program that programmatically retrieves pages and files from the web, extracts structured data from them, and then outputs that data in a useful format.
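To make the extract-and-output part of that description concrete, here is a minimal sketch in Python. It is an illustration only, not Turbot's own API: the sample HTML, the `TableParser` class, the `extract_records` function and the field names are all hypothetical, and printing one JSON object per line is an assumption about what a "useful format" might look like. A real bot would first download the page from the source site; the sketch uses an inline string so that it runs without network access.

```python
import json
from html.parser import HTMLParser

# Hypothetical sample page. A real bot would download this from the source site.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Licence number</th></tr>
  <tr><td>Example Bank Ltd</td><td>A-001</td></tr>
  <tr><td>Sample Insurance plc</td><td>B-002</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # completed rows, each a list of cell strings
        self._row = None       # row currently being read, or None
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._in_cell = False

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._in_cell and self._row:
            self._row[-1] += data


def extract_records(html):
    """Turn the first row into field names and each later row into a dict."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    keys = [h.strip().lower().replace(" ", "_") for h in header]
    return [dict(zip(keys, (cell.strip() for cell in row))) for row in body]


if __name__ == "__main__":
    # Assumed output format: one JSON object per line.
    for record in extract_records(SAMPLE_HTML):
        print(json.dumps(record))
```

Keeping extraction in a function that takes the HTML as a string makes it possible to test the parsing logic against saved pages, separately from the code that fetches them.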
We hope that anybody with a little programming experience in one of our supported languages will be able to get started writing a bot. Some sites are harder to scrape than others. If you get stuck, ask for help!
To write a bot and get its scraped data into the OpenCorporates database, you'll need to go through the following steps:
Read about installing and using the command line program in the documentation.
Before you start scraping, you'll need to identify a data source to scrape! There's a list of data sets on our Missions page, and you can claim a mission there. Alternatively, if you have found a data source that you'd like to scrape, check that it meets our requirements for the kinds of data we scrape.
At the moment, you can write a bot in Ruby or Python. The Quick start guide takes you through the process of writing and running your first simple bot, and you can then work through some simple example bots.
In order for your scraped data to appear on pages on the OpenCorporates website, it needs to be transformed into a structure that the website can understand. This can be done easily by writing a transformer that takes the output of your bot and transforms each record.
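As a rough illustration of the idea, a transformer can be thought of as a function applied to each record the bot produced. The sketch below is hypothetical throughout: the input fields (`name`, `licence_number`), the output fields, and the assumption that records arrive as one JSON object per line are made up for the example and are not the structure the OpenCorporates website actually requires. Consult the documentation for the real target schema.

```python
import io
import json


def transform_record(raw):
    """Map one raw bot record onto a hypothetical target structure."""
    return {
        "company_name": raw["name"],
        "licence_number": raw["licence_number"],
        "data_type": "licence",  # hypothetical label, not a real schema value
    }


def transform_stream(lines):
    """Yield one transformed record per non-empty line of JSON input."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        yield transform_record(json.loads(line))


if __name__ == "__main__":
    # Stand-in for the bot's output; in practice this might be read from a file or stdin.
    bot_output = io.StringIO(
        '{"name": "Example Bank Ltd", "licence_number": "A-001"}\n'
        '{"name": "Sample Insurance plc", "licence_number": "B-002"}\n'
    )
    for transformed in transform_stream(bot_output):
        print(json.dumps(transformed))
```

Because `transform_stream` accepts any iterable of lines, the same code works on a file, on standard input, or on a list of strings in a test.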
Once you have written your bot and are happy with how it works, you can submit it for review by our QA team. When you do this, it will be run automatically on the Turbot infrastructure. We'll review the output of this run and check over the code, and we might ask you to make some small changes. Once we are happy with your bot, we'll accept it and schedule it to run regularly, with its output being imported into the OpenCorporates database.
Once your bot has been accepted, any transformed data that it produces will be matched against a company in the main OpenCorporates database, and will appear on that company's OpenCorporates page, helping us to build a clear picture of that company's structure and activities.
Start contributing