We're interested in any kind of corporate data, as outlined below, but the things we're specifically looking for are listed on the OpenCorporates Missions website.
Turbot is where we store raw data. We lightly process this data in order to incorporate it into the main OpenCorporates database. For example, we match a name like "Apple, California" to the canonical record for Apple, Inc in OpenCorporates.
When you load a bot on our platform, you are granting us a right to use that code to generate data. You continue to own the copyright on that code (though we ask you to license it under the MIT license so others can also alter it as they need). Furthermore, all our data is licensed under the Open Database Licence, which guarantees the data is available as Open Data. The raw data for all your bots will always be available via your profile page. Finally, we offer an API for querying and accessing all data loaded into OpenCorporates, including data loaded via bots.
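We accept any data that comes from a primary source, can be linked to companies, is not explicitly prohibited from reuse, and comes with some kind of legal or regulatory force. Each of these points is explained below.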
Primary sources are typically bodies with regulatory powers, like the Cayman Islands General Corporate Registry or the Austrian National Bank, or a company's own official website, such as that of the Altria Group (where you can find information about the companies Altria owns).
Wikipedia, Reuters, and BBC News, for example, are all secondary sources and cannot be considered for inclusion.
An easy way to tell whether data can be linked to companies is to check whether the data source includes company names. It must also have explicit or implicit information about the jurisdiction of each company mentioned. This is because "Little Widgets Ltd" could be a company in any country or jurisdiction in the world. Without the jurisdiction, we can't link a name to a specific legal entity.
Data can also be linked to companies via unique identifiers - for example, a company number plus a jurisdiction, or a tax number plus a jurisdiction may uniquely identify a company - but note, this varies by jurisdiction. Get in touch if you're not sure!
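The linking rule above can be sketched in code. This is only an illustration, not how OpenCorporates actually matches records: the `SourceRecord` shape, the `link_key` helper, and the lowercase jurisdiction codes are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SourceRecord:
    """One row taken from a data source (hypothetical shape, for illustration only)."""
    company_name: Optional[str] = None
    company_number: Optional[str] = None
    jurisdiction: Optional[str] = None  # assumed to be a short code such as "gb"


def link_key(record: SourceRecord) -> Optional[Tuple[str, str, str]]:
    """Return a (kind, jurisdiction, value) key if the record could be linked, else None.

    A record without a jurisdiction is never linkable, because the same
    name or number could belong to companies in many jurisdictions.
    """
    jurisdiction = (record.jurisdiction or "").strip().lower()
    if not jurisdiction:
        return None

    # Prefer an identifier when one is present.
    number = (record.company_number or "").strip()
    if number:
        return ("number", jurisdiction, number)

    # Fall back to the name, with whitespace collapsed and case folded.
    name = " ".join((record.company_name or "").split()).lower()
    if name:
        return ("name", jurisdiction, name)

    return None
```

With this sketch, `link_key(SourceRecord(company_name="Little Widgets Ltd"))` returns `None` because no jurisdiction is given, while adding `jurisdiction="gb"` yields a usable key. Whether a company number or tax number really is unique still depends on the jurisdiction, as noted above.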
We cannot accept data from websites which explicitly prohibit reuse. Ideally data comes with an explicit open data licence; however, we are able to accept data that comes without an explicit licence or clear instruction on reuse.
We can only accept data that comes with some kind of legal or regulatory force. A list of mining companies in Wikipedia, for example, while interesting, comes with no guarantees of accuracy.
This list of financial Licenceholders in the Isle of Man, however, is acceptable because it is published by the Financial Supervision Commission of the Isle of Man, which explains on its website that it is the statutory body responsible for the regulation of financial activities.
Even though the register might contain mistakes (such as misspellings of a company name), these are mistakes in the register itself, and the register is therefore the closest we can get to the "truth".
Any bot that outputs data conforming to one of the OpenCorporates supported data types will be reviewed for inclusion in OpenCorporates. Bots of type primary source are also considered for inclusion directly.
Almost certainly. If enough people want it, or you're expecting to write quite a few scrapers yourself, ask and we'll try to support it.