What happens when a bot is pushed - Review process

When a bot is pushed to Turbot, a first-draft run is performed, validated, and sent for review. This initial run is capped at 2000 records so that, if any issues need attention before the data is loaded into OpenCorporates, we have not placed unnecessary load on the data source's servers.
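To illustrate why a record cap keeps the load down, here is a minimal sketch of how a limit like this could be applied to a bot's stream of records. It is hypothetical and not Turbot's implementation: `draft_run`, `fake_scraper` and the record fields are invented for the example, and it assumes the bot produces records lazily (as a generator) and emits one JSON object per line.

```python
import itertools
import json
from typing import Iterable, Iterator

# Hypothetical cap mirroring the 2000-record first-draft limit.
DRAFT_RECORD_LIMIT = 2000


def draft_run(records: Iterable[dict], limit: int = DRAFT_RECORD_LIMIT) -> Iterator[str]:
    """Yield at most `limit` records, serialised as one JSON object per line.

    Because `records` is consumed lazily, a scraper written as a generator
    stops requesting further pages from the source once the cap is reached.
    """
    for record in itertools.islice(records, limit):
        yield json.dumps(record)


def fake_scraper() -> Iterator[dict]:
    """Stand-in for a bot that would otherwise page through the whole source."""
    n = 0
    while True:
        n += 1
        yield {"licence_number": f"L{n:05d}", "name": f"Company {n}"}


if __name__ == "__main__":
    lines = list(draft_run(fake_scraper()))
    print(len(lines))
```

Even though `fake_scraper` never ends on its own, the draft run stops after 2000 records without fetching the 2001st.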

First-Draft review

At this stage we mainly check the data produced by the bot against the data source, to make sure we have represented it as accurately as possible. We also check the format of the transformer, as this determines what data will be visible in OpenCorporates.

Some common themes that we are checking for:

  1. Any extra information at the data source that we have not already captured, for example extra pages for different categories of licence, or extra fields available on a details page.
  2. Any information which has been captured incorrectly as a result of a poorly structured data source. In some instances structure may even vary from record to record.
  3. The presence of a transformer, and that the transformer conforms to one of our structured formats (schemas). The data also undergoes automated validation as part of the draft run.
  4. That as many relevant primary data fields as possible have been transformed, and that all transformed fields are a correct interpretation of the data source.
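Some of these checks can be partly mechanised. The sketch below is illustrative only and is not part of Turbot: the function names, the field names and the `REQUIRED_SCHEMA_FIELDS` set are hypothetical, and a real review compares against the actual source pages and the published schemas.

```python
from typing import Iterable

# Hypothetical required fields; the real ones come from the chosen schema.
REQUIRED_SCHEMA_FIELDS = {"company_name", "licence_number", "jurisdiction"}


def uncaptured_fields(source_fields: Iterable[str], captured_fields: Iterable[str]) -> set[str]:
    """Fields visible at the data source that the bot does not emit (check 1)."""
    return set(source_fields) - set(captured_fields)


def inconsistent_records(records: list[dict]) -> list[int]:
    """Indices of records whose keys differ from the first record's.

    Varying key sets are only a hint that the source's structure changes
    from record to record (check 2); each case still needs a human look.
    """
    if not records:
        return []
    expected = set(records[0])
    return [i for i, record in enumerate(records) if set(record) != expected]


def missing_schema_fields(transformed: dict, required: set[str] = REQUIRED_SCHEMA_FIELDS) -> set[str]:
    """Required schema fields that are absent or empty in a transformed record (check 3)."""
    return {field for field in required if transformed.get(field) in (None, "")}
```

For example, if a details page shows `name`, `status` and `expiry` but the bot only emits `name`, `uncaptured_fields` reports `status` and `expiry` as candidates for capture.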

Once any potential issues have been discussed and resolved, the bot is sent on for a complete run and a final review.

Final-Draft review

At this stage we have a complete dataset to work with. This can expose further text-processing issues, but it is mainly used to assess the completeness of the dataset and to expose any gaps in date ranges, alphabetical search sequences, etc. Having two review stages in this way means we perform as few full runs as possible, minimising our footprint on remote servers.
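A minimal sketch of how such gaps might be found, assuming (hypothetically) that a bot queries its source one day at a time and by single-letter name prefixes, and keeps a list of which days and prefixes it actually queried. The function names are illustrative and not part of Turbot.

```python
import string
from datetime import date, timedelta
from typing import Iterable


def missing_dates(queried: Iterable[date], start: date, end: date) -> list[date]:
    """Days in the inclusive range [start, end] that the run never queried."""
    queried_set = set(queried)
    gaps = []
    day = start
    while day <= end:
        if day not in queried_set:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


def missing_prefixes(queried: Iterable[str], alphabet: str = string.ascii_uppercase) -> list[str]:
    """Single-letter search prefixes (case-insensitive) that the run never queried."""
    done = {prefix.upper() for prefix in queried}
    return [letter for letter in alphabet if letter not in done]
```

The sketch compares against the windows that were queried rather than the records returned, because a day or letter with no matching records at the source is not necessarily a gap in coverage.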

Again, any issues that surface at this stage of review will be communicated back and we will work together to resolve them. Once all issues have been resolved, or if none were found, the data can be added to OpenCorporates.

After a new bot is accepted

Once a bot is accepted, it first adds its data to OpenCorporates and is then scheduled to run regularly according to the "frequency" field in its manifest. See the Turbot Specification for more information about the manifest.
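As an illustration of how a "frequency" value might drive scheduling, here is a sketch that computes a bot's next run time from its manifest. The frequency names, their intervals (including treating "monthly" as 30 days) and the `next_run` function are assumptions made for this example; consult the Turbot Specification for the values it actually accepts.

```python
import json
from datetime import datetime, timedelta

# Assumed mapping from manifest "frequency" values to run intervals;
# the allowed values are defined by the Turbot Specification, not here.
FREQUENCY_INTERVALS = {
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
    "monthly": timedelta(days=30),
    "yearly": timedelta(days=365),
}


def next_run(manifest_json: str, last_run: datetime) -> datetime:
    """Return when a bot would next run, given its manifest and its last run time."""
    manifest = json.loads(manifest_json)
    frequency = manifest["frequency"]
    try:
        return last_run + FREQUENCY_INTERVALS[frequency]
    except KeyError:
        raise ValueError(f"unrecognised frequency: {frequency!r}") from None
```

With a hypothetical manifest such as `{"bot_id": "example_bot", "frequency": "weekly"}` and a last run on 1 January 2020, the next run would fall on 8 January 2020.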

If a bot fails for any reason, this will be visible on the "My Bots" page of Turbot. Once the cause of the failure has been fixed, the bot can be re-pushed; it will then be reviewed again, its data will be ingested, and it will be rescheduled.