A bot must:
- have a `scraper.py` or `scraper.rb` file
- produce output that passes the `bots:validate` command (i.e. output which matches a supported schema)
- have a `manifest.json` (see below)

These are required properties of a manifest:
{
  "bot_id": "my_amazing_bot",                 # <- a unique identifier
  "title": "An amazing bot",                  # <- descriptive title
  "description": "This is a simple bot",      # <- longer description
  "namespace_id": "my_amazing_bot",           # <- optional. Defaults to the value of `bot_id`.
  "language": "ruby",                         # <- language of bot (currently 'python' or 'ruby')
  "data_type": "primary data",                # <- reference to a Turbot schema
  "identifying_fields": ["number"],           # <- like primary key in a SQL database
  "files": ["scraper.rb"],                    # <- list of files required for the bot to run
  "frequency": "monthly",                     # <- desired scrape frequency (once, daily, weekly, monthly
                                              #    or yearly)
  "publisher": {                              # <- essential so we can be sure it's open data
    "name": "Publisher of the data",
    "url": "Publisher's website",
    "terms": "Copyright terms (e.g. Open Government License, n/a, etc)",
    "terms_url": "A place where these terms can be checked or verified"
  }
}
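As a rough illustration of the required properties above, the following Python sketch checks a manifest that has already been loaded into a dictionary. It is not the `bots:validate` command and does not reproduce its behaviour; the helper name `manifest_problems` and the constant names are hypothetical, and treating all four `publisher` keys as mandatory is an assumption. The allowed `language` and `frequency` values are taken from the annotations above.

```python
# Hypothetical pre-flight check; not part of Turbot itself.
REQUIRED_KEYS = (
    "bot_id", "title", "description", "language", "data_type",
    "identifying_fields", "files", "frequency", "publisher",
)
PUBLISHER_KEYS = ("name", "url", "terms", "terms_url")  # assumed all required
LANGUAGES = ("python", "ruby")
FREQUENCIES = ("once", "daily", "weekly", "monthly", "yearly")


def manifest_problems(manifest):
    """Return a list of human-readable problems found in the manifest dict."""
    problems = [f"missing {key!r}" for key in REQUIRED_KEYS if key not in manifest]
    if "language" in manifest and manifest["language"] not in LANGUAGES:
        problems.append(f"unsupported language {manifest['language']!r}")
    if "frequency" in manifest and manifest["frequency"] not in FREQUENCIES:
        problems.append(f"unsupported frequency {manifest['frequency']!r}")
    if "publisher" in manifest:
        publisher = manifest["publisher"]
        problems += [
            f"missing publisher.{key}" for key in PUBLISHER_KEYS if key not in publisher
        ]
    return problems


manifest = {
    "bot_id": "my_amazing_bot",
    "title": "An amazing bot",
    "description": "This is a simple bot",
    "language": "ruby",
    "data_type": "primary data",
    "identifying_fields": ["number"],
    "files": ["scraper.rb"],
    "frequency": "monthly",
    "publisher": {
        "name": "Publisher of the data",
        "url": "Publisher's website",
        "terms": "Copyright terms",
        "terms_url": "A place where these terms can be checked or verified",
    },
}

# namespace_id is optional and falls back to bot_id, as described above.
namespace_id = manifest.get("namespace_id", manifest["bot_id"])

print(manifest_problems(manifest))
print(namespace_id)
```

An empty list means none of the checks in this sketch failed; it says nothing about whether the bot's output matches a supported schema.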
Manifests may also include the following optional fields:
"tags": ["licence", "financial"]              # <- arbitrary tags; use those suggested in missions (if any)
"manually_end_run": true                      # <- Each time it's run, the bot gets more records,
                                              #    rather than restarting
"transformers": [{                            # <- An array of transformers
  "file": "licence_transformer.py",           # <- Path to transformer
  "data_type": "simple-licence",              # <- Data type that transformer emits
  "identifying_fields": ["licence_number"]
}]
"public_repo_url": "http://github.com/username/my_amazing_bot_repo"
                                              # <- URL where bot's code is publicly available
For more about transformers and incremental bots, see the examples.
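The optional fields listed above are shown as annotated fragments; in an actual `manifest.json` they would sit alongside the required properties, separated by commas and without the `#` annotations, since standard JSON has no comment syntax. The Python sketch below parses such a manifest (trimmed to a few fields for brevity) and checks that each transformer entry carries the three keys shown above. The helper name `transformer_problems` is hypothetical, and treating all three keys as mandatory is an assumption rather than documented behaviour.

```python
import json

# Keys that appear in the transformer entry shown above (assumed required).
TRANSFORMER_KEYS = ("file", "data_type", "identifying_fields")


def transformer_problems(manifest):
    """Return a list of problems found in the optional "transformers" field."""
    problems = []
    for i, entry in enumerate(manifest.get("transformers", [])):
        for key in TRANSFORMER_KEYS:
            if key not in entry:
                problems.append(f"transformers[{i}] is missing {key!r}")
    return problems


# Strict JSON: commas between fields, no annotations.
manifest_text = """
{
  "bot_id": "my_amazing_bot",
  "tags": ["licence", "financial"],
  "manually_end_run": true,
  "transformers": [{
    "file": "licence_transformer.py",
    "data_type": "simple-licence",
    "identifying_fields": ["licence_number"]
  }],
  "public_repo_url": "http://github.com/username/my_amazing_bot_repo"
}
"""

manifest = json.loads(manifest_text)
print(transformer_problems(manifest))
```

A manifest without a `transformers` field passes this check trivially, which matches its optional status.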