Canto v1253
Configuring Canto and loading data

After the software is installed, some configuration is needed.

If you chose the recommended Docker installation procedure, the commands below need to be run inside the Canto container. The suggested way to do that is to use the canto_docker script as a prefix to the commands below.

So for example to load a genes file when Canto is running via a Docker container, instead of:

./script/ --genes genes_file.tsv --for-taxon 4896

from inside your canto git checkout, add your genes_file.tsv to the import_export directory and then run this command in the canto-space directory created in the installation section:

./canto/script/canto_docker ./script/ --genes \
    /import_export/genes_file.tsv --for-taxon 4896

Only three host directories (canto, data and import_export) are visible inside the container so reading and writing of files should be via those directories. In particular, as in the example above, datasets for loading should be added to your import_export directory as created in the installation step.
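The host-to-container path mapping described above can be sketched as a small shell helper. This is only an illustration: stage_for_import is a hypothetical function name, not part of Canto, and it assumes the host import_export directory appears as /import_export inside the container, as in the example command above.

```shell
# Hypothetical helper (not part of Canto): copy a data file into the
# host import_export directory and print the path the file will have
# inside the container. The second argument is the host import_export
# directory and defaults to ./import_export, i.e. it assumes the
# function is called from the canto-space directory.
stage_for_import() {
    src=$1
    import_dir=${2:-./import_export}
    [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
    [ -d "$import_dir" ] || { echo "no such directory: $import_dir" >&2; return 1; }
    cp "$src" "$import_dir/" || return 1
    # Assumption: import_export is mounted at /import_export in the container
    printf '/import_export/%s\n' "$(basename "$src")"
}
```

The printed path is the one to pass to an option such as --genes when the command is prefixed with canto_docker.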

Creating users

To manage sessions and users from the web interface there needs to be at least one "admin" user. Users can be added with the script. For example:

./script/ --person "Susan Testuser" secret_password 0000-0001-5000-0007 admin

The secret_password is stored as a SHA1 hash in the database rather than as plain text.

For more information on ORCIDs visit

Run with no arguments for a longer description.

Loading data

Canto can operate in two modes: "single organism" and "multi organism". Single organism mode is activated by setting the instance_organism configuration option. Multi-organism mode is assumed otherwise. See the instance_organism section in the configuration file documentation for a full description of the two modes.

The default implementation stores the details of the organism and genes for annotation in Canto's own database. The script used in the sections below loads data from flat files into Canto's database.

It is also possible to configure "adaptors" to retrieve these details as needed from an external database or web server. At PomBase, for example, gene information is read from the Chado curation database. See the configuration_file documentation for details of how to configure the adaptors.

In the following sections "single organism" mode is assumed. To run Canto in that mode you will need to load at least one organism, a list of genes and one or more ontologies before using Canto.


Organisms

Add an organism using this command in the canto directory:

./script/ --organism <genus> <species> <taxon_id>

At least one organism is needed in the Canto database before genes can be loaded.

Gene data

Load genes with:

./script/ --genes genes_file.tsv --for-taxon 4896

All genes in an input file must be from one organism. Use the --for-taxon argument with an NCBI taxon ID to specify the organism, which needs to have been loaded with the --organism option (see above).

Gene data format

A gene data file consists of four tab separated columns with no header line. The columns are:

There is a small example file in the test directory:

./script/ --genes t/data/pombe_genes.txt --for-taxon 4896
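Because a gene data file must have four tab-separated columns, it can be useful to check a file before loading it. The following is a minimal sketch, not part of Canto: check_gene_file is a hypothetical function name, and the check only counts columns. It cannot tell whether the first line is a header, so that still needs to be checked by eye.

```shell
# Hypothetical pre-load check (not part of Canto): succeed only if
# every line of the file has exactly four tab-separated fields.
# Lines that fail are reported on stderr with their line number.
check_gene_file() {
    awk -F '\t' '
        BEGIN { bad = 0 }
        NF != 4 {
            printf "line %d: expected 4 columns, found %d\n", NR, NF > "/dev/stderr"
            bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

For example, "check_gene_file genes_file.tsv && echo OK" prints OK only when every line has four columns. Empty fields still count as columns, so a line with an empty third column passes.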

Ontology terms

OBO format ontology data can be imported or updated with:

./script/ --ontology file_1.obo [--ontology file_2.obo ...]

Or if you have a dockerised Canto:

./canto/script/canto_docker ./script/ \
   --ontology file_1.obo [--ontology file_2.obo ...]

If you need to import multiple ontology files, they all must be included in the same command line:

./script/ --ontology ontology_file.obo \
   --ontology another_ontology_file.obo

When updating existing ontologies in Canto, all ontologies must be updated with the same command.
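Since every ontology has to appear on one command line, the argument list can be built from a directory of OBO files rather than typed by hand. The sketch below is an illustration under stated assumptions: with_ontologies is a hypothetical function name, and it assumes all ontology files sit in one directory and end in .obo.

```shell
# Hypothetical helper (not part of Canto): run a command with one
# "--ontology <file>" pair appended for every .obo file in a directory.
# Usage: with_ontologies <dir> <command> [args...]
with_ontologies() {
    dir=$1
    shift
    found=0
    for f in "$dir"/*.obo; do
        [ -e "$f" ] || continue   # pattern did not match any file
        set -- "$@" --ontology "$f"
        found=1
    done
    if [ "$found" -eq 0 ]; then
        echo "no .obo files in $dir" >&2
        return 1
    fi
    "$@"
}
```

Calling it with echo as the command, for example "with_ontologies ./ontologies echo", prints the argument list without running anything. For a real load, the command given after the directory would be the load command shown above, with or without the canto_docker prefix.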

The OBO file can also be given by URL, e.g.:

./script/ --ontology \

Each ontology must be configured in the available_annotation_type_list section of the canto.yaml file before it can be used in the interface.