
Canto Overview

To support the in-depth literature curation of fission yeast by expert curators and the fission yeast research community, we are developing an open source curation tool that will:

  • Provide a web-based interface (multi-user, at local and remote sites)
  • Allow manual literature curation by trained curators or paper authors
  • Be generic, so that it can be used for organisms other than fission yeast
  • Be extendible to handle additional defined datatypes as required (e.g. public or custom ontologies)

Literature Management

  • Support literature management and triage for the literature corpus of a specified organism
    • Retrieve publications from PubMed for a specified domain and range, at a specified frequency
    • Support triage by curators into configurable, pre-defined classes and annotation priorities, or flagging for community curation (configuration options to be defined)
    • Allow triage types to be set according to organism-specific values (pombe examples to be added)
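As an illustration of the retrieval step, the following is a minimal sketch of how a periodic PubMed query could be built with the NCBI E-utilities `esearch` endpoint. The function name `build_pubmed_query` and the choice of a "last N days" window are assumptions made for this example; they are not part of the tool's design. The sketch only builds the URL and does not contact the server.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(term, days_back, max_results=100):
    """Build an esearch URL for PubMed records added in the last `days_back` days.

    `build_pubmed_query` is a hypothetical helper; a scheduler (for example a
    weekly cron job) would call it at the configured frequency.
    """
    params = {
        "db": "pubmed",        # search the PubMed database
        "term": term,          # the "domain", e.g. an organism name
        "reldate": days_back,  # the "range": only records from the last N days
        "datetype": "edat",    # filter on the Entrez date
        "retmax": max_results,
        "retmode": "json",
    }
    return ESEARCH_URL + "?" + urlencode(params)


weekly_url = build_pubmed_query("Schizosaccharomyces pombe", days_back=7)
```

The returned identifiers would then be queued for triage by curators.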

Annotation management

  • provide individual log-ins to track changes made by specific curators.
  • provide reports to allow management of curation workload
    • Report types (list to be added): curator-specific reports and personal curation history
  • Allow genes to be flagged as "annotation complete"
  • Session checking facility to allow "approval" of curated sessions by a curator
  • Support contributor details (name, e-mail, lab, subject, etc.)

Curation Interface

General Features

A step-by-step, intuitive user interface will allow a "curation session for a paper" to be assigned to an author, so that authors can curate their own papers with no curation background.

Specific features

  • Log in
  • Support annotation to the most specific terms in controlled vocabularies
    • Allow organism-specific configuration of ontologies (extendible). So far fission yeast uses GO, FYPO, protein modification (MOD), etc.
    • Enable curators to search and browse the required vocabularies to select appropriate terms
    • Present any existing annotation from a paper
    • Support selection of genes and gene name disambiguation
    • Allow requests for (and temporary annotation to) new ontology terms (probably later using a web service to TermGenie in some instances)
    • Support the use of annotation ontologies for all ontology types based on user configuration (add details)
    • Allow the use of subsets of ontologies for curation (specified in a configuration file). For example, for fission yeast allow the use of a version of GO in which any terms that are "taxon restricted" as not applicable to yeast are not presented to the user
    • Support for annotation extensions
    • Allow the transfer of similar annotations to multiple genes in a single interface
  • Specific support for phenotype annotation
    • Support annotation of phenotypes to single or multiple genes (i.e. genetic interactions)
    • allele data, conditions
  • Specific support for GO annotation
    • and column 17
    • negative annotation
  • Support the curation of terms which are not based on experimental literature, i.e. not attached to specific papers (inferences from sequence similarity, and curator inferences based on the body of experimental literature across multiple publications)
  • Support single page annotation input for expert users (see brainstorming wiki)
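The ontology subset requirement above (hiding terms that are taxon restricted as not applicable to the configured organism) can be sketched as a simple filter. This is only an illustration under stated assumptions: the `Term` class, the `excluded_taxa` field and the `EX:` identifiers are hypothetical, and in practice the restrictions would come from the configuration file and the ontology's taxon constraints.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Term:
    """Hypothetical minimal ontology term; IDs below are placeholders, not real GO IDs."""
    term_id: str
    name: str
    # NCBI taxon IDs for which this term must never be offered to curators.
    excluded_taxa: frozenset = field(default_factory=frozenset)


def visible_terms(terms, taxon_id):
    """Return only the terms that may be presented for the given organism."""
    return [t for t in terms if taxon_id not in t.excluded_taxa]


POMBE_TAXON = 4896  # NCBI taxonomy ID for Schizosaccharomyces pombe

terms = [
    Term("EX:0000001", "term applicable to yeast"),
    Term("EX:0000002", "term not applicable to yeast", frozenset({POMBE_TAXON})),
]
offered = visible_terms(terms, POMBE_TAXON)
```

Filtering at presentation time, rather than editing the ontology file, means the same ontology release can serve several organisms with different configurations.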

User support

  • Provide a curator feedback form for help, linked to the helpdesk

Data types

Annotation type            | Format/source            | Notes                                                             | Status
process/function/component | Gene Ontology            | still need to handle extensions, NOT, contributes_to              | included
phenotype                  | FYPO                     | still need to handle alleles / extensions / multi-gene phenotypes | included, partial
PT modification            | MOD                      |                                                                   |
genetic interaction        | BioGRID evidence codes   |                                                                   |
physical interaction       | BioGRID evidence codes   |                                                                   |
protein feature            | Sequence Ontology subset |                                                                   |
complementation            | -                        | IDs from other organisms                                          | -

support for protein coding genes and non-coding RNAs

Curator Help

  • New Term suggestions, try to match to existing terms and suggest "did you mean"
  • Block terms which are not used in annotation (configurable list?), e.g. component part terms, with a note to make it clear that these should not be used for direct annotations
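One possible way to offer "did you mean" suggestions for a requested new term is approximate string matching against existing term names. The sketch below uses `difflib.get_close_matches` from the Python standard library; the function name `suggest_existing_terms` and the default cutoff are assumptions for illustration, and a real implementation would probably also search synonyms.

```python
import difflib


def suggest_existing_terms(requested_name, existing_names, max_suggestions=3, cutoff=0.6):
    """Return existing term names that closely resemble a requested new term name.

    Matching is case-insensitive; the original spelling of each name is returned.
    """
    by_lowercase = {name.lower(): name for name in existing_names}
    matches = difflib.get_close_matches(
        requested_name.lower(),
        list(by_lowercase),
        n=max_suggestions,
        cutoff=cutoff,
    )
    return [by_lowercase[m] for m in matches]


existing = ["histone deacetylation", "DNA repair"]
suggestions = suggest_existing_terms("histone deacetilation", existing)
```

If the list of suggestions is non-empty, the interface could ask the curator to confirm that none of them fits before the new term request is submitted.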

Annotation editing (updating/correcting)

  • Curators need to discuss how this will happen in practice (delete old, add new, special cases). Things to consider:
  • updating gene products
  • handling obsolete terms (this may not be a problem)
    • Do not allow *selection* of obsolete terms, but allow *search* of obsolete terms, as these often suggest an alternative term in the comment (this should not be needed if old obsolete term names are always included as related synonyms, vw)
    • Procedures for curators to reannotate when terms have been made obsolete. Existing annotations to obsolete terms should persist until fixed by curators
  • handling secondary IDs after term merges
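The behaviour described above (obsolete terms can be found by search but cannot be selected, and secondary IDs resolve to the surviving term after a merge) could look roughly like the following sketch. All class and function names here are hypothetical, and the `EX:` identifiers are placeholders rather than real ontology IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OntologyTerm:
    """Hypothetical minimal term record."""
    term_id: str
    name: str
    is_obsolete: bool = False
    comment: str = ""  # obsolete terms often name a replacement here


class ObsoleteTermError(ValueError):
    """Raised when a curator tries to annotate to an obsolete term."""


def search_terms(terms, text):
    """Search includes obsolete terms so their comments can point to alternatives."""
    needle = text.lower()
    return [t for t in terms if needle in t.name.lower()]


def select_term(term):
    """Selection for annotation is refused for obsolete terms."""
    if term.is_obsolete:
        raise ObsoleteTermError(f"{term.term_id} is obsolete: {term.comment}")
    return term


def resolve_term_id(term_id, secondary_to_primary):
    """Map a secondary ID left over from a term merge to its primary ID."""
    return secondary_to_primary.get(term_id, term_id)
```

Existing annotations to obsolete terms are not touched by any of these functions, which matches the requirement that they persist until a curator fixes them.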

Quality Control

  • Support quality control of input syntax (free text will be minimal all terms used should be selected from ontologies, vocabularies or lists)
  • managing obsolete or out-of-date annotations
  • logical consistency, and biological validity
  • implementation of ‘rules’ to improve annotation consistency and coverage
    • Trivial example: report to curators that gene products annotated to the GO molecular function ‘histone deacetylase activity’ should also be annotated to the GO biological process ‘histone deacetylation’. More complex ‘within’ and ‘between’ ontology annotation rules will be defined by curators over time (for example, can this GO process be inferred from this phenotype?)
  • present the curator with suggested annotations from automated analysis pipelines, for approval or rejection
  • present the user with suggested annotations based on the annotations they make
  • statistical approaches for suggested annotations will be applied, as used by the UniProt Knowledge Base
  • prevent syntactical errors
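A consistency rule of the "function implies process" kind mentioned above could be checked with a small lookup, sketched below. The data layout (a dictionary from gene to a set of term names), the function name `missing_process_annotations` and the gene names `geneA` and `geneB` are assumptions made for this example; a real rule set would be defined by curators and would use term IDs rather than names.

```python
def missing_process_annotations(annotations, rules):
    """Report genes that have a function annotation but lack the implied process.

    annotations: dict mapping gene name -> set of annotated term names
    rules: dict mapping a function term -> the process term it implies
    Returns a list of (gene, function_term, expected_process_term) tuples.
    """
    reports = []
    for gene, terms in sorted(annotations.items()):
        for function_term, process_term in rules.items():
            if function_term in terms and process_term not in terms:
                reports.append((gene, function_term, process_term))
    return reports


# Rule taken from the trivial example in the list above.
rules = {"histone deacetylase activity": "histone deacetylation"}

# Hypothetical genes and annotations, for illustration only.
annotations = {
    "geneA": {"histone deacetylase activity"},
    "geneB": {"histone deacetylase activity", "histone deacetylation"},
}
to_review = missing_process_annotations(annotations, rules)
```

The result is a report for curators to review, not an automatic annotation, so a curator still approves or rejects each suggestion.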

Import /Export

  • Support import and export of annotation in flat file formats (GPAD / GPI / BioGRID)
  • Provide for data import/export to Chado database

Tool Admin

  • Other communities can install the tool and configure it with their own literature corpus, gene lists, ontologies, range of allowed annotation extensions, and taxon restrictions

  • procedures will be in place to automate regular updates required from external resources e.g. retrieving literature, updating BioGRID etc

Other /later

  • Incorporate existing text-mining software (e.g. Textpresso) to allow full-text manuscripts to be text-mined and tagged.
  • partially automating the capture of inferred information for unpublished genes based on experimentally annotated orthologs in other organisms
  • a reduced set of curation options will be available to participants in the community curation project.

Specific Tool description

Last modified on Feb 17, 2013, 10:38:08 AM