Changes between Version 79 and Version 80 of AnnotationChecklist

Aug 23, 2011, 2:35:53 PM (9 years ago)

change to redirect to new curation guidelines page


  • AnnotationChecklist

    v79 v80  
    1 = Publication Annotation =
    2  * From the publication Introduction, identify priority papers which are unannotated and flag as "priority" (this currently means add to the wiki, but later we will be able to do this in the "triage" tool).
    3  * Read paper and make most specific possible annotations for
    5 == Gene Ontology ==
    7 === Locating A Term ===
    8  * Refining Lucene searching within curation tool
    9      * If it was difficult to locate a term even though your search string was *already* a synonym in GO, record a note in [wiki:Lucene_issues Problems identifying ontology terms]; Kim can use this to configure the search to give better results. It is important that ALL frequently used terms appear in the top hits list.
    10      * If term location was difficult, request synonyms from GO using SF tracker (helps future searching)
    12 === Documentation for Term Requests /Ontology editing === 
    13        * Request new specific terms as required. Always request specific "within" ontology terms  e.g.  (add examples)
    14        * Add SF tracker links
    15        * OR use [Term Genie], which can be used to request terms using templates
    16        regulates, part of, involved in, takes place in, metabolism, part of cell component
    18 ==== General Ontology Editing Documentation ====
    19      * [ General Ontology structure] (Includes Cross Products and Logical Definitions)
    20      * [ Process] Every process should have a discrete '''beginning''' and '''end''' , and these should be clearly stated in the process term definition.
    21      * [ Function]
    22      * [ Component]
    23      * [ Ontology_relations]
    24      * for between ontology annotations use "annotation extensions" see below.
    25      * add links to other GO curation guidelines
    27 === Evidence Code Documentation ===
    28      * Gene ontology evidence code Documentation
    30 === Specific GO Annotation Guidelines ===
    31      * [ Downstream processes]
    32      * [ 'Binding' terms in MF]
    33         * [ Chip experiments NOT promoter binding]
    34         * In !PomBase, we'll only annotate to a GO 'protein binding' term if there's strong evidence that a physical interaction is direct, e.g. using purified proteins. Otherwise, we'll just do the Biogrid interaction annotation.
    35      * [ 'Response to' BP terms]
    36      * [ Regulation]
    37      * [ GO Annotation File format]
    38      * [ Other annotation conventions]
    39      * [ Geneva Camp guidelines some of this will be duplicated, break down into links to specific sections]
    40      * [ Transcription Overhaul details]
    41      * [wiki:AnnotationRules  PomBase In house Annotation rules]
    45      * Add GO annotation extensions/ qualifiers (see list below)
    48 == Phenotype ==
    49    * A guideline from Maria at SGD: "in summary, we annotate phenotypes that represent the effects of mutations on the organism as a whole, rather than the effects on the product of the gene that is mutated. For example, if a particular mutation blocks the in vitro activity of an enzyme, we might annotate that with GO but would not make a phenotype annotation. However, if the same mutation causes the inability of yeast to utilize a certain nutrient, then we would capture that as a phenotype annotation."
    50    * Another thought from Midori: direct involvement or effects can usually be captured with GO terms, whereas indirect effects are better as phenotypes
    51     * example: tad3 subunit of tRNA adenosine deaminase - GO term for adenosine-to-inosine conversion; phenotype annotations for cell cycle arrest (PMID:17875641)
    52    * [ phenotype tracker] to request new terms
    53    * NOTE, at present we can't capture phenotypes of double/triple mutants. The ability to do this will be added to the curation tool in the future. It will also be possible to add allele information to single mutant phenotypes and double mutants, and these will be stored as allele records. As a temporary measure alleles can be captured for single mutants in the "temporary annotation extension using allele=xxx", or in Artemis using qualifier allele=xxx. To capture an allele for a double/triple mutant you will need to create an entry on the wiki. If the phenotype is "synthetic lethal" this is explicit in the BioGRID evidence code and can be inferred later from the evidence, but we'll still need to capture the alleles (on the wiki for now).
    55 == Capturing allele information (TEMPORARY SOLUTION in the annotation extension field of the curation tool) ==
    57    * use allele=name(details)
    58     * if paper names it, use that name
    59     * if it's a full deletion, use their name anyway, and put "deletion" in the details
    60     *  if it's not a full deletion, see "details" below
    61     * if it's a complete deletion and not named, use allele=deletion; details can be left blank because they would only say "deletion", which would be redundant
    62     * if it's not named, and not a complete deletion, use allele=unnamed(details)
    63     * in details see:
    64       * [wiki:DescribingResidues describing residues modified or mutated]
    65       * list 'overexpression' in details if applicable
    66     * allele systematic ids will be the gene systematic id, a dash, and an integer, e.g. SPBC4.04c-1, SPBC4.04c-2, etc. The identifier won't encode any details; put that information in properties, description, etc.
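
The naming rules above can be sketched as a small helper. This is only an illustration, not part of the curation tool: the function names, the allele name `abc1-1` and the choice to raise an error when details are missing are all assumptions made for the example.

```python
from typing import Optional


def format_allele(name: Optional[str], full_deletion: bool, details: str = "") -> str:
    """Build a temporary allele=name(details) string from the rules listed above."""
    if name:
        # The paper names the allele: use that name, even for a full deletion.
        if full_deletion:
            return f"allele={name}(deletion)"
        # Assumption: omit the parentheses when no details are supplied.
        return f"allele={name}({details})" if details else f"allele={name}"
    if full_deletion:
        # Unnamed complete deletion: details would only repeat "deletion".
        return "allele=deletion"
    if not details:
        # Assumption: an unnamed partial change is meaningless without details.
        raise ValueError("unnamed, non-deletion alleles need details")
    return f"allele=unnamed({details})"


def allele_systematic_id(gene_id: str, n: int) -> str:
    """Gene systematic id, a dash, then an integer; no details are encoded."""
    return f"{gene_id}-{n}"


print(format_allele("abc1-1", False, "overexpression"))
print(format_allele(None, True))
print(allele_systematic_id("SPBC4.04c", 2))
```

The residue syntax for the details part is described on the DescribingResidues page and is not checked here.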
    69 == Modification (and other "substrate is" annotations) ==
    70     * Add the target as an annotation_extension=has_substrate(GeneDB_Spombe:substrate_gene_ID) to the function activity (or process if there is no function)
    71     * if the annotation is a protein modification, make the modification annotation explicitly on the modified molecule (and if known, add the modified residues). Later we will be able to do some of this annotation by inference.
    72     * For annotations which involve a "target gene" of this type which are not "protein modification" annotations we only capture the  target gene as an annotation extension, and the reciprocal annotation  (gene B is TARGET_OF Gene A) will be inferred later
    73     * See [wiki:DescribingResidues describing residues modified or mutated] for "residue=" syntax
    74     * can also use annotation_extension to capture other things e.g. residue=S1024|T1028, annotation_extension=during(GO:0042594);
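
As a rough sketch of the extension syntax used in this section, the snippet below composes and checks strings of the form relation(prefix:id). The helper names and the regular expression are assumptions for illustration, and SPBC4.04c stands in for a real substrate gene ID; the set of allowed relations is not validated.

```python
import re

# One extension looks like relation(DatabasePrefix:database_ID),
# e.g. has_substrate(GeneDB_Spombe:SPBC4.04c) or during(GO:0042594).
EXTENSION_RE = re.compile(r"^(?P<relation>\w+)\((?P<db>[^:()]+):(?P<id>[^()]+)\)$")


def make_extension(relation: str, db: str, identifier: str) -> str:
    """Compose a single extension string."""
    return f"{relation}({db}:{identifier})"


def parse_extension(text: str) -> tuple:
    """Split an extension string into (relation, database prefix, identifier)."""
    match = EXTENSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a relation(prefix:id) extension: {text!r}")
    return match["relation"], match["db"], match["id"]


print(make_extension("has_substrate", "GeneDB_Spombe", "SPBC4.04c"))
print(parse_extension("during(GO:0042594)"))
```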
    76 == Genetic and Physical interactions ==
    77     * [ Biogrid evidence code definitions]
    78     * [wiki:BiogridSymmetry List of which IGI & IPI are reciprocal]
    79     * Directionality of annotation - [ SGD Curator help]
    80      * Has a section on the direction of the interactions (bait/hit).  In general, the rescued gene is the bait and the rescuer is the hit.  Similarly, the enhanced gene is the bait and the enhancer is the hit. 
    81      * NEW epistasis (same pathway/ no additive effect) BioGRID are going to add this calling it  "asynthetic" (wt<a=b=ab).
    82       * Document these on the wiki for now. Later need to migrate my legacy annotations which are done with GO process/IGI? and qualifier "same pathway" (we do not require this is duplicated as a GO annotation because it should already be captured by the IMP for the single mutant)
    85 == Annotation Extensions Field (annotation extension/allele/residue/qualifier) ==
    86    * Annotation extension format: Relation(Database prefix:database_ID)
    87    * Relations in use:
    88     * [ OBO-format file of relations (GO site)]
    89     * [ examples of how "terms" would be defined if they were pre-composed (GO site)]
    90    * !PomBase list of terms created by annotation extension: [ external link] [http://sloth/test/view/object/cv/411?model=chado internal link]
    91    * [wiki:Data_Types Data types used in Cross products annotation extensions and column_17]
    93      Multiple entries can be added comma (,) separated
    94      This field can temporarily be used to capture:
    95      * annotation extension=,
    96      * residue=  [wiki:DescribingResidues describing residues modified or mutated] (modification only?)
    97      * allele= (format see phenotype section) (phenotype only)
    98      * qualifier= ,
    99        * qualifiers allowed for GO:
    100          * contributes_to, colocalizes_with, NOT, constitutive, required, predominantly (only cellular component)
    101        * qualifiers allowed for phenotype:
    102          * condition(in_minimal_media, during_nutrient_limitation, at_high_temperature, in_presence_of_TBC, add rest of list), fitness_profiling, low_penetrance, high_penetrance, low_expressivity, high_expressivity, ...
    103        * qualifiers allowed for modification:
    104      * col17=PR:[id]
    105       * identifier for the specific form of a gene product (at present, always protein) to which the annotation applies - use for splice variants, modified forms
    106       * use Protein Ontology entries; splice variants can also use !UniProt IDs
    107       * [ GO wiki page on spliceforms and column 17]
    108       * [ pre-release (i.e. latest) version of the Protein Ontology]
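
Because entries in this field are comma separated but some values (such as condition(...)) contain commas of their own, a reader of the field has to split only on commas outside parentheses. The following is a minimal sketch of that idea; the function names are hypothetical and the key names are simply whatever appears before the "=".

```python
from typing import List, Tuple


def split_top_level(field: str) -> List[str]:
    """Split on commas that are not inside parentheses."""
    parts: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in field:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    # Drop empty pieces left by stray or trailing commas.
    return [p for p in parts if p]


def parse_field(field: str) -> List[Tuple[str, str]]:
    """Turn 'key=value, key=value' into a list of (key, value) pairs."""
    entries = []
    for part in split_top_level(field):
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"entry has no '=': {part!r}")
        entries.append((key.strip(), value.strip()))
    return entries


print(parse_field("residue=S1024|T1028, annotation_extension=during(GO:0042594)"))
print(parse_field("qualifier=condition(in_minimal_media, at_high_temperature), allele=deletion"))
```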
    110 == Protein feature annotation ==
    111    * Example NLS, signal sequence etc
    112    * Uses SO protein feature ontology, not yet included in curation tool, need to add on wiki
    113    * residues can be specified using range  see [wiki:DescribingResidues describing residues modified or mutated]
    116 == Other controlled curation ==
    117 For the complete list see:
    118 [ external link] [http://sloth/test/view/list/cv?numrows=200&page=1&model=chado internal link]
    120  * If has a human ortholog is it associated with any disease?
    121  * If has catalytic activity can you curate rate constant etc?
    122  * If DNA binding do you know the binding specificity?
    123  * Is the gene used in any experimental gene constructs?
    126 = Post-publication annotation checklist =
    128 For each gene you have annotated:
    130 == Status ==
    132 If the gene was not published previously you will need to update the status
    134 Add link to colour mapping here:
    136 Status descriptions here:
    138 http://sloth/test/view/object/cv/589?model=chado
    141 If the gene was previously an orphan you will need to change
    142 "sequence orphan, uncharacterised" to "sequence orphan, characterised" (note if an ortholog has been identified species distribution (below) will also change).
    144 == Names and product ==
    146  * Is product named as commonly referred to?
    147  * Names: do any synonyms need adding?
    148  * Name description: add any missing; are any phenotypes or other annotations which can be derived from names missing?
    150 == Protein families and domains ==
    152  * Protein families and domains, e.g. gas2
    153  * Have any protein domains been described in the paper, and if so are they in Pfam? If not, submit
    154  * If in Pfam, do domains make sense based on annotations?
    155  * Look at collection of proteins with this domain architecture
    156  * Look at species distribution of domain
    157  * manually curate any domain families which are not completely covered by protein family database (have false negatives)
    159 == GO supplementary IC (inferred from curator) annotation ==
    160  * Make any annotations (IC) which can be inferred by the curator but are not explicitly annotated because they are not included in the parentage
    162 For example,
    163  * origin recognition complex (ORC) subunits can also get:
    164   * DNA replication preinitiation complex
    165   * pre-replicative complex
    166   * nuclear replication fork
    167  * cardiolipin biosynthesis > mitochondrial membrane organization
    169 (Could any of these even be experimentally supported?)
    171 == Remove legacy IEA and NAS annotations not required or incorrect ==
    173  * Remove any TAS/NAS/ISS which are now covered by experiment
    174  * Automated mappings (IEA) will be repressed by experimental data. Are any IEA annotations not covered by your manual annotation? It should be possible to make a manual annotation to cover all automated mappings (explain how why)
    175   * OR, the mapping should be removed:
    176     * If the incorrect mapping is from SPKW: or SP_SL: access the !UniProt entry via the gene page and send a feedback message to !UniProt. Ivo Pedruzzi will fix it quickly.
    177     * Problematic Interpro mappings should be logged here:
    178     * Some "pombe kw mappings" have NAS evidence code (you will know these are mappings because they will not be visible in the Artemis curation tool). These will need to be deleted from the mapping file.
    181   * OR, the ontology should be fixed
    182  * Check for consistency with other annotations and other resources
    183  * Make sure all remaining ISS are made to an experimentally characterised ortholog.
    184 (If the gene in SGD is not annotated to a term, and you think it clearly should be, mail them to add it so that the ISS is supported. This is frequently required when annotating gene products which are not published.)
    185  * If the gene has an S. cerevisiae ortholog, check the annotations to the ortholog in SGD.
    186   * If the gene in SGD is not annotated to a term, and you think it clearly should be, mail them to add it so that the ISS is supported. This is frequently required when annotating gene products which are not published.
    187   * Can you make any further annotations based on what SGD has? (Note reasons why annotations cannot be transferred; 1:1 is easiest)
    189 == Species distribution ==
    190  If conserved to human and human/cerevisiae ortholog reported, predominantly single copy, nothing to do here
    191  * Otherwise does species distribution look OK? Pfam quick check
    192  * Does it make sense based on what the protein is doing?
    193  * Can the species distribution be extended? especially for orphans or fungal specific proteins
    197 Look at !UniProt annotation to see if anything is missing, do obvious fixes and flag papers for priority annotation (not so important)
    200 If in doubt, ask the author!
     1 see CurationGuidelines