Changes between Version 79 and Version 80 of AnnotationChecklist


Timestamp:
Aug 23, 2011, 2:35:53 PM (9 years ago)
Author:
gomidori
Comment:

change to redirect to new curation guidelines page

  • AnnotationChecklist

    v79 v80  
    1 = Publication Annotation =
    2  * From the publication Introduction, identify priority papers which are unannotated and flag as "priority" (this currently means add to the wiki, but later we will be able to do this in the "triage" tool).
    3  * Read paper and make most specific possible annotations for
    4 
    5 == Gene Ontology ==
    6 
    7 === Locating A Term ===
    8  * Refining Lucene searching within curation tool
    9      * If a search string that was *already* a synonym in GO still made its term hard to find, record it in [wiki:Lucene_issues Problems identifying ontology terms]; Kim can use this to configure the search to give better results. It is important that ALL frequently used terms appear in the top hits list.
    10      * If term location was difficult, request synonyms from GO using SF tracker (helps future searching)
    11    
    12 === Documentation for Term Requests /Ontology editing === 
    13        * Request new specific terms as required. Always request specific "within" ontology terms  e.g.  (add examples)
    14        * Add SF tracker links
    15        * OR use [http://www.berkeleybop.org/obo/quickterm/GO Term Genie], which can be used to request terms using templates:
    16        regulates, part of, involved in, takes place in, metabolism, part of cell component
    17 
    18 ==== General Ontology Editing Documentation ====
    19      * [http://www.geneontology.org/GO.ontology.structure.shtml General Ontology structure] (Includes Cross Products and Logical Definitions)
    20      * [http://www.geneontology.org/GO.process.guidelines.shtml Process] Every process should have a discrete '''beginning''' and '''end''' , and these should be clearly stated in the process term definition.
    21      * [http://www.geneontology.org/GO.function.guidelines.shtml Function]
    22      * [http://www.geneontology.org/GO.component.guidelines.shtml Component]
    23      * [http://www.geneontology.org/GO.ontology-ext.relations.shtml Ontology_relations]
    24      * For between-ontology annotations use "annotation extensions" (see below).
    25      * add links to other GO curation guidelines
    26      
    27 === Evidence Code Documentation ===
    28      * Gene ontology evidence code Documentation http://www.geneontology.org/GO.evidence.shtml
    29 
    30 === Specific GO Annotation Guidelines ===
    31      * [http://wiki.geneontology.org/index.php/Guidelines_from_Annotation_Camp#Downstream_Process_guidelines Downstream processes]
    32      * [http://wiki.geneontology.org/index.php/Guidelines_from_Annotation_Camp#Binding_guidelines 'Binding' terms in MF]
    33         * [http://wiki.geneontology.org/index.php/Annotation_consistency_:_ChIP_experiments ChIP experiments NOT promoter binding]
    34         * In !PomBase, we'll only annotate to a GO 'protein binding' term if there's strong evidence that a physical interaction is direct, e.g. using purified proteins. Otherwise, we'll just do the Biogrid interaction annotation.
    35      * [http://wiki.geneontology.org/index.php/Guidelines_from_Annotation_Camp#.27Response_to.27_guidelines 'Response to' BP terms]
    36      * [http://wiki.geneontology.org/index.php/Guidelines_from_Annotation_Camp#Use_of_Regulation_Terms Regulation]
    37      * [http://www.geneontology.org/GO.format.annotation.shtml GO Annotation File format]
    38      * [http://www.geneontology.org/GO.annotation.conventions.shtml Other annotation conventions]
    39      * [http://www.geneontology.org/GO.annotation.conventions.shtml Geneva Camp guidelines; some of this will be duplicated, break down into links to specific sections]
    40      * [http://wiki.geneontology.org/index.php/Transcription Transcription Overhaul details]
    41      * [wiki:AnnotationRules  PomBase In house Annotation rules]
    42 
    43 
    44      
    45      * Add GO annotation extensions/ qualifiers (see list below)
    46 
    47 
    48 == Phenotype ==
    49    * A guideline from Maria at SGD: "in summary, we annotate phenotypes that represent the effects of mutations on the organism as a whole, rather than the effects on the product of the gene that is mutated. For example, if a particular mutation blocks the in vitro activity of an enzyme, we might annotate that with GO but would not make a phenotype annotation. However, if the same mutation causes the inability of yeast to utilize a certain nutrient, then we would capture that as a phenotype annotation."
    50    * Another thought from Midori: direct involvement or effects can usually be captured with GO terms, whereas indirect effects are better as phenotypes
    51     * example: tad3 subunit of tRNA adenosine deaminase - GO term for adenosine-to-inosine conversion; phenotype annotations for cell cycle arrest (PMID:17875641)
    52    * [https://sourceforge.net/tracker/?group_id=65526&atid=2096431 phenotype tracker] to request new terms
    53    * NOTE, at present we can't capture phenotypes of double/triple mutants. The ability to do this will be added to the curation tool in the future. It will also be possible to add allele information to single mutant phenotypes and double mutants, and these will be stored as allele records. As a temporary measure alleles can be captured for single mutants in the "temporary annotation extension using allele=xxx", or in Artemis using qualifier allele=xxx. To capture an allele for a double/triple mutant you will need to create an entry on the wiki. If the phenotype is "synthetic lethal" this is explicit in the BioGRID evidence code and can be inferred later from the evidence, but we'll still need to capture the alleles (on the wiki for now).
    54 
    55 == Capturing allele information (TEMPORARY SOLUTION in the annotation extension field of the curation tool) ==
    56 
    57    * use allele=name(details)
    58     * if paper names it, use that name
    59     * if it's a full deletion, use their name anyway, and put "deletion" in the details
    60     *  if it's not a full deletion, see "details" below
    61     * if it's a complete deletion and not named, use allele=deletion and leave details blank, because "deletion" in the details would be redundant
    62     * if it's not named, and not a complete deletion, use allele=unnamed(details)
    63     * in details see:
    64       * [wiki:DescribingResidues describing residues modified or mutated]
    65       * list 'overexpression' in details if applicable
    66     * allele systematic IDs will be the gene systematic ID, a dash, and an integer, e.g. SPBC4.04c-1, SPBC4.04c-2, etc. The identifier won't encode any details; put that information in properties, description, etc.
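
The allele naming rules in this section can be sketched as a small helper. This is a minimal illustration in Python, not part of the curation tool: the function names `format_allele` and `allele_systematic_id`, and the example values `abc1-1` and `S10A` in the usage note, are hypothetical.

```python
from typing import Optional


def format_allele(name: Optional[str], full_deletion: bool, details: str = "") -> str:
    """Build an allele=name(details) string following the checklist rules."""
    if full_deletion:
        # Named full deletion: keep the paper's name, put "deletion" in details.
        # Unnamed full deletion: allele=deletion, details left blank.
        return f"allele={name}(deletion)" if name else "allele=deletion"
    # Not a full deletion: use the paper's name, or "unnamed" if there is none.
    return f"allele={name or 'unnamed'}({details})"


def allele_systematic_id(gene_systematic_id: str, number: int) -> str:
    """Gene systematic ID, a dash, and an integer; no details are encoded."""
    return f"{gene_systematic_id}-{number}"
```

For instance, `format_allele(None, False, "S10A")` gives `allele=unnamed(S10A)`, and `allele_systematic_id("SPBC4.04c", 2)` gives `SPBC4.04c-2`.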
    67 
    68 
    69 == Modification (and other "substrate is" annotations) ==
    70     * Add the target as an annotation_extension=has_substrate(GeneDB_Spombe:substrate_gene_ID) to the function activity (or process if there is no function)
    71     * if the annotation is a protein modification, make the modification annotation explicitly on the modified molecule (and if known, add the modified residues). Later we will be able to do some of this annotation by inference.
    72     * For annotations which involve a "target gene" of this type which are not "protein modification" annotations we only capture the  target gene as an annotation extension, and the reciprocal annotation  (gene B is TARGET_OF Gene A) will be inferred later
    73     * See [wiki:DescribingResidues describing residues modified or mutated] for "residue=" syntax
    74     * can also use annotation_extension to capture other things e.g. residue=S1024|T1028, annotation_extension=during(GO:0042594);
    75  
    76 == Genetic and Physical interactions ==
    77     * [http://wiki.thebiogrid.org/doku.php/experimental_systems Biogrid evidence code definitions]
    78     * [wiki:BiogridSymmetry List of which IGI & IPI are reciprocal]
    79     * Directionality of annotation - [http://www.yeastgenome.org/help/BiogridCuration.html SGD Curator help]
    80      * Has a section on the direction of the interactions (bait/hit).  In general, the rescued gene is the bait and the rescuer is the hit.  Similarly, the enhanced gene is the bait and the enhancer is the hit. 
    81      * NEW: epistasis (same pathway / no additive effect). BioGRID are going to add this, calling it "asynthetic" (wt<a=b=ab).
    82       * Document these on the wiki for now. Later need to migrate my legacy annotations which are done with GO process/IGI? and qualifier "same pathway" (we do not require this is duplicated as a GO annotation because it should already be captured by the IMP for the single mutant)
    83      
    84 
    85 == Annotation Extensions Field (annotation extension/allele/residue/qualifier) ==
    86    * Annotation extension format: Relation(Database prefix:database_ID)
    87    * Relations in use:
    88     * [http://www.geneontology.org/scratch/xps/go_annotation_extension_relations.obo OBO-format file of relations (GO site)]
    89     * [http://www.geneontology.org/scratch/xps/go_annotation_extension_examples.obo examples of how "terms" would be defined if they were pre-composed (GO site)]
    90    * !PomBase list of terms created by annotation extension: [http://sloth.sysbiol.cam.ac.uk/test/view/object/cv/411?model=chado external link] [http://sloth/test/view/object/cv/411?model=chado internal link]
    91    * [wiki:Data_Types Data types used in Cross products annotation extensions and column_17]
    92    * TEMPORARY CAPTURE OF OTHER QUALIFIERS IN THE CURATION TOOL.
    93      Multiple entries can be added comma (,) separated
    94      This field can temporarily be used to capture:
    95      * annotation extension=,
    96      * residue=  [wiki:DescribingResidues describing residues modified or mutated] (modification only?)
    97      * allele= (format see phenotype section) (phenotype only)
    98      * qualifier= ,
    99        * qualifiers allowed for GO:
    100          * contributes_to, colocalizes_with, NOT, constitutive, required, predominantly (cellular component only)
    101        * qualifiers allowed for phenotype:
    102          * condition(in_minimal_media, during_nutrient_limitation, at_high_temperature, in_presence_of_TBC, add rest of list), fitness_profiling, low_penetrance, high_penetrance, low_expressivity, high_expressivity, ...
    103        * qualifiers allowed for modification:
    104      * col17=PR:[id]
    105       * identifier for the specific form of a gene product (at present, always protein) to which the annotation applies - use for splice variants, modified forms
    106       * use Protein Ontology entries; splice variants can also use !UniProt IDs
    107       * [http://wiki.geneontology.org/index.php/GAF_Spliceform_Column_Proposal GO wiki page on spliceforms and column 17]
    108       * [http://pir.georgetown.edu/projects/pro/pro_wv.obo pre-release (i.e. latest) version of the Protein Ontology]
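
The `Relation(Database prefix:database_ID)` format and the comma-separated temporary field described in this section could be checked with a short script. This is only a sketch under stated assumptions: the names `parse_extension` and `split_field` are hypothetical, the field is assumed to hold `key=value` entries, and the naive comma split would not handle values that themselves contain commas.

```python
import re
from typing import Dict, Tuple

# Relation(Database prefix:database_ID), e.g. during(GO:0042594)
EXTENSION_RE = re.compile(r"^(\w+)\(([^:()]+):([^()]+)\)$")


def parse_extension(text: str) -> Tuple[str, str, str]:
    """Split one extension into (relation, database prefix, database ID)."""
    match = EXTENSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not in Relation(prefix:ID) form: {text!r}")
    relation, prefix, db_id = match.groups()
    return relation, prefix, db_id


def split_field(field: str) -> Dict[str, str]:
    """Split a comma-separated field of key=value entries into a dict.

    Assumption: each key appears at most once; a repeated key overwrites
    the earlier value.
    """
    entries = {}
    for entry in field.split(","):
        entry = entry.strip().rstrip(";")
        if entry:
            key, _, value = entry.partition("=")
            entries[key.strip()] = value.strip()
    return entries
```

Using the example from the modification section, `split_field("residue=S1024|T1028, annotation_extension=during(GO:0042594);")` separates the two entries, and `parse_extension("during(GO:0042594)")` then yields the relation `during`, the prefix `GO`, and the ID `0042594`.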
    109 
    110 == Protein feature annotation ==
    111    * Example NLS, signal sequence etc
    112    * Uses SO protein feature ontology, not yet included in curation tool, need to add on wiki
    113    * residues can be specified using range  see [wiki:DescribingResidues describing residues modified or mutated]
    114 
    115 
    116 == Other controlled curation ==
    117 For the complete list see:
    118 [http://sloth.sysbiol.cam.ac.uk/test/view/list/cv?numrows=200&page=1&model=chado external link] [http://sloth/test/view/list/cv?numrows=200&page=1&model=chado internal link]
    119 
    120  * If the gene has a human ortholog, is it associated with any disease?
    121  * If it has catalytic activity, can you curate the rate constant etc.?
    122  * If it is DNA binding, do you know the binding specificity?
    123  * Is the gene used in any experimental gene constructs?
    124 
    125 
    126 = Post-publication annotation checklist =
    127 
    128 For each gene you have annotated:
    129 
    130 == Status ==
    131 
    132 If the gene was not published previously you will need to update the status
    133 
    134 Add link to colour mapping here:
    135 
    136 Status descriptions here:
    137 http://sloth.sysbiol.cam.ac.uk/test/view/object/cv/1297?model=chado
    138 http://sloth/test/view/object/cv/589?model=chado
    139 
    140 
    141 If the gene was previously an orphan you will need to change
    142 "sequence orphan, uncharacterised" to "sequence orphan, characterised" (note if an ortholog has been identified species distribution (below) will also change).
    143 
    144 == Names and product ==
    145 
    146  * Is product named as commonly referred to?
    147  * Names: do any synonyms need adding?
    148  * Name description: add any missing; are any phenotypes or other annotations which can be derived from names missing?
    149 
    150 == Protein families and domains ==
    151 
    152  * Protein families and domains, e.g. gas2
    153  * Have any protein domains been described in the paper, and if so are they in Pfam? If not, submit
    154  * If in Pfam, do domains make sense based on annotations?
    155  * Look at collection of proteins with this domain architecture
    156  * Look at species distribution of domain
    157  * manually curate any domain families which are not completely covered by protein family database (have false negatives)
    158  
    159 == GO supplementary IC (inferred from curator) annotation ==
    160  * Make any (IC) annotations which the curator can infer but which are not explicitly annotated because the terms are not included in the parentage
    161 
    162 For example,
    163  * origin recognition complex (ORC) subunits can also get:
    164   * DNA replication preinitiation complex
    165   * pre-replicative complex
    166   * nuclear replication fork
    167  * cardiolipin biosynthesis > mitochondrial membrane organization
    168 
    169 (Could any of these even be experimentally supported?)
    170 
    171 == Remove legacy IEA and NAS annotations not required or incorrect ==
    172 
    173  * Remove any TAS/NAS/ISS which are now covered by experiment
    174  * Automated mappings (IEA) will be repressed by experimental data. Are any IEA annotations not covered by your manual annotation? It should be possible to make a manual annotation to cover all automated mappings (explain how why)
    175   * OR, the mapping should be removed:
    176     * If the incorrect mapping is from SPKW: or SP_SL:, access the !UniProt entry via the gene page and send a feedback message to !UniProt. Ivo Pedruzzi will fix it quickly.
    177     * Problematic Interpro mappings should be logged here: https://sourceforge.net/tracker/?group_id=36855&atid=605890
    178     * Some "pombe kw mappings" have NAS evidence code (you will know these are mappings because they will not be visible in the Artemis curation tool). These will need to be deleted from the mapping file.
    179 
    180 
    181   * OR, the ontology should be fixed
    182  * Check for consistency with other annotations and other resources
    183  * Make sure all remaining ISS are made to an experimentally characterised ortholog.
    184 (If the gene in SGD is not annotated to a term, and you think it clearly should be, mail them to add it so that the ISS is supported. This is frequently required when annotating gene products which are not published.)
    185  * If the gene has an S. cerevisiae ortholog, check the annotations to the ortholog in SGD.
    186   * If the gene in SGD is not annotated to a term, and you think it clearly should be, mail them to add it so that the ISS is supported. This is frequently required when annotating gene products which are not published.
    187   * Can you make any further annotations based on what SGD has? (Note reasons why annotations cannot be transferred; 1:1 is easiest)
    188 
    189 == Species distribution ==
    190  If conserved to human and human/cerevisiae ortholog reported, predominantly single copy, nothing to do here
    191  * Otherwise does species distribution look OK? Pfam quick check
    192  * Does it make sense based on what the protein is doing?
    193  * Can the species distribution be extended? especially for orphans or fungal specific proteins
    194 
    195 
    196 
    197 Look at !UniProt annotation to see if anything is missing, do obvious fixes and flag papers for priority annotation (not so important)
    198 
    199 
    200 If in doubt, ask the author!
     1 see CurationGuidelines