Changes between Version 87 and Version 88 of GOAnnotationGuidelines


Timestamp:
Apr 28, 2015, 11:07:49 AM (6 years ago)
Author:
mah79
Comment:

--

  • GOAnnotationGuidelines

    v87 v88  
    33== Background Reading ==
    44
    5    * [http://www.geneontology.org/GO.contents.doc.shtml#annotation General GOC documentation]
    6    * [http://geneontology.org/page/annotation GOC annotation documentation]
    7       *  [http://www.geneontology.org/GO.evidence.shtml GOC evidence code documentation]
    8       * [http://www.geneontology.org/GO.format.annotation.shtml GO Annotation File formats]
    9    * [http://www.geneontology.org/GO.annotation.conventions.shtml GOC annotation conventions]   
     5 * [http://www.geneontology.org/GO.contents.doc.shtml#annotation General GOC documentation]
     6 * [http://geneontology.org/page/annotation GOC annotation documentation]
     7  * [http://www.geneontology.org/GO.evidence.shtml GOC evidence code documentation]
     8  * [http://www.geneontology.org/GO.format.annotation.shtml GO Annotation File formats]
     9 * [http://www.geneontology.org/GO.annotation.conventions.shtml GOC annotation conventions]   
    1010 
    1111
    1212== Requesting New GO Terms ==
    1313
    14    * [http://go.termgenie.org/ TermGenie main page] If the term you need follows one of the patterns supported by !TermGenie, you can use it and get a stable ID immediately
    15    * [http://go.termgenie.org/help/index.html TermGenie GO help] (includes how to set up user access)
    16    * Or request your term(s) on the [https://sourceforge.net/p/geneontology/ontology-requests/ GO Ontology Requests tracker] at !SourceForge  include -- name, text definition, parent(s), synonyms, reference, etc.
     14 * [http://go.termgenie.org/ TermGenie main page] If the term you need follows one of the patterns supported by !TermGenie, you can use it and get a stable ID immediately
     15 * [http://go.termgenie.org/help/index.html TermGenie GO help] (includes how to set up user access)
     16 * Or request your term(s) on the [https://sourceforge.net/p/geneontology/ontology-requests/ GO Ontology Requests tracker] at !SourceForge -- include name, text definition, parent(s), synonyms, reference, etc.
    1717
    18   * When should I request a new GO term ?
    19     * 1. To add specificity, providing this should not be done with an annotation extension   
    20     * 2. A general rule is that when a GO process term is representing a single process, a single GO term should be used. 
    21         * For example, request a more specific term which links 2 parents negative regulation of SREBP signaling pathway by transcription factor catabolism with parents: GO:2000639 negative regulation of SREBP signaling pathway GO:0010620 negative regulation of transcription by transcription factor catabolism
     18 * When should I request a new GO term?
     19  * 1. To add specificity, provided that this should not be done with an annotation extension
     20  * 2. A general rule is that when a GO process term represents a single process, a single GO term should be used.
     21   * For example, request a more specific term that links 2 parents: 'negative regulation of SREBP signaling pathway by transcription factor catabolism', with parents GO:2000639 'negative regulation of SREBP signaling pathway' and GO:0010620 'negative regulation of transcription by transcription factor catabolism'.
    2222
    23 == Reporting errors  to GO ==
     23== Reporting errors to GO ==
    2424
    25   * Ontology
    26   * Annotation [https://sourceforge.net/p/geneontology/annotation-issues/ GO Annotation Issues tracker] at !SourceForge - use this to raise questions for the GO group, or to report mapping problems (see below)
    27   * Protein2GO
     25 * Ontology
     26 * Annotation [https://sourceforge.net/p/geneontology/annotation-issues/ GO Annotation Issues tracker] at !SourceForge - use this to raise questions for the GO group, or to report mapping problems (see below)
     27 * Protein2GO
    2828
    2929== GO Annotation Extensions ==
    3030
    31   * GO annotation extensions capture specificity that would be undesirable in the ontology. See the [http://wiki.geneontology.org/index.php/Annotation_Extension GOC annotation extension documentation]
    32   * Basic Format: relation(Database prefix:database_ID). Additional format/syntax documentation [AnnotationExtensionSyntax PomBase Annotation Extension Syntax documentation]  [http://wiki.geneontology.org/index.php/Annotation_Extension#The_basic_format GOC Annotation Extension Syntax documentation]
    33   * Which relations can be used in GO extensions?   [wiki:ListOfRelationsForGO List of relations used by PomBase for GO] Graphical View of Relations [http://www.ebi.ac.uk/QuickGO/AnnotationExtensionRelations.html}
    34   * [OntologiesInUse Ontologies used in col 16 and examples of uses]
    35   * When to use an extension versus requesting a new term? See [wiki:PrePostComposeGO PomBase "Precompose or postcompose?" wiki]
     31 * GO annotation extensions capture specificity that would be undesirable in the ontology. See the [http://wiki.geneontology.org/index.php/Annotation_Extension GOC annotation extension documentation]
     32 * Basic Format: relation(Database prefix:database_ID). Additional format/syntax documentation [AnnotationExtensionSyntax PomBase Annotation Extension Syntax documentation]  [http://wiki.geneontology.org/index.php/Annotation_Extension#The_basic_format GOC Annotation Extension Syntax documentation]
     33 * Which relations can be used in GO extensions? See the [wiki:ListOfRelationsForGO List of relations used by PomBase for GO] and the [http://www.ebi.ac.uk/QuickGO/AnnotationExtensionRelations.html graphical view of relations]
     34 * [OntologiesInUse Ontologies used in col 16 and examples of uses]
     35 * When to use an extension versus requesting a new term? See [wiki:PrePostComposeGO PomBase "Precompose or postcompose?" wiki]
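
The basic format above, relation(Database prefix:database_ID), can be checked mechanically. The following is a minimal sketch only, not the PomBase or GOC validator: the names `Extension` and `parse_extension` are hypothetical, and the regular expression assumes a single extension with one identifier (no comma- or pipe-separated lists).

```python
import re
from typing import NamedTuple


class Extension(NamedTuple):
    """One annotation extension split into its parts (hypothetical helper type)."""
    relation: str
    prefix: str
    local_id: str


# Assumed shape: relation(PREFIX:ID), e.g. happens_during(GO:0016036)
_EXTENSION_RE = re.compile(r"^\s*(\w+)\(\s*([A-Za-z][\w.]*):([^\s()]+)\s*\)\s*$")


def parse_extension(text: str) -> Extension:
    """Split a single extension string into relation, database prefix and ID."""
    match = _EXTENSION_RE.match(text)
    if match is None:
        raise ValueError(f"not in relation(PREFIX:ID) form: {text!r}")
    return Extension(*match.groups())
```

For example, `parse_extension("happens_during(GO:0016036)")` yields the relation `happens_during`, the prefix `GO` and the ID `0016036`. Whether a given relation is allowed with a given ontology is a separate check against the list of relations linked above.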
    3636
    3737 
    38  == Specific GO Annotation Guidelines ==
    39 
     38== Specific GO Annotation Guidelines ==
    4039
    4140=== Annotation Specificity ===
    4241
    43   At PomBase  we have tagged some terms  as 'not to be used for direct annotation' because it should always or usually)  be possible to make a more specific annotation.
    44   Examples include:
     42At !PomBase we have tagged some terms as 'not to be used for direct annotation' because it should (always or usually) be possible to make a more specific annotation. Examples include:
    4543  * DNA replication (meiotic or mitotic?)
    4644  * cell cycle/regulation of cell cycle (meiotic or mitotic? which transition?)
    4745  * splicing via the spliceosome -> nuclear mRNA cis splicing, via spliceosome (no trans splicing in pombe)
    4846  * cytokinesis (use terms under the appropriate cell cycle: usually mitotic, otherwise meiotic)
    49   * cell wall organization  -> fungal-type cell wall organization or biogenesis or one of its descendants
     47  * cell wall organization -> fungal-type cell wall organization or biogenesis or one of its descendants
    5048  * transport (vesicle-mediated? transmembrane? etc.)
    5149  * sporulation -> ascospore formation or children
    5250
    53  === GO annotation and Redundancy ===
     51=== GO annotation and Redundancy ===
    5452
    55  If you are annotating a newer paper, and it repeats older well annotated experiments,  you do not need to capture the annotation.
     53 If you are annotating a newer paper that repeats older, well-annotated experiments, you do not need to capture the annotation again.
    5654
    5755
    5856=== Biological Process ===
    5957
    60    * Every process should have a discrete '''beginning''' and '''end''', and these should be clearly stated in the process term definition. Note, however, that this work is still in progress for GO.
     58 * Every process should have a discrete '''beginning''' and '''end''', and these should be clearly stated in the process term definition. Note, however, that this work is still in progress for GO.
    6159
    62    *  When to make a GO process  annotation 
    63       *   GO or phenotype  ?
    64          * We use GO annotations to describe direct involvement in a process, or its regulation (see below for more detail on regulation). We don't  annotate indirect upstream effects. Caution must therefore be used when using IMP to make process annotation, as it often isn't clear whether the effect is directly involved, regulation, or only affecting a process indirectly. Often a number of phenotypes are used to make a GO annotation.
    65          * Example 1 dil1 in /curs/c7fac5251ee4f493/ro/ where annotation to dynein-driven meiotic oscillatory nuclear movement with IMP is based on a combination of phenotypes :
    66               * decreased meiotic recombination
    67               * horsetail movement abolished
    68               * unequal meiotic chromosome segregation
    69               * decreased protein localization to microtubule cytoskeleton
    70          * Example 2 nda3 and mug164 annotations to intracellular distribution of mitochondria with IMP is based on
    71              * abolished mitochondrion inheritance
    72              * mitochondrial aggregation at cell tip
    73              * normal mitochondrial fission
    74              * normal mitochondrial fusion
    75 
    76    * Regulation
    77        *  [http://www.geneontology.org/GO.annotation.conventions.shtml#regulationTerms  GOC Regulation]
    78        *  Lots of ongoing discussion among !PomBase curators and with other GO annotators, about when to annotate to regulation terms and when not.
    79        *  This is also connected with filling in start and end details for the process terms that still need them.
    80        
    81     * Transcription
    82        * [http://wiki.geneontology.org/index.php/Transcription Transcription Overhaul details]
    83        * For  "Transcription factors" see Molecular Function below
    84        * Annotate to regulation of transcription (with approtiate gene specific extensions) only if there isn't enough data to support annotating to a transcription factor MF term
    85  
    86     * Regulation of transcription and signalling pathways
    87        * Regulation of transcription and signalling pathways Consists of four components:  A signal transduction cascade, a transcription factor, the process of transcription and a downstream process. The signal transduction cascade regulates the downstream process ONLY. The transcription factor regulates transcription AND the downstream process.  For example:
    88             * Transcription factor ste11 should be annotated to positive regulation of transcription involved in sporulation AND signal transduction involved in positive regulation of sporulation.
    89             * Mam2 in contrast is only annotated to signal transduction involved in positive regulation of sporulation.
    90 
    91     * Transport and Localization
    92        * Note the distinction between "transport" and "localization" and always use the appropriate branch. Localization is more general and can involve establishment or maintenance at a specific location, whereas transport involves directed movement
    93        * Note that all transmembrane transporters should have an annotation to the process of  transmembrane transport.
    94  
    95     * Response to .....
    96       * We do not usually annotate to "response to stress" terms unless we can say specifically which process is alteredmi.e. regulation of cytoplasmic translation in response to stress
    97       * See also GOC documentation [http://www.geneontology.org/GO.annotation.conventions.shtml#response 'Response to' BP terms]
    98  
    99     * Cell polarity related
    100        * If you are annotating a process that directly affects cell shape, select one of the terms that specifies "regulating cell shape", e.g. [http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0071963 establishment or maintenance of cell polarity regulating cell shape (GO:0071963)] or one of its descendants.
     60 * When to make a GO process annotation 
     61  * GO or phenotype?
     62   * We use GO annotations to describe direct involvement in a process, or its regulation (see below for more detail on regulation). We don't annotate indirect upstream effects. Caution must therefore be used when using IMP to make a process annotation, as it often isn't clear whether the gene product is directly involved in the process, regulates it, or only affects it indirectly. Often a number of phenotypes are used to make a GO annotation.
     63    * Example 1: dil1 in /curs/c7fac5251ee4f493/ro/, where the annotation to dynein-driven meiotic oscillatory nuclear movement with IMP is based on a combination of phenotypes:
     64     * decreased meiotic recombination
     65     * horsetail movement abolished
     66     * unequal meiotic chromosome segregation
     67     * decreased protein localization to microtubule cytoskeleton
     68    * Example 2: the nda3 and mug164 annotations to intracellular distribution of mitochondria with IMP are based on:
     69     * abolished mitochondrion inheritance
     70     * mitochondrial aggregation at cell tip
     71     * normal mitochondrial fission
     72     * normal mitochondrial fusion
     73 * Regulation
     74  * [http://www.geneontology.org/GO.annotation.conventions.shtml#regulationTerms  GOC Regulation]
     75   * Lots of ongoing discussion among !PomBase curators and with other GO annotators, about when to annotate to regulation terms and when not.
     76   * This is also connected with filling in start and end details for the process terms that still need them.
     77 * Transcription
     78  * [http://wiki.geneontology.org/index.php/Transcription Transcription Overhaul details]
     79   * For "Transcription factors" see Molecular Function below
     80   * Annotate to regulation of transcription (with appropriate gene-specific extensions) only if there isn't enough data to support annotating to a transcription factor MF term
     81 * Regulation of transcription and signalling pathways
     82  * Regulation of transcription and signalling pathways consists of four components: a signal transduction cascade, a transcription factor, the process of transcription and a downstream process. The signal transduction cascade regulates the downstream process ONLY. The transcription factor regulates transcription AND the downstream process. For example:
     83    * Transcription factor ste11 should be annotated to positive regulation of transcription involved in sporulation AND signal transduction involved in positive regulation of sporulation.
     84    * Mam2 in contrast is only annotated to signal transduction involved in positive regulation of sporulation.
     85 * Transport and Localization
     86  * Note the distinction between "transport" and "localization" and always use the appropriate branch. Localization is more general and can involve establishment or maintenance at a specific location, whereas transport involves directed movement.
     87   * Note that all transmembrane transporters should have an annotation to the process of transmembrane transport.
     88 * Response to .....
     89   * We do not usually annotate to "response to stress" terms unless we can say specifically which process is altered, e.g. regulation of cytoplasmic translation in response to stress
     90   * See also GOC documentation [http://www.geneontology.org/GO.annotation.conventions.shtml#response 'Response to' BP terms]
     91 * Cell polarity
     92  * If you are annotating a process that directly affects cell shape, select one of the terms that specifies "regulating cell shape", e.g. [http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0071963 establishment or maintenance of cell polarity regulating cell shape (GO:0071963)] or one of its descendants.
    10193
    10294
    10395=== Molecular Function ===
    10496   
    105   * Protein binding
    106     * In !PomBase, we'll only annotate to a GO 'protein binding'  GO:0005515  term if there's strong evidence that a physical interaction is direct, e.g. using purified proteins. Otherwise, just do the BioGRID interaction annotation.
    107     * We do not use gene product specific, or protein family specific, or GO function specific GO protein binding descendents (this would take  too long, and can be done with a query)
    108     * However we do use "domain specific' GO binding terms to specify binding toa specific region in a target protein
    109  
    110   * DNA binding  - for sequence-specific DNA binding, use an annotation extension with 'occurs_at' and a SO ID (also see [#GOAnnotationExtensions] above)
    111     *  update [http://wiki.geneontology.org/index.php/Annotation_consistency_:_ChIP_experiments chromatin immunoprecipitation (ChIP) experiments NOT promoter binding]
    112    
    113 
    114   * Transcription factors - wherever possible make annotation to both:
    115      *  1. transcription factor activity term  (sequence-specific DNA binding transcription factor activity (GO:0003700) or a descendent)  twith any target genes as has_regulation_target extensions and a
    116      *  2. DNA binding term,  "transcription regulatory region sequence-specific DNA binding (GO:0000976)" or a descendent   (most commonly RNA polymerase II core promoter proximal region sequence-specific DNA binding GO:0000978) with any DNA binding motifs ar has_direct_input(SO:ID)
    117         * Otherwise the TF won't be annotated to DNA binding, because the terms are connected by has_part in GO).  (YOu might need to use ISS or IC to get both)
    118         * Can also put happens_during extensions on the TF activity term to capture stress, cell cycle phase, etc.
    119            * Example: [http://www.ncbi.nlm.nih.gov/pubmed/23231582 PMID:23231582] describes transcription during phosphate starvation. Mutations in pho7 or csk1 affect the -phosphate expression profile. For pho7 they also do ChIP-Seq and TAP assays, and some assays with a reporter construct, to establish that it acts as a transcription factor.
    120            * So I annotated pho7 to GO:0000978 (IPI with targets identified by ChIP-Seq) and GO:0001077 (IMP - Fig 4, plus can interpret Fig 2 ChIP-Seq as also supporting in light of Fig 4; have also thought about whether it might be good enough for IDA), with extensions on both to indicate phosphate starvation -- happens_during(GO:0016036) -- and the target genes highlighted in the text. In contrast, for csk1 I just used the BP GO:0045944 'positive regulation of transcription from RNA polymerase II promoter' with IMP. Full details in the [http://curation.pombase.org/pombe/curs/efb40309f7d22a73 curation session].
     97 * Protein binding
     98  * In !PomBase, we'll only annotate to a GO 'protein binding' GO:0005515 term if there's strong evidence that a physical interaction is direct, e.g. using purified proteins. Otherwise, just do the BioGRID interaction annotation.
     99  * We do not use gene product specific, protein family specific, or GO function specific GO protein binding descendants (this would take too long, and can be done with a query)
     100  * However, we do use "domain specific" GO binding terms to specify binding to a specific region in a target protein
     101 * DNA binding  - for sequence-specific DNA binding, use an annotation extension with 'occurs_at' and a SO ID (also see [#GOAnnotationExtensions] above)
     102  * Don't use "promoter binding" terms if the only evidence is chromatin immunoprecipitation (ChIP); use a "chromatin" cellular component term instead. See the [http://wiki.geneontology.org/index.php/Annotation_consistency_:_ChIP_experiments GO wiki page on ChIP experiments]. Specify where with 'coincident_with(SO:nnn)' extensions.
     103 * Transcription factors - wherever possible make annotation to both:
     104  * 1. a transcription factor activity term (sequence-specific DNA binding transcription factor activity (GO:0003700) or a descendant) with any target genes as has_regulation_target extensions, and optionally a promoter in an occurs_at(SO:nnn) extension; and
     105  * 2. a DNA binding term, "transcription regulatory region sequence-specific DNA binding (GO:0000976)" or a descendant (most commonly RNA polymerase II core promoter proximal region sequence-specific DNA binding GO:0000978) with any DNA binding motifs as has_direct_input(SO:ID)
     106  * Otherwise the TF won't be annotated to DNA binding, because the terms are connected by has_part in GO. (You might need to use ISS or IC to get both.)
     107   * Can also put happens_during extensions on the TF activity term to capture stress, cell cycle phase, etc.
     108   * Example: [http://www.ncbi.nlm.nih.gov/pubmed/23231582 PMID:23231582] describes transcription during phosphate starvation. Mutations in pho7 or csk1 affect the -phosphate expression profile. For pho7 they also do ChIP-Seq and TAP assays, and some assays with a reporter construct, to establish that it acts as a transcription factor.
     109    * So I annotated pho7 to GO:0000978 (IPI with targets identified by ChIP-Seq) and GO:0001077 (IMP - Fig 4, plus can interpret Fig 2 ChIP-Seq as also supporting in light of Fig 4; have also thought about whether it might be good enough for IDA), with extensions on both to indicate phosphate starvation -- happens_during(GO:0016036) -- and the target genes highlighted in the text. In contrast, for csk1 I just used the BP GO:0045944 'positive regulation of transcription from RNA polymerase II promoter' with IMP. Full details in the [http://curation.pombase.org/pombe/curs/efb40309f7d22a73 curation session].
    121110 
    122111=== Gene Product Forms, or "Column 17" ===
    123112 
    124   * Identifier for the specific form of a gene product, for example to describe if a particular cellular location is observed with a specific modification of a protein.
    125     * For modified forms of proteins (e.g. phosphorylated, methylated) use Protein Ontology entries (PR:[id]) request from [https://sourceforge.net/p/pro-obo/term-requests/ PRO tracker ] or [http://pir.georgetown.edu/cgi-bin/pro/race_pro Race Pro]
    126     * splice variants can use !PomBase splice variant IDs (no examples curated yet)
     113 * Identifier for the specific form of a gene product, for example to describe if a particular cellular location is observed with a specific modification of a protein.
     114  * For modified forms of proteins (e.g. phosphorylated, methylated) use Protein Ontology entries (PR:[id]); request new entries from the [https://sourceforge.net/p/pro-obo/term-requests/ PRO tracker] or [http://pir.georgetown.edu/cgi-bin/pro/race_pro Race Pro]
     115  * Splice variants can use !PomBase splice variant IDs (no examples curated yet)
    127116
    128117=== GO supplementary IC (inferred from curator) annotation ===
    129118
    130   * Make any annotation (IC) which can be inferred by a curator but are not implicitly annotated by transitivity (i.e. because not included in the term ancestry). For example,
     119 * Make any IC annotations that can be inferred by a curator but are not implicitly made by transitivity (i.e. because the inferred term is not included in the term ancestry). For example,
    131120  * [http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0000808 origin recognition complex (ORC; GO:0000808)] subunits can also get:
    132121  * [http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0031261 DNA replication preinitiation complex (GO:0031261)]
     
    135124  * [http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0032049 cardiolipin biosynthesis (GO:0032049)] implies [http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0007006 mitochondrial membrane organization (GO:0007006)]
    136125  * Also check whether any of these can be experimentally supported by data in the paper
     126 * An IC annotation inferred from annotations from one publication uses the same reference as the "from" annotations. If an IC is based on putting together multiple "from" annotations that come from more than one paper, use GO_REF:0000036. Do not use GO_REF:0000001.
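
The reference rule for IC annotations can be written as a small decision function. This is an illustrative sketch, assuming the references of the "from" annotations are available as plain strings; the function name `ic_reference` is hypothetical and not part of any curation tool.

```python
from typing import Iterable


def ic_reference(from_references: Iterable[str]) -> str:
    """Choose the reference for an IC annotation from its "from" annotations.

    One publication  -> reuse that publication's reference.
    Several papers   -> GO_REF:0000036.
    GO_REF:0000001 is never returned by this rule.
    """
    unique = set(from_references)
    if not unique:
        raise ValueError("an IC annotation needs at least one 'from' annotation")
    if len(unique) == 1:
        return next(iter(unique))
    return "GO_REF:0000036"
```

Called with two "from" annotations that both cite PMID:23231582, this returns PMID:23231582; called with references from two different papers, it returns GO_REF:0000036.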
    137127
    138128----