
Version 28 (modified by vw253, 6 years ago)


PomBase GO Annotation Guidelines

GO Consortium Documentation - Ontology

These pages describe the overall structure of GO and scope of each main branch:

GO Consortium Documentation - Annotation

All of the general annotation documentation and recommendations on GO web and wiki pages are applicable. Links to some of the most useful:

Requesting GO Terms

  • Requesting new terms

GO Annotation Extensions

GO annotation extensions capture specificity that would be undesirable in the ontology.

As a general rule, when a GO process term would represent a single process, request a single GO term (rather than capturing the detail in an extension). For example, negative regulation of SREBP signaling pathway by transcription factor catabolism has the parents GO:2000639 negative regulation of SREBP signaling pathway and GO:0010620 negative regulation of transcription by transcription factor catabolism.

An exception to this rule is when a signalling pathway is activated by a number of different stresses. In this case, add the stress as a "during..." extension. The rationale for the exception is that the more specific annotations would not offer any obvious benefit to users (for enrichments etc.). This would change if two distinct pathways were observed, with distinct gene products annotated to each pathway (this test can also be used to decide on further proposed exceptions). Note that you should still make a concurrent "response to x stress" annotation for these gene products.
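The exception above can be sketched as a small data model. This is only an illustration, not a PomBase tool: the `Annotation` class, the `stress_pathway_annotations` helper, the gene name `geneX` and the term strings are all hypothetical placeholders, and the sketch assumes the "during..." extension is written with the `happens_during` relation used later on this page.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Hypothetical minimal record: a gene, a GO term, and optional extensions."""
    gene: str
    term: str
    extensions: list = field(default_factory=list)


def stress_pathway_annotations(gene, pathway_term, stress_term):
    """Return the two annotations described above for a stress-activated pathway.

    Assumption: the stress is attached with happens_during(...), and the same
    stress term is reused for the concurrent "response to x stress" annotation.
    """
    pathway = Annotation(gene, pathway_term, [f"happens_during({stress_term})"])
    response = Annotation(gene, stress_term)
    return [pathway, response]


# Placeholder values only; substitute real gene and term identifiers.
pair = stress_pathway_annotations(
    "geneX", "signalling pathway term", "response to x stress"
)
for annotation in pair:
    print(annotation.gene, annotation.term, ",".join(annotation.extensions))
```

The point of the sketch is that one general pathway term carries the stress as an extension, while the stress response itself is still recorded as a separate annotation.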

Specific GO Annotation Guidelines for PomBase

Annotation Specificity

Biological Process

Every process should have a discrete beginning and end, and these should be clearly stated in the process term definition. Note, however, that this work is still in progress for GO.

  • Regulation
    • PomBase curators and other GO annotators are still discussing when to annotate to regulation terms and when not to. The question is also connected with filling in start and end details for the process terms that still need them. Conclusions will be posted or linked here. (2012-06-07)
  • Transcription
    • See "Transcription factors" under Molecular Function below - includes example
      • Annotate to regulation of transcription (with extensions as appropriate) only if there isn't enough data to support annotating to a transcription factor MF term (descendant of GO:0003700 sequence-specific DNA binding transcription factor activity); if there is a transcription factor MF annotation, a transcription regulation BP annotation will be inferred by transitivity
    • All fission yeast annotations for RNAi so far should be to the term chromatin silencing by small RNA (GO:0031048)
      • Note that the GO term RNA interference (RNAi; GO:0016246) is not an ancestor of GO:0031048, because in GO RNAi is defined more strictly, according to the original "post-transcriptional" usage.
  • Regulation of transcription and signalling pathways

Regulation of transcription and signalling pathways consists of four components: a signal transduction cascade, a transcription factor, the process of transcription, and a downstream process. The signal transduction cascade regulates the downstream process ONLY. The transcription factor regulates transcription AND the downstream process. For instance, the transcription factor ste11 should be annotated to positive regulation of transcription involved in sporulation AND signal transduction involved in positive regulation of sporulation. Mam2, in contrast, is annotated only to signal transduction involved in positive regulation of sporulation.

  • Cytokinesis
  • DNA replication
  • Metabolic processes
    • Always check that "metabolic process" terms used have "cellular metabolic/biosynthetic/catabolic process" parentage; if not, request a change in GO
  • Sporulation
    • use ascospore formation (GO:0030437) or children
    • Sporulation terms are in the "mapping to specific terms" file

Cellular Component

  • You can usually use the specific "nuclear x" or "cytoplasmic x" macromolecular complex terms.
    • In particular, because fission yeast has no nuclear envelope breakdown during mitosis, you can always annotate to the "nuclear" versions of terms for chromosomes, chromatin regions, etc., e.g.
    • These terms are in the "mapping to specific terms" file

Molecular Function

  • Protein binding - annotate to GO:0005515 (or an allowable descendant) if there is enough information to conclude that there is a direct physical interaction. Otherwise just make BioGRID interaction annotation(s).
    • use IPI evidence; a 'with' entry is mandatory
    • note that GO:0019899 'enzyme binding' is in the do-not-annotate subset
    • we haven't (yet) tried to systematically delete older protein binding annotations that may not fit the direct interaction criterion
    • some protein binding annotations could presumably be inferred automatically from enzyme activity + input annotations (e.g. a protein kinase and a substrate identified in an extension)
  • DNA binding
    • for sequence-specific DNA binding, can use an annotation extension with 'occurs_at' and a SO ID (also see #GOAnnotationExtensions above)
  • Transcription factors - wherever possible (i.e. make exceptions only when data don't support this):
    • Remember to annotate to both a transcription factor activity term and a DNA binding term. The two terms are connected by has_part in GO, so a transcription factor activity annotation alone will not result in the TF being annotated to DNA binding.
      • DNA binding: annotate to descendant of transcription regulatory region sequence-specific DNA binding (GO:0000976), e.g. RNA polymerase II core promoter proximal region sequence-specific DNA binding (GO:0000978) is often the right one
      • Transcription factor activity: annotate to a descendant of sequence-specific DNA binding transcription factor activity (GO:0003700), usually RNA polymerase II core promoter proximal region sequence-specific DNA binding transcription factor activity involved in positive regulation of transcription (GO:0001077) or RNA polymerase II core promoter proximal region sequence-specific DNA binding transcription factor activity involved in negative regulation of transcription (GO:0001078)
    • To capture target genes, put has_regulation_target extensions on the transcription factor activity term
    • Also put happens_during extensions on the TF activity term to capture stress, cell cycle phase, etc. No need to make redundant BP annotations.
    • Example: PMID:23231582 describes transcription during phosphate starvation. Mutations in pho7 or csk1 affect the -phosphate expression profile. For pho7 they also do ChIP-Seq and TAP assays, and some assays with a reporter construct, to establish that it acts as a transcription factor.
      • So I annotated pho7 to GO:0000978 (IPI with targets identified by ChIP-Seq) and GO:0001077 (IMP - Fig 4, plus can interpret Fig 2 ChIP-Seq as also supporting in light of Fig 4; have also thought about whether it might be good enough for IDA), with extensions on both to indicate phosphate starvation -- happens_during(GO:0016036) -- and the target genes highlighted in the text.
      • In contrast, for csk1 I just used the BP GO:0045944 'positive regulation of transcription from RNA polymerase II promoter' with IMP.
      • Full details in the curation session.
      • The phenotypes themselves are another hairball, since they're mainly effects on global transcription measured by microarrays. In theory we could do annotations with extensions for all affected genes, but that would be insane. Try to get data for browser track; in meantime I'm just annotating to "altered RNA level during cellular response to phosphate starvation" without extensions. (2014-03-10)
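The pho7/csk1 example above can be written out as structured records to show how the extensions attach to each annotation. This is a sketch under stated assumptions: the `Annotation` class and its `extension_string` method are hypothetical, the `relation(ID)` strings are joined with commas for display only, and `HYPOTHETICAL_TARGET_GENE` is a placeholder because the actual target genes are listed only in the paper and the curation session. For brevity the target placeholder is shown only on the transcription factor activity annotation.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Hypothetical record for one GO annotation with optional extensions."""
    gene: str
    term_id: str
    evidence: str
    extensions: list = field(default_factory=list)  # (relation, target) pairs

    def extension_string(self) -> str:
        # Render each pair as relation(target); comma-joined for display.
        return ",".join(f"{rel}({target})" for rel, target in self.extensions)


# GO:0016036 is the term used above to indicate phosphate starvation.
PHOSPHATE_STARVATION = ("happens_during", "GO:0016036")
# Placeholder, not a real identifier: stands in for each target gene.
TARGET = ("has_regulation_target", "HYPOTHETICAL_TARGET_GENE")

annotations = [
    # pho7: DNA binding term plus transcription factor activity term.
    Annotation("pho7", "GO:0000978", "IPI", [PHOSPHATE_STARVATION]),
    Annotation("pho7", "GO:0001077", "IMP", [PHOSPHATE_STARVATION, TARGET]),
    # csk1: only the BP regulation term, since TF activity is not supported.
    Annotation("csk1", "GO:0045944", "IMP"),
]

for a in annotations:
    print(a.gene, a.term_id, a.evidence, a.extension_string())
```

Keeping the conditions and targets in extensions on the MF annotations is what makes separate, redundant BP annotations unnecessary for pho7.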

GO annotation and Redundancy

You don't need to make every GO annotation in a paper if the annotation is already present (or well known) from previously annotated experiments or papers. For example, you don't need to annotate Cdc2 to protein kinase activity every time it is demonstrated.

  • Some guidelines -- make an additional annotation if:
    • There is any new information, for instance an additional annotation extension or qualifier
    • Two (or more) papers containing new experimental information were published within a few months of each other, in which case curate both
    • It lends extra support to a term/annotation which may be considered not well-supported
    • You don't need to make every IGI annotation to support a GO process (usually you should be able to make a single IMP annotation). However, make sure the individual genetic interactions are curated in BioGRID (note: some older IGI annotations that over-interpret the available evidence will gradually be removed if the interactions are represented in BioGRID).
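The first three guidelines above amount to a simple checklist, sketched below. The `Candidate` class, its field names and the `should_annotate` function are hypothetical and only restate the bullets; deciding whether each condition holds is still a curator judgment, and the IGI/BioGRID point is not modelled here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical summary of a candidate annotation found in a paper."""
    already_annotated: bool
    adds_new_extension_or_qualifier: bool = False
    published_close_to_existing_paper: bool = False
    existing_support_is_weak: bool = False


def should_annotate(candidate: Candidate) -> bool:
    """Return True if the guidelines above suggest making the annotation."""
    if not candidate.already_annotated:
        return True  # nothing to be redundant with
    return (
        candidate.adds_new_extension_or_qualifier
        or candidate.published_close_to_existing_paper
        or candidate.existing_support_is_weak
    )


# A repeat demonstration that adds nothing new can be skipped.
print(should_annotate(Candidate(already_annotated=True)))
# The same repeat becomes worth curating if it adds a new extension.
print(should_annotate(Candidate(True, adds_new_extension_or_qualifier=True)))
```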

Gene Product Forms, or "Column 17"

GO supplementary IC (inferred from curator) annotation

Making GO process annotations from IMP

We use GO annotations to describe direct involvement in a process, or its regulation (see below for more detail on regulation). We don't annotate indirect upstream effects. Take care, therefore, when using IMP to make process annotations, as it often isn't clear whether the gene product is directly involved in the process, regulates it, or only affects it indirectly. Often a combination of several phenotypes is used to support a GO annotation.

Example 1

dil1 in /curs/c7fac5251ee4f493/ro/, where the annotation to dynein-driven meiotic oscillatory nuclear movement with IMP is based on a combination of phenotypes:

  • decreased meiotic recombination
  • horsetail movement abolished
  • unequal meiotic chromosome segregation
  • decreased protein localization to microtubule cytoskeleton

Example 2

The nda3 and mug164 annotations to intracellular distribution of mitochondria with IMP are based on:

  • abolished mitochondrion inheritance
  • mitochondrial aggregation at cell tip
  • normal mitochondrial fission
  • normal mitochondrial fusion

Regulation in GO, describe here

