Version 93 (modified by mah79, 3 years ago)

PomBase GO Annotation Guidelines

Background Reading

Requesting New GO Terms

  • (Info on TermGenie successor will go here)
  • Or request your term(s) on the GO Ontology Requests tracker at GitHub -- include name, text definition, parent(s), synonyms, reference, etc.
  • When should I request a new GO term?
    • 1. To add specificity, provided that this is not better done with an annotation extension.
    • 2. As a general rule, when a GO process term represents a single process, a single GO term should be used.
      • For example, request a more specific term that links two parents: negative regulation of SREBP signaling pathway by transcription factor catabolism, with parents GO:2000639 negative regulation of SREBP signaling pathway and GO:0010620 negative regulation of transcription by transcription factor catabolism.

Reporting errors to GO

GO Annotation Extensions

Specific GO Annotation Guidelines

Annotation Specificity

At PomBase we have tagged some terms as 'not to be used for direct annotation' because it should always (or usually) be possible to make a more specific annotation. Examples include:

  • DNA replication (meiotic or mitotic?)
  • cell cycle/regulation of cell cycle (meiotic or mitotic? which transition?)
  • splicing via the spliceosome -> nuclear mRNA cis splicing, via spliceosome (no trans splicing in pombe)
  • cytokinesis (use terms under the appropriate cell cycle: mitotic (usually) or meiotic)
  • cell wall organization -> fungal-type cell wall organization or biogenesis or one of its descendants
  • transport (vesicle-mediated? transmembrane? etc)
  • sporulation -> ascospore formation or children
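
As a rough illustration, the examples above could be turned into a simple check that flags annotations made directly to one of these general terms. This is a minimal Python sketch, not a PomBase tool: the `TOO_GENERAL` dictionary, the function name `flag_too_general`, and the choice to match on term names rather than GO IDs are all assumptions made for the example, and only some of the listed terms are included.

```python
# Hypothetical lookup built from some of the examples listed above:
# general term name -> the question or replacement a curator should consider.
TOO_GENERAL = {
    "DNA replication": "meiotic or mitotic?",
    "cell cycle": "meiotic or mitotic? which transition?",
    "regulation of cell cycle": "meiotic or mitotic? which transition?",
    "cytokinesis": "use a term under the mitotic (usually) or meiotic cell cycle",
    "transport": "vesicle-mediated? transmembrane?",
    "sporulation": "use ascospore formation or one of its children",
}


def flag_too_general(term_names):
    """Return (term, hint) pairs for terms that should not be annotated directly."""
    return [(name, TOO_GENERAL[name]) for name in term_names if name in TOO_GENERAL]


if __name__ == "__main__":
    # Only "transport" is flagged; "ascospore formation" is specific enough.
    for term, hint in flag_too_general(["transport", "ascospore formation"]):
        print(f"{term}: {hint}")
```

A real check would more likely use GO IDs and the ontology's own "do not annotate" tagging; term names are used here only because the list above gives names, not IDs.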

GO annotation and Redundancy

If you are annotating a newer paper, and it repeats older, well-annotated experiments, you do not need to capture the annotation again.

Biological Process

  • Every process should have a discrete beginning and end, and these should be clearly stated in the process term definition. Note, however, that this work is still in progress for GO.
  • When to make a GO process annotation
    • GO or phenotype?
      • We use GO annotations to describe direct involvement in a process, or its regulation (see below for more detail on regulation). We don't annotate indirect upstream effects. Caution must therefore be used when using IMP to make process annotations, as it often isn't clear whether the gene product is directly involved in the process, regulates it, or only affects it indirectly. Often a combination of phenotypes is used to support a GO annotation.
        • Example 1: dil1 in /curs/c7fac5251ee4f493/ro/ where annotation to dynein-driven meiotic oscillatory nuclear movement with IMP is based on a combination of phenotypes:
          • decreased meiotic recombination
          • horsetail movement abolished
          • unequal meiotic chromosome segregation
          • decreased protein localization to microtubule cytoskeleton
        • Example 2: nda3 and mug164 annotations to intracellular distribution of mitochondria with IMP are based on:
          • abolished mitochondrion inheritance
          • mitochondrial aggregation at cell tip
          • normal mitochondrial fission
          • normal mitochondrial fusion
  • Regulation
    • GOC Regulation
      • There is a lot of ongoing discussion among PomBase curators, and with other GO annotators, about when to annotate to regulation terms and when not to.
      • This is also connected with filling in start and end details for the process terms that still need them.
      • To do: when PomBase deploys qualifiers (e.g. "causally upstream of or within"), add documentation (2017-06-27)
  • Transcription
    • Transcription Overhaul details
      • For "Transcription factors" see Molecular Function below
      • Annotate to regulation of transcription (with appropriate gene-specific extensions) only if there isn't enough data to support annotating to a transcription factor MF term
  • Regulation of transcription and signalling pathways
    • This consists of four components: a signal transduction cascade, a transcription factor, the process of transcription, and a downstream process. The signal transduction cascade regulates the downstream process ONLY. The transcription factor regulates transcription AND the downstream process. For example:
      • Transcription factor ste11 should be annotated to positive regulation of transcription involved in sporulation AND signal transduction involved in positive regulation of sporulation.
      • Mam2 in contrast is only annotated to signal transduction involved in positive regulation of sporulation.
  • Transport and Localization
    • Note the distinction between "transport" and "localization", and always use the appropriate branch. Localization is more general and can involve establishment or maintenance at a specific location, whereas transport involves directed movement.
      • Note that all transmembrane transporters should have an annotation to the process of transmembrane transport.
  • Response to .....
    • We do not usually annotate to "response to stress" terms unless we can say specifically which process is altered, e.g. regulation of cytoplasmic translation in response to stress
    • See also GOC documentation 'Response to' BP terms
  • Cell polarity

Molecular Function

  • Protein binding
    • In PomBase, we only annotate to the GO term 'protein binding' (GO:0005515) if there's strong evidence that a physical interaction is direct, e.g. from experiments using purified proteins. Otherwise, just do the BioGRID interaction annotation.
    • We do not use gene product-specific, protein family-specific, or GO function-specific GO protein binding descendants (this would take too long, and can be done with a query)
    • However, we do use "domain-specific" GO binding terms to specify binding to a specific region in a target protein
  • DNA binding - for sequence-specific DNA binding, use an annotation extension with 'occurs_at' and an SO ID (see also GO Annotation Extensions above). The rationale is that the bound substrate is the whole DNA molecule, and the SO extension indicates the region where binding takes place.
    • If you don't know whether a gene product is binding to DNA or protein (or both) in chromatin, you can use "chromatin binding", again with 'occurs_at(SO:nnn)' extensions ... BUT:
    • Don't use "promoter binding" terms if the only evidence is chromatin immunoprecipitation (ChIP). In fact, it's safer not to use a binding term at all; use a "chromatin" cellular component term instead. See the GO wiki page on ChIP experiments. Specify where with 'coincident_with(SO:nnn)' or 'coincident_with(PomBase:id)' extensions.
  • Transcription factors - wherever possible make annotation to both:
    • 1. a transcription factor activity term (sequence-specific DNA binding transcription factor activity (GO:0003700) or a descendant) with any target genes as has_regulation_target extensions, and optionally a promoter in an occurs_at(SO:nnn) extension; and
    • 2. a DNA binding term, "transcription regulatory region sequence-specific DNA binding (GO:0000976)" or a descendant (most commonly RNA polymerase II core promoter proximal region sequence-specific DNA binding, GO:0000978). Capture DNA binding specificity (e.g. motifs) with 'occurs_at(SO:nnn)' extensions.
    • Otherwise the TF won't be annotated to DNA binding, because the terms are connected by has_part in GO. (You might need to use ISS or IC to get both.)
      • Can also put happens_during extensions on the TF activity term to capture stress, cell cycle phase, etc.
      • Example: PMID:23231582 describes transcription during phosphate starvation. Mutations in pho7 or csk1 affect the -phosphate expression profile. For pho7 they also do ChIP-Seq and TAP assays, and some assays with a reporter construct, to establish that it acts as a transcription factor.
        • So I annotated pho7 to GO:0000978 (IPI with targets identified by ChIP-Seq) and GO:0001077 (IMP - Fig 4, plus can interpret Fig 2 ChIP-Seq as also supporting in light of Fig 4; have also thought about whether it might be good enough for IDA), with extensions on both to indicate phosphate starvation -- happens_during(GO:0016036) -- and the target genes highlighted in the text. In contrast, for csk1 I just used the BP GO:0045944 'positive regulation of transcription from RNA polymerase II promoter' with IMP. Full details in the curation session.
  • RNA binding - can use has_direct_input(SO:nnn) extensions
    • The difference between extensions on 'RNA binding' versus 'DNA binding' is that with RNA, the SO terms usually represent different RNA molecules (the ones we tend to use are in the "transcript" branch of SO, and we should eventually switch to the "SO molecule" ontology if and when it becomes more than a clever idea).
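
The extensions mentioned in this section all share the shape relation(ID), for example happens_during(GO:0016036) or occurs_at(SO:nnn). The following is a minimal Python sketch of how such strings might be split into their parts for checking. The function name `parse_extensions` is hypothetical, and treating a comma as the separator between extensions that apply together is an assumption about the storage format, not something stated on this page.

```python
import re

# One extension has the form relation(ID), e.g. happens_during(GO:0016036).
EXTENSION_RE = re.compile(r"^\s*([A-Za-z_]+)\(([^()]+)\)\s*$")


def parse_extensions(text):
    """Split a comma-separated extension string into (relation, target) pairs.

    Assumption: extensions that apply together are joined with commas.
    """
    if not text.strip():
        return []
    pairs = []
    for part in text.split(","):
        match = EXTENSION_RE.match(part)
        if match is None:
            raise ValueError(f"not a relation(ID) extension: {part!r}")
        pairs.append((match.group(1), match.group(2).strip()))
    return pairs


if __name__ == "__main__":
    # SO:nnn is the placeholder used on this page, not a real SO ID.
    print(parse_extensions("happens_during(GO:0016036), occurs_at(SO:nnn)"))
```

The sketch only checks the syntax; whether a given relation is appropriate for a term (for instance occurs_at on a DNA binding term versus has_direct_input on an RNA binding term) still has to be decided by the curator following the guidance above.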

Gene Product Forms, or "Column 17"

  • Identifier for the specific form of a gene product, for example to describe if a particular cellular location is observed with a specific modification of a protein.
    • For modified forms of proteins (e.g. phosphorylated, methylated), use Protein Ontology entries (PR:[id]); request new entries from the PRO tracker or RACE-PRO.
    • Splice variants can use PomBase splice variant IDs (no examples curated yet).
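
To make the "Column 17" name concrete, here is a minimal Python sketch that reads the gene product form from one annotation line. It assumes a tab-delimited GAF 2.x line with 17 columns, in which the gene product form ID is the last column; the function name `gene_product_form` is hypothetical, and the demo line is a placeholder rather than a real annotation.

```python
def gene_product_form(gaf_line):
    """Return the column 17 value of a tab-delimited GAF line, or None if absent."""
    columns = gaf_line.rstrip("\n").split("\t")
    if len(columns) < 17:
        # Line has no column 17 at all.
        return None
    value = columns[16].strip()  # column 17 is index 16 when counting from zero
    return value or None


if __name__ == "__main__":
    # Placeholder line: sixteen dummy fields followed by the PR:[id] placeholder.
    demo_line = "\t".join(["-"] * 16 + ["PR:[id]"])
    print(gene_product_form(demo_line))
```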

Evidence and references for sequence similarity

Annotations using ISS or any of its "child" evidence types (ISO, ISM, etc.) can use a PMID if a paper reports similarity, especially if the authors explicitly make the inference represented by the GO annotation.

Otherwise, use GO_REFs:

  • ISO - GO_REF:0000024
  • ISM - GO_REF:0000050 (with InterPro, Pfam, etc.)
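
The rule above (prefer a PMID when a paper reports the similarity, otherwise fall back to the GO_REF for the evidence code) can be sketched as a small lookup. This is only an illustration in Python: the function name `similarity_reference` and its arguments are hypothetical, and only the two GO_REFs listed above are included.

```python
# GO_REFs listed above for sequence-similarity evidence codes.
GO_REF_BY_EVIDENCE = {
    "ISO": "GO_REF:0000024",
    "ISM": "GO_REF:0000050",
}


def similarity_reference(evidence_code, pmid=None):
    """Pick the reference for a sequence-similarity annotation.

    A PMID takes priority when a paper reports the similarity;
    otherwise the GO_REF for the evidence code is used.
    """
    if pmid:
        return f"PMID:{pmid}"
    try:
        return GO_REF_BY_EVIDENCE[evidence_code]
    except KeyError:
        raise ValueError(f"no default GO_REF listed for {evidence_code}") from None


if __name__ == "__main__":
    print(similarity_reference("ISO"))
    print(similarity_reference("ISM"))
```

Evidence codes without a listed GO_REF raise an error rather than guessing, so the curator has to supply a PMID or extend the table.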

GO supplementary IC (inferred from curator) annotation

