PomBase GO Annotation Guidelines
GO Consortium Documentation - Ontology
These pages describe the overall structure of GO and scope of each main branch:
- General Ontology structure (Includes Cross Products and Logical Definitions)
- Ontology relations
- Biological Process: every process should have a discrete beginning and end, and these should be clearly stated in the process term definition. Note, however, that this work is still in progress for GO.
- Molecular Function
- Cellular Component
GO Consortium Documentation - Annotation
All of the general annotation documentation and recommendations on GO web and wiki pages are applicable. Links to some of the most useful:
- GOC annotation conventions (note: some of the links below go to specific sections of this page)
- Specific biological topics
- 'Response to' BP terms
- Regulation
- Downstream processes
- Transcription Overhaul details
- 'Binding' terms in MF
- Chromatin immunoprecipitation (ChIP) experiments: do NOT annotate as promoter binding
- In PomBase, we'll only annotate to a GO 'protein binding' term if there's strong evidence that a physical interaction is direct, e.g. using purified proteins. Otherwise, we'll just do the BioGRID interaction annotation.
- Annotation guidance pages (work in progress at GO)
- Evidence
- QC, formats, etc.
- GO Annotation Issues tracker at SourceForge - use this to raise questions for the GO group, or to report mapping problems (see below)
Finding and Requesting GO Terms
- Ways to search for GO terms:
- If you have trouble finding an existing term, request synonyms from GO (SF tracker) to make the search easier in future
- If you have a problem locating a term in the curation tool using a string which was *already* a synonym in GO, make a note in Problems identifying ontology terms. Kim can use this to configure the search to give improved results.
- Requesting new terms
- If the term you need follows one of the patterns supported by TermGenie, you can use it and get a stable ID immediately
- TermGenie main page
- TermGenie GO help (includes how to set up user access)
- Otherwise, request your term(s) on the GO Ontology Requests tracker at SourceForge
- Any information you can include -- name, text definition, parent(s), synonyms, reference, etc. -- will be much appreciated by the GO editors
GO Annotation Extensions
GO annotation extensions capture specificity that would be undesirable in the ontology.
- When to use an extension versus requesting a new term?
As a general rule, when a GO process term represents a single process, request a single GO term. For example, negative regulation of SREBP signaling pathway by transcription factor catabolism, with parents GO:2000639 (negative regulation of SREBP signaling pathway) and GO:0010620 (negative regulation of transcription by transcription factor catabolism).
An exception to this rule is when a signaling pathway is activated by a number of different stresses. In this case, add the stress as a "during..." extension. The rationale for this exception is that the more specific annotations do not offer any obvious benefit to users (for enrichments etc.). This would change if two distinct pathways were observed, with distinct gene products annotated to each pathway (this test can also be used to decide on any further proposed exceptions). Note that you should still make a concurrent "response to x stress" annotation for these gene products.
- See the GO "when to use extensions" wiki and annotation examples
- The basic format is relation(Database prefix:database_ID). Additional format/syntax documentation is available:
- Relations in use:
- Terms with extensions:
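As a rough illustration of the basic relation(Database prefix:database_ID) format, the sketch below splits a single extension into its three parts and rebuilds it. The regular expression and the function names are assumptions made for this example, not part of any PomBase or GO tool, and the character sets it accepts are deliberately simple.

```python
import re

# Assumed pattern for ONE extension in the form relation(PREFIX:ID).
# The allowed characters are a guess for illustration only.
EXTENSION_RE = re.compile(r"^\s*([A-Za-z_]+)\(([A-Za-z_]+):([^()\s]+)\)\s*$")


def parse_extension(text):
    """Split 'relation(PREFIX:ID)' into (relation, prefix, identifier)."""
    match = EXTENSION_RE.match(text)
    if match is None:
        raise ValueError(f"not in relation(PREFIX:ID) form: {text!r}")
    return match.groups()


def format_extension(relation, prefix, identifier):
    """Build the extension string from its three parts."""
    return f"{relation}({prefix}:{identifier})"


if __name__ == "__main__":
    # 'during' with a stress term, as in the exception described above.
    print(parse_extension("during(GO:0034599)"))
```

The example uses the "during" relation with cellular response to oxidative stress (GO:0034599), both of which appear on this page; consult the syntax documentation for the relations and identifier types that are actually permitted.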
Specific GO Annotation Guidelines for PomBase
Recommended Terms
Biological Process
- Regulation
- There is ongoing discussion among PomBase curators, and with other GO annotators, about when to annotate to regulation terms and when not to. This is also connected with filling in start and end details for the process terms that still need them. Conclusions will be posted or linked here. (2012-06-07)
- Transcription
- All fission yeast annotations for RNAi so far should be to the term chromatin silencing by small RNA (GO:0031048)
- Note that the GO term RNA interference (RNAi; GO:0016246) is not an ancestor of GO:0031048, because in GO RNAi is defined more strictly, according to the original "post-transcriptional" usage.
- Translation
- For annotations to translation, it should always be possible to specify cytoplasmic translation (GO:0002181) or mitochondrial translation (GO:0032543)
- Splicing
- Annotations for GT-AG type splicing via the spliceosome should always be to nuclear mRNA cis splicing, via spliceosome (GO:0045292)
- Cytokinesis
- Annotations for cytokinesis in fission yeast should always be to cell cycle cytokinesis (GO:0033205) or one of its descendants
- Cell wall organization
- Annotations to cell wall organization should always be to fungal-type cell wall organization or biogenesis (GO:0071852) or one of its descendants
- Transport/Localization
- Always check that transport and localization terms have "cellular" ancestry, usually to intracellular transport (GO:0046907) or cellular localization (GO:0051641)
- Check that the appropriate "intracellular" term is used, e.g. intracellular protein transport (GO:0006886)
- Note the distinction between "transport" and "localization" and always use the appropriate branch. Localization is more general and can involve establishment or maintenance at a specific location, whereas transport involves directed movement
- Note that all transmembrane transporters should have an annotation to the process of transmembrane transport (GO:0055085). In some cases, GO will have a function-process link that allows the transport process annotation to be inferred; in other cases, you may want to request an MF-BP link.
- Note that nucleocytoplasmic transport (GO:0006913) is NOT "transmembrane transport", as the lipid bilayer is not traversed.
- DNA replication
- Annotations to "canonical DNA replication" should be to DNA-dependent DNA replication (GO:0006261) or one of its descendants
- Metabolic processes
- Always check that "metabolic process" terms used have "cellular metabolic/biosynthetic/catabolic process" parentage; if not, request a change in GO
- Response to stress
- We do not generally annotate directly to response to stress (GO:0006950). Usually you would, at a minimum:
- specify "cellular" and a specific stress, e.g. cellular response to oxidative stress (GO:0034599) OR
- regulation of a specific process in response to stress, e.g. regulation of translation in response to stress (GO:0043555)
- Cell polarity related
- When using terms related to cell polarity, make sure that if you are annotating a process that affects cell shape, you select one of the terms that specifies "regulating cell shape", e.g. establishment or maintenance of cell polarity regulating cell shape (GO:0071963) or one of its descendants.
- Sporulation
- Use ascospore formation (GO:0030437) or one of its children
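Some of the Biological Process recommendations above amount to "do not annotate to this general term; use a more specific one". A minimal sketch of such a check follows. The table and function name are hypothetical, only GO IDs quoted on this page are used, and the entry for GO:0016246 is an inference from the RNAi note rather than an explicit rule. Rules phrased as "or one of its descendants" would need the ontology itself and are not covered.

```python
# Terms treated here as too general for fission yeast annotation, mapped to
# the more specific terms named in the guidelines (illustrative, not complete).
PREFERRED_TERMS = {
    # response to stress -> the two example alternatives given in the text
    "GO:0006950": ["GO:0034599", "GO:0043555"],
    # RNA interference -> chromatin silencing by small RNA (inferred rule)
    "GO:0016246": ["GO:0031048"],
}


def suggest_terms(go_id):
    """Return more specific terms to consider, or an empty list if none apply."""
    return PREFERRED_TERMS.get(go_id, [])


if __name__ == "__main__":
    for term in ("GO:0006950", "GO:0002181"):
        print(term, suggest_terms(term))
```

For response to stress, the suggested terms are only the examples given in the text; the appropriate term depends on the stress and process described in the paper.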
Cellular Component
- You can usually use the specific "nuclear x" or "cytoplasmic x" macromolecular complex terms.
- In particular, because fission yeast has no nuclear envelope breakdown during mitosis, you can always annotate to the "nuclear" versions of terms for chromosomes, chromatin regions, etc., e.g.
- nuclear heterochromatin GO:0005720
- nuclear chromosome, telomeric region GO:0000784
- nuclear telomeric heterochromatin GO:0005724
- Avoid using the "cell fraction" terms (GO:0000267 and descendants)
Molecular Function
(to be added as needed)
GO annotation and Redundancy
You don't need to make every GO annotation in a paper if the annotation is already present (or well known) from previously annotated experiments or papers. For example, you don't need to annotate every demonstrated occurrence of Cdc2 to protein kinase activity.
- Some guidelines -- make an additional annotation if:
- There is any new information, for instance an additional annotation extension or qualifier
- Two (or more) papers containing new experimental information were published within a few months of each other, in which case curate both
- It lends extra support to a term/annotation which may be considered not well-supported
- You don't need to make every IGI annotation to support a GO process (usually you should be able to make a single IMP annotation). However, make sure the individual genetic interactions are curated in BioGRID (note: some older IGI annotations that over-interpret the available evidence will gradually be removed if the interactions are represented in BioGRID).
Gene Product Forms, or "Column 17"
- Identifier for the specific form of a gene product
- Background: GO wiki page on spliceforms and column 17
- For modified forms of proteins (e.g. phosphorylated, methylated) use Protein Ontology entries (PR:[id])
- For splice variants, use PomBase splice variant IDs (no examples curated yet)
- UniProt IDs can also be used
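The choices above can be sketched as a small lookup on the database prefix of a gene product form identifier. The prefix-to-meaning table is an assumption for illustration: "PR" follows the PR:[id] form given above, while the "UniProtKB" and "PomBase" prefixes and the placeholder IDs in the example are hypothetical.

```python
# Assumed mapping from identifier prefix to the kind of gene product form.
# Only "PR" is stated in the guidelines; the other two prefixes are guesses.
FORM_KINDS = {
    "PR": "modified protein form (Protein Ontology)",
    "PomBase": "splice variant (PomBase ID)",
    "UniProtKB": "UniProt identifier",
}


def classify_form_id(form_id):
    """Return a description of the form ID based on its PREFIX:ID prefix."""
    prefix, sep, local_id = form_id.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"expected PREFIX:ID, got {form_id!r}")
    return FORM_KINDS.get(prefix, "unknown prefix")


if __name__ == "__main__":
    # Placeholder identifiers, not real database entries.
    for example in ("PR:EXAMPLE", "UniProtKB:EXAMPLE", "XYZ:EXAMPLE"):
        print(example, "->", classify_form_id(example))
```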
GO supplementary IC (inferred from curator) annotation
- Make any IC annotations that can be inferred by a curator but are not implicitly made by transitivity (i.e. because the inferred term is not included in the term ancestry).
- Also check whether any of these can be experimentally supported by data in the paper
Updating Existing/Legacy Annotations
- Check all existing ISS, IEA, TAS and NAS annotations to see if any are no longer required, or are incorrect
- Remove any TAS/NAS/ISS which are now covered by experiment
- Automated mappings (IEA) will be suppressed by experimental data. Are any IEA annotations not covered by your manual annotation? It should be possible to make a manual annotation to cover all automated mappings (if no experimentally supported annotation can be made, a manually evaluated ISS should be possible)
- Mappings that can't be replaced by manual ISS or experimental annotations may be incorrect and need to be removed. Report incorrect mappings on the GO annotation issues tracker
- Swiss-Prot keyword (SPKW, SP_KW, UniProtKB-KW) mappings: choose category "UniProt KW2GO mapping", group "GOA", and assign to "goa-ebi"
- Swiss-Prot Subcellular Location (SP_SL, UniProtKB-SubCell) mappings: category "UniProt subcell2GO mapping", group "GOA", assign to "goa-ebi"
- InterPro mappings: category "InterPro mapping", group "InterPro", assign to "interhelp"
- For UniProt keyword and subcellular location mappings, you can also go to the UniProt entry from a PomBase gene page and send a message to UniProt. Ivo Pedruzzi will fix it quickly.
- Some "pombe kw mappings" have the NAS evidence code (you will know these are mappings because they will not be visible in the Artemis curation tool). These will need to be deleted from the mapping file.
- If a mapping doesn't seem to be the problem, the ontology may need to be revised; contact GO editors via the ontology tracker
- Check for consistency with other annotations and other resources
- Make sure all remaining ISS are made to an experimentally characterised ortholog.
- If the gene has an S. cerevisiae ortholog, check the annotations to the ortholog in SGD.
- If the gene in SGD is not annotated to a term, and you think it clearly should be, email them (sgd-helpdesk at lists.stanford.edu) to add it so that the ISS is supported. This is frequently required when annotating gene products that are not published.
- Can you make any further annotations based on what SGD has? (Note reasons why annotations cannot be transferred; 1:1 is easiest)
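The first steps of this checklist can be sketched as a simple triage over a gene's existing annotations. This is a minimal sketch under stated assumptions: the Annotation record and the triage function are hypothetical, the set of experimental evidence codes is the standard GO list, and "covered by experiment" is simplified to mean an experimental annotation for the same gene and exactly the same term (a real check would also consider more specific terms).

```python
from dataclasses import dataclass

EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}
NEEDS_REVIEW = {"ISS", "IEA", "TAS", "NAS"}
REMOVE_IF_COVERED = {"ISS", "TAS", "NAS"}


@dataclass(frozen=True)
class Annotation:
    gene: str
    term: str
    evidence: str


def triage(annotations):
    """Return (to_remove, to_review) lists for non-experimental annotations."""
    # Simplification: coverage means same gene and same term, nothing more.
    covered = {(a.gene, a.term) for a in annotations if a.evidence in EXPERIMENTAL}
    to_remove, to_review = [], []
    for a in annotations:
        if a.evidence not in NEEDS_REVIEW:
            continue
        if (a.gene, a.term) in covered:
            # Covered IEA mappings are suppressed automatically, so skip them.
            if a.evidence in REMOVE_IF_COVERED:
                to_remove.append(a)
        else:
            to_review.append(a)
    return to_remove, to_review


if __name__ == "__main__":
    existing = [
        Annotation("geneA", "GO:0002181", "IDA"),
        Annotation("geneA", "GO:0002181", "TAS"),
        Annotation("geneA", "GO:0002181", "IEA"),
        Annotation("geneA", "GO:0034599", "ISS"),
    ]
    remove, review = triage(existing)
    print("remove:", remove)
    print("review:", review)
```

Annotations left in the review list still need the manual checks described above, such as confirming that each remaining ISS is made to an experimentally characterised ortholog.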