= !PomBase GO Annotation Guidelines =

== GO Consortium Documentation - Ontology ==

These pages describe the overall structure of GO and scope of each main branch:
 * [http://www.geneontology.org/GO.ontology.structure.shtml General Ontology structure] (Includes Cross Products and Logical Definitions)
 * [http://www.geneontology.org/GO.ontology-ext.relations.shtml Ontology relations]
 * [http://www.geneontology.org/GO.process.guidelines.shtml Biological Process]
   * Every process should have a discrete '''beginning''' and '''end''', and these should be clearly stated in the process term definition. Note, however, that this work is still in progress for GO.
 * [http://www.geneontology.org/GO.function.guidelines.shtml Molecular Function]
 * [http://www.geneontology.org/GO.component.guidelines.shtml Cellular Component]

== GO Consortium Documentation - Annotation ==

All of the [http://www.geneontology.org/GO.contents.doc.shtml#annotation general annotation documentation and recommendations] on GO web and wiki pages are applicable.
Links to some of the most useful:
 * [http://www.geneontology.org/GO.annotation.conventions.shtml GOC annotation conventions] (note: some of the links below go to specific sections of this page)
 * Specific biological topics
   * [http://www.geneontology.org/GO.annotation.conventions.shtml#response_to 'Response to' BP terms]
   * [http://www.geneontology.org/GO.annotation.conventions.shtml#regulation Regulation]
   * [http://www.geneontology.org/GO.annotation.conventions.shtml#Downstream_Process_guidelines Downstream processes]
   * [http://wiki.geneontology.org/index.php/Transcription Transcription Overhaul details]
   * [http://www.geneontology.org/GO.annotation.conventions.shtml#Binding_guidelines 'Binding' terms in MF]
     * [http://wiki.geneontology.org/index.php/Annotation_consistency_:_ChIP_experiments chromatin immunoprecipitation (ChIP) experiments NOT promoter binding]
     * In !PomBase, we only annotate to a GO 'protein binding' term if there is strong evidence that a physical interaction is direct, e.g. from experiments using purified proteins. Otherwise, we just make a BioGRID interaction annotation.
   * [http://wiki.geneontology.org/index.php/Annotation_Guidance_Pages Annotation guidance pages (work in progress at GO)]
 * Evidence
   * [http://www.geneontology.org/GO.evidence.shtml GO evidence code documentation]
   * [http://wiki.geneontology.org/index.php/Evidence_Code_Ontology_(ECO) Mapping to ECO]
   * [http://wiki.geneontology.org/index.php/Evidence_Code_proposals Evidence code proposals (work in progress at GO)]
   * [http://code.google.com/p/evidenceontology/ ECO home page] and [http://code.google.com/p/evidenceontology/issues/list tracker]
 * QC, formats, etc.
   * [http://www.geneontology.org/GO.format.annotation.shtml GO Annotation File formats]
     * [http://www.geneontology.org/GO.format.gaf-2_0.shtml GAF 2.0 (currently in use)]
     * [http://wiki.geneontology.org/index.php/Gene_Product_Association_Data_(GPAD)_Format GPAD (proposed)]
   * [http://www.geneontology.org/GO.annotation_qc.shtml Automated annotation quality control checks]
   * [https://sourceforge.net/p/geneontology/annotation-issues/ GO Annotation Issues tracker] at !SourceForge: use this to raise questions for the GO group, or to report mapping problems (see below)

== Finding and Requesting GO Terms ==

 * Ways to search for GO terms:
   * directly in the curation tool
   * online GO browsers: [http://www.ebi.ac.uk/QuickGO/ QuickGO] or [http://amigo.geneontology.org/ AmiGO]
   * install OBO-Edit (the least convenient option, but the most sophisticated search interface)
 * If you have trouble finding an existing term, request synonyms from GO ([https://sourceforge.net/tracker/?group_id=36855&atid=440764 SF tracker]) to make the search easier in future.
 * If you have a problem locating a term in the curation tool using a string that was ''already'' a synonym in GO, make a note in [wiki:Lucene_issues Problems identifying ontology terms]. Kim can use this to configure the search to give improved results.
 * Requesting new terms
   * If the term you need follows one of the patterns supported by !TermGenie, you can use it and get a stable ID immediately:
     * [http://go.termgenie.org/ TermGenie main page]
     * [http://go.termgenie.org/help/index.html TermGenie GO help] (includes how to set up user access)
   * Otherwise, request your term(s) on the [https://sourceforge.net/p/geneontology/ontology-requests/ GO Ontology Requests tracker] at !SourceForge.
     * Any information you can include (name, text definition, parent(s), synonyms, reference, etc.) will be much appreciated by the GO editors.

== GO Annotation Extensions ==

GO annotation extensions capture specificity that would be undesirable to build into the ontology itself.
 * When to use an extension versus requesting a new term?
   * [wiki:PrePostComposeGO PomBase "Precompose or postcompose?" wiki]
   * [http://wiki.geneontology.org/index.php/Annotation_Extension#When_should_a_curator_use_the_Annotation_Extension_field_instead_of_requesting_a_new_GO_term.3F GO "when to use extensions" wiki] and [http://wiki.geneontology.org/index.php/Annotation_Extension#Annotation_Examples annotation examples]

A general rule is that when the concept you want to capture represents a single process, you should request a single GO term for it. For example, request 'negative regulation of SREBP signaling pathway by transcription factor catabolism', with parents:
 * GO:2000639 negative regulation of SREBP signaling pathway
 * GO:0010620 negative regulation of transcription by transcription factor catabolism

An exception to this rule is when a signalling pathway is activated by a number of different stresses. In this case, the stress is added as a "during" extension. The rationale for this exception is that the more specific annotations do not offer any obvious benefits to users (e.g. for enrichment analyses). This would change if two distinct pathways were observed, with distinct gene products annotated to each pathway (this test can be used to make decisions about further proposed exceptions). Note that you should continue to make a concurrent "response to x stress" annotation for these gene products.

The basic format is relation(Database prefix:database_ID).
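To illustrate the basic format, the Python sketch below splits an extension string, as it might appear in GAF column 16, into (relation, database prefix, database ID) triples. It assumes the GAF 2.0 convention that commas join units within one extension and pipes separate alternative extensions. It also assumes that IDs contain no commas, pipes or parentheses. The function name `parse_extension` and the example IDs are hypothetical and are not part of any !PomBase or GO tool.

```python
import re
from typing import List, Tuple

# One relation(Database prefix:database_ID) unit, e.g. "during(GO:0034599)".
# Assumption: relation names are lowercase with underscores, and IDs contain
# no parentheses, commas or pipes.
_UNIT = re.compile(r"([a-z_]+)\(([A-Za-z0-9_.-]+):([^(),|]+)\)")

Extension = Tuple[str, str, str]  # (relation, database prefix, database ID)


def parse_extension(text: str) -> List[List[Extension]]:
    """Hypothetical helper: parse a column 16 string into groups of triples.

    Assumes '|' separates alternative groups and ',' joins units in a group.
    """
    groups: List[List[Extension]] = []
    if not text.strip():
        return groups  # empty column: no extension
    for group in text.split("|"):
        units: List[Extension] = []
        for unit in group.split(","):
            match = _UNIT.fullmatch(unit.strip())
            if match is None:
                raise ValueError(f"not in relation(DB:ID) form: {unit!r}")
            relation, prefix, local_id = match.groups()
            units.append((relation, prefix, local_id))
        groups.append(units)
    return groups


if __name__ == "__main__":
    print(parse_extension("during(GO:0034599),occurs_at(SO:0000167)"))
    # [[('during', 'GO', '0034599'), ('occurs_at', 'SO', '0000167')]]
```

Raising an error on anything that does not fit the pattern makes malformed extensions visible early. A fuller check would also compare each relation against the list of relations in use.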
 * Additional format/syntax documentation:
   * [AnnotationExtensionSyntax PomBase wiki]
   * [http://wiki.geneontology.org/index.php/Annotation_Extension#The_basic_format GO annotation extension wiki]
 * Relations in use:
   * [http://viewvc.geneontology.org/viewvc/GO-SVN/trunk/ontology/extensions/go_annotation_extension_relations.obo OBO-format file of relations (GO site)]
   * [http://www.ebi.ac.uk/QuickGO/AnnotationExtensionRelations.html Graphical view (QuickGO)]
   * [ListOfRelations List of relations used by PomBase]
   * [OntologiesInUse Ontologies used in column 16, and examples of uses]
 * Terms with extensions:
   * [http://viewvc.geneontology.org/viewvc/GO-SVN/trunk/ontology/extensions/go_annotation_extension_examples.obo examples of how "terms" would be defined if they were pre-composed (GO site)]
   * [http://curation.pombase.org/test/view/object/cv/PomBase%20annotation%20extension%20terms?model=chado PomBase list of terms created by annotation extension]

== Specific GO Annotation Guidelines for !PomBase ==

=== Recommended Terms ===

In many cases, the best GO term to use for a process or component in pombe is more specific than the term that most closely matches the wording we typically find in papers. For example, see "DNA replication" or "sporulation" below.

We maintain a file that maps more general GO terms to more specific ones that can always be substituted for pombe annotations. The mapping file is in Dropbox at Dropbox/pombase/ontologies/GO/mappings/GO_mapping_to_specific_terms.txt

If you don't want a substitution to happen automatically, use a more specific term in the first place. If you find many exceptions to a mapping, change or delete it.

==== Biological Process ====

 * Regulation
   * There is ongoing discussion among !PomBase curators, and with other GO annotators, about when to annotate to regulation terms and when not to. This is also connected with filling in start and end details for the process terms that still need them. Conclusions will be posted or linked here. (2012-06-07)
 * Transcription
   * See "Transcription factors" under Molecular Function below
   * Annotate to regulation of transcription (with extensions as appropriate) only if there isn't enough data to support annotating to a transcription factor MF term (a descendant of GO:0003700 sequence-specific DNA binding transcription factor activity); if there is a transcription factor MF annotation, a transcription regulation BP annotation will be inferred by transitivity
   * All fission yeast annotations for RNAi so far should be to the term [http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0031048 chromatin silencing by small RNA (GO:0031048)]
     * Note that the GO term [http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0016246 RNA interference (RNAi; GO:0016246)] is not an ancestor of GO:0031048, because GO defines RNAi more strictly, according to the original "post-transcriptional" usage.
 * Translation
   * For annotations to translation, it should always be possible to specify [http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0002181 cytoplasmic translation (GO:0002181)] or [http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0032543 mitochondrial translation (GO:0032543)]
 * Splicing
   * Annotations for GT-AG type splicing via the spliceosome should always be to [http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0045292 nuclear mRNA cis splicing, via spliceosome (GO:0045292)]
 * Cytokinesis
   * Annotations for cytokinesis in fission yeast should always be to [http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0033205 cell cycle cytokinesis (GO:0033205)] or one of its descendants
   * Cytokinesis terms are in the "mapping to specific terms" file
 * Cell wall organization
   * Annotations to cell wall organization should always be to [http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0071852 fungal-type cell wall organization or biogenesis (GO:0071852)] or one of its descendants
   * Cell wall organization terms are in the "mapping to specific terms" file
 * Transport/Localization
   * Always check that transport and localization terms have "cellular" ancestry, usually to [http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0046907 intracellular transport (GO:0046907)] or [http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0051641 cellular localization (GO:0051641)]
     * Check that the appropriate "intracellular" term is used, e.g. [http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0006886 intracellular protein transport (GO:0006886)]
   * Note the distinction between "transport" and "localization", and always use the appropriate branch. Localization is more general and can involve establishment or maintenance at a specific location, whereas transport involves directed movement.
   * Note that all transmembrane transporters should have an annotation to the process [http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0055085 transmembrane transport (GO:0055085)]. In some cases, GO will have a function-process link that allows the transport process annotation to be inferred; in other cases, you may want to request an MF-BP link.
   * Note that [http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0006913 nucleocytoplasmic transport (GO:0006913)] is NOT "transmembrane transport", because the lipid bilayer is not traversed.
 * DNA replication
   * Annotations to "canonical DNA replication" should be to [http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0006261 DNA-dependent DNA replication (GO:0006261)] or one of its descendants. Nuclear DNA replication in pombe is 'nuclear cell cycle DNA replication' ([http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0033260 GO:0033260]).
   * DNA replication terms are in the "mapping to specific terms" file
 * Metabolic processes
   * Always check that the "metabolic process" terms used have "cellular metabolic/biosynthetic/catabolic process" parentage; if not, request a change in GO
 * Response to stress
   * The most generic response to stress terms ('response to stress' [http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0006950 GO:0006950] and 'cellular response to stress' [http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0033554 GO:0033554]) are not available for direct manual annotation. At a minimum, you would usually either:
     * specify "cellular" and a specific stress, e.g. [http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0034599 cellular response to oxidative stress (GO:0034599)], OR
     * annotate to regulation of a specific process in response to stress, e.g. [http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0043555 regulation of translation in response to stress (GO:0043555)]
 * Cell polarity related
   * When using terms related to cell polarity, make sure that if you are annotating a process that affects cell shape, you select one of the terms that specifies "regulating cell shape", e.g. [http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0071963 establishment or maintenance of cell polarity regulating cell shape (GO:0071963)] or one of its descendants.
 * Sporulation
   * Use ascospore formation ([http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0030437 GO:0030437]) or its children
   * Sporulation terms are in the "mapping to specific terms" file

==== Cellular Component ====

 * You can usually use the specific "nuclear x" or "cytoplasmic x" macromolecular complex terms.
   * In particular, because fission yeast has no nuclear envelope breakdown during mitosis, you can always annotate to the "nuclear" versions of terms for chromosomes, chromatin regions, etc., e.g.
     * nuclear heterochromatin [http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0005720 GO:0005720]
     * nuclear chromosome, telomeric region [http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0000784 GO:0000784]
     * nuclear telomeric heterochromatin [http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0005724 GO:0005724]
   * These terms are in the "mapping to specific terms" file

==== Molecular Function ====

 * Protein binding - annotate to GO:0005515 (or an allowable descendant) if there is enough information to conclude that there is a direct physical interaction. Otherwise, just make BioGRID interaction annotation(s).
   * use IPI evidence; a 'with' entry is mandatory
   * note that GO:0019899 'enzyme binding' is in the do-not-annotate subset
   * we haven't (yet) tried to systematically delete older protein binding annotations that may not fit the direct interaction criterion
   * some protein binding annotations could presumably be inferred automatically from enzyme activity + input annotations (e.g. a protein kinase and a substrate identified in an extension)
 * DNA binding
   * for sequence-specific DNA binding, you can use an annotation extension with 'occurs_at' and a SO ID (also see [#GOAnnotationExtensions] above)

=== GO annotation and Redundancy ===

You don't need to make every possible GO annotation from a paper if the annotation is already present (or well known) from previously annotated experiments or papers. For example, you don't need to annotate every demonstrated occurrence of Cdc2 to protein kinase activity.
 * Some guidelines -- make an additional annotation if:
   * there is any new information, for instance an additional annotation extension or [wiki:QualifiersUsed qualifier]
   * two (or more) papers containing new experimental information were published within a few months of each other, in which case curate both
   * it lends extra support to a term/annotation that may be considered not well-supported
 * You don't need to make every IGI annotation to support a GO process (usually you should be able to make a single IMP annotation). However, make sure the individual genetic interactions are curated in BioGRID (note: some older IGI annotations that over-interpret the available evidence will gradually be removed if the interactions are represented in BioGRID).

=== Gene Product Forms, or "Column 17" ===

 * Identifier for the specific form of a gene product
 * Background: [http://wiki.geneontology.org/index.php/GAF_Spliceform_Column_Proposal GO wiki page on spliceforms and column 17]
 * For modified forms of proteins (e.g. phosphorylated, methylated), use Protein Ontology entries (PR:[id])
   * [http://pir.georgetown.edu/projects/pro/pro_wv.obo pre-release (i.e. latest) version of the Protein Ontology]
   * [http://pir.georgetown.edu/cgi-bin/pro/race_pro Tracker for term requests]
   * [https://sourceforge.net/tracker/index.php?func=detail&aid=3311095&group_id=65526&atid=2096329 Example]
 * Splice variants can use !PomBase splice variant IDs (no examples curated yet)
 * !UniProt IDs can also be used

=== GO supplementary IC (inferred from curator) annotation ===

 * Make any annotations (IC) that can be inferred by a curator but are not implied by transitivity (i.e. because the inferred term is not in the annotated term's ancestry). For example:
   * [http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0000808 origin recognition complex (ORC; GO:0000808)] subunits can also get:
     * [http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0031261 DNA replication preinitiation complex (GO:0031261)]
     * [http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0005656 pre-replicative complex (GO:0005656)]
     * [http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0043596 nuclear replication fork (GO:0043596)]
   * [http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0032049 cardiolipin biosynthesis (GO:0032049)] implies [http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0007006 mitochondrial membrane organization (GO:0007006)]
 * Also check whether any of these can be experimentally supported by data in the paper

=== Updating Existing/Legacy Annotations ===

 * Check all existing ISS, IEA, TAS and NAS annotations to see if any are no longer required, or are incorrect
   * Remove any TAS/NAS/ISS annotations that are now covered by experiment
   * Automated mappings (IEA) will be suppressed by experimental data. Are any IEA annotations not covered by your manual annotation? It should be possible to make a manual annotation to cover all automated mappings (if no experimentally supported annotation can be made, a manually evaluated ISS should be possible).
   * Mappings that can't be replaced by manual ISS or experimental annotations may be incorrect and need to be removed. Report incorrect mappings on the [https://sourceforge.net/tracker/?group_id=36855&atid=605890 GO annotation issues tracker]:
     * Swiss-Prot keyword (SPKW, SP_KW, UniProtKB-KW) mappings: choose category "!UniProt KW2GO mapping", group "GOA", and assign to "goa-ebi"
     * Swiss-Prot Subcellular Location (SP_SL, UniProtKB-!SubCell) mappings: category "!UniProt subcell2GO mapping", group "GOA", assign to "goa-ebi"
     * !InterPro mappings: category "!InterPro mapping", group "!InterPro", assign to "interhelp"
     * For !UniProt keyword and subcellular location mappings, you can also go to the !UniProt entry from a !PomBase gene page and send a message to !UniProt. Ivo Pedruzzi will fix it quickly.
     * Some "pombe kw mappings" have the NAS evidence code (you will know these are mappings because they will not be visible in the Artemis curation tool). These will need to be deleted from the mapping file.
   * If a mapping doesn't seem to be the problem, the ontology may need to be revised; contact GO editors via the [https://sourceforge.net/tracker/?atid=440764&group_id=36855&func=browse ontology tracker]
 * Check for consistency with other annotations and other resources
   * Make sure all remaining ISS annotations are made to an experimentally characterised ortholog.
   * If the gene has an S. cerevisiae ortholog, check the annotations to the ortholog in SGD.
     * If the gene in SGD is not annotated to a term, and you think it clearly should be, mail them (sgd-helpdesk at lists.stanford.edu) to add it so that the ISS is supported. This is frequently required when annotating gene products that have not been characterised in publications.
     * Can you make any further annotations based on what SGD has? (Note reasons why annotations cannot be transferred; 1:1 orthologs are easiest.)

----
[WikiStart Return to main page] or [CurationGuidelines Curation Guidelines]