Version 109 (modified by val_wood, 9 years ago)

QC of GO annotation using annotation overlaps

Summary

This project uses the existing annotation set to identify annotations that conflict with current biological knowledge. Using biological process GO slim terms, each term is assessed for overlaps with every other slim term. If no known overlap exists, the pair is scored as "NO OVERLAP", and any annotation made to that intersection is queried for validity. The template rules were defined after assessing the current annotation sets for fission yeast, budding yeast and mouse. More specific rules are created by identifying the specific processes, complexes or functions which *are* allowed in an intersection; these rules can be extended over time to take account of new biology or existing annotations in other organisms.
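
The flagging step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `RULES` dictionary, the gene names and the function `flag_no_overlap` are invented for the example, and an empty set of allowed terms is used to stand for "NO OVERLAP".

```python
from itertools import combinations

# Hypothetical rule table: unordered slim-term pair -> allowed exception terms.
# An empty set stands for "NO OVERLAP".
RULES = {
    frozenset({"GO:0006520", "GO:0006260"}): set(),           # amino acid metabolism x DNA replication
    frozenset({"GO:0006520", "GO:0006399"}): {"GO:0043039"},  # amino acid metabolism x tRNA metabolism
}

# Toy data (gene names are invented): gene -> slim terms it is annotated to.
SLIM_ANNOTATIONS = {
    "geneA": {"GO:0006520", "GO:0006260"},
    "geneB": {"GO:0006520", "GO:0006399"},
    "geneC": {"GO:0006260"},
}


def flag_no_overlap(slim_annotations, rules):
    """Return sorted (gene, term1, term2) tuples that fall in a NO OVERLAP pair."""
    flagged = []
    for gene, terms in slim_annotations.items():
        for t1, t2 in combinations(sorted(terms), 2):
            allowed = rules.get(frozenset({t1, t2}))
            # Pairs with no rule yet are skipped; an empty set means NO OVERLAP.
            if allowed is not None and not allowed:
                flagged.append((gene, t1, t2))
    return sorted(flagged)


flagged = flag_no_overlap(SLIM_ANNOTATIONS, RULES)
print(flagged)
```

Only geneA is reported here, because geneB sits in an intersection that has an allowed exception and would need the finer check against the exception terms instead.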

Why

Querying annotation overlaps is a powerful method to:

  • Identify annotation errors (curated and automated)
    • Allows rapid detection of incorrect mappings
    • Allows detection of "indirect upstream effects": one process acts upstream of another, and if the upstream process is "broken" there is an indirect downstream effect. These are usually annotated from mutant phenotypes. Would the term "x involved in y" make sense (for example, "splicing involved in embryonic development")? If not, the effect is probably indirect (I'm not sure whether this is a good criterion for distinguishing them, but it sounds as though it could be). (Note: PomBase and SGD already do not annotate these as biological processes; they are captured with "phenotype" annotations.)
      • Examples:
      • splicing -> cell cycle
      • add more examples
  • Identify experimental errors (mainly from legacy data where newer papers have changed the viewpoint)
  • Identify problems in the Gene Ontology
  • Distinguish between direct and indirect effects and refine existing annotation
  • etc.

What is allowed in overlaps?

Annotations in intersections *always* fall into one or more of the following types:

  • Annotation to a term which is a child of both terms, e.g. pentose phosphate shunt is a child of nucleotide metabolism and carbohydrate metabolism; tRNA aminoacylation is a child of tRNA metabolism and amino acid metabolism
  • Regulatory upstream effects: an upstream signalling pathway regulating both processes
    • Examples:
    • add signal transduction and transcriptional regulation examples
  • Multifunctional gene products, for example NOC3 appears to function in both replication and rRNA processing (note: this could be an example of the second type... add better examples)
    • Examples:
    • urmylation pathway is involved in BOTH tRNA modification and "small molecule..."
    • add more examples
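
The first type (a term that is a child of both slim terms) lends itself to a mechanical check. The sketch below assumes a child-to-parents map is available; the `PARENTS` edges are toy data simplified for illustration, `EX:child` is an invented ID, and the helper names are hypothetical.

```python
# Toy child -> parents edges. They are simplified for illustration and are
# NOT a faithful copy of the GO graph; "EX:child" is an invented ID.
PARENTS = {
    "EX:child": {"GO:0043039"},                  # hypothetical more specific term
    "GO:0043039": {"GO:0006399", "GO:0006520"},  # tRNA aminoacylation
}


def ancestors(term, parents):
    """Return the term itself plus every term reachable through parent edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def is_allowed(annotated_terms, allowed_terms, parents):
    """True if any direct annotation is an allowed term or a descendant of one."""
    return any(ancestors(term, parents) & allowed_terms for term in annotated_terms)


# tRNA aminoacylation sits under both slim terms in the toy graph.
print({"GO:0006399", "GO:0006520"} <= ancestors("GO:0043039", PARENTS))
# An annotation to a descendant of an allowed term passes the check.
print(is_allowed({"EX:child"}, {"GO:0043039"}, PARENTS))
# A bare annotation to the slim term itself does not.
print(is_allowed({"GO:0006520"}, {"GO:0043039"}, PARENTS))
```

Regulatory upstream effects and multifunctional gene products cannot be recognised from the graph alone, so they would still need an explicit exception term in the rule table.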

Method

Use the Matrix tool (http://stove.lbl.gov:8899/cgi-bin/amigo/amigo_exp?mode=nmatrix) with the GO slim list (the template list is at the bottom of this page).
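
How the Matrix tool computes its output is not described on this page. As a rough sketch of the kind of pairwise table it reports, the following assumes the genes annotated to each slim term are already available as sets; the gene names and the function `overlap_matrix` are invented for the example.

```python
from itertools import combinations


def overlap_matrix(genes_by_term):
    """Map each unordered term pair to the sorted genes annotated to both terms."""
    return {
        (t1, t2): sorted(genes_by_term[t1] & genes_by_term[t2])
        for t1, t2 in combinations(sorted(genes_by_term), 2)
    }


# Toy input: slim term -> genes annotated to it (directly or via child terms).
genes_by_term = {
    "GO:0006520": {"g1", "g2", "g3"},
    "GO:0006399": {"g2", "g4"},
    "GO:0006310": {"g5"},
}

matrix = overlap_matrix(genes_by_term)
for pair, genes in matrix.items():
    print(pair, len(genes), genes)
```

Each non-empty cell is then a candidate for either an exception rule or a query back to the annotating group.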

Amino acid metabolism (Term 1)

term ID 1 | term ID 2 | term name 2 | exceptions (everything in this column which is NOT a GO ID or "OR/AND" needs to be in brackets) | comments/checked in species
GO:0006520 | GO:0006310 | DNA recombination | NO OVERLAP | S.p.; M.m.; S.c.
GO:0006520 | GO:0006281 | DNA repair | GO:0006521 (regulation of cellular amino acid metabolic process, NOTE: transcriptional) OR GO:0006338 (chromatin remodelling, NOTE: this may be indirect for repair???) | S.p.; M.m. (one bad IP mapping reported); S.c. (one query)
GO:0006520 | GO:0006260 | DNA replication | NO OVERLAP | S.p.; M.m. (one bad IP mapping reported); S.c. (3 queried)
GO:0006520 | GO:0030437 | ascospore formation | NO OVERLAP | S.p.; M.m. (n/a); S.c.
GO:0006520 | GO:0005975 | carbohydrate metabolic process | LOTS (add details later) | S.p.
GO:0006520 | GO:0007155 | cell adhesion | GO:0006521 (regulation of cellular amino acid metabolic process) | S.p.; M.m. (0 pending responses from 2 UniProt annotations); S.c.
GO:0006520 | GO:0070882 | cellular cell wall organization or biogenesis | GO:0004360 (glutamine-fructose-6-phosphate transaminase isomerizing activity) OR GO:0004067 (asparaginase activity) | S.p.; M.m. (n/a); S.c.
GO:0006520 | GO:0016568 | chromatin modification | GO:0006521 (regulation of cellular amino acid metabolic process, transcriptional) OR GO:0006355 (regulation of transcription, DNA dependent) | S.p.; M.m.; S.c.
GO:0006520 | GO:0051276 | chromosome organization | GO:0023052 (signaling, upstream) OR GO:0006521 (regulation of cellular amino acid metabolic process, transcriptional) | S.p.; M.m.; S.c.
GO:0006520 | GO:0007059 | chromosome segregation | GO:0023052 (signaling, upstream) | S.p.; M.m.; S.c.
GO:0006520 | GO:0051186 | cofactor metabolic process | LOTS (add details later) | S.p.
GO:0006520 | GO:0000747 | conjugation with cellular fusion | NO OVERLAP | S.p.; M.m. (n/a); S.c.
GO:0006520 | GO:0000910 | cytokinesis | NO OVERLAP | S.p.; M.m. (pending apc update, UniProt); S.c.
GO:0006520 | GO:0002181 | cytoplasmic translation | GO:0043039 (tRNA aminoacylation) | S.p.; M.m.; S.c.
GO:0006520 | GO:0007010 | cytoskeleton organization | GO:0023052 (signaling, upstream) | S.p.; M.m.; S.c. (NOTE: check translation too)
GO:0006520 | GO:0007163 | establishment or maintenance of cell polarity | GO:0006355 (regulation of transcription, DNA dependent) | S.p.; M.m.; S.c.
GO:0006520 | GO:0006091 | generation of precursor metabolites and energy | LOTS (add details later) | S.p.
GO:0006520 | GO:0006629 | lipid metabolic process | GO:0006521 (regulation of cellular amino acid metabolic process) OR GO:0023052 (signaling, upstream) | S.p.; M.m.; S.c.
GO:0006520 | GO:0016071 | mRNA metabolic process | NO OVERLAP | S.p.; M.m. (0 pending UniProt feedback); S.c.
GO:0006520 | GO:0007126 | meiosis | NO OVERLAP | S.p.; M.m.; S.c.
GO:0006520 | GO:0007005 | mitochondrion organization | GO:0043039 (tRNA aminoacylation) OR GO:0003994 (aconitate hydratase) | S.p. (fusion proteins need column 17 annotations); bifunctional enzyme ilv5: need to identify a specific GO term to allow this
GO:0006520 | GO:0071941 | nitrogen cycle metabolic process | LOTS (add details later) | S.p.
GO:0006520 | GO:0055086 | nucleobase-containing small molecule metabolic process | LOTS (add details later) | S.p.
GO:0006520 | GO:0006913 | nucleocytoplasmic transport | NO OVERLAP | S.p.; M.m. (1 but should disappear pending response from UniProt)
GO:0006520 | GO:0007031 | peroxisome organization | NO OVERLAP | S.p.; M.m.; S.c.
GO:0006520 | GO:0030163 | protein catabolic process | GO:0006521 (regulation of cellular amino acid metabolic process) | S.p.; S.c.
GO:0006520 | GO:0006461 | protein complex assembly | NO OVERLAP | S.p.; (check mouse); S.c.
GO:0006520 | GO:0006457 | protein folding | LOTS (add details later) | S.p.; M.m. (reported 1 anomaly to MGI); S.c. (queried 2 anomalies)
GO:0006520 | GO:0006486 | protein glycosylation | NO OVERLAP | S.p.; S.c.
GO:0006520 | GO:0051604 | protein maturation | NO OVERLAP | S.p.
GO:0006520 | GO:0070647 | protein modification by small protein conjugation or removal | GO:0006521 (regulation of cellular amino acid metabolic process) | S.p.; M.m. (querying 3 annotations, one derived from UniProt human, and 2 MGI); S.c. (querying 1)
GO:0006520 | GO:0006605 | protein targeting | NO OVERLAP | S.p.; M.m.; S.c.
GO:0006520 | GO:0007346 | regulation of mitotic cell cycle | NO OVERLAP | S.p. (CHECK, other term has?)
GO:0006520 | GO:0042254 | ribosome biogenesis | NO OVERLAP | S.p.; M.m.; S.c.
GO:0006520 | GO:0023052 | signaling | LOTS (add details later) | S.p.; M.m.; S.c.
GO:0006520 | GO:0006399 | tRNA metabolic process | GO:0043039 (tRNA aminoacylation) OR GO:0031071 (cysteine desulphurase) | S.p.; M.m.; S.c. (some checks outstanding)
GO:0006520 | GO:0006351 | transcription, DNA-dependent | GO:0006355 (regulation of transcription, DNA dependent) | S.p.; M.m.
GO:0006520 | GO:0055085 | transmembrane transport | NO OVERLAP | S.p.; S.c. (one ontology query)
GO:0006520 | GO:0007033 | vacuole organization | NO OVERLAP | S.p.; S.c.
GO:0006520 | GO:0016192 | vesicle-mediated transport | regulation of vesicle mediated transport & regulation of amino acid metabolism (BOTH, via regulation of transcription) | S.p.; S.c.
GO:0006520 | GO:0006766 | vitamin metabolic process | GO:0004586 (ornithine decarboxylase activity); GO:0004372 (glycine hydroxymethyltransferase activity) | S.p.; M.m. (one query to fix); S.c. (some SPKW to remove, check sno1,2,3)
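
Because the exceptions column is meant to hold only GO IDs, "OR/AND" and bracketed comments, its cells can be read mechanically. The sketch below is one possible reading, not part of the Matrix tool: the function name `parse_exceptions` and the three status labels are assumptions made for the example.

```python
import re

GO_ID = re.compile(r"GO:\d{7}")


def parse_exceptions(cell):
    """Classify an exceptions cell and extract any GO IDs it lists."""
    text = cell.strip()
    if text.upper().startswith("NO OVERLAP"):
        return ("no_overlap", [])
    if text.upper().startswith("LOTS"):
        # Placeholder cells are not reviewed yet; keep any IDs already noted.
        return ("unreviewed", GO_ID.findall(text))
    return ("allowed", GO_ID.findall(text))


print(parse_exceptions("NO OVERLAP"))
print(parse_exceptions("LOTS (add details later)"))
print(parse_exceptions(
    "GO:0006521 (regulation of cellular amino acid metabolic process) "
    "OR GO:0023052 (signaling, upstream)"
))
```

A cell that comes back as "allowed" with an empty ID list, such as the free-text vesicle-mediated transport row, is one that still needs GO IDs assigned by hand.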

tRNA metabolic process (Term 1)

term ID 1 | term ID 2 | term name 2 | exceptions (everything in this column which is NOT a GO ID or "OR/AND" needs to be in brackets) | comments/checked in species
GO:0006399 | GO:0006310 | DNA recombination | NO OVERLAP | S.p.; M.m.
GO:0006399 | GO:0006281 | DNA repair | NO OVERLAP | S.p.; M.m.
GO:0006399 | GO:0006260 | DNA replication | NO OVERLAP | S.p.; M.m.
GO:0006399 | GO:0030437 | ascospore formation | NO OVERLAP | S.p.; M.m. (n/a)
GO:0006399 | GO:0005975 | carbohydrate metabolic process | NO OVERLAP | S.p.; M.m.
GO:0006399 | GO:0007155 | cell adhesion | NO OVERLAP | S.p.; M.m.
GO:0006399 | GO:0006520 | cellular amino acid metabolic process | GO:0043039 (tRNA aminoacylation) OR GO:0031071 (cysteine desulphurase) | S.p.; M.m.
GO:0006399 | GO:0070882 | cellular cell wall organization or biogenesis | NO OVERLAP | S.p.; M.m. (n/a)
GO:0006399 | GO:0016568 | chromatin modification | GO:0033588 (elongator complex) | S.p. (check this, is elongator still thought to be histone acetyltransferase???); M.m.
GO:0006399 | GO:0051276 | chromosome organization | GO:0033588 (elongator complex) | S.p. (check this, is elongator still thought to be histone acetyltransferase???); M.m.
GO:0006399 | GO:0007059 | chromosome segregation | NO OVERLAP | S.p.; M.m.
GO:0006399 | GO:0051186 | cofactor metabolic process | GO:0006777 (Mo-molybdopterin cofactor biosynthetic process) | S.p.; M.m.
GO:0006399 | GO:0000747 | conjugation with cellular fusion | NO OVERLAP | S.p.; M.m. (n/a)
GO:0006399 | GO:0000910 | cytokinesis | NO OVERLAP | S.p.; M.m.
GO:0006399 | GO:0002181 | cytoplasmic translation | GO:0043039 (tRNA aminoacylation) | S.p.; M.m.
GO:0006399 | GO:0007010 | cytoskeleton organization | NO OVERLAP | S.p.; M.m.
GO:0006399 | GO:0007163 | establishment or maintenance of cell polarity | NO OVERLAP | S.p.; M.m.
GO:0006399 | GO:0006091 | generation of precursor metabolites and energy | NO OVERLAP | S.p.; M.m.
GO:0006399 | GO:0006629 | lipid metabolic process | NO OVERLAP | S.p.; M.m.
GO:0006399 | GO:0016071 | mRNA metabolic process | LOTS (add details later) | S.p.; M.m.
GO:0006399 | GO:0007126 | meiosis | NO OVERLAP | S.p.; M.m.
GO:0006399 | GO:0007005 | mitochondrion organization | GO:0043039 (tRNA aminoacylation) | S.p.; M.m. (may need some more terms allowed in overlap)
GO:0006399 | GO:0071941 | nitrogen cycle metabolic process | NO OVERLAP | S.p.; M.m.
GO:0006399 | GO:0055086 | nucleobase-containing small molecule metabolic process | LOTS (add details later) | S.p.; M.m.; includes GO:0008479 (queuine tRNA-ribosyltransferase activity) OR GO:0003924 (GTPase activity, specifically IPR004520)
GO:0006399 | GO:0006913 | nucleocytoplasmic transport | NO OVERLAP | S.p.; M.m.
GO:0006399 | GO:0007031 | peroxisome organization | NO OVERLAP | S.p.; M.m.
GO:0006399 | GO:0030163 | protein catabolic process | NO OVERLAP | S.p.; M.m.
GO:0006399 | GO:0006461 | protein complex assembly | NO OVERLAP | S.p.; M.m. (need to deal with protein tetramerization)
GO:0006399 | GO:0006457 | protein folding | NO OVERLAP | S.p.; M.m.
GO:0006399 | GO:0006486 | protein glycosylation | NO OVERLAP | S.p.; M.m.
GO:0006399 | GO:0051604 | protein maturation | NO OVERLAP | S.p.; M.m.
GO:0006399 | GO:0070647 | protein modification by small protein conjugation or removal | GO:0032447 (protein urmylation) | S.p.; M.m.
GO:0006399 | GO:0006605 | protein targeting | NO OVERLAP | S.p.; M.m.
GO:0006399 | GO:0007346 | regulation of mitotic cell cycle | NO OVERLAP | S.p.; M.m.
GO:0006399 | GO:0042254 | ribosome biogenesis | LOTS (add details later) | S.p.; M.m.
GO:0006399 | GO:0023052 | signaling | NO OVERLAP | S.p.; M.m.
GO:0006399 | GO:0006351 | transcription, DNA-dependent | LOTS (add details later) | S.p.; M.m.
GO:0006399 | GO:0055085 | transmembrane transport | NO OVERLAP | S.p.; M.m.
GO:0006399 | GO:0007033 | vacuole organization | NO OVERLAP | S.p.; M.m.
GO:0006399 | GO:0016192 | vesicle-mediated transport | NO OVERLAP | S.p.; M.m.
GO:0006399 | GO:0006766 | vitamin metabolic process | NO OVERLAP | S.p.; M.m.
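
Since every pair of slim terms appears once in each term's table, the rule recorded for (term 1, term 2) should match the rule for (term 2, term 1); the amino acid metabolism and tRNA metabolism rows for each other do agree. A small consistency check could look like the sketch below, where the `RULES` layout, the `EX:` placeholder IDs and the function name are assumptions made for illustration.

```python
def asymmetric_pairs(rules):
    """Return sorted term pairs whose rule differs between (a, b) and (b, a)."""
    problems = set()
    for (a, b), allowed in rules.items():
        reverse = rules.get((b, a))
        if reverse is not None and reverse != allowed:
            problems.add(tuple(sorted((a, b))))
    return sorted(problems)


# (term ID 1, term ID 2) -> allowed exception terms; empty set = NO OVERLAP.
RULES = {
    # Taken from the two tables above: both directions agree.
    ("GO:0006520", "GO:0006399"): {"GO:0043039", "GO:0031071"},
    ("GO:0006399", "GO:0006520"): {"GO:0043039", "GO:0031071"},
    # Invented placeholder pair that disagrees, to show what gets reported.
    ("EX:termX", "EX:termY"): set(),
    ("EX:termY", "EX:termX"): {"EX:allowed"},
}

mismatches = asymmetric_pairs(RULES)
print(mismatches)
```

Pairs that appear in only one direction are ignored here; listing them separately would show which process-specific tables are still missing.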

Template for process specific tables

GO:0006310 | DNA recombination | - | -
GO:0006281 | DNA repair | - | -
GO:0006260 | DNA replication | - | -
GO:0030437 | ascospore formation | - | -
GO:0005975 | carbohydrate metabolic process | - | -
GO:0007155 | cell adhesion | - | -
GO:0006520 | cellular amino acid metabolic process | - | -
GO:0070882 | cellular cell wall organization or biogenesis | - | -
GO:0016568 | chromatin modification | - | -
GO:0051276 | chromosome organization | - | -
GO:0007059 | chromosome segregation | - | -
GO:0051186 | cofactor metabolic process | - | -
GO:0000747 | conjugation with cellular fusion | - | -
GO:0000910 | cytokinesis | - | -
GO:0002181 | cytoplasmic translation | - | -
GO:0007010 | cytoskeleton organization | - | -
GO:0007163 | establishment or maintenance of cell polarity | - | -
GO:0006091 | generation of precursor metabolites and energy | - | -
GO:0006629 | lipid metabolic process | - | -
GO:0016071 | mRNA metabolic process | - | -
GO:0007126 | meiosis | - | -
GO:0007005 | mitochondrion organization | - | -
GO:0071941 | nitrogen cycle metabolic process | - | -
GO:0055086 | nucleobase-containing small molecule metabolic process | - | -
GO:0006913 | nucleocytoplasmic transport | - | -
GO:0007031 | peroxisome organization | - | -
GO:0030163 | protein catabolic process | - | -
GO:0006461 | protein complex assembly | - | -
GO:0006457 | protein folding | - | -
GO:0006486 | protein glycosylation | - | -
GO:0051604 | protein maturation | - | -
GO:0070647 | protein modification by small protein conjugation or removal | - | -
GO:0006605 | protein targeting | - | -
GO:0007346 | regulation of mitotic cell cycle | - | -
GO:0042254 | ribosome biogenesis | - | -
GO:0023052 | signaling | - | -
GO:0006399 | tRNA metabolic process | - | -
GO:0006351 | transcription, DNA-dependent | - | -
GO:0055085 | transmembrane transport | - | -
GO:0007033 | vacuole organization | - | -
GO:0016192 | vesicle-mediated transport | - | -
GO:0006766 | vitamin metabolic process | - | -

Template to type into Matrix

GO:0006310 GO:0006281 GO:0006260 GO:0030437 GO:0005975 GO:0007155 GO:0006520 GO:0070882 GO:0016568 GO:0051276 GO:0007059 GO:0051186 GO:0000747 GO:0000910 GO:0002181 GO:0007010 GO:0007163 GO:0006091 GO:0006629 GO:0016071 GO:0007126 GO:0007005 GO:0071941 GO:0055086 GO:0006913 GO:0007031 GO:0030163 GO:0006461 GO:0006457 GO:0006486 GO:0051604 GO:0070647 GO:0006605 GO:0007346 GO:0042254 GO:0023052 GO:0006399 GO:0006351 GO:0055085 GO:0007033 GO:0016192 GO:0006766
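
Before the ID list above is pasted into the Matrix tool, it can be sanity-checked for malformed or duplicated IDs. The following is a minimal sketch; the function `check_id_list` is invented for the example, and the seven-digit pattern is the usual GO ID format.

```python
import re
from collections import Counter

TEMPLATE_IDS = """
GO:0006310 GO:0006281 GO:0006260 GO:0030437 GO:0005975 GO:0007155 GO:0006520
GO:0070882 GO:0016568 GO:0051276 GO:0007059 GO:0051186 GO:0000747 GO:0000910
GO:0002181 GO:0007010 GO:0007163 GO:0006091 GO:0006629 GO:0016071 GO:0007126
GO:0007005 GO:0071941 GO:0055086 GO:0006913 GO:0007031 GO:0030163 GO:0006461
GO:0006457 GO:0006486 GO:0051604 GO:0070647 GO:0006605 GO:0007346 GO:0042254
GO:0023052 GO:0006399 GO:0006351 GO:0055085 GO:0007033 GO:0016192 GO:0006766
"""


def check_id_list(text):
    """Split a whitespace-separated ID list and report malformed or repeated IDs."""
    ids = text.split()
    malformed = [i for i in ids if not re.fullmatch(r"GO:\d{7}", i)]
    duplicates = sorted(i for i, count in Counter(ids).items() if count > 1)
    return ids, malformed, duplicates


ids, malformed, duplicates = check_id_list(TEMPLATE_IDS)
# Number of unordered term pairs the matrix has to cover.
n_pairs = len(ids) * (len(ids) - 1) // 2
print(len(ids), n_pairs, malformed, duplicates)
```

With the 42 slim terms listed, this gives 861 unordered pairs, which is the number of intersections that eventually need a rule.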