Version 8 (modified by al637, 5 years ago)


Phenotype Curation Guide


Scope of phenotype curation

... especially when to curate phenotype, GO, or both

  • SGD's criteria (from Maria): "in summary, we annotate phenotypes that represent the effects of mutations on the organism as a whole, rather than the effects on the product of the gene that is mutated. For example, if a particular mutation blocks the in vitro activity of an enzyme, we might annotate that with GO but would not make a phenotype annotation. However, if the same mutation causes the inability of yeast to utilize a certain nutrient, then we would capture that as a phenotype annotation."
  • General PomBase attitude: direct involvement or effects can usually be captured with GO terms, whereas indirect effects are better as phenotypes. We can pretty much always annotate an observed phenotype, but only a subset of phenotypes effectively support IMP-evidenced GO annotations.
    • example: tad3 subunit of tRNA adenosine deaminase - GO term for adenosine-to-inosine conversion; phenotype annotations for cell cycle arrest (PMID:17875641)



  • wild type is yfg1+
  • if paper names it, use that name; otherwise:
    • deletions are named yfg1delta by default
    • if a deletion also has a marker inserted, you can use the name yfg1delta::ura4+
    • a disruption (different from a deletion, in that gene sequence, esp. coding, is still present) is yfg1::ura4+
    • everything else will just default to "noname" (displayed as "unnamed" on gene pages)
  • Additional note: allele systematic IDs will be the gene systematic ID, a dash, and an integer, e.g. SPBC4.04c-1, SPBC4.04c-2, etc., and won't encode any details in the identifier; put that info in properties, description, etc.
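
The default naming rules above can be sketched in a few lines of Python. This is only an illustration, not part of Canto or the Chado load: the function names and the `kind` labels are hypothetical, and a disruption with no marker is assumed to fall through to "noname".

```python
def default_allele_name(gene, kind, marker=None):
    """Return a default allele name when the paper does not supply one.

    `kind` is a hypothetical label: "wild_type", "deletion", "disruption",
    or anything else. `marker` is an inserted marker such as "ura4+".
    """
    if kind == "wild_type":
        return f"{gene}+"
    if kind == "deletion":
        # A deletion with an inserted marker may carry the marker in its name.
        return f"{gene}delta::{marker}" if marker else f"{gene}delta"
    if kind == "disruption" and marker:
        # Gene sequence is still present, so there is no "delta" in the name.
        return f"{gene}::{marker}"
    # Everything else has no default name (displayed as "unnamed" on gene pages).
    return "noname"


def allele_systematic_id(gene_systematic_id, n):
    """Gene systematic ID, a dash, and an integer; no details are encoded."""
    return f"{gene_systematic_id}-{n}"
```

For example, `default_allele_name("yfg1", "deletion", "ura4+")` gives `yfg1delta::ura4+`, and `allele_systematic_id("SPBC4.04c", 2)` gives `SPBC4.04c-2`.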


  • See the built-in Canto hints and the page on describing residues for how to specify allele descriptions
  • Choose "unknown" option in Canto if the change isn't described in the paper you're curating
    • If a description is entered in any Canto session, it will override an "unknown" description for the same allele name in any other session.
    • On gene pages, just the allele name will appear, with "unknown" in a mouseover
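
As a rough illustration of the override rule above, the sketch below picks the description for one allele name from the descriptions entered in different sessions. The function name is hypothetical, and the tie-break (the first non-"unknown" description wins) is an assumption; the guide does not say what happens when two sessions give different descriptions.

```python
def resolve_description(descriptions):
    """Pick the description to use for one allele name across sessions.

    `descriptions` holds the description entered in each session, with
    "unknown" where the paper did not describe the change.
    """
    for description in descriptions:
        if description and description != "unknown":
            # Any real description overrides "unknown" from other sessions.
            return description
    return "unknown"
```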



  • Conditions describe the experimental conditions that may, or definitely do, affect the phenotype. They include things such as:
    • Type of medium: rich medium, minimal medium, growth on agar plates, growth in liquid culture, glucose medium, sporulation medium etc
    • Chemicals added to the assay which are 'normally' not included in the medium, or which are added to the medium in a higher than normal concentration. This includes a wide range of substances: glutamate, cyclosporin A, calcium, hydrogen peroxide.
    • Chemicals added in a limiting amount (perhaps enough to get things going) that are then rapidly depleted by the cells. For instance, adding 5 mg/L of adenine to the medium instead of 100 mg/L.
    • The temperature at which the cells were grown. Currently split into 3 categories: low, medium and high.
    • Sequential growth conditions, in which cells were subjected to a series of different conditions such as nitrogen starvation and recovery or heat shock and recovery.
  • Conditions live in a small in-house ontology in github. Request new condition terms on the FYPO tracker.
  • Also see the page on pre- vs. post-composed FYPO terms

Annotation extensions

Phenotype annotations can have extensions to capture expressivity (severity), penetrance, and genes, transcripts or proteins used in an assay. Expressivity and penetrance can use qualitative values from the FYPO_EXT mini-ontology (in svn at pombe-embl/mini-ontologies/fypo_extension.obo), or penetrance can be specified quantitatively as a percent (I haven't run into any quantitative expressivity as of 2015-04-10 -mah).

  • Expressivity is usually qualitative, e.g. has_expressivity(FYPO_EXT:0000003) (note: FYPO_EXT:0000003 = low).
  • Penetrance
    • Qualitative e.g. has_penetrance(FYPO_EXT:0000003)
    • Quantitative e.g. has_penetrance(25%)
  • Specifying what was assayed
    • 'Binding' phenotype terms
      • Binding to DNA, chromatin, small molecules
        • DNA binding FYPO:0000653 and descendants
          • assayed_using(geneA) strongly recommended
          • assayed_using(SO:nnnnnnn) optional, but recommended where possible
        • chromatin binding FYPO:0001093 - assayed_using(geneA) strongly recommended
        • small molecule binding, e.g. GTP binding FYPO:0001528 and descendants - assayed_using(geneA) strongly recommended; assumed if omitted
      • Protein binding - this is a special case because all of the things involved are gene products. Annotation extensions should therefore name both/all of the interacting proteins, whether or not that includes the mutated gene product. Canto should eventually require that each assayed_using() extension contains two or more gene/protein IDs. A check is run with each Chado load to flag any protein binding annotations that don't have two extensions (part of the chado_checks log). The Chado load also allows duplicated extensions specifically for the protein binding terms, in case a protein binding to itself is assayed.
        • mutation in geneA affects binding of protein A to protein B
          • assayed_using(geneA),assayed_using(geneB)
        • mutation in geneA affects binding of protein B to protein C
          • assayed_using(geneB),assayed_using(geneC)
        • mutation in geneA affects binding of protein A to protein B and protein C (three-way interaction)
          • assayed_using(geneA),assayed_using(geneB),assayed_using(geneC)
    • 'Catalytic activity' phenotype terms
      • Use assayed_enzyme to capture which gene product's activity was assayed
      • Use assayed_substrate to capture the substrate
      • A catalytic activity phenotype annotation may have either or both of assayed_enzyme and assayed_substrate. It's less useful if the annotation has neither extension.
      • assayed_enzyme and assayed_substrate were introduced in March-April 2015. We've decided not to be too bothered about retrofitting extensions made with assayed_using before the new extension relations were added, because they're not wrong, they're just a bit less specific than is now possible.
    • Everything else (e.g. protein localization, protein level, RNA level, etc.) - assayed_using
      • e.g. assayed_using(PomBase:SPAC30D11.10)
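
To make the extension syntax and the protein binding check concrete, here is a minimal Python sketch. It is not the actual Chado check: the function names are hypothetical, and the caller is assumed to already know whether the annotated term is a protein binding term (no protein binding FYPO IDs are assumed here).

```python
import re

# One extension looks like relation(value), e.g. has_penetrance(25%).
EXTENSION_RE = re.compile(r"\s*(\w+)\(([^()]+)\)\s*")


def parse_extensions(text):
    """Split 'rel(value),rel(value)' into a list of (relation, value) pairs."""
    if not text.strip():
        return []
    pairs = []
    for part in text.split(","):
        match = EXTENSION_RE.fullmatch(part)
        if match is None:
            raise ValueError(f"malformed extension: {part!r}")
        pairs.append((match.group(1), match.group(2).strip()))
    return pairs


def has_enough_binding_partners(pairs):
    """True if a protein binding annotation names at least two partners.

    Duplicates count separately, so a protein binding to itself passes.
    """
    return sum(1 for relation, _ in pairs if relation == "assayed_using") >= 2
```

With this sketch, `assayed_using(geneA),assayed_using(geneB)` passes the check, while a single `assayed_using(geneA)` would be flagged.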

FYPO annotation extension relations

These are in an OBO file at github. As of 2015-04-09:

  • has_expressivity - see above
  • has_penetrance - see above
  • assayed_using - see above
    • assayed_enzyme - see above
    • assayed_substrate - see above
  • occurs_in - rarely used; use with biological process phenotypes, usually with a Cell Ontology ID to capture when a phenotype is seen in one mating type but not the other
    • e.g. FYPO:0000821 occurs_in(CL:0002675)
  • is_bearer_of - rarely used, and a terrible kludge; use with colony pigmentation phenotypes to capture the colour
    • e.g. FYPO:0000741 is_bearer_of(PATO:0000322), where PATO:0000322 is "red"
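
A small Python sketch of how the relation list above might be used to sanity-check annotations. The set is copied from the list as of 2015-04-09 and may be out of date against the OBO file; the function names are hypothetical.

```python
# Relation names copied from the list above (as of 2015-04-09).
KNOWN_RELATIONS = {
    "has_expressivity",
    "has_penetrance",
    "assayed_using",
    "assayed_enzyme",
    "assayed_substrate",
    "occurs_in",
    "is_bearer_of",
}


def unknown_relations(extensions):
    """Return relation names not in the list, in order of first appearance."""
    seen = []
    for relation, _value in extensions:
        if relation not in KNOWN_RELATIONS and relation not in seen:
            seen.append(relation)
    return seen


def format_annotation(term_id, extensions):
    """Render a term plus its extensions the way the examples above are written."""
    rendered = ",".join(f"{relation}({value})" for relation, value in extensions)
    return f"{term_id} {rendered}" if rendered else term_id
```

For example, `format_annotation("FYPO:0000741", [("is_bearer_of", "PATO:0000322")])` reproduces the colony pigmentation example above.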

Additional notes

  • The Canto documentation on phenotype curation has more information and examples.
  • Canto changes to capture phenotypes of double/triple mutants are in the testing phase now. In the meantime, we're noting alleles, phenotypes, etc. on the wiki or in files. Note: if the phenotype is "synthetic lethal", this is explicit in the BioGRID evidence code and can be inferred later from the interaction evidence, but we'll still need to make a note of the alleles.