wiki:DatabaseSanityChecks

Sanity checks to be run on the Chado database

  • GO checks
  • Check that all required fields are present
    • aspect=F/P/C only
    • term
    • GOID
    • evidence
    • date
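
The required-field check above could be sketched roughly as follows. This is only an illustration: the dictionary keys (`aspect`, `term`, `go_id`, `evidence`, `date`) are assumed names, not the actual Chado column names, and each annotation is assumed to have already been pulled out of the database as a plain dict.

```python
# Hypothetical field names; the real Chado/GAF column names may differ.
REQUIRED_FIELDS = ("aspect", "term", "go_id", "evidence", "date")
VALID_ASPECTS = {"F", "P", "C"}


def check_required_fields(annotation):
    """Return a list of problems found in one annotation dict."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = annotation.get(field)
        # Treat None and blank strings as missing.
        if value is None or str(value).strip() == "":
            problems.append(f"missing {field}")
    aspect = annotation.get("aspect")
    # Only report an invalid aspect when one was actually supplied.
    if aspect and aspect not in VALID_ASPECTS:
        problems.append(f"invalid aspect: {aspect}")
    return problems
```

An empty list means the annotation passed; anything else can be written to a report for curators.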

  • check that all CDS/ncRNA/rRNA/tRNA/snRNA/snoRNA features have a product

  • check that aspect is correct for term
  • check that all DB identifiers prefixes are in this list <add link to db xrefs>
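
A minimal sketch of the two checks above, under stated assumptions: `term_namespaces` is a hypothetical lookup from GO ID to ontology namespace (however it is obtained from the loaded ontology), and `allowed_prefixes` is supplied by the caller, since the approved prefix list is not reproduced on this page.

```python
# GO namespace -> single-letter aspect code used in annotations.
NAMESPACE_TO_ASPECT = {
    "molecular_function": "F",
    "biological_process": "P",
    "cellular_component": "C",
}


def aspect_matches_term(go_id, aspect, term_namespaces):
    """True if the aspect agrees with the term's namespace.

    term_namespaces is a hypothetical dict {GO ID: namespace}.
    Unknown GO IDs never match.
    """
    namespace = term_namespaces.get(go_id)
    return NAMESPACE_TO_ASPECT.get(namespace) == aspect


def unknown_prefixes(identifiers, allowed_prefixes):
    """Return identifiers whose DB prefix (text before ':') is not allowed."""
    bad = []
    for ident in identifiers:
        prefix, sep, _ = ident.partition(":")
        # No colon at all also counts as a failure.
        if not sep or prefix not in allowed_prefixes:
            bad.append(ident)
    return bad
```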

  • check that ISS/IPI/IGI annotations have a "with" value
  • check that IC annotations have a "from" value
  • check that "qualifier" types are "approved" (qualifier, allele, etc)
  • check that qualifier contains allowed terms (qualifier only described at present)
  • check PMID is known
  • WITH FIELD
    • check that an SGD ID in the with field is i) known and ii) matches the SGD code for the ortholog (Y* code) listed in the ortholog field for this gene
    • pipes are only allowed in the with field when the GO term is "protein binding, bridging"
  • check that GO term and ID match (DONE)
  • check for UTR to CDS overlaps and for gaps between UTR and CDS (non-urgent, or not required, as this is checked by the Ensembl loader)
  • IC should always have "from"
  • IPI annotations should be reciprocal
  • GO ID should exist in ontology
  • db_xref should only contain PMID:xxx OR GO_REF:xxx
  • GO term should not use synonym ID
  • check IDs in the with and from fields (i.e. that all fission yeast systematic IDs are valid, that all GO IDs are valid, and that the gene is annotated to them)
  • check for any colour 8 feature without "sequence orphan", and any other colour with the keyword "sequence orphan" or "uncharacterised"
  • check how many qualifiers are attached to annotation extensions (not many), and make sure these are represented on the PomBase gene pages
  • once the triage is in place and all of the papers are imported, check that all used PMIDs are included (this will find all the typos!)
  • orthologs: when an orthologous relationship occurs, it should be the same for every occurrence.

i.e. always

    SPAC15A10.11 YGR184C|YLR024C
    SPBC19C7.02 YGR184C|YLR024C

never

    SPAC15A10.11 YGR184C|YLR024C
    SPBC19C7.02 YGR184C

or

    SPAC15A10.11 YGR184C|YLR024C
    SPBC19C7.02 YGR184C|YLR024C|YDR470C
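
Two groups of checks from the list above are mechanical enough to sketch: the evidence-code rules (ISS/IPI/IGI need "with", IC needs "from", db_xref limited to PMID or GO_REF) and the ortholog consistency rule illustrated by the example. The dict keys and the assumption that PMID and GO_REF identifiers are purely numeric are illustrative guesses, not taken from the schema.

```python
import re
from collections import defaultdict

# Assumes identifiers after the prefix are digits only.
DB_XREF_RE = re.compile(r"^(PMID|GO_REF):\d+$")


def check_evidence_rules(annotation):
    """Return problems for one annotation dict (hypothetical keys)."""
    problems = []
    evidence = annotation.get("evidence")
    if evidence in {"ISS", "IPI", "IGI"} and not annotation.get("with"):
        problems.append(f"{evidence} without 'with'")
    if evidence == "IC" and not annotation.get("from"):
        problems.append("IC without 'from'")
    db_xref = annotation.get("db_xref") or ""
    if not DB_XREF_RE.match(db_xref):
        problems.append(f"unexpected db_xref: {db_xref!r}")
    return problems


def inconsistent_orthologs(ortholog_map):
    """ortholog_map: {fission yeast ID: 'YGR184C|YLR024C'}.

    Return the S. cerevisiae IDs that appear in more than one
    distinct ortholog group, sorted alphabetically.
    """
    seen = defaultdict(set)
    for field in ortholog_map.values():
        group = frozenset(field.split("|"))
        for sc_id in group:
            seen[sc_id].add(group)
    return sorted(sc_id for sc_id, groups in seen.items() if len(groups) > 1)
```

Reporting the S. cerevisiae ID rather than the fission yeast gene keeps the output short: one line per conflicting ortholog group, which a curator can then resolve for all genes involved.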

  • Some species distribution annotations are mutually exclusive. e.g. "orthologs cannot be distinguished" and "no apparent orthologs" (Val to compile a complete list)
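
A rough sketch of the mutual-exclusion check, assuming each gene's species distribution terms are available as a collection of strings. Only the one pair named above comes from this page; the rest of the list is still to be compiled, so `MUTUALLY_EXCLUSIVE` is a placeholder to be extended.

```python
# Placeholder: only this pair is named on this page; the full list
# has not been compiled yet.
MUTUALLY_EXCLUSIVE = [
    ("orthologs cannot be distinguished", "no apparent orthologs"),
]


def conflicting_distribution_terms(gene_terms):
    """gene_terms: {gene ID: iterable of species distribution terms}.

    Return {gene ID: [exclusive pairs that are both present]}.
    """
    conflicts = {}
    for gene_id, terms in gene_terms.items():
        present = set(terms)
        hits = [pair for pair in MUTUALLY_EXCLUSIVE if present.issuperset(pair)]
        if hits:
            conflicts[gene_id] = hits
    return conflicts
```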
Last modified on Feb 8, 2012, 10:08:44 PM