Describing Residues Modified or Mutated

Describing Amino Acid Residues for Modifications etc

  • Single amino acid: single-letter amino acid code followed by the residue number, e.g. residue=K54 (residue=54 was used in legacy annotations but is now deprecated)
  • Multiple single amino acid residues: either residue=K54|residue=K22 or residue=K22,K54, as appropriate
  • Range of amino acids: residue=A114-A660 (114-660 was used in legacy annotations but is now deprecated)
  • Region not fully defined: use residue=region_A114-A660
  • Special case i) In the RNA pol II CTD, residues tend to be referred to in the context of the CTD and are annotated as CTD,S2|CTD,S5 (TODO FIX THESE)
  • Special case ii) Cleavage sites are defined as a range of the two amino acids flanking the cleaved peptide bond, e.g. residue=R179-N180 (this may not be the best way to do it)
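The residue formats above can be checked mechanically. The following is a minimal Python sketch, not an existing curation tool: the function name classify_residue_value and the category labels are hypothetical, and it only looks at the text after "residue=" in a single qualifier (the residue=K54|residue=K22 form would need to be split on "|" first). A cleavage site looks like any other two-residue range, so it is reported as a range.

```python
import re

# One residue: single-letter amino acid code followed by its position, e.g. K54.
RESIDUE = r"[A-Z]\d+"
SINGLE = re.compile(rf"^{RESIDUE}$")
RANGE = re.compile(rf"^(region_)?({RESIDUE})-({RESIDUE})$")


def classify_residue_value(value):
    """Classify the text after 'residue=' (hypothetical helper for illustration)."""
    parts = value.split(",")
    if all(SINGLE.match(part) for part in parts):
        kind = "single" if len(parts) == 1 else "multiple"
        return kind, parts
    match = RANGE.match(value)
    if match:
        kind = "region" if match.group(1) else "range"
        return kind, [match.group(2), match.group(3)]
    # Deprecated legacy forms (54, 114-660) and the CTD special case end up here.
    return "unrecognised", [value]
```

For example, classify_residue_value("K22,K54") gives ("multiple", ["K22", "K54"]), while the deprecated residue=54 form is reported as "unrecognised" so that it can be flagged for updating.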

Describing Amino Acid Residues and Bases for Allele Descriptions

Amino Acids

  • Mutation of an amino acid to a specific stop codon (which matters if generating read-through): allele=name(W234->opal)
  • Mutation of single amino acid residue: allele=name(K132A)
  • Mutation of multiple single AA residues (not consecutive): allele=name(K132A,K144A)
  • Mutation of multiple consecutive AA residues: the original residues in single-letter code, the number of the first AA in the range, then the residues they are mutated to, e.g. allele=Rhp54Km(KEN26AAA)
  • Legacy data (in Artemis): all allele data in Artemis that is not in the form allele=name(description) is in the form allele=name only, e.g. allele=htbK119R
  • Partial AA deletion: allele=name(del_100-200) for proteins
  • Note that for almost all proteins, amino acid numbering is based on the unmodified protein. The annoying exception is histones, which are numbered assuming removal of the initiator methionine.
  • Allele=heterozygous for heterozygous diploid? (check at next in-house meeting)
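As an illustration of how the amino acid allele patterns above fit together, here is a minimal Python sketch. It is an assumption-laden example rather than an existing tool: parse_allele, classify_change and the tuple layouts are hypothetical names chosen for this page, and the input is the text after "allele=". Legacy name-only alleles are returned with no description (None).

```python
import re

# name(description); the bracketed description is optional (legacy name-only form).
ALLELE = re.compile(r"^(?P<name>[^()]+)(?:\((?P<desc>[^()]*)\))?$")
STOP = re.compile(r"^([A-Z])(\d+)->(\w+)$")          # e.g. W234->opal
SUBSTITUTION = re.compile(r"^([A-Z]+)(\d+)([A-Z]+)$")  # e.g. K132A or KEN26AAA
DELETION = re.compile(r"^del_(\d+)-(\d+)$")           # e.g. del_100-200


def classify_change(item):
    """Classify one comma-separated item of an allele description."""
    match = STOP.match(item)
    if match:
        return ("stop", match.group(1), int(match.group(2)), match.group(3))
    match = DELETION.match(item)
    if match:
        return ("deletion", int(match.group(1)), int(match.group(2)))
    match = SUBSTITUTION.match(item)
    # Old and new residue strings must have the same length (K->A, KEN->AAA).
    if match and len(match.group(1)) == len(match.group(3)):
        return ("substitution", match.group(1), int(match.group(2)), match.group(3))
    return ("unrecognised", item)


def parse_allele(value):
    """Split 'name(description)' into the name and a list of classified changes."""
    match = ALLELE.match(value)
    if match is None:
        raise ValueError(f"not in the form name or name(description): {value!r}")
    desc = match.group("desc")
    if desc is None:
        return match.group("name"), None  # legacy form, allele=name only
    return match.group("name"), [classify_change(item) for item in desc.split(",")]
```

For a consecutive change, the position recorded is that of the first residue in the range, so Rhp54Km(KEN26AAA) is read as K26, E27 and N28 each mutated to A.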


Nucleotides

  • No examples yet curated; if any are needed, follow the patterns used for amino acids, but include 'nt' to indicate nucleotides, e.g.
    • single changes: ntA25G
    • partial deletions: delnt_100-200
  • Note that if neither 'aa' nor 'nt' is included, 'aa' (amino acids) will be assumed
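The default described in the last note can be sketched in Python as follows. This is only an illustration: sequence_type is a hypothetical name, and the placement of 'aa' (aaK132A, delaa_100-200) is assumed by analogy with the 'nt' examples above, since no 'aa' examples are given on this page.

```python
import re

# Optional 'del' prefix, then the lowercase marker 'aa' or 'nt'.
# Amino acid and base codes are uppercase, so the marker cannot be
# confused with the start of a change such as A25G.
MARKER = re.compile(r"^(?:del)?(aa|nt)")


def sequence_type(description):
    """Return 'nt' or 'aa' for one change; 'aa' is assumed when no marker is present."""
    match = MARKER.match(description)
    return match.group(1) if match else "aa"
```

With this rule, ntA25G and delnt_100-200 are read as nucleotide changes, while K132A and del_100-200 fall back to amino acids.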