PomBase Planning Meeting 03 July 2013

Cambridge - Meeting Room B, Sanger Building

Action item review

action items from 29th May


  • AI Versioning: what is the action item? (Mark/Dan)

Update cycle and automated update pipeline

  • AI Check all required files are present and correct on ftp site (Val)
  • AI standard health checks (Mark) - Continual
  • AI Selenium testing (Dan) - tests now cover gene page, simple, advanced search, plus existing speed test.
  • AI manual smoke testing (Dan) - first draft test sheet completed
  • AI :pep - Val spotted that this item referred to the cDNA FASTA file; it seems odd that it has a pep ID at all (Val to check)
  • AI Report that the GFF "source" column should be the specific data provider (PomBase, not Ensembl) (Dan) DONE
  • AI Val to liaise with Mark to make sure the update begins in time to get the new phenotype data live for pombe 2013
  • AI Val will get as much of the phenotype data mapped as possible, but won't be able to finalise until Midori includes all of the new population phenotype terms required, which may take a few days (mostly DONE)
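
As an illustration of the GFF "source" column item above, here is a minimal Python sketch that rewrites column 2 of a GFF file to name the data provider. It is not the pipeline's actual code: the function name, the example feature line and its ID are hypothetical, and it only assumes the standard nine tab-separated GFF columns.

```python
def set_gff_source(lines, source="PomBase"):
    """Yield GFF lines with column 2 (source) replaced by `source`.

    Hypothetical helper for illustration only.
    """
    for line in lines:
        # Directives, comments and blank lines have no columns to rewrite.
        if line.startswith("#") or not line.strip():
            yield line
            continue
        end = "\n" if line.endswith("\n") else ""
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:
            cols[1] = source
            yield "\t".join(cols) + end
        else:
            # Anything that is not a 9-column feature line passes through.
            yield line


# Made-up example input; the feature ID is a placeholder.
example = [
    "##gff-version 3\n",
    "I\tEnsembl\tgene\t1\t100\t.\t+\t.\tID=gene:hypothetical_1\n",
]
result = list(set_gff_source(example))
```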

ENA updating

  • AI Once the new db_refs are present in the EMBL entries curators should e-mail NCBI to let them know (val)

Hosting large scale datasets

  • AI create an incoming FTP directory to support hosting incoming large scale datasets (EG) - DONE - this is a single username & password for uploading (Dan can supply)
  • AI Curators document use of FTP directory (file naming, required data etc.); this is on hold until the system is running smoothly (Paul does not want to announce until tested). At the same time we will document the acceptable file formats for each data type

Pombe 2013

  • Various action items (ALL DONE, except some front page changes, see below)


  • AI Mark needs to know when new SO qualifiers (like TR box) are added (open jira item for Kim to generate a list)

PomBase front page

  • AI curators should ensure that front page is updated, minimally at release time (curators)
  • AI change PomBase tagline in database header (Mark); in banner (Midori); in any other publicity material (check flyer etc) (all) - Done

Curation, community curation

  • AI Aim to send out ~50 before pombe 2013 (val) (DONE)

Action items from before May

  • AI: Curators and Kim to record curation stats. Should separate staff & community.
  • AI: Curators to ask community for pictures to rotate on the front page (part DONE)
  • AI Curators to draw up a script for the curation video. What steps do we want to show on the video? Discuss at next curator meeting. (Part Done)
  • AI Mark will do general PomBase poster for Pombe 2013 (DONE)
  • AI Mark to work on automated release pipeline with regression testing. Plan is to have a pipeline ready for the next meeting (Mark - In Progress, see minutes).
  • AI: Document version calling (Mark)
  • AI: User documentation for website, should be copy-pasteable from Mark's documentation (Midori) (DONE)

Postponed action items

  • Demo of curation tool (postponed)
  • Val/curators will start to send out a large number of community curation sessions (DONE)
  • Mark to liaise with Giulietta to help with making a Pombe community curation video (postponed until curators have script)
  • Dan and Mark to see what files downloadable from PomBase have previously been generated by ensembl. These should be easy to create automatic updates for (In progress?)
  • Making Artemis applet available

New (and continuing) Agenda items

Feedback pombe 2013

  • PomBase gene page feedback
    • Gene pages still too slow (and many people miss the 'others' links, which would slow them further)
    • Need short description or summary paragraph (Antonia)
    • Page ordering: put GO MF, then BP, then CC, then phenotype
    • any more (I have added small items to the tracker)
    • People seem really supportive of the decision to make GO annotation direct only, a few people have said that this was a downfall of GO annotation....

Hosting high throughput datasets

The BAM files ideally should be in ENA

Submitting to ENA will be a world of pain for our users. To troubleshoot submission, EG will submit these test files to ENA.

  • Incoming ftp site for high throughput data - DONE
  • Can we get a list of formats for each datatype? "File formats and repositories for HTP data" which unfortunately slipped from the action items.

Priority Jira tickets

Things which are broken or misleading

Required for pending announcements

Front page news (keeping it current; could discuss with the Jira tracker issues below)

Quick text or reformatting changes

Other Jira related

Update pipeline and ftp site

  • Q can we get stats for how many times particular files are downloaded?
  • see related action items
  • Add Jira fix-versions for next couple chado releases ?
  • Question: does Ensembl support GFF (which flavour?) or GTF, and where is the file described? - GFF3 and GTF FTP dumps provided by EnsemblGenomes.org as of EG19
  • Is GTF/GFF production now part of the update pipeline? (The file is from April, so does not match the last release date; is it still generated from the "Restful" interface?) - GFF will be on FTP site from EG19 onwards
  • Is the Restful interface ready to document in our FAS (i.e. general release)? - Documentation is at Beta; the label is likely to be removed by Ensembl in autumn, but there are no known issues and use is strongly encouraged.
  • FROM MAY: Minor issue with the last update: some gene lists were missing for specific ontology terms. This was due to the annotation extension term names not being updated with the ontology. Mark has changed this so that everything (ontology terms and extensions) is updated, so this should not happen in future ... this happened again for the last update (see ticket)


Remove ':pep' from identifiers in cDNA FASTA file (Kim)

Discussed moving to a similar schema as for transcripts, so that proteins are named systematic_ID.1, .2 etc., matching alternative transcripts and differentiated only by "type". Any changes related to this are postponed until after 2013, which gives us time to think about whether this is what we really want to do (Kim sounded not keen).
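
For the narrower task of removing ':pep' from the cDNA FASTA identifiers, a minimal Python sketch might look like the following. It assumes ':pep' appears as a suffix on the first whitespace-delimited token of each header line (the minutes do not show an example header); the function name and the example identifier are hypothetical.

```python
def strip_pep_suffix(lines, suffix=":pep"):
    """Yield FASTA lines with `suffix` removed from the end of each header ID.

    Assumption: the ID is the first whitespace-delimited token after '>'.
    """
    for line in lines:
        if not line.startswith(">"):
            # Sequence lines are passed through unchanged.
            yield line
            continue
        end = "\n" if line.endswith("\n") else ""
        body = line[1:].rstrip("\n")
        seq_id, sep, description = body.partition(" ")
        if seq_id.endswith(suffix):
            seq_id = seq_id[: -len(suffix)]
        yield ">" + seq_id + sep + description + end


# Made-up record; the identifier is a placeholder, not a real systematic ID.
example = [
    ">hypothetical_gene.1:pep some description\n",
    "ATGGCT\n",
]
cleaned = list(strip_pep_suffix(example))
```

Only header lines are touched, so the sequences themselves and any description text after the identifier stay as they were.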

Usage stats

GeneDB decommissioned on 14th May. Large increase in users and page accesses



  • There will be a major version number for the community which will follow chado version numbers (34 etc). For each version we will record on a PomBase documentation page (automatically generated):
    • INSDC assembly (currently 2)
    • gene build 2.1 etc
    • annotation version will be the chado version
    • ensembl software version
    • GO GAF file date
    • Interpro release
    • compara version
    • pombe cerevisiae ortholog table version
    • etc

This means :

  • Users can report chado version for any analysis and other components will be traceable
  • Users can easily check which components other than the functional annotation (which will change every release) have changed, to see if a particular type of analysis is affected. For example, analysis which depends only on gene structures will need to be updated less frequently.

  • For legacy data users can continue to use file date stamp
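
One way to picture the per-release record described above is a small structure with one field per component, plus a comparison between two releases. This is only a sketch in Python: the class, field and function names are hypothetical, and apart from the chado version (34), INSDC assembly (2) and gene build (2.1) mentioned in these minutes, every value is a placeholder.

```python
from dataclasses import dataclass, asdict, replace


@dataclass
class ReleaseRecord:
    """Component versions recorded for one community release (hypothetical)."""
    chado_version: int          # doubles as the annotation version
    insdc_assembly: str
    gene_build: str
    ensembl_software: str
    go_gaf_date: str
    interpro_release: str
    compara_version: str
    ortholog_table_version: str


def changed_components(old, new):
    """Return the names of components that differ between two releases."""
    old_d, new_d = asdict(old), asdict(new)
    return sorted(name for name in old_d if old_d[name] != new_d[name])


def render(record):
    """Render a record as plain 'name: value' lines for a documentation page."""
    return "\n".join(f"{name}: {value}" for name, value in asdict(record).items())


v34 = ReleaseRecord(
    chado_version=34,
    insdc_assembly="2",
    gene_build="2.1",
    ensembl_software="placeholder",
    go_gaf_date="placeholder-a",
    interpro_release="placeholder",
    compara_version="placeholder",
    ortholog_table_version="placeholder",
)
# A later release where only the annotation and the GO GAF file changed.
v35 = replace(v34, chado_version=35, go_gaf_date="placeholder-b")
changed = changed_components(v34, v35)
```

Here `changed` lists only the chado version and the GO GAF date, so a user whose analysis depends only on gene structures could see at a glance that nothing relevant to them has moved.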

What needs to happen next?


  • ETA/effort estimate for reciprocal interaction annotations (PB-873)?

Should we put this as an action item to follow up next time?

Phenotype ontology

Anything to report?

  • 2272 terms


Anything to report?

Other general issues

  • Transcript type of TR box. Mark: I have to add it to a list of features that defines which side of the fence they fall: gene/transcript/translation, simple features, or something else that is used elsewhere but does not need to be considered as a gene/simple feature (for example, chromosomes are treated as a special case).
    • This sounds as though it could be easily automated
      • note promoter should be part of gene, not transcript



  • Update on community curation
  • Antonia's epiphany: Antonia co-curated a number of papers with the authors at the meeting and realised this is a much, much quicker approach. The author can home in on what they showed and the curator knows the ontology really well, so papers could be curated very quickly. So we have decided to all visit labs to sit with the authors and curate.
    • Antonia is starting with Sara Mole's lab next month, then other London labs and Warwick for starters
    • Val will do Manchester
    • Next Edinburgh, Brighton, Bangor (up for grabs)
  • Description lines. One sentence, Antonia now has these in progress, has invited community to contribute
    • Kim will need to write a loader for flat file
    • Mark will need to display in section header
  • Literature (triage) status

Item                                                            March meeting  April meeting  May meeting  July 3
All publications                                                9755           9761           9773         9935
Un-triaged publications                                         0              4              2            1
Curatable publications                                          4735           4740           4780         4877
Publications with Approved sessions                             580            600            647          674
Publications with active sessions                               245            247            265          249
Publications with session needing approval144 416
community curatable publications                                -              -              -            262
community curated publications with approved sessions           -              -              -            31
  • Canto annotations (from 2013-04-26)
name                                March count  April count  May count  July count
PomBase annotation extension terms  1448         1605         1651       1938

All annotation types

EC numbers                            839
PomBase family or domain              1873
PomBase gene characterisation status  5143
PomBase gene products                 7018
PomBase annotation extension terms    31170

(last month's total was 125473)

Next priorities



Next planning meeting


Last modified on Jul 7, 2015, 3:28:37 PM