Cambridge May 8, 2014
Present: Paul, Val, Midori, Dan, Mark, Antonia
Remote: Kim, Jurg
Apologies: Steve

PomBase Planning Meeting 7th May 2014

Action item review

(see follow up below)

  • AI: Re user GFF query: the new file appears to contain LTRs, UTRs and the correct number of CDS. Mark to send file / close ticket (DONE)
  • AI: Val to circulate list of proposed PIs and other groups for support letters, for additional suggestions. Draft letter (DONE)
  • AI: Val to create a list of the InterMine features which will be useful for PomBase users (DONE)
  • AI: Re gene page revisions to deal with increasing page length: Val to a) document, b) make mock-ups, c) ask for community feedback on proposal (pending)
  • AI: Problem with Paco's data. Mark to contact Paco/Luis/Ignacio? (pending, see below)
  • AI: Val to make curation tool videos for pombelist/FAQ (pending)

Collected Action items May

  • AI: UniProt xref script needs to be run each release, Chado42. (Q: is there a Jira ticket for this?) (Mark)
  • AI: Re-contact Paco's group to fix plus and minus signs in primary database submission (Mark) DONE
  • AI: Circulate Wilhelm and Marguerat data to Jurg, Sam and Brian for feedback: are all tracks useful? Sam’s data is only bigWigs; would BAM be useful? Still need track descriptions (Mark/curators)

New (and continuing) Agenda items

UniProt referencing

Release procedure not done yet. Paul: it is important to do this every release. Mark currently runs the script on its own for releases. Dan: put it into the dump script, don’t run it on its own.

Paco’s data

Mark - plus and minus signs possibly the wrong way around.

  • AI: Re-contact Paco's group to fix plus and minus signs in primary database submission (Mark) Have emailed Luis

Mark has fixed the Wilhelm transcription data; he is just waiting for descriptions from them now. Jurg can’t say off the top of his head whether all 31 tracks "should" be shown (are some more important than others? Do we really need to load all 31?).

  • AI: Circulate Wilhelm and Marguerat data to Jurg, Sam and Brian for feedback: are all tracks useful? Would BAM be useful? Still need track descriptions (Mark/curators)
  • AI: Speak to Eugene to group tracks (aka track hub) (Mark) (Q: is there an Ensembl Jira ticket for this?)

Multiple alignments

Re protein multiple alignments from Compara: Dan says that the gaps have been squashed out. Val says that just taking the gaps out is not sufficient. Paul agrees. Paul: dynamic pairwise alignments are generated, so there is no reason why multiple alignments can't also be generated.

  • AI: check whether i) gap squashing and/or ii) dynamic alignment generation for subsets of proteins is in progress (EG; Q: is there a Jira ticket for this?)


Trackhub - integrate multiple tracks into one -> speed optimization (MWIG). This is now implemented, but not for PomBase. Release 43 or 44?

  • AI: check progress of trackhub for PomBase (EG) (Jira ticket?)

Update cycle

We now have a single repo that sets up the config file, downloads Chado, updates or reloads from Chado, and contains a set of scripts that pull in UniProt IDs etc. and run InterProScan. Then come the healthcheck steps, the genemart step (coreDB and ontology mart), and the caching and updating step, which loads onto Drupal and generates sup caches such as synonyms and ontology counts (dependent on BioMart). And so on. Operationally this means a release is smoother to do.

It should be possible to break this down into 2 or 3 scripts.

There should now be no problem with ontologies being out of sync. Paul advises having a makefile, so that each time you run things it is not possible to do them out of order (dependency path).

Time to make a release, assuming no problems: Mark says 4 days. Paul wonders if it really takes 4 days when using scripts. Dan also thinks it sounds a lot.

  • AI: Look into speeding up the Ensembl loading part (need Jira ticket) (Mark and Dan).
  • AI: Establish a 'make' file to check dependencies are completed (Mark)
  • AI: Document update procedure and time taken (Mark)
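
The dependency-path idea behind the proposed makefile can be sketched as follows. This is only an illustration in Python, not the actual release code: the step names and the dependencies between them are assumptions loosely based on the steps listed above, and `pending_steps` is a hypothetical helper.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical release steps mapped to the steps each one depends on.
# Names and dependencies are assumptions for illustration only.
RELEASE_STEPS = {
    "write_config": set(),
    "download_chado": {"write_config"},
    "load_from_chado": {"download_chado"},
    "uniprot_xrefs": {"load_from_chado"},
    "interproscan": {"load_from_chado"},
    "healthchecks": {"uniprot_xrefs", "interproscan"},
    "build_marts": {"healthchecks"},
    "load_drupal": {"build_marts"},
    "sup_caches": {"build_marts", "load_drupal"},
}


def pending_steps(steps, done=()):
    """Return the steps still to run, in an order that respects dependencies.

    Raises ValueError if a step is marked done while one of its
    prerequisites is not, i.e. something was run out of order.
    """
    completed = set(done)
    for step in completed:
        missing = steps[step] - completed
        if missing:
            raise ValueError(f"{step} was run before {sorted(missing)}")
    ordered = TopologicalSorter(steps).static_order()
    return [step for step in ordered if step not in completed]
```

Calling `pending_steps(RELEASE_STEPS)` gives a full run order, and passing `done=[...]` after a partial run gives only what is left. Because the UniProt xref step is just another node in the graph, it cannot be forgotten or run on its own out of sequence, which is the point Dan and Paul raised.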

Browser defaults

We want to move the bottom graph (showing genes) up.

The top graph is a good size now, 25kb. The lower one we want smaller. It is currently 15kb because the largest gene needs this. The problem then is that if you go to find genome regions, from there only the top graph will show; the lower graph says that the region is too large.

Paul reckons they should be configurable independently. The max for the bottom should not be the min for the top. Mark has a jira ticket with Julia for that.

Top graph has TRNA instead of tRNA. Mark will fix this; there should be a Jira ticket.

  1. Why does it say e.g. “protein coding” even if colour coded? A. Because Ensembl has different types of ncRNA coloured differently, but then the specific type is labelled underneath.

Nick Rhind’s data PB-1803

The top data files are BAM files. The bottom ones are BAM coverage converted to something that is quicker to display.

BAM has 3 views: large number of alignments, small number of alignments, and coverage. BigWig has only coverage.

Val thinks the labelling is problematic. BAM files don't label in the same way as bigWigs. Weird numbers are displayed on the LHS. Paul guesses it is a scale for the Y axis, but no 0 is shown? Also the colour is different.

Track names are not shown underneath. There should be a Jira ticket for this. Mark has raised this with Eugene before. Paul says he should make a Jira ticket. Mark says there might already be one.

  • AI: Look into making consistent browser track descriptions for different types of transcriptome data (existing Jira ticket, transfer to Ensembl?)
  • AI: For the “omitted” text in the full alignment view for transcriptome data, change this to “not displayed” and put it at the front. The “omitted” refers to transcripts (in this case) that are not shown.

Consider: either show coverage or the full thing, not the half-coverage. The halfway house is a bit confusing. Paul: but you might want to cap it if there is a highly expressed gene in the area.


Synonyms: show broad, exact and related ones, not narrow ones. Problem with onto-perl: it strips out the required field (the synonym type).

  • AI: report onto-perl bug, synonym type not included (who?). Have raised with Ensembl.
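
The reason the missing synonym type matters can be shown with a minimal sketch. It assumes each synonym carries one of the standard OBO scopes (EXACT, BROAD, NARROW, RELATED); the `Synonym` class, the function names and the example values are hypothetical and are not taken from onto-perl or the PomBase code.

```python
from dataclasses import dataclass
from typing import Optional

# Scopes to display, per the discussion above: narrow synonyms are left out.
DISPLAYED_SCOPES = {"EXACT", "BROAD", "RELATED"}


@dataclass
class Synonym:
    text: str
    scope: Optional[str]  # None models a synonym whose type was stripped out


def displayed_synonyms(synonyms):
    """Keep only synonyms whose scope is one of the displayed scopes."""
    return [s.text for s in synonyms if s.scope in DISPLAYED_SCOPES]


def missing_scope(synonyms):
    """List synonyms that cannot be filtered because the scope is absent."""
    return [s.text for s in synonyms if s.scope is None]
```

With the scope stripped, a synonym can neither be shown with confidence nor excluded as narrow, so the filter depends on the bug being fixed upstream.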
Last modified on May 16, 2014, 3:29:11 PM