Development News Brief
Get Galaxy
getgalaxy.org

- new: $ hg clone http://www.bx.psu.edu/hg/galaxy galaxy-dist
- upgrade: $ hg pull -u -r f364d992270c
Bowtie and Lastz Migration to Tool Shed
The alignment tools 'Bowtie' and 'Lastz' from the tool group NGS: Mapping have moved from the Galaxy distribution to the Galaxy Main Tool Shed.
After you update to this release, migration scripts for both Bowtie and Lastz will run the first time Galaxy launches and will automatically install replacement tool wrappers from the Tool Shed. The primary executables for Bowtie and Lastz, plus target reference genomes, should still be installed as described in the Galaxy wiki; start in the Tool Dependencies section.
Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end).
Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25.
LASTZ is a pairwise aligner for DNA sequences. Originally designed to handle sequences the size of human chromosomes and from different species, it is also useful for sequences produced by NGS technologies such as Roche 454.
Harris, R.S. (2007) Improved pairwise alignment of genomic DNA. Ph.D. Thesis, The Pennsylvania State University.
## New Galaxy CloudMan Release
CloudMan offers an easy way to get a personal and completely functional instance of Galaxy in the cloud in just a few minutes, without any manual configuration.
This release brings a large number of changes and new features, the most prominent being:
- Support for Eucalyptus cloud middleware; thanks to Alex Richter. Also, CloudMan can now run on the HPcloud in basic mode (note that no public image is currently available on the HPcloud, so you would need to build one yourself).
- Added a new file system management interface on the CloudMan Admin page, allowing control and providing insight into each available file system
- Added quite a few new user data options. See the UserData page for details; thanks to John Chilton.
- Galaxy can now be run in multi-process mode; thanks to John Chilton.
- Added Galaxy Reports app as a CloudMan service; thanks to John Chilton.
- Introduced a new format for cluster configuration persistence, allowing more flexibility in how services are maintained
- Added a new file system service for an instance's transient storage, allowing it to be used across the cluster over NFS. The file system is available at /mnt/transient_nfs; note that any data stored there will not be preserved after a cluster is terminated.
- Support for Ubuntu 12.10
- Worker instances are now also SGE submit hosts
This update comes as a result of 175 code changesets; for a complete list of changes, see the commit messages.
Any new cluster will automatically start using this version of CloudMan. Existing clusters will be given an option to do an automatic update once the main interface page is refreshed.
# Tool Shed
Improvements in the display of repository dependencies and contents in the tool shed
The various types of contents of a tool shed repository (valid tools, invalid tools, datatypes, workflows), as well as the dependencies that are defined for the repository, are now displayed in clickable containers that can be opened or closed. For example, here is the view of the emboss_5 repository that I'm hosting on my local Galaxy tool shed.
Notice the "Repository dependencies" container? This is currently in development, and will be available in the tool shed shortly. This container displays the list of all repositories in the tool shed upon which this repository depends.
Opening each of the above containers (by clicking on the links) displays the contents of each.
Functional test framework for the tool shed
Miscellaneous tool shed enhancements and fixes
- You can now configure the directory location for the tool shed's hgweb.config file using a setting in your community_wsgi.ini file. Configuring this location is highly recommended, but if you choose not to, a new hgweb.config file will automatically be created in the default location (the Galaxy root directory). Backups of the hgweb.config file are made (in the same directory in which it is located) any time a new repository is added to your tool shed, so keeping it in its own directory has benefits. You can also change the configured location over time: simply move the hgweb.config file to the new location before starting your tool shed server, and everything should work as expected.
- #2 Implement a new HgWebConfigManager to manage the tool shed's hgweb.config file. This will greatly diminish file i/o for the tool shed.
- #3 When defining dependencies for tools contained in a repository, allow for environment variables that contain neither REPOSITORY_INSTALL_DIR nor INSTALL_DIR; thanks to James Johnson. Allowing these values to be set in a single location rather than hard-coded into each config file is the best approach.
- #5 Don't allow reviewing empty repositories in the tool shed.
- #6 Provide a warning message when uploading files to a tool shed repository if a tool_dependencies.xml file has been provided but tool_dependencies metadata has not been generated.
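The backup behavior described in the first item of this list (a copy of hgweb.config written next to the original) can be pictured with a short sketch. This is not the tool shed's actual code: the function name `backup_config` and the timestamped `.bak` naming are assumptions made for illustration. The notes above only state that backups land in the same directory as the file.

```python
import os
import shutil
import tempfile
import time


def backup_config(config_path):
    """Copy config_path to a timestamped sibling file and return the backup path.

    Hypothetical helper: the naming scheme is an assumption, not the
    tool shed's real convention.
    """
    directory, filename = os.path.split(os.path.abspath(config_path))
    stamp = time.strftime("%Y%m%d%H%M%S")
    backup_path = os.path.join(directory, "%s.%s.bak" % (filename, stamp))
    # copy2 keeps the file's metadata (such as modification time) on the copy.
    shutil.copy2(config_path, backup_path)
    return backup_path


def demo():
    """Create a throwaway hgweb.config in its own directory and back it up."""
    config_dir = tempfile.mkdtemp()
    config_path = os.path.join(config_dir, "hgweb.config")
    with open(config_path, "w") as handle:
        handle.write("[paths]\n")
    return config_dir, backup_config(config_path)
```

Because each backup is a sibling of the original, a dedicated directory keeps the accumulated copies out of the Galaxy root, which is the benefit the item above points to.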
# User Interface (UI)
- Introduction of the dataset "Paused" state and basic "Resume-Paused" functionality for a history.
- Adjustments and fixes to history panel layout.
- Added back in "display" and "edit" attribute buttons to datasets in the error state.
- Scatterplot visualization tool: updated layout of features.
- Updated History Pull-down menu. Options affect all datasets in the current history:
  - Resume Paused Jobs - a single-click resume of all paused datasets
  - Collapse Expanded Datasets - a single-click to collapse all expanded datasets
  - Show/Hide Deleted Datasets - a single-click toggle to show or hide all deleted datasets
  - Show/Hide Hidden Datasets - a single-click toggle to show or hide all hidden datasets
  - Unhide Hidden Datasets - a single-click to change state of hidden datasets to that of regular datasets
# Job Runner
- The query for determining which jobs are ready to run has been significantly optimized. Heavily loaded multiprocess Galaxy installations should see increased performance in job dispatch and finish times.
- Jobs and their outputs are no longer set to an error state when their inputs fail to complete successfully. Instead, they are moved to a "paused" state. In the next distribution release, it will be possible to rerun the failed jobs and resume paused jobs from the point of failure.
- The SGE runner has been deprecated for a long time, and has finally been completely removed. The DRMAA runner should be used to connect to SGE clusters.
- The check_galaxy Nagios script has been updated to be compatible with the new client-side histories.
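The new handling of jobs whose inputs fail can be sketched as a small state decision. The state names and the `next_job_state` function below are illustrative assumptions, not Galaxy's job-handling code.

```python
def next_job_state(input_states):
    """Return the state a waiting job should take, given its inputs' states.

    Hypothetical simplification: the string state names are assumptions.
    """
    if any(state == "error" for state in input_states):
        # Before this release, such a job would itself have been set to "error".
        return "paused"
    if all(state == "ok" for state in input_states):
        # Every input finished successfully, so the job can be dispatched.
        return "queued"
    # Some inputs are still being produced.
    return "waiting"
```

For instance, `next_job_state(["ok", "error"])` yields `"paused"`, which leaves the job available for a later resume instead of marking its outputs as failed.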
# Source
Miscellaneous Galaxy fixes and enhancements
- Add the ability to view the current data tables registry. This new feature is available from the Galaxy Administration menu within the "Server" section, and is labeled "View data tables registry".
- Since tool migration scripts can be executed any number of times, make sure that no repositories are installed if no tools associated with the migration are defined in the tool_conf.xml (or equivalent) file. This fix is associated only with the recently introduced Galaxy administration UI feature displaying the list of migration stages currently available in the local Galaxy instance. This is the way that the migration process at Galaxy server startup always worked, so no changes were needed in that scenario.
- Maintain entries for Galaxy's ToolDataTableManager that are acquired from installed tool shed repositories in a new config file named shed_tool_data_table_conf.xml. This will ensure that manual edits to the original tool_data_table_conf.xml file (which has existed for some time) will not be altered or lost when Galaxy's tool shed repository installation process automatically adds entries into the file.
- Fix for ToolDataTable: new entries that should have been persisted to the shed_tool_data_table_conf.xml file were not being handled correctly.
- Attempt to make sure .sample files included in an installed tool shed repository are copied to the ~/tool-data directory only if they are sample data index files.
- Add error messages for a DataToolParameter when the provided value is no longer valid due to being deleted or being in an error state.
- Rework "Re-run" functionality to validate and display errors between the original job and currently set states (e.g. the previously used dataset has been deleted).
- To help with reproducibility, when extracting a workflow from a history, provide a warning message if the tool version for a job does not match the tool version of the currently loaded tool.
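The version check in the last item can be sketched roughly as follows. This is an illustrative assumption about the logic, not Galaxy's implementation: the function `version_mismatch_warnings`, the shape of the job dictionaries, and the `loaded_tools` mapping are all hypothetical.

```python
def version_mismatch_warnings(jobs, loaded_tools):
    """Return one warning per job whose tool version differs from the loaded tool.

    jobs: list of dicts with "tool_id" and "tool_version" keys (hypothetical shape).
    loaded_tools: dict mapping tool_id to the currently loaded version (hypothetical).
    """
    warnings = []
    for job in jobs:
        loaded_version = loaded_tools.get(job["tool_id"])
        if loaded_version is None:
            # The tool is not loaded at all; that case is outside this sketch.
            continue
        if loaded_version != job["tool_version"]:
            warnings.append(
                "Tool %s: job used version %s but version %s is loaded"
                % (job["tool_id"], job["tool_version"], loaded_version)
            )
    return warnings
```

Returning warnings rather than raising an error matches the item above: the workflow can still be extracted, and the user is told that a rerun may not reproduce the original result exactly.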
# Security Fixes
All Galaxy instance maintainers are strongly encouraged to run the latest release.
- Grid filters are now sanitized correctly.
# Bug Fixes
- Ensure that slugs cannot be duplicated for active, importable items.
- Fix paging in embedded grids.
- When getting job parameters for extracting a workflow from a history, set ignore_errors to True. This prevents a traceback when, e.g., a tool was updated and had a text value changed to an integer.
- Fix for rendering workflow tooltips when tool help is nonexistent in the wrapper.
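As an illustration of the slug fix in the first item, one common way to keep slugs unique is to append a numeric suffix when a candidate is already taken. The sketch below is an assumption about the general technique, not Galaxy's code; `slugify`, `unique_slug`, and the `taken` set are hypothetical names.

```python
import re


def slugify(title):
    """Lowercase the title and replace runs of other characters with single dashes."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def unique_slug(title, taken):
    """Return a slug for title that does not collide with any slug in taken.

    taken: set of slugs already used by active, importable items (hypothetical).
    """
    base = slugify(title)
    candidate = base
    suffix = 1
    while candidate in taken:
        # Try base-1, base-2, ... until an unused slug is found.
        candidate = "%s-%d" % (base, suffix)
        suffix += 1
    return candidate
```

With `taken = {"my-workflow"}`, calling `unique_slug("My Workflow", taken)` returns `"my-workflow-1"`.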
# Announcements
News, December 2012 Galaxy Update
- Training Day Topic Nominations for GCC2013 will open in December. Start thinking of ideas now!
- Slides and Screencast from November GalaxyAdmins Meetup are online. The next GalaxyAdmins Meetup will be on January 16 and feature John Chilton discussing "Deploying Galaxy on OpenStack with CloudBioLinux & CloudMan"
- A short "Getting started with JGalaxy" document (with screenshots), by John Chilton
- Batch Workflow starting using the Galaxy API : Practical Example by Geert Vandeweyer
# About Galaxy
The Galaxy Team is a part of BX at Penn State, and the Biology and Mathematics and Computer Science departments at Emory University.
Galaxy is supported in part by NSF, NHGRI, the Huck Institutes of the Life Sciences, and The Institute for CyberScience at Penn State, and Emory University.
Join us on Twitter @galaxyproject or just read our tweets: [Galaxy on Twitter](http://wiki.galaxyproject.org/Galaxy on Twitter)