Initial EuroScienceGateway workflows published in WorkflowHub
Workflows covering astronomy, biodiversity, earth science and genomics published in WorkflowHub together with onboarding guide
WorkflowHub is a registry of computational workflows, provided as an EOSC Service by ELIXIR-UK and used by over 200 different research projects, institutions and virtual collaborations. For this milestone, EuroScienceGateway (ESG) has developed an onboarding guide for WorkflowHub to support registration of the initial ESG workflows that are developed and maintained by the project. These workflows cover the EuroScienceGateway use cases of astronomy, biodiversity, earth science and genomics.
ESG onboarding guide for WorkflowHub
EuroScienceGateway (ESG) has developed a project-specific onboarding guide for WorkflowHub [Soiland-Reyes 2024]. The guide gives an overview of the structure used in WorkflowHub and pointers to general WorkflowHub onboarding.
The guide is managed as a living document in Google Docs and is registered as a Standard Operating Procedure in WorkflowHub with versioned snapshots.
The guide covers:
- EuroScienceGateway workflow organisation in WorkflowHub
- Registering EuroScienceGateway workflows in WorkflowHub
- Linking your workflow GitHub repository with WorkflowHub
- Keeping workflows up-to-date
- Linking workflows with other workflows
- Publishing workflows
- Adding new teams and collections
Because part of the EuroScienceGateway project is about maturing the WorkflowHub EOSC service, similar project-specific guides have now been developed for the Biodiversity Genomics Europe (BGE), Beyond COVID-19 (BY-COVID) and BioDiversity Digital Twin (BioDT) projects. As these projects are larger and combine existing collaboration networks and e-Infrastructures, their organisation in WorkflowHub can be more complex than that of ESG, and their respective onboarding guides help contributors navigate it.
Registering ESG workflows in WorkflowHub
Within WorkflowHub, a EuroScienceGateway team groups the contributors, organisations and workflows developed by the ESG project. Each contributor can register a separate account, and their workflows can be given shared attribution.
Figure 1: The WorkflowHub team https://workflowhub.eu/projects/166 shows registered people, organisations, standard operating procedures, workflows and collections.
Workflows are registered as Workflow RO-Crates, capturing the workflow definition and its metadata as a FAIR Digital Object. RO-Crate is a general-purpose FAIR packaging mechanism for data, metadata and software.
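To make the packaging concrete, the following is a minimal sketch of the `ro-crate-metadata.json` content for a Galaxy workflow, built as a plain Python dictionary. It follows the general RO-Crate 1.1 layout (metadata descriptor, root dataset, main workflow entity). The file name `workflow.ga`, the workflow title and the helper name `build_workflow_crate` are illustrative assumptions, and the metadata that WorkflowHub generates for a registered workflow is more complete than this.

```python
import json

# Identifiers from the RO-Crate 1.1 specification and the Workflow RO-Crate profile.
RO_CRATE_CONTEXT = "https://w3id.org/ro/crate/1.1/context"
RO_CRATE_SPEC = "https://w3id.org/ro/crate/1.1"
WORKFLOW_PROFILE = "https://w3id.org/workflowhub/workflow-ro-crate/1.0"


def build_workflow_crate(workflow_file: str, name: str) -> dict:
    """Return minimal Workflow RO-Crate metadata (hypothetical helper)."""
    return {
        "@context": RO_CRATE_CONTEXT,
        "@graph": [
            {
                # Metadata descriptor: says which entity is the crate root.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": [{"@id": RO_CRATE_SPEC}, {"@id": WORKFLOW_PROFILE}],
                "about": {"@id": "./"},
            },
            {
                # Root dataset: points at the main workflow file.
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "mainEntity": {"@id": workflow_file},
                "hasPart": [{"@id": workflow_file}],
            },
            {
                # The workflow definition itself, typed as a computational workflow.
                "@id": workflow_file,
                "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
                "name": name,
                "programmingLanguage": {"@id": "#galaxy"},
            },
            {
                # Workflow language entity (local identifier chosen for this sketch).
                "@id": "#galaxy",
                "@type": "ComputerLanguage",
                "name": "Galaxy",
                "url": {"@id": "https://galaxyproject.org/"},
            },
        ],
    }


if __name__ == "__main__":
    # "workflow.ga" and the title are placeholders, not a registered ESG workflow.
    print(json.dumps(build_workflow_crate("workflow.ga", "Example workflow"), indent=2))
```

Keeping the metadata as a flat list of cross-referenced entities is what allows a registry to locate the main workflow without knowing the workflow language in advance.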
Figure 2: Workflow https://workflowhub.eu/workflows/749?version=1 with options for Download RO-Crate and Run on usegalaxy.eu
Galaxy workflows registered this way can be launched from WorkflowHub (Figure 2) directly onto the usegalaxy.eu instance. This feature is also used by the Galaxy Training Network (GTN) as part of a snippet (Figure 3) that enables launching workflows from a particular tutorial.
Figure 3: Galaxy Training Network guide for embedding WorkflowHub execution snippets in tutorials.
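A launch link of this kind can be composed from the WorkflowHub workflow identifier and version. The sketch below assumes that the Galaxy server exposes a `workflows/trs_import` page accepting `trs_server`, `trs_id` and `trs_version` query parameters, and that WorkflowHub serves the crate at `/workflows/<id>/ro_crate`. Both URL patterns are assumptions that are not stated in this report and should be checked against the Galaxy and WorkflowHub documentation; the function names are hypothetical.

```python
from urllib.parse import urlencode


def ro_crate_download_url(workflow_id: int, version: int,
                          hub: str = "https://workflowhub.eu") -> str:
    """Build a download link for a workflow's RO-Crate (assumed URL pattern)."""
    return f"{hub}/workflows/{workflow_id}/ro_crate?{urlencode({'version': version})}"


def galaxy_launch_url(workflow_id: int, version: int,
                      galaxy: str = "https://usegalaxy.eu",
                      trs_server: str = "workflowhub.eu") -> str:
    """Build a link that asks a Galaxy server to import a workflow by its
    registry identifier (assumed endpoint and parameter names)."""
    query = urlencode({
        "trs_server": trs_server,
        "trs_id": workflow_id,
        "trs_version": version,
    })
    return f"{galaxy}/workflows/trs_import?{query}"


if __name__ == "__main__":
    # Workflow 749, version 1, is the example shown in Figure 2.
    print(ro_crate_download_url(749, 1))
    print(galaxy_launch_url(749, 1))
```

Because the link carries only the registry identifier and version, a tutorial can point at a specific registered version without embedding the workflow file itself.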
In addition, an ESG Workflow Collection has been created. Its purpose is to also aggregate pre-existing and third-party workflows that have been supported or further developed by ESG, such as those in the Galaxy Training Network (GTN) and the Intergalactic Workflow Commission (IWC), both of which are community-led initiatives with participants outside ESG.
Registered EuroScienceGateway workflows
As of 2024-02-28, the WorkflowHub ESG collection contains the following workflows:
- Gravitational Wave source Cone Search (CWL) by Volodymyr Savchenko
- Example Multi-Wavelength Light-Curve Analysis (Galaxy) by Volodymyr Savchenko
- Refining Genome Annotations with Apollo (prokaryotes) (Galaxy) by Anthony Bretaudeau
- Visualizing NDVI time-series data with HoloViz (Galaxy) by Marie Jossé
- Calculating and visualizing OBIS marine biodiversity indicators (Galaxy) by Marie Jossé
- Finding the Muon Stopping Site using PyMuonSuite (Galaxy) by Leandro Liborio, Muon Spectroscopy Computational Project
- Sentinel 5P volcanic data visualization (Galaxy) by Marie Jossé
- Functional protein annotation using EggNOG-mapper and InterProScan (Galaxy) by Anthony Bretaudeau
- Genome annotation with Funannotate (Galaxy) by Anthony Bretaudeau
- Masking repeats with RepeatMasker (Galaxy) by Anthony Bretaudeau
These workflows span the ESG use cases, including astronomy, biodiversity, earth science and genomics. Further workflows will be registered during the second phase of the project.
Additional Galaxy Training Network (GTN) workflows are being considered for WorkflowHub registration. However, some of these workflows are building blocks meant to be completed by following a particular tutorial and may not be directly suitable for scientific use, so they are better suited to a separate collection.
In contrast, the Intergalactic Workflow Commission (IWC) has developed mature, production-grade workflows for Galaxy. The separate IWC team in WorkflowHub has registered 47 workflows as of 2024-02-28. These are automatically registered by the WorkflowHub Bot, which scans the IWC GitHub repositories and registers the workflows according to their RO-Crate metadata. As many of these also have defined tests, WorkflowHub is able to show their test status via the LifeMonitor service, picked up from the test definition in their RO-Crate (Figure 4).
Figure 4: Workflow https://workflowhub.eu/workflows/615?version=2 indicates Tests Passing and links to the LifeMonitor test results.
Cite As
Stian Soiland-Reyes, Björn Grüning, Paul De Geest (2024):
EuroScienceGateway MS3: Initial EuroScienceGateway workflows registered.
Zenodo (Milestone)
https://doi.org/10.5281/zenodo.1072892