Introduction
The RSEc version-controlled repository federates metadata for research software, predominantly within the life sciences domain. These metadata cover a wide spectrum of use cases, spanning software discovery, evaluation, deployment, and execution.
Centralised in an open and version-controlled repository, the metadata enable cross-linking between services, facilitate curation, and provide insights through aggregation and analysis. This page collects key links to access and understand the software metadata provided by the Research Software Ecosystem, plus contribution guidelines and support channels.
Quick start
Open the content repository and explore `data/` (by tool) or `imports/` (by source).
See when imports run in the GitHub Actions workflow.
Use identifiers in the cross-link table below (e.g. bio.tools IDs, Bioconda packages, Galaxy tool IDs).
Open an issue in the content repo tracker if you spot a problem.
How to use the metadata
Search `data/tool-id` folders to see aggregated metadata across registries.
Compare entries across sources (e.g. bio.tools vs OpenEBench) and file PRs to fix discrepancies.
Consume raw JSON/YAML from the repository, mirror it, or automate updates with the weekly imports.
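As a starting point for exploring a local clone, the sketch below indexes the `data/` folder by tool. It assumes only the layout described on this page (one subfolder per tool under `data/`); the function name `list_tool_files` and the clone path are illustrative, and the file names inside each folder depend on which sources are linked to the tool.

```python
from pathlib import Path


def list_tool_files(repo_root):
    """Map each tool folder under data/ to the sorted names of its metadata files.

    Assumes `repo_root` is a local clone of the content repository and that
    data/ contains one subfolder per tool, as described on this page.
    """
    data_dir = Path(repo_root) / "data"
    index = {}
    for tool_dir in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        # Only direct children are listed; nested folders are ignored.
        index[tool_dir.name] = sorted(
            f.name for f in tool_dir.iterdir() if f.is_file()
        )
    return index
```

Calling `list_tool_files("path/to/clone")` returns a dictionary keyed by tool folder name, which makes it easy to spot which tools have metadata from several sources.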
Metadata Repository contents
The RSEc metadata can be accessed in a dedicated GitHub repository. Two main folders give access to the metadata: the `imports` folder, which contains one subfolder per metadata source, and the `data` folder, which contains one subfolder per bio.tools entry, combining the bio.tools metadata with the metadata files from other sources that are directly linked to that entry. An example of this organisation is illustrated in Figure 1.
Fig. 1: Example organisation of the metadata files imported in the RSEc metadata repository
Supported Formats
Details about the specific formats for each of the federated resources can be found in the following places:
| Resource Description | Link |
|---|---|
| Bio.tools API Reference | Bio.tools API Reference |
| OpenEBench | OpenEBench Technical metrics and endpoints description Tool JSON Schema Metrics JSON Schema |
| Bioconda | Bioconda contribution guidelines |
| Biocontainers | WIP |
| Galaxy Codex | Documentation work-in-progress |
| Debian Med | The YAML files describing the packages are based on information extracted from the Ultimate Debian Database using a custom import script |
| BIII | The metadata describing the software are serialized as Bioschemas-based JSON-LD files, using a custom import script |
| Bioconductor | The metadata describing the software are in |
Most metadata formats for a given source include cross-links to other sources:
| Destination / Source | bio.tools | OpenEBench | Bioconda | Biocontainers | Galaxy Codex | Debian Med | BIII | Bioconductor |
|---|---|---|---|---|---|---|---|---|
| bio.tools | n/a | | url entries of the download key where url starts with "https://anaconda.org/bioconda/", the remainder of the url being the Bioconda package name | | | url entries of the download key where url starts with "https://tracker.debian.org/pkg/", the remainder of the url being the Debian package name | | |
| OpenEBench | List elements that have an @id key starting with "https://openebench.bsc.es/monitor/metrics/biotools" | n/a | List elements that have an @id key starting with "https://openebench.bsc.es/monitor/metrics/bioconda" | | List elements that have an @id key starting with "https://openebench.bsc.es/monitor/metrics/galaxy" | | | |
| Bioconda | YAML list extra.identifiers, CURIEs starting with "biotools:" | | n/a | | For usegalaxy.eu, YAML list extra.identifiers, CURIEs starting with "usegalaxy-eu:" | | | |
| Biocontainers | | | | n/a | | | | |
| Galaxy Codex | "bio.tool_id" key in the JSON file | | "Conda_id" key in the JSON file | | n/a | | | |
| Debian Med | YAML list registries, CURIEs are in the entry key when name is "bio.tools" | | YAML list registries, CURIEs are in the entry key when name is "conda:bioconda" | | | n/a | | |
| BIII | | | | | | | n/a | |
| Bioconductor | | | | | | | | n/a |
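
The cross-link rules for bio.tools and Bioconda can be sketched in a few lines of Python. This is a minimal illustration, not the project's own code: it assumes the metadata files have already been parsed into dictionaries, that the bio.tools `download` key holds a list of objects with a `url` field, and that `extra.identifiers` is a list of strings. The function names and sample values are hypothetical.

```python
BIOCONDA_URL_PREFIX = "https://anaconda.org/bioconda/"
DEBIAN_URL_PREFIX = "https://tracker.debian.org/pkg/"


def biotools_download_links(entry):
    """Return (bioconda_packages, debian_packages) found in a bio.tools entry.

    Assumes entry["download"] is a list of dicts that each carry a "url" key.
    """
    bioconda, debian = [], []
    for item in entry.get("download") or []:
        url = item.get("url") or ""
        if url.startswith(BIOCONDA_URL_PREFIX):
            # The remainder of the URL is taken as the Bioconda package name.
            bioconda.append(url[len(BIOCONDA_URL_PREFIX):])
        elif url.startswith(DEBIAN_URL_PREFIX):
            # The remainder of the URL is taken as the Debian package name.
            debian.append(url[len(DEBIAN_URL_PREFIX):])
    return bioconda, debian


def curie_suffixes(recipe, prefix):
    """Return identifiers from a Bioconda recipe's extra.identifiers list
    that start with `prefix` (e.g. "biotools:"), with the prefix removed."""
    identifiers = (recipe.get("extra") or {}).get("identifiers") or []
    return [i[len(prefix):] for i in identifiers if i.startswith(prefix)]


# Illustrative, made-up records (not taken from the repository).
sample_entry = {"download": [{"url": BIOCONDA_URL_PREFIX + "example-tool"}]}
sample_recipe = {"extra": {"identifiers": ["biotools:example-tool"]}}
print(biotools_download_links(sample_entry))
print(curie_suffixes(sample_recipe, "biotools:"))
```

The same prefix-matching approach would apply to the other rows of the table, with the prefix and key names swapped for the ones listed there.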
Metadata Import Workflow
The metadata is imported and updated by a GitHub Actions workflow that runs weekly and imports the updated metadata from all sources. Each import task (listed in the table below) is an independent GitHub action that queries a specific source and updates the metadata in the git repository. It usually runs a Python script that:
- removes all existing data for that source from the repository checkout;
- retrieves the latest version of the metadata, using e.g. an HTTP API, a git repository checkout, or database access;
- formats these metadata in a format which is as close as possible to the source format, yet compatible with git (e.g. YAML or JSON reformatting is sometimes required);
- commits this version of the metadata.

The outline of this workflow is illustrated in Fig. 2.
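
The four steps of an import script can be sketched as follows. This is a hedged illustration rather than one of the repository's actual scripts: `run_import`, the `fetch_records` callable, the one-JSON-file-per-record layout, and the commit message are all assumptions, and record identifiers are assumed to be safe to use as file names.

```python
import json
import shutil
import subprocess
from pathlib import Path


def run_import(repo_root, source_name, fetch_records, commit=False):
    """Refresh imports/<source_name>/ in a local checkout.

    `fetch_records` is a hypothetical callable returning a dict that maps a
    record identifier to its metadata (itself a JSON-serialisable dict).
    Returns the number of records written.
    """
    relative_dir = Path("imports") / source_name
    target = Path(repo_root) / relative_dir

    # 1. Clean: drop everything previously imported from this source.
    if target.exists():
        shutil.rmtree(target)
    target.mkdir(parents=True)

    # 2. Retrieve: the callable hides the HTTP API, git checkout or database.
    records = fetch_records()

    # 3. Format: indented JSON with sorted keys, so that diffs stay readable.
    for record_id, metadata in records.items():
        text = json.dumps(metadata, indent=2, sort_keys=True) + "\n"
        (target / f"{record_id}.json").write_text(text, encoding="utf-8")

    # 4. Commit: stage the folder and commit only if something changed.
    if commit:
        git = ["git", "-C", str(repo_root)]
        subprocess.run(git + ["add", "--all", str(relative_dir)], check=True)
        staged = subprocess.run(git + ["diff", "--cached", "--quiet"])
        if staged.returncode != 0:  # non-zero exit means staged changes exist
            message = f"Update {source_name} metadata"
            subprocess.run(git + ["commit", "-m", message], check=True)
    return len(records)
```

Deleting the source folder before writing means that records removed upstream also disappear from the repository, which an update-in-place approach would miss.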
CI Import workflows in the repository
Contributing guidelines
We welcome any contribution to the project. Please refer to the governance document, and get in contact with us (see the Contacts page).