Validation CLI
==============
The Bioregistry packages a set of CLI tools in the subcommand ``bioregistry validate``
that can be used to check the prefix map in a JSON-LD, Turtle, and eventually other
document formats for their conformance to the Bioregistry and offer actionable
suggestions for improvement.
Validating JSON-LD
------------------
The ``bioregistry validate jsonld`` that can be used to check the prefix map in a
JSON-LD document (either a local file or remote) conform to the Bioregistry.
For example, running the following returns a system exit of 1 because Bioregistry
collections indeed are Bioregistry-compliant by construction:
.. code-block:: console
$ bioregistry validate jsonld "https://bioregistry.io/api/collection/0000002?format=context"
However, the Prefix Commons vendored Gene Ontology (GO) context is not valid against the
Bioregistry. It can be checked with:
.. code-block:: console
$ bioregistry validate jsonld "https://raw.githubusercontent.com/prefixcommons/prefixcommons-py/master/prefixcommons/registry/go_context.jsonld" --tablefmt rst
============== ===================================================================== ========================= =======================================
prefix uri_prefix issue solution
============== ===================================================================== ========================= =======================================
BIOMD `http://www.ebi.ac.uk/compneur-srv/biomodels-main/publ-model.do?mid=` non-standard CURIE prefix Switch to standard prefix: biomodels.db
COG_Function `http://www.ncbi.nlm.nih.gov/COG/grace/shokog.cgi?fun=` unknown CURIE prefix
WB `http://identifiers.org/wormbase/` non-standard CURIE prefix Switch to standard prefix: wormbase
FBbt `http://purl.obolibrary.org/obo/FBbt_` non-standard CURIE prefix Switch to standard prefix: fbbt
KEGG_LIGAND `http://www.genome.jp/dbget-bin/www_bget?cpd:` non-standard CURIE prefix Switch to standard prefix: kegg.ligand
PSO_GIT `https://github.com/Planteome/plant-stress-ontology/issues/` unknown CURIE prefix
MaizeGDB_stock `http://maizegdb.org/data_center/stock?id=` unknown CURIE prefix
...
============== ===================================================================== ========================= =======================================
There are two things that might be the problem. First, the resource might use stylized
(i.e. mixed case) prefixes. Therefore, we could try passing ``--use-preferred`` to
respect prefix stylization
.. code-block:: console
$ bioregistry validate jsonld --use-preferred "https://raw.githubusercontent.com/prefixcommons/prefixcommons-py/master/prefixcommons/registry/go_context.jsonld"
============== ===================================================================== ========================= ========================================
prefix uri_prefix issue solution
============== ===================================================================== ========================= ========================================
BIOMD `http://www.ebi.ac.uk/compneur-srv/biomodels-main/publ-model.do?mid=` non-standard CURIE prefix Switch to preferred prefix: biomodels.db
COG_Function `http://www.ncbi.nlm.nih.gov/COG/grace/shokog.cgi?fun=` unknown CURIE prefix
WB `http://identifiers.org/wormbase/` non-standard CURIE prefix Switch to preferred prefix: WormBase
KEGG_LIGAND `http://www.genome.jp/dbget-bin/www_bget?cpd:` non-standard CURIE prefix Switch to preferred prefix: kegg.ligand
PSO_GIT `https://github.com/Planteome/plant-stress-ontology/issues/` unknown CURIE prefix
MaizeGDB_stock `http://maizegdb.org/data_center/stock?id=` unknown CURIE prefix
NCBI_GP `http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&val=` unknown CURIE prefix
...
============== ===================================================================== ========================= ========================================
Second, we could use a pre-defined community context that might have deviations from the
vanilla Bioregistry context using the ``--context`` option in combination with one of
the contexts' keys (see `all contexts `_):
.. code-block:: console
$ bioregistry validate jsonld --context obo "https://raw.githubusercontent.com/prefixcommons/prefixcommons-py/master/prefixcommons/registry/go_context.jsonld"
============== ===================================================================== ========================= =======================================
prefix uri_prefix issue solution
============== ===================================================================== ========================= =======================================
BIOMD `http://www.ebi.ac.uk/compneur-srv/biomodels-main/publ-model.do?mid=` non-standard CURIE prefix Switch to standard prefix: biomodels.db
COG_Function `http://www.ncbi.nlm.nih.gov/COG/grace/shokog.cgi?fun=` unknown CURIE prefix
WB `http://identifiers.org/wormbase/` non-standard CURIE prefix Switch to standard prefix: WormBase
KEGG_LIGAND `http://www.genome.jp/dbget-bin/www_bget?cpd:` unknown CURIE prefix
PSO_GIT `https://github.com/Planteome/plant-stress-ontology/issues/` unknown CURIE prefix
MaizeGDB_stock `http://maizegdb.org/data_center/stock?id=` unknown CURIE prefix
NCBI_GP `http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&val=` unknown CURIE prefix
...
============== ===================================================================== ========================= =======================================
It turns out that the GO JSON-LD file doesn't even validate against the OBO context!
Validating RDF in Turtle
------------------------
RDF data stored in Turtle files typically begins with a stanza defining a prefix map.
For example, one of the turtle files in the `Chemotion Knowledge Graph (Chemotion-KG)
`_
begins with the following six prefixes:
.. code-block:: turtle
@prefix nfdicore: .
@prefix ns1: .
@prefix ns2: .
@prefix obo: .
@prefix rdfs: .
@prefix xsd: .
The ``bioregistry validate jsonld`` command can be used to check the prefix map in this
file and give feedback on non-standard CURIE prefix usage, unknown CURIE prefixes, etc.
while giving suggestions for fixes, when possible.
Running the command on the file that contains the example prefixes from above gives the
following output:
.. code-block:: console
$ bioregistry validate ttl --tablefmt rst https://github.com/ISE-FIZKarlsruhe/chemotion-kg/raw/4cb5c24af/processing/output_bfo_compliant.ttl
======== ======================================== ========================= ==================================================================
prefix uri_prefix issue solution
======== ======================================== ========================= ==================================================================
nfdicore https://nfdi.fiz-karlsruhe.de/ontology/ non-standard CURIE prefix Switch to standard prefix: nfdi.core
ns1 http://purls.helmholtz-metadaten.de/mwo/ unknown CURIE prefix Consider switching to the more specific CURIE/URI prefix pair mwo:
`http://purls.helmholtz-metadaten.de/mwo/mwo_`
ns2 http://purl.obolibrary.org/obo/chebi/ unknown CURIE prefix
======== ======================================== ========================= ==================================================================
Validating Prefix Maps in LinkML
--------------------------------
`LinkML `_ enables defining a data model with YAML. Each definition
also includes a prefix map, which can be validate against the Bioregistry. Here's an
abridged excerpt of one `such configuration
`_:
.. code-block:: yaml
id: https://w3id.org/nfdi4cat/catcore
name: catcore-metadata
title: CatCore Metadata Reference Model
prefixes:
catcore: https://w3id.org/nfdi4cat/catcore/
voc4cat: https://w3id.org/nfdi4cat/voc4cat_
CHMO: http://purl.obolibrary.org/obo/CHMO_
OBI: http://purl.obolibrary.org/obo/OBI_
AFR: http://purl.allotrope.org/ontologies/result#AFR_
AFP: http://purl.allotrope.org/ontologies/process#AFP_
AFQ: http://purl.allotrope.org/ontologies/quality#AFQ_
NCIT: http://purl.obolibrary.org/obo/NCIT_
nmrCV: 'http://nmrML.org/nmrCV#NMR:'
linkml: https://w3id.org/linkml/
AFRL: http://purl.allotrope.org/ontologies/role#AFRL_
APOLLO_SV: http://purl.obolibrary.org/obo/APOLLO_SV_
SIO: http://semanticscience.org/resource/SIO_
default_prefix: catcore
The ``bioregistry validate linkml`` command can be used to check the prefix map in this
file and give feedback on non-standard CURIE prefix usage, unknown CURIE prefixes, etc.
while giving suggestions for fixes, when possible.
Running the command on the file that contains the example prefixes from above gives the
following output:
.. code-block:: console
$ bioregistry validate linkml --tablefmt rst --use-preferred https://github.com/HendrikBorgelt/CatCore/raw/refs/heads/main/src/catcore/schema/catcore.yaml
======= =================================================== ========================= ===============================
prefix uri_prefix issue solution
======= =================================================== ========================= ===============================
catcore `https://w3id.org/nfdi4cat/catcore/` unknown CURIE prefix
AFR `http://purl.allotrope.org/ontologies/result#AFR_` unknown CURIE prefix
AFP `http://purl.allotrope.org/ontologies/process#AFP_` unknown CURIE prefix
AFQ `http://purl.allotrope.org/ontologies/quality#AFQ_` unknown CURIE prefix
nmrCV `http://nmrML.org/nmrCV#NMR:` non-standard CURIE prefix Switch to preferred prefix: NMR
AFRL `http://purl.allotrope.org/ontologies/role#AFRL_` unknown CURIE prefix
SIO `http://semanticscience.org/resource/SIO_` non-standard CURIE prefix Switch to preferred prefix: sio
======= =================================================== ========================= ===============================
Note that LinkML is developed by members of the OBO Community, and therefore, its
prefixes often skew towards OBO community preferences. Therefore, you might want to use
the ``--use-preferred`` flag.
Reference
---------
.. automodapi:: bioregistry.validate
:no-heading:
:no-main-docstr: