Schema
Pydantic models for the Bioregistry.
- class Attributable(**data)[source]
Upper-level metadata for a person.
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
- add_triples(graph)[source]
Add triples to an RDF graph for this author.
- Parameters
graph (rdflib.Graph) – An RDF graph
- Return type
- Returns
The RDF node representing this author using an ORCiD URI.
- class Author(**data)[source]
Metadata for an author.
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
- class Provider(**data)[source]
A provider.
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
- class Resource(**data)[source]
Metadata about an ontology, database, or other resource.
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
- providers: Optional[List[bioregistry.schema.struct.Provider]]
Additional non-default providers for the given resource
- contact: Optional[bioregistry.schema.struct.Attributable]
The contact email address for the individual responsible for the resource
- example: Optional[str]
An example local identifier for the resource, explicitly excluding any redundant usage of the prefix in the identifier. For example, a GO identifier should only look like 1234567 and not like GO:1234567.
- part_of: Optional[str]
An annotation between this prefix and a super-prefix. For example, chembl.compound is a part of chembl.
- provides: Optional[str]
An annotation between this prefix and a prefix for which it is redundant. For example, ctd.gene has been given a prefix by Identifiers.org, but it actually just reuses identifiers from ncbigene, so ctd.gene provides ncbigene.
- banana: Optional[str]
The banana is a generalization of the concept of the “namespace embedded in local unique identifier”. Many OBO Foundry ontologies use the redundant uppercased name of the ontology in the local identifier, such as the Gene Ontology, which makes the prefixes have a redundant usage as in GO:GO:1234567. The banana tag explicitly annotates the part in the local identifier that should be stripped, if found. While the Bioregistry automatically knows how to handle all OBO Foundry ontologies’ bananas because the OBO Foundry provides the “preferredPrefix” field, the banana can be annotated on non-OBO ontologies to more explicitly write the beginning part of the identifier that should be stripped. This allowed for solving one of the long-standing issues with the Identifiers.org resolver (e.g., for oma.hog; see https://github.com/identifiers-org/identifiers-org.github.io/issues/155) as well as better annotating new entries, such as SwissMap Lipids, which has the prefix swisslipid but has the redundant information SLM: in the beginning of identifiers. Therefore, SLM: is the banana.
- deprecated: Optional[bool]
A flag denoting if this resource is deprecated. Currently, this is a blanket term that covers cases when the prefix is no longer maintained, when it has been rolled into another resource, when the website related to the resource goes down, or any other reason that it’s difficult or impossible to find full metadata on the resource. If this is set to true, please add a comment explaining why. This flag will override annotations from the OLS, OBO Foundry, and Prefix Commons on the deprecation status, since they often disagree and are very conservative in calling dead resources.
- mappings: Optional[Dict[str, str]]
A dictionary of metaprefixes (i.e., prefixes for registries) to prefixes in external registries. These also correspond to the registry-specific JSON fields in this model, such as the miriam field.
- synonyms: Optional[List[str]]
A list of synonyms for the prefix of this resource. These are used in normalization of prefixes and are a useful reference tool for prefixes that are written many ways. For example, snomedct has many synonyms including typos like SNOWMEDCT, lexical variants like SNOMED_CT, version-variants like SNOMEDCT_2010_1_31, and tons of other nonsense like SNOMEDCTCT.
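The role synonyms play in prefix normalization can be sketched as follows. This is a minimal illustration and not the Bioregistry implementation: the helpers `build_synonym_index` and `normalize_prefix` are hypothetical names, the registry slice is toy data built from the synonyms listed above, and the sketch assumes that normalization only ignores case and surrounding whitespace.

```python
from typing import Dict, List, Optional


def build_synonym_index(synonyms: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every lowercased prefix and synonym to its canonical prefix."""
    index: Dict[str, str] = {}
    for prefix, names in synonyms.items():
        index[prefix.lower()] = prefix
        for name in names:
            index[name.lower()] = prefix
    return index


def normalize_prefix(raw: str, index: Dict[str, str]) -> Optional[str]:
    """Return the canonical prefix for a raw string, or None if it is unknown."""
    return index.get(raw.strip().lower())


# Hypothetical registry slice using the synonyms quoted in the field description
index = build_synonym_index(
    {"snomedct": ["SNOWMEDCT", "SNOMED_CT", "SNOMEDCT_2010_1_31", "SNOMEDCTCT"]}
)
print(normalize_prefix("SNOMED_CT", index))  # snomedct
print(normalize_prefix("unknown", index))  # None
```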
- references: Optional[List[str]]
A list of URLs to also see, such as publications describing the resource
- appears_in: Optional[List[str]]
A list of prefixes whose corresponding resources use this resource for xrefs, provenance, etc.
- depends_on: Optional[List[str]]
A list of prefixes that this resource depends on, e.g., ontologies that import each other.
- namespace_in_lui: Optional[bool]
A flag denoting if the namespace is embedded in the LUI (if this is true and it is not accompanied by a banana, assume that the banana is the prefix in all caps plus a colon, as is standard in OBO). Currently this flag is only used to override identifiers.org in the case of gramene.growthstage, oma.hog, and vario.
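The fallback rule in the parenthetical can be written out as a short sketch. The function name `effective_redundant_prefix` is hypothetical, and the choice to let an explicit banana take precedence and to append a missing colon is an assumption made for illustration, not confirmed library behavior.

```python
from typing import Optional


def effective_redundant_prefix(
    prefix: str, banana: Optional[str], namespace_in_lui: Optional[bool]
) -> Optional[str]:
    """Return the redundant leading string expected in identifiers, colon included."""
    if banana is not None:
        # Assumption: an explicit banana wins; add the colon if it is missing
        return banana if banana.endswith(":") else banana + ":"
    if namespace_in_lui:
        # Rule from the field description: prefix in all caps plus a colon
        return prefix.upper() + ":"
    return None


print(effective_redundant_prefix("chebi", None, True))  # CHEBI:
print(effective_redundant_prefix("vario", "VariO", True))  # VariO:
print(effective_redundant_prefix("pdb", None, None))  # None
```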
- no_own_terms: Optional[bool]
A flag to denote if the resource does not mint its own identifiers. Omission or explicit marking as false means that the resource does have its own terms. This is most applicable to ontologies, specifically application ontologies, which only reuse terms from others. One example is ChIRO.
- contributor: Optional[bioregistry.schema.struct.Author]
Contributor information, including the name, ORCiD, and optionally the email of the contributor. All entries curated through the Bioregistry GitHub Workflow must contain this field.
- reviewer: Optional[bioregistry.schema.struct.Author]
Reviewer information, including the name, ORCiD, and optionally the email of the reviewer. All entries curated through the Bioregistry GitHub Workflow must contain this field pointing to the person who reviewed it on GitHub.
- proprietary: Optional[bool]
A flag to denote if this database is proprietary and therefore cannot be included in normal quality control checks, nor can it be resolved. Omission or explicit marking as false means that the resource is not proprietary.
- has_canonical: Optional[str]
An annotation between this prefix and another prefix if they share the same provider IRI to denote that the other prefix should be considered as the canonical prefix to which IRIs should be contracted as CURIEs.
See also
This field was added and described in detail in https://github.com/biopragmatics/bioregistry/pull/164
- preferred_prefix: Optional[str]
An annotation of stylization of the prefix. This appears in OBO ontologies like FBbt as well as databases like NCBIGene. If it’s not given, then assume that the normalized prefix used in the Bioregistry is canonical.
- obofoundry: Optional[Mapping[str, Any]]
External data from the Open Biomedical Ontologies (OBO) Foundry catalog
- get_mapped_prefix(metaprefix)[source]
Get the prefix for the given external registry.
- Parameters
metaprefix (str) – The metaprefix for the external resource
- Return type
- Returns
The prefix in the external registry, if it could be mapped
>>> from bioregistry import get_resource
>>> get_resource("chebi").get_mapped_prefix("wikidata")
'P683'
- get_prefix_key(key, metaprefixes)[source]
Get a key enriched by the given external resources’ data.
- get_default_uri(identifier)[source]
Return the default URI for the identifier.
- Parameters
identifier (str) – The local identifier in the nomenclature represented by this resource
- Return type
- Returns
The first-party provider URI for the local identifier, if one can be constructed
>>> from bioregistry import get_resource
>>> get_resource("chebi").get_default_uri("24867")
'https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:24867'
- get_banana()[source]
Get the optional redundant prefix to go before an identifier.
A “banana” is an embedded prefix that isn’t actually part of the identifier. Usually this corresponds to the prefix itself, with some specific stylization such as in the case of FBbt. The banana does NOT include a colon “:” at the end.
Explicitly annotated banana
>>> from bioregistry import get_resource
>>> get_resource("go.ref").get_banana()
'GO_REF'
Banana imported through OBO Foundry
>>> get_resource("fbbt").get_banana()
'FBbt'
Banana inferred for OBO Foundry ontology
>>> get_resource("chebi").get_banana()
'CHEBI'
No banana, no namespace in LUI
>>> assert get_resource("pdb").get_banana() is None
Banana is not inferred for OBO Foundry ontologies that were imported
>>> assert get_resource("ncit").get_banana() is None
>>> assert get_resource("ncbitaxon").get_banana() is None
- get_default_format()[source]
Get the default, first-party URI prefix.
>>> from bioregistry import get_resource
>>> get_resource("ncbitaxon").get_default_format()
'https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=$1'
>>> get_resource("go").get_default_format()
'http://amigo.geneontology.org/amigo/term/GO:$1'
- get_preferred_prefix()[source]
Get the preferred prefix (e.g., with stylization) if it exists.
- Return type
- Returns
The preferred prefix, if annotated in the Bioregistry or OBO Foundry.
No preferred prefix annotation, defaults to normalized prefix
>>> from bioregistry import get_resource
>>> assert get_resource("rhea").get_preferred_prefix() is None
Preferred prefix defined in the Bioregistry
>>> get_resource("wb").get_preferred_prefix()
'WormBase'
Preferred prefix defined in the OBO Foundry
>>> get_resource("fbbt").get_preferred_prefix()
'FBbt'
Preferred prefix from the OBO Foundry overridden by the Bioregistry (see also https://github.com/OBOFoundry/OBOFoundry.github.io/issues/1559)
>>> get_resource("dpo").get_preferred_prefix()
'DPO'
- get_obo_preferred_prefix()[source]
Get the OBO preferred prefix, if this resource is mapped to the OBO Foundry.
- get_contact_email()[source]
Return the contact email, if available.
>>> from bioregistry import get_resource
>>> get_resource("bioregistry").get_contact_email()  # from bioregistry curation
'cthoyt@gmail.com'
>>> get_resource("chebi").get_contact_email()
'amalik@ebi.ac.uk'
- get_contact_name()[source]
Return the contact name, if available.
>>> from bioregistry import get_resource
>>> get_resource("bioregistry").get_contact_name()  # from bioregistry curation
'Charles Tapley Hoyt'
>>> get_resource("chebi").get_contact_name()
'Adnan Malik'
- get_contact_github()[source]
Return the contact GitHub handle, if available.
>>> from bioregistry import get_resource
>>> get_resource("bioregistry").get_contact_github()  # from bioregistry curation
'cthoyt'
>>> get_resource("agro").get_contact_github()  # from OBO Foundry
'marieALaporte'
- get_contact_orcid()[source]
Return the contact ORCiD, if available.
>>> from bioregistry import get_resource
>>> get_resource("bioregistry").get_contact_orcid()  # from bioregistry curation
'0000-0003-4423-4370'
- is_deprecated()[source]
Return if the given prefix corresponds to a deprecated resource.
- Return type
- Returns
If the prefix has been explicitly marked as deprecated either by the Bioregistry, OBO Foundry, OLS, or MIRIAM. If no marks are present, assumed not to be deprecated.
>>> from bioregistry import get_resource
>>> assert get_resource("imr").is_deprecated()  # marked by OBO
>>> assert get_resource("iro").is_deprecated()  # marked by Bioregistry
>>> assert get_resource("miriam.collection").is_deprecated()  # marked by MIRIAM
- get_obofoundry_prefix()[source]
Get the OBO Foundry prefix if available.
>>> from bioregistry import get_resource
>>> get_resource("go").get_obofoundry_prefix()  # standard
'GO'
>>> get_resource("ncbitaxon").get_obofoundry_prefix()  # mixed case
'NCBITaxon'
>>> assert get_resource("sty").get_obofoundry_prefix() is None
- get_obofoundry_uri_prefix()[source]
Get the OBO Foundry URI prefix for this entry, if possible.
>>> from bioregistry import get_resource
>>> get_resource("go").get_obofoundry_uri_prefix()  # standard
'http://purl.obolibrary.org/obo/GO_'
>>> get_resource("ncbitaxon").get_obofoundry_uri_prefix()  # mixed case
'http://purl.obolibrary.org/obo/NCBITaxon_'
>>> assert get_resource("sty").get_obofoundry_uri_prefix() is None
- get_obofoundry_uri_format()[source]
Get the OBO Foundry URI format string for this entry, if possible.
>>> from bioregistry import get_resource
>>> get_resource("go").get_obofoundry_uri_format()  # standard
'http://purl.obolibrary.org/obo/GO_$1'
>>> get_resource("ncbitaxon").get_obofoundry_uri_format()  # mixed case
'http://purl.obolibrary.org/obo/NCBITaxon_$1'
>>> assert get_resource("sty").get_obofoundry_uri_format() is None
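The relationship between the OBO Foundry prefix, URI prefix, and URI format string shown in these examples can be sketched with two small helpers. The names `obo_uri_prefix` and `obo_uri_format` are hypothetical, and the sketch only reproduces the pattern visible in the documented outputs. It is not the library's code.

```python
from typing import Optional

# Base of OBO PURLs, as seen in the documented example outputs
OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"


def obo_uri_prefix(obo_prefix: Optional[str]) -> Optional[str]:
    """Build the OBO PURL prefix, preserving the case of the OBO Foundry prefix."""
    if obo_prefix is None:
        return None
    return f"{OBO_PURL_BASE}{obo_prefix}_"


def obo_uri_format(obo_prefix: Optional[str]) -> Optional[str]:
    """Append the $1 placeholder to the URI prefix to get a format string."""
    uri_prefix = obo_uri_prefix(obo_prefix)
    return None if uri_prefix is None else uri_prefix + "$1"


print(obo_uri_format("NCBITaxon"))  # http://purl.obolibrary.org/obo/NCBITaxon_$1
```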
- get_prefixcommons_uri_format()[source]
Get the Prefix Commons URI format string for this entry, if available.
>>> from bioregistry import get_resource
>>> get_resource("hgmd").get_prefixcommons_uri_format()
'http://www.hgmd.cf.ac.uk/ac/gene.php?gene=$1'
- get_identifiers_org_prefix()[source]
Get the identifiers.org prefix if available.
- Return type
- Returns
The Identifiers.org/MIRIAM prefix corresponding to the prefix, if mappable.
>>> from bioregistry import get_resource
>>> get_resource('chebi').get_identifiers_org_prefix()
'chebi'
>>> get_resource('ncbitaxon').get_identifiers_org_prefix()
'taxonomy'
>>> assert get_resource('MONDO').get_identifiers_org_prefix() is None
- get_miriam_uri_prefix()[source]
Get the Identifiers.org URI prefix for this entry, if possible.
>>> from bioregistry import get_resource
>>> get_resource('ncbitaxon').get_miriam_uri_prefix()
'https://identifiers.org/taxonomy:'
>>> get_resource('go').get_miriam_uri_prefix()
'https://identifiers.org/GO:'
>>> assert get_resource('sty').get_miriam_uri_prefix() is None
- get_miriam_uri_format()[source]
Get the Identifiers.org URI format string for this entry, if possible.
>>> from bioregistry import get_resource
>>> get_resource('ncbitaxon').get_miriam_uri_format()
'https://identifiers.org/taxonomy:$1'
>>> get_resource('go').get_miriam_uri_format()
'https://identifiers.org/GO:$1'
>>> assert get_resource('sty').get_miriam_uri_format() is None
- get_ols_uri_prefix()[source]
Get the OLS URI prefix for this entry, if possible.
Warning
This doesn’t have a normal form, so it only works for OBO Foundry at the moment.
>>> from bioregistry import get_resource
>>> get_resource("go").get_ols_uri_prefix()  # standard
'https://www.ebi.ac.uk/ols/ontologies/go/terms?iri=http://purl.obolibrary.org/obo/GO_'
>>> get_resource("ncbitaxon").get_ols_uri_prefix()  # mixed case
'https://www.ebi.ac.uk/ols/ontologies/ncbitaxon/terms?iri=http://purl.obolibrary.org/obo/NCBITaxon_'
>>> assert get_resource("sty").get_ols_uri_prefix() is None
- get_ols_uri_format()[source]
Get the OLS URI format string for this entry, if possible.
Warning
This doesn’t have a normal form, so it only works for OBO Foundry at the moment.
>>> from bioregistry import get_resource
>>> get_resource("go").get_ols_uri_format()  # standard
'https://www.ebi.ac.uk/ols/ontologies/go/terms?iri=http://purl.obolibrary.org/obo/GO_$1'
>>> get_resource("ncbitaxon").get_ols_uri_format()  # mixed case
'https://www.ebi.ac.uk/ols/ontologies/ncbitaxon/terms?iri=http://purl.obolibrary.org/obo/NCBITaxon_$1'
>>> assert get_resource("sty").get_ols_uri_format() is None
- DEFAULT_URI_FORMATTER_PRIORITY: ClassVar[Sequence[str]] = ('default', 'obofoundry', 'prefixcommons', 'miriam', 'n2t', 'ols')
The default priority for generating URIs
- get_uri_format(priority=None)[source]
Get the URI format string for the given prefix, if it’s available.
- Parameters
priority (Optional[Sequence[str]]) – The priority order of metaresources to use for format URI lookup. The default is:
Default first party (from bioregistry, prefix commons, or miriam)
OBO Foundry
Prefix Commons
Identifiers.org
N2T
OLS
BioPortal
- Return type
- Returns
The best URI format string, where the $1 should be replaced by a local unique identifier. $1 could potentially appear multiple times.
>>> from bioregistry import get_resource
>>> get_resource("chebi").get_uri_format()
'https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:$1'
If you want to specify a different priority order, you can do so with the priority keyword. This is of particular interest to ontologists and semantic web people who might want to use purl.obolibrary.org URL prefixes over the URL prefixes corresponding to the first-party providers for each resource (e.g., the ChEBI example above). Do so like:
>>> from bioregistry import get_resource
>>> priority = ['obofoundry', 'bioregistry', 'prefixcommons', 'miriam', 'ols']
>>> get_resource("chebi").get_uri_format(priority=priority)
'http://purl.obolibrary.org/obo/CHEBI_$1'
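Priority-based selection of a format string can be sketched as a first-match lookup. This is an illustration under stated assumptions: `pick_uri_format` is a hypothetical helper, the per-source format strings are passed in as a plain dictionary, and priority keys with no entry are assumed to be skipped silently.

```python
from typing import Dict, Optional, Sequence

# Copied from DEFAULT_URI_FORMATTER_PRIORITY as documented above
DEFAULT_PRIORITY = ("default", "obofoundry", "prefixcommons", "miriam", "n2t", "ols")


def pick_uri_format(
    formats: Dict[str, Optional[str]],
    priority: Optional[Sequence[str]] = None,
) -> Optional[str]:
    """Return the first available format string in priority order, else None."""
    for metaprefix in priority or DEFAULT_PRIORITY:
        uri_format = formats.get(metaprefix)
        if uri_format:
            return uri_format
    return None


# Toy lookup table using the two ChEBI format strings shown in the examples
chebi_formats = {
    "default": "https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:$1",
    "obofoundry": "http://purl.obolibrary.org/obo/CHEBI_$1",
}
print(pick_uri_format(chebi_formats))
print(pick_uri_format(chebi_formats, priority=["obofoundry", "miriam"]))
```

With the default order the first-party string is chosen, and listing obofoundry first selects the PURL form instead.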
- get_uri_prefix(priority=None)[source]
Get a well-formed URI prefix, if available.
- Parameters
priority (Optional[Sequence[str]]) – The priority order for get_format().
- Return type
- Returns
The URI prefix. Similar to what’s returned by get_uri_format(), but the format string MUST have only one $1 and end with $1 to use this function.
>>> import bioregistry
>>> bioregistry.get_uri_prefix('chebi')
'https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:'
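The constraint on the format string (exactly one $1, located at the end) can be checked with a short sketch. The helper `uri_prefix_from_format` is a hypothetical name, and returning None for format strings that do not qualify is an assumption. The library may handle that case differently.

```python
from typing import Optional

PLACEHOLDER = "$1"


def uri_prefix_from_format(uri_format: Optional[str]) -> Optional[str]:
    """Strip a single trailing $1 from a format string, or return None."""
    if uri_format is None:
        return None
    # The placeholder must appear exactly once and at the very end
    if uri_format.count(PLACEHOLDER) != 1 or not uri_format.endswith(PLACEHOLDER):
        return None
    return uri_format[: -len(PLACEHOLDER)]


print(uri_prefix_from_format("https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:$1"))
print(uri_prefix_from_format("https://example.org/$1/view"))  # None
```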
- standardize_identifier(identifier, prefix=None)[source]
Normalize the identifier to not have a redundant prefix or banana.
- Parameters
- Return type
- Returns
A normalized identifier, possibly with banana/redundant prefix removed
Examples with explicitly annotated bananas:
>>> from bioregistry import get_resource
>>> get_resource("vario").standardize_identifier('0376')
'0376'
>>> get_resource("vario").standardize_identifier('VariO:0376')
'0376'
>>> get_resource("swisslipid").standardize_identifier('000000001')
'000000001'
>>> get_resource("swisslipid").standardize_identifier('SLM:000000001')
'000000001'
Examples with bananas from OBO:
>>> get_resource("fbbt").standardize_identifier('00007294')
'00007294'
>>> get_resource("fbbt").standardize_identifier('FBbt:00007294')
'00007294'
>>> get_resource("chebi").standardize_identifier('1234')
'1234'
>>> get_resource("chebi").standardize_identifier('CHEBI:1234')
'1234'
Examples from OBO Foundry that should not have a redundant prefix added:
>>> get_resource("ncit").standardize_identifier("C73192")
'C73192'
>>> get_resource("ncbitaxon").standardize_identifier("9606")
'9606'
Standard:
>>> get_resource("pdb").standardize_identifier('00000020')
'00000020'
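The stripping behavior in these examples can be approximated by a small standalone function. This is a sketch, not the method's actual implementation: `strip_banana` is a hypothetical helper that takes the banana (without its colon) as an argument, and it assumes a case-sensitive match.

```python
from typing import Optional


def strip_banana(identifier: str, banana: Optional[str]) -> str:
    """Remove a leading '<banana>:' from the identifier, if present."""
    if banana is None:
        return identifier
    redundant = banana + ":"
    if identifier.startswith(redundant):
        return identifier[len(redundant):]
    return identifier


print(strip_banana("VariO:0376", "VariO"))  # 0376
print(strip_banana("0376", "VariO"))  # 0376
print(strip_banana("C73192", None))  # C73192
```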
- miriam_standardize_identifier(identifier)[source]
Normalize the identifier for legacy usage with MIRIAM using the appropriate banana.
- Parameters
identifier (str) – The identifier in the CURIE
- Return type
- Returns
A normalized identifier, possibly with banana/redundant prefix added
This is needed because identifiers.org used to have URIs in the form of https://identifiers.org/<prefix>/<prefix>:<identifier> for entries annotated with namespaceEmbeddedInLui as true.
Examples with explicitly annotated bananas:
>>> from bioregistry import get_resource
>>> get_resource("vario").miriam_standardize_identifier('0376')
'VariO:0376'
>>> get_resource("vario").miriam_standardize_identifier('VariO:0376')
'VariO:0376'
Examples with bananas from OBO:
>>> get_resource("fbbt").miriam_standardize_identifier('00007294')
'FBbt:00007294'
>>> get_resource("fbbt").miriam_standardize_identifier('FBbt:00007294')
'FBbt:00007294'
Examples from OBO Foundry:
>>> get_resource("chebi").miriam_standardize_identifier('1234')
'CHEBI:1234'
>>> get_resource("chebi").miriam_standardize_identifier('CHEBI:1234')
'CHEBI:1234'
Examples from OBO Foundry that should not have a redundant prefix added:
>>> get_resource("ncit").miriam_standardize_identifier("C73192")
'C73192'
>>> get_resource("ncbitaxon").miriam_standardize_identifier("9606")
'9606'
Standard:
>>> get_resource("pdb").miriam_standardize_identifier('00000020')
'00000020'
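The inverse operation, adding the banana for legacy MIRIAM usage, can be sketched the same way. `add_banana` is a hypothetical helper. It assumes the banana is given without its colon and that an identifier already carrying the banana is returned unchanged, as the examples suggest.

```python
from typing import Optional


def add_banana(identifier: str, banana: Optional[str]) -> str:
    """Prepend '<banana>:' unless the identifier already starts with it."""
    if banana is None:
        return identifier
    redundant = banana + ":"
    if identifier.startswith(redundant):
        return identifier
    return redundant + identifier


print(add_banana("0376", "VariO"))  # VariO:0376
print(add_banana("VariO:0376", "VariO"))  # VariO:0376
print(add_banana("9606", None))  # 9606
```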
- is_canonical_identifier(identifier)[source]
Check that a local unique identifier is canonical, meaning no bananas.
- is_known_identifier(identifier)[source]
Check that a local unique identifier can be normalized and also matches a prefix’s pattern.
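Both checks can be illustrated with a self-contained sketch. The helper names, the explicit `banana` and `pattern` arguments, and the `^\d+$` pattern used in the usage lines are all assumptions chosen for illustration. Returning None when no pattern is available is likewise a guess at reasonable behavior, not a documented contract.

```python
import re
from typing import Optional


def normalize_identifier(identifier: str, banana: Optional[str]) -> str:
    """Strip a leading '<banana>:' from the identifier, if present."""
    if banana is not None and identifier.startswith(banana + ":"):
        return identifier[len(banana) + 1:]
    return identifier


def is_canonical(identifier: str, banana: Optional[str]) -> bool:
    """An identifier is canonical if normalizing it changes nothing."""
    return normalize_identifier(identifier, banana) == identifier


def is_known(
    identifier: str, pattern: Optional[str], banana: Optional[str]
) -> Optional[bool]:
    """Normalize, then match the whole identifier; None means no pattern to check."""
    if pattern is None:
        return None
    normalized = normalize_identifier(identifier, banana)
    return re.fullmatch(pattern, normalized) is not None


print(is_canonical("CHEBI:1234", "CHEBI"))  # False
print(is_known("CHEBI:1234", r"^\d+$", "CHEBI"))  # True
```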
- class Collection(**data)[source]
A collection of resources.
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
- authors: List[bioregistry.schema.struct.Author]
A list of authors/contributors to the collection
- add_triples(graph)[source]
Add triples to an RDF graph for this collection.
- Parameters
graph (rdflib.Graph) – An RDF graph
- Return type
- Returns
The RDF node representing this collection using a Bioregistry IRI.
- class Registry(**data)[source]
Metadata about a registry.
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
- availability: bioregistry.schema.struct.RegistrySchema
A structured description of the metadata the registry collects
- resolver_uri_format: Optional[str]
A URL with a $1 for a prefix and $2 for an identifier to resolve in the registry
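Filling in such a two-placeholder format string can be sketched as plain string substitution. The function `resolve_in_registry` and the example.org format string are hypothetical and serve only to show how $1 and $2 are replaced.

```python
def resolve_in_registry(resolver_uri_format: str, prefix: str, identifier: str) -> str:
    """Substitute $1 with the prefix and $2 with the identifier."""
    return resolver_uri_format.replace("$1", prefix).replace("$2", identifier)


# Hypothetical resolver format string, for illustration only
print(resolve_in_registry("https://example.org/$1:$2", "chebi", "24867"))
# https://example.org/chebi:24867
```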
- resolver_type: Optional[str]
An optional type annotation for what kind of resolver it is (i.e., redirect or lookup)
- contact: bioregistry.schema.struct.Attributable
The contact for the registry
- get_provider_uri_format(prefix)[source]
Get the provider string.
- Parameters
prefix (str) – The prefix used in the metaregistry
- Return type
- Returns
The URL in the registry for the prefix, if it’s able to provide one
>>> from bioregistry import get_registry
>>> get_registry("fairsharing").get_provider_uri_format("FAIRsharing.62qk8w")
'https://fairsharing.org/FAIRsharing.62qk8w'
>>> get_registry("miriam").get_provider_uri_format("go")
'https://registry.identifiers.org/registry/go'
- add_triples(graph)[source]
Add triples to an RDF graph for this registry.
- Parameters
graph (rdflib.Graph) – An RDF graph
- Return type
- Returns
The RDF node representing this registry using a Bioregistry IRI.