
Universal Protein Resource (UniProt)
====================================


The Universal Protein Resource (UniProt), a collaboration between the European
Bioinformatics Institute (EBI), the SIB Swiss Institute of Bioinformatics, and
the Protein Information Resource (PIR), comprises three databases, each
optimized for different uses. The UniProt Knowledgebase (UniProtKB) is the
central access point for extensively curated protein information, including
function, classification and cross-references. The UniProt Reference Clusters
(UniRef) combine closely related sequences into a single record to speed up
sequence similarity searches. The UniProt Archive (UniParc) is a comprehensive
repository of all protein sequences, consisting only of unique identifiers and
sequences.


UniProt RDF Distribution
========================

This directory contains the following files:

- Core datasets of UniProt in RDF/XML format:

  Due to the volume of data, each core dataset is distributed as a collection of
  files that match the following file name patterns:

    uniprotkb_*.rdf.xz   UniProt Knowledgebase (UniProtKB)
    uniref_*.rdf.xz      UniProt Reference Clusters (UniRef)
    uniparc_*.rdf.xz     UniProt Archive (UniParc)

  The UniProtKB dataset is split by the top levels of the NCBI taxonomy into
  files that contain at most 1 million entries each; the file name indicates the
  classification and ID of the taxon. Obsolete entries are provided in separate
  files with at most 10 million entries each (uniprotkb_obsolete_*.rdf.xz). The
  UniRef dataset is split into files that contain about 100,000 clusters each.
  The UniParc dataset is split into files of about 1 GB each.

- Supporting datasets for UniProt in RDF/XML format:

    citations.rdf.xz   Literature citations
    diseases.rdf.xz    Human diseases
    journals.rdf.xz    Journals that contain articles cited in UniProt
    taxonomy.rdf.xz    Organisms
    keywords.rdf.xz    Keywords
    go.owl.xz          Gene Ontology
    enzyme.rdf.xz      Enzyme classification
    pathways.rdf.xz    Pathways
    locations.rdf.xz   Subcellular locations
    tissues.rdf.xz     Tissues
    databases.rdf.xz   Databases cross-referenced from UniProt entries
    proteomes.rdf.xz   Proteomes
 
  For taxonomy and GO, these additional files contain inferred rdfs:subClassOf
  statements:
 
    taxonomy-hierarchy.rdf.xz
    go-hierarchy.rdf.xz

  For chemical reaction data, Rhea RDF can be downloaded from
  https://ftp.expasy.org/databases/rhea/rdf/

- Classes and properties used in the UniProt RDF distribution:

    core.owl, also includes Cellular components (Organelles)

- Release information:

    RELEASE.metalink
  or
    RELEASE.meta4
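
As a rough illustration, the release file can be read to list the distributed
files and their checksums. The sketch below assumes that RELEASE.meta4 follows
the Metalink 4 format (RFC 5854), with file entries in the
urn:ietf:params:xml:ns:metalink namespace. The function name and the sample
document are made up for this example and are not part of the distribution.

```python
import xml.etree.ElementTree as ET

# Assumption: RELEASE.meta4 uses the Metalink 4 namespace from RFC 5854.
NS = {"ml": "urn:ietf:params:xml:ns:metalink"}


def list_release_files(meta4_xml):
    """Return {file name: {hash type: digest}} from Metalink 4 XML text."""
    root = ET.fromstring(meta4_xml)
    files = {}
    for file_el in root.findall("ml:file", NS):
        # A file entry may carry several hashes, one per algorithm.
        hashes = {
            h.get("type"): (h.text or "").strip()
            for h in file_el.findall("ml:hash", NS)
        }
        files[file_el.get("name")] = hashes
    return files


# Hypothetical sample with placeholder digests, not real release data.
SAMPLE = """<metalink xmlns="urn:ietf:params:xml:ns:metalink">
  <file name="keywords.rdf.xz">
    <size>12345</size>
    <hash type="sha-256">abc123</hash>
  </file>
  <file name="tissues.rdf.xz">
    <hash type="md5">def456</hash>
  </file>
</metalink>"""

if __name__ == "__main__":
    for name, hashes in list_release_files(SAMPLE).items():
        print(name, hashes)
```

To use it on a downloaded release file, pass the file's text to
list_release_files() and compare each digest with one computed locally, for
example with Python's hashlib module.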


For more information about UniProt RDF, please see

  https://sparql.uniprot.org/
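
The .rdf.xz files described above can be processed as a stream instead of being
unpacked to disk first. The following is a minimal sketch in Python that uses
only the standard library: it maps a file name to its dataset using the file
name patterns from this README, and counts the top-level elements of an
xz-compressed RDF/XML document. The helper names, the example file names and
the tiny sample document are hypothetical and serve only as an illustration.

```python
import fnmatch
import io
import lzma
import xml.etree.ElementTree as ET

# File name patterns taken from this README, mapped to dataset names.
# The obsolete pattern comes first because uniprotkb_* would also match it.
PATTERNS = [
    ("uniprotkb_obsolete_*.rdf.xz", "UniProtKB (obsolete entries)"),
    ("uniprotkb_*.rdf.xz", "UniProtKB"),
    ("uniref_*.rdf.xz", "UniRef"),
    ("uniparc_*.rdf.xz", "UniParc"),
]


def dataset_for(file_name):
    """Return the dataset a file belongs to, or None if no pattern matches."""
    for pattern, dataset in PATTERNS:
        if fnmatch.fnmatchcase(file_name, pattern):
            return dataset
    return None


def count_top_level_elements(xz_source):
    """Count the direct children of the root in xz-compressed RDF/XML.

    xz_source may be a path or a binary file object.
    """
    count = 0
    depth = 0
    with lzma.open(xz_source, "rb") as handle:
        for event, elem in ET.iterparse(handle, events=("start", "end")):
            if event == "start":
                depth += 1
            else:
                depth -= 1
                if depth == 1:
                    # A direct child of the root element has just closed.
                    count += 1
                    # Drop its content to keep memory use low.
                    elem.clear()
    return count


# Hypothetical sample document, not real UniProt data.
SAMPLE_RDF = b"""<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://example.org/a">
    <rdf:value>1</rdf:value>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/b"/>
</rdf:RDF>"""

if __name__ == "__main__":
    compressed = io.BytesIO(lzma.compress(SAMPLE_RDF))
    print(dataset_for("uniref_example.rdf.xz"))
    print(count_top_level_elements(compressed))
```

Streaming with iterparse avoids holding a whole file in memory, which matters
for files of the sizes mentioned above. For actual triple-level processing, a
dedicated RDF library would be a more suitable choice than raw XML parsing.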

--------------------------------------------------------------------------------
  LICENSE
--------------------------------------------------------------------------------
We have chosen to apply the Creative Commons Attribution 4.0 International
(CC BY 4.0) License (https://creativecommons.org/licenses/by/4.0/) to all
copyrightable parts of our databases.

(c) 2002-2025 UniProt Consortium

--------------------------------------------------------------------------------
  DISCLAIMER
--------------------------------------------------------------------------------
We make no warranties regarding the correctness of the data, and disclaim
liability for damages resulting from its use. We cannot provide unrestricted
permission regarding the use of the data, as some data may be covered by patents
or other rights.

Any medical or genetic information is provided for research, educational and
informational purposes only. It is not in any way intended to be used as a
substitute for professional medical advice, diagnosis, treatment or care.
