HLA Ligand Atlas

Working with the HLA Ligand Atlas Data

The HLA Ligand Atlas data can be used freely under the CC-BY 4.0 license. You can export search results when browsing and filtering the data, but you can also download an entire data release from the Downloads page.

In addition to the ZIP bundles on the Downloads page, we provide stable URLs to the individual tables of every release, which are convenient for programmatic access. The URLs follow the scheme

http://hla-ligand-atlas.org/rel/{releasename}/{tablename}.tsv.gz

where {releasename} is the name of a release (for example 2020.12, which is used in the examples on this page) and {tablename} is one of the following tables:

Peptides (tablename: peptides, keys: peptide_sequence_id)
Maps peptide sequence IDs to unique peptide sequences
Donors (tablename: donors, keys: donor)
Maps donors to donor HLA alleles
Sample Hits (tablename: sample_hits, keys: peptide_sequence_id, donor)
Maps samples (triplets of donor, tissue and HLA class) to peptide sequence IDs
Protein Map (tablename: protein_map, keys: peptide_sequence_id)
Maps peptide sequence IDs to source protein IDs (UniProt IDs)
Aggregated peptide list (tablename: aggregated, keys: peptide_sequence_id)
An aggregated list of peptides as shown in the peptide browser. It contains one row per peptide in the database.
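
The URL scheme above can be wrapped in a small helper. This is a minimal sketch: the function name `table_url` and the `TABLES` tuple are illustrative and not part of the Atlas itself; only the URL pattern and the table names come from the description above.

```python
# Table names as listed above. The helper below is illustrative only.
TABLES = ("peptides", "donors", "sample_hits", "protein_map", "aggregated")


def table_url(release, table, base="http://hla-ligand-atlas.org/rel"):
    """Build the stable URL of one table of a given release."""
    if table not in TABLES:
        raise ValueError(f"unknown table name: {table!r}")
    return f"{base}/{release}/{table}.tsv.gz"


print(table_url("2020.12", "peptides"))
# prints: http://hla-ligand-atlas.org/rel/2020.12/peptides.tsv.gz
```

Checking the table name up front turns a typo into an immediate error instead of a failed download later on.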

Peptides

The following Python example uses the pandas package to load the peptides table and show its structure:

In [1]: import pandas as pd
In [2]: base_url = 'http://hla-ligand-atlas.org/rel/2020.12/'
In [3]: peptides = pd.read_csv(base_url + 'peptides.tsv.gz', sep='\t')
In [4]: peptides.head()
Out [4]:
   peptide_sequence_id      peptide_sequence
0                    1       LLPKKTESHHKAKGK
1                    2  PKGKKAKGKKVAPAPAVVKK
2                    3  PEPAKSAPAPKKGSKKAVTK
3                    4      IKKWEKQVSQKKKQKN
4                    5         IKKWEKQVSQKKK

Donors

Similarly, the table of donors can be obtained:

In [5]: donors = pd.read_csv(base_url + 'donors.tsv.gz', sep='\t')
In [6]: donors.head()
Out [6]:
         donor hla_allele
0  OVA01-DN281    A*11:01
1   AUT01-DN13    A*11:01
2   AUT01-DN12    A*11:01
3   AUT01-DN05    A*11:01
4   AUT01-DN03    A*11:01
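
The preview suggests that the table holds one row per donor and allele, so finding the donors annotated with a particular allele is a simple filter. The following is a minimal sketch that uses only the five rows shown above as inline data, so it runs without a download; the helper name `donors_with_allele` is hypothetical.

```python
import pandas as pd

# Inline copy of the five rows previewed above, so the sketch runs offline.
donors = pd.DataFrame({
    "donor": ["OVA01-DN281", "AUT01-DN13", "AUT01-DN12", "AUT01-DN05", "AUT01-DN03"],
    "hla_allele": ["A*11:01"] * 5,
})


def donors_with_allele(donors, allele):
    """Return the sorted list of donors annotated with the given HLA allele."""
    mask = donors["hla_allele"] == allele
    return sorted(donors.loc[mask, "donor"].unique())


print(donors_with_allele(donors, "A*11:01"))
```

With the full table loaded from the release URL, the same function can be applied unchanged.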

Sample Hits

The table of sample hits can be used to identify the samples in which peptides were observed:

In [7]: sample_hits = pd.read_csv(base_url + 'sample_hits.tsv.gz', sep='\t')
In [8]: sample_hits.head()
Out [8]:
   peptide_sequence_id       donor         tissue hla_class
0                    1  AUT01-DN02           Lung    HLA-II
1                    1  AUT01-DN03  Adrenal gland    HLA-II
2                    1  AUT01-DN03          Aorta    HLA-II
3                    1  AUT01-DN03      Esophagus    HLA-II
4                    1  AUT01-DN03          Heart    HLA-II

For example, the following query returns all unique peptide sequences that were observed in gallbladder tissue:

In [9]: sample_hits[sample_hits.tissue=='Gallbladder'].merge(peptides).peptide_sequence.drop_duplicates()
Out [9]:
0                     DESGPSIVHRK
1                    VSEKGTVQQADE
4               ISKQEYDESGPSIVHRK
5                     YASGRTTGIVM
8                      YASGRTTGIV
                  ...
2749               DPIIEDRHGGYKPS
2750    LVAGRSSDRVDGPASNLKQSGVVPF
2751                 AHKEVDPGTKTA
2752                DRPARHPQEQPLW
2753    SVDSGSSEEQGGSSRALVSTLVPLG
Name: peptide_sequence, Length: 2643, dtype: object
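
The gallbladder query can be generalized to any tissue. The sketch below is one way to do that, with the function name `peptides_in_tissue` chosen for illustration only. Called without arguments, `merge` joins on all columns the two tables share, which here is `peptide_sequence_id`; naming the key explicitly guards against accidental joins should the tables ever share further columns. The inline rows are copied from the previews above so that the example runs offline.

```python
import pandas as pd

# Inline rows taken from the table previews above, so the sketch runs offline.
peptides = pd.DataFrame({
    "peptide_sequence_id": [1, 2],
    "peptide_sequence": ["LLPKKTESHHKAKGK", "PKGKKAKGKKVAPAPAVVKK"],
})
sample_hits = pd.DataFrame({
    "peptide_sequence_id": [1, 1, 1],
    "donor": ["AUT01-DN02", "AUT01-DN03", "AUT01-DN03"],
    "tissue": ["Lung", "Adrenal gland", "Aorta"],
    "hla_class": ["HLA-II", "HLA-II", "HLA-II"],
})


def peptides_in_tissue(sample_hits, peptides, tissue):
    """Return the unique peptide sequences observed in the given tissue."""
    hits = sample_hits[sample_hits["tissue"] == tissue]
    # Join on the shared key explicitly instead of relying on the default.
    merged = hits.merge(peptides, on="peptide_sequence_id", how="inner")
    return merged["peptide_sequence"].drop_duplicates().tolist()


print(peptides_in_tissue(sample_hits, peptides, "Lung"))
```

Tissue names are matched exactly, so the spelling must agree with the values in the `tissue` column (for example `'Adrenal gland'`).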

Protein Map

The protein map table maps every peptide sequence ID to its corresponding source protein UniProt IDs:

In [5]: protein_map = pd.read_csv(base_url + 'protein_map.tsv.gz', sep='\t')
In [6]: protein_map.head()
Out [6]:
   peptide_sequence_id uniprot_id
0                    1     P04908
1                    1     P0C0S8
2                    1     P20671
3                    1     Q6FI13
4                    1     Q7L7L0

For example, to list all peptides that map to a given protein, together with the tissues in which they were observed, the tables can be queried as follows:

In [1]: protein_map[protein_map.uniprot_id=='Q9HBM6']\
        .merge(sample_hits)\
        .merge(peptides)[['tissue', 'peptide_sequence']].drop_duplicates()
Out [1]:
         tissue     peptide_sequence
    Bone marrow     TPLPLIKPYAGPRLPP
         Kidney     TPLPLIKPYAGPRLPP
          Liver     TPLPLIKPYAGPRLPP
           Lung     TPLPLIKPYAGPRLPP
          Colon     TPLPLIKPYAGPRLPP
         Muscle     TPLPLIKPYAGPRLPP
           Skin     TPLPLIKPYAGPRLPP
Small intestine     TPLPLIKPYAGPRLPP
         Spleen     TPLPLIKPYAGPRLPP
        Thyroid     TPLPLIKPYAGPRLPP
        Trachea     TPLPLIKPYAGPRLPP
         Thymus     TPLPLIKPYAGPRLPP
    Bone marrow    TPLPLIKPYAGPRLPPD
          Colon    TPLPLIKPYAGPRLPPD
         Kidney    TPLPLIKPYAGPRLPPD
          Liver    TPLPLIKPYAGPRLPPD
           Lung    TPLPLIKPYAGPRLPPD
       Pancreas    TPLPLIKPYAGPRLPPD
        Trachea    TPLPLIKPYAGPRLPPD
         Muscle    TPLPLIKPYAGPRLPPD
Small intestine    TPLPLIKPYAGPRLPPD
         Spleen    TPLPLIKPYAGPRLPPD
         Thymus    TPLPLIKPYAGPRLPPD
         Thymus            QMLEFAFRY
         Kidney            APNYRLKSL
           Lung            APNYRLKSL
         Spleen            APNYRLKSL
     Cerebellum            APNYRLKSL
         Thymus            APNYRLKSL
        Bladder           ILKDMGITEY
           Lung           ILKDMGITEY
Small intestine           ILKDMGITEY
        Thyroid           ILKDMGITEY
      Esophagus           ILKDMGITEY
           Skin           ILKDMGITEY
         Spleen           ILKDMGITEY
        Trachea           ILKDMGITEY
          Colon           ILKDMGITEY
         Tongue           ILKDMGITEY
       Prostate           ILKDMGITEY
  Adrenal gland           ILKDMGITEY
    Bone marrow           ILKDMGITEY
         Kidney           ILKDMGITEY
          Liver           ILKDMGITEY
     Lymph node           ILKDMGITEY
         Uterus           ILKDMGITEY
         Thymus           ILKDMGITEY
         Kidney  TPLPLIKPYAGPRLPPDRY
         Spleen  TPLPLIKPYAGPRLPPDRY
         Kidney   TPLPLIKPYAGPRLPPDR
           Lung   TPLPLIKPYAGPRLPPDR
         Spleen   TPLPLIKPYAGPRLPPDR
          Colon   TPLPLIKPYAGPRLPPDR
Small intestine   TPLPLIKPYAGPRLPPDR
Small intestine  NQTPLPLIKPYAGPRLPPD
          Ovary            TANEANPLK
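
A long listing like this one is often easier to read once it is summarized. As a sketch under stated assumptions, the function below runs the same three-table join for a protein and then counts the number of distinct tissues per peptide. The name `tissues_per_peptide` is hypothetical, and the inline rows are copied from the table previews earlier on this page (peptide 1 and two of its UniProt IDs) so that the example runs offline.

```python
import pandas as pd

# Inline rows taken from the table previews on this page (runs offline).
peptides = pd.DataFrame({
    "peptide_sequence_id": [1],
    "peptide_sequence": ["LLPKKTESHHKAKGK"],
})
protein_map = pd.DataFrame({
    "peptide_sequence_id": [1, 1],
    "uniprot_id": ["P04908", "P0C0S8"],
})
sample_hits = pd.DataFrame({
    "peptide_sequence_id": [1, 1, 1, 1, 1],
    "donor": ["AUT01-DN02"] + ["AUT01-DN03"] * 4,
    "tissue": ["Lung", "Adrenal gland", "Aorta", "Esophagus", "Heart"],
    "hla_class": ["HLA-II"] * 5,
})


def tissues_per_peptide(protein_map, sample_hits, peptides, uniprot_id):
    """Count the distinct tissues in which each peptide of a protein was observed."""
    hits = (
        protein_map[protein_map["uniprot_id"] == uniprot_id]
        .merge(sample_hits, on="peptide_sequence_id")
        .merge(peptides, on="peptide_sequence_id")
        [["tissue", "peptide_sequence"]]
        .drop_duplicates()
    )
    counts = hits.groupby("peptide_sequence")["tissue"].nunique()
    return counts.sort_values(ascending=False)


print(tissues_per_peptide(protein_map, sample_hits, peptides, "P04908"))
```

Applied to the full tables, the result is a Series indexed by peptide sequence, with the most widely observed peptides first.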