Working with the HLA Ligand Atlas Data

The HLA Ligand Atlas data can be used freely under the CC-BY 4.0 license. You can export search results when browsing and filtering the data, but you can also download an entire data release from the Downloads page.

In addition to the ZIP bundles on the Downloads page, every release provides stable URLs to its individual tables, which are convenient for programmatic access. The URLs follow the schema

http://hla-ligand-atlas.org/rel/{releasename}/{tablename}.tsv.gz

where {tablename} can reference one of the following tables:

  • Peptides (tablename: peptides, keys: peptide_sequence_id)
    Maps peptide sequence IDs to unique peptide sequences
  • Donors (tablename: donors, keys: donor)
    Maps donors to donor HLA alleles
  • Sample Hits (tablename: sample_hits, keys: peptide_sequence_id, donor)
    Maps samples (triplets of donor, tissue and HLA class) to peptide sequence IDs
  • Protein Map (tablename: protein_map, keys: peptide_sequence_id)
    Maps peptide sequence IDs to source protein IDs (UniProt IDs)
  • Aggregated peptide list (tablename: aggregated, keys: peptide_sequence_id)
    An aggregated list of peptides as shown in the peptide browser. It contains one row per peptide in the database.
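
The URL schema above can be wrapped in a small helper. The following is a minimal sketch: the names `table_url`, `load_table`, `BASE` and `TABLES` are illustrative and not part of any official client, and `2020.04` is simply the release name used in the examples on this page.

```python
import pandas as pd

BASE = 'http://hla-ligand-atlas.org/rel'
# Table names as listed above.
TABLES = ('peptides', 'donors', 'sample_hits', 'protein_map', 'aggregated')


def table_url(release, table):
    """Build the stable URL of one table of one release (hypothetical helper)."""
    if table not in TABLES:
        raise ValueError(f'unknown table: {table!r}')
    return f'{BASE}/{release}/{table}.tsv.gz'


def load_table(release, table):
    """Download one table into a DataFrame (requires network access)."""
    # pandas infers gzip compression from the '.gz' suffix.
    return pd.read_csv(table_url(release, table), sep='\t')


print(table_url('2020.04', 'peptides'))
# Usage (downloads data): peptides = load_table('2020.04', 'peptides')
```

Validating the table name up front turns a typo into an immediate `ValueError` instead of a failed download.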

Peptides

The following Python example uses the pandas package to load the peptides table and show its structure:

In [1]:
import pandas as pd

In [2]:
base_url = 'http://hla-ligand-atlas.org/rel/2020.04/'

In [3]:
peptides = pd.read_csv(base_url + 'peptides.tsv.gz', sep='\t')

In [4]:
peptides.head()
Out [4]:
   peptide_sequence_id      peptide_sequence
0                    1       LLPKKTESHHKAKGK
1                    2  PKGKKAKGKKVAPAPAVVKK
2                    3  PEPAKSAPAPKKGSKKAVTK
3                    4      IKKWEKQVSQKKKQKN
4                    5         IKKWEKQVSQKKK
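
As a small illustration of working with this table, the sketch below rebuilds the five rows shown above in memory (so it runs without network access), checks that IDs and sequences are unique within the excerpt, and adds a peptide length column. The `length` column is derived here and is not part of the released table.

```python
import pandas as pd

# First rows of the peptides table, copied from the output above.
peptides = pd.DataFrame({
    'peptide_sequence_id': [1, 2, 3, 4, 5],
    'peptide_sequence': [
        'LLPKKTESHHKAKGK',
        'PKGKKAKGKKVAPAPAVVKK',
        'PEPAKSAPAPKKGSKKAVTK',
        'IKKWEKQVSQKKKQKN',
        'IKKWEKQVSQKKK',
    ],
})

# Within this excerpt, each ID refers to one sequence and vice versa.
assert peptides['peptide_sequence_id'].is_unique
assert peptides['peptide_sequence'].is_unique

# 'length' is a derived column, not part of the released table.
peptides['length'] = peptides['peptide_sequence'].str.len()

print(peptides['length'].tolist())  # [15, 20, 20, 16, 13]
```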

Donors

Similarly, the table of donors can be obtained:

In [5]:
donors = pd.read_csv(base_url + 'donors.tsv.gz', sep='\t')

In [6]:
donors.head()
Out [6]:
         donor hla_allele
0  OVA01-DN281    A*11:01
1   AUT01-DN13    A*11:01
2   AUT01-DN12    A*11:01
3   AUT01-DN05    A*11:01
4   AUT01-DN03    A*11:01
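
The excerpt suggests that the table holds one row per donor and allele pair; under that assumption, the sketch below shows two common lookups on an in-memory copy of the rows shown above: the donors carrying a given allele, and the list of alleles per donor. The variable names are illustrative.

```python
import pandas as pd

# First rows of the donors table, copied from the output above.
donors = pd.DataFrame({
    'donor': ['OVA01-DN281', 'AUT01-DN13', 'AUT01-DN12',
              'AUT01-DN05', 'AUT01-DN03'],
    'hla_allele': ['A*11:01'] * 5,
})

# Donors carrying a given allele.
carriers = donors.loc[donors['hla_allele'] == 'A*11:01', 'donor'].tolist()

# One entry per donor holding a sorted list of that donor's alleles
# (assumes one row per donor/allele pair).
alleles_per_donor = donors.groupby('donor')['hla_allele'].apply(sorted)

print(carriers)
print(alleles_per_donor['AUT01-DN03'])
```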

Sample Hits

The table of sample hits can be used to identify the samples in which peptides were observed:

In [7]:
sample_hits = pd.read_csv(base_url + 'sample_hits.tsv.gz', sep='\t')

In [8]:
sample_hits.head()
Out [8]:
   peptide_sequence_id       donor         tissue hla_class
0                    1  AUT01-DN02           Lung    HLA-II
1                    1  AUT01-DN03  Adrenal gland    HLA-II
2                    1  AUT01-DN03          Aorta    HLA-II
3                    1  AUT01-DN03      Esophagus    HLA-II
4                    1  AUT01-DN03          Heart    HLA-II

For example, to obtain all unique peptide sequences observed in gallbladder tissue, the following query can be used:

In [9]:
sample_hits[sample_hits.tissue == 'Gallbladder'].merge(peptides).peptide_sequence.drop_duplicates()
Out [9]:
0                     DESGPSIVHRK
1                    VSEKGTVQQADE
4               ISKQEYDESGPSIVHRK
5                     YASGRTTGIVM
8                      YASGRTTGIV
                  ...
2749               DPIIEDRHGGYKPS
2750    LVAGRSSDRVDGPASNLKQSGVVPF
2751                 AHKEVDPGTKTA
2752                DRPARHPQEQPLW
2753    SVDSGSSEEQGGSSRALVSTLVPLG
Name: peptide_sequence, Length: 2643, dtype: object
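
The same query can be generalized to any tissue. The sketch below is one possible formulation, run on rows copied from the excerpts above so that it works offline; `peptides_in_tissue` is a hypothetical helper name. It names the join key explicitly, which guards against accidental joins on other shared column names.

```python
import pandas as pd

# Rows copied from the table excerpts above.
peptides = pd.DataFrame({
    'peptide_sequence_id': [1, 2],
    'peptide_sequence': ['LLPKKTESHHKAKGK', 'PKGKKAKGKKVAPAPAVVKK'],
})
sample_hits = pd.DataFrame({
    'peptide_sequence_id': [1, 1, 1, 1, 1],
    'donor': ['AUT01-DN02', 'AUT01-DN03', 'AUT01-DN03',
              'AUT01-DN03', 'AUT01-DN03'],
    'tissue': ['Lung', 'Adrenal gland', 'Aorta', 'Esophagus', 'Heart'],
    'hla_class': ['HLA-II'] * 5,
})


def peptides_in_tissue(sample_hits, peptides, tissue):
    """Unique peptide sequences observed in one tissue (hypothetical helper)."""
    # Deduplicate the IDs before merging so the join stays small.
    ids = sample_hits.loc[sample_hits['tissue'] == tissue,
                          ['peptide_sequence_id']].drop_duplicates()
    merged = ids.merge(peptides, on='peptide_sequence_id', how='inner')
    return merged['peptide_sequence'].tolist()


print(peptides_in_tissue(sample_hits, peptides, 'Lung'))         # ['LLPKKTESHHKAKGK']
print(peptides_in_tissue(sample_hits, peptides, 'Gallbladder'))  # []
```

The empty result for gallbladder only reflects the five-row excerpt used here, not the full release.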

Protein Map

The protein map table maps every peptide sequence ID to its corresponding source protein UniProt IDs:

In [10]:
protein_map = pd.read_csv(base_url + 'protein_map.tsv.gz', sep='\t')

In [11]:
protein_map.head()
Out [11]:
   peptide_sequence_id uniprot_id
0                    1     P04908
1                    1     P0C0S8
2                    1     P20671
3                    1     Q6FI13
4                    1     Q7L7L0

For example, to list all peptides that map to a given protein, together with the tissues in which they were observed, the tables can be queried as follows:

In [12]:
(protein_map[protein_map.uniprot_id == 'Q9HBM6']
 .merge(sample_hits)
 .merge(peptides)[['tissue', 'peptide_sequence']]
 .drop_duplicates())
Out [12]:
             tissue     peptide_sequence
0       Bone marrow     TPLPLIKPYAGPRLPP
1            Kidney     TPLPLIKPYAGPRLPP
2             Liver     TPLPLIKPYAGPRLPP
3              Lung     TPLPLIKPYAGPRLPP
5             Colon     TPLPLIKPYAGPRLPP
8            Muscle     TPLPLIKPYAGPRLPP
9              Skin     TPLPLIKPYAGPRLPP
10  Small intestine     TPLPLIKPYAGPRLPP
11           Spleen     TPLPLIKPYAGPRLPP
12          Thyroid     TPLPLIKPYAGPRLPP
13          Trachea     TPLPLIKPYAGPRLPP
14           Thymus     TPLPLIKPYAGPRLPP
19      Bone marrow    TPLPLIKPYAGPRLPPD
20            Colon    TPLPLIKPYAGPRLPPD
21           Kidney    TPLPLIKPYAGPRLPPD
22            Liver    TPLPLIKPYAGPRLPPD
23             Lung    TPLPLIKPYAGPRLPPD
24         Pancreas    TPLPLIKPYAGPRLPPD
25          Trachea    TPLPLIKPYAGPRLPPD
30           Muscle    TPLPLIKPYAGPRLPPD
31  Small intestine    TPLPLIKPYAGPRLPPD
32           Spleen    TPLPLIKPYAGPRLPPD
34           Thymus    TPLPLIKPYAGPRLPPD
39           Thymus            QMLEFAFRY
40           Kidney            APNYRLKSL
41             Lung            APNYRLKSL
43           Spleen            APNYRLKSL
44       Cerebellum            APNYRLKSL
45           Thymus            APNYRLKSL
47          Bladder           ILKDMGITEY
48             Lung           ILKDMGITEY
49  Small intestine           ILKDMGITEY
50          Thyroid           ILKDMGITEY
51        Esophagus           ILKDMGITEY
52             Skin           ILKDMGITEY
54           Spleen           ILKDMGITEY
56          Trachea           ILKDMGITEY
58            Colon           ILKDMGITEY
60           Tongue           ILKDMGITEY
61         Prostate           ILKDMGITEY
62    Adrenal gland           ILKDMGITEY
63      Bone marrow           ILKDMGITEY
64           Kidney           ILKDMGITEY
65            Liver           ILKDMGITEY
67       Lymph node           ILKDMGITEY
73           Uterus           ILKDMGITEY
74           Thymus           ILKDMGITEY
75           Kidney  TPLPLIKPYAGPRLPPDRY
76           Spleen  TPLPLIKPYAGPRLPPDRY
77           Kidney   TPLPLIKPYAGPRLPPDR
78             Lung   TPLPLIKPYAGPRLPPDR
79           Spleen   TPLPLIKPYAGPRLPPDR
80            Colon   TPLPLIKPYAGPRLPPDR
82  Small intestine   TPLPLIKPYAGPRLPPDR
84  Small intestine  NQTPLPLIKPYAGPRLPPD
85            Ovary            TANEANPLK
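
A long tissue/peptide listing like the one above can be condensed to one entry per peptide. The sketch below shows one way to do this on rows copied from the table excerpts on this page (so it uses protein `P04908` rather than `Q9HBM6`); `tissues_per_peptide` is a hypothetical helper name.

```python
import pandas as pd

# Rows copied from the table excerpts above.
peptides = pd.DataFrame({
    'peptide_sequence_id': [1],
    'peptide_sequence': ['LLPKKTESHHKAKGK'],
})
protein_map = pd.DataFrame({
    'peptide_sequence_id': [1, 1, 1, 1, 1],
    'uniprot_id': ['P04908', 'P0C0S8', 'P20671', 'Q6FI13', 'Q7L7L0'],
})
sample_hits = pd.DataFrame({
    'peptide_sequence_id': [1, 1, 1, 1, 1],
    'donor': ['AUT01-DN02', 'AUT01-DN03', 'AUT01-DN03',
              'AUT01-DN03', 'AUT01-DN03'],
    'tissue': ['Lung', 'Adrenal gland', 'Aorta', 'Esophagus', 'Heart'],
    'hla_class': ['HLA-II'] * 5,
})


def tissues_per_peptide(protein_map, sample_hits, peptides, uniprot_id):
    """Sorted list of tissues per peptide of one protein (hypothetical helper)."""
    hits = (protein_map[protein_map['uniprot_id'] == uniprot_id]
            .merge(sample_hits, on='peptide_sequence_id')
            .merge(peptides, on='peptide_sequence_id'))
    # Collapse the donor-level rows to the distinct tissues of each peptide.
    return (hits.groupby('peptide_sequence')['tissue']
            .apply(lambda s: sorted(s.unique())))


summary = tissues_per_peptide(protein_map, sample_hits, peptides, 'P04908')
print(summary['LLPKKTESHHKAKGK'])
# ['Adrenal gland', 'Aorta', 'Esophagus', 'Heart', 'Lung']
```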