Working with the HLA Ligand Atlas Data
The HLA Ligand Atlas data can be used freely under the CC-BY 4.0 license. You can export search results when browsing and filtering the data, but you can also download an entire data release from the Downloads page.
Besides the ZIP bundles provided on the Downloads page, we also provide stable URLs to the individual tables of every release, which are convenient for programmatic access. The URLs follow the schema
http://hla-ligand-atlas.org/rel/{releasename}/{tablename}.tsv.gz
where {tablename} can reference one of the following tables:
- Peptides (tablename: peptides, keys: peptide_sequence_id)
  Maps peptide sequence IDs to unique peptide sequences
- Donors (tablename: donors, keys: donor)
  Maps donors to donor HLA alleles
- Sample Hits (tablename: sample_hits, keys: peptide_sequence_id, donor)
  Maps samples (triplets of donor, tissue and HLA class) to peptide sequence IDs
- Protein Map (tablename: protein_map, keys: peptide_sequence_id)
  Maps peptide sequence IDs to source protein IDs (UniProt IDs)
- Aggregated peptide list (tablename: aggregated, keys: peptide_sequence_id)
  An aggregated list of peptides as shown in the peptide browser. It contains one row per peptide in the database.
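The URL schema above can be wrapped in a small helper so that release and table names are filled in consistently. This is only a sketch: the function name `table_url` and the constants `BASE_URL` and `TABLE_NAMES` are hypothetical names introduced for illustration, and the list of valid table names is taken from the table list above.

```python
# Hypothetical helper (not part of the atlas itself) that fills in the
# documented schema: http://hla-ligand-atlas.org/rel/{releasename}/{tablename}.tsv.gz
BASE_URL = "http://hla-ligand-atlas.org/rel"
TABLE_NAMES = ("peptides", "donors", "sample_hits", "protein_map", "aggregated")


def table_url(release: str, table: str) -> str:
    """Return the stable URL of one table in one data release."""
    if table not in TABLE_NAMES:
        raise ValueError(f"unknown table {table!r}; expected one of {TABLE_NAMES}")
    return f"{BASE_URL}/{release}/{table}.tsv.gz"


print(table_url("2020.12", "peptides"))
# prints: http://hla-ligand-atlas.org/rel/2020.12/peptides.tsv.gz
```

Rejecting unknown table names early gives a clearer error than a failed download would.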
Peptides
The following Python example uses the pandas package to illustrate the structure of the peptides table:
import pandas as pd
base_url = 'http://hla-ligand-atlas.org/rel/2020.12/'
peptides = pd.read_csv(base_url + 'peptides.tsv.gz', sep='\t')
peptides.head()
peptide_sequence_id peptide_sequence
0 1 LLPKKTESHHKAKGK
1 2 PKGKKAKGKKVAPAPAVVKK
2 3 PEPAKSAPAPKKGSKKAVTK
3 4 IKKWEKQVSQKKKQKN
4 5 IKKWEKQVSQKKK
Donors
Similarly, the table of donors can be obtained:
donors = pd.read_csv(base_url + 'donors.tsv.gz', sep='\t')
donors.head()
donor hla_allele
0 OVA01-DN281 A*11:01
1 AUT01-DN13 A*11:01
2 AUT01-DN12 A*11:01
3 AUT01-DN05 A*11:01
4 AUT01-DN03 A*11:01
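Since the donors table holds one row per donor and allele, a donor's alleles can be collected with a group-by, and the carriers of a given allele with a filter. The sketch below runs on a small in-memory stand-in rather than the downloaded table; the donor IDs and the donor-to-allele assignments in `donors_demo` are placeholders, not atlas data, and it assumes that a donor appears in several rows, one per allele.

```python
import pandas as pd

# Toy stand-in with the same columns as the donors table.
# Donor IDs and allele assignments are placeholders, not atlas data.
donors_demo = pd.DataFrame({
    "donor": ["DONOR-A", "DONOR-A", "DONOR-B"],
    "hla_allele": ["A*11:01", "B*07:02", "A*11:01"],
})

# Collect all alleles of each donor into one list per donor.
alleles_by_donor = donors_demo.groupby("donor")["hla_allele"].apply(list)

# Find all donors that carry a given allele.
carriers = donors_demo.loc[donors_demo["hla_allele"] == "A*11:01", "donor"].tolist()

print(alleles_by_donor)
print(carriers)
```

The same two expressions should work on the `donors` data frame loaded above, provided its columns are named as shown in the output.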
Sample Hits
The table of sample hits can be used to identify the samples in which peptides were observed:
sample_hits = pd.read_csv(base_url + 'sample_hits.tsv.gz', sep='\t')
sample_hits.head()
peptide_sequence_id donor tissue hla_class
0 1 AUT01-DN02 Lung HLA-II
1 1 AUT01-DN03 Adrenal gland HLA-II
2 1 AUT01-DN03 Aorta HLA-II
3 1 AUT01-DN03 Esophagus HLA-II
4 1 AUT01-DN03 Heart HLA-II
For example, to obtain a list of all unique peptide sequences that were measured on gallbladder tissue, the following query can be used:
sample_hits[sample_hits.tissue=='Gallbladder'].merge(peptides).peptide_sequence.drop_duplicates()
0 DESGPSIVHRK
1 VSEKGTVQQADE
4 ISKQEYDESGPSIVHRK
5 YASGRTTGIVM
8 YASGRTTGIV
...
2749 DPIIEDRHGGYKPS
2750 LVAGRSSDRVDGPASNLKQSGVVPF
2751 AHKEVDPGTKTA
2752 DRPARHPQEQPLW
2753 SVDSGSSEEQGGSSRALVSTLVPLG
Name: peptide_sequence, Length: 2643, dtype: object
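The query above relies on `merge` joining on whatever columns the two tables share. A slightly more defensive variant names the join key explicitly and asks pandas to check that each peptide sequence ID occurs only once in the peptides table. The sketch below uses toy data: the peptide sequences are the first rows of the peptides table shown earlier, while the donor IDs and tissue assignments are placeholders. The function name `peptides_in_tissue` is hypothetical.

```python
import pandas as pd

# Toy stand-ins; donors and tissue assignments are placeholders.
peptides_demo = pd.DataFrame({
    "peptide_sequence_id": [1, 2, 3],
    "peptide_sequence": [
        "LLPKKTESHHKAKGK",
        "PKGKKAKGKKVAPAPAVVKK",
        "PEPAKSAPAPKKGSKKAVTK",
    ],
})
sample_hits_demo = pd.DataFrame({
    "peptide_sequence_id": [1, 1, 2, 3],
    "donor": ["DONOR-A", "DONOR-B", "DONOR-A", "DONOR-A"],
    "tissue": ["Gallbladder", "Gallbladder", "Lung", "Gallbladder"],
    "hla_class": ["HLA-II", "HLA-II", "HLA-II", "HLA-II"],
})


def peptides_in_tissue(sample_hits, peptides, tissue):
    """Unique peptide sequences observed in at least one sample of `tissue`."""
    hits = sample_hits[sample_hits["tissue"] == tissue]
    merged = hits.merge(
        peptides,
        on="peptide_sequence_id",  # explicit join key
        how="inner",
        validate="many_to_one",    # raises if an ID is duplicated in `peptides`
    )
    return merged["peptide_sequence"].drop_duplicates().reset_index(drop=True)


gallbladder = peptides_in_tissue(sample_hits_demo, peptides_demo, "Gallbladder")
print(gallbladder.tolist())
```

Naming the key guards against an accidental join on an unrelated column if a later release adds columns with matching names to both tables.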
Protein Map
The protein map table maps every peptide sequence ID to its corresponding source protein UniProt IDs:
protein_map = pd.read_csv(base_url + 'protein_map.tsv.gz', sep='\t')
protein_map.head()
peptide_sequence_id uniprot_id
0 1 P04908
1 1 P0C0S8
2 1 P20671
3 1 Q6FI13
4 1 Q7L7L0
For example, to obtain a list of all peptides that map to a given protein, together with the tissues in which they were measured, one can query the tables as follows:
protein_map[protein_map.uniprot_id=='Q9HBM6']\
.merge(sample_hits)\
.merge(peptides)[['tissue', 'peptide_sequence']].drop_duplicates()
tissue peptide_sequence
0 Bone marrow TPLPLIKPYAGPRLPP
1 Kidney TPLPLIKPYAGPRLPP
2 Liver TPLPLIKPYAGPRLPP
3 Lung TPLPLIKPYAGPRLPP
5 Colon TPLPLIKPYAGPRLPP
8 Muscle TPLPLIKPYAGPRLPP
9 Skin TPLPLIKPYAGPRLPP
10 Small intestine TPLPLIKPYAGPRLPP
11 Spleen TPLPLIKPYAGPRLPP
12 Thyroid TPLPLIKPYAGPRLPP
13 Trachea TPLPLIKPYAGPRLPP
14 Thymus TPLPLIKPYAGPRLPP
19 Bone marrow TPLPLIKPYAGPRLPPD
20 Colon TPLPLIKPYAGPRLPPD
21 Kidney TPLPLIKPYAGPRLPPD
22 Liver TPLPLIKPYAGPRLPPD
23 Lung TPLPLIKPYAGPRLPPD
24 Pancreas TPLPLIKPYAGPRLPPD
25 Trachea TPLPLIKPYAGPRLPPD
30 Muscle TPLPLIKPYAGPRLPPD
31 Small intestine TPLPLIKPYAGPRLPPD
32 Spleen TPLPLIKPYAGPRLPPD
34 Thymus TPLPLIKPYAGPRLPPD
39 Thymus QMLEFAFRY
40 Kidney APNYRLKSL
41 Lung APNYRLKSL
43 Spleen APNYRLKSL
44 Cerebellum APNYRLKSL
45 Thymus APNYRLKSL
47 Bladder ILKDMGITEY
48 Lung ILKDMGITEY
49 Small intestine ILKDMGITEY
50 Thyroid ILKDMGITEY
51 Esophagus ILKDMGITEY
52 Skin ILKDMGITEY
54 Spleen ILKDMGITEY
56 Trachea ILKDMGITEY
58 Colon ILKDMGITEY
60 Tongue ILKDMGITEY
61 Prostate ILKDMGITEY
62 Adrenal gland ILKDMGITEY
63 Bone marrow ILKDMGITEY
64 Kidney ILKDMGITEY
65 Liver ILKDMGITEY
67 Lymph node ILKDMGITEY
73 Uterus ILKDMGITEY
74 Thymus ILKDMGITEY
75 Kidney TPLPLIKPYAGPRLPPDRY
76 Spleen TPLPLIKPYAGPRLPPDRY
77 Kidney TPLPLIKPYAGPRLPPDR
78 Lung TPLPLIKPYAGPRLPPDR
79 Spleen TPLPLIKPYAGPRLPPDR
80 Colon TPLPLIKPYAGPRLPPDR
82 Small intestine TPLPLIKPYAGPRLPPDR
84 Small intestine NQTPLPLIKPYAGPRLPPD
85 Ovary TANEANPLK
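A long tissue and peptide listing like the one above can be condensed into one row per peptide by counting the distinct tissues in which each peptide was observed. The following is a minimal sketch on toy data: the table contents, including which peptide maps to which UniProt ID and the donor IDs, are placeholders chosen for illustration and do not reproduce atlas records.

```python
import pandas as pd

# Toy stand-ins for the three tables; all pairings are placeholders.
protein_map_demo = pd.DataFrame({
    "peptide_sequence_id": [1, 2, 3],
    "uniprot_id": ["P04908", "P04908", "Q9HBM6"],
})
sample_hits_demo = pd.DataFrame({
    "peptide_sequence_id": [1, 1, 1, 2, 3],
    "donor": ["DONOR-A", "DONOR-B", "DONOR-A", "DONOR-A", "DONOR-A"],
    "tissue": ["Lung", "Lung", "Heart", "Lung", "Thymus"],
    "hla_class": ["HLA-II"] * 5,
})
peptides_demo = pd.DataFrame({
    "peptide_sequence_id": [1, 2, 3],
    "peptide_sequence": [
        "LLPKKTESHHKAKGK",
        "PKGKKAKGKKVAPAPAVVKK",
        "PEPAKSAPAPKKGSKKAVTK",
    ],
})

# Peptides of one protein, joined to their samples and sequences,
# then reduced to the number of distinct tissues per peptide.
tissue_counts = (
    protein_map_demo[protein_map_demo["uniprot_id"] == "P04908"]
    .merge(sample_hits_demo, on="peptide_sequence_id")
    .merge(peptides_demo, on="peptide_sequence_id")
    .groupby("peptide_sequence")["tissue"]
    .nunique()
    .sort_values(ascending=False)
)
print(tissue_counts)
```

Using `nunique` rather than a plain row count means a peptide seen in the same tissue of several donors is counted once for that tissue.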