Data Sources

The BMEG is an expanding resource of interconnected data.


The Cancer Genome Atlas (TCGA) profiles the DNA, RNA, protein, and epigenetic levels of over 10,000 individuals across 33 cancer types. paper


The Genotype-Tissue Expression (GTEx) project. paper


The Cancer Cell Line Encyclopedia (CCLE): gene expression, chromosomal copy number, and massively parallel sequencing data from 947 human cancer cell lines. paper


The Cancer Genome Atlas (TCGA) cancer genomics dataset includes over 10,000 tumor-normal exome pairs across 33 different cancer types, in total >400 TB of raw data files requiring analysis. paper


The Cancer Therapeutics Response Portal (CTRP) catalogues response profiles of 481 compounds against 860 cancer cell lines.


The Genomics of Drug Sensitivity in Cancer (GDSC) database contains drug sensitivity data for almost 75,000 experiments, describing response to 138 anticancer drugs across almost 700 cancer cell lines. paper


The Ensembl project has been aggregating, processing, integrating and redistributing genomic datasets since the initial releases of the draft human genome, with the aim of accelerating genomics research through rapid open distribution of public data. paper


Pfam describes over 13,000 protein families.


The UniProt Knowledgebase (UniProtKB): protein sequence and function. paper


Gene and variant annotation information. paper


The Gene Ontology (GO), maintained by the Gene Ontology Consortium, is a controlled vocabulary describing knowledge of gene and protein roles in cells. paper


Publication information from almost 30 million articles.


The VICC G2P is a framework for aggregating and harmonizing variant interpretations to produce a meta-knowledgebase of 12,856 aggregate interpretations covering 3,437 unique variants in 415 genes, 357 diseases, and 791 drugs. paper