Data Sources

The BMEG is an expanding resource of interconnected data.

TCGA

The Cancer Genome Atlas (TCGA) profiles DNA, RNA, protein, and epigenetic data from over 10,000 individuals across 33 cancer types.

GTEx

The Genotype-Tissue Expression (GTEx) project: gene expression and its genetic regulation across human tissues.

CCLE

The Cancer Cell Line Encyclopedia (CCLE): gene expression, chromosomal copy number, and massively parallel sequencing data from 947 human cancer cell lines.

MC3

The Multi-Center Mutation Calling in Multiple Cancers (MC3) project provides somatic mutation calls for the TCGA cancer genomics dataset, which includes over 10,000 tumor-normal exome pairs across 33 different cancer types, in total >400 TB of raw data files requiring analysis.

CTRP

The Cancer Therapeutics Response Portal (CTRP) catalogues response profiles of 481 compounds against 860 cancer cell lines.

GDSC

The Genomics of Drug Sensitivity in Cancer (GDSC) database contains drug sensitivity data for almost 75,000 experiments, describing response to 138 anticancer drugs across almost 700 cancer cell lines.

Ensembl

The Ensembl project has been aggregating, processing, integrating and redistributing genomic datasets since the initial releases of the draft human genome, with the aim of accelerating genomics research through rapid open distribution of public data.

PFAM

PFAM describes over 13,000 protein families.

UniProt

The UniProt Knowledgebase (UniProtKB): protein sequence and function.

myvariant.info

MyVariant.info: gene and variant annotation information.

GO

The Gene Ontology (GO), maintained by the Gene Ontology Consortium, is a controlled vocabulary describing knowledge of gene and protein roles in cells.

PubMed

Publication information from almost 30 million articles.

G2P

The VICC G2P is a framework for aggregating and harmonizing variant interpretations to produce a meta-knowledgebase of 12,856 aggregate interpretations covering 3,437 unique variants in 415 genes, 357 diseases, and 791 drugs.