Genomics is the study of the structure, function, evolution, and mapping of genomes, the complete set of genetic information in an organism. The field is critical for understanding the genetic basis of health and disease and for developing new diagnostic and therapeutic tools. In this article, we explore 10 of the best genomics datasets available in 2023.
Dataset Name | Scale | Download Link | Description |
---|---|---|---|
The Cancer Genome Atlas (TCGA) | 33 cancer types, over 10,000 samples | https://portal.gdc.cancer.gov/ | The Cancer Genome Atlas (TCGA) is a landmark cancer genomics program that molecularly characterized tumor and matched normal samples across 33 cancer types. Its data are distributed through the NCI Genomic Data Commons (GDC) portal. |
1000 Genomes Project | 2,504 individuals from 26 populations (phase 3) | http://www.internationalgenome.org/data | The 1000 Genomes Project was a large-scale sequencing effort that produced a detailed public catalog of human genetic variation across 26 populations. |
Human Microbiome Project (HMP) | Over 11,000 samples from multiple body sites | https://www.hmpdacc.org/HMASM/ | The Human Microbiome Project (HMP) characterized the microbial communities living at multiple body sites, including the gut, mouth, and skin. |
The Genotype-Tissue Expression (GTEx) Project | Over 17,000 samples from 54 tissue sites of nearly 1,000 donors (release v8) | https://gtexportal.org/home/datasets | The Genotype-Tissue Expression (GTEx) Project links genetic variation to gene expression across non-diseased human tissues, including brain, heart, liver, and others. |
International HapMap Project | Over 1,000 individuals from 11 populations (phase 3) | http://hapmap.ncbi.nlm.nih.gov/downloads/ | The International HapMap Project was a large-scale genotyping project that mapped common patterns of human genetic variation (haplotypes) across multiple populations. NCBI has since retired the HapMap website, so this download link may no longer work. |
The Exome Aggregation Consortium (ExAC) | 60,706 exomes from unrelated individuals | http://exac.broadinstitute.org/downloads | The Exome Aggregation Consortium (ExAC) aggregated exome sequencing data from 60,706 unrelated individuals to catalog the frequency and distribution of protein-coding variants in the human population. It has since been superseded by the Genome Aggregation Database (gnomAD). |
The Haplotype Reference Consortium (HRC) | Over 32,000 individuals (64,976 haplotypes) | http://www.haplotype-reference-consortium.org/ | The Haplotype Reference Consortium (HRC) combined whole-genome sequence data from multiple cohorts into a reference panel of 64,976 haplotypes, used mainly for genotype imputation. |
The Genome of the Netherlands (GoNL) | 769 individuals from 250 Dutch families | https://molgenis24.target.rug.nl/geonl/ | The Genome of the Netherlands (GoNL) sequenced the whole genomes of 769 individuals from 250 Dutch families, mostly parent-offspring trios, to characterize genetic variation in the Dutch population. |
The UK Biobank | Over 500,000 individuals | https://www.ukbiobank.ac.uk/ | The UK Biobank is a large-scale genomic and health dataset that includes information from over 500,000 individuals. It provides a resource for understanding the relationship between genetic variation and disease, as well as the impact of lifestyle and environmental factors on health outcomes. |
The National Institute of Mental Health (NIMH) Genetics Repository | Over 10,000 individuals with psychiatric disorders | https://www.nimhgenetics.org/ | The NIMH Genetics Repository, now the NIMH Repository and Genomics Resource (NRGR), distributes biosamples and genetic data from over 10,000 individuals with psychiatric disorders such as schizophrenia and bipolar disorder. It is a resource for studying the genetic basis of mental illness. |
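Several of the variation resources above, such as the 1000 Genomes Project and ExAC, distribute their variant calls in VCF format. As a minimal illustration of what "frequency of genetic variation" means in practice, the sketch below computes the alternate-allele frequency at a single site from per-sample genotype (GT) calls. It assumes a tab-separated VCF data line with diploid genotypes and a biallelic site; the function name `alt_allele_frequency` and the example line are hypothetical, not part of any dataset's tooling.

```python
from typing import Optional


def alt_allele_frequency(vcf_line: str) -> Optional[float]:
    """Return the alternate-allele frequency at one VCF site.

    Assumes a tab-separated VCF data line: column 8 (0-based) is FORMAT,
    sample columns start at index 9, and GT appears among the FORMAT keys.
    Every non-reference allele counts as alternate, so the result is only
    meaningful for biallelic sites. Returns None if no alleles were called.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")

    alt_count = 0  # alleles that are not the reference ("0")
    called = 0     # alleles with a call (missing "." is skipped)
    for sample in fields[9:]:
        gt = sample.split(":")[gt_index]
        # Phased genotypes use "|", unphased use "/"; treat both alike.
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue  # missing allele call
            called += 1
            if allele != "0":
                alt_count += 1

    return alt_count / called if called else None


if __name__ == "__main__":
    # Hypothetical toy site: four samples, one with a missing genotype.
    line = "1\t12345\t.\tA\tT\t.\tPASS\t.\tGT\t0|0\t0|1\t1|1\t./."
    print(alt_allele_frequency(line))  # 3 alt alleles / 6 called -> 0.5
```

Aggregation resources typically precompute values like this across many samples. The sketch only shows the underlying arithmetic and does not handle multiallelic sites, haploid calls, or the INFO-field annotations a real release may provide.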