Results 1-20 of 89
1.
Article in English | MEDLINE | ID: mdl-33103450

ABSTRACT

During nutritional overload and obesity, hepatocyte function is grossly altered, and a subset of hepatocytes begins to accumulate fat droplets, leading to non-alcoholic fatty liver disease (NAFLD). Recent single-cell studies revealed how non-parenchymal cells, such as macrophages, hepatic stellate cells, and endothelial cells, respond heterogeneously to NAFLD. However, it remains to be characterized how hepatocytes, the major constituents of the liver, respond to nutritional overload in NAFLD. Here, using droplet-based single-cell RNA sequencing (Drop-seq), we characterized how the transcriptomic landscape of individual hepatocytes is altered in response to a high-fat diet (HFD) and NAFLD. We showed that the entire hepatocyte population undergoes substantial transcriptome changes upon HFD, although the patterns of alteration were highly heterogeneous, with zonation-dependent and zonation-independent effects. Periportal (zone 1) hepatocytes downregulated many zone 1-specific marker genes, while a small number of genes mediating gluconeogenesis were upregulated. Pericentral (zone 3) hepatocytes also downregulated many zone 3-specific genes; however, they upregulated several genes that promote HFD-induced fat droplet formation, consistent with findings that zone 3 hepatocytes accumulate more lipid droplets. Zone 3 hepatocytes also upregulated ketogenic pathways as an adaptive mechanism to HFD. Interestingly, many of the top HFD-induced genes, which encode proteins regulating lipid metabolism, were strongly co-expressed with each other in a subset of hepatocytes, producing a variegated pattern of spatial co-localization that is independent of metabolic zonation. In conclusion, our dataset provides a useful resource for understanding hepatocellular alterations during NAFLD at single-cell resolution.

2.
Nat Commun; 11(1): 5139, 2020 Oct 12.
Article in English | MEDLINE | ID: mdl-33046696

ABSTRACT

Coronavirus disease 2019 (COVID-19) is caused by SARS-CoV-2, an emerging virus that uses the host proteins ACE2 and TMPRSS2 as entry factors. Understanding the factors that affect the pattern and level of expression of these genes is important for a deeper understanding of SARS-CoV-2 tropism and pathogenesis. Here we explore the role of genetics and co-expression networks in regulating these genes in the airway, through analysis of nasal airway transcriptome data from 695 children. We identify expression quantitative trait loci for both ACE2 and TMPRSS2 that vary in frequency across world populations. We find that TMPRSS2 is part of a mucus secretory network that is highly upregulated by type 2 (T2) inflammation through the action of interleukin-13 (IL-13), and that the interferon response to respiratory viruses highly upregulates ACE2 expression. IL-13- and virus infection-mediated effects on ACE2 expression were also observed at the protein level in the airway epithelium. Finally, we define airway responses to common coronavirus infections in children, finding that these infections generate host responses similar to those of other viral species, including upregulation of IL6 and ACE2. Our results reveal possible mechanisms influencing SARS-CoV-2 infectivity and COVID-19 clinical outcomes.


Subjects
Betacoronavirus/physiology; Coronavirus Infections/virology; Interferons/metabolism; Interleukin-13/metabolism; Nasal Mucosa/pathology; Peptidyl-Dipeptidase A/genetics; Pneumonia, Viral/virology; Serine Endopeptidases/genetics; Child; Coronavirus Infections/metabolism; Coronavirus Infections/pathology; Epithelial Cells/metabolism; Gene Expression Profiling; Gene Expression Regulation; Genetic Variation; Host-Pathogen Interactions; Humans; Inflammation; Middle Aged; Nasal Mucosa/metabolism; Pandemics; Peptidyl-Dipeptidase A/metabolism; Pneumonia, Viral/metabolism; Pneumonia, Viral/pathology; Serine Endopeptidases/metabolism; Virus Internalization

4.
Gastroenterology; 2020 Oct 12.
Article in English | MEDLINE | ID: mdl-33058866

ABSTRACT

BACKGROUND AND AIMS: Susceptibility genes and the underlying mechanisms for most risk loci identified by genome-wide association studies (GWAS) of colorectal cancer (CRC) risk remain largely unknown. We conducted a transcriptome-wide association study (TWAS) to identify putative susceptibility genes. METHODS: Gene-expression prediction models were built using transcriptome and genetic data from 284 normal transverse colon tissues of European descendants from the Genotype-Tissue Expression (GTEx) project, and model performance was evaluated using data from The Cancer Genome Atlas (TCGA, n = 355). We applied the gene-expression prediction models and GWAS data to evaluate associations of genetically predicted gene expression with CRC risk in 58,131 CRC cases and 67,347 controls of European ancestry. Dual-luciferase reporter assays and knockdown experiments in CRC cells and tumor xenografts were conducted. RESULTS: We identified 25 genes associated with CRC risk at a Bonferroni-corrected threshold of P < 9.1 × 10⁻⁶, including genes in four novel loci: PYGL (14q22.1), RPL28 (19q13.42), CAPN12 (19q13.2), MYH7B (20q11.22), and MAP1LC3A (20q11.22). In nine known GWAS-identified loci, we uncovered nine genes that had not been previously reported; four genes remained statistically significant after adjusting for the lead risk variant of the locus. Through colocalization analysis in GWAS loci, we identified 12 additional putative susceptibility genes that were supported by TWAS analysis at P < 0.01. We showed that the risk allele of the lead risk variant rs1741640 affected the promoter activity of CABLES2. Knockdown experiments confirmed that CABLES2 plays a vital role in colorectal carcinogenesis. CONCLUSION: Our study reveals new putative susceptibility genes and provides new insight into the biological mechanisms underlying CRC development.
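A Bonferroni-corrected threshold is a family-wise significance level divided by the number of tests. The abstract states neither quantity; assuming the conventional α = 0.05, a cutoff of 9.1 × 10⁻⁶ implies roughly 5,500 gene-level tests. A minimal sketch of that arithmetic (the function names are illustrative only):

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P-value cutoff that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


def implied_number_of_tests(alpha: float, threshold: float) -> int:
    """Invert the Bonferroni rule to find how many tests a reported cutoff implies."""
    return math.floor(alpha / threshold)


# Assumption: alpha = 0.05, which the abstract does not state.
n = implied_number_of_tests(0.05, 9.1e-6)
print(n)  # 5494
print(bonferroni_threshold(0.05, n))  # about 9.1e-06
```

Because the published cutoff is rounded to two significant figures, the inverted count is only approximate.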

5.
Nat Commun; 11(1): 4093, 2020 Oct 23.
Article in English | MEDLINE | ID: mdl-33097703

ABSTRACT

A major challenge in genetic association studies is that most associated variants fall in the non-coding part of the human genome. We searched for variants associated with bone mineral density (BMD) after enriching the discovery cohort for loss-of-function (LoF) mutations by sequencing a subset of the Nord-Trøndelag Health Study, followed by imputation in the remaining sample (N = 19,705), and identified ten known BMD loci. In addition, a previously unreported LoF mutation in MEPE, p.(Lys70IlefsTer26) (minor allele frequency [MAF] = 0.8%), was associated with decreased ultradistal forearm BMD (P = 2.1 × 10⁻¹⁸) and with increased risk of osteoporosis (P = 4.2 × 10⁻⁵) and fracture (P = 1.6 × 10⁻⁵). The association of the MEPE LoF mutation with BMD and fractures was further evaluated in 279,435 UK samples (MAF = 0.05%; heel bone estimated BMD P = 1.2 × 10⁻¹⁶; any fracture P = 0.05) and 375,984 Icelandic samples (MAF = 0.03%; arm BMD P = 0.12; forearm fracture P = 0.005). Because the effect on BMD observed in the study is lifelong, screening for MEPE LoF mutations before adulthood could potentially prevent osteoporosis and fractures. A key implication for precision medicine is that high-impact functional variants missing from publicly available cosmopolitan reference panels could be clinically more relevant than polygenic risk scores.


Subjects
Bone Density/genetics; Extracellular Matrix Proteins/genetics; Fractures, Bone/genetics; Genetic Association Studies; Genetic Predisposition to Disease/genetics; Glycoproteins/genetics; Phosphoproteins/genetics; Adult; Aged; Aged, 80 and over; Cohort Studies; Computational Biology; Female; Gene Frequency; Genetic Testing; Genome, Human; Humans; Iceland; Male; Middle Aged; Osteoporosis/genetics

6.
Article in English | MEDLINE | ID: mdl-32966749

ABSTRACT

RATIONALE: The 17q12-21.1 locus is one of the most highly replicated genetic associations with asthma. Individuals of African descent have lower linkage disequilibrium (LD) in this region, which could facilitate identifying causal variants. OBJECTIVE: To identify functional variants at 17q12-21.1 associated with early-onset asthma among African American individuals. METHODS AND MEASUREMENTS: We evaluated African American participants from the Study of Asthma Phenotypes and Pharmacogenomic Interactions by Race-ethnicity (SAPPHIRE) (n = 1,940), the Study of African Americans, Asthma, Genes & Environment (SAGE II) (n = 885), and the Study of the Genetic Causes of Complex Pediatric Disorders - Asthma (GCPD-A) (n = 2,805). Associations with asthma onset at age <5 years were meta-analyzed across cohorts. The lead signal was reevaluated considering haplotypes informed by genetic ancestry (i.e., African vs. European). Both an expression quantitative trait locus (eQTL) analysis and a phenome-wide association study (PheWAS) were performed on the lead variant. MAIN RESULTS: The meta-analyzed results from SAPPHIRE, SAGE II, and GCPD-A identified rs11078928 as the top association for early-onset asthma. A haplotype analysis suggested that the asthma association partitioned most closely with rs11078928 genotype. Genetic ancestry did not appear to influence the effect of this variant. In the eQTL analysis, rs11078928 was related to alternative splicing of gasdermin-B (GSDMB) transcripts. The PheWAS of rs11078928 suggested that this variant was predominantly associated with asthma and asthma-associated symptoms. CONCLUSIONS: A splice acceptor polymorphism appears to be a causal variant for asthma at the 17q12-21.1 locus. This variant appears to have the same magnitude of effect in individuals of African and European descent.

7.
Cell Rep; 32(8): 108077, 2020 Aug 25.
Article in English | MEDLINE | ID: mdl-32846134

ABSTRACT

DNA damage often induces heterogeneous cell-fate responses, such as cell-cycle arrest and apoptosis. Using single-cell RNA sequencing (scRNA-seq), we characterize the transcriptome response of cultured colon cancer cell lines to DNA damage induced by 5-fluorouracil (5FU). After 5FU treatment, a single population of colon cancer cells adopts three distinct transcriptome phenotypes, which correspond to diversified cell-fate responses: apoptosis, cell-cycle checkpoint, and stress resistance. Although some genes are regulated uniformly across all groups of cells, many genes show group-specific expression patterns that mediate DNA damage responses specific to the corresponding cell fate. Some of these observations are reproduced at the protein level by flow cytometry and are replicated in cells treated with two genotoxic drugs unrelated to 5FU, camptothecin and etoposide. This work provides a resource for understanding heterogeneous DNA damage responses involving fractional killing and chemoresistance, which are among the major challenges in current cancer chemotherapy.

8.
Genetics; 215(3): 869-886, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32327564

ABSTRACT

Baseline lung function, quantified as forced expiratory volume in the first second of exhalation (FEV1), is a standard diagnostic criterion used by clinicians to identify and classify lung diseases. Using whole-genome sequencing data from the National Heart, Lung, and Blood Institute Trans-Omics for Precision Medicine project, we identified a novel genetic association with FEV1 on chromosome 12 in 867 African American children with asthma (P = 1.26 × 10⁻⁸, β = 0.302). Conditional analysis within 1 Mb of the tag signal (rs73429450) yielded one major and two weaker independent signals within this peak. We explored statistical and functional evidence for all variants in linkage disequilibrium with the three independent signals, which yielded nine variants as the most likely candidates responsible for the association with FEV1. Hi-C data and expression QTL analysis demonstrated that these variants physically interacted with KITLG (KIT ligand, also known as SCF) and that their minor alleles were associated with increased expression of KITLG in nasal epithelial cells. Gene-by-air-pollution interaction analysis found that the candidate variant rs58475486 interacted with past-year ambient sulfur dioxide exposure (P = 0.003, β = 0.32). This study identified a novel protective genetic association with FEV1, possibly mediated through KITLG, in African American children with asthma. This is the first study to identify a genetic association between lung function and KITLG, which has an established role in orchestrating allergic inflammation in asthma.
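A gene-by-environment test like the one described is commonly written as a regression with a product term. The abstract does not give the exact model, so the form below, including the covariate vector, is an assumption; under it, the reported interaction estimate would correspond to β_GE and the interaction P-value to a test of β_GE = 0.

```latex
\mathrm{FEV}_{1,i} = \beta_0 + \beta_G\, G_i + \beta_E\, E_i + \beta_{GE}\,(G_i E_i)
  + \boldsymbol{\gamma}^{\top}\mathbf{c}_i + \varepsilon_i,
\qquad H_0\colon \beta_{GE} = 0
```

Here G_i is individual i's allele dosage (0, 1, or 2) at rs58475486, E_i is past-year ambient sulfur dioxide exposure, and c_i is a hypothetical vector of adjustment covariates (for example age, sex, and ancestry principal components).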

9.
Genome Res; 30(2): 185-194, 2020 Feb.
Article in English | MEDLINE | ID: mdl-31980570

ABSTRACT

Detecting and estimating DNA sample contamination are important steps to ensure high-quality genotype calls and reliable downstream analysis. Existing methods rely on population allele frequency information for accurate estimation of contamination rates. Correctly specifying population allele frequencies for each individual at an early stage of sequence analysis is impractical, or even impossible, for large-scale sequencing centers that simultaneously process samples from multiple studies across diverse populations. Conversely, incorrectly specified allele frequencies may substantially bias estimated contamination rates. For example, we observed that when genetic ancestry is misspecified, existing methods often fail to identify samples with 10% contamination at a typical 3% contamination exclusion threshold. Such incomplete screening of contaminated samples substantially inflates the estimated genotyping error rate, even in deeply sequenced genomes and exomes. We propose a robust statistical method that accurately estimates DNA contamination and is agnostic to the genetic ancestry of the intended or contaminating sample. Our method integrates the estimation of genetic ancestry and DNA contamination in a unified likelihood framework by leveraging individual-specific allele frequencies projected from reference genotypes onto principal component coordinates. Our method can also be used to estimate genetic ancestry, similar to LASER or TRACE, while simultaneously accounting for potential contamination. We demonstrate that our method robustly estimates contamination rates and genetic ancestries across populations and contamination scenarios. We further demonstrate that, in the presence of contamination, genetic ancestry inference can be substantially biased with existing methods that ignore contamination, whereas our method corrects for such biases.
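To make the mixture idea concrete, here is a deliberately simplified sketch of likelihood-based contamination estimation. It is not the method described in the abstract: it assumes the intended sample's genotypes are already known, treats each contaminant allele frequency p as given (the abstract's method instead uses individual-specific frequencies derived from principal-component coordinates and estimates ancestry jointly), and ignores sequencing error apart from a small clamp. It does show why allele frequencies matter: a misspecified p shifts the expected read fraction and therefore the estimate.

```python
import math
from typing import List, Sequence, Tuple

# One site: (intended-sample genotype g in {0, 1, 2}, contaminant alt-allele frequency p,
#            alt-allele read count, total read depth). Simplification: g is assumed known.
Site = Tuple[int, float, int, int]


def expected_alt_fraction(g: int, p: float, alpha: float) -> float:
    """Expected alt-read fraction when a fraction alpha of the DNA comes from a contaminant.

    The intended sample contributes g/2; the contaminant's genotype is unknown,
    so its expected contribution is its alt-allele frequency p.
    """
    return (1.0 - alpha) * g / 2.0 + alpha * p


def log_likelihood(sites: Sequence[Site], alpha: float, eps: float = 1e-6) -> float:
    """Binomial log-likelihood of the alt read counts, dropping terms that do not depend on alpha."""
    total = 0.0
    for g, p, alt, depth in sites:
        f = expected_alt_fraction(g, p, alpha)
        f = min(max(f, eps), 1.0 - eps)  # clamp so log() stays finite
        total += alt * math.log(f) + (depth - alt) * math.log(1.0 - f)
    return total


def estimate_contamination(sites: Sequence[Site], step: float = 0.005, max_alpha: float = 0.5) -> float:
    """Maximum-likelihood contamination fraction found by a grid search over [0, max_alpha]."""
    n_steps = int(round(max_alpha / step))
    grid: List[float] = [i * step for i in range(n_steps + 1)]
    return max(grid, key=lambda a: log_likelihood(sites, a))


# Synthetic sites whose read counts match alpha = 0.10 exactly (depth 1000 each).
sites = [(0, 0.5, 50, 1000), (2, 0.5, 950, 1000), (1, 0.2, 470, 1000),
         (0, 0.3, 30, 1000), (2, 0.1, 910, 1000)]
print(round(estimate_contamination(sites), 3))  # 0.1
```

A grid search keeps the sketch dependency-free; a production tool would use a proper optimizer and model genotype uncertainty.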

10.
Pediatr Pulmonol; 55(2): 533-540, 2020 Feb.
Article in English | MEDLINE | ID: mdl-31665830

ABSTRACT

BACKGROUND: In cystic fibrosis (CF), the spectrum and frequency of CFTR variants differ by geography and race/ethnicity. CFTR variants are well described in White patients but much less so in Latino patients. No studies of CFTR variants have been done in patients with CF from the Dominican Republic or Puerto Rico. METHODS: CFTR was sequenced in 61 patients from the Dominican Republic and 21 patients from Puerto Rico, all with CF and sweat chloride greater than 60 mmol/L. The spectrum of CFTR variants was identified, and the proportion of patients with 0, 1, or 2 identified CFTR variants was determined. The functional effects of identified CFTR variants were investigated using clinical annotation databases and computational prediction tools. RESULTS: Two CFTR variants were identified in 10% of Dominican patients compared with 81% of Puerto Rican patients. No CFTR variants were identified in 69% of Dominican patients and 10% of Puerto Rican patients. In Dominican patients, 19 CFTR variants were identified, accounting for 25 of 122 disease alleles (20%). In Puerto Rican patients, 16 CFTR variants were identified, accounting for 36 of 42 disease alleles (86%). Thirty CFTR variants were identified overall. The most frequent variants were p.Phe508del and p.Ala559Thr in Dominican patients and p.Phe508del, p.Arg1066Cys, p.Arg334Trp, and p.Ile507del in Puerto Rican patients. CONCLUSIONS: In this first description of CFTR variants in patients with CF from the Dominican Republic and Puerto Rico, the detection rate of two CFTR variants after full sequencing was low, and most patients from the Dominican Republic had no identified variants.

11.
Nat Rev Nephrol; 15(9): 590, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31363178

ABSTRACT

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

14.
Nature; 570(7759): 71-76, 2019 Jun.
Article in English | MEDLINE | ID: mdl-31118516

ABSTRACT

Protein-coding genetic variants that strongly affect disease risk can yield relevant clues to disease pathogenesis. Here we report exome-sequencing analyses of 20,791 individuals with type 2 diabetes (T2D) and 24,440 non-diabetic control participants from 5 ancestries. We identify gene-level associations of rare variants (with minor allele frequencies of less than 0.5%) in 4 genes at exome-wide significance, including a series of more than 30 SLC30A8 alleles that conveys protection against T2D, and in 12 gene sets, including those corresponding to T2D drug targets (P = 6.1 × 10⁻³) and candidate genes from knockout mice (P = 5.2 × 10⁻³). Within our study, the strongest T2D gene-level signals for rare variants explain at most 25% of the heritability of the strongest common single-variant signals, and the gene-level effect sizes of the rare variants that we observed in established T2D drug targets will require 75,000-185,000 sequenced cases to achieve exome-wide significance. We propose a method to interpret these modest rare-variant associations and to incorporate these associations into future target or gene prioritization efforts.
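Gene-level rare-variant tests start by aggregating the rare alleles each person carries in one gene. The abstract does not specify its test, so the sketch below shows only a simple burden-style aggregation, an assumed starting point rather than the study's method, using the 0.5% MAF cutoff from the text. Variant IDs, MAFs, and genotypes are hypothetical; a real analysis would then test the scores for association (for example by regression with covariates).

```python
from typing import Dict, List, Sequence, Tuple

MAF_CUTOFF = 0.005  # "minor allele frequencies of less than 0.5%", as stated in the abstract


def rare_variants(maf: Dict[str, float], cutoff: float = MAF_CUTOFF) -> List[str]:
    """IDs of the variants in one gene whose MAF falls below the cutoff."""
    return [vid for vid, f in maf.items() if f < cutoff]


def burden_scores(dosages: Dict[str, Sequence[int]], variant_ids: Sequence[str]) -> List[int]:
    """Per-individual number of minor alleles summed over the selected variants."""
    n = len(next(iter(dosages.values())))
    scores = [0] * n
    for vid in variant_ids:
        for i, d in enumerate(dosages[vid]):
            scores[i] += d
    return scores


def carrier_frequencies(scores: Sequence[int], is_case: Sequence[bool]) -> Tuple[float, float]:
    """Fraction of cases, and of controls, carrying at least one qualifying rare allele."""
    case_carriers = [s > 0 for s, c in zip(scores, is_case) if c]
    control_carriers = [s > 0 for s, c in zip(scores, is_case) if not c]
    return sum(case_carriers) / len(case_carriers), sum(control_carriers) / len(control_carriers)


# Hypothetical gene with three variants in six people (three controls, then three cases).
# MAFs are taken as given (e.g., estimated in the full cohort), not from these six people.
maf = {"v1": 0.001, "v2": 0.002, "v3": 0.12}
dosages = {"v1": [1, 0, 0, 0, 0, 0], "v2": [0, 1, 0, 0, 0, 0], "v3": [1, 1, 0, 1, 2, 0]}
is_case = [False, False, False, True, True, True]
scores = burden_scores(dosages, rare_variants(maf))  # v3 is common and is excluded
print(scores)  # [1, 1, 0, 0, 0, 0]
print(carrier_frequencies(scores, is_case))
```

In this toy data carriers appear only among controls, the direction a protective gene-level signal would take; with real data the comparison needs a formal test.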


Subjects
Diabetes Mellitus, Type 2/genetics; Exome/genetics; Whole Exome Sequencing; Animals; Case-Control Studies; Decision Support Techniques; Female; Gene Frequency; Genome-Wide Association Study; Humans; Male; Mice; Mice, Knockout

15.
Nat Commun; 10(1): 1847, 2019 Apr 23.
Article in English | MEDLINE | ID: mdl-31015462

ABSTRACT

Chronic kidney disease (CKD) is a growing health burden currently affecting 10-15% of adults worldwide. Estimated glomerular filtration rate (eGFR) as a marker of kidney function is commonly used to diagnose CKD. We analyze eGFR data from the Nord-Trøndelag Health Study and Michigan Genomics Initiative and perform a GWAS meta-analysis with public summary statistics, more than doubling the sample size of previous meta-analyses. We identify 147 loci (53 novel) associated with eGFR, including genes involved in transcriptional regulation, kidney development, cellular signaling, metabolism, and solute transport. Additionally, sex-stratified analysis identifies one locus with more significant effects in women than men. Using genetic risk scores constructed from these eGFR meta-analysis results, we show that associated variants are generally predictive of CKD with only modest improvements in detection compared with other known clinical risk factors. Collectively, these results yield additional insight into the genetic factors underlying kidney function and progression to CKD.


Subjects
Genetic Loci; Genome-Wide Association Study; Glomerular Filtration Rate/genetics; Renal Insufficiency, Chronic/genetics; Female; Global Burden of Disease; Humans; Kidney/physiopathology; Male; Prognosis; Renal Insufficiency, Chronic/diagnosis; Renal Insufficiency, Chronic/epidemiology; Renal Insufficiency, Chronic/physiopathology; Risk Assessment/methods; Risk Factors; Sex Factors

16.
Hum Genet; 138(4): 307-326, 2019 Apr.
Article in English | MEDLINE | ID: mdl-30820706

ABSTRACT

Genome-wide association studies have reported 56 independently associated colorectal cancer (CRC) risk variants, most of which are non-coding and believed to exert their effects by modulating gene expression. The computational method PrediXcan uses cis-regulatory variant predictors to impute expression and perform gene-level association tests in GWAS without directly measured transcriptomes. In this study, we used reference datasets from colon (n = 169) and whole blood (n = 922) transcriptomes to test CRC association with genetically determined expression levels in a genome-wide analysis of 12,186 cases and 14,718 controls. Three novel associations were discovered from colon transverse models at FDR ≤ 0.2 and further evaluated in an independent replication including 32,825 cases and 39,933 controls. After adjusting for multiple comparisons, we found statistically significant associations using colon transcriptome models with TRIM4 (discovery P = 2.2 × 10⁻⁴, replication P = 0.01) and PYGL (discovery P = 2.3 × 10⁻⁴, replication P = 6.7 × 10⁻⁴). Interestingly, both genes encode proteins that influence redox homeostasis and are related to cellular metabolic reprogramming in tumors, implicating a novel CRC pathway linked to cell growth and proliferation. Defining CRC risk regions as one megabase up- and downstream of one of the 56 independent risk variants, we defined 44 non-overlapping CRC-risk regions. Among these risk regions, we identified genes associated with CRC (P < 0.05) in 34/44 CRC-risk regions. Importantly, CRC association was found for two genes in the previously reported 2q25 locus, CXCR1 and CXCR2, which are potential cancer therapeutic targets. These findings provide strong candidate genes to prioritize for subsequent laboratory follow-up of GWAS loci.
This study is the first to implement PrediXcan in a large colorectal cancer study and findings highlight the utility of integrating transcriptome data in GWAS for discovery of, and biological insight into, risk loci.
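PrediXcan, as summarized above, predicts each individual's expression of a gene from cis-variant genotypes and then tests that predicted expression for association with disease. A minimal sketch of the prediction step as a weighted sum of allele dosages follows; the weights and variant IDs are hypothetical, and both the training of the weights on a reference transcriptome and the downstream association test are omitted.

```python
from typing import Dict, List, Sequence


def predict_expression(weights: Dict[str, float], dosages: Dict[str, Sequence[float]]) -> List[float]:
    """Genetically predicted expression of one gene for every individual.

    weights: per-variant weights from a prediction model trained on a reference
             transcriptome (for example, colon or whole blood).
    dosages: per-variant allele dosages (0 to 2) for each individual in the GWAS.
    Model variants absent from the GWAS data are skipped in this sketch.
    """
    n = len(next(iter(dosages.values())))
    predicted = [0.0] * n
    for vid, w in weights.items():
        if vid not in dosages:
            continue
        for i, d in enumerate(dosages[vid]):
            predicted[i] += w * d
    return predicted


# Hypothetical three-variant model; rsC is missing from the genotype data.
weights = {"rsA": 0.4, "rsB": -0.25, "rsC": 0.1}
dosages = {"rsA": [0, 1, 2], "rsB": [2, 1, 0]}
print(predict_expression(weights, dosages))  # approximately [-0.5, 0.15, 0.8]
```

The predicted values would then be tested against case-control status, for example with logistic regression.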


Subjects
Colorectal Neoplasms/diagnosis; Colorectal Neoplasms/genetics; Genetic Predisposition to Disease; Genome-Wide Association Study; Polymorphism, Single Nucleotide; Case-Control Studies; Colorectal Neoplasms/epidemiology; Gene Expression; Gene Expression Regulation, Neoplastic; Gene Frequency; Genome-Wide Association Study/statistics & numerical data; Humans; Predictive Value of Tests; Prognosis; Risk Factors

17.
Plant Physiol; 179(4): 1444-1456, 2019 Apr.
Article in English | MEDLINE | ID: mdl-30718350

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) has been used extensively to study cell-specific gene expression in animals, but it has not been widely applied to plants. Here, we describe the use of a commercially available droplet-based microfluidics platform for high-throughput scRNA-seq to obtain single-cell transcriptomes from protoplasts of more than 10,000 Arabidopsis (Arabidopsis thaliana) root cells. We find that all major tissues and developmental stages are represented in this single-cell transcriptome population. Further, distinct subpopulations and rare cell types, including putative quiescent center cells, were identified. A focused analysis of root epidermal cell transcriptomes defined developmental trajectories for individual cells progressing from meristematic through mature stages of root-hair and nonhair cell differentiation. In addition, single-cell transcriptomes were obtained from root epidermis mutants, enabling a comparative analysis of gene expression at single-cell resolution and providing an unprecedented view of the impact of the mutated genes. Overall, this study demonstrates the feasibility and utility of scRNA-seq in plants and provides a first-generation gene expression map of the Arabidopsis root at single-cell resolution.


Subjects
Arabidopsis/metabolism; Plant Roots/metabolism; Single-Cell Analysis; Transcriptome; Arabidopsis/cytology; Feasibility Studies; Plant Epidermis/metabolism; Plant Roots/cytology; Protoplasts/metabolism; Sequence Analysis, RNA

18.
Bioinformatics; 35(1): 164-166, 2019 Jan 1.
Article in English | MEDLINE | ID: mdl-30204848

ABSTRACT

Summary: Estimating linkage disequilibrium (LD) is essential for a wide range of summary statistics-based association methods for genome-wide association studies. Large genetic datasets, e.g. the TOPMed WGS project and UK Biobank, enable more accurate and comprehensive LD estimates, but increase the computational burden of LD estimation. Here, we describe emeraLD (Efficient Methods for Estimation and Random Access of LD), a computational tool that leverages sparsity and haplotype structure to estimate LD up to 2 orders of magnitude faster than current tools. Availability and implementation: emeraLD is implemented in C++, and is open source under GPLv3. Source code and documentation are freely available at http://github.com/statgen/emeraLD. Supplementary information: Supplementary data are available at Bioinformatics online.
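The squared correlation r² between two biallelic variants can be computed from phased haplotypes as D² / [p_A(1 − p_A) p_B(1 − p_B)], with D = p_AB − p_A p_B. The sketch below stores each variant sparsely, as the set of haplotypes carrying the alternate allele, to illustrate why sparsity helps when alleles are rare; it is an assumed illustration, not emeraLD's implementation or interface.

```python
from typing import Set


def ld_r2_sparse(carriers_a: Set[int], carriers_b: Set[int], n_haplotypes: int) -> float:
    """Squared correlation r^2 between two biallelic variants from phased haplotypes.

    Each variant is given as the set of haplotype indices carrying its alternate
    allele. The joint count is a set intersection, whose cost in Python scales with
    the smaller carrier set rather than with n_haplotypes, which is what makes a
    sparse representation attractive for rare variants.
    """
    p_a = len(carriers_a) / n_haplotypes
    p_b = len(carriers_b) / n_haplotypes
    p_ab = len(carriers_a & carriers_b) / n_haplotypes  # haplotypes carrying both alt alleles
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")  # a monomorphic variant has no defined r^2
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


# Ten hypothetical haplotypes.
print(ld_r2_sparse({0, 1, 2}, {0, 1, 2}, 10))  # perfect LD: approximately 1.0
print(ld_r2_sparse({0, 1}, {5, 6}, 10))        # alternate alleles never co-occur
```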


Subjects
Genome-Wide Association Study; Linkage Disequilibrium; Software; Computational Biology; Haplotypes

19.
Nat Genet; 51(1): 76-87, 2019 Jan.
Article in English | MEDLINE | ID: mdl-30510241

ABSTRACT

To further dissect the genetic architecture of colorectal cancer (CRC), we performed whole-genome sequencing of 1,439 cases and 720 controls, imputed discovered sequence variants and Haplotype Reference Consortium panel variants into genome-wide association study data, and tested for association in 34,869 cases and 29,051 controls. Findings were followed up in an additional 23,262 cases and 38,296 controls. We discovered a strongly protective 0.3%-frequency variant signal at CHD1. In a combined meta-analysis of 125,478 individuals, we identified 40 new independent signals at P < 5 × 10⁻⁸, bringing the number of known independent signals for CRC to ~100. New signals implicate lower-frequency variants, Krüppel-like factors, Hedgehog signaling, Hippo-YAP signaling, long noncoding RNAs and somatic drivers, and support a role for immune function. Heritability analyses suggest that CRC risk is highly polygenic, and larger, more comprehensive studies enabling rare variant analysis will improve understanding of the biology underlying this risk and influence personalized screening strategies and drug development.


Subjects
Colorectal Neoplasms/genetics; Genetic Predisposition to Disease/genetics; Polymorphism, Single Nucleotide/genetics; Aged; Case-Control Studies; Female; Genome-Wide Association Study/methods; Genotype; Humans; Male; Middle Aged; RNA, Long Noncoding/genetics; Risk Factors; Signal Transduction/genetics

20.
Nat Commun; 9(1): 4038, 2018 Oct 2.
Article in English | MEDLINE | ID: mdl-30279509

ABSTRACT

Hundreds of thousands of human whole genome sequencing (WGS) datasets will be generated over the next few years. These data are more valuable in aggregate: joint analysis of genomes from many sources increases sample size and statistical power. A central challenge for joint analysis is that different WGS data processing pipelines cause substantial differences in variant calling in combined datasets, necessitating computationally expensive reprocessing. This approach is no longer tenable given the scale of current studies and data volumes. Here, we define WGS data processing standards that allow different groups to produce functionally equivalent (FE) results, yet still innovate on data processing pipelines. We present initial FE pipelines developed at five genome centers and show that they yield similar variant calling results and produce significantly less variability than sequencing replicates. This work alleviates a key technical bottleneck for genome aggregation and helps lay the foundation for community-wide human genetics studies.
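One simple way to compare two pipelines' calls on the same genome is the fraction of variants called by either pipeline that only one of them reports. This is an assumed metric for illustration, not necessarily the one used in the study, and real comparisons also require a consistent variant representation (for example, left-aligned indels) before calls can be matched.

```python
from typing import Set, Tuple

# A variant call keyed by (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def discordance_rate(calls_a: Set[Variant], calls_b: Set[Variant]) -> float:
    """Fraction of variants called by either pipeline that only one pipeline called."""
    union = calls_a | calls_b
    if not union:
        return 0.0
    return len(calls_a ^ calls_b) / len(union)


# Hypothetical calls from two pipelines on the same genome.
pipeline_1 = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")}
pipeline_2 = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 75, "T", "C")}
print(discordance_rate(pipeline_1, pipeline_2))  # 0.5
```

In the spirit of the comparison in the abstract, one would check that this rate between pipelines is smaller than the rate between sequencing replicates of the same sample.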


Subjects
Human Genetics/standards; Whole Genome Sequencing/standards; Genome, Human; Humans