Search | Regional Portal of the BVS (Virtual Health Library)

1.

Insights from incorporating quantum computing into drug design workflows.

Lau, Bayo; Emani, Prashant S; Chapman, Jackson; Yao, Lijing; Lam, Tarsus; Merrill, Paul; Warrell, Jonathan; Gerstein, Mark B; Lam, Hugo Y K.

Bioinformatics ; 39(1)2023 01 01.

Article in English | MEDLINE | ID: mdl-36477833

ABSTRACT

MOTIVATION: While many quantum computing (QC) methods promise theoretical advantages over classical counterparts, quantum hardware remains limited. Exploiting near-term QC in computer-aided drug design (CADD) thus requires judicious partitioning between classical and quantum calculations. RESULTS: We present HypaCADD, a hybrid classical-quantum workflow for finding ligands binding to proteins, while accounting for genetic mutations. We explicitly identify modules of our drug-design workflow currently amenable to replacement by QC: non-intuitively, we identify the mutation-impact predictor as the best candidate. HypaCADD thus combines classical docking and molecular dynamics with quantum machine learning (QML) to infer the impact of mutations. We present a case study with the coronavirus (SARS-CoV-2) protease and associated mutants. We map a classical machine-learning module onto QC, using a neural network constructed from qubit-rotation gates. We have implemented this in simulation and on two commercial quantum computers. We find that the QML models can perform on par with, if not better than, classical baselines. In summary, HypaCADD offers a successful strategy for leveraging QC for CADD. AVAILABILITY AND IMPLEMENTATION: Jupyter Notebooks with Python code are freely available for academic use on GitHub: https://www.github.com/hypahub/hypacadd_notebook. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
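The abstract describes a QML model built from qubit-rotation gates. The following is a minimal, hypothetical sketch of that general idea, not the HypaCADD circuit (which lives in the linked notebooks). It simulates a single qubit in pure Python: a feature is encoded with an RY rotation, a trainable RY rotation follows, and the Pauli-Z expectation value is the model output. Gradients come from the parameter-shift rule, which is how such circuits are commonly trained on hardware. The toy data and all function names are assumptions.

```python
import math

def ry(theta):
    """Real 2x2 matrix of the single-qubit rotation RY(theta) = exp(-i*theta*Y/2)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]

def apply(gate, state):
    """Apply a 2x2 gate to a 2-amplitude state vector."""
    return [gate[0][0] * state[0] + gate[0][1] * state[1],
            gate[1][0] * state[0] + gate[1][1] * state[1]]

def circuit(x, theta):
    """Encode feature x with RY(x), apply trainable RY(theta), return <Z>."""
    state = [1.0, 0.0]                    # start in |0>
    state = apply(ry(x), state)           # data-encoding rotation
    state = apply(ry(theta), state)       # trainable rotation
    return state[0] ** 2 - state[1] ** 2  # <Z> = P(0) - P(1)

def parameter_shift_grad(x, theta):
    """Exact d<Z>/dtheta from two shifted circuit evaluations (parameter-shift rule)."""
    return 0.5 * (circuit(x, theta + math.pi / 2) - circuit(x, theta - math.pi / 2))

def train(xs, ys, theta=2.0, lr=0.1, steps=200):
    """Gradient descent on the mean squared error between <Z> and labels in {-1, +1}."""
    for _ in range(steps):
        grad = sum(2 * (circuit(x, theta) - y) * parameter_shift_grad(x, theta)
                   for x, y in zip(xs, ys)) / len(xs)
        theta -= lr * grad
    return theta

def predict(x, theta):
    return 1 if circuit(x, theta) >= 0 else -1

# Hypothetical toy data: small feature values -> +1, values near pi -> -1.
xs = [0.1, 0.3, 2.8, 3.0]
ys = [1, 1, -1, -1]
theta = train(xs, ys)
print([predict(x, theta) for x in xs])
```

On real hardware, `circuit` would be estimated from repeated measurements rather than computed exactly. The parameter-shift rule is used because it needs only extra circuit runs, not access to the quantum state.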

Subjects

COVID-19 , Software , Humans , Workflow , Computing Methodologies , Quantum Theory , SARS-CoV-2 , Drug Design , Molecular Dynamics Simulation

2.

Mitral regurgitation severity at left ventricular assist device implantation is associated with distinct myocardial transcriptomic signatures.

Duggal, Neal M; Lei, Ienglam; Wu, Xiaoting; Aaronson, Keith D; Pagani, Francis D; Lam, Hugo Y-K; Tang, Paul C.

J Thorac Cardiovasc Surg ; 166(1): 141-152.e1, 2023 07.

Article in English | MEDLINE | ID: mdl-34689984

ABSTRACT

OBJECTIVES: We examined whether myocardial transcriptome signatures at the time of left ventricular assist device (LVAD) implantation differed among patients with different degrees of mitral regurgitation (MR). METHODS: Between January 2018 and October 2019, we collected left ventricular (LV) cores during durable LVAD implantation (n = 72). A retrospective chart review was performed. Total RNA was isolated from LV cores and used to construct cDNA sequence libraries. The libraries were sequenced with the NovaSeq system, and data were quantified using Kallisto. Gene Set Enrichment Analysis (GSEA) and Gene Ontology analyses were performed, with a false discovery rate <0.05 considered significant. RESULTS: Comparing patients with preoperative mild or less MR (n = 30) and those with moderate-severe MR (n = 42), the moderate-severe MR group weighed less (P = .004) and had more tricuspid valve repairs (P = .043), with no differences in demographics or comorbidities. We then compared both groups with a group of human donor hearts without heart failure (n = 8). Compared with the donor hearts, there were 3985 differentially expressed genes (DEGs) for mild or less MR and 4587 DEGs for moderate-severe MR. Of these, 448 DEGs were specific to mild or less MR and 1050 DEGs were specific to moderate-severe MR. On GSEA, genes regulated in both groups showed increased immune gene expression and reduced expression of contraction and energetic genes. Among the 1050 genes specific to moderate-severe MR, there were additional up-regulated genes related to inflammation and down-regulated genes related to cellular proliferation. CONCLUSIONS: Patients undergoing durable LVAD implantation with moderate-severe MR had increased activation of genes related to inflammation and reduced expression of cellular proliferation genes. This may have important implications for myocardial recovery.

Subjects

Heart Failure , Heart Transplantation , Heart-Assist Devices , Mitral Valve Insufficiency , Humans , Mitral Valve Insufficiency/diagnostic imaging , Mitral Valve Insufficiency/genetics , Mitral Valve Insufficiency/surgery , Transcriptome , Retrospective Studies , Treatment Outcome , Tissue Donors , Heart Failure/genetics , Heart Failure/surgery , Inflammation

3.

Establishing community reference samples, data and call sets for benchmarking cancer mutation detection using whole-genome sequencing.

Fang, Li Tai; Zhu, Bin; Zhao, Yongmei; Chen, Wanqiu; Yang, Zhaowei; Kerrigan, Liz; Langenbach, Kurt; de Mars, Maryellen; Lu, Charles; Idler, Kenneth; Jacob, Howard; Zheng, Yuanting; Ren, Luyao; Yu, Ying; Jaeger, Erich; Schroth, Gary P; Abaan, Ogan D; Talsania, Keyur; Lack, Justin; Shen, Tsai-Wei; Chen, Zhong; Stanbouly, Seta; Tran, Bao; Shetty, Jyoti; Kriga, Yuliya; Meerzaman, Daoud; Nguyen, Cu; Petitjean, Virginie; Sultan, Marc; Cam, Margaret; Mehta, Monika; Hung, Tiffany; Peters, Eric; Kalamegham, Rasika; Sahraeian, Sayed Mohammad Ebrahim; Mohiyuddin, Marghoob; Guo, Yunfei; Yao, Lijing; Song, Lei; Lam, Hugo Y K; Drabek, Jiri; Vojta, Petr; Maestro, Roberta; Gasparotto, Daniela; Kõks, Sulev; Reimann, Ene; Scherer, Andreas; Nordlund, Jessica; Liljedahl, Ulrika; Jensen, Roderick V.

Nat Biotechnol ; 39(9): 1151-1160, 2021 09.

Article in English | MEDLINE | ID: mdl-34504347

ABSTRACT

The lack of samples for generating standardized DNA datasets for setting up a sequencing pipeline or benchmarking the performance of different algorithms limits the implementation and uptake of cancer genomics. Here, we describe reference call sets obtained from paired tumor-normal genomic DNA (gDNA) samples derived from a breast cancer cell line (which is highly heterogeneous, with an aneuploid genome, and enriched in somatic alterations) and a matched lymphoblastoid cell line. We partially validated both somatic mutations and germline variants in these call sets via whole-exome sequencing (WES) with different sequencing platforms and targeted sequencing with >2,000-fold coverage, spanning 82% of genomic regions with high confidence. Although the gDNA reference samples are not representative of primary cancer cells from a clinical sample, they not only minimize potential biases from technologies, assays and informatics when setting up a sequencing pipeline, but also provide a unique resource for benchmarking 'tumor-only' or 'matched tumor-normal' analyses.
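Benchmarking a pipeline against reference call sets comes down to counting agreements inside the high-confidence regions. The following is a minimal sketch under simplifying assumptions. Variants are matched exactly on (chrom, pos, ref, alt), regions are inclusive 1-based intervals, and the toy variants and function names are hypothetical. Dedicated benchmarking tools also normalize variant representation, which this sketch does not attempt.

```python
from typing import Dict, List, NamedTuple, Set, Tuple

class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position
    ref: str
    alt: str

Regions = Dict[str, List[Tuple[int, int]]]  # chrom -> inclusive 1-based (start, end)

def in_regions(v: Variant, regions: Regions) -> bool:
    """True if the variant position lies inside any high-confidence interval."""
    return any(start <= v.pos <= end for start, end in regions.get(v.chrom, []))

def benchmark(calls: Set[Variant], truth: Set[Variant], regions: Regions) -> dict:
    """Compare a call set with a reference call set, restricted to high-confidence regions."""
    calls_in = {v for v in calls if in_regions(v, regions)}
    truth_in = {v for v in truth if in_regions(v, regions)}
    tp = len(calls_in & truth_in)  # called and present in the reference set
    fp = len(calls_in - truth_in)  # called but absent from the reference set
    fn = len(truth_in - calls_in)  # in the reference set but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}

# Hypothetical toy data.
truth = {Variant("chr1", 100, "A", "G"), Variant("chr1", 200, "C", "T"),
         Variant("chr1", 300, "G", "A"), Variant("chr1", 400, "T", "C")}
calls = {Variant("chr1", 100, "A", "G"), Variant("chr1", 200, "C", "T"),
         Variant("chr1", 250, "G", "C"),   # false positive
         Variant("chr1", 5000, "A", "T")}  # outside the high-confidence region: ignored
regions = {"chr1": [(1, 1000)]}
print(benchmark(calls, truth, regions))
```

Restricting both sets to the same regions matters. A call outside the high-confidence territory cannot be judged, so it is neither a false positive nor a true positive.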

Subjects

Benchmarking , Breast Neoplasms/genetics , DNA Mutational Analysis/standards , High-Throughput Nucleotide Sequencing/standards , Whole Genome Sequencing/standards , Tumor Cell Line , Datasets as Topic , Germ Cells , Humans , Mutation , Reference Standards , Reproducibility of Results

4.

"The Secret Life of Human Donor Hearts": An Examination of Transcriptomic Events During Cold Storage.

Lei, Ienglam; Wang, Zhong; Chen, Y Eugene; Ma, Peter X; Huang, Wei; Kim, Elaine; Lam, Hugo Y K; Goldstein, Daniel R; Aaronson, Keith D; Pagani, Francis D; Tang, Paul C.

Circ Heart Fail ; 13(4): e006409, 2020 04.

Article in English | MEDLINE | ID: mdl-32264717

ABSTRACT

BACKGROUND: Ischemic tolerance of donor hearts has a major impact on utilization efficiency and clinical outcomes. Molecular events during storage may influence the severity of ischemic injury. METHODS: RNA sequencing was used to study the transcriptional profile of the human left ventricle (LV, n=4) and right ventricle (RV, n=4) after 0, 4, and 8 hours of cold storage in histidine-tryptophan-ketoglutarate preservation solution. Gene set enrichment analysis and gene ontology analysis were used to examine transcriptomic changes with cold storage. Terminal deoxynucleotidyl transferase 2'-deoxyuridine 5'-triphosphate nick end labeling (TUNEL) and p65 staining were used to examine cell death and NFκB activation, respectively. RESULTS: The LV showed activation of genes related to inflammation and allograft rejection but downregulation of oxidative phosphorylation and fatty acid metabolism pathway genes. In contrast, in the RV, inflammation-related genes were downregulated while oxidative phosphorylation genes were activated. These transcriptomic changes were most significant at 8 hours, with much smaller differences observed between 0 and 4 hours. RNA velocity estimates corroborated the finding that immune-related genes were activated in the LV but not in the RV during storage. With increasing preservation duration, the LV showed an increase in nuclear translocation of NFκB (p65), whereas the RV showed increased cell death close to the endocardium, especially at 8 hours. CONCLUSIONS: Our results demonstrated that the LV and RV of human donor hearts have distinct responses to cold ischemic storage. Transcriptomic changes related to inflammation, oxidative phosphorylation, and fatty acid metabolism pathways, as well as cell death and NFκB activation, were most pronounced after 8 hours of storage.

Subjects

Cold Temperature/adverse effects , Heart Transplantation , Heart Ventricles/metabolism , Organ Preservation , Primary Graft Dysfunction/genetics , Transcriptome , Apoptosis/drug effects , Apoptosis/genetics , Energy Metabolism/drug effects , Energy Metabolism/genetics , Gene Expression Profiling , Glucose/pharmacology , Heart Transplantation/adverse effects , Heart Ventricles/drug effects , Heart Ventricles/pathology , Humans , Inflammation/genetics , Inflammation/pathology , Mannitol/pharmacology , Organ Preservation/adverse effects , Organ Preservation Solutions/pharmacology , Potassium Chloride/pharmacology , Primary Graft Dysfunction/pathology , Primary Graft Dysfunction/prevention & control , Procaine/pharmacology , Risk Factors , Time Factors , Transcriptome/drug effects

5.

ecTMB: a robust method to estimate and classify tumor mutational burden.

Yao, Lijing; Fu, Yao; Mohiyuddin, Marghoob; Lam, Hugo Y K.

Sci Rep ; 10(1): 4983, 2020 03 18.

Article in English | MEDLINE | ID: mdl-32188929

ABSTRACT

Tumor Mutational Burden (TMB) is a measure of the abundance of somatic mutations in a tumor, which has been shown to be an emerging biomarker for both anti-PD-(L)1 treatment and prognosis; however, multiple challenges still hinder the adoption of TMB as a biomarker. The key challenges are the inconsistency of tumor mutational burden measurement among assays and the lack of a meaningful threshold for TMB classification. Here we describe a new method, ecTMB (Estimation and Classification of TMB), which uses an explicit background mutation model to predict TMB robustly and to classify samples into biologically meaningful subtypes defined by tumor mutational burden.
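One reason TMB measurements disagree across assays is that assays interrogate different amounts of callable sequence. The following sketch shows only the conventional count-based estimate (mutations per megabase) with a Poisson standard error. It is not ecTMB's method, which fits an explicit background mutation model that this sketch does not reproduce. The panel and exome sizes and the mutation counts are hypothetical.

```python
import math

def tmb_per_mb(n_mutations: int, callable_bases: int) -> float:
    """Conventional count-based TMB: somatic mutations per megabase of callable territory."""
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    return n_mutations / (callable_bases / 1_000_000)

def tmb_standard_error(n_mutations: int, callable_bases: int) -> float:
    """Approximate standard error of the estimate, assuming Poisson-distributed counts."""
    return math.sqrt(n_mutations) / (callable_bases / 1_000_000)

# Same hypothetical underlying rate (10 mutations/Mb) measured on two assay sizes.
for name, n, size in [("1 Mb panel", 10, 1_000_000), ("30 Mb exome", 300, 30_000_000)]:
    print(f"{name}: TMB = {tmb_per_mb(n, size):.1f}/Mb +/- {tmb_standard_error(n, size):.2f}")
```

Both assays give the same point estimate, but the smaller panel's uncertainty is several times wider. A fixed classification threshold is therefore far less reliable on small panels.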

Subjects

Tumor Biomarkers/genetics , Neoplasm DNA/genetics , Human Genome , Mutation , Neoplasms/classification , Neoplasms/genetics , Tumor Burden , DNA Mutational Analysis , Neoplasm DNA/analysis , Exome , Humans , Immunotherapy/methods , Statistical Models , Neoplasms/drug therapy , Neoplasms/pathology , Prognosis , Treatment Outcome

6.

Deep convolutional neural networks for accurate somatic mutation detection.

Sahraeian, Sayed Mohammad Ebrahim; Liu, Ruolin; Lau, Bayo; Podesta, Karl; Mohiyuddin, Marghoob; Lam, Hugo Y K.

Nat Commun ; 10(1): 1041, 2019 03 04.

Article in English | MEDLINE | ID: mdl-30833567

ABSTRACT

Accurate detection of somatic mutations is still a challenge in cancer analysis. Here we present NeuSomatic, the first convolutional neural network approach for somatic mutation detection, which significantly outperforms previous methods on different sequencing platforms, sequencing strategies, and tumor purities. NeuSomatic summarizes sequence alignments into small matrices and incorporates more than a hundred features to capture mutation signals effectively. It can be used universally as a stand-alone somatic mutation detection method or with an ensemble of existing methods to achieve the highest accuracy.
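The abstract says NeuSomatic summarizes sequence alignments into small matrices. The following is a simplified, hypothetical illustration of that idea only. Reads already aligned to reference coordinates are turned into per-column base-frequency matrices for tumor and normal, which can be stacked as input channels for a CNN. NeuSomatic's real input is richer (it incorporates more than a hundred features), and the function names and toy reads here are assumptions.

```python
from typing import List, Tuple

BASES = "ACGT-"  # matrix rows: four nucleotides plus a gap/deletion symbol

def count_matrix(reads: List[Tuple[int, str]], window_start: int, width: int) -> List[List[int]]:
    """Count bases per reference column for reads already aligned (gapped) to the reference.

    Each read is (start, aligned_sequence); '-' marks a deletion relative to the reference.
    Insertions are ignored in this simplified sketch.
    """
    matrix = [[0] * width for _ in BASES]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            col = start + offset - window_start
            if 0 <= col < width and base in BASES:
                matrix[BASES.index(base)][col] += 1
    return matrix

def to_frequencies(matrix: List[List[int]]) -> List[List[float]]:
    """Normalise each column to sum to 1 (columns without coverage stay at 0)."""
    width = len(matrix[0])
    totals = [sum(row[c] for row in matrix) for c in range(width)]
    return [[row[c] / totals[c] if totals[c] else 0.0 for c in range(width)] for row in matrix]

# Hypothetical window over reference "ACGTA"; the tumor carries a G>T change at column 2.
tumor_reads = [(0, "ACGTA"), (0, "ACTTA"), (1, "CTTA")]
normal_reads = [(0, "ACGTA"), (0, "ACGTA")]
tumor = to_frequencies(count_matrix(tumor_reads, window_start=0, width=5))
normal = to_frequencies(count_matrix(normal_reads, window_start=0, width=5))
channels = [tumor, normal]  # a 2 x 5 x 5 input that a small CNN could consume
print(tumor[BASES.index("T")][2], normal[BASES.index("G")][2])
```

Presenting tumor and normal side by side lets a convolutional model learn what a somatic signal looks like: an alternate base present in the tumor channel but absent from the matched normal.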

Subjects

Computational Biology/methods , DNA Mutational Analysis/methods , Machine Learning , Mutation , Neural Networks (Computer) , Computational Biology/instrumentation , DNA Mutational Analysis/instrumentation , Genetic Databases , Diploidy , Exome , Neoplasm Genes , Humans , Neoplasms/genetics , Sequence Alignment , DNA Sequence Analysis/instrumentation , DNA Sequence Analysis/methods

7.

Circular DNA elements of chromosomal origin are common in healthy human somatic tissue.

Møller, Henrik Devitt; Mohiyuddin, Marghoob; Prada-Luengo, Iñigo; Sailani, M Reza; Halling, Jens Frey; Plomgaard, Peter; Maretty, Lasse; Hansen, Anders Johannes; Snyder, Michael P; Pilegaard, Henriette; Lam, Hugo Y K; Regenberg, Birgitte.

Nat Commun ; 9(1): 1069, 2018 03 14.

Article in English | MEDLINE | ID: mdl-29540679

ABSTRACT

The human genome is generally organized into stable chromosomes, and only tumor cells are known to accumulate kilobase (kb)-sized extrachromosomal circular DNA elements (eccDNAs). However, kb-sized eccDNAs should also be expected to arise in normal cells as a result of mutations. Here, we purify and sequence eccDNAs from muscle and blood samples from 16 healthy men, detecting ~100,000 unique eccDNA types from 16 million nuclei. Half of these structures carry genes or gene fragments, and the majority are smaller than 25 kb. Transcription from eccDNAs suggests that eccDNAs reside in nuclei, and the recurrence of certain eccDNAs in several individuals implies DNA circularization hotspots. Gene-rich chromosomes contribute more eccDNAs per megabase, and the most transcribed protein-coding gene in muscle, TTN (titin), provides the most eccDNAs per gene. Thus, somatic genomes are rich in chromosome-derived eccDNAs that may influence phenotypes through altered gene copy numbers and transcription of full-length or truncated genes.

Subjects

Human Chromosomes/genetics , Circular DNA/genetics , Humans , Mutation/genetics , Genetic Transcription/genetics

8.

Whole-genome sequencing of Atacama skeleton shows novel mutations linked with dysplasia.

Bhattacharya, Sanchita; Li, Jian; Sockell, Alexandra; Kan, Matthew J; Bava, Felice A; Chen, Shann-Ching; Ávila-Arcos, María C; Ji, Xuhuai; Smith, Emery; Asadi, Narges B; Lachman, Ralph S; Lam, Hugo Y K; Bustamante, Carlos D; Butte, Atul J; Nolan, Garry P.

Genome Res ; 28(4): 423-431, 2018 04.

Article in English | MEDLINE | ID: mdl-29567674

ABSTRACT

Over a decade ago, the Atacama humanoid skeleton (Ata) was discovered in the Atacama region of Chile. The Ata specimen carried a strange phenotype (6 in. in stature, fewer ribs than expected, an elongated cranium, and accelerated bone age), leading to speculation that this was a preserved nonhuman primate, a human fetus harboring genetic mutations, or even an extraterrestrial. We previously reported, on the basis of DNA analysis, that it was human, with an estimated bone age of about 6-8 yr at the time of demise. To determine the possible genetic drivers of the observed morphology, DNA from the specimen was subjected to whole-genome sequencing on the Illumina HiSeq platform with an average 11.5× coverage of 101-bp paired-end reads. Compared with the human reference genome, a total of 3,356,569 single nucleotide variations (SNVs), 518,365 insertions and deletions (indels), and 1047 structural variations (SVs) were detected. Here, we present the detailed whole-genome analysis showing that Ata is a female of human origin, likely of Chilean descent, and its genome harbors mutations in genes (COL1A1, COL2A1, KMT2D, FLNB, ATR, TRIP11, PCNT) previously linked with diseases of small stature, rib anomalies, cranial malformations, premature joint fusion, and osteochondrodysplasia (also known as skeletal dysplasia). Together, these findings provide a molecular characterization of Ata's peculiar phenotype, which likely results from multiple known and novel putative gene mutations affecting bone development and ossification.

Subjects

Ancient DNA/analysis , Human Genome/genetics , Osteochondrodysplasias/genetics , Whole Genome Sequencing , Animals , Female , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation , Molecular Sequence Annotation , Mutation/genetics , Osteochondrodysplasias/physiopathology , Phenotype , Single Nucleotide Polymorphism/genetics

9.

Gaining comprehensive biological insight into the transcriptome by performing a broad-spectrum RNA-seq analysis.

Sahraeian, Sayed Mohammad Ebrahim; Mohiyuddin, Marghoob; Sebra, Robert; Tilgner, Hagen; Afshar, Pegah T; Au, Kin Fai; Bani Asadi, Narges; Gerstein, Mark B; Wong, Wing Hung; Snyder, Michael P; Schadt, Eric; Lam, Hugo Y K.

Nat Commun ; 8(1): 59, 2017 07 05.

Article in English | MEDLINE | ID: mdl-28680106

ABSTRACT

RNA sequencing (RNA-seq) is an essential technique for transcriptome studies, and hundreds of analysis tools have been developed since its debut. Although recent efforts have attempted to assess the latest available tools, they have not evaluated the analysis workflows comprehensively to unleash the power within RNA-seq. Here we conduct an extensive study analysing a broad spectrum of RNA-seq workflows. Going beyond the scope of expression analysis, our work also assesses RNA variant-calling, RNA editing and RNA fusion detection techniques. Specifically, we examine both short- and long-read RNA-seq technologies, 39 analysis tools resulting in ~120 combinations, and ~490 analyses involving 15 samples with a variety of germline, cancer and stem cell data sets. We report the performance and propose a comprehensive RNA-seq analysis protocol, named RNACocktail, along with a computational pipeline achieving high accuracy. Validation on different samples reveals that our proposed protocol could help researchers extract more biologically relevant predictions by broad analysis of the transcriptome. RNA-seq is widely used for transcriptome analysis. Here, the authors analyse a wide spectrum of RNA-seq workflows and present a comprehensive analysis protocol named RNACocktail as well as a computational pipeline leveraging the widely used tools for accurate RNA-seq analysis.

Subjects

Embryonic Stem Cells , Transcriptome , Base Sequence , Cell Line , Humans

10.

Lessons from the CAGI-4 Hopkins clinical panel challenge.

Chandonia, John-Marc; Adhikari, Aashish; Carraro, Marco; Chhibber, Aparna; Cutting, Garry R; Fu, Yao; Gasparini, Alessandra; Jones, David T; Kramer, Andreas; Kundu, Kunal; Lam, Hugo Y K; Leonardi, Emanuela; Moult, John; Pal, Lipika R; Searls, David B; Shah, Sohela; Sunyaev, Shamil; Tosatto, Silvio C E; Yin, Yizhou; Buckley, Bethany A.

Hum Mutat ; 38(9): 1155-1168, 2017 09.

Article in English | MEDLINE | ID: mdl-28397312

ABSTRACT

The CAGI-4 Hopkins clinical panel challenge was an attempt to assess state-of-the-art methods for clinical phenotype prediction from DNA sequence. Participants were provided with exonic sequences of 83 genes for 106 patients from the Johns Hopkins DNA Diagnostic Laboratory. Five groups participated in the challenge, predicting both the probability that each patient had each of the 14 possible classes of disease, as well as one or more causal variants. In cases where the Hopkins laboratory reported a variant, at least one predictor correctly identified the disease class in 36 of the 43 patients (84%). Even in cases where the Hopkins laboratory did not find a variant, at least one predictor correctly identified the class in 39 of the 63 patients (62%). Each prediction group correctly diagnosed at least one patient that was not successfully diagnosed by any other group. We discuss the causal variant predictions by different groups and their implications for further development of methods to assess variants of unknown significance. Our results suggest that clinically relevant variants may be missed when physicians order small panels targeted on a specific phenotype. We also quantify the false-positive rate of DNA-guided analysis in the absence of prior phenotypic indication.

Subjects

Computational Biology/methods , DNA Sequence Analysis/methods , Genetic Databases , Genetic Predisposition to Disease , Genetic Testing , Humans , Phenotype

11.

LongISLND: in silico sequencing of lengthy and noisy datatypes.

Lau, Bayo; Mohiyuddin, Marghoob; Mu, John C; Fang, Li Tai; Bani Asadi, Narges; Dallett, Carolina; Lam, Hugo Y K.

Bioinformatics ; 32(24): 3829-3832, 2016 12 15.

Article in English | MEDLINE | ID: mdl-27667791

ABSTRACT

LongISLND is a software package designed to simulate sequencing data according to the characteristics of third-generation, single-molecule sequencing technologies. The general software architecture is easily extendable, as demonstrated by the emulation of Pacific Biosciences (PacBio) multi-pass sequencing with P5 and P6 chemistries, producing data in FASTQ, H5, and the latest PacBio BAM format. We demonstrate its utility by downstream processing with consensus building and variant calling. AVAILABILITY AND IMPLEMENTATION: LongISLND is implemented in Java and available at http://bioinform.github.io/longislnd CONTACT: hugo.lam@roche.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
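At its core, in silico sequencing means sampling fragments from a genome and corrupting them with a platform-like error process. The following minimal sketch illustrates that idea. Unlike LongISLND, which emulates the characteristics of specific PacBio chemistries, it applies fixed per-base substitution, insertion and deletion rates. The rates, read length and function names are hypothetical, and the error values are only loosely inspired by indel-dominated long-read errors.

```python
import random
from typing import List

def add_errors(seq: str, sub_rate: float, ins_rate: float, del_rate: float,
               rng: random.Random) -> str:
    """Apply independent per-base substitution, insertion and deletion errors."""
    if not 0 <= ins_rate < 1:
        raise ValueError("ins_rate must be in [0, 1)")
    if del_rate + sub_rate > 1:
        raise ValueError("del_rate + sub_rate must not exceed 1")
    out = []
    for base in seq:
        r = rng.random()
        if r < del_rate:
            pass  # base dropped: deletion error
        elif r < del_rate + sub_rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))  # substitution
        else:
            out.append(base)  # base read correctly
        while rng.random() < ins_rate:  # zero or more inserted bases after this one
            out.append(rng.choice("ACGT"))
    return "".join(out)

def simulate_reads(genome: str, n_reads: int, read_len: int, seed: int = 0,
                   sub_rate: float = 0.01, ins_rate: float = 0.10,
                   del_rate: float = 0.04) -> List[str]:
    """Sample fixed-length fragments uniformly from `genome` and corrupt them."""
    rng = random.Random(seed)  # seeded for reproducible simulations
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(0, len(genome) - read_len + 1)
        fragment = genome[start:start + read_len]
        reads.append(add_errors(fragment, sub_rate, ins_rate, del_rate, rng))
    return reads

genome = "".join(random.Random(42).choice("ACGT") for _ in range(2000))
reads = simulate_reads(genome, n_reads=5, read_len=500)
print([len(r) for r in reads])
```

Because the insertion rate exceeds the deletion rate here, simulated reads tend to come out slightly longer than the sampled fragments. A real simulator would also vary read lengths and output quality strings.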

Subjects

Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Software , Computer Simulation , Sequence Alignment

12.

svclassify: a method to establish benchmark structural variant calls.

Parikh, Hemang; Mohiyuddin, Marghoob; Lam, Hugo Y K; Iyer, Hariharan; Chen, Desu; Pratt, Mark; Bartha, Gabor; Spies, Noah; Losert, Wolfgang; Zook, Justin M; Salit, Marc.

BMC Genomics ; 17: 64, 2016 Jan 16.

Article in English | MEDLINE | ID: mdl-26772178

ABSTRACT

BACKGROUND: The human genome contains variants ranging in size from small single nucleotide polymorphisms (SNPs) to large structural variants (SVs). High-quality benchmark small variant calls for the pilot National Institute of Standards and Technology (NIST) Reference Material (NA12878) have been developed by the Genome in a Bottle Consortium, but no similar high-quality benchmark SV calls exist for this genome. Since SV callers output highly discordant results, we developed methods to combine multiple forms of evidence from multiple sequencing technologies to classify candidate SVs into likely true or false positives. Our method (svclassify) calculates annotations from one or more aligned BAM files from many high-throughput sequencing technologies, and then builds a one-class model using these annotations to classify candidate SVs as likely true or false positives. RESULTS: We first used pedigree analysis to develop a set of high-confidence breakpoint-resolved large deletions. We then used svclassify to cluster and classify these deletions as well as a set of high-confidence deletions from the 1000 Genomes Project and a set of breakpoint-resolved complex insertions from Spiral Genetics. We find that likely SVs cluster separately from likely non-SVs based on our annotations, and that the SVs cluster into different types of deletions. We then developed a supervised one-class classification method that uses a training set of random non-SV regions to determine whether candidate SVs have abnormal annotations different from most of the genome. To test this classification method, we use our pedigree-based breakpoint-resolved SVs, SVs validated by the 1000 Genomes Project, and assembly-based breakpoint-resolved insertions, along with semi-automated visualization using svviz.
CONCLUSIONS: We find that candidate SVs with high scores from multiple technologies have high concordance with PCR validation and an orthogonal consensus method MetaSV (99.7% concordant), and candidate SVs with low scores are questionable. We distribute a set of 2676 high-confidence deletions and 68 high-confidence insertions with high svclassify scores from these call sets for benchmarking SV callers. We expect these methods to be particularly useful for establishing high-confidence SV calls for benchmark samples that have been characterized by multiple technologies.

Subjects

Human Genome , Genomic Structural Variation , Software , Benchmarking , Genomics , High-Throughput Nucleotide Sequencing , Humans , Molecular Sequence Annotation , Pedigree , Single Nucleotide Polymorphism/genetics

13.

An integrated map of structural variation in 2,504 human genomes.

Sudmant, Peter H; Rausch, Tobias; Gardner, Eugene J; Handsaker, Robert E; Abyzov, Alexej; Huddleston, John; Zhang, Yan; Ye, Kai; Jun, Goo; Fritz, Markus Hsi-Yang; Konkel, Miriam K; Malhotra, Ankit; Stütz, Adrian M; Shi, Xinghua; Casale, Francesco Paolo; Chen, Jieming; Hormozdiari, Fereydoun; Dayama, Gargi; Chen, Ken; Malig, Maika; Chaisson, Mark J P; Walter, Klaudia; Meiers, Sascha; Kashin, Seva; Garrison, Erik; Auton, Adam; Lam, Hugo Y K; Mu, Xinmeng Jasmine; Alkan, Can; Antaki, Danny; Bae, Taejeong; Cerveira, Eliza; Chines, Peter; Chong, Zechen; Clarke, Laura; Dal, Elif; Ding, Li; Emery, Sarah; Fan, Xian; Gujral, Madhusudan; Kahveci, Fatma; Kidd, Jeffrey M; Kong, Yu; Lameijer, Eric-Wubbo; McCarthy, Shane; Flicek, Paul; Gibbs, Richard A; Marth, Gabor; Mason, Christopher E; Menelaou, Androniki.

Nature ; 526(7571): 75-81, 2015 Oct 01.

Article in English | MEDLINE | ID: mdl-26432246

ABSTRACT

Structural variants are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations. Analysing this set, we identify numerous gene-intersecting structural variants exhibiting population stratification and describe naturally occurring homozygous gene knockouts that suggest the dispensability of a variety of human genes. We demonstrate that structural variants are enriched on haplotypes identified by genome-wide association studies and exhibit enrichment for expression quantitative trait loci. Additionally, we uncover appreciable levels of structural variant complexity at different scales, including genic loci subject to clusters of repeated rearrangement and complex structural variants with multiple breakpoints likely to have formed through individual mutational events. Our catalogue will enhance future studies into structural variant demography, functional impact and disease association.

Subjects

Genetic Variation/genetics , Human Genome/genetics , Physical Chromosome Mapping , Amino Acid Sequence , Genetic Predisposition to Disease , Medical Genetics , Population Genetics , Genome-Wide Association Study , Genomics , Genotype , Haplotypes/genetics , Homozygote , Humans , Molecular Sequence Data , Mutation Rate , Single Nucleotide Polymorphism/genetics , Quantitative Trait Loci/genetics , DNA Sequence Analysis , Sequence Deletion/genetics

14.

An ensemble approach to accurately detect somatic mutations using SomaticSeq.

Fang, Li Tai; Afshar, Pegah Tootoonchi; Chhibber, Aparna; Mohiyuddin, Marghoob; Fan, Yu; Mu, John C; Gibeling, Greg; Barr, Sharon; Asadi, Narges Bani; Gerstein, Mark B; Koboldt, Daniel C; Wang, Wenyi; Wong, Wing H; Lam, Hugo Y K.

Genome Biol ; 16: 197, 2015 Sep 17.

Article in English | MEDLINE | ID: mdl-26381235

ABSTRACT

SomaticSeq is an accurate somatic mutation detection pipeline implementing a stochastic boosting algorithm to produce highly accurate somatic mutation calls for both single nucleotide variants and small insertions and deletions. The workflow currently incorporates five state-of-the-art somatic mutation callers, and extracts over 70 individual genomic and sequencing features for each candidate site. A training set is provided to an adaptively boosted decision tree learner to create a classifier for predicting mutation statuses. We validate our results with both synthetic and real data. We report that SomaticSeq is able to achieve better overall accuracy than any individual tool incorporated.
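The abstract describes training an adaptively boosted decision-tree learner on per-site features. The following is a from-scratch sketch of the general technique, discrete AdaBoost with one-level decision trees (stumps) as weak learners. It is not SomaticSeq's implementation or feature set. The two features (tumor and normal variant allele fractions), the toy sites and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Stump = Tuple[float, int, float, int]  # (alpha, feature index, threshold, polarity)

def stump_predict(row, j, thr, pol):
    """Predict pol if feature j >= threshold, else -pol."""
    return pol if row[j] >= thr else -pol

def fit_stump(X, y, w):
    """Exhaustively pick the feature/threshold/polarity with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                err = sum(wi for wi, row, yi in zip(w, X, y)
                          if stump_predict(row, j, thr, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best

def adaboost(X, y, rounds=10) -> List[Stump]:
    """Discrete AdaBoost with decision stumps; labels must be +1 / -1."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, j, thr, pol = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # guard against log(0) and division by 0
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this weak learner
        model.append((alpha, j, thr, pol))
        # Up-weight misclassified sites, down-weight correct ones, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, j, thr, pol))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def boosted_predict(model, row):
    score = sum(alpha * stump_predict(row, j, thr, pol) for alpha, j, thr, pol in model)
    return 1 if score >= 0 else -1

# Hypothetical features per candidate site: [tumor VAF, normal VAF]; +1 = true somatic.
X = [[0.35, 0.00], [0.30, 0.01], [0.40, 0.02],  # somatic
     [0.05, 0.00], [0.45, 0.40], [0.08, 0.01]]  # low-VAF artifacts and a germline site
y = [1, 1, 1, -1, -1, -1]
model = adaboost(X, y, rounds=3)
print([boosted_predict(model, row) for row in X])
```

No single threshold separates this toy set. Tumor VAF alone misclassifies the germline site, which boosting then up-weights so that later stumps correct it. Three rounds happen to classify every training site correctly here.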

Subjects

DNA Mutational Analysis/methods , Machine Learning , Neoplasms/genetics , Humans , INDEL Mutation

15.

Erratum: analysis of deletion breakpoints from 1,092 humans reveals details of mutation mechanisms.

Abyzov, Alexej; Li, Shantao; Kim, Daniel Rhee; Mohiyuddin, Marghoob; Stütz, Adrian M; Parrish, Nicholas F; Mu, Xinmeng Jasmine; Clark, Wyatt; Chen, Ken; Hurles, Matthew; Korbel, Jan O; Lam, Hugo Y K; Lee, Charles; Gerstein, Mark B.

Nat Commun ; 6: 8389, 2015 Sep 08.

Article in English | MEDLINE | ID: mdl-26346554

16.

Leveraging long read sequencing from a single individual to provide a comprehensive resource for benchmarking variant calling methods.

Mu, John C; Tootoonchi Afshar, Pegah; Mohiyuddin, Marghoob; Chen, Xi; Li, Jian; Bani Asadi, Narges; Gerstein, Mark B; Wong, Wing H; Lam, Hugo Y K.

Sci Rep ; 5: 14493, 2015 Sep 28.

Article in English | MEDLINE | ID: mdl-26412485

ABSTRACT

A high-confidence, comprehensive human variant set is critical in assessing accuracy of sequencing algorithms, which are crucial in precision medicine based on high-throughput sequencing. Although recent works have attempted to provide such a resource, they still do not encompass all major types of variants including structural variants (SVs). Thus, we leveraged the massive high-quality Sanger sequences from the HuRef genome to construct by far the most comprehensive gold set of a single individual, which was cross validated with deep Illumina sequencing, population datasets, and well-established algorithms. It was a necessary effort to completely reanalyze the HuRef genome as its previously published variants were mostly reported five years ago, suffering from compatibility, organization, and accuracy issues that prevent their direct use in benchmarking. Our extensive analysis and validation resulted in a gold set with high specificity and sensitivity. In contrast to the current gold sets of the NA12878 or HS1011 genomes, our gold set is the first that includes small variants, deletion SVs and insertion SVs up to a hundred thousand base-pairs. We demonstrate the utility of our HuRef gold set to benchmark several published SV detection tools.
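Benchmarking SV detection tools against a gold set requires a rule for when a called SV matches a gold-set SV. The abstract does not state the criterion used. The following sketch assumes 50% reciprocal overlap, a common convention. The intervals and function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open, end > start

def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Overlap length divided by the longer interval (the smaller of the two fractions)."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))

def benchmark_svs(calls: List[Interval], truth: List[Interval], min_ro: float = 0.5):
    """Return (sensitivity, precision) under a reciprocal-overlap matching rule."""
    matched_truth = sum(any(reciprocal_overlap(t, c) >= min_ro for c in calls) for t in truth)
    matched_calls = sum(any(reciprocal_overlap(c, t) >= min_ro for t in truth) for c in calls)
    sensitivity = matched_truth / len(truth) if truth else 0.0
    precision = matched_calls / len(calls) if calls else 0.0
    return sensitivity, precision

# Hypothetical gold-set deletions and tool calls.
truth = [("chr1", 1000, 2000), ("chr1", 5000, 5500), ("chr2", 100, 10100)]
calls = [("chr1", 1100, 2100),   # 90% reciprocal overlap: match
         ("chr1", 5000, 7000),   # covers the gold SV but is 4x longer: RO = 0.25, no match
         ("chr3", 1, 100)]       # no gold SV nearby: false positive
print(benchmark_svs(calls, truth))
```

Requiring the overlap to be reciprocal stops a single oversized call from being credited for every smaller SV it happens to span.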

Subjects

Benchmarking , High-Throughput Nucleotide Sequencing/methods , Genetic Variation , Human Genome , Genomics/methods , High-Throughput Nucleotide Sequencing/standards , Humans

17.

Analysis of deletion breakpoints from 1,092 humans reveals details of mutation mechanisms.

Abyzov, Alexej; Li, Shantao; Kim, Daniel Rhee; Mohiyuddin, Marghoob; Stütz, Adrian M; Parrish, Nicholas F; Mu, Xinmeng Jasmine; Clark, Wyatt; Chen, Ken; Hurles, Matthew; Korbel, Jan O; Lam, Hugo Y K; Lee, Charles; Gerstein, Mark B.

Nat Commun ; 6: 7256, 2015 Jun 01.

Article in English | MEDLINE | ID: mdl-26028266

ABSTRACT

Investigating genomic structural variants at basepair resolution is crucial for understanding their formation mechanisms. We identify and analyse 8,943 deletion breakpoints in 1,092 samples from the 1000 Genomes Project. We find breakpoints have more nearby SNPs and indels than the genomic average, likely a consequence of relaxed selection. By investigating the correlation of breakpoints with DNA methylation, Hi-C interactions, and histone marks and the substitution patterns of nucleotides near them, we find that breakpoints with the signature of non-allelic homologous recombination (NAHR) are associated with open chromatin. We hypothesize that some NAHR deletions occur without DNA replication and cell division, in embryonic and germline cells. In contrast, breakpoints associated with non-homologous (NH) mechanisms often have sequence microinsertions, templated from later replicating genomic sites, spaced at two characteristic distances from the breakpoint. These microinsertions are consistent with template-switching events and suggest a particular spatiotemporal configuration for DNA during the events.

Subjects

Chromosome Breakpoints , DNA/metabolism , Gene Deletion , Genome, Human/genetics , Chromatin , DNA Replication , Homologous Recombination , Humans , Mutation , Nucleotides , Sequence Deletion
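The finding that breakpoints have more nearby SNPs than the genomic average is, at its core, an observed-versus-expected comparison. A minimal sketch, assuming a single chromosome, sorted SNP positions, a fixed flank on each side of every breakpoint, and a uniform SNP density along the chromosome as the baseline. This simplification (it ignores unmappable sequence and double-counts overlapping windows) is not the authors' statistical procedure, and all names are hypothetical.

```python
import bisect


def snps_near_breakpoints(snp_positions, breakpoints, flank):
    """Count SNPs within +/- flank bp of each breakpoint.

    snp_positions must be sorted; all positions are on one chromosome."""
    counts = []
    for bp in breakpoints:
        lo = bisect.bisect_left(snp_positions, bp - flank)
        hi = bisect.bisect_right(snp_positions, bp + flank)
        counts.append(hi - lo)
    return counts


def breakpoint_enrichment(snp_positions, breakpoints, flank, chrom_length):
    """Observed / expected SNP count in breakpoint windows, with the expectation
    taken from a uniform SNP density along the chromosome (simplifying assumption)."""
    observed = sum(snps_near_breakpoints(snp_positions, breakpoints, flank))
    window = 2 * flank + 1  # inclusive window [bp - flank, bp + flank]
    density = len(snp_positions) / chrom_length
    expected = len(breakpoints) * window * density
    return observed / expected


snps = [100, 105, 110, 5000, 9000]
ratio = breakpoint_enrichment(snps, breakpoints=[104], flank=10, chrom_length=10500)
```

Here three SNPs fall within 10 bp of the single breakpoint against an expected 0.01 under uniform density, giving a ratio of about 300. A ratio above 1 indicates enrichment; a real analysis would also need a significance test and a mappability-aware background.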

18.

MetaSV: an accurate and integrative structural-variant caller for next generation sequencing.

Mohiyuddin, Marghoob; Mu, John C; Li, Jian; Bani Asadi, Narges; Gerstein, Mark B; Abyzov, Alexej; Wong, Wing H; Lam, Hugo Y K.

Bioinformatics ; 31(16): 2741-4, 2015 Aug 15.

Article in English | MEDLINE | ID: mdl-25861968

ABSTRACT

UNLABELLED: Structural variations (SVs) are large genomic rearrangements that vary significantly in size, making them challenging to detect with the relatively short reads from next-generation sequencing (NGS). Different SV detection methods have been developed; however, each is limited to specific kinds of SVs with varying accuracy and resolution. Previous works have attempted to combine different methods, but they still suffer from poor accuracy particularly for insertions. We propose MetaSV, an integrated SV caller which leverages multiple orthogonal SV signals for high accuracy and resolution. MetaSV proceeds by merging SVs from multiple tools for all types of SVs. It also analyzes soft-clipped reads from alignment to detect insertions accurately since existing tools underestimate insertion SVs. Local assembly in combination with dynamic programming is used to improve breakpoint resolution. Paired-end and coverage information is used to predict SV genotypes. Using simulation and experimental data, we demonstrate the effectiveness of MetaSV across various SV types and sizes. AVAILABILITY AND IMPLEMENTATION: Code in Python is at http://bioinform.github.io/metasv/. CONTACT: rd@bina.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Genetic Variation , High-Throughput Nucleotide Sequencing/methods , Software , Mutagenesis, Insertional , Sequence Deletion
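The first MetaSV step, merging SVs from multiple tools, can be approximated by a toy interval-clustering routine. This sketch is not MetaSV's actual merging logic. It assumes calls are `(chrom, start, end, svtype)` tuples, groups same-type calls whose intervals overlap, keeps clusters supported by at least two distinct tools, and reports median breakpoints as a consensus. It omits the soft-clip analysis, local assembly, and genotyping described in the abstract. The function names and tool names are hypothetical.

```python
from collections import defaultdict


def _consensus(chrom, svtype, cluster, min_support):
    """Emit one merged call if enough distinct tools support the cluster."""
    tools = sorted({tool for _, _, tool in cluster})
    if len(tools) < min_support:
        return []
    starts = sorted(start for start, _, _ in cluster)
    ends = sorted(end for _, end, _ in cluster)
    mid = len(cluster) // 2  # upper median when the count is even
    return [(chrom, starts[mid], ends[mid], svtype, tools)]


def merge_calls(calls_by_tool, min_support=2):
    """Cluster overlapping same-type calls across tools (toy merge)."""
    grouped = defaultdict(list)
    for tool, calls in calls_by_tool.items():
        for chrom, start, end, svtype in calls:
            grouped[(chrom, svtype)].append((start, end, tool))

    merged = []
    for (chrom, svtype), items in sorted(grouped.items()):
        items.sort()  # sweep intervals left to right by start
        cluster, cluster_end = [items[0]], items[0][1]
        for start, end, tool in items[1:]:
            if start < cluster_end:  # overlaps the open cluster
                cluster.append((start, end, tool))
                cluster_end = max(cluster_end, end)
            else:  # gap: close the cluster and start a new one
                merged.extend(_consensus(chrom, svtype, cluster, min_support))
                cluster, cluster_end = [(start, end, tool)], end
        merged.extend(_consensus(chrom, svtype, cluster, min_support))
    return merged


calls_by_tool = {
    "toolA": [("chr1", 1000, 2000, "DEL")],
    "toolB": [("chr1", 1010, 1990, "DEL"), ("chr1", 8000, 8500, "DEL")],
    "toolC": [("chr1", 995, 2005, "DEL")],
}
merged = merge_calls(calls_by_tool)
```

The three overlapping deletions near 1 kb collapse into one call at 1000-2000 supported by all three tools, while the lone toolB call at 8 kb is dropped for lacking support. Requiring agreement between tools trades some sensitivity for precision, which is the usual motivation for combining orthogonal callers.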

19.

VarSim: a high-fidelity simulation and validation framework for high-throughput genome sequencing with cancer applications.

Mu, John C; Mohiyuddin, Marghoob; Li, Jian; Bani Asadi, Narges; Gerstein, Mark B; Abyzov, Alexej; Wong, Wing H; Lam, Hugo Y K.

Bioinformatics ; 31(9): 1469-71, 2015 May 01.

Article in English | MEDLINE | ID: mdl-25524895

ABSTRACT

SUMMARY: VarSim is a framework for assessing alignment and variant calling accuracy in high-throughput genome sequencing through simulation or real data. In contrast to simulating a random mutation spectrum, it synthesizes diploid genomes with germline and somatic mutations based on a realistic model. This model leverages information such as previously reported mutations to make the synthetic genomes biologically relevant. VarSim simulates and validates a wide range of variants, including single nucleotide variants, small indels and large structural variants. It is an automated, comprehensive compute framework supporting parallel computation and multiple read simulators. Furthermore, we developed a novel map data structure to validate read alignments, a strategy to compare variants binned in size ranges and a lightweight, interactive, graphical report to visualize validation results with detailed statistics. Thus far, it is the most comprehensive validation tool for secondary analysis in next generation sequencing. AVAILABILITY AND IMPLEMENTATION: Code in Java and Python along with instructions to download the reads and variants is at http://bioinform.github.io/varsim. CONTACT: rd@bina.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Genetic Variation , High-Throughput Nucleotide Sequencing/methods , Software , Computer Simulation , Genomics , Humans , Mutation , Neoplasms/genetics , Sequence Alignment
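The VarSim strategy of comparing variants binned in size ranges can be sketched as per-bin recall. The bin edges below are hypothetical and not taken from VarSim, and the sketch assumes each true variant has already been labeled as detected or missed by some upstream matching step.

```python
import bisect

# Hypothetical size-bin edges in base pairs; not VarSim's actual bins.
BIN_EDGES = [1, 10, 50, 100, 1000, 10000]


def size_bin(length, edges=BIN_EDGES):
    """Label the half-open bin [edges[i], edges[i+1]) that contains length."""
    i = bisect.bisect_right(edges, length) - 1
    if i < 0:
        raise ValueError("length below smallest bin edge")
    if i + 1 < len(edges):
        return f"[{edges[i]}, {edges[i + 1]})"
    return f">={edges[i]}"  # open-ended top bin


def recall_by_bin(truth_lengths, detected):
    """truth_lengths: true variant lengths; detected: parallel list of booleans."""
    totals, hits = {}, {}
    for length, found in zip(truth_lengths, detected):
        label = size_bin(length)
        totals[label] = totals.get(label, 0) + 1
        hits[label] = hits.get(label, 0) + int(found)
    return {label: hits[label] / totals[label] for label in totals}


lengths = [1, 5, 30, 60, 75, 2000, 50000]
found = [True, True, False, True, True, False, True]
per_bin = recall_by_bin(lengths, found)
```

Reporting recall per size range, rather than as one pooled number, exposes the size regimes where a caller fails; in this toy data the 10-50 bp and 1-10 kb bins each have their only variant missed.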

20.

Personal omics profiling reveals dynamic molecular and medical phenotypes.

Chen, Rui; Mias, George I; Li-Pook-Than, Jennifer; Jiang, Lihua; Lam, Hugo Y K; Chen, Rong; Miriami, Elana; Karczewski, Konrad J; Hariharan, Manoj; Dewey, Frederick E; Cheng, Yong; Clark, Michael J; Im, Hogune; Habegger, Lukas; Balasubramanian, Suganthi; O'Huallachain, Maeve; Dudley, Joel T; Hillenmeyer, Sara; Haraksingh, Rajini; Sharon, Donald; Euskirchen, Ghia; Lacroute, Phil; Bettinger, Keith; Boyle, Alan P; Kasowski, Maya; Grubert, Fabian; Seki, Scott; Garcia, Marco; Whirl-Carrillo, Michelle; Gallardo, Mercedes; Blasco, Maria A; Greenberg, Peter L; Snyder, Phyllis; Klein, Teri E; Altman, Russ B; Butte, Atul J; Ashley, Euan A; Gerstein, Mark; Nadeau, Kari C; Tang, Hua; Snyder, Michael.

Cell ; 148(6): 1293-307, 2012 Mar 16.

Article in English | MEDLINE | ID: mdl-22424236

ABSTRACT

Personalized medicine is expected to benefit from combining genomic information with regular monitoring of physiological states by multiple high-throughput methods. Here, we present an integrative personal omics profile (iPOP), an analysis that combines genomic, transcriptomic, proteomic, metabolomic, and autoantibody profiles from a single individual over a 14-month period. Our iPOP analysis revealed various medical risks, including type 2 diabetes. It also uncovered extensive, dynamic changes in diverse molecular components and biological pathways across healthy and diseased conditions. Extremely high-coverage genomic and transcriptomic data, which provide the basis of our iPOP, revealed extensive heteroallelic changes during healthy and diseased states and an unexpected RNA editing mechanism. This study demonstrates that longitudinal iPOP can be used to interpret healthy and diseased states by connecting genomic information with additional dynamic omics activity.

Subjects

Genome, Human , Genomics , Precision Medicine , Diabetes Mellitus, Type 2/genetics , Female , Gene Expression Profiling , Humans , Male , Metabolomics , Middle Aged , Mutation , Proteomics , Respiratory Syncytial Viruses/isolation & purification , Rhinovirus/isolation & purification
