1.
bioRxiv ; 2024 Mar 03.
Article in English | MEDLINE | ID: mdl-38529502

ABSTRACT

Accurate genotyping of Killer cell Immunoglobulin-like Receptor (KIR) genes plays a pivotal role in enhancing our understanding of innate immune responses, disease correlations, and the advancement of personalized medicine. However, due to the high variability of the KIR region and high level of sequence similarity among different KIR genes, the currently available genotyping methods are unable to accurately infer copy numbers, genotypes and haplotypes of individual KIR genes from next-generation sequencing data. Here we introduce Geny, a new computational tool for precise genotyping of KIR genes. Geny utilizes available KIR haplotype databases and proposes a novel combination of expectation-maximization filtering schemes and integer linear programming-based combinatorial optimization models to resolve ambiguous reads, provide accurate copy number estimation and estimate the haplotype of each copy for the genes within the KIR region. We evaluated Geny on a large set of simulated short-read datasets covering the known validated KIR region assemblies and a set of Illumina short-read samples sequenced from 25 validated samples from the Human Pangenome Reference Consortium collection and showed that it outperforms the existing genotyping tools in terms of accuracy, precision and recall. We envision Geny becoming a valuable resource for understanding immune system response and consequently advancing the field of patient-centric medicine.
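The expectation-maximization step mentioned above can be illustrated with a minimal sketch. It assumes each read is summarized only by the set of candidate alleles it aligns to equally well. This is a generic EM for multi-mapping reads, not Geny's actual model, and the function name `em_allele_abundance` and the allele labels are hypothetical.

```python
from collections import defaultdict


def em_allele_abundance(read_compat, n_iter=100, tol=1e-9):
    """Estimate relative allele abundances from reads compatible with several alleles.

    Hypothetical helper: each element of `read_compat` is the set of alleles a read
    aligns to equally well (a simplification, not Geny's read model).
    """
    alleles = sorted({a for comp in read_compat for a in comp})
    # Start from a uniform abundance estimate.
    theta = {a: 1.0 / len(alleles) for a in alleles}
    n = len(read_compat)
    for _ in range(n_iter):
        counts = defaultdict(float)
        # E-step: split each read among its compatible alleles in proportion to theta.
        for comp in read_compat:
            total = sum(theta[a] for a in comp)
            for a in comp:
                counts[a] += theta[a] / total
        # M-step: turn expected read counts back into abundances.
        new_theta = {a: counts[a] / n for a in alleles}
        delta = max(abs(new_theta[a] - theta[a]) for a in alleles)
        theta = new_theta
        if delta < tol:
            break
    return theta


# Toy input: 6 reads unique to A, 2 unique to B, 4 ambiguous between A and B.
reads = [{"A"}] * 6 + [{"B"}] * 2 + [{"A", "B"}] * 4
theta = em_allele_abundance(reads)
print({a: round(p, 3) for a, p in theta.items()})  # {'A': 0.75, 'B': 0.25}
```

Alleles whose estimated abundance falls below a chosen cutoff could then be dropped as a crude filtering step; that cutoff would be an assumption, and the copy-number and haplotype resolution that Geny performs with integer linear programming is not attempted here.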

2.
Front Oncol ; 13: 1199741, 2023.
Article in English | MEDLINE | ID: mdl-37469403

ABSTRACT

Background: Next-generation sequencing (NGS), including whole genome sequencing (WGS) and whole exome sequencing (WES), is increasingly being used for clinical care. While NGS data have the potential to be repurposed to support clinical pharmacogenomics (PGx), current computational approaches have not been widely validated using clinical data. In this study, we assessed the accuracy of the Aldy computational method in extracting PGx genotypes from WGS and WES data for 14 and 13 major pharmacogenes, respectively. Methods: Germline DNA was isolated from whole blood samples collected for 264 patients seen at our institutional molecular solid tumor board. DNA was used for panel-based genotyping within our institutional Clinical Laboratory Improvement Amendments- (CLIA-) certified PGx laboratory. DNA was also sent to other CLIA-certified commercial laboratories for clinical WGS or WES. Aldy v3.3 and v4.4 were used to extract PGx genotypes from these NGS data, and results were compared to the panel-based genotyping reference standard that contained 45 star allele-defining variants within CYP2B6, CYP2C8, CYP2C9, CYP2C19, CYP2D6, CYP3A4, CYP3A5, CYP4F2, DPYD, G6PD, NUDT15, SLCO1B1, TPMT, and VKORC1. Results: Mean WGS read depth was >30x for all variant regions except for G6PD (average read depth 29x), and mean WES read depth was >30x for all variant regions. For 94 patients with WGS, Aldy v3.3 diplotype calls were concordant with those from the genotyping reference standard in 99.5% of cases when excluding diplotypes with additional major star alleles not tested by targeted genotyping, ambiguous phasing, and CYP2D6 hybrid alleles. Aldy v3.3 identified 15 additional clinically actionable star alleles not covered by genotyping within CYP2B6, CYP2C19, DPYD, SLCO1B1, and NUDT15. Within the WGS cohort, Aldy v4.4 diplotype calls were concordant with those from genotyping in 99.7% of cases.
When excluding patients with CYP2D6 copy number variation, all Aldy v4.4 diplotype calls except for one CYP3A4 diplotype call were concordant with genotyping for 161 patients in the WES cohort. Conclusion: Aldy v3.3 and v4.4 called diplotypes for major pharmacogenes from clinical WES and WGS data with >99% accuracy. These findings support the use of Aldy to repurpose clinical NGS data to inform clinical PGx.

3.
Genome Res ; 33(7): 1089-1100, 2023 07.
Article in English | MEDLINE | ID: mdl-37316351

ABSTRACT

Recent studies exploring the impact of methylation in tumor evolution suggest that although the methylation status of many CpG sites is preserved across distinct lineages, others are altered as the cancer progresses. Because changes in the methylation status of a CpG site may be retained through mitosis, they could be used to infer the progression history of a tumor via single-cell lineage tree reconstruction. In this work, we introduce the first principled distance-based computational method, Sgootr, for inferring a tumor's single-cell methylation lineage tree and for jointly identifying lineage-informative CpG sites that harbor changes in methylation status that are retained along the lineage. We apply Sgootr to single-cell bisulfite-treated whole-genome sequencing data of multiregionally sampled tumor cells from nine metastatic colorectal cancer patients, as well as to multiregionally sampled single-cell reduced-representation bisulfite sequencing data from a glioblastoma patient. We show that the tumor lineages constructed reveal a simple model underlying tumor progression and metastatic seeding. A comparison of Sgootr against alternative approaches shows that Sgootr can construct lineage trees with fewer migration events and in closer concordance with the sequential-progression model of tumor evolution, with a running time a fraction of that used in prior studies. Lineage-informative CpG sites identified by Sgootr are in inter-CpG island (CGI) regions, as opposed to intra-CGI regions, which have been the main regions of interest in genomic methylation-related analyses.


Subject(s)
DNA Methylation , Neoplasms , Humans , DNA Methylation/genetics , Sulfites , Sequence Analysis, DNA/methods , Genome , Neoplasms/genetics , CpG Islands/genetics
4.
Genome Res ; 33(1): 61-70, 2023 01.
Article in English | MEDLINE | ID: mdl-36657977

ABSTRACT

High-throughput sequencing provides sufficient means for determining genotypes of clinically important pharmacogenes that can be used to tailor medical decisions to individual patients. However, pharmacogene genotyping, also known as star-allele calling, is a challenging problem that requires accurate copy number calling, structural variation identification, variant calling, and phasing within each pharmacogene copy present in the sample. Here we introduce Aldy 4, a fast and efficient tool for genotyping pharmacogenes that uses combinatorial optimization for accurate star-allele calling across different sequencing technologies. Aldy 4 adds support for long reads and uses a novel phasing model and improved copy number and variant calling models. We compare Aldy 4 against the current state-of-the-art star-allele callers on a large and diverse set of samples and genes sequenced by various sequencing technologies, such as whole-genome and targeted Illumina sequencing, barcoded 10x Genomics, and Pacific Biosciences (PacBio) HiFi. We show that Aldy 4 is the most accurate star-allele caller with near-perfect accuracy in all evaluated contexts, and hope that Aldy remains an invaluable tool in the clinical toolbox even with the advent of long-read sequencing technologies.
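To make the star-allele calling problem concrete, the sketch below brute-forces every pair of alleles and scores how well their combined defining variants explain the observed variants. The allele table and variant IDs are hypothetical placeholders. The toy ignores copy number, zygosity, and phasing, all of which Aldy 4 handles through combinatorial optimization.

```python
from itertools import combinations_with_replacement

# Hypothetical star-allele definitions: each allele maps to its set of defining variants.
# These are illustrative placeholders, not a real pharmacogene database.
STAR_ALLELES = {
    "*1": frozenset(),
    "*2": frozenset({"v1"}),
    "*3": frozenset({"v2", "v3"}),
    "*4": frozenset({"v1", "v4"}),
}


def call_diplotype(observed, alleles=STAR_ALLELES):
    """Return all allele pairs tied for the best match to `observed`, plus that cost."""
    best, best_cost = [], None
    for a, b in combinations_with_replacement(sorted(alleles), 2):
        explained = alleles[a] | alleles[b]
        # Cost: observed variants left unexplained plus predicted variants not observed.
        cost = len(observed - explained) + len(explained - observed)
        if best_cost is None or cost < best_cost:
            best, best_cost = [(a, b)], cost
        elif cost == best_cost:
            best.append((a, b))
    return best, best_cost


print(call_diplotype({"v1", "v2", "v3"}))  # ([('*2', '*3')], 0)
print(call_diplotype({"v1"}))  # ([('*1', '*2'), ('*2', '*2')], 0)
```

The second call shows why set membership alone is not enough: *1/*2 and *2/*2 differ only in whether v1 is heterozygous or homozygous, which is the kind of ambiguity that copy-number and allele-fraction information is needed to resolve.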


Subject(s)
Pharmacogenetics , Polymorphism, Single Nucleotide , Humans , Alleles , Genotype , Genomics , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA
7.
Cell Syst ; 13(10): 808-816.e5, 2022 10 19.
Article in English | MEDLINE | ID: mdl-36265467

ABSTRACT

The human immunoglobulin heavy chain (IGH) locus on chromosome 14 includes more than 40 functional copies of the variable gene (IGHV), which are critical for the structure of antibodies that identify and neutralize pathogenic invaders as a part of the adaptive immune system. Because of its highly repetitive sequence composition, the IGH locus has been particularly difficult to assemble or genotype when using standard short-read sequencing technologies. Here, we introduce ImmunoTyper-SR, an algorithmic tool for the genotyping and CNV analysis of the germline IGHV genes on Illumina whole-genome sequencing (WGS) data using a combinatorial optimization formulation that resolves ambiguous read mappings. We have validated ImmunoTyper-SR on 12 individuals whose IGHV allele composition had been independently validated, as well as through concordance between WGS replicates from nine individuals. We then applied ImmunoTyper-SR to 585 COVID-19 patients to investigate the associations between IGHV alleles and anti-type I IFN autoantibodies, which were previously associated with COVID-19 severity.


Subject(s)
COVID-19 , Immunoglobulin Variable Region , Humans , Immunoglobulin Variable Region/genetics , Genotype , COVID-19/genetics , High-Throughput Nucleotide Sequencing , Immunoglobulin Heavy Chains/genetics , Autoantibodies/genetics
8.
Nat Commun ; 13(1): 6430, 2022 10 28.
Article in English | MEDLINE | ID: mdl-36307411

ABSTRACT

Computational identification and quantification of distinct microbes from high-throughput sequencing data is crucial for our understanding of human health. Existing methods either use accurate but computationally expensive alignment-based approaches or less accurate but computationally fast alignment-free approaches, which often fail to correctly assign reads to genomes. Here we introduce CAMMiQ, a combinatorial optimization framework to identify and quantify distinct genomes (specified by a database) in a metagenomic dataset. As a key methodological innovation, CAMMiQ uses substrings of variable length and those that appear in two genomes in the database, as opposed to the commonly used fixed-length, unique substrings. These substrings allow CAMMiQ to accurately decouple mixtures of highly similar genomes, resulting in higher accuracy than the leading alternatives without requiring additional computational resources, as demonstrated on commonly used benchmarking datasets. Importantly, we show that CAMMiQ can distinguish closely related bacterial strains in simulated metagenomic and real single-cell metatranscriptomic data.
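The idea of indexing substrings that occur in only one or two reference genomes can be sketched by brute force. This is a simplified illustration, assuming a small in-memory dict of genomes and a fixed length range; CAMMiQ's own index, its selection of variable-length substrings, and its optimization-based quantification are not reproduced, and `low_multiplicity_substrings` is a hypothetical name.

```python
from collections import defaultdict


def low_multiplicity_substrings(genomes, min_len, max_len, max_genomes=2):
    """Map each substring of length min_len..max_len to the genomes containing it,
    keeping only substrings found in at most `max_genomes` genomes.

    Brute force over all substrings: fine for toy inputs, far too slow for real databases.
    """
    owners = defaultdict(set)
    for name, seq in genomes.items():
        for length in range(min_len, max_len + 1):
            for i in range(len(seq) - length + 1):
                owners[seq[i:i + length]].add(name)
    return {s: g for s, g in owners.items() if len(g) <= max_genomes}


toy = {"g1": "ACGTA", "g2": "ACGTT", "g3": "ACGCC"}
index = low_multiplicity_substrings(toy, 3, 3)
# "ACG" occurs in all three genomes and is discarded; "CGT" is kept as a
# two-genome substring shared by g1 and g2.
print(sorted(index))
```

Keeping two-genome substrings, rather than only unique ones, is what lets a read that hits "CGT" still constrain the mixture to {g1, g2} even when g1 and g2 share most of their sequence.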


Subject(s)
Metagenome , Metagenomics , Humans , Metagenomics/methods , Metagenome/genetics , High-Throughput Nucleotide Sequencing/methods , Bacteria/genetics , Algorithms , Sequence Analysis, DNA/methods
9.
J Mol Diagn ; 24(6): 576-585, 2022 06.
Article in English | MEDLINE | ID: mdl-35452844

ABSTRACT

Germline whole exome sequencing from molecular tumor boards has the potential to be repurposed to support clinical pharmacogenomics. However, accurately calling pharmacogenomics-relevant genotypes from exome sequencing data remains challenging. Accordingly, this study assessed the analytical validity of the computational tool, Aldy, in calling pharmacogenomics-relevant genotypes from exome sequencing data for 13 major pharmacogenes. Germline DNA from whole blood was obtained for 164 subjects seen at an institutional molecular solid tumor board. All subjects had whole exome sequencing from Ashion Analytics and panel-based genotyping from an institutional pharmacogenomics laboratory. Aldy version 3.3 was operationalized on the LifeOmic Precision Health Cloud with copy number fixed to two copies per gene. Aldy results were compared with those from genotyping for 56 star allele-defining variants within CYP2B6, CYP2C8, CYP2C9, CYP2C19, CYP2D6, CYP3A4, CYP3A5, CYP4F2, DPYD, G6PD, NUDT15, SLCO1B1, and TPMT. Read depth was >100× for all variants except CYP3A4*22. For 75 subjects in the validation cohort, all 3393 Aldy variant calls were concordant with genotyping. Aldy calls for 736 diplotypes containing alleles assessed by both platforms were also concordant. Aldy identified additional star alleles not covered by targeted genotyping for 139 diplotypes. Aldy accurately called variants and diplotypes for 13 major pharmacogenes, except for CYP2D6 variants involving copy number variations, thus allowing repurposing of whole exome sequencing to support clinical pharmacogenomics.


Subject(s)
Cytochrome P-450 CYP2D6 , Pharmacogenetics , Cytochrome P-450 CYP2D6/genetics , Cytochrome P-450 CYP3A/genetics , DNA Copy Number Variations/genetics , Genotype , High-Throughput Nucleotide Sequencing , Humans , Liver-Specific Organic Anion Transporter 1/genetics , Pharmacogenetics/methods , Exome Sequencing
10.
bioRxiv ; 2022 Feb 02.
Article in English | MEDLINE | ID: mdl-35132409

ABSTRACT

The human immunoglobulin heavy chain (IGH) locus on chromosome 14 includes more than 40 functional copies of the variable gene (IGHV), which, together with the joining genes (IGHJ), diversity genes (IGHD), constant genes (IGHC) and immunoglobulin light chains, code for antibodies that identify and neutralize pathogenic invaders as a part of the adaptive immune system. Because of its highly repetitive sequence composition, the IGH locus has been particularly difficult to assemble or genotype through the use of standard short-read sequencing technologies. Here we introduce ImmunoTyper-SR, an algorithmic method for genotype and CNV analysis of the germline IGHV genes using Illumina whole genome sequencing (WGS) data. ImmunoTyper-SR is based on a novel combinatorial optimization formulation that aims to minimize the total edit distance between reads and their assigned IGHV alleles from a given database, with constraints on the number and distribution of reads across each called allele. We have validated ImmunoTyper-SR on 12 individuals with Illumina WGS data from the 1000 Genomes Project, whose IGHV allele composition has been studied extensively through the use of long-read and targeted sequencing platforms, as well as on nine individuals from the NIAID COVID Consortium who have been subjected to WGS twice. We then applied ImmunoTyper-SR to 585 samples from the NIAID COVID Consortium to investigate associations between distinct IGHV alleles and anti-type I IFN autoantibodies, which have been linked to COVID-19 severity.
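A small sketch of the kind of objective described above: choose k alleles so that the total edit distance from each read to its closest called allele is minimal, while every called allele receives at least a minimum number of reads. It assumes a toy allele database and replaces ImmunoTyper-SR's integer linear program with exhaustive search over allele subsets. Reads are scored with a fitting (semi-global) edit distance so that a short read may align anywhere inside an allele. `fitting_distance`, `call_alleles`, `min_reads`, and the allele names are hypothetical.

```python
from itertools import combinations


def fitting_distance(read, allele):
    """Fewest edits needed to align all of `read` to some substring of `allele`."""
    prev = [0] * (len(allele) + 1)  # skipping a prefix of the allele is free
    for i, rc in enumerate(read, 1):
        cur = [i]
        for j, ac in enumerate(allele, 1):
            cur.append(min(prev[j] + 1,               # read base deleted
                           cur[j - 1] + 1,            # allele base inserted
                           prev[j - 1] + (rc != ac)))  # match or substitution
        prev = cur
    return min(prev)  # skipping a suffix of the allele is free


def call_alleles(reads, alleles, k, min_reads=1):
    """Exhaustively pick k alleles minimizing total read-to-allele distance.

    Each read is assigned to its closest allele in the candidate set; sets in which
    some allele receives fewer than `min_reads` reads are rejected.
    Returns (total_distance, allele_tuple), or None if no set is feasible.
    """
    dist = {(r, a): fitting_distance(r, seq) for r in set(reads) for a, seq in alleles.items()}
    best = None
    for subset in combinations(sorted(alleles), k):
        assigned = {a: 0 for a in subset}
        total = 0
        for r in reads:
            a = min(subset, key=lambda x: dist[(r, x)])
            assigned[a] += 1
            total += dist[(r, a)]
        if min(assigned.values()) < min_reads:
            continue
        if best is None or total < best[0]:
            best = (total, subset)
    return best


alleles = {"IGHV-a": "ACGTACGTAA", "IGHV-b": "ACGTTCGTAA", "IGHV-c": "GGGGCCCCTT"}
reads = ["TACGTA", "TACGTA", "TTCGTA", "TTCGTA"]
print(call_alleles(reads, alleles, k=2))
```

Exhaustive search grows combinatorially with the database size, which is why a solver-based formulation is the practical choice for the 40+ IGHV genes and their many alleles.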

11.
Pac Symp Biocomput ; 27: 397-401, 2022.
Article in English | MEDLINE | ID: mdl-34890166

ABSTRACT

Cancer results from an evolutionary process that yields a heterogeneous tumor with distinct subpopulations and varying sets of somatic mutations. This perspective discusses computational methods to infer models of evolutionary processes in cancer that aim to improve our understanding of tumorigenesis and ultimately enhance current clinical practice.


Subject(s)
Computational Biology , Neoplasms , Humans , Mutation , Neoplasms/genetics
12.
Nat Comput Sci ; 2(9): 577-583, 2022 Sep.
Article in English | MEDLINE | ID: mdl-38177468

ABSTRACT

We introduce HUNTRESS, a computational method for mutational intratumor heterogeneity inference from noisy genotype matrices derived from single-cell sequencing data, whose running time is linear in the number of cells and quadratic in the number of mutations. We prove that, under reasonable conditions, HUNTRESS computes the true progression history of a tumor with high probability. On simulated and real tumor sequencing data, HUNTRESS is demonstrated to be faster than available alternatives with comparable or better accuracy. Additionally, the progression histories of tumors inferred by HUNTRESS on real single-cell sequencing datasets agree with the best known evolution scenarios for the associated tumors.


Subject(s)
Neoplasms , Humans , Neoplasms/genetics , Sequence Analysis , Mutation
14.
Cell Syst ; 12(10): 983-993.e7, 2021 10 20.
Article in English | MEDLINE | ID: mdl-34450045

ABSTRACT

Genotype imputation is an essential tool in genomics research, whereby missing genotypes are inferred using reference genomes to enhance downstream analyses. Recently, public imputation servers have allowed researchers to leverage large-scale genomic data resources for imputation. However, privacy concerns about uploading one's genetic data to a server limit the utility of these services. We introduce a secure hardware-based solution for privacy-preserving genotype imputation, which keeps the input genomes private by processing them within Intel SGX's trusted execution environment. Our solution features SMac, an efficient and secure imputation algorithm designed for Intel SGX, which employs a state-of-the-art imputation strategy also utilized by existing imputation servers. SMac achieves imputation accuracy equivalent to existing tools and provides protection against known side-channel attacks on SGX while maintaining scalability. We also show the necessity of our enhanced security by identifying vulnerabilities in existing imputation software. Our work represents a step toward privacy-preserving genomic analysis services.


Subject(s)
Genomics , Privacy , Algorithms , Genotype , Software
15.
Cell ; 184(8): 2239-2254.e39, 2021 04 15.
Article in English | MEDLINE | ID: mdl-33831375

ABSTRACT

Intra-tumor heterogeneity (ITH) is a mechanism of therapeutic resistance and therefore an important clinical challenge. However, the extent, origin, and drivers of ITH across cancer types are poorly understood. To address this, we extensively characterize ITH across whole-genome sequences of 2,658 cancer samples spanning 38 cancer types. Nearly all informative samples (95.1%) contain evidence of distinct subclonal expansions with frequent branching relationships between subclones. We observe positive selection of subclonal driver mutations across most cancer types and identify cancer type-specific subclonal patterns of driver gene mutations, fusions, structural variants, and copy number alterations as well as dynamic changes in mutational processes between subclonal expansions. Our results underline the importance of ITH and its drivers in tumor evolution and provide a pan-cancer resource of comprehensively annotated subclonal events from whole-genome sequencing data.


Subject(s)
Genetic Heterogeneity , Neoplasms/genetics , DNA Copy Number Variations , DNA, Neoplasm/chemistry , DNA, Neoplasm/metabolism , Databases, Genetic , Drug Resistance, Neoplasm/genetics , Humans , Neoplasms/pathology , Polymorphism, Single Nucleotide , Whole Genome Sequencing
16.
iScience ; 23(11): 101655, 2020 Nov 20.
Article in English | MEDLINE | ID: mdl-33117968

ABSTRACT

Principled computational approaches for tumor phylogeny reconstruction via single-cell sequencing typically aim to build the most likely perfect phylogeny tree from the noisy genotype matrix, which represents the genotype calls of single cells. This problem is NP-hard, and as a result, existing approaches aim to solve relatively small instances of it through combinatorial optimization techniques or Bayesian inference. As expected, even when the goal is to infer basic topological features of the tumor phylogeny, rather than reconstructing the topology entirely, these approaches can be prohibitively slow. In this paper, we introduce fast deep learning solutions to the problems of inferring whether the most likely tree has a linear (chain) or branching topology and whether a perfect phylogeny is feasible from a given genotype matrix. We also present a reinforcement learning approach for reconstructing the most likely tumor phylogeny. This preliminary work demonstrates that data-driven approaches can reconstruct key features of tumor evolution.
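The two questions above have exact combinatorial answers when the binary matrix is error-free. Under the infinite-sites assumption with an unmutated root, a cell-by-mutation matrix admits a perfect phylogeny if and only if no pair of mutation columns contains all three row patterns (1,1), (1,0) and (0,1). For such a matrix, the topology is a single chain when every pair of columns has nested supports (the cells carrying one mutation are a subset of the cells carrying the other). The sketch below checks both conditions directly; it is not the paper's deep learning approach, which targets noisy matrices, and the function names are hypothetical.

```python
from itertools import combinations


def admits_perfect_phylogeny(matrix):
    """True if a binary cell-by-mutation matrix is conflict-free (three-gametes test).

    Assumes infinite sites and a root carrying no mutations.
    """
    n_mut = len(matrix[0]) if matrix else 0
    for p, q in combinations(range(n_mut), 2):
        patterns = {(row[p], row[q]) for row in matrix}
        if {(1, 1), (1, 0), (0, 1)} <= patterns:
            return False
    return True


def is_linear_topology(matrix):
    """For a conflict-free matrix, True if all mutations lie on one chain.

    Every pair of columns must have nested supports; two mutations carried by
    disjoint, non-empty sets of cells imply a branching.
    """
    n_mut = len(matrix[0]) if matrix else 0
    supports = [{i for i, row in enumerate(matrix) if row[j]} for j in range(n_mut)]
    return all(a <= b or b <= a for a, b in combinations(supports, 2))


print(admits_perfect_phylogeny([[1, 0], [0, 1], [1, 1]]))  # False: columns conflict
print(is_linear_topology([[1, 1], [1, 0], [0, 0]]))  # True: supports {0,1} and {0}
```

On real single-cell data, false positives, false negatives, and missing entries break these exact tests, which is why the noisy-matrix setting calls for the likelihood-based or learned approaches the abstract discusses.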

17.
iScience ; 23(9): 101508, 2020 Sep 25.
Article in English | MEDLINE | ID: mdl-32896768

ABSTRACT

[This corrects the article DOI: 10.1016/j.isci.2020.100883.].

18.
Bioinformatics ; 36(Suppl_1): i169-i176, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32657358

ABSTRACT

MOTIVATION: Recent advances in single-cell sequencing (SCS) offer an unprecedented insight into tumor emergence and evolution. Principled approaches to tumor phylogeny reconstruction via SCS data are typically based on general computational methods for solving an integer linear program, or a constraint satisfaction program, which, although guaranteeing convergence to the most likely solution, are very slow. Others based on Markov chain Monte Carlo or alternative heuristics not only offer no such guarantee, but are also not faster in practice. As a result, novel methods that can scale up to handle the size and noise characteristics of emerging SCS data are highly desirable to fully utilize this technology. RESULTS: We introduce PhISCS-BnB (phylogeny inference using SCS via branch and bound), a branch and bound algorithm to compute the most likely perfect phylogeny on an input genotype matrix extracted from an SCS dataset. PhISCS-BnB not only offers an optimality guarantee, but is also 10-100 times faster than the best available methods on simulated tumor SCS data. We also applied PhISCS-BnB on a recently published large melanoma dataset derived from the sublineages of a cell line involving 20 clones with 2367 mutations, which returned the optimal tumor phylogeny in <4 h. The resulting phylogeny agrees with and extends the published results by providing a more detailed picture of the clonal evolution of the tumor. AVAILABILITY AND IMPLEMENTATION: https://github.com/algo-cancer/PhISCS-BnB. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Neoplasms , Humans , Markov Chains , Neoplasms/genetics , Phylogeny , Sequence Analysis , Software
19.
Bioinformatics ; 36(Suppl_1): i427-i435, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32657374

ABSTRACT

MOTIVATION: As multi-region, time-series and single-cell sequencing data become more widely available, it is becoming clear that certain tumors share evolutionary characteristics with others. In the last few years, several computational methods have been developed with the goal of inferring the subclonal composition and evolutionary history of tumors from tumor biopsy sequencing data. However, the phylogenetic trees that they report differ significantly between tumors (even those with similar characteristics). RESULTS: In this article, we present a novel combinatorial optimization method, CONETT, for detection of recurrent tumor evolution trajectories. Our method constructs a consensus tree of conserved evolutionary trajectories based on the information about temporal order of alteration events in a set of tumors. We apply our method to previously published datasets of 100 clear-cell renal cell carcinoma and 99 non-small-cell lung cancer patients and identify both conserved trajectories that were reported in the original studies and new trajectories. AVAILABILITY AND IMPLEMENTATION: CONETT is implemented in C++ and available at https://github.com/ehodzic/CONETT. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Carcinoma, Non-Small-Cell Lung , Lung Neoplasms , Humans , Phylogeny , Software
20.
Nat Methods ; 17(3): 295-301, 2020 03.
Article in English | MEDLINE | ID: mdl-32132732

ABSTRACT

Genome-wide association studies (GWAS), especially on rare diseases, may necessitate exchange of sensitive genomic data between multiple institutions. Since genomic data sharing is often infeasible due to privacy concerns, cryptographic methods, such as secure multiparty computation (SMC) protocols, have been developed with the aim of offering privacy-preserving collaborative GWAS. Unfortunately, the computational overhead of these methods remains prohibitive for human-genome-scale data. Here we introduce SkSES (https://github.com/ndokmai/sgx-genome-variants-search), a hardware-software hybrid approach for privacy-preserving collaborative GWAS, which improves the running time of the most advanced cryptographic protocols by two orders of magnitude. The SkSES approach is based on trusted execution environments (TEEs) offered by current-generation microprocessors, in particular Intel's SGX. To overcome the severe memory limitation of the TEEs, SkSES employs novel 'sketching' algorithms that maintain essential statistical information on genomic variants in input VCF files. By additionally incorporating efficient data compression and population stratification reduction methods, SkSES identifies the top k genomic variants in a cohort quickly, accurately and in a privacy-preserving manner.
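To illustrate how a fixed-memory sketch can keep count statistics for many variants, the following is a textbook Count-Min sketch with a top-k query. It is a generic stand-in that assumes simple per-variant counts; SkSES's own sketching algorithms, its association statistics, and the SGX enclave are not modeled, and the variant keys are hypothetical.

```python
import hashlib
import heapq


class CountMinSketch:
    """Approximate counter in fixed memory: estimates can overcount but never undercount."""

    def __init__(self, width, depth):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, key):
        # One independent-looking hash per row, derived by salting the key with the row index.
        for row in range(self.depth):
            digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
            yield row, int.from_bytes(digest, "big") % self.width

    def add(self, key, count=1):
        for row, col in self._buckets(key):
            self.table[row][col] += count

    def estimate(self, key):
        # Collisions only inflate counters, so the minimum over rows is the tightest bound.
        return min(self.table[row][col] for row, col in self._buckets(key))


def top_k(sketch, candidates, k):
    """Return the k candidate keys with the largest estimated counts."""
    return heapq.nlargest(k, candidates, key=sketch.estimate)


cms = CountMinSketch(width=1000, depth=4)
for variant, n in [("chr1:100:A>G", 50), ("chr2:200:C>T", 30), ("chr3:300:G>A", 5)]:
    cms.add(variant, n)
print(top_k(cms, ["chr1:100:A>G", "chr2:200:C>T", "chr3:300:G>A"], 2))
```

Memory is fixed at `width * depth` counters regardless of how many variants are added, which is the property that matters inside a memory-constrained TEE; the candidate keys must be supplied separately, because the sketch itself does not store them.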


Subject(s)
Computational Biology/methods , Genome-Wide Association Study , Genomics/methods , Algorithms , Genetic Variation , Genome, Human , Genotype , Humans , Phenotype , Polymorphism, Single Nucleotide , Software