Results 1 - 20 of 53
1.
bioRxiv ; 2024 May 03.
Article in English | MEDLINE | ID: mdl-38746331

ABSTRACT

Cancer is an evolutionary disease driven by mutations in asexually-reproducing somatic cells. In asexual microbes, bias reversals in the mutation spectrum can speed adaptation by increasing access to previously undersampled beneficial mutations. By analyzing tumors from 20 tissues, along with normal tissue and the germline, we demonstrate this effect in cancer. Non-hypermutated tumors reverse the germline mutation bias and have consistent spectra across tissues. These spectra changes carry the signature of hypoxia, and they facilitate positive selection in cancer genes. Hypermutated and non-hypermutated tumors thus acquire driver mutations differently: hypermutated tumors by higher mutation rates and non-hypermutated tumors by changing the mutation spectrum to reverse the germline mutation bias.

2.
Mol Biol Evol ; 41(5), 2024 May 03.
Article in English | MEDLINE | ID: mdl-38636507

ABSTRACT

Inferring past demographic history of natural populations from genomic data is of central concern in many studies across research fields. Previously, our group had developed dadi, a widely used demographic history inference method based on the allele frequency spectrum (AFS) and maximum composite-likelihood optimization. However, dadi's optimization procedure can be computationally expensive. Here, we present donni (demography optimization via neural network inference), a new inference method based on dadi that is more efficient while maintaining comparable inference accuracy. For each dadi-supported demographic model, donni simulates the expected AFS for a range of model parameters then trains a set of Mean Variance Estimation neural networks using the simulated AFS. Trained networks can then be used to instantaneously infer the model parameters from future genomic data summarized by an AFS. We demonstrate that for many demographic models, donni can infer some parameters, such as population size changes, very well and other parameters, such as migration rates and times of demographic events, fairly well. Importantly, donni provides both parameter and confidence interval estimates from input AFS with accuracy comparable to parameters inferred by dadi's likelihood optimization while bypassing its long and computationally intensive evaluation process. donni's performance demonstrates that supervised machine learning algorithms may be a promising avenue for developing more sustainable and computationally efficient demographic history inference methods.


Subject(s)
Gene Frequency, Genetic Models, Supervised Machine Learning, Population Genetics/methods, Computer Neural Networks, Humans
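The Mean Variance Estimation networks named in the abstract above output both a point estimate and an uncertainty for each parameter. The following is a minimal sketch of that idea, assuming the common formulation in which the network predicts a mean and a log-variance and is trained with a Gaussian negative log-likelihood. The function names are hypothetical and are not part of donni or dadi; the actual implementation may differ.

```python
import math
from statistics import NormalDist


def gaussian_nll(y, mean, log_var):
    """Gaussian negative log-likelihood of target y given a predicted
    mean and log-variance (the usual mean-variance-estimation loss)."""
    return 0.5 * (math.log(2 * math.pi) + log_var
                  + (y - mean) ** 2 / math.exp(log_var))


def confidence_interval(mean, log_var, level=0.95):
    """Turn a predicted mean and log-variance into a symmetric interval,
    assuming the prediction error is normally distributed."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.exp(0.5 * log_var)
    return mean - half_width, mean + half_width


# Hypothetical network output for one demographic parameter.
predicted_mean, predicted_log_var = 1.0, 0.0
low, high = confidence_interval(predicted_mean, predicted_log_var)
print(f"estimate {predicted_mean:.2f}, 95% interval ({low:.2f}, {high:.2f})")
```

Because the loss penalizes both a wrong mean and an over- or under-confident variance, a single trained network can report an estimate together with an interval, which is the behavior the abstract describes.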
3.
bioRxiv ; 2024 Feb 15.
Article in English | MEDLINE | ID: mdl-38405827

ABSTRACT

Inferring past demographic history of natural populations from genomic data is of central concern in many studies across research fields. Previously, our group had developed dadi, a widely used demographic history inference method based on the allele frequency spectrum (AFS) and maximum composite likelihood optimization. However, dadi's optimization procedure can be computationally expensive. Here, we developed donni (demography optimization via neural network inference), a new inference method based on dadi that is more efficient while maintaining comparable inference accuracy. For each dadi-supported demographic model, donni simulates the expected AFS for a range of model parameters then trains a set of Mean Variance Estimation neural networks using the simulated AFS. Trained networks can then be used to instantaneously infer the model parameters from future input data AFS. We demonstrated that for many demographic models, donni can infer some parameters, such as population size changes, very well and other parameters, such as migration rates and times of demographic events, fairly well. Importantly, donni provides both parameter and confidence interval estimates from input AFS with accuracy comparable to parameters inferred by dadi's likelihood optimization while bypassing its long and computationally intensive evaluation process. donni's performance demonstrates that supervised machine learning algorithms may be a promising avenue for developing more sustainable and computationally efficient demographic history inference methods.

4.
Am Nat ; 202(4): 503-518, 2023 10.
Article in English | MEDLINE | ID: mdl-37792927

ABSTRACT

Recent experimental evidence demonstrates that shifts in mutational biases (for example, increases in transversion frequency) can change the distribution of fitness effects of mutations (DFE). In particular, reducing or reversing a prevailing bias can increase the probability that a de novo mutation is beneficial. It has also been shown that mutator bacteria are more likely to emerge if the beneficial mutations they generate have a larger effect size than observed in the wild type. Here, we connect these two results, demonstrating that mutator strains that reduce or reverse a prevailing bias have a positively shifted DFE, which in turn can dramatically increase their emergence probability. Since changes in mutation rate and bias are often coupled through the gain and loss of DNA repair enzymes, our results predict that the invasion of mutator strains will be facilitated by shifts in mutation bias that offer improved access to previously undersampled beneficial mutations.


Subject(s)
Mutation Rate, Mutation
5.
bioRxiv ; 2023 Jun 16.
Article in English | MEDLINE | ID: mdl-37398279

ABSTRACT

Summary: dadi is a popular software package for inferring models of demographic history and natural selection from population genomic data. But using dadi requires Python scripting and manual parallelization of optimization jobs. We developed dadi-cli to simplify dadi usage and also enable straightforward distributed computing. Availability and Implementation: dadi-cli is implemented in Python and released under the Apache License 2.0. The source code is available at https://github.com/xin-huang/dadi-cli . dadi-cli can be installed via PyPI and conda, and is also available through Cacao on Jetstream2 https://cacao.jetstream-cloud.org/ .

6.
Elife ; 12, 2023 06 21.
Article in English | MEDLINE | ID: mdl-37342968

ABSTRACT

Simulation is a key tool in population genetics for both methods development and empirical research, but producing simulations that recapitulate the main features of genomic datasets remains a major obstacle. Today, more realistic simulations are possible thanks to large increases in the quantity and quality of available genetic data, and the sophistication of inference and simulation software. However, implementing these simulations still requires substantial time and specialized knowledge. These challenges are especially pronounced for simulating genomes for species that are not well-studied, since it is not always clear what information is required to produce simulations with a level of realism sufficient to confidently answer a given question. The community-developed framework stdpopsim seeks to lower this barrier by facilitating the simulation of complex population genetic models using up-to-date information. The initial version of stdpopsim focused on establishing this framework using six well-characterized model species (Adrion et al., 2020). Here, we report on major improvements made in the new release of stdpopsim (version 0.2), which includes a significant expansion of the species catalog and substantial additions to simulation capabilities. Features added to improve the realism of the simulated genomes include non-crossover recombination and provision of species-specific genomic annotations. Through community-driven efforts, we expanded the number of species in the catalog more than threefold and broadened coverage across the tree of life. During the process of expanding the catalog, we have identified common sticking points and developed the best practices for setting up genome-scale simulations. We describe the input data required for generating a realistic simulation, suggest good practices for obtaining the relevant information from the literature, and discuss common pitfalls and major considerations. 
These improvements to stdpopsim aim to further promote the use of realistic whole-genome population genetic simulations, especially in non-model organisms, making them available, transparent, and accessible to everyone.


Subject(s)
Genome, Software, Computer Simulation, Population Genetics, Genomics
7.
Genetics ; 224(4), 2023 08 09.
Article in English | MEDLINE | ID: mdl-37279657

ABSTRACT

Polyploidy is an important generator of evolutionary novelty across diverse groups in the Tree of Life, including many crops. However, the impact of whole-genome duplication depends on the mode of formation: doubling within a single lineage (autopolyploidy) versus doubling after hybridization between two different lineages (allopolyploidy). Researchers have historically treated these two scenarios as completely separate cases based on patterns of chromosome pairing, but these cases represent ideals on a continuum of chromosomal interactions among duplicated genomes. Understanding the history of polyploid species thus demands quantitative inferences of demographic history and rates of exchange between subgenomes. To meet this need, we developed diffusion models for genetic variation in polyploids with subgenomes that cannot be bioinformatically separated and with potentially variable inheritance patterns, implementing them in the dadi software. We validated our models using forward SLiM simulations and found that our inference approach is able to accurately infer evolutionary parameters (timing, bottleneck size) involved with the formation of auto- and allotetraploids, as well as exchange rates in segmental allotetraploids. We then applied our models to empirical data for allotetraploid shepherd's purse (Capsella bursa-pastoris), finding evidence for allelic exchange between the subgenomes. Taken together, our model provides a foundation for demographic modeling in polyploids using diffusion equations, which will help increase our understanding of the impact of demography and selection in polyploid lineages.


Subject(s)
Capsella, Polyploidy, Biological Evolution, Genetic Hybridization, Capsella/genetics, Demography
8.
Genetics ; 222(3), 2022 11 01.
Article in English | MEDLINE | ID: mdl-36173327

ABSTRACT

Understanding the demographic history of populations is a key goal in population genetics, and with improving methods and data, ever more complex models are being proposed and tested. Demographic models of current interest typically consist of a set of discrete populations, their sizes and growth rates, and continuous and pulse migrations between those populations over a number of epochs, which can require dozens of parameters to fully describe. There is currently no standard format to define such models, significantly hampering progress in the field. In particular, the important task of translating the model descriptions in published work into input suitable for population genetic simulators is labor intensive and error prone. We propose the Demes data model and file format, built on widely used technologies, to alleviate these issues. Demes provides a well-defined and unambiguous model of populations and their properties that is straightforward to implement in software, and a text file format that is designed for simplicity and clarity. We provide thoroughly tested implementations of Demes parsers in multiple languages including Python and C, and showcase initial support in several simulators and inference methods. An introduction to the file format and a detailed specification are available at https://popsim-consortium.github.io/demes-spec-docs/.


Subject(s)
Population Genetics, Software, Demography
9.
Elife ; 11, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35787784

ABSTRACT

Background: Lymphatic malformations (LMs) often pose treatment challenges due to a large size or a critical location that could lead to disfigurement, and there are no standardized treatment approaches for either refractory or unresectable cases. Methods: We examined the genomic landscape of a patient cohort of LMs (n = 30 cases) that underwent comprehensive genomic profiling using a large-panel next-generation sequencing assay. Immunohistochemical analyses were completed in parallel. Results: These LMs had low mutational burden with hotspot PIK3CA mutations (n = 20) and NRAS (n = 5) mutations being most frequent, and mutually exclusive. All LM cases with Kaposi sarcoma-like (kaposiform) histology had NRAS mutations. One index patient presented with subacute abdominal pain and was diagnosed with a large retroperitoneal LM harboring a somatic PIK3CA gain-of-function mutation (H1047R). The patient achieved a rapid and durable radiologic complete response, as defined in RECIST1.1, to the PI3Kα inhibitor alpelisib within the context of a personalized N-of-1 clinical trial (NCT03941782). In translational correlative studies, canonical PI3Kα pathway activation was confirmed by immunohistochemistry and human LM-derived lymphatic endothelial cells carrying an allele with an activating mutation at the same locus were sensitive to alpelisib treatment in vitro, which was demonstrated by a concentration-dependent drop in measurable impedance, an assessment of cell status. Conclusions: Our findings establish that LM patients with conventional or kaposiform histology have distinct, yet targetable, driver mutations. Funding: R.P. and W.A. are supported by awards from the Levy-Longenbaugh Fund. S.G. is supported by awards from the Hugs for Brady Foundation. 
This work has been funded in part by the NCI Cancer Center Support Grants (CCSG; P30) to the University of Arizona Cancer Center (CA023074), the University of New Mexico Comprehensive Cancer Center (CA118100), and the Rutgers Cancer Institute of New Jersey (CA072720). B.K.M. was supported by National Science Foundation via Graduate Research Fellowship DGE-1143953. Clinical trial number: NCT03941782.


Subject(s)
Antineoplastic Agents, Class I Phosphatidylinositol 3-Kinases, GTP Phosphohydrolases, Lymphangioma, Lymphatic Abnormalities, Membrane Proteins, Thiazoles, Antineoplastic Agents/pharmacology, Antineoplastic Agents/therapeutic use, Class I Phosphatidylinositol 3-Kinases/antagonists & inhibitors, Class I Phosphatidylinositol 3-Kinases/genetics, Class I Phosphatidylinositol 3-Kinases/metabolism, Class Ia Phosphatidylinositol 3-Kinase/metabolism, Endothelial Cells/drug effects, Endothelial Cells/metabolism, GTP Phosphohydrolases/genetics, Genomics, High-Throughput Nucleotide Sequencing, Humans, Immunohistochemistry, Lymphangioma/drug therapy, Lymphangioma/genetics, Lymphatic Abnormalities/drug therapy, Lymphatic Abnormalities/genetics, Membrane Proteins/genetics, Mutation, DNA Sequence Analysis, Thiazoles/pharmacology, Thiazoles/therapeutic use
10.
Genome Biol Evol ; 14(7), 2022 07 02.
Article in English | MEDLINE | ID: mdl-35675379

ABSTRACT

As both natural selection and population history can affect genome-wide patterns of variation, disentangling the contributions of each has remained a major challenge in population genetics. We here discuss historical and recent progress towards this goal, highlighting theoretical and computational challenges that remain to be addressed, as well as inherent difficulties in dealing with model complexity and model violations, and offer thoughts on potentially fruitful next steps.


Subject(s)
Genetic Variation, Genetic Models, Population Genetics, Genome, Genetic Selection
11.
Cell ; 185(11): 1842-1859.e18, 2022 05 26.
Article in English | MEDLINE | ID: mdl-35561686

ABSTRACT

The precise genetic origins of the first Neolithic farming populations in Europe and Southwest Asia, as well as the processes and the timing of their differentiation, remain largely unknown. Demogenomic modeling of high-quality ancient genomes reveals that the early farmers of Anatolia and Europe emerged from a multiphase mixing of a Southwest Asian population with a strongly bottlenecked western hunter-gatherer population after the last glacial maximum. Moreover, the ancestors of the first farmers of Europe and Anatolia went through a period of extreme genetic drift during their westward range expansion, contributing highly to their genetic distinctiveness. This modeling elucidates the demographic processes at the root of the Neolithic transition and leads to a spatial interpretation of the population history of Southwest Asia and Europe during the late Pleistocene and early Holocene.


Subject(s)
Farmers, Genome, Agriculture, Mitochondrial DNA/genetics, Europe, Genetic Drift, Genomics, Ancient History, Human Migration, Humans
12.
Mol Ecol ; 31(9): 2511-2527, 2022 05.
Article in English | MEDLINE | ID: mdl-35152496

ABSTRACT

Largely understudied, mesophotic coral ecosystems lie below shallow reefs (at >30 m depth) and comprise ecologically distinct communities. Brooding reproductive modes appear to predominate among mesophotic-specialist corals and may limit genetic connectivity among populations. Using reduced representation genomic sequencing, we assessed spatial population genetic structure at 50 m depth in an ecologically important mesophotic-specialist species Agaricia grahamae, among locations in the Southern Caribbean. We also tested for hybridisation with the closely related (but depth-generalist) species Agaricia lamarcki, within their sympatric depth zone (50 m). In contrast to our expectations, no spatial genetic structure was detected between the reefs of Curaçao and Bonaire (~40 km apart) within A. grahamae. However, cryptic taxa were discovered within both taxonomic species, with those in A. lamarcki (incompletely) partitioned by depth and those in A. grahamae occurring sympatrically (at the same depth). Hybrid analyses and demographic modelling identified contemporary and historical gene flow among cryptic taxa, both within and between A. grahamae and A. lamarcki. These results (1) indicate that spatial connectivity and subsequent replenishment may be possible between islands of moderate geographic distances for A. grahamae, an ecologically important mesophotic species, (2) show that cryptic taxa occur in the mesophotic zone and that environmental selection along shallow to mesophotic depth gradients may drive divergence in depth-generalists such as A. lamarcki, and (3) highlight that gene flow links taxa within this relatively diverse Caribbean genus.


Subject(s)
Anthozoa, Animals, Anthozoa/genetics, Coral Reefs, Ecosystem, Gene Flow, Reproduction
13.
Mol Biol Evol ; 38(10): 4588-4602, 2021 09 27.
Article in English | MEDLINE | ID: mdl-34043790

ABSTRACT

The effect of a mutation on fitness may differ between populations depending on environmental and genetic context, but little is known about the factors that underlie such differences. To quantify genome-wide correlations in mutation fitness effects, we developed a novel concept called a joint distribution of fitness effects (DFE) between populations. We then proposed a new statistic w to measure the DFE correlation between populations. Using simulation, we showed that inferring the DFE correlation from the joint allele frequency spectrum is statistically precise and robust. Using population genomic data, we inferred DFE correlations of populations in humans, Drosophila melanogaster, and wild tomatoes. In these species, we found that the overall correlation of the joint DFE was inversely related to genetic differentiation. In humans and D. melanogaster, deleterious mutations had a lower DFE correlation than tolerated mutations, indicating a complex joint DFE. Altogether, the DFE correlation can be reliably inferred, and it offers extensive insight into the genetics of population divergence.


Subject(s)
Drosophila melanogaster, Genetic Fitness, Animals, Drosophila melanogaster/genetics, Gene Frequency, Genome, Genetic Models, Mutation
14.
Mol Ecol Resour ; 21(8): 2676-2688, 2021 Nov.
Article in English | MEDLINE | ID: mdl-33682305

ABSTRACT

Inferring the frequency and mode of hybridization among closely related organisms is an important step for understanding the process of speciation and can help to uncover reticulated patterns of phylogeny more generally. Phylogenomic methods to test for the presence of hybridization come in many varieties and typically operate by leveraging expected patterns of genealogical discordance in the absence of hybridization. An important assumption made by these tests is that the data (genes or SNPs) are independent given the species tree. However, when the data are closely linked, it is especially important to consider their nonindependence. Recently, deep learning techniques such as convolutional neural networks (CNNs) have been used to perform population genetic inferences with linked SNPs coded as binary images. Here, we use CNNs for selecting among candidate hybridization scenarios using the tree topology (((P1, P2), P3), Out) and a matrix of pairwise nucleotide divergence (dXY) calculated in windows across the genome. Using coalescent simulations to train and independently test a neural network showed that our method, HyDe-CNN, was able to accurately perform model selection for hybridization scenarios across a wide breadth of parameter space. We then used HyDe-CNN to test models of admixture in Heliconius butterflies and compared it to phylogeny-based introgression statistics. Given the flexibility of our approach, the dropping cost of long-read sequencing and the continued improvement of CNN architectures, we anticipate that inferences of hybridization using deep learning methods like ours will help researchers to better understand patterns of admixture in their study organisms.


Subject(s)
Butterflies, Animals, Butterflies/genetics, Chromosomes, Genetic Speciation, Genetic Hybridization, Computer Neural Networks, Phylogeny
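The windowed pairwise nucleotide divergence (dXY) that the abstract above uses as CNN input can be illustrated with a small sketch. It assumes aligned, equal-length sequences given as plain strings and uses the standard definition of dXY as the average per-site difference over all between-population sequence pairs. The function names and toy data are hypothetical and are not taken from HyDe-CNN.

```python
def dxy(seqs_x, seqs_y):
    """Average per-site number of differences over all pairs made of one
    sequence from population X and one from population Y."""
    length = len(seqs_x[0])
    differences = sum(
        a != b
        for sx in seqs_x
        for sy in seqs_y
        for a, b in zip(sx, sy)
    )
    return differences / (len(seqs_x) * len(seqs_y) * length)


def windowed_dxy(seqs_x, seqs_y, window):
    """dXY computed in consecutive, non-overlapping windows."""
    length = len(seqs_x[0])
    return [
        dxy([s[i:i + window] for s in seqs_x],
            [s[i:i + window] for s in seqs_y])
        for i in range(0, length, window)
    ]


# Toy alignment: two sequences from one population, one from another.
pop_x = ["AAAA", "AAAT"]
pop_y = ["AATT"]
print(windowed_dxy(pop_x, pop_y, window=2))
```

Repeating this calculation for every pair among P1, P2, P3, and the outgroup would give one value per pair per window, which is one plausible way to assemble the kind of matrix the abstract mentions.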
15.
Mol Biol Evol ; 38(5): 2177-2178, 2021 05 04.
Article in English | MEDLINE | ID: mdl-33480999

ABSTRACT

dadi is a popular but computationally intensive program for inferring models of demographic history and natural selection from population genetic data. I show that running dadi on a Graphics Processing Unit can dramatically speed computation compared with the CPU implementation, with minimal user burden. Motivated by this speed increase, I also extended dadi to four- and five-population models. This functionality is available in dadi version 2.1.0, https://bitbucket.org/gutenkunstlab/dadi/.


Subject(s)
Computing Methodologies, Population Genetics/methods, Genetic Models, Genetic Selection, Software
16.
Elife ; 9, 2020 06 23.
Article in English | MEDLINE | ID: mdl-32573438

ABSTRACT

The explosion in population genomic data demands ever more complex modes of analysis, and increasingly, these analyses depend on sophisticated simulations. Recent advances in population genetic simulation have made it possible to simulate large and complex models, but specifying such models for a particular simulation engine remains a difficult and error-prone task. Computational genetics researchers currently re-implement simulation models independently, leading to inconsistency and duplication of effort. This situation presents a major barrier to empirical researchers seeking to use simulations for power analyses of upcoming studies or sanity checks on existing genomic data. Population genetics, as a field, also lacks standard benchmarks by which new tools for inference might be measured. Here, we describe a new resource, stdpopsim, that attempts to rectify this situation. Stdpopsim is a community-driven open source project, which provides easy access to a growing catalog of published simulation models from a range of organisms and supports multiple simulation engine backends. This resource is available as a well-documented Python library with a simple command-line interface. We share some examples demonstrating how stdpopsim can be used to systematically compare demographic inference methods, and we encourage a broader community of developers to contribute to this growing resource.


Subject(s)
Population Genetics, Genomic Library, Genetic Models, Animals, Arabidopsis/genetics, Dogs/genetics, Drosophila melanogaster/genetics, Escherichia coli/genetics, Population Genetics/methods, Population Genetics/organization & administration, Genome/genetics, Human Genome/genetics, Humans, Pongo abelii/genetics
17.
Mol Biol Evol ; 37(7): 2124-2136, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32068861

ABSTRACT

Demographic inference using the site frequency spectrum (SFS) is a common way to understand historical events affecting genetic variation. However, most methods for estimating demography from the SFS assume random mating within populations, precluding these types of analyses in inbred populations. To address this issue, we developed a model for the expected SFS that includes inbreeding by parameterizing individual genotypes using beta-binomial distributions. We then take the convolution of these genotype probabilities to calculate the expected frequency of biallelic variants in the population. Using simulations, we evaluated the model's ability to coestimate demography and inbreeding using one- and two-population models across a range of inbreeding levels. We also applied our method to two empirical examples, American pumas (Puma concolor) and domesticated cabbage (Brassica oleracea var. capitata), inferring models both with and without inbreeding to compare parameter estimates and model fit. Our simulations showed that we are able to accurately coestimate demographic parameters and inbreeding even for highly inbred populations (F = 0.9). In contrast, failing to include inbreeding generally resulted in inaccurate parameter estimates in simulated data and led to poor model fit in our empirical analyses. These results show that inbreeding can have a strong effect on demographic inference, a pattern that was especially noticeable for parameters involving changes in population size. Given the importance of these estimates for informing practices in conservation, agriculture, and elsewhere, our method provides an important advancement for accurately estimating the demographic histories of these species.


Subject(s)
Inbreeding, Genetic Models, Animals, Brassica/genetics, Computer Simulation, Single Nucleotide Polymorphism, Population Dynamics, Puma/genetics
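The two steps described in the abstract above (beta-binomial genotype probabilities, then a convolution across individuals) can be sketched as follows. This is a simplified illustration, not the authors' implementation: it conditions on a single fixed population allele frequency p rather than integrating over a frequency distribution, it assumes diploids, and it assumes the standard parameterization alpha = p(1 - F)/F and beta = (1 - p)(1 - F)/F with 0 < F < 1. All function names are hypothetical.

```python
import math


def beta_binomial_pmf(k, n, alpha, beta):
    """Probability of k successes in n trials under a beta-binomial."""
    def log_beta(x, y):
        return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)
    return math.comb(n, k) * math.exp(
        log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)
    )


def genotype_probs(p, F, ploidy=2):
    """Probabilities that one individual carries 0..ploidy derived alleles,
    given allele frequency p and inbreeding coefficient F (0 < F < 1)."""
    scale = (1.0 - F) / F
    alpha, beta = p * scale, (1.0 - p) * scale
    return [beta_binomial_pmf(k, ploidy, alpha, beta)
            for k in range(ploidy + 1)]


def convolve(x, y):
    """Discrete convolution of two probability vectors."""
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out


def sample_count_distribution(p, F, n_individuals):
    """Distribution of the derived-allele count in a sample, obtained by
    convolving the per-individual genotype probabilities."""
    dist = [1.0]
    per_individual = genotype_probs(p, F)
    for _ in range(n_individuals):
        dist = convolve(dist, per_individual)
    return dist


print(genotype_probs(0.3, 0.5))
print(sample_count_distribution(0.3, 0.5, n_individuals=3))
```

For diploids this parameterization reproduces the familiar genotype frequencies, for example a heterozygote probability of 2p(1 - p)(1 - F), so raising F shifts probability from heterozygotes to homozygotes while leaving the expected allele count unchanged.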
18.
NAR Genom Bioinform ; 2(1): lqaa004, 2020 Mar.
Article in English | MEDLINE | ID: mdl-32051931

ABSTRACT

Detecting somatic mutations within tumors is key to understanding treatment resistance, patient prognosis and tumor evolution. Mutations at low allelic frequency, those present in only a small portion of tumor cells, are particularly difficult to detect. Many algorithms have been developed to detect such mutations, but none models a key aspect of tumor biology. Namely, every tumor has its own profile of mutation types that it tends to generate. We present BATCAVE (Bayesian Analysis Tools for Context-Aware Variant Evaluation), an algorithm that first learns the individual tumor mutational profile and mutation rate then uses them in a prior for evaluating potential mutations. We also present an R implementation of the algorithm, built on the popular caller MuTect. Using simulations, we show that adding the BATCAVE algorithm to MuTect improves variant detection. It also improves the calibration of posterior probabilities, enabling more principled tradeoff between precision and recall. We also show that BATCAVE performs well on real data. Our implementation is computationally inexpensive and straightforward to incorporate into existing MuTect pipelines. More broadly, the algorithm can be added to other variant callers, and it can be extended to include additional biological features that affect mutation generation.
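The idea in the abstract above of learning a tumor-specific mutation profile and using it as a prior can be sketched with Bayes' rule. This is an illustrative sketch only, written in Python rather than the R of the published implementation. It assumes the variant caller reports a log10 likelihood ratio per candidate site, and the way the prior is scaled by the mutation rate is an assumption made for this example. None of the function names come from BATCAVE or MuTect.

```python
import math
from collections import Counter


def learn_profile(observed_contexts, all_contexts, pseudocount=1.0):
    """Estimate the relative frequency of each mutation context from
    high-confidence calls, smoothed with a pseudocount."""
    counts = Counter(observed_contexts)
    total = len(observed_contexts) + pseudocount * len(all_contexts)
    return {c: (counts[c] + pseudocount) / total for c in all_contexts}


def site_prior(profile, context, mutation_rate):
    """Prior probability of a mutation at a site with the given context.
    Scaled so that a uniform profile gives exactly mutation_rate
    (an assumption of this sketch)."""
    return mutation_rate * profile[context] * len(profile)


def posterior_probability(log10_likelihood_ratio, prior):
    """Posterior probability that a candidate is a real mutation:
    posterior odds = prior odds * likelihood ratio."""
    log10_odds = log10_likelihood_ratio + math.log10(prior / (1.0 - prior))
    if log10_odds >= 0:
        return 1.0 / (1.0 + 10.0 ** (-log10_odds))
    odds = 10.0 ** log10_odds
    return odds / (1.0 + odds)


contexts = ["C>T", "C>A", "T>G"]  # toy context set, hypothetical
profile = learn_profile(["C>T", "C>T", "C>A"], contexts)
for context in contexts:
    prior = site_prior(profile, context, mutation_rate=1e-6)
    print(context, posterior_probability(6.0, prior))
```

With identical evidence from the reads, a candidate in a context the tumor generates often receives a higher posterior than one in a rare context, which is how a context-aware prior can help separate real low-frequency mutations from noise.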

19.
Hum Genomics ; 12(1): 38, 2018 08 13.
Article in English | MEDLINE | ID: mdl-30103832

ABSTRACT

The past decade has seen major investment in genome-wide association studies (GWAS). Among the many goals of GWAS, a major one is to identify and motivate research on novel genes involved in complex human disease. To assess whether this goal is being met, we quantified the effect of GWAS on the overall distribution of biomedical research publications and on the subsequent publication history of genes newly associated with complex disease. We found that the historical skew of publications toward genes involved in Mendelian disease has not changed since the advent of GWAS. Genes newly implicated by GWAS in complex disease do experience additional publications compared to control genes, and they are more likely to become exceptionally studied. But the magnitude of both effects has declined over the past decade. Our results suggest that reforms to encourage follow-up studies may be needed for GWAS to most successfully guide biomedical research toward the molecular mechanisms underlying complex human disease.


Subject(s)
Biomedical Research, Inborn Genetic Diseases/genetics, Genome-Wide Association Study, Gene Expression Regulation/genetics, Humans, Single Nucleotide Polymorphism, Publications
20.
Bioinformatics ; 34(10): 1713-1718, 2018 05 15.
Article in English | MEDLINE | ID: mdl-29325072

ABSTRACT

Motivation: Tumor genome sequencing offers great promise for guiding research and therapy, but spurious variant calls can arise from multiple sources. Mouse contamination can generate many spurious calls when sequencing patient-derived xenografts. Paralogous genome sequences can also generate spurious calls when sequencing any tumor. We developed a BLAST-based algorithm, Mouse And Paralog EXterminator (MAPEX), to identify and filter out spurious calls from both these sources. Results: When calling variants from xenografts, MAPEX has similar sensitivity and specificity to more complex algorithms. When applied to any tumor, MAPEX also automatically flags calls that potentially arise from paralogous sequences. Our implementation, mapexr, runs quickly and easily on a desktop computer. MAPEX is thus a useful addition to almost any pipeline for calling genetic variants in tumors. Availability and implementation: The mapexr package for R is available at https://github.com/bmannakee/mapexr under the MIT license. Contact: mannakee@email.arizona.edu or rgutenk@email.arizona.edu or eknudsen@email.arizona.edu. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Genetic Variation, Neoplasms/genetics, Algorithms, Animals, Heterografts, High-Throughput Nucleotide Sequencing, Humans, Mice, Software