Results 1 - 20 of 44
1.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36869848

ABSTRACT

Sampling circulating tumor DNA (ctDNA) using liquid biopsies offers clinically important benefits for monitoring cancer progression. A single ctDNA sample represents a mixture of shed tumor DNA from all known and unknown lesions within a patient. Although shedding levels have been suggested to hold the key to identifying targetable lesions and uncovering treatment resistance mechanisms, the amount of DNA shed by any one specific lesion is still not well characterized. We designed the Lesion Shedding Model (LSM) to order lesions from the strongest to the poorest shedding for a given patient. By characterizing lesion-specific ctDNA shedding levels, we can better understand the mechanisms of shedding and more accurately interpret ctDNA assays to improve their clinical impact. We verified the accuracy of the LSM under controlled conditions using a simulation approach and also tested the model on three cancer patients. In simulations, the LSM obtained an accurate partial order of the lesions according to their assigned shedding levels, and its accuracy in identifying the top shedding lesion was not significantly impacted by the number of lesions. Applying the LSM to three cancer patients, we found that there were indeed lesions that consistently shed more than others into the patients' blood. In two of the patients, the top shedding lesion was one of the only clinically progressing lesions at the time of biopsy, suggesting a connection between high ctDNA shedding and clinical progression. The LSM provides a much-needed framework with which to understand ctDNA shedding and to accelerate the discovery of ctDNA biomarkers. The LSM source code is available in the IBM BioMedSciAI GitHub (https://github.com/BiomedSciAI/Geno4SD).


Subject(s)
Circulating Tumor DNA , Neoplasms , Humans , Tumor Biomarkers/genetics , Neoplasms/drug therapy , Neoplasm DNA/genetics , Circulating Tumor DNA/genetics , Biopsy , Mutation
2.
Blood ; 142(5): 421-433, 2023 08 03.
Article in English | MEDLINE | ID: mdl-37146250

ABSTRACT

Although BCL2 mutations have been reported as late-occurring events leading to venetoclax resistance, many other reported mechanisms of progression remain poorly understood. Here, we analyze longitudinal tumor samples from 11 patients with disease progression while receiving venetoclax to characterize the clonal evolution of resistance. All patients tested showed increased in vitro resistance to venetoclax at the posttreatment time point. We found the previously described acquired BCL2-G101V mutation in only 4 of 11 patients, with 2 patients showing a very low variant allele fraction (0.03%-4.68%). Whole-exome sequencing revealed acquired loss(8p) in 4 of 11 patients, of whom 2 also had gain (1q21.2-21.3), affecting the MCL1 gene, in the same cells. In vitro experiments showed that CLL cells from the 4 patients with loss(8p) were more resistant to venetoclax than cells from those without it, with the cells from the 2 patients also carrying gain (1q21.2-21.3) showing increased sensitivity to MCL1 inhibition. Progression samples with gain (1q21.2-21.3) were more susceptible to the combination of an MCL1 inhibitor and venetoclax. Differential gene expression analysis comparing bulk RNA sequencing data from pretreatment and progression time points of all patients showed upregulation of proliferation, B-cell receptor (BCR), and NF-κB gene sets, including MAPK genes. Cells from progression time points demonstrated upregulation of surface immunoglobulin M and higher pERK levels compared with those from the preprogression time point, suggesting an upregulation of BCR signaling that activates the MAPK pathway. Overall, our data suggest several mechanisms of acquired resistance to venetoclax in CLL that could pave the way for rationally designed combination treatments for patients with venetoclax-resistant CLL.


Subject(s)
Antineoplastic Agents , Chronic B-Cell Lymphocytic Leukemia , Humans , Antineoplastic Agents/pharmacology , Bridged Bicyclic Heterocyclic Compounds/pharmacology , Antineoplastic Drug Resistance/genetics , Exome Sequencing , Chronic B-Cell Lymphocytic Leukemia/drug therapy , Chronic B-Cell Lymphocytic Leukemia/genetics , Chronic B-Cell Lymphocytic Leukemia/pathology , Myeloid Cell Leukemia Sequence 1 Protein/genetics , Proto-Oncogene Proteins c-bcl-2
3.
BMC Genomics ; 22(Suppl 5): 518, 2021 Nov 16.
Article in English | MEDLINE | ID: mdl-34789161

ABSTRACT

BACKGROUND: All diseases involving genetic material, including cancer and infection, undergo genetic evolution and give rise to heterogeneity. Although these illnesses are biologically very different, the ability to perform phylogenetic retrodiction from genomic reads is common to them, and thus tree-based principles and assumptions are shared. Just as the different frequencies of tumor genomic variants presuppose the existence of multiple tumor clones and provide a handle to computationally infer them, we postulate that the different variant frequencies in viral reads offer the means to infer multiple co-infecting sublineages. RESULTS: We present a common methodological framework to infer phylogenomics from genomic data, be it reads of SARS-CoV-2 from multiple COVID-19 patients or bulk DNA-seq of a cancer patient's tumor. We describe the Concerti computational framework for inferring phylogenies in each of the two scenarios. To demonstrate the accuracy of the method, we reproduce some known results in both scenarios. We also make some additional discoveries. CONCLUSIONS: Concerti successfully extracts and integrates information from multi-point samples, enabling the discovery of clinically plausible phylogenetic trees that capture the heterogeneity known to exist both spatially and temporally. These models can have direct therapeutic implications by highlighting the "birth" of clones that may harbor resistance mechanisms to treatment, the "death" of subclones with drug targets, and the acquisition of functionally pertinent mutations in clones that may have seemed clinically irrelevant. Specifically, in this paper we uncover new potential parallel mutations in the evolution of the SARS-CoV-2 virus. In the context of cancer, we identify new clones harboring mutations that confer resistance to therapy.


Subject(s)
COVID-19 , Neoplasms , Clone Cells , Humans , Mutation , Neoplasms/genetics , Phylogeny , SARS-CoV-2
4.
Int J Mol Sci ; 22(17)2021 Sep 06.
Article in English | MEDLINE | ID: mdl-34502564

ABSTRACT

Papillomaviruses (PVs) are a heterogeneous group of DNA viruses that can infect fish, birds, reptiles, and mammals. PVs infecting humans (HPVs) phylogenetically cluster into five genera (Alpha-, Beta-, Gamma-, Mu- and Nu-PV), with differences in tissue tropism and carcinogenicity. The evolutionary features associated with the divergence of Papillomaviridae are not well understood. Using a combination of k-mer distributions, genetic metrics, and phylogenetic algorithms, we sought to evaluate the characteristics and differences of Alpha-, Beta- and Gamma-PVs, which constitute the majority of HPV genomes. A total of 640 PVs, including 442 HPV types, 27 non-human primate PV types, and 171 non-primate animal PV types, were evaluated. Our analyses revealed the highest genetic diversity among Gamma-PVs compared with Alpha- and Beta-PVs, suggesting reduced selective pressures on Gamma-PVs. Using an alignment-free trimer (k = 3) phylogeny algorithm, we reconstructed a phylogeny that grouped most HPV types into a monophyletic clade that was further split into three branches similar to alignment-based classifications. Interestingly, a subset of low-risk Alpha HPVs (the species Alpha-2, 3, 4, and 14) split from other HPVs and clustered with non-human primate PVs. Surprisingly, the trimer-constructed phylogeny grouped the Gamma-6 species types originally isolated from the cervicovaginal region with the main Alpha-HPV clade. These data indicate that characterizing papillomavirus heterogeneity via orthogonal approaches reveals novel insights into the biology of HPV genomes.
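An alignment-free trimer (k = 3) approach compares genomes through their k-mer frequency profiles instead of aligned sequences. The sketch below is a minimal illustration under stated assumptions, not the authors' algorithm: it counts overlapping trimers, normalizes them to relative frequencies, and compares profiles with Euclidean distance, which is an assumed choice since the abstract does not name the distance. All function names are hypothetical.

```python
from __future__ import annotations

from itertools import product
from math import sqrt

K = 3  # trimers
# All 64 possible trimers over ACGT, in a fixed order so profiles are comparable.
TRIMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def trimer_profile(seq: str) -> list[float]:
    """Relative frequency of each overlapping trimer in seq.

    Windows containing characters other than A, C, G, T (e.g. N) are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(TRIMERS, 0)
    total = 0
    for i in range(len(seq) - K + 1):
        kmer = seq[i:i + K]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TRIMERS)
    return [counts[t] / total for t in TRIMERS]


def profile_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two trimer profiles (assumed metric)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(genomes: dict[str, str]) -> dict[tuple[str, str], float]:
    """All pairwise profile distances, keyed by (name_a, name_b)."""
    profiles = {name: trimer_profile(s) for name, s in genomes.items()}
    return {
        (a, b): profile_distance(profiles[a], profiles[b])
        for a in profiles
        for b in profiles
    }
```

The resulting pairwise matrix could then feed any distance-based tree-building method, such as neighbor joining.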


Subject(s)
Viral DNA/genetics , Molecular Evolution , Genetic Variation , Viral Genome/genetics , Papillomaviridae/genetics , Algorithms , Animals , Cluster Analysis , Codon/genetics , CpG Islands/genetics , DNA Methylation , Viral DNA/analysis , Humans , Papillomaviridae/classification , Papillomaviridae/physiology , Papillomavirus Infections/virology , Phylogeny , DNA Sequence Analysis/methods
5.
PLoS Comput Biol ; 15(8): e1007332, 2019 08.
Article in English | MEDLINE | ID: mdl-31469830

ABSTRACT

The confluence of deep sequencing and powerful machine learning is providing an unprecedented peek at the darkest of the dark genomic matter: the non-coding genomic regions lacking any functional annotation. While deep sequencing uncovers rare tumor variants, the heterogeneity of the disease confounds the best of machine learning (ML) algorithms. Here we set out to determine whether the dark matter of the genome encompasses signals that can distinguish fine subtypes of disease that are otherwise genomically indistinguishable. We introduce a novel stochastic regularization, ReVeaL, that empowers ML to discriminate subtle cancer subtypes, even those from the same 'cell of origin'. Analogous to heritability, which is implicitly defined on the whole genome, we use predictability (F1 score), which can be defined on portions of the genome. To distinguish cancer subtypes using dark-matter DNA, we applied ReVeaL to a new WGS dataset of 727 patient samples with seven forms of hematological cancer and assessed predictivity over several genomic regions, including genic, non-dark, non-coding, non-genic, and dark. ReVeaL improved the discrimination of cancer subtypes for all segments of the genome. The non-genic, non-coding, and dark-matter regions had the highest F1 scores, with dark matter having the highest level of predictability. Based on ReVeaL's predictability of different genomic regions, dark matter contains enough signal to significantly discriminate fine subtypes of disease. Hence, the agglomeration of rare variants, even in the hitherto unannotated and ill-understood regions of the genome, may play a substantial role in disease etiology and deserves much more attention.
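Predictability here is an F1 score computed per genomic region. Because seven cancer subtypes are involved, a multi-class average is needed; the sketch below assumes macro-averaging (the unweighted mean of one-vs-rest F1 scores), which may differ from the averaging ReVeaL actually uses. The function names and region labels are hypothetical, and the per-region predictions are assumed to come from separately trained classifiers.

```python
from __future__ import annotations


def f1_for_class(y_true: list[str], y_pred: list[str], label: str) -> float:
    """One-vs-rest F1 for a single subtype label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)


def rank_regions(
    y_true: list[str], predictions_by_region: dict[str, list[str]]
) -> list[tuple[str, float]]:
    """Order genomic regions by the predictability (macro F1) of their classifier."""
    scores = {
        region: macro_f1(y_true, y_pred)
        for region, y_pred in predictions_by_region.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Macro-averaging gives each subtype equal weight, so a rare subtype that is poorly predicted pulls the score down as much as a common one.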


Subject(s)
Algorithms , Neoplasm DNA/genetics , Hematologic Neoplasms/classification , Hematologic Neoplasms/genetics , Genetic Models , Computational Biology , Nucleic Acid Databases , Gene Frequency , Human Genome , High-Throughput Nucleotide Sequencing , Humans , Machine Learning , Single Nucleotide Polymorphism , Untranslated RNA/genetics , Stochastic Processes , Whole Genome Sequencing
6.
Bioinformatics ; 34(20): 3454-3460, 2018 10 15.
Article in English | MEDLINE | ID: mdl-30204840

ABSTRACT

Motivation: Although nucleosome occupancy along a genome can be partly predicted by in vitro experiments, it has recently been observed that chromatin organization shows important differences in vitro with respect to in vivo. These differences mainly concern the hierarchical and regular structures of the nucleosome fiber, whose existence has long been assumed, and partly also observed in vitro, but which apparently do not occur in vivo. It is also well known that the DNA sequence has a role in determining nucleosome occupancy. Therefore, an important issue is to understand whether, and to what extent, the structural differences in chromatin organization between in vitro and in vivo have a counterpart in the underlying genomic sequences. Results: We present the first quantitative comparison between the in vitro and in vivo nucleosome maps of two model organisms (S. cerevisiae and C. elegans). The comparison is based on the construction of weighted k-mer dictionaries. Our findings show a good level of sequence conservation between in vitro and in vivo in both organisms, in contrast to the above-mentioned important differences in chromatin structural organization. Moreover, our results provide evidence that the two organisms predispose themselves differently for nucleosome occupancy, in terms of sequence composition, both in vitro and in vivo. This leads to the conclusion that, although the notion of a genome encoding its own nucleosome occupancy is general, the intrinsic histone k-mer sequence preferences tend to be species-specific. Availability and implementation: The files containing the dictionaries and the main results of the analysis are available at http://math.unipa.it/rombo/material. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome , Sequence Analysis , Animals , Caenorhabditis elegans/genetics , Chromatin/genetics , Eukaryotic Cells , Histones/genetics , Nucleosomes , Saccharomyces cerevisiae/genetics
7.
Ann Bot ; 124(4): 717-730, 2019 10 29.
Article in English | MEDLINE | ID: mdl-31241131

ABSTRACT

BACKGROUND AND AIMS: Perennial grasses are a global resource as forage, and for alternative uses in bioenergy and as raw materials for the processing industry. Marginal lands can be valuable for perennial biomass grass production if perennial biomass grasses can cope with adverse abiotic environmental stresses such as drought and waterlogging. METHODS: In this study, two perennial grass species, reed canary grass (Phalaris arundinacea) and cocksfoot (Dactylis glomerata), were subjected to drought and waterlogging stress to study their responses and gain insights into improving environmental stress tolerance. Physiological responses were recorded, reference transcriptomes were established, and differential gene expression was investigated between control and stress conditions. We applied a robust non-parametric method, RoDEO, based on rank ordering of transcripts, to investigate differential gene expression. Furthermore, we extended and validated vRoDEO for comparing samples with varying sequencing depths. KEY RESULTS: This allowed us to identify expressed genes under drought and waterlogging while using only a limited number of RNA sequencing experiments. Validating the methodology, several differentially expressed candidate genes involved in the stage 3 step-wise scheme in detoxification and degradation of xenobiotics were recovered, while several novel stress-related genes classified as of unknown function were discovered. CONCLUSIONS: Reed canary grass is a species that copes particularly well with flooding conditions, but this study adds novel information on how its transcriptome reacts under drought stress. We built extensive transcriptomes for the two investigated C3 species, cocksfoot and reed canary grass, under both extremes of water stress to provide a clear comparison between the two species and broaden the horizon for comparative studies, but further confirmation of the data would be ideal to obtain a more detailed picture.


Subject(s)
Droughts , Phalaris , Biomass , Dactylis , Physiological Stress , Transcriptome
9.
Oncologist ; 23(2): 179-185, 2018 02.
Article in English | MEDLINE | ID: mdl-29158372

ABSTRACT

BACKGROUND: Using next-generation sequencing (NGS) to guide cancer therapy has created challenges in analyzing and reporting large volumes of genomic data to patients and caregivers. Specifically, providing current, accurate information on newly approved therapies and open clinical trials requires considerable manual curation performed mainly by human "molecular tumor boards" (MTBs). The purpose of this study was to determine the utility of cognitive computing as performed by Watson for Genomics (WfG) compared with a human MTB. MATERIALS AND METHODS: One thousand eighteen patient cases that previously underwent targeted exon sequencing at the University of North Carolina (UNC) and subsequent analysis by the UNCseq informatics pipeline and the UNC MTB between November 7, 2011, and May 12, 2015, were analyzed with WfG, a cognitive computing technology for genomic analysis. RESULTS: Using a WfG-curated actionable gene list, we identified additional genomic events of potential significance (not discovered by traditional MTB curation) in 323 (32%) patients. The majority of these additional genomic events were considered actionable based upon their ability to qualify patients for biomarker-selected clinical trials. Indeed, the opening of a relevant clinical trial within 1 month prior to WfG analysis provided the rationale for identification of a new actionable event in nearly a quarter of the 323 patients. This automated analysis took <3 minutes per case. CONCLUSION: These results demonstrate that the interpretation and actionability of somatic NGS results are evolving too rapidly to rely solely on human curation. Molecular tumor boards empowered by cognitive computing could potentially improve patient care by providing a rapid, comprehensive approach for data analysis and consideration of up-to-date availability of clinical trials. 
IMPLICATIONS FOR PRACTICE: The results of this study demonstrate that the interpretation and actionability of somatic next-generation sequencing results are evolving too rapidly to rely solely on human curation. Molecular tumor boards empowered by cognitive computing can significantly improve patient care by providing a fast, cost-effective, and comprehensive approach for data analysis in the delivery of precision medicine. Patients and physicians who are considering enrollment in clinical trials may benefit from the support of such tools applied to genomic data.


Subject(s)
Combined Antineoplastic Chemotherapy Protocols/therapeutic use , Neoplasms/drug therapy , Tumor Biomarkers , Case-Control Studies , Combined Modality Therapy , Follow-Up Studies , Neoplastic Gene Expression Regulation , High-Throughput Nucleotide Sequencing , Humans , Lymphatic Metastasis , Neoplasm Invasiveness , Local Neoplasm Recurrence/drug therapy , Local Neoplasm Recurrence/pathology , Neoplasms/pathology , Prognosis , Retrospective Studies , Survival Rate
10.
Bioinformatics ; 32(7): 1048-56, 2016 04 01.
Article in English | MEDLINE | ID: mdl-26644417

ABSTRACT

MOTIVATION: Simulating complex evolution scenarios of multiple populations is an important task for answering many basic questions relating to population genomics. Apart from the population samples, the underlying Ancestral Recombination Graph (ARG) is an additional important means in hypothesis checking and reconstruction studies. Furthermore, complex simulations require a plethora of interdependent parameters, making even the scenario specification highly non-trivial. RESULTS: We present an algorithm, SimRA, that simulates a generic multiple-population evolution model with admixture. It is based on random graphs that dramatically improve on the time and space requirements of the classical algorithm for single populations. Using the underlying random graph model, we also derive closed forms of the expected values of the ARG characteristics, i.e., height of the graph, number of recombinations, number of mutations, and population diversity, in terms of its defining parameters. This is crucial in helping the user specify meaningful parameters for complex scenario simulations, not through trial and error based on raw compute power but through intelligent parameter estimation. To the best of our knowledge, this is the first time closed-form expressions have been computed for the ARG properties. We show through simulations that the expected values closely match the empirical values. Finally, we demonstrate that SimRA produces the ARG in compact forms without compromising accuracy. We demonstrate the compactness and accuracy through extensive experiments. AVAILABILITY AND IMPLEMENTATION: SimRA (Simulation based on Random graph Algorithms) source, executable, user manual and sample input-output sets are available for download at: https://github.com/ComputationalGenomics/SimRA CONTACT: parida@us.ibm.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Population Genetics , Phylogeny , Human Genome , Humans , Pedigree , Population Groups , Genetic Recombination
11.
Bioinformatics ; 32(6): 835-42, 2016 03 15.
Article in English | MEDLINE | ID: mdl-26576651

ABSTRACT

MOTIVATION: Thanks to research spanning nearly 30 years, two major models have emerged that account for nucleosome organization in chromatin: statistical and sequence specific. The first is based on elegant, easy-to-compute, closed-form mathematical formulas that make no assumptions about the physical and chemical properties of the underlying DNA sequence. Moreover, they need no training on the data for their computation. The latter is based on some sequence regularities but, as opposed to the statistical model, it lacks the same type of closed-form formulas that, in this case, should be based on the DNA sequence only. RESULTS: We help close this important methodological gap between the two models by providing three very simple formulas for the sequence-specific one. They are all based on well-known formulas in Computer Science and Bioinformatics, and they give different quantifications of how complex a sequence is. In view of how remarkably well they perform, it is very surprising that measures of sequence complexity have not even been considered as candidates to close the mentioned gap. We provide experimental evidence that the intrinsic level of combinatorial organization and information-theoretic content of subsequences within a genome are strongly correlated to the level of DNA-encoded nucleosome organization discovered by Kaplan et al. Our results establish an important connection between the intrinsic complexity of subsequences in a genome and the intrinsic, i.e. DNA-encoded, nucleosome organization of eukaryotic genomes. It is a first step towards a mathematical characterization of this latter 'encoding'. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. CONTACT: futro@us.ibm.com.
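The abstract does not spell out its three complexity formulas. As an assumption, the sketch below uses two standard ways to quantify how complex a sequence is, the empirical Shannon entropy of its k-mers and its zlib compression ratio, evaluated in sliding windows so they could be lined up against a nucleosome occupancy track. Whether either matches the paper's formulas is not confirmed here, and the function names are hypothetical.

```python
from __future__ import annotations

import math
import zlib
from collections import Counter
from typing import Callable


def kmer_entropy(seq: str, k: int = 2) -> float:
    """Empirical Shannon entropy (in bits) of the overlapping k-mers of seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    n = len(kmers)
    return -sum((c / n) * math.log2(c / n) for c in Counter(kmers).values())


def compression_ratio(seq: str) -> float:
    """Compressed size / raw size; lower means more repetitive, less complex."""
    raw = seq.encode("ascii")
    if not raw:
        raise ValueError("empty sequence")
    return len(zlib.compress(raw, 9)) / len(raw)


def windowed(
    seq: str, window: int, step: int, measure: Callable[[str], float]
) -> list[float]:
    """Apply a complexity measure to each full sliding window of seq."""
    return [measure(seq[i:i + window]) for i in range(0, len(seq) - window + 1, step)]
```

For example, `windowed(genome_seq, 150, 10, kmer_entropy)` (with a hypothetical `genome_seq`) would give one complexity value per window of roughly nucleosome length; correlating that profile with measured occupancy is the kind of comparison the abstract describes.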


Subject(s)
Genome , Nucleosomes , Chromatin , DNA , Eukaryota
12.
Brief Bioinform ; 15(3): 390-406, 2014 May.
Article in English | MEDLINE | ID: mdl-24347576

ABSTRACT

High-throughput sequencing technologies produce large collections of data, mainly DNA sequences with additional information, requiring the design of efficient and effective methodologies for both their compression and storage. In this context, we first provide a classification of the main techniques that have been proposed, according to three specific research directions that have emerged from the literature and, for each, we provide an overview of the current techniques. Finally, to make this review useful to researchers and technicians applying the existing software and tools, we include a synopsis of the main characteristics of the described approaches, including details on their implementation and availability. Performance of the various methods is also highlighted, although the state of the art does not lend itself to a consistent and coherent comparison among all the methods presented here.


Subject(s)
Computational Biology/methods , Data Compression/methods , High-Throughput Nucleotide Sequencing/methods , Algorithms , Data Compression/statistics & numerical data , High-Throughput Nucleotide Sequencing/statistics & numerical data , Metagenomics/statistics & numerical data , Sequence Alignment , Software
13.
Bioinformatics ; 31(18): 2939-46, 2015 Sep 15.
Article in English | MEDLINE | ID: mdl-26007227

ABSTRACT

MOTIVATION: Information-theoretic and compositional analysis of biological sequences, in terms of k-mer dictionaries, has a well-established role in genomic and proteomic studies. Much less so in epigenomics, although the role of k-mers in chromatin organization and nucleosome positioning is particularly relevant. Fundamental questions concerning the informational content and compositional structure of nucleosome-favoring and nucleosome-disfavoring sequences with respect to their basic building blocks still remain open. RESULTS: We present the first analysis of the role of k-mers in the composition of nucleosome-enriched and nucleosome-depleted genomic regions (NER and NDR for short) that is: (i) exhaustive and within the bounds dictated by the information-theoretic content of the sample sets we use and (ii) informative for comparative epigenomics. We analyze four different organisms and propose a paradigmatic formalization of k-mer dictionaries, providing two different and complementary views of the k-mers involved in NER and NDR. The first extends well-known studies in this area, its comparative nature being its major merit. The second, very novel, brings to light the rich variety of k-mers involved in influencing nucleosome positioning, for which an initial classification in terms of clusters is also provided. Although such a classification offers many insights, the following deserves to be singled out: short poly(dA:dT) tracts are reported in the literature as fundamental for nucleosome depletion; however, a global quantitative look reveals that their role is much less prominent than one would expect based on previous studies. AVAILABILITY AND IMPLEMENTATION: Dictionaries, clusters and Supplementary Material are available online at http://math.unipa.it/rombo/epigenomics/. CONTACT: simona.rombo@unipa.it SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Chromatin Assembly and Disassembly/genetics , Epigenomics , Nucleosomes/genetics , DNA Sequence Analysis/methods , Animals , Genome , Humans
14.
BMC Genomics ; 15 Suppl 6: S18, 2014.
Article in English | MEDLINE | ID: mdl-25573273

ABSTRACT

BACKGROUND: Reed canary grass (Phalaris arundinacea) is an economically important forage and bioenergy grass of the temperate regions of the world. Despite its economic importance, public genomic data for it are lacking. We explore comparative exomics of the grass cultivars in the context of response to salt exposure. The limited data set poses challenges to the computational pipeline. METHODS: As a prerequisite for the comparative study, we generate the Phalaris reference transcriptome sequence, one of the first steps in addressing the paucity of processed genomic data in this species. In addition, the differentially expressed (DE) and active-but-stable genes under salt stress conditions were analyzed by a novel method that was experimentally verified on human RNA-seq data. For the comparative exomics, we focus on the DE and stable genic regions of the genome with respect to salt stress. RESULTS AND CONCLUSIONS: In our comparative study, we find that the phylogenies of the DE and stable genic regions of the Phalaris cultivars are distinct. At the same time, we find that the phylogeny of the entire expressed reference transcriptome matches the phylogeny of only the stable genes. Thus, the behavior of the different cultivars is distinguished by the salt stress response. This is also reflected in the genomic distinctions in the DE genic regions. These observations have important implications for the choice of cultivars, and their breeding, for bioenergy fuels. Further, we identified genes that are representative of DE under salt stress and could provide vital clues to our understanding of stress-handling mechanisms in general.


Subject(s)
Exome , Genomics/methods , Phalaris/genetics , Salt Tolerance/genetics , Physiological Stress/genetics , Algorithms , Gene Expression Profiling , Plant Gene Expression Regulation , High-Throughput Nucleotide Sequencing , Phenotype , Transcriptome
15.
iScience ; 27(3): 109209, 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38439972

ABSTRACT

GWAS focuses on significance, losing false positives; machine learning probes sub-significant features, relying on predictivity. Yet these are far from orthogonal. We sought to explore how they inform each other in sub-genome-wide significant situations to define relevance for predictive features. We introduce the SVM-based RubricOE, which selects heavily cross-validated feature sets, and use LDpred2 PRS as a strong contrast to SVM, to explore significance and predictivity. Our Alzheimer's test case notoriously lacks strong genetic signals except for a few very strong phenotype-SNP associations, which suits the problem we are exploring. We found that the most significant SNPs among ML- and PRS-selected SNPs captured most of the predictivity, while weaker associations also tend to contribute weakly to predictivity. SNPs with weak associations tend not to contribute to predictivity, but deleting these features does not harm it. Significance provides a ranking that helps identify weakly predictive features.

16.
BMC Bioinformatics ; 14 Suppl 1: S6, 2013.
Article in English | MEDLINE | ID: mdl-23369037

ABSTRACT

BACKGROUND: Clustering is one of the best-known activities in scientific investigation and the object of research in many disciplines, ranging from statistics to computer science. Following Handl et al., it can be summarized as a three-step process: (1) choice of a distance function; (2) choice of a clustering algorithm; (3) choice of a validation method. Although such a purist approach to clustering is rarely seen in many areas of science, genomic data require that level of attention if inferences made from cluster analysis are to be of some relevance to biomedical research. RESULTS: A procedure is proposed for the assessment of the discriminative ability of a distance function, that is, the ability of a distance function to capture structure in a dataset. It is based on the introduction of a new external validation index, referred to as the Balanced Misclassification Index (BMI, for short), and of a nontrivial modification of the well-known Receiver Operating Characteristic curve (ROC, for short), which we refer to as Corrected ROC (CROC, for short). The main results are: (a) a quantitative and qualitative method to describe the intrinsic separation ability of a distance; (b) a quantitative method to assess the performance of a clustering algorithm in conjunction with the intrinsic separation ability of a distance function. The proposed procedure is more informative than those available in the literature due to the adopted tools. Indeed, the first tool makes it possible to map distances and clustering solutions as graphical objects on a plane, and gives information about the bias of the clustering algorithm with respect to a distance. The second tool is a new external validity index that performs similarly to the state of the art, but with more flexibility, allowing for a broader spectrum of applications.
In fact, it makes it possible not only to quantify the merit of each clustering solution but also to quantify the agglomerative or divisive errors due to the algorithm. CONCLUSIONS: The new methodology has been used to experimentally study three popular distance functions, namely, Euclidean distance d2, Pearson correlation dr and mutual information dMI. The experiments show that the Euclidean and Pearson correlation distances have a good intrinsic discrimination ability. Conversely, the mutual information distance does not seem to offer the same flexibility and versatility as the other two distances. This is apparently due to well-known problems in its estimation, since it requires a dataset to have a substantial number of features to be reliable. Nevertheless, taking this into account, together with the results presented in Priness et al., there is an indication that dMI may be superior to the other distances considered in this study only in conjunction with clustering algorithms specifically designed for its use. In addition, the K-means, Average Link, and Complete Link clustering algorithms are in most cases able to improve the discriminative ability of the distances considered in this study with respect to clustering. The methodology has a range of applicability that goes well beyond microarray data, since it is independent of the nature of the input data. The only requirement is that the input data have the format of a "feature matrix". In particular, it can be used to cluster ChIP-seq data.
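The methodology takes a "feature matrix" and compares its rows with a chosen distance function. The sketch below shows two of the distances named, under the assumption that d2 is plain Euclidean distance and that dr is the common 1 - r form of Pearson correlation distance; the paper's exact normalization is not stated in the abstract. Mutual information distance is left out because it needs a density estimate, which is exactly where the abstract locates its weaknesses. Function names are hypothetical.

```python
from __future__ import annotations

import math
from typing import Callable


def euclidean_distance(x: list[float], y: list[float]) -> float:
    """d2: straight-line distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_distance(x: list[float], y: list[float]) -> float:
    """dr, assumed to be 1 - r: 0 if perfectly correlated, 2 if anti-correlated."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("Pearson correlation is undefined for a constant profile")
    return 1.0 - cov / (sx * sy)


def pairwise(
    matrix: list[list[float]], dist: Callable[[list[float], list[float]], float]
) -> list[list[float]]:
    """Distance matrix between the rows (items) of a feature matrix."""
    return [[dist(row_i, row_j) for row_j in matrix] for row_i in matrix]
```

The two distances can disagree sharply: profiles that differ only by scale, such as `[1, 2, 3]` and `[2, 4, 6]`, are far apart under d2 but identical under the 1 - r form of dr, which is one reason the choice of distance is treated as its own step.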


Subject(s)
Algorithms , Oligonucleotide Array Sequence Analysis/methods , Cluster Analysis , Transcriptome
17.
BMC Genomics ; 14 Suppl 1: S10, 2013.
Article in English | MEDLINE | ID: mdl-23369091

ABSTRACT

BACKGROUND: The reconstructability of population history from the genetic information of extant individuals is studied in a simulation setting. We do not address the accuracy of reconstruction algorithms; rather, we assume the availability of the theoretically best algorithm and focus on the fraction (1 - f) of the common genetic history that is irreconstructible, or impenetrable. Thus f gives an upper bound on the extent of estimability: no method can reconstruct a fraction larger than f of the entire common genetic history. To carry out such a study, we first define a natural measure of the amount of genetic history. Next, we use a population simulator from the literature that has at least two features. First, it can provide samples from different demographies, to reflect reality effectively. Second, it also provides the underlying relevant genetic history, captured in its entirety, to which such a measure can be applied. Finally, to compute f, we use an information content measure of the relevant genetic history. The simulator of choice provided the following demographies: Africans, Europeans, Asians and African Americans. RESULTS: We observe that the higher the rate of recombination, the lower the value of f, while f is invariant over varying mutation rates in each of the demographies. The value of f increases with the number of samples, reaching a plateau and suggesting that in all the demographies at least about one-third of the relevant genetic history is impenetrable. The most surprising observation is that the sum of the reconstructible history of the subsegments is larger than the reconstructible history of the whole segment. Consistently, the longer the chromosomal segment, the smaller the value of f, in all the demographies.
CONCLUSIONS: We present the first framework for measuring the fraction of the relevant genetic history of a population that is mathematically elusive. Our results on the tested demographies suggest that it may be better to aggregate analyses of many smaller chromosomal segments than to analyze fewer, larger ones. Also, regardless of the richness of samples in a population, at least one-third of the population genetic history is impenetrable. The framework also opens up possible new lines of investigation: given the characteristics of a population, possibly derived from observed extant individuals, estimating (1) the optimal sample size and (2) the optimal sequence length for the most informative analysis.


Subject(s)
Genetics, Population , Black or African American/genetics , Asian People/genetics , Genome, Human , Humans , Mutation Rate , Recombination, Genetic , White People/genetics
18.
Bioinformatics ; 28(2): 282-3, 2012 Jan 15.
Article in English | MEDLINE | ID: mdl-22113082

ABSTRACT

MOTIVATION: Recent advances in sequencing technology have resulted in a dramatic increase in sequencing data, which in turn requires efficient management of computational resources, such as computing time and memory, as well as efficient prototyping of computational pipelines. RESULTS: We present GenomicTools, a flexible computational platform, comprising both a command-line set of tools and a C++ API, for the analysis and manipulation of high-throughput sequencing data such as DNA-seq, RNA-seq, ChIP-seq and MethylC-seq. GenomicTools implements a variety of mathematical operations between sets of genomic regions, thereby enabling the prototyping of computational pipelines that can address a wide spectrum of tasks ranging from pre-processing and quality control to meta-analyses. Additionally, the GenomicTools platform is designed to analyze datasets of any size by minimizing memory requirements. In practical applications where a comparison is possible, GenomicTools outperforms existing tools in terms of both time and memory usage. AVAILABILITY: The GenomicTools platform (version 2.0.0) was implemented in C++. The source code, documentation, user manual, example datasets and scripts are available online at http://code.google.com/p/ibm-cbc-genomic-tools.
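As an illustration of the kind of operation between sets of genomic regions described above, here is a minimal Python sketch of region-set intersection. It is not GenomicTools' C++ API or command-line interface; the function names, the (chrom, start, end) tuple layout and the half-open [start, end) coordinate convention are assumptions. Each input is sorted and merged per chromosome, and the merged lists are then walked with two pointers, so the intersection step is linear in the number of merged regions.

```python
from collections import defaultdict


def merge(regions):
    """Sort (chrom, start, end) regions and merge overlapping/touching ones per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        out = [list(ivs[0])]
        for start, end in ivs[1:]:
            if start <= out[-1][1]:          # overlaps or touches the previous region
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [tuple(iv) for iv in out]
    return merged


def intersect(a, b):
    """Regions covered by both region sets, using half-open [start, end) coordinates."""
    ma, mb = merge(a), merge(b)
    result = []
    for chrom in sorted(set(ma) & set(mb)):
        xs, ys = ma[chrom], mb[chrom]
        i = j = 0
        while i < len(xs) and j < len(ys):
            start = max(xs[i][0], ys[j][0])
            end = min(xs[i][1], ys[j][1])
            if start < end:                  # non-empty overlap
                result.append((chrom, start, end))
            # advance whichever region ends first
            if xs[i][1] < ys[j][1]:
                i += 1
            else:
                j += 1
    return result


if __name__ == "__main__":
    a = [("chr1", 0, 100), ("chr1", 50, 150), ("chr2", 10, 20)]
    b = [("chr1", 120, 200), ("chr2", 0, 15), ("chr3", 0, 5)]
    print(intersect(a, b))  # [('chr1', 120, 150), ('chr2', 10, 15)]
```

Unlike a tool designed to minimize memory, this sketch holds both inputs in memory; a memory-frugal variant would stream pre-sorted region files and advance through them one record at a time.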


Subject(s)
Genomics/methods , Software , Computational Biology/methods , Genome, Human , High-Throughput Nucleotide Sequencing , Humans
19.
BMC Genet ; 14: 48, 2013 Jun 06.
Article in English | MEDLINE | ID: mdl-23742238

ABSTRACT

BACKGROUND: We address the task of extracting accurate haplotypes from genotype data of individuals in large F1 populations for mapping studies. Although methods for inferring parental haplotype assignments in large F1 populations exist in theory, these approaches do not achieve high accuracy in practice. RESULTS: We have designed iXora (Identifying crossovers and recombining alleles), a robust, linear-time method for extracting reliable haplotypes of a mapping population, as well as parental haplotypes. Each allele in the progeny is assigned not just to a parent, but more precisely to a haplotype inherited from that parent. iXora shows an improvement of at least 15% in accuracy over similar systems in the literature. Furthermore, iXora provides an easy-to-use, comprehensive environment for association studies and hypothesis checking in populations of related individuals. CONCLUSIONS: iXora provides detailed resolution of parental inheritance, along with the capability of handling very large populations, which allows accurate haplotype extraction and trait association. iXora is available for non-commercial use from http://researcher.ibm.com/project/3430.
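The idea of assigning each progeny allele to a specific parental haplotype, and reading crossovers off the points where the assignment switches, can be sketched in linear time as below. This is a simplified illustration, not iXora's algorithm: it assumes the two haplotypes of one parent are already phased and that the progeny alleles inherited from that parent have already been separated out, and it performs no error correction, whereas iXora also infers the parental haplotypes and is designed to be robust. The function names are hypothetical.

```python
def assign_haplotypes(progeny, hap0, hap1):
    """Label each locus with the parental haplotype (0 or 1) it was inherited from.

    Loci where both parental haplotypes carry the same allele, or where the
    progeny allele matches neither (e.g. a genotyping error), are
    uninformative and labelled None.
    """
    labels = []
    for allele, a0, a1 in zip(progeny, hap0, hap1):
        if a0 == a1 or allele not in (a0, a1):
            labels.append(None)
        else:
            labels.append(0 if allele == a0 else 1)
    return labels


def crossover_positions(labels):
    """Indices of informative loci where the inherited haplotype switches."""
    positions, last = [], None
    for i, lab in enumerate(labels):
        if lab is None:                      # skip uninformative loci
            continue
        if last is not None and lab != last:
            positions.append(i)
        last = lab
    return positions


if __name__ == "__main__":
    hap0 = ["A", "A", "C", "G", "T", "C"]
    hap1 = ["A", "G", "C", "T", "A", "G"]
    child = ["A", "A", "C", "T", "A", "G"]
    labels = assign_haplotypes(child, hap0, hap1)
    print(labels)                        # [None, 0, None, 1, 1, 1]
    print(crossover_positions(labels))   # [3]
```

Because uninformative loci are skipped, a crossover is reported at the first informative locus after the switch; the actual breakpoint lies somewhere between that locus and the previous informative one.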


Subject(s)
Haplotypes , Quantitative Trait Loci , Crossing Over, Genetic , Humans , Recombination, Genetic
20.
Blood Adv ; 7(9): 1929-1943, 2023 05 09.
Article in English | MEDLINE | ID: mdl-36287227

ABSTRACT

Covalent inhibitors of Bruton tyrosine kinase (BTK) have transformed the therapy of chronic lymphocytic leukemia (CLL), but continuous therapy has been complicated by the development of resistance. The most common resistance mechanism in patients whose disease progresses on covalent BTK inhibitors (BTKis) is a mutation at cysteine 481 of BTK, the residue to which the inhibitors bind covalently. Pirtobrutinib is a highly selective, noncovalent BTKi with substantial clinical activity in patients whose disease has progressed on covalent BTKi, regardless of BTK mutation status. Using in vitro ibrutinib-resistant models and cells from patients with CLL, we show that pirtobrutinib potently inhibits BTK-mediated functions including B-cell receptor (BCR) signaling, cell viability, and CCL3/CCL4 chemokine production in both BTK wild-type and C481S mutant CLL cells. We demonstrate that primary CLL cells from responding patients on the pirtobrutinib trial show reduced BCR signaling, cell survival, and CCL3/CCL4 chemokine secretion. At the time of progression, these primary CLL cells show increasing resistance to pirtobrutinib in signaling inhibition, cell viability, and cytokine production. We performed longitudinal whole-exome sequencing in 2 patients whose disease progressed on pirtobrutinib and identified selection of alternative-site BTK mutations, providing clinical evidence that secondary BTK mutations lead to resistance to noncovalent BTKis.


Subject(s)
Leukemia, Lymphocytic, Chronic, B-Cell , Humans , Agammaglobulinaemia Tyrosine Kinase , Leukemia, Lymphocytic, Chronic, B-Cell/drug therapy , Leukemia, Lymphocytic, Chronic, B-Cell/genetics , Leukemia, Lymphocytic, Chronic, B-Cell/metabolism , Chemokine CCL4/genetics , Chemokine CCL4/therapeutic use , Drug Resistance, Neoplasm/genetics , Protein Kinase Inhibitors/pharmacology , Protein Kinase Inhibitors/therapeutic use , Pyrimidines/pharmacology , Pyrimidines/therapeutic use , Mutation