Results 1 - 20 of 51
1.
Cell ; 180(3): 568-584.e23, 2020 02 06.
Article in English | MEDLINE | ID: mdl-31981491

ABSTRACT

We present the largest exome sequencing study of autism spectrum disorder (ASD) to date (n = 35,584 total samples, 11,986 with ASD). Using an enhanced analytical framework to integrate de novo and case-control rare variation, we identify 102 risk genes at a false discovery rate of 0.1 or less. Of these genes, 49 show higher frequencies of disruptive de novo variants in individuals ascertained to have severe neurodevelopmental delay, whereas 53 show higher frequencies in individuals ascertained to have ASD; comparing ASD cases with mutations in these groups reveals phenotypic differences. Expressed early in brain development, most risk genes have roles in regulation of gene expression or neuronal communication (i.e., mutations effect neurodevelopmental and neurophysiological changes), and 13 fall within loci recurrently hit by copy number variants. In cells from the human cortex, expression of risk genes is enriched in excitatory and inhibitory neuronal lineages, consistent with multiple paths to an excitatory-inhibitory imbalance underlying ASD.


Subject(s)
Autistic Disorder/genetics , Cerebral Cortex/growth & development , Exome Sequencing/methods , Gene Expression Regulation, Developmental , Neurobiology/methods , Case-Control Studies , Cell Lineage , Cohort Studies , Exome , Female , Gene Frequency , Genetic Predisposition to Disease , Humans , Male , Mutation, Missense , Neurons/metabolism , Phenotype , Sex Factors , Single-Cell Analysis/methods
2.
Genome Res ; 32(6): 1170-1182, 2022 06.
Article in English | MEDLINE | ID: mdl-35697522

ABSTRACT

Accurate and efficient detection of copy number variants (CNVs) is of critical importance owing to their significant association with complex genetic diseases. Although algorithms that use whole-genome sequencing (WGS) data provide stable results with mostly valid statistical assumptions, copy number detection on whole-exome sequencing (WES) data shows comparatively lower accuracy. This is unfortunate as WES data are cost-efficient, compact, and relatively ubiquitous. The bottleneck is primarily due to the noncontiguous nature of the targeted capture: biases in targeted genomic hybridization, GC content, targeting probes, and sample batching during sequencing. Here, we present a novel deep learning model, DECoNT, which uses the matched WES and WGS data, and learns to correct the copy number variations reported by any off-the-shelf WES-based germline CNV caller. We train DECoNT on the 1000 Genomes Project data, and we show that we can efficiently triple the duplication call precision and double the deletion call precision of the state-of-the-art algorithms. We also show that our model consistently improves the performance independent of (1) sequencing technology, (2) exome capture kit, and (3) CNV caller. Using DECoNT as a universal exome CNV call polisher has the potential to improve the reliability of germline CNV detection on WES data sets.


Subject(s)
Deep Learning , Exome , Algorithms , DNA Copy Number Variations , High-Throughput Nucleotide Sequencing/methods , Reproducibility of Results , Exome Sequencing
3.
Bioinformatics ; 39(11)2023 11 01.
Article in English | MEDLINE | ID: mdl-37952175

ABSTRACT

MOTIVATION: Online assessment of tumor characteristics during surgery is important and has the potential to establish an intra-operative surgeon feedback mechanism. With the availability of such feedback, surgeons could decide to be more liberal or conservative regarding the resection of the tumor. While there are methods to perform metabolomics-based tumor pathology prediction, their model complexity and predictive performance are limited by the small dataset sizes. Furthermore, the information conveyed by the feedback provided on the tumor tissue could be improved both in terms of content and accuracy. RESULTS: In this study, we propose a metabolic pathway-informed deep learning model (PiDeeL) to perform survival analysis and pathology assessment based on metabolite concentrations. We show that incorporating pathway information into the model architecture substantially reduces parameter complexity and achieves better survival analysis and pathological classification performance. With these design decisions, we show that PiDeeL improves tumor pathology prediction performance of the state-of-the-art in terms of the Area Under the ROC Curve by 3.38% and the Area Under the Precision-Recall Curve by 4.06%. Similarly, with respect to the time-dependent concordance index (c-index), PiDeeL achieves better survival analysis performance (improvement of 4.3%) when compared to the state-of-the-art. Moreover, we show that importance analyses performed on input metabolite features as well as pathway-specific neurons of PiDeeL provide insights into tumor metabolism. We foresee that the use of this model in the surgery room will help surgeons adjust the surgery plan on the fly and will result in better prognosis estimates tailored to surgical procedures. AVAILABILITY AND IMPLEMENTATION: The code is released at https://github.com/ciceklab/PiDeeL. The data used in this study are released at https://zenodo.org/record/7228791.


Subject(s)
Deep Learning , Glioma , Humans , Metabolic Networks and Pathways , Survival Analysis , Area Under Curve
4.
Bioinformatics ; 40(5)2024 05.
Article in English | MEDLINE | ID: mdl-38718189

ABSTRACT

MOTIVATION: Combination drug therapies are effective treatments for cancer. However, the genetic heterogeneity of the patients and exponentially large space of drug pairings pose significant challenges for finding the right combination for a specific patient. Current in silico prediction methods can be instrumental in reducing the vast number of candidate drug combinations. However, existing powerful methods are trained with cancer cell line gene expression data, which limits their applicability in clinical settings. While synergy measurements on cell line models are available at large scale, patient-derived samples are too few to train a complex model. On the other hand, patient-specific single-drug response data are relatively more available. RESULTS: In this work, we propose a deep learning framework, Personalized Deep Synergy Predictor (PDSP), that enables us to use the patient-specific single drug response data for customizing patient drug synergy predictions. PDSP is first trained to learn synergy scores of drug pairs and their single drug responses for a given cell line using drug structures and large scale cell line gene expression data. Then, the model is fine-tuned for patients with their patient gene expression data and associated single drug response measured on the patient ex vivo samples. In this study, we evaluate PDSP on data from three leukemia patients and observe that it improves the prediction accuracy by 27% compared to models trained on cancer cell line data. AVAILABILITY AND IMPLEMENTATION: PDSP is available at https://github.com/hikuru/PDSP.

5.
Bioinformatics ; 38(16): 3935-3941, 2022 08 10.
Article in English | MEDLINE | ID: mdl-35762943

ABSTRACT

MOTIVATION: Synthesizing genes to be expressed in other organisms is an essential tool in biotechnology. While the many-to-one mapping from codons to amino acids makes the genetic code degenerate, codon usage in a particular organism is not random either. This bias in codon use may have a remarkable effect on the level of gene expression. A number of measures have been developed to quantify a given codon sequence's strength to express a gene in a host organism. Codon optimization aims to find a codon sequence that will optimize one or more of these measures. Efficient computational approaches are needed since the possible number of codon sequences grows exponentially as the number of amino acids increases. RESULTS: We develop a unifying modeling approach for codon optimization. With our mathematical formulations based on graph/network representations of amino acid sequences, any combination of measures can be optimized in the same framework by finding a path satisfying additional limitations in an acyclic layered network. We tested our approach on bi-objectives commonly used in the literature, namely, Codon Pair Bias versus Codon Adaptation Index and Relative Codon Pair Bias versus Relative Codon Bias. However, our framework is general enough to handle any number of objectives concurrently with certain restrictions or preferences on the use of specific nucleotide sequences. We implemented our models using Python's Gurobi interface and showed the efficacy of our approach even for the largest proteins available. We also provided experimentation showing that highly expressed genes have objective values close to the optimized values in the bi-objective codon design problem. AVAILABILITY AND IMPLEMENTATION: http://alpersen.bilkent.edu.tr/NetworkCodon.zip. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Amino Acids , Genetic Code , Codon , Amino Acid Sequence
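
The layered-network view of codon optimization described in the abstract above can be illustrated with a small sketch. This is not the authors' Gurobi-based formulation: it swaps in plain dynamic programming over a layered acyclic graph, and the codon table, the `WEIGHT` values, and the `pair_score` penalty are all made-up placeholders chosen only for illustration.

```python
import math

# Hypothetical toy data: a tiny codon table and invented relative-adaptiveness weights.
CODONS = {
    "M": ["ATG"],
    "K": ["AAA", "AAG"],
    "F": ["TTT", "TTC"],
}
WEIGHT = {"ATG": 1.0, "AAA": 0.6, "AAG": 1.0, "TTT": 0.5, "TTC": 1.0}


def pair_score(left, right):
    """Invented codon-pair term: penalize an A meeting an A across the codon boundary."""
    return -1.0 if left.endswith("A") and right.startswith("A") else 0.0


def best_codon_path(protein):
    """Best-scoring path through a layered DAG; layer i holds the synonymous codons of residue i."""
    # best[c] = (score of the best path ending in codon c, that path as a list of codons)
    best = {c: (math.log(WEIGHT[c]), [c]) for c in CODONS[protein[0]]}
    for aa in protein[1:]:
        nxt = {}
        for c in CODONS[aa]:
            # Pick the predecessor codon that maximizes path score plus the pair term.
            score, path = max(
                (s + pair_score(p, c), pth) for p, (s, pth) in best.items()
            )
            nxt[c] = (score + math.log(WEIGHT[c]), path + [c])
        best = nxt
    score, path = max(best.values())
    return "".join(path), score
```

Calling `best_codon_path("MKF")` returns the chosen codon sequence together with its score. Because each layer only looks at the previous one, the running time grows linearly with protein length even though the number of candidate sequences grows exponentially.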
6.
Bioinformatics ; 38(4): 908-917, 2022 01 27.
Article in English | MEDLINE | ID: mdl-34864867

ABSTRACT

MOTIVATION: Genome-wide association studies show that variants in individual genomic loci alone are not sufficient to explain the heritability of complex, quantitative phenotypes. Many computational methods have been developed to address this issue by considering subsets of loci that can collectively predict the phenotype. This problem can be considered a challenging instance of feature selection in which the number of dimensions (loci that are screened) is much larger than the number of samples. While currently available methods can achieve decent phenotype prediction performance, they either do not scale to large datasets or have parameters that require extensive tuning. RESULTS: We propose a fast and simple algorithm, Macarons, to select a small, complementary subset of variants by avoiding redundant pairs that are likely to be in linkage disequilibrium. Our method features two interpretable parameters that control the time/performance trade-off without requiring parameter tuning. In our computational experiments, we show that Macarons consistently achieves similar or better prediction performance than state-of-the-art selection methods while having a simpler premise and being at least two orders of magnitude faster. Overall, Macarons can seamlessly scale to the human genome with ∼10^7 variants in a matter of minutes while taking the dependencies between the variants into account. AVAILABILITY AND IMPLEMENTATION: Macarons is available in Matlab and Python at https://github.com/serhan-yilmaz/macarons. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Genome-Wide Association Study , Humans , Genome-Wide Association Study/methods , Phenotype , Linkage Disequilibrium , Genome, Human , Polymorphism, Single Nucleotide
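
The idea in the abstract above of selecting a small set of variants while avoiding redundant, likely linked pairs can be sketched as a greedy filter. This is not the Macarons algorithm itself: the function name, the use of genomic distance as a stand-in for linkage disequilibrium, and the two parameters `k` and `min_distance` are assumptions made for illustration.

```python
def select_spread_out(scores, positions, k, min_distance):
    """Greedily pick up to k high-scoring variants, skipping any within min_distance of a pick."""
    # Visit variants from the highest association score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in order:
        # Assumption: variants closer than min_distance are treated as redundant.
        if all(abs(positions[i] - positions[j]) >= min_distance for j in chosen):
            chosen.append(i)
            if len(chosen) == k:
                break
    return chosen
```

With scores `[0.9, 0.8, 0.7, 0.1]` at positions `[100, 150, 5000, 9000]`, `k=2` and `min_distance=1000`, the second variant is skipped because it sits next to the first, and the third is selected instead.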
7.
Bioinformatics ; 38(12): 3238-3244, 2022 06 13.
Article in English | MEDLINE | ID: mdl-35512389

ABSTRACT

MOTIVATION: Identification and removal of micro-scale residual tumor tissue during brain tumor surgery are key for survival in glioma patients. For this goal, High-Resolution Magic Angle Spinning Nuclear Magnetic Resonance (HRMAS NMR) spectroscopy-based assessment of tumor margins during surgery has been an effective method. However, the time required for metabolite quantification and the need for human experts such as a pathologist to be present during surgery are major bottlenecks of this technique. While machine learning techniques that analyze the NMR spectrum in an untargeted manner (i.e. using the full raw signal) have been shown to effectively automate this feedback mechanism, the high-dimensional and noisy structure of the NMR signal limits the attained performance. RESULTS: In this study, we show that identifying informative regions in the HRMAS NMR spectrum and using them for tumor margin assessment improves the prediction power. We use the spectra normalized with the ERETIC (electronic reference to access in vivo concentrations) method, which uses an external reference signal to calibrate the HRMAS NMR spectrum. We train models to predict quantities of metabolites from annotated regions of this spectrum. Using these predictions for tumor margin assessment provides performance improvements of up to 4.6% in the Area Under the ROC Curve (AUC-ROC) and 2.8% in the Area Under the Precision-Recall Curve (AUC-PR). We validate the importance of various tumor biomarkers and identify a novel region between 7.97 ppm and 8.09 ppm as a new candidate for a glioma biomarker. AVAILABILITY AND IMPLEMENTATION: The code is released at https://github.com/ciceklab/targeted_brain_tumor_margin_assessment. The data underlying this article are available in Zenodo, at https://doi.org/10.5281/zenodo.5781769. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Brain Neoplasms , Glioma , Humans , Brain Neoplasms/diagnostic imaging , Brain Neoplasms/surgery , Metabolomics/methods , Magnetic Resonance Spectroscopy/methods , Glioma/diagnostic imaging , Glioma/surgery , Magnetic Resonance Imaging
8.
Am J Hum Genet ; 102(6): 1031-1047, 2018 06 07.
Article in English | MEDLINE | ID: mdl-29754769

ABSTRACT

Analysis of de novo mutations (DNMs) from sequencing data of nuclear families has identified risk genes for many complex diseases, including multiple neurodevelopmental and psychiatric disorders. Most of these efforts have focused on mutations in protein-coding sequences. Evidence from genome-wide association studies (GWASs) strongly suggests that variants important to human diseases often lie in non-coding regions. Extending DNM-based approaches to non-coding sequences is challenging, however, because the functional significance of non-coding mutations is difficult to predict. We propose a statistical framework for analyzing DNMs from whole-genome sequencing (WGS) data. This method, TADA-Annotations (TADA-A), is a major advance of the TADA method we developed earlier for DNM analysis in coding regions. TADA-A is able to incorporate many functional annotations such as conservation and enhancer marks, to learn from data which annotations are informative of pathogenic mutations, and to combine both coding and non-coding mutations at the gene level to detect risk genes. It also supports meta-analysis of multiple DNM studies, while adjusting for study-specific technical effects. We applied TADA-A to WGS data of ∼300 autism-affected family trios across five studies and discovered several autism risk genes. The software is freely available for all research uses.


Subject(s)
Chromosome Mapping , Genetic Predisposition to Disease , Mutation/genetics , Statistics as Topic , Whole Genome Sequencing , Autistic Disorder/genetics , Calibration , Enhancer Elements, Genetic/genetics , Humans , Molecular Sequence Annotation , Mutation Rate , RNA Splicing/genetics , Risk Factors , Exome Sequencing
9.
Bioinformatics ; 36(Suppl_2): i903-i910, 2020 12 30.
Article in English | MEDLINE | ID: mdl-33381836

ABSTRACT

MOTIVATION: The big data era in genomics promises a breakthrough in medicine, but the need to share data in a private manner limits the pace of the field. The widely accepted 'genomic data sharing beacon' protocol provides a standardized and secure interface for querying genomic datasets. The data are only shared if the desired information (e.g. a certain variant) exists in the dataset. Various studies have shown that beacons are vulnerable to re-identification (or membership inference) attacks. As beacons are generally associated with sensitive phenotype information, re-identification creates a significant risk for the participants. Unfortunately, proposed countermeasures against such attacks have failed to be effective, as they do not consider the utility of the beacon protocol. RESULTS: In this study, for the first time, we analyze the mitigation effect of kinship relationships among beacon participants against re-identification attacks. We argue that having multiple family members in a beacon can garble the information for attacks since a substantial number of variants are shared among kin-related people. Using family genomes from HapMap and synthetically generated datasets, we show that having one of the parents of a victim in the beacon causes (i) a significant decrease in the power of attacks and (ii) a substantial increase in the number of queries needed to confirm an individual's beacon membership. We also show how the protection effect attenuates when more distant relatives, such as grandparents, are included alongside the victim. Furthermore, we quantify the utility loss due to adding relatives and show that it is smaller compared with flipping-based techniques.


Subject(s)
Genomics , Information Dissemination , Family , Phenotype , Humans
10.
Bioinformatics ; 36(12): 3669-3679, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32167530

ABSTRACT

MOTIVATION: Third-generation sequencing technologies can sequence long reads that contain as many as 2 million base pairs. These long reads are used to construct an assembly (i.e. the subject's genome), which is further used in downstream genome analysis. Unfortunately, third-generation sequencing technologies have high sequencing error rates and a large proportion of base pairs in these long reads is incorrectly identified. These errors propagate to the assembly and affect the accuracy of genome analysis. Assembly polishing algorithms minimize such error propagation by polishing or fixing errors in the assembly by using information from alignments between reads and the assembly (i.e. read-to-assembly alignment information). However, current assembly polishing algorithms can only polish an assembly using reads from either a certain sequencing technology or a small assembly. Such technology-dependency and assembly-size dependency require researchers to (i) run multiple polishing algorithms and (ii) use small chunks of a large genome to use all available readsets and polish large genomes, respectively. RESULTS: We introduce Apollo, a universal assembly polishing algorithm that scales well to polish an assembly of any size (i.e. both large and small genomes) using reads from all sequencing technologies (i.e. second- and third-generation). Our goal is to provide a single algorithm that uses read sets from all available sequencing technologies to improve the accuracy of assembly polishing and that can polish large genomes. Apollo (i) models an assembly as a profile hidden Markov model (pHMM), (ii) uses read-to-assembly alignment to train the pHMM with the Forward-Backward algorithm and (iii) decodes the trained model with the Viterbi algorithm to produce a polished assembly. Our experiments with real readsets demonstrate that Apollo is the only algorithm that (i) uses reads from any sequencing technology within a single run and (ii) scales well to polish large assemblies without splitting the assembly into multiple parts. AVAILABILITY AND IMPLEMENTATION: Source code is available at https://github.com/CMU-SAFARI/Apollo. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Software , High-Throughput Nucleotide Sequencing , Poland , Sequence Analysis, DNA , Technology
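
The decoding step named in the abstract above, the Viterbi algorithm, can be sketched for a generic discrete hidden Markov model. This is not Apollo's profile HMM or its implementation: the function signature and the dictionary-based model layout are assumptions for illustration, and the sketch assumes every probability it looks up is strictly positive.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for a discrete HMM, computed in log space."""
    # layer[s] = (log-probability of the best path ending in state s, that path)
    layer = {
        s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), [s])
        for s in states
    }
    for obs in observations[1:]:
        nxt = {}
        for s in states:
            # Best predecessor for state s at this step.
            score, path = max(
                (layer[p][0] + math.log(trans_p[p][s]), layer[p][1]) for p in states
            )
            nxt[s] = (score + math.log(emit_p[s][obs]), path + [s])
        layer = nxt
    return max(layer.values())[1]
```

In a polishing setting the hidden states would correspond to positions and edit operations along the assembly, and the decoded path would spell out the corrected sequence; the toy function above only shows the decoding recurrence.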
11.
PLoS Comput Biol ; 16(11): e1008184, 2020 11.
Article in English | MEDLINE | ID: mdl-33175838

ABSTRACT

Complete resection of the tumor is important for survival in glioma patients. Even if gross total resection is achieved, left-over micro-scale tissue in the excision cavity risks recurrence. The High Resolution Magic Angle Spinning Nuclear Magnetic Resonance (HRMAS NMR) technique can distinguish healthy and malignant tissue efficiently using peak intensities of biomarker metabolites. The method is fast, sensitive and can work with small and unprocessed samples, which makes it a good fit for real-time analysis during surgery. However, only a targeted analysis for the existence of known tumor biomarkers can be made, and this requires a technician with a chemistry background and a pathologist with knowledge of tumor metabolism to be present during surgery. Here, we show that we can accurately perform this analysis in real-time and can analyze the full spectrum in an untargeted fashion using machine learning. We work on a new and large HRMAS NMR dataset of glioma and control samples (n = 565), which are also labeled with a quantitative pathology analysis. Our results show that a random forest based approach can distinguish samples with tumor cells and controls accurately and effectively with a median AUC of 85.6% and AUPR of 93.4%. We also show that we can further distinguish benign and malignant samples with a median AUC of 87.1% and AUPR of 96.1%. We analyze the feature (peak) importance for classification to interpret the results of the classifier. We validate that known malignancy biomarkers such as creatine and 2-hydroxyglutarate play an important role in distinguishing tumor and normal cells and suggest new biomarker regions. The code is released at http://github.com/ciceklab/HRMAS_NC.


Subject(s)
Brain Neoplasms/diagnostic imaging , Glioma/diagnostic imaging , Machine Learning , Magnetic Resonance Spectroscopy/methods , Margins of Excision , Algorithms , Biopsy , Brain Neoplasms/pathology , Brain Neoplasms/surgery , Cohort Studies , Glioma/pathology , Glioma/surgery , Humans , Intraoperative Period
12.
Hum Mutat ; 41(8): e7-e45, 2020 08.
Article in English | MEDLINE | ID: mdl-32579787

ABSTRACT

The last decade has proven that amyotrophic lateral sclerosis (ALS) is clinically and genetically heterogeneous, and that the genetic component in sporadic cases might be stronger than expected. This study investigates 1,200 patients to revisit ALS in the ethnically heterogeneous yet inbred Turkish population. Familial ALS (fALS) accounts for 20% of our cases. The rates of consanguinity are 30% in fALS and 23% in sporadic ALS (sALS). Major ALS genes explained the disease cause in only 35% of fALS, as compared with ~70% in Europe and North America. Whole exome sequencing resulted in a discovery rate of 42% (53/127). Whole genome analyses in 623 sALS cases and 142 population controls, sequenced within Project MinE, revealed well-established fALS gene variants, solidifying the concept of incomplete penetrance in ALS. Genome-wide association studies (GWAS) with whole genome sequencing data did not indicate a new risk locus. Coupling GWAS with a coexpression network of disease-associated candidates points to a significant enrichment for cell cycle- and division-related genes. Within this network, literature text-mining highlights DECR1, ATL1, HDAC2, GEMIN4, and HNRNPA3 as important genes. Finally, information on ALS-related gene variants in the Turkish cohort sequenced within Project MinE was compiled in the GeNDAL variant browser (www.gendal.org).


Subject(s)
Amyotrophic Lateral Sclerosis/genetics , Databases, Genetic , Genome-Wide Association Study , Genotype , Humans , Internet , Phenotype , Turkey , Whole Genome Sequencing
13.
J Proteome Res ; 19(1): 292-299, 2020 01 03.
Article in English | MEDLINE | ID: mdl-31679342

ABSTRACT

Meningiomas are in most cases benign brain tumors. The WHO 2016 classification defines three grades of meningiomas. This classification has prognostic value because grade III meningiomas have a worse prognosis than grade I and II meningiomas. However, some benign or atypical meningiomas can show clinically aggressive behavior. There are currently no reliable markers that distinguish meningiomas with a good prognosis from those which may recur. High-resolution magic angle spinning (HRMAS) spectrometry is a noninvasive method able to determine the metabolite profile of a tissue sample. We retrospectively analyzed 62 meningioma samples by using HRMAS spectrometry (43 metabolites). We described a metabolic profile defined by a high concentration of acetate, threonine, N-acetyl-lysine, hydroxybutyrate, myoinositol, ascorbate, scylloinositol, and total choline and a low concentration of aspartate, glucose, isoleucine, valine, adenosine, arginine, and alanine. This metabolomic signature was associated with poor prognosis histological markers [Ki-67 ≥ 40%, high histological grade and negative progesterone receptor (PR) expression]. We also described a similar metabolomic spectrum between grade III and grade I meningiomas. Moreover, all grade I meningiomas with a low Ki-67 expression and a positive PR expression did not have the same metabolomic profile. Metabolomic analysis could be used to identify an aggressive meningioma in order to discuss a personalized treatment. Further studies are needed to confirm these results and to correlate this metabolic profile with survival data.


Subject(s)
Brain Neoplasms/metabolism , Magnetic Resonance Spectroscopy/methods , Meningioma/metabolism , Amino Acids/analysis , Amino Acids/metabolism , Biopsy , Brain Neoplasms/surgery , Cell Proliferation , Humans , Ki-67 Antigen/metabolism , Meningioma/pathology , Meningioma/surgery , Metabolomics/methods
14.
Bioinformatics ; 35(18): 3433-3440, 2019 09 15.
Article in English | MEDLINE | ID: mdl-30759247

ABSTRACT

MOTIVATION: Whole exome sequencing (WES) studies for autism spectrum disorder (ASD) could identify only around six dozen risk genes to date because the genetic architecture of the disorder is highly complex. To speed the gene discovery process up, a few network-based ASD gene discovery algorithms were proposed. Although these methods use static gene interaction networks, functional clustering of genes is bound to evolve during neurodevelopment and disruptions are likely to have a cascading effect on the future associations. Thus, approaches that disregard the dynamic nature of neurodevelopment are limited. RESULTS: Here, we present a spatio-temporal gene discovery algorithm, which leverages information from evolving gene co-expression networks of neurodevelopment. The algorithm solves a prize-collecting Steiner forest-based problem on co-expression networks, adapted to model neurodevelopment and transfer information from precursor neurodevelopmental windows. The decisions made by the algorithm can be traced back, adding interpretability to the results. We apply the algorithm on ASD WES data of 3871 samples and identify risk clusters using BrainSpan co-expression networks of early- and mid-fetal periods. On an independent dataset, we show that incorporation of the temporal dimension increases the predictive power: predicted clusters are hit more and show higher enrichment in ASD-related functions compared with the state-of-the-art. AVAILABILITY AND IMPLEMENTATION: The code is available at http://ciceklab.cs.bilkent.edu.tr/st-steiner. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genetic Association Studies , Algorithms , Autism Spectrum Disorder , Cluster Analysis , Gene Regulatory Networks , Humans , Software
15.
Bioinformatics ; 35(3): 365-371, 2019 02 01.
Article in English | MEDLINE | ID: mdl-30052749

ABSTRACT

Motivation: Genomic data-sharing beacons aim to provide a secure, easy to implement and standardized interface for data-sharing by only allowing yes/no queries on the presence of specific alleles in the dataset. Previously deemed secure against re-identification attacks, beacons were shown to be vulnerable despite their stringent policy. Recent studies have demonstrated that it is possible to determine whether the victim is in the dataset, by repeatedly querying the beacon for his/her single-nucleotide polymorphisms (SNPs). Here, we propose a novel re-identification attack and show that the privacy risk is more serious than previously thought. Results: Using the proposed attack, even if the victim systematically hides informative SNPs, it is possible to infer the alleles at positions of interest as well as the beacon query results with very high confidence. Our method is based on the fact that alleles at different loci are not necessarily independent. We use linkage disequilibrium and a high-order Markov chain-based algorithm for inference. We show that in a simulated beacon with 65 individuals from the European population, we can infer membership of individuals with 95% confidence with only 5 queries, even when SNPs with MAF <0.05 are hidden. We need less than 0.5% of the number of queries that existing works require, to determine beacon membership under the same conditions. We show that countermeasures such as hiding certain parts of the genome or setting a query budget for the user would fail to protect the privacy of the participants. Availability and implementation: Software is available at http://ciceklab.cs.bilkent.edu.tr/beacon_attack. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Genomics , Information Dissemination , Linkage Disequilibrium , Algorithms , Alleles , Female , Humans , Male , Polymorphism, Single Nucleotide , Software
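
The yes/no query interface described in the abstract above can be sketched as a toy class. This is a simplified illustration, not the beacon protocol specification or the attack proposed in the paper: the `Beacon` class, the `(position, allele)` representation, and the naive `fraction_confirmed` statistic are assumptions. Published attacks rely on statistical tests over allele frequencies and, in this paper, on linkage disequilibrium.

```python
class Beacon:
    """Toy beacon: answers only whether any participant carries a given allele."""

    def __init__(self, genomes):
        # genomes: dict mapping participant id -> set of (position, allele) pairs
        self._alleles = set().union(*genomes.values()) if genomes else set()

    def query(self, position, allele):
        """Return True if at least one participant carries this allele at this position."""
        return (position, allele) in self._alleles


def fraction_confirmed(beacon, victim_alleles):
    """Share of the victim's alleles for which the beacon answers yes (naive statistic)."""
    answers = [beacon.query(pos, allele) for pos, allele in victim_alleles]
    return sum(answers) / len(answers)
```

A fraction close to 1 over many rare alleles hints that the queried individual may be in the dataset, which is the intuition that re-identification attacks turn into a formal test.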
16.
Nucleic Acids Res ; 46(21): e125, 2018 11 30.
Article in English | MEDLINE | ID: mdl-30124947

ABSTRACT

Choosing whether to use second or third generation sequencing platforms can lead to trade-offs between accuracy and read length. Several types of studies require long and accurate reads. In such cases researchers often combine both technologies and the erroneous long reads are corrected using the short reads. Current approaches rely on various graph or alignment based techniques and do not take the error profile of the underlying technology into account. Efficient machine learning algorithms that address these shortcomings have the potential to achieve more accurate integration of these two technologies. We propose Hercules, the first machine learning-based long read error correction algorithm. Hercules models every long read as a profile Hidden Markov Model with respect to the underlying platform's error profile. The algorithm learns a posterior transition/emission probability distribution for each long read to correct errors in these reads. We show on two DNA-seq BAC clones (CH17-157L1 and CH17-227A2) that Hercules-corrected reads have the highest mapping rate among all competing algorithms and have the highest accuracy when the breadth of coverage is high. On a large human CHM1 cell line WGS data set, Hercules is one of the few scalable algorithms; and among those, it achieves the highest accuracy.


Subject(s)
Algorithms , Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Machine Learning , Software , Humans , Reproducibility of Results
17.
Metabolomics ; 15(5): 69, 2019 04 29.
Article in English | MEDLINE | ID: mdl-31037432

ABSTRACT

INTRODUCTION: The identification of frequent acquired mutations shows that patients with oligodendrogliomas have divergent biology with differing prognoses regardless of histological classification. A better understanding of molecular features as well as their metabolic pathways is essential. OBJECTIVES: The aim of this study was to examine the relationship between the tumor metabolome, six genomic aberrations (isocitrate dehydrogenase 1 [IDH1] mutation, 1p/19q codeletion, tumor protein p53 [TP53] mutation, O6-methylguanine-DNA methyltransferase [MGMT] promoter methylation, epidermal growth factor receptor [EGFR] amplification, phosphatase and tensin homolog [PTEN] methylation), and the patients' survival time. METHODS: We applied 1H high-resolution magic-angle spinning (HRMAS) nuclear magnetic resonance (NMR) spectroscopy to 72 resected oligodendrogliomas. RESULTS: The presence of IDH1 mutation, TP53 mutation, 1p/19q codeletion, and MGMT promoter methylation reduced the relative risk of death, whereas PTEN methylation and EGFR amplification were associated with poor prognosis. Increased concentrations of 2-hydroxyglutarate (2HG), N-acetyl-aspartate (NAA), and myo-inositol and an increased glycerophosphocholine/phosphocholine (GPC/PC) ratio were good prognostic factors. Increased concentrations of serine, glycine, glutamate and alanine were associated with an increased relative risk of death. CONCLUSION: HRMAS NMR spectroscopy provides accurate information on the metabolomics of oligodendrogliomas, making it possible to find new biomarkers indicative of survival. It enables rapid characterization of intact tissue and could be used as an intraoperative method.


Subject(s)
Metabolomics , Oligodendroglioma/genetics , Oligodendroglioma/metabolism , Adult , Humans , Magnetic Resonance Spectroscopy , Severity of Illness Index , Survival Analysis , Time Factors
18.
HPB (Oxford) ; 21(10): 1354-1361, 2019 10.
Article in English | MEDLINE | ID: mdl-30914156

ABSTRACT

BACKGROUND: Posthepatectomy liver failure (PHLF) is the main limitation to extending liver resection, but its pathophysiology is not yet fully understood. The aim of the study was to describe the metabolic adaptations that occur with PHLF. METHODS: A retrospective study of 82 patients was performed using nuclear magnetic resonance metabolomics to identify and quantify intra-hepatic metabolites. The metabolite levels were compared using ADEMA metabolic network analysis between fatal PHLF (FLF) and non-fatal PHLF and according to PHLF/ACLF grading. RESULTS: Metabolomic profiles were significantly different between patients presenting FLF and non-FLF, and between grade 3 ACLF and < grade 3 ACLF. In the patients undergoing hepatectomy, valine, alanine and glycerophosphocholine were identified as powerful biomarkers to predict FLF (AUROC 0.806, 0.802 and 0.856 respectively). Network analysis showed an activation of aerobic glycolysis with glutaminolysis as observed in highly proliferating systems. Inversely, ACLF3 showed deprivation of glucose and lactate compared to lower ACLF grades. CONCLUSION: Clinical and biological severity of ACLF and PHLF correlate with specific metabolic adaptations. Metabolomics can predict fatal liver failure after hepatectomy and underline significant differences in the metabolic patterns of ACLF and PHLF.


Subject(s)
Acute-On-Chronic Liver Failure/metabolism , Biomarkers/metabolism , Hepatectomy/adverse effects , Liver/metabolism , Metabolomics/methods , Postoperative Complications , Acute-On-Chronic Liver Failure/diagnosis , Acute-On-Chronic Liver Failure/etiology , Aged , Alanine/metabolism , Carcinoma, Hepatocellular/surgery , Female , Follow-Up Studies , Humans , Liver/pathology , Liver Neoplasms/surgery , Magnetic Resonance Spectroscopy , Male , Middle Aged , Phosphoric Diester Hydrolases/metabolism , ROC Curve , Retrospective Studies , Risk Factors , Valine/metabolism
19.
Bioinformatics ; 30(12): i175-84, 2014 Jun 15.
Article in English | MEDLINE | ID: mdl-24931981

ABSTRACT

MOTIVATION: Discovering the transcriptional regulatory architecture of the metabolism has been an important topic to understand the implications of transcriptional fluctuations on metabolism. The reporter algorithm (RA) was proposed to determine the hot spots in metabolic networks, around which transcriptional regulation is focused owing to a disease or a genetic perturbation. Using a z-score-based scoring scheme, RA calculates the average statistical change in the expression levels of genes that are neighbors to a target metabolite in the metabolic network. The RA approach has been used in numerous studies to analyze cellular responses to the downstream genetic changes. In this article, we propose a mutual information-based multivariate reporter algorithm (MIRA) with the goal of eliminating the following problems in detecting reporter metabolites: (i) conventional statistical methods suffer from small sample sizes, (ii) as z-score ranges from minus to plus infinity, calculating average scores can lead to canceling out opposite effects and (iii) analyzing genes one by one, then aggregating results can lead to information loss. MIRA is a multivariate and combinatorial algorithm that calculates the aggregate transcriptional response around a metabolite using mutual information. We show that MIRA's results are biologically sound, empirically significant and more reliable than RA. RESULTS: We apply MIRA to gene expression analysis of six knockout strains of Escherichia coli and show that MIRA captures the underlying metabolic dynamics of the switch from aerobic to anaerobic respiration. We also apply MIRA to an Autism Spectrum Disorder gene expression dataset. Results indicate that MIRA reports metabolites that highly overlap with recently found metabolic biomarkers in the autism literature. Overall, MIRA is a promising algorithm for detecting metabolic drug targets and understanding the relation between gene expression and metabolic activity. 
AVAILABILITY AND IMPLEMENTATION: The code is implemented in C# using the .NET Framework. The project is available upon request.
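
The cancellation problem named in point (ii) of the abstract can be illustrated with a small sketch. This is not the RA or MIRA implementation (the authors' code is in C# and available only on request); it assumes a plain arithmetic mean as the aggregate, and the function names and z-score values are hypothetical. The mean of absolute values is shown only as a contrast, not as the method MIRA uses.

```python
from statistics import mean


def average_z(z_scores):
    """Plain mean of signed z-scores (assumed aggregate, for illustration)."""
    return mean(z_scores)


def average_abs_z(z_scores):
    """Mean of absolute z-scores, shown only as a contrast to the signed mean."""
    return mean(abs(z) for z in z_scores)


# Hypothetical z-scores for four genes neighboring one metabolite:
# two strongly up-regulated, two strongly down-regulated.
neighbor_z = [2.5, -2.5, 3.0, -3.0]

print(average_z(neighbor_z))      # 0.0: opposite effects cancel out
print(average_abs_z(neighbor_z))  # 2.75: the magnitude of change is visible
```

Every gene around the metabolite changes strongly, yet the signed average reports no change at all, which is the kind of information loss the abstract attributes to averaging z-scores.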


Subject(s)
Algorithms , Metabolic Networks and Pathways/genetics , Child , Child Development Disorders, Pervasive/genetics , Child Development Disorders, Pervasive/metabolism , Escherichia coli/genetics , Escherichia coli/metabolism , Gene Expression Profiling , Gene Expression Regulation , Humans , Transcription, Genetic
20.
PLoS Comput Biol ; 9(1): e1002859, 2013.
Article in English | MEDLINE | ID: mdl-23341761

ABSTRACT

Metabolomics is a relatively new "omics" platform, which analyzes a discrete set of metabolites detected in bio-fluids or tissue samples of organisms. It has been used in a diverse array of studies to detect biomarkers and to determine activity rates for pathways based on changes due to disease or drugs. Recent improvements in analytical methodology and large sample throughput allow for creation of large datasets of metabolites that reflect changes in metabolic dynamics due to disease or a perturbation in the metabolic network. However, current methods of comprehensive analyses of large metabolic datasets (metabolomics) are limited, unlike other "omics" approaches where complex techniques for analyzing coexpression/coregulation of multiple variables are applied. This paper discusses the shortcomings of current metabolomics data analysis techniques, and proposes a new multivariate technique (ADEMA) based on mutual information to identify expected metabolite level changes with respect to a specific condition. We show that ADEMA better predicts De Novo Lipogenesis pathway metabolite level changes in samples with Cystic Fibrosis (CF) than prediction based on the significance of individual metabolite level changes. We also applied ADEMA's classification scheme on three different cohorts of CF and wildtype mice. ADEMA was able to predict whether an unknown mouse has a CF or a wildtype genotype with 1.0, 0.84, and 0.9 accuracy for each respective dataset. ADEMA results had up to 31% higher accuracy as compared to other classification algorithms. In conclusion, ADEMA advances the state-of-the-art in metabolomics analysis, by providing accurate and interpretable classification results.
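
As a rough illustration of the quantity ADEMA builds on, the following sketch computes a plug-in estimate of the mutual information between a discretized metabolite level and a condition label. It is not the ADEMA algorithm: the abstract does not describe the discretization or the multivariate scheme, so the two-level binning, the function name, and the toy samples are all assumptions.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        p_xy = c / n
        p_x = count_x[x] / n
        p_y = count_y[y] / n
        mi += p_xy * log2(p_xy / (p_x * p_y))
    return mi


# Hypothetical toy data: genotype labels and a metabolite binned as low/high.
genotype = ["WT", "WT", "CF", "CF"]
informative = ["low", "low", "high", "high"]    # level tracks genotype exactly
uninformative = ["low", "high", "low", "high"]  # level is unrelated to genotype

print(mutual_information(informative, genotype))    # 1.0
print(mutual_information(uninformative, genotype))  # 0.0
```

A metabolite whose level fully determines the genotype carries one bit of information about it, while an unrelated metabolite carries none; unlike a signed average, the measure does not depend on the direction of the change.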


Subject(s)
Algorithms , Metabolomics , Animals , Lipogenesis , Mice , Models, Theoretical , Multivariate Analysis