Results 1 - 20 of 51
1.
Cell ; 180(3): 568-584.e23, 2020 02 06.
Article in English | MEDLINE | ID: mdl-31981491

ABSTRACT

We present the largest exome sequencing study of autism spectrum disorder (ASD) to date (n = 35,584 total samples, 11,986 with ASD). Using an enhanced analytical framework to integrate de novo and case-control rare variation, we identify 102 risk genes at a false discovery rate of 0.1 or less. Of these genes, 49 show higher frequencies of disruptive de novo variants in individuals ascertained to have severe neurodevelopmental delay, whereas 53 show higher frequencies in individuals ascertained to have ASD; comparing ASD cases with mutations in these groups reveals phenotypic differences. Expressed early in brain development, most risk genes have roles in regulation of gene expression or neuronal communication (i.e., mutations effect neurodevelopmental and neurophysiological changes), and 13 fall within loci recurrently hit by copy number variants. In cells from the human cortex, expression of risk genes is enriched in excitatory and inhibitory neuronal lineages, consistent with multiple paths to an excitatory-inhibitory imbalance underlying ASD.


Subjects
Autistic Disorder/genetics , Cerebral Cortex/growth & development , Exome Sequencing/methods , Gene Expression Regulation, Developmental , Neurobiology/methods , Case-Control Studies , Cell Lineage , Cohort Studies , Exome , Female , Gene Frequency , Genetic Predisposition to Disease , Humans , Male , Mutation, Missense , Neurons/metabolism , Phenotype , Sex Factors , Single-Cell Analysis/methods
2.
Genome Res ; 32(6): 1170-1182, 2022 06.
Article in English | MEDLINE | ID: mdl-35697522

ABSTRACT

Accurate and efficient detection of copy number variants (CNVs) is of critical importance owing to their significant association with complex genetic diseases. Although algorithms that use whole-genome sequencing (WGS) data provide stable results with mostly valid statistical assumptions, copy number detection on whole-exome sequencing (WES) data shows comparatively lower accuracy. This is unfortunate as WES data are cost-efficient, compact, and relatively ubiquitous. The bottleneck is primarily due to the noncontiguous nature of the targeted capture: biases in targeted genomic hybridization, GC content, targeting probes, and sample batching during sequencing. Here, we present a novel deep learning model, DECoNT, which uses the matched WES and WGS data, and learns to correct the copy number variations reported by any off-the-shelf WES-based germline CNV caller. We train DECoNT on the 1000 Genomes Project data, and we show that we can efficiently triple the duplication call precision and double the deletion call precision of the state-of-the-art algorithms. We also show that our model consistently improves the performance independent of (1) sequencing technology, (2) exome capture kit, and (3) CNV caller. Using DECoNT as a universal exome CNV call polisher has the potential to improve the reliability of germline CNV detection on WES data sets.


Subjects
Deep Learning , Exome , Algorithms , DNA Copy Number Variations , High-Throughput Nucleotide Sequencing/methods , Reproducibility of Results , Exome Sequencing
3.
Bioinformatics ; 39(11)2023 11 01.
Article in English | MEDLINE | ID: mdl-37952175

ABSTRACT

MOTIVATION: Online assessment of tumor characteristics during surgery is important and has the potential to establish an intra-operative surgeon feedback mechanism. With the availability of such feedback, surgeons could decide to be more liberal or conservative regarding the resection of the tumor. While there are methods to perform metabolomics-based tumor pathology prediction, their model complexity and predictive performance are limited by the small dataset sizes. Furthermore, the information conveyed by the feedback provided on the tumor tissue could be improved both in terms of content and accuracy. RESULTS: In this study, we propose a metabolic pathway-informed deep learning model (PiDeeL) to perform survival analysis and pathology assessment based on metabolite concentrations. We show that incorporating pathway information into the model architecture substantially reduces parameter complexity and achieves better survival analysis and pathological classification performance. With these design decisions, we show that PiDeeL improves tumor pathology prediction performance of the state-of-the-art in terms of the Area Under the ROC Curve by 3.38% and the Area Under the Precision-Recall Curve by 4.06%. Similarly, with respect to the time-dependent concordance index (c-index), PiDeeL achieves better survival analysis performance (improvement of 4.3%) when compared to the state-of-the-art. Moreover, we show that importance analyses performed on input metabolite features as well as pathway-specific neurons of PiDeeL provide insights into tumor metabolism. We foresee that the use of this model in the surgery room will help surgeons adjust the surgery plan on the fly and will result in better prognosis estimates tailored to surgical procedures. AVAILABILITY AND IMPLEMENTATION: The code is released at https://github.com/ciceklab/PiDeeL. The data used in this study are released at https://zenodo.org/record/7228791.
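
The concordance index mentioned in this abstract can be illustrated with a small sketch. The abstract reports a time-dependent c-index; the code below computes the simpler Harrell's c-index instead, which is an assumption made for brevity and not the evaluation code of PiDeeL. The function name and the toy inputs are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's c-index: share of comparable pairs that the risk scores order correctly.

    times: observed follow-up times
    events: 1 if the event was observed, 0 if the subject was censored
    risk_scores: higher score means higher predicted risk (shorter survival)
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to a perfect ranking of the comparable pairs.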


Subjects
Deep Learning , Glioma , Humans , Metabolic Networks and Pathways , Survival Analysis , Area Under Curve
4.
Bioinformatics ; 40(5)2022 Jan 01.
Article in English | MEDLINE | ID: mdl-38718189

ABSTRACT

MOTIVATION: Combination drug therapies are effective treatments for cancer. However, the genetic heterogeneity of the patients and exponentially large space of drug pairings pose significant challenges for finding the right combination for a specific patient. Current in silico prediction methods can be instrumental in reducing the vast number of candidate drug combinations. However, existing powerful methods are trained with cancer cell line gene expression data, which limits their applicability in clinical settings. While synergy measurements on cell line models are available at large scale, patient-derived samples are too few to train a complex model. On the other hand, patient-specific single-drug response data are relatively more available. RESULTS: In this work, we propose a deep learning framework, Personalized Deep Synergy Predictor (PDSP), that enables us to use the patient-specific single drug response data for customizing patient drug synergy predictions. PDSP is first trained to learn synergy scores of drug pairs and their single drug responses for a given cell line using drug structures and large scale cell line gene expression data. Then, the model is fine-tuned for patients with their patient gene expression data and associated single drug response measured on the patient ex vivo samples. In this study, we evaluate PDSP on data from three leukemia patients and observe that it improves the prediction accuracy by 27% compared to models trained on cancer cell line data. AVAILABILITY AND IMPLEMENTATION: PDSP is available at https://github.com/hikuru/PDSP.

5.
Bioinformatics ; 38(16): 3935-3941, 2022 08 10.
Article in English | MEDLINE | ID: mdl-35762943

ABSTRACT

MOTIVATION: Synthesizing genes to be expressed in other organisms is an essential tool in biotechnology. While the many-to-one mapping from codons to amino acids makes the genetic code degenerate, codon usage in a particular organism is not random either. This bias in codon use may have a remarkable effect on the level of gene expression. A number of measures have been developed to quantify a given codon sequence's strength to express a gene in a host organism. Codon optimization aims to find a codon sequence that will optimize one or more of these measures. Efficient computational approaches are needed since the possible number of codon sequences grows exponentially as the number of amino acids increases. RESULTS: We develop a unifying modeling approach for codon optimization. With our mathematical formulations based on graph/network representations of amino acid sequences, any combination of measures can be optimized in the same framework by finding a path satisfying additional limitations in an acyclic layered network. We tested our approach on bi-objectives commonly used in the literature, namely, Codon Pair Bias versus Codon Adaptation Index and Relative Codon Pair Bias versus Relative Codon Bias. However, our framework is general enough to handle any number of objectives concurrently with certain restrictions or preferences on the use of specific nucleotide sequences. We implemented our models using Python's Gurobi interface and showed the efficacy of our approach even for the largest proteins available. We also provided experimentation showing that highly expressed genes have objective values close to the optimized values in the bi-objective codon design problem. AVAILABILITY AND IMPLEMENTATION: http://alpersen.bilkent.edu.tr/NetworkCodon.zip. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
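
The layered-network idea in this abstract can be sketched as a longest-path search: one layer per amino acid, one node per synonymous codon, and edges between consecutive layers. The sketch below is a plain dynamic program over a weighted sum of two scores; it is not the authors' Gurobi-based formulation and handles no side constraints. The codon weights, pair scores, and function name are all hypothetical toy values.

```python
import math

# Hypothetical per-codon weights (stand-in for a codon adaptation measure).
CODON_WEIGHTS = {
    "M": {"ATG": 1.0},
    "K": {"AAA": 1.0, "AAG": 0.6},
    "L": {"CTG": 1.0, "CTC": 0.4, "TTA": 0.2},
}

# Hypothetical codon-pair scores; pairs not listed score 0.
PAIR_SCORES = {("AAG", "CTG"): 0.9, ("AAA", "CTG"): -0.5}


def best_codon_path(protein, pair_weight=1.0):
    """Return (score, codons) for the best path through the layered codon network."""
    # best[c] holds the best score and path among paths ending at codon c.
    best = {c: (math.log(w), [c]) for c, w in CODON_WEIGHTS[protein[0]].items()}
    for amino_acid in protein[1:]:
        next_best = {}
        for codon, weight in CODON_WEIGHTS[amino_acid].items():
            # Extend every path from the previous layer and keep the best one.
            next_best[codon] = max(
                (
                    score + math.log(weight)
                    + pair_weight * PAIR_SCORES.get((prev, codon), 0.0),
                    path + [codon],
                )
                for prev, (score, path) in best.items()
            )
        best = next_best
    return max(best.values())
```

Because every edge joins consecutive layers, the network is acyclic and the search takes time linear in the protein length. Setting `pair_weight` to 0 reduces the problem to picking the highest-weight codon at each position.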


Subjects
Amino Acids , Genetic Code , Codon , Amino Acid Sequence
6.
Bioinformatics ; 38(4): 908-917, 2022 01 27.
Article in English | MEDLINE | ID: mdl-34864867

ABSTRACT

MOTIVATION: Genome-wide association studies show that variants in individual genomic loci alone are not sufficient to explain the heritability of complex, quantitative phenotypes. Many computational methods have been developed to address this issue by considering subsets of loci that can collectively predict the phenotype. This problem can be considered a challenging instance of feature selection in which the number of dimensions (loci that are screened) is much larger than the number of samples. While currently available methods can achieve decent phenotype prediction performance, they either do not scale to large datasets or have parameters that require extensive tuning. RESULTS: We propose a fast and simple algorithm, Macarons, to select a small, complementary subset of variants by avoiding redundant pairs that are likely to be in linkage disequilibrium. Our method features two interpretable parameters that control the time/performance trade-off without requiring parameter tuning. In our computational experiments, we show that Macarons consistently achieves similar or better prediction performance than state-of-the-art selection methods while having a simpler premise and being at least two orders of magnitude faster. Overall, Macarons can seamlessly scale to the human genome with ∼10^7 variants in a matter of minutes while taking the dependencies between the variants into account. AVAILABILITY AND IMPLEMENTATION: Macarons is available in Matlab and Python at https://github.com/serhan-yilmaz/macarons. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
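
The general idea of selecting variants while avoiding redundant neighbours can be illustrated with a small greedy sketch. This is not the Macarons algorithm itself, whose details are not given in the abstract. It assumes that variants closer than a chosen genomic distance are likely to be in linkage disequilibrium, and the function name, the `min_distance` parameter, and the scores are hypothetical.

```python
def select_spaced_variants(variants, k, min_distance):
    """Greedily pick up to k variants by score, skipping near neighbours.

    variants: list of (chromosome, position, score) tuples
    k: maximum number of variants to keep
    min_distance: variants on the same chromosome closer than this many
        base pairs to an already chosen variant are treated as redundant
    """
    chosen = []
    # Visit candidates from the highest to the lowest association score.
    for chrom, pos, score in sorted(variants, key=lambda v: -v[2]):
        if len(chosen) >= k:
            break
        too_close = any(
            c == chrom and abs(p - pos) < min_distance for c, p, _ in chosen
        )
        if not too_close:
            chosen.append((chrom, pos, score))
    return chosen
```

In this sketch `k` bounds the size of the selected subset and `min_distance` controls how aggressively nearby variants are discarded.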


Subjects
Algorithms , Genome-Wide Association Study , Humans , Genome-Wide Association Study/methods , Phenotype , Linkage Disequilibrium , Genome, Human , Polymorphism, Single Nucleotide
7.
Bioinformatics ; 38(12): 3238-3244, 2022 06 13.
Article in English | MEDLINE | ID: mdl-35512389

ABSTRACT

MOTIVATION: Identification and removal of micro-scale residual tumor tissue during brain tumor surgery are key for survival in glioma patients. For this goal, High-Resolution Magic Angle Spinning Nuclear Magnetic Resonance (HRMAS NMR) spectroscopy-based assessment of tumor margins during surgery has been an effective method. However, the time required for metabolite quantification and the need for human experts such as a pathologist to be present during surgery are major bottlenecks of this technique. While machine learning techniques that analyze the NMR spectrum in an untargeted manner (i.e. using the full raw signal) have been shown to effectively automate this feedback mechanism, the high-dimensional and noisy structure of the NMR signal limits the attained performance. RESULTS: In this study, we show that identifying informative regions in the HRMAS NMR spectrum and using them for tumor margin assessment improves the prediction power. We use the spectra normalized with the ERETIC (electronic reference to access in vivo concentrations) method, which uses an external reference signal to calibrate the HRMAS NMR spectrum. We train models to predict quantities of metabolites from annotated regions of this spectrum. Using these predictions for tumor margin assessment provides performance improvements of up to 4.6% in the Area Under the ROC Curve (AUC-ROC) and 2.8% in the Area Under the Precision-Recall Curve (AUC-PR). We validate the importance of various tumor biomarkers and identify a novel region between 7.97 ppm and 8.09 ppm as a new candidate for a glioma biomarker. AVAILABILITY AND IMPLEMENTATION: The code is released at https://github.com/ciceklab/targeted_brain_tumor_margin_assessment. The data underlying this article are available in Zenodo, at https://doi.org/10.5281/zenodo.5781769. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Brain Neoplasms , Glioma , Humans , Brain Neoplasms/diagnostic imaging , Brain Neoplasms/surgery , Metabolomics/methods , Magnetic Resonance Spectroscopy/methods , Glioma/diagnostic imaging , Glioma/surgery , Magnetic Resonance Imaging
8.
Am J Hum Genet ; 102(6): 1031-1047, 2018 06 07.
Article in English | MEDLINE | ID: mdl-29754769

ABSTRACT

Analysis of de novo mutations (DNMs) from sequencing data of nuclear families has identified risk genes for many complex diseases, including multiple neurodevelopmental and psychiatric disorders. Most of these efforts have focused on mutations in protein-coding sequences. Evidence from genome-wide association studies (GWASs) strongly suggests that variants important to human diseases often lie in non-coding regions. Extending DNM-based approaches to non-coding sequences is challenging, however, because the functional significance of non-coding mutations is difficult to predict. We propose a statistical framework for analyzing DNMs from whole-genome sequencing (WGS) data. This method, TADA-Annotations (TADA-A), is a major advance of the TADA method we developed earlier for DNM analysis in coding regions. TADA-A is able to incorporate many functional annotations such as conservation and enhancer marks, to learn from data which annotations are informative of pathogenic mutations, and to combine both coding and non-coding mutations at the gene level to detect risk genes. It also supports meta-analysis of multiple DNM studies, while adjusting for study-specific technical effects. We applied TADA-A to WGS data of ∼300 autism-affected family trios across five studies and discovered several autism risk genes. The software is freely available for all research uses.


Subjects
Chromosome Mapping , Genetic Predisposition to Disease , Mutation/genetics , Statistics as Topic , Whole Genome Sequencing , Autistic Disorder/genetics , Calibration , Enhancer Elements, Genetic/genetics , Humans , Molecular Sequence Annotation , Mutation Rate , RNA Splicing/genetics , Risk Factors , Exome Sequencing
9.
Bioinformatics ; 36(Suppl_2): i903-i910, 2020 12 30.
Article in English | MEDLINE | ID: mdl-33381836

ABSTRACT

MOTIVATION: The big data era in genomics promises a breakthrough in medicine, but the need to share data in a private manner limits the pace of the field. The widely accepted 'genomic data sharing beacon' protocol provides a standardized and secure interface for querying genomic datasets. The data are only shared if the desired information (e.g. a certain variant) exists in the dataset. Various studies showed that beacons are vulnerable to re-identification (or membership inference) attacks. As beacons are generally associated with sensitive phenotype information, re-identification creates a significant risk for the participants. Unfortunately, proposed countermeasures against such attacks have failed to be effective, as they do not consider the utility of the beacon protocol. RESULTS: In this study, for the first time, we analyze the mitigation effect of the kinship relationships among beacon participants against re-identification attacks. We argue that having multiple family members in a beacon can garble the information for attacks since a substantial number of variants are shared among kin-related people. Using family genomes from HapMap and synthetically generated datasets, we show that having one of the parents of a victim in the beacon causes (i) a significant decrease in the power of attacks and (ii) a substantial increase in the number of queries needed to confirm an individual's beacon membership. We also show how the protection effect attenuates when more distant relatives, such as grandparents, are included alongside the victim. Furthermore, we quantify the utility loss due to adding relatives and show that it is smaller compared with flipping-based techniques.


Subjects
Genomics , Information Dissemination , Family , Phenotype , Humans
10.
Bioinformatics ; 36(12): 3669-3679, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32167530

ABSTRACT

MOTIVATION: Third-generation sequencing technologies can sequence long reads that contain as many as 2 million base pairs. These long reads are used to construct an assembly (i.e. the subject's genome), which is further used in downstream genome analysis. Unfortunately, third-generation sequencing technologies have high sequencing error rates and a large proportion of base pairs in these long reads is incorrectly identified. These errors propagate to the assembly and affect the accuracy of genome analysis. Assembly polishing algorithms minimize such error propagation by polishing or fixing errors in the assembly by using information from alignments between reads and the assembly (i.e. read-to-assembly alignment information). However, current assembly polishing algorithms can only polish an assembly using reads from either a certain sequencing technology or a small assembly. Such technology-dependency and assembly-size dependency require researchers to (i) run multiple polishing algorithms and (ii) use small chunks of a large genome to use all available readsets and polish large genomes, respectively. RESULTS: We introduce Apollo, a universal assembly polishing algorithm that scales well to polish an assembly of any size (i.e. both large and small genomes) using reads from all sequencing technologies (i.e. second- and third-generation). Our goal is to provide a single algorithm that uses read sets from all available sequencing technologies to improve the accuracy of assembly polishing and that can polish large genomes. Apollo (i) models an assembly as a profile hidden Markov model (pHMM), (ii) uses read-to-assembly alignment to train the pHMM with the Forward-Backward algorithm and (iii) decodes the trained model with the Viterbi algorithm to produce a polished assembly. Our experiments with real readsets demonstrate that Apollo is the only algorithm that (i) uses reads from any sequencing technology within a single run and (ii) scales well to polish large assemblies without splitting the assembly into multiple parts. AVAILABILITY AND IMPLEMENTATION: Source code is available at https://github.com/CMU-SAFARI/Apollo. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
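
The decoding step named in this abstract, the Viterbi algorithm, can be sketched for a small generic hidden Markov model. This is a textbook illustration and not Apollo's implementation: it uses a plain HMM in place of a profile HMM, and the state names, probability tables, and function signature are hypothetical.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path, computed in log space."""

    def log(p):
        # Zero probabilities map to minus infinity and never win a max.
        return math.log(p) if p > 0 else float("-inf")

    # score[s]: best log-probability of any path that ends in state s.
    score = {s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}
    backpointers = []
    for obs in observations[1:]:
        prev_best, new_score = {}, {}
        for s in states:
            # Pick the predecessor that maximizes the path score into s.
            best_prev = max(states, key=lambda r: score[r] + log(trans_p[r][s]))
            prev_best[s] = best_prev
            new_score[s] = (
                score[best_prev] + log(trans_p[best_prev][s]) + log(emit_p[s][obs])
            )
        backpointers.append(prev_best)
        score = new_score
    # Trace back from the best final state to recover the full path.
    path = [max(states, key=lambda s: score[s])]
    for prev_best in reversed(backpointers):
        path.append(prev_best[path[-1]])
    return path[::-1]
```

Working in log space avoids numerical underflow on long sequences, which matters when the observation sequence is as long as a read or an assembly.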


Subjects
Algorithms , Software , High-Throughput Nucleotide Sequencing , Poland , Sequence Analysis, DNA , Technology
11.
PLoS Comput Biol ; 16(11): e1008184, 2020 11.
Article in English | MEDLINE | ID: mdl-33175838

ABSTRACT

Complete resection of the tumor is important for survival in glioma patients. Even if gross total resection is achieved, left-over micro-scale tissue in the excision cavity risks recurrence. The High Resolution Magic Angle Spinning Nuclear Magnetic Resonance (HRMAS NMR) technique can distinguish healthy and malignant tissue efficiently using peak intensities of biomarker metabolites. The method is fast, sensitive and can work with small and unprocessed samples, which makes it a good fit for real-time analysis during surgery. However, only a targeted analysis for the existence of known tumor biomarkers can be made, and this requires a technician with a chemistry background and a pathologist with knowledge of tumor metabolism to be present during surgery. Here, we show that we can accurately perform this analysis in real-time and can analyze the full spectrum in an untargeted fashion using machine learning. We work on a new and large HRMAS NMR dataset of glioma and control samples (n = 565), which are also labeled with a quantitative pathology analysis. Our results show that a random forest-based approach can distinguish samples with tumor cells and controls accurately and effectively with a median AUC of 85.6% and AUPR of 93.4%. We also show that we can further distinguish benign and malignant samples with a median AUC of 87.1% and AUPR of 96.1%. We analyze the feature (peak) importance for classification to interpret the results of the classifier. We validate that known malignancy biomarkers such as creatine and 2-hydroxyglutarate play an important role in distinguishing tumor and normal cells and suggest new biomarker regions. The code is released at http://github.com/ciceklab/HRMAS_NC.


Subjects
Brain Neoplasms/diagnostic imaging , Glioma/diagnostic imaging , Machine Learning , Magnetic Resonance Spectroscopy/methods , Margins of Excision , Algorithms , Biopsy , Brain Neoplasms/pathology , Brain Neoplasms/surgery , Cohort Studies , Glioma/pathology , Glioma/surgery , Humans , Intraoperative Period
12.
Hum Mutat ; 41(8): e7-e45, 2020 08.
Article in English | MEDLINE | ID: mdl-32579787

ABSTRACT

The last decade has proven that amyotrophic lateral sclerosis (ALS) is clinically and genetically heterogeneous, and that the genetic component in sporadic cases might be stronger than expected. This study investigates 1,200 patients to revisit ALS in the ethnically heterogeneous yet inbred Turkish population. Familial ALS (fALS) accounts for 20% of our cases. The rates of consanguinity are 30% in fALS and 23% in sporadic ALS (sALS). Major ALS genes explained the disease cause in only 35% of fALS, as compared with ~70% in Europe and North America. Whole exome sequencing resulted in a discovery rate of 42% (53/127). Whole genome analyses in 623 sALS cases and 142 population controls, sequenced within Project MinE, revealed well-established fALS gene variants, solidifying the concept of incomplete penetrance in ALS. Genome-wide association studies (GWAS) with whole genome sequencing data did not indicate a new risk locus. Coupling GWAS with a coexpression network of disease-associated candidates points to a significant enrichment for cell cycle- and division-related genes. Within this network, literature text-mining highlights DECR1, ATL1, HDAC2, GEMIN4, and HNRNPA3 as important genes. Finally, information on ALS-related gene variants in the Turkish cohort sequenced within Project MinE was compiled in the GeNDAL variant browser (www.gendal.org).


Subjects
Amyotrophic Lateral Sclerosis/genetics , Databases, Genetic , Genome-Wide Association Study , Genotype , Humans , Internet , Phenotype , Turkey , Whole Genome Sequencing
13.
J Proteome Res ; 19(1): 292-299, 2020 01 03.
Article in English | MEDLINE | ID: mdl-31679342

ABSTRACT

Meningiomas are in most cases benign brain tumors. The WHO 2016 classification defines three grades of meningiomas. This classification has prognostic value because grade III meningiomas have a worse prognosis than grade I and II meningiomas. However, some benign or atypical meningiomas can show clinically aggressive behavior. There are currently no reliable markers that distinguish meningiomas with a good prognosis from those that may recur. High-resolution magic angle spinning (HRMAS) spectrometry is a noninvasive method able to determine the metabolite profile of a tissue sample. We retrospectively analyzed 62 meningioma samples by using HRMAS spectrometry (43 metabolites). We described a metabolic profile defined by a high concentration of acetate, threonine, N-acetyl-lysine, hydroxybutyrate, myoinositol, ascorbate, scylloinositol, and total choline and a low concentration of aspartate, glucose, isoleucine, valine, adenosine, arginine, and alanine. This metabolomic signature was associated with poor prognosis histological markers [Ki-67 ≥ 40%, high histological grade and negative progesterone receptor (PR) expression]. We also described a similar metabolomic spectrum between grade III and grade I meningiomas. Moreover, all grade I meningiomas with a low Ki-67 expression and a positive PR expression did not have the same metabolomic profile. Metabolomic analysis could be used to identify an aggressive meningioma in order to discuss a personalized treatment. Further studies are needed to confirm these results and to correlate this metabolic profile with survival data.


Subjects
Brain Neoplasms/metabolism , Magnetic Resonance Spectroscopy/methods , Meningioma/metabolism , Amino Acids/analysis , Amino Acids/metabolism , Biopsy , Brain Neoplasms/surgery , Cell Proliferation , Humans , Ki-67 Antigen/metabolism , Meningioma/pathology , Meningioma/surgery , Metabolomics/methods
14.
Bioinformatics ; 35(18): 3433-3440, 2019 09 15.
Article in English | MEDLINE | ID: mdl-30759247

ABSTRACT

MOTIVATION: Whole exome sequencing (WES) studies for autism spectrum disorder (ASD) could identify only around six dozen risk genes to date because the genetic architecture of the disorder is highly complex. To speed the gene discovery process up, a few network-based ASD gene discovery algorithms were proposed. Although these methods use static gene interaction networks, functional clustering of genes is bound to evolve during neurodevelopment and disruptions are likely to have a cascading effect on the future associations. Thus, approaches that disregard the dynamic nature of neurodevelopment are limited. RESULTS: Here, we present a spatio-temporal gene discovery algorithm, which leverages information from evolving gene co-expression networks of neurodevelopment. The algorithm solves a prize-collecting Steiner forest-based problem on co-expression networks, adapted to model neurodevelopment and transfer information from precursor neurodevelopmental windows. The decisions made by the algorithm can be traced back, adding interpretability to the results. We apply the algorithm on ASD WES data of 3871 samples and identify risk clusters using BrainSpan co-expression networks of early- and mid-fetal periods. On an independent dataset, we show that incorporation of the temporal dimension increases the predictive power: predicted clusters are hit more and show higher enrichment in ASD-related functions compared with the state-of-the-art. AVAILABILITY AND IMPLEMENTATION: The code is available at http://ciceklab.cs.bilkent.edu.tr/st-steiner. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genetic Association Studies , Algorithms , Autism Spectrum Disorder , Cluster Analysis , Gene Regulatory Networks , Humans , Software
15.
Bioinformatics ; 35(3): 365-371, 2019 02 01.
Article in English | MEDLINE | ID: mdl-30052749

ABSTRACT

Motivation: Genomic data-sharing beacons aim to provide a secure, easy to implement and standardized interface for data-sharing by only allowing yes/no queries on the presence of specific alleles in the dataset. Previously deemed secure against re-identification attacks, beacons were shown to be vulnerable despite their stringent policy. Recent studies have demonstrated that it is possible to determine whether the victim is in the dataset, by repeatedly querying the beacon for his/her single-nucleotide polymorphisms (SNPs). Here, we propose a novel re-identification attack and show that the privacy risk is more serious than previously thought. Results: Using the proposed attack, even if the victim systematically hides informative SNPs, it is possible to infer the alleles at positions of interest as well as the beacon query results with very high confidence. Our method is based on the fact that alleles at different loci are not necessarily independent. We use linkage disequilibrium and a high-order Markov chain-based algorithm for inference. We show that in a simulated beacon with 65 individuals from the European population, we can infer membership of individuals with 95% confidence with only 5 queries, even when SNPs with MAF <0.05 are hidden. We need less than 0.5% of the number of queries that existing works require, to determine beacon membership under the same conditions. We show that countermeasures such as hiding certain parts of the genome or setting a query budget for the user would fail to protect the privacy of the participants. Availability and implementation: Software is available at http://ciceklab.cs.bilkent.edu.tr/beacon_attack. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Genomics , Information Dissemination , Linkage Disequilibrium , Algorithms , Alleles , Female , Humans , Male , Polymorphism, Single Nucleotide , Software
16.
Nucleic Acids Res ; 46(21): e125, 2018 11 30.
Article in English | MEDLINE | ID: mdl-30124947

ABSTRACT

Choosing whether to use second or third generation sequencing platforms can lead to trade-offs between accuracy and read length. Several types of studies require long and accurate reads. In such cases researchers often combine both technologies and the erroneous long reads are corrected using the short reads. Current approaches rely on various graph or alignment based techniques and do not take the error profile of the underlying technology into account. Efficient machine learning algorithms that address these shortcomings have the potential to achieve more accurate integration of these two technologies. We propose Hercules, the first machine learning-based long read error correction algorithm. Hercules models every long read as a profile Hidden Markov Model with respect to the underlying platform's error profile. The algorithm learns a posterior transition/emission probability distribution for each long read to correct errors in these reads. We show on two DNA-seq BAC clones (CH17-157L1 and CH17-227A2) that Hercules-corrected reads have the highest mapping rate among all competing algorithms and have the highest accuracy when the breadth of coverage is high. On a large human CHM1 cell line WGS data set, Hercules is one of the few scalable algorithms; and among those, it achieves the highest accuracy.


Subjects
Algorithms , Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Machine Learning , Software , Humans , Reproducibility of Results
17.
Metabolomics ; 15(5): 69, 2019 04 29.
Article in English | MEDLINE | ID: mdl-31037432

ABSTRACT

INTRODUCTION: The identification of frequent acquired mutations shows that patients with oligodendrogliomas have divergent biology with differing prognoses regardless of histological classification. A better understanding of molecular features as well as their metabolic pathways is essential. OBJECTIVES: The aim of this study was to examine the relationship between the tumor metabolome, six genomic aberrations (isocitrate dehydrogenase 1 [IDH1] mutation, 1p/19q codeletion, tumor protein p53 [TP53] mutation, O6-methylguanine-DNA methyltransferase [MGMT] promoter methylation, epidermal growth factor receptor [EGFR] amplification, phosphatase and tensin homolog [PTEN] methylation), and the patients' survival time. METHODS: We applied 1H high-resolution magic-angle spinning (HRMAS) nuclear magnetic resonance (NMR) spectroscopy to 72 resected oligodendrogliomas. RESULTS: The presence of IDH1 mutation, TP53 mutation, 1p/19q codeletion, and MGMT promoter methylation reduced the relative risk of death, whereas PTEN methylation and EGFR amplification were associated with poor prognosis. Increased concentrations of 2-hydroxyglutarate (2HG), N-acetyl-aspartate (NAA), and myo-inositol and an increased glycerophosphocholine/phosphocholine (GPC/PC) ratio were good prognostic factors. Increased concentrations of serine, glycine, glutamate and alanine were associated with an increased relative risk of death. CONCLUSION: HRMAS NMR spectroscopy provides accurate information on the metabolomics of oligodendrogliomas, making it possible to find new biomarkers indicative of survival. It enables rapid characterization of intact tissue and could be used as an intraoperative method.


Subjects
Metabolomics , Oligodendroglioma/genetics , Oligodendroglioma/metabolism , Adult , Humans , Magnetic Resonance Spectroscopy , Severity of Illness Index , Survival Analysis , Time Factors
18.
HPB (Oxford) ; 21(10): 1354-1361, 2019 10.
Article in English | MEDLINE | ID: mdl-30914156

ABSTRACT

BACKGROUND: Posthepatectomy liver failure (PHLF) is the main limitation to extending liver resection, but its pathophysiology is not yet fully understood. The aim of the study was to describe the metabolic adaptations that occur with PHLF. METHODS: A retrospective study of 82 patients was performed, using nuclear magnetic resonance metabolomics to identify and quantify intra-hepatic metabolites. Metabolite levels were compared between fatal PHLF (FLF) and non-fatal PHLF, and according to PHLF/acute-on-chronic liver failure (ACLF) grading, using the ADEMA metabolic network analysis. RESULTS: Metabolomic profiles were significantly different between patients presenting FLF and non-FLF, or grade 3 ACLF versus < grade 3 ACLF. In the patients undergoing hepatectomy, valine, alanine and glycerophosphocholine were identified as powerful biomarkers to predict FLF (AUROC 0.806, 0.802 and 0.856, respectively). Network analysis showed an activation of aerobic glycolysis with glutaminolysis, as observed in highly proliferating systems. Inversely, ACLF3 showed deprivation of glucose and lactate compared to lower ACLF grades. CONCLUSION: Clinical and biological severity of ACLF and PHLF correlate with specific metabolic adaptations. Metabolomics can predict fatal liver failure after hepatectomy and underlines significant differences in the metabolic patterns of ACLF and PHLF.
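
As a reading aid for the AUROC values quoted above: for a single biomarker, the AUROC can be interpreted as the probability that a randomly chosen FLF patient has a higher metabolite level than a randomly chosen non-FLF patient. The minimal Python sketch below computes it by pairwise comparison. The function name `auroc` and all metabolite values are hypothetical and invented for illustration; they are not data from the study, and the study's own computation is not described in the abstract.

```python
def auroc(pos, neg):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Ties count as half a win. Equivalent to the area under the ROC curve
    for a single continuous marker.
    """
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical metabolite levels (arbitrary units), NOT taken from the study.
flf_levels = [3.1, 2.8, 3.6, 2.2]
non_flf_levels = [1.9, 2.4, 2.0, 2.9, 1.7]

print(auroc(flf_levels, non_flf_levels))  # 17 of 20 pairs are won -> 0.85
```

A value of 0.5 would mean the marker separates the groups no better than chance; 1.0 would mean perfect separation.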


Subjects
Acute-On-Chronic Liver Failure/metabolism , Biomarkers/metabolism , Hepatectomy/adverse effects , Liver/metabolism , Metabolomics/methods , Postoperative Complications , Acute-On-Chronic Liver Failure/diagnosis , Acute-On-Chronic Liver Failure/etiology , Aged , Alanine/metabolism , Carcinoma, Hepatocellular/surgery , Female , Follow-Up Studies , Humans , Liver/pathology , Liver Neoplasms/surgery , Magnetic Resonance Spectroscopy , Male , Middle Aged , Phosphoric Diester Hydrolases/metabolism , ROC Curve , Retrospective Studies , Risk Factors , Valine/metabolism
19.
Bioinformatics ; 30(12): i175-84, 2014 Jun 15.
Article in English | MEDLINE | ID: mdl-24931981

ABSTRACT

MOTIVATION: Discovering the transcriptional regulatory architecture of metabolism is important for understanding the implications of transcriptional fluctuations on metabolism. The reporter algorithm (RA) was proposed to determine the hot spots in metabolic networks, around which transcriptional regulation is focused owing to a disease or a genetic perturbation. Using a z-score-based scoring scheme, RA calculates the average statistical change in the expression levels of genes that are neighbors to a target metabolite in the metabolic network. The RA approach has been used in numerous studies to analyze cellular responses to downstream genetic changes. In this article, we propose a mutual information-based multivariate reporter algorithm (MIRA) with the goal of eliminating the following problems in detecting reporter metabolites: (i) conventional statistical methods suffer from small sample sizes, (ii) as z-scores range from minus to plus infinity, calculating average scores can lead to canceling out opposite effects and (iii) analyzing genes one by one and then aggregating the results can lead to information loss. MIRA is a multivariate and combinatorial algorithm that calculates the aggregate transcriptional response around a metabolite using mutual information. We show that MIRA's results are biologically sound, empirically significant and more reliable than RA. RESULTS: We apply MIRA to gene expression analysis of six knockout strains of Escherichia coli and show that MIRA captures the underlying metabolic dynamics of the switch from aerobic to anaerobic respiration. We also apply MIRA to an Autism Spectrum Disorder gene expression dataset. Results indicate that MIRA reports metabolites that highly overlap with recently found metabolic biomarkers in the autism literature. Overall, MIRA is a promising algorithm for detecting metabolic drug targets and understanding the relation between gene expression and metabolic activity.
AVAILABILITY AND IMPLEMENTATION: The code is implemented in C# using the .NET framework. The project is available upon request.
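
Problem (ii) in the abstract above, opposite effects canceling when z-scores are averaged, can be seen with simple arithmetic. The short Python sketch below uses invented z-scores for four genes neighboring a single metabolite. The values and variable names are hypothetical, and the mean absolute z-score is shown only as a contrast; it is not MIRA's scoring scheme, which the abstract describes as based on mutual information.

```python
from statistics import mean

# Hypothetical z-scores for the genes neighboring one metabolite:
# two strongly up-regulated and two strongly down-regulated genes.
neighbor_z = [2.5, -2.5, 2.0, -2.0]

# Averaging signed z-scores: the opposite effects cancel out, so the
# metabolite looks unaffected despite strong regulation around it.
avg_signed = mean(neighbor_z)

# Contrast only (not MIRA): averaging magnitudes keeps the signal.
avg_magnitude = mean(abs(z) for z in neighbor_z)

print(avg_signed)     # 0.0
print(avg_magnitude)  # 2.25
```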


Subjects
Algorithms , Metabolic Networks and Pathways/genetics , Child , Child Development Disorders, Pervasive/genetics , Child Development Disorders, Pervasive/metabolism , Escherichia coli/genetics , Escherichia coli/metabolism , Gene Expression Profiling , Gene Expression Regulation , Humans , Transcription, Genetic
20.
PLoS Comput Biol ; 9(1): e1002859, 2013.
Article in English | MEDLINE | ID: mdl-23341761

ABSTRACT

Metabolomics is a relatively new "omics" platform, which analyzes a discrete set of metabolites detected in biofluids or tissue samples of organisms. It has been used in a diverse array of studies to detect biomarkers and to determine activity rates for pathways based on changes due to disease or drugs. Recent improvements in analytical methodology and large sample throughput allow for creation of large datasets of metabolites that reflect changes in metabolic dynamics due to disease or a perturbation in the metabolic network. However, current methods of comprehensive analyses of large metabolic datasets (metabolomics) are limited, unlike other "omics" approaches where complex techniques for analyzing coexpression/coregulation of multiple variables are applied. This paper discusses the shortcomings of current metabolomics data analysis techniques, and proposes a new multivariate technique (ADEMA) based on mutual information to identify expected metabolite level changes with respect to a specific condition. We show that ADEMA better predicts de novo lipogenesis pathway metabolite level changes in samples with cystic fibrosis (CF) than prediction based on the significance of individual metabolite level changes. We also applied ADEMA's classification scheme to three different cohorts of CF and wildtype mice. ADEMA was able to predict whether an unknown mouse has a CF or a wildtype genotype with accuracies of 1.0, 0.84, and 0.9 on the respective datasets. ADEMA results had up to 31% higher accuracy than other classification algorithms. In conclusion, ADEMA advances the state-of-the-art in metabolomics analysis, by providing accurate and interpretable classification results.
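
To illustrate why a multivariate, mutual information-based view can capture what a metabolite-by-metabolite analysis misses, the Python sketch below computes mutual information (in bits) between discretized metabolite levels and a genotype label. It is a toy example of the general principle only, not the ADEMA algorithm: the function `mutual_information`, the "lo"/"hi" discretization, and the eight samples are all assumptions made up for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


# Hypothetical discretized levels of two metabolites in eight samples.
m1 = ["lo", "lo", "hi", "hi", "lo", "lo", "hi", "hi"]
m2 = ["lo", "hi", "lo", "hi", "lo", "hi", "lo", "hi"]
# Hypothetical genotype: "cf" exactly when the two levels differ.
genotype = ["wt", "cf", "cf", "wt", "wt", "cf", "cf", "wt"]

# Taken one at a time, neither metabolite says anything about the genotype...
print(mutual_information(m1, genotype))  # 0.0
print(mutual_information(m2, genotype))  # 0.0

# ...but considered jointly, the pair determines it completely.
print(mutual_information(list(zip(m1, m2)), genotype))  # 1.0
```

In this constructed case, a test on each metabolite alone would find no signal, while the joint pattern is perfectly informative, which is the kind of effect a multivariate method is designed to detect.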


Subjects
Algorithms , Metabolomics , Animals , Lipogenesis , Mice , Models, Theoretical , Multivariate Analysis