Results 1 - 20 of 20
1.
medRxiv ; 2023 Nov 29.
Article in English | MEDLINE | ID: mdl-37292870

ABSTRACT

Background: Pulmonary hypertension (PH) poses a significant health threat with high morbidity and mortality, necessitating improved diagnostic tools for better management. Current PH biomarkers lack functional relevance as well as comprehensive diagnostic and prognostic capability. Therefore, there is a critical need to develop biomarkers that close these gaps in PH diagnosis and prognosis. Methods: To address this need, we performed a comprehensive metabolomics analysis of 233 blood-based samples coupled with machine learning analysis. For functional insights, human pulmonary arteries (PA) from lungs with idiopathic pulmonary arterial hypertension (PAH) were investigated, and the effect of extrinsic free fatty acids (FFAs) on human PA endothelial and smooth muscle cells was tested in vitro. Results: PA of idiopathic PAH lungs showed lipid accumulation and altered expression of lipid homeostasis-related genes. Extrinsic FFAs caused excessive proliferation of PA smooth muscle cells and barrier dysfunction of PA endothelial cells, both hallmarks of PAH. In the training cohort of 74 PH patients, 30 disease controls without PH, and 65 healthy controls, diagnostic and prognostic markers were identified and subsequently validated in an independent cohort. Exploratory analysis showed a highly altered metabolome in PH patients, and machine learning confirmed a high diagnostic potential. Fully explainable, specific FFA/lipid ratios were derived that provided exceptional diagnostic accuracy, with an area under the curve (AUC) of 0.89 in the training cohort and 0.90 in the validation cohort, outperforming the machine learning results. These ratios were also prognostic and complemented established clinical prognostic PAH scores (FPHR4p and COMPERA2.0), significantly increasing their hazard ratios (HR) from 2.5 and 3.4 to 4.2 and 6.1, respectively.
Conclusion: Our research confirms the significance of lipidomic alterations in PH and introduces innovative diagnostic and prognostic biomarkers. These findings may help reshape PH management strategies.
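A ratio biomarker such as an FFA/lipid ratio is evaluated by computing one ratio per sample and asking how well it ranks patients above controls; the AUC equals the probability that a randomly chosen patient scores higher than a randomly chosen control (ties count one half). The minimal sketch below illustrates that computation only. The abstract does not name the specific FFAs and lipids, so the function names and all input values are hypothetical toy numbers, not study data.

```python
def ratio_feature(ffa, lipid):
    """Per-sample ratio of a (hypothetical) FFA level to a (hypothetical) lipid level."""
    return [f / l for f, l in zip(ffa, lipid)]


def auc(case_scores, control_scores):
    """AUC as P(random case > random control), counting ties as 1/2."""
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy values for illustration only.
cases = ratio_feature(ffa=[4.0, 6.0, 5.0], lipid=[2.0, 2.0, 1.0])     # -> [2.0, 3.0, 5.0]
controls = ratio_feature(ffa=[1.0, 2.0, 3.0], lipid=[1.0, 2.0, 1.0])  # -> [1.0, 1.0, 3.0]
print(auc(cases, controls))
```

This pairwise definition is quadratic in sample size but transparent; for larger cohorts the same value can be obtained from ranks (it equals the Mann-Whitney U statistic divided by the number of case-control pairs).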

2.
BMC Med Inform Decis Mak ; 22(1): 222, 2022 08 20.
Article in English | MEDLINE | ID: mdl-35987636

ABSTRACT

BACKGROUND AND OBJECTIVES: Fainting is a well-known side effect of blood donation. Such adverse experiences can reduce the rate at which donors return for further donations. Identifying factors associated with fainting could help prevent adverse incidents during blood donation. MATERIALS AND METHODS: Data of 85,040 blood donations from whole blood and apheresis donors over four consecutive years were included in this retrospective study. Seven machine learning models (random forests, artificial neural networks, XGBoost (eXtreme Gradient Boosting), AdaBoost, logistic regression, K-nearest neighbors, and support vector machines) were established to predict fainting during blood donation. Features were derived from the questionnaire every donor completes before donation and from weather data for the day of donation. RESULTS: A total of 1,715 fainting reactions were observed in 228,846 blood donations from 88,003 donors over a study period of 48 months. All machine learning algorithms investigated yielded similar values for NPV, PPV, AUC, and F1-score. In general, NPV was above 0.996, whereas PPV was below 0.03. AUC and F1-score were close to 0.9 for all models. The most important features for predicting fainting during blood donation were systolic and diastolic blood pressure and ambient temperature, humidity, and barometric pressure. CONCLUSION: Machine learning algorithms can establish prediction models of fainting in blood donors. These new tools can reduce adverse reactions during blood donation, improve donor safety, and minimize negative associations with blood donation.
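The combination of very high NPV and very low PPV follows from how rare fainting is (about 0.75% of donations, 1,715 of 228,846). The sketch below uses the standard confusion-matrix definitions; the counts are hypothetical and chosen only to mimic a rare outcome, not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics for a binary classifier (fainting = positive class)."""
    ppv = tp / (tp + fp)       # precision: share of flagged donations that actually fainted
    npv = tn / (tn + fn)       # share of unflagged donations without fainting
    recall = tp / (tp + fn)    # sensitivity
    f1 = 2 * ppv * recall / (ppv + recall)  # harmonic mean of PPV and recall
    return {"ppv": ppv, "npv": npv, "recall": recall, "f1": f1}


# Hypothetical counts: 1,000 donations, 10 fainting events (1% prevalence).
m = classification_metrics(tp=8, fp=292, tn=698, fn=2)
print(m)
```

Even with 80% sensitivity, most flagged donations are false alarms when the event is rare, so PPV stays low while NPV approaches 1.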


Subject(s)
Blood Donors , Syncope , Humans , Machine Learning , Retrospective Studies , Weather
3.
Breast Cancer ; 29(2): 274-286, 2022 Mar.
Article in English | MEDLINE | ID: mdl-34865205

ABSTRACT

BACKGROUND: MicroRNAs are small non-coding RNAs with pivotal regulatory functions in multiple cellular processes. Their significance as molecular predictors in breast cancer has been demonstrated over the past 15 years. The aim of this study was to elucidate the role of hsa-miR-3651 in predicting local control (LC) in early breast cancer. RESULTS: Using high-throughput technology, hsa-miR-3651 was found to be differentially expressed between patients who experienced local relapse and those who did not (N = 23; p = 0.0035). This result was validated in an independent cohort of 87 patients using RT-qPCR (p < 0.0005). In a second analysis step with a chip-based microarray containing 70,523 probes for potential target molecules, FERM domain protein 3 (FRMD3) was found to be the most down-regulated protein (N = 21; p = 0.0016). Computational analysis employing different prediction algorithms identified FRMD3 as a likely downstream target of hsa-miR-3651, with an 8mer binding site between the two molecules. This was validated in an independent patient set (N = 20, p = 0.134). CONCLUSION: The current study revealed that hsa-miR-3651 is a predictor of LC in early breast cancer via its putative target protein FRMD3. Since microRNAs interfere with multiple pathways, the results of this hypothesis-generating study may contribute to the development of tailored therapies for breast cancer in the future.
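An "8mer" site is commonly defined as a stretch of target mRNA that pairs with miRNA nucleotides 2-8 (the seed) and carries an A opposite miRNA nucleotide 1. The minimal sketch below scans a sequence for such sites under that definition. It is not one of the prediction algorithms used in the study, and because the abstract does not give the hsa-miR-3651 sequence, the miRNA and UTR below are toy sequences.

```python
# RNA base pairing (Watson-Crick only; G:U wobble is ignored in this sketch).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def eightmer_site(mirna):
    """Target site (5'->3') complementary to miRNA nt 2-8, followed by an A opposite nt 1."""
    return reverse_complement_rna(mirna[1:8]) + "A"


def find_8mer_sites(mirna, utr):
    """0-based start positions of exact 8mer sites in the target sequence."""
    site = eightmer_site(mirna)
    return [i for i in range(len(utr) - len(site) + 1) if utr[i:i + len(site)] == site]


toy_mirna = "UGAGGUAGUAGGUUGUAUAGUU"  # toy sequence for illustration
toy_utr = "GGGCUACCUCAGGG"
print(eightmer_site(toy_mirna), find_8mer_sites(toy_mirna, toy_utr))
```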


Subject(s)
Breast Neoplasms , MicroRNAs , Neoplasm Recurrence, Local , Tumor Suppressor Proteins , Breast Neoplasms/genetics , Female , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , Neoplasm Recurrence, Local/genetics , Oligonucleotide Array Sequence Analysis , Tumor Suppressor Proteins/genetics
5.
Eur J Cardiothorac Surg ; 60(6): 1378-1385, 2021 12 01.
Article in English | MEDLINE | ID: mdl-34050368

ABSTRACT

OBJECTIVES: Machine learning methods can potentially provide a highly accurate and detailed assessment of expected individual patient risk before elective cardiac surgery. Correct anticipation of this risk allows for improved patient counselling and avoidance of possible complications. We therefore investigated the benefit of modern machine learning methods for personalized risk prediction in patients undergoing elective heart valve surgery. METHODS: We performed a single-centre retrospective study of patients who underwent elective heart valve surgery at our centre between 1 January 2008 and 31 December 2014. We used random forests, artificial neural networks and support vector machines to predict 30-day mortality from a subset of 129 available demographic and preoperative parameters. Exclusion criteria were reoperation in the same patient, the need for anterograde cerebral perfusion due to aortic arch surgery, and grown-up congenital heart disease. The final cohort consisted of 2229 patients with a 30-day mortality of 3.86% (86 of 2229 cases). This trial has been registered at clinicaltrials.gov (NCT03724123). RESULTS: The final random forest model trained on the entire data set provided an out-of-bag area under the receiver operating characteristic curve (AUC) of 0.839, which significantly outperformed the European System for Cardiac Operative Risk Evaluation (EuroSCORE) (AUC = 0.704) and a model trained only on the subset of features that EuroSCORE uses (AUC = 0.745). CONCLUSIONS: Advanced machine learning methods can predict outcomes of valve surgery procedures with higher accuracy than established risk scores based on logistic regression on pre-selected parameters. This approach is generalizable to other elective high-risk interventions and allows models to be trained on the cohorts of specific institutions.
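The "out-of-bag" AUC relies on the fact that each tree of a random forest is grown on a bootstrap sample, so every patient is left out of roughly (1 - 1/n)^n ≈ 36.8% of the trees; predictions from only those trees give an internal validation estimate. The stdlib sketch below illustrates just the bootstrap/out-of-bag split for a cohort of 2229 patients; it does not build trees, and the number of trees and the seed are arbitrary assumptions.

```python
import random


def bootstrap_with_oob(n, rng):
    """Draw one bootstrap sample of size n; return (in-bag indices, out-of-bag indices)."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    drawn = set(in_bag)
    oob = [i for i in range(n) if i not in drawn]
    return in_bag, oob


def mean_oob_fraction(n, n_trees, seed=0):
    """Average share of patients that are out-of-bag per (hypothetical) tree."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trees):
        _, oob = bootstrap_with_oob(n, rng)
        total += len(oob) / n
    return total / n_trees


print(mean_oob_fraction(2229, 200), (1 - 1 / 2229) ** 2229)
```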


Subject(s)
Cardiac Surgical Procedures , Cardiac Surgical Procedures/adverse effects , Heart Valves/surgery , Humans , Machine Learning , Retrospective Studies , Risk Assessment/methods
6.
Genes (Basel) ; 11(12)2020 11 26.
Article in English | MEDLINE | ID: mdl-33255991

ABSTRACT

BACKGROUND: To characterize the various subtypes of breast cancer more precisely and to improve patient selection for breast-conserving therapy (BCT), molecular profiling has gained importance over the past two decades. MicroRNAs, which are small non-coding RNAs, can potentially regulate numerous downstream target molecules and thereby interfere with carcinogenesis and treatment response via multiple pathways. The aim of the current two-phase study was to investigate whether hsa-miR-375 signaling through RASD1 could predict local control (LC) in early breast cancer. RESULTS: The patient and treatment characteristics of 81 individuals were similarly distributed between the relapse (n = 27) and control (n = 54) groups. In the pilot phase, the primary tumors of 28 patients were analyzed with microarray technology. Of the more than 70,000 genes on the chip, 104 potential hsa-miR-375 target molecules had a lower expression level in relapse patients than in controls (p-value < 0.2). For RASD1, an hsa-miR-375 binding site was predicted by an in silico search in five mRNA-miRNA databases and has been mechanistically proven in previous pre-clinical studies. Its expression levels were markedly lower in relapse patients than in controls (p-value = 0.058). In a second phase, this finding was validated in an independent set of 53 patients using ddPCR. Patients with higher levels of hsa-miR-375 relative to RASD1 had a higher probability of local relapse than those with the inverse expression pattern of the two markers (log-rank test, p-value = 0.069). CONCLUSION: This two-phase study demonstrates that hsa-miR-375/RASD1 signaling can predict local control in early breast cancer patients; to our knowledge, this is the first clinical report of a miR combined with one of its downstream target proteins predicting LC in breast cancer.
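The log-rank test compares time to local relapse between two groups (here, for example, high miR-375 relative to RASD1 versus the inverse pattern). The minimal stdlib sketch below implements the standard two-group statistic: at every distinct event time it accumulates observed minus expected events in group A and the hypergeometric variance, then converts the 1-degree-of-freedom chi-square statistic to a p-value. Follow-up times and event flags are hypothetical inputs, not study data.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. events_* are 1 for relapse, 0 for censoring."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    o_minus_e, var = 0.0, 0.0
    for t in event_times:
        at_risk_a = sum(1 for tt, _, g in data if g == 0 and tt >= t)
        at_risk_b = sum(1 for tt, _, g in data if g == 1 and tt >= t)
        n = at_risk_a + at_risk_b
        d = sum(1 for tt, e, _ in data if tt == t and e)            # events at t, both groups
        d_a = sum(1 for tt, e, g in data if g == 0 and tt == t and e)
        o_minus_e += d_a - d * at_risk_a / n                       # observed - expected in A
        if n > 1:
            var += d * (at_risk_a / n) * (at_risk_b / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    p_value = math.erfc(math.sqrt(chi2 / 2))  # survival function of chi-square with 1 df
    return chi2, p_value


# Hypothetical data: group A relapses early, group B late.
chi2, p = logrank_test(list(range(1, 11)), [1] * 10, list(range(11, 21)), [1] * 10)
print(chi2, p)
```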


Subject(s)
Breast Neoplasms/genetics , MicroRNAs/genetics , Signal Transduction/genetics , ras Proteins/genetics , Adult , Aged , Computational Biology/methods , Female , Gene Expression Regulation, Neoplastic/genetics , Gene Regulatory Networks/genetics , Humans , Middle Aged , Neoplasm Recurrence, Local/genetics , RNA, Messenger/genetics
7.
Ann Rheum Dis ; 78(5): 617-628, 2019 05.
Article in English | MEDLINE | ID: mdl-30862608

ABSTRACT

OBJECTIVES: Juvenile idiopathic arthritis (JIA) is the most common class of childhood rheumatic diseases, with distinct disease subsets that may have diverging pathophysiological origins. Both adaptive and innate immune processes have been proposed as primary drivers, which may account for the observed clinical heterogeneity, but few high-depth studies have been performed. METHODS: Here we profiled the adaptive immune system of 85 patients with JIA and 43 age-matched controls using in-depth flow cytometry and machine learning approaches. RESULTS: Immune profiling identified immunological changes in patients with JIA. This immune signature was shared across a broad spectrum of childhood inflammatory diseases. The signature was identified in clinically distinct subsets of JIA but was accentuated in patients with systemic JIA and in those with active disease. Despite the extensive overlap in the immunological spectrum exhibited by healthy children and patients with JIA, machine learning analysis of the data set discriminated patients with JIA from healthy controls with ~90% accuracy. CONCLUSIONS: These results pave the way for large-scale longitudinal immune phenotyping studies of JIA. The ability to discriminate between patients with JIA and healthy individuals provides proof of principle for using machine learning to identify immune signatures that are predictive of treatment response.


Subject(s)
Adaptive Immunity/immunology , Arthritis, Juvenile/immunology , Immunophenotyping/methods , Machine Learning , Adolescent , Case-Control Studies , Child , Child, Preschool , Female , Flow Cytometry , Humans , Male
8.
Article in English | MEDLINE | ID: mdl-29906679

ABSTRACT

Chromatography is one of the most versatile unit operations in the biotechnological industry. Regulatory initiatives such as Process Analytical Technology and Quality by Design have led to the implementation of new chromatographic devices, which represent an almost inexhaustible source of data. However, the analysis of large datasets is complicated, and significant amounts of information remain hidden in big data. Here we present a new, top-down approach for the systematic analysis of chromatographic datasets. The goal of this approach is to analyze the dataset as a whole, starting with the most important, global information. The workflow should highlight interesting regions (outliers, drifts, data inconsistencies) and help localize those regions within a multi-dimensional space in a straightforward way. Moving window factor models were used to extract the most important information, focusing on the differences between samples. The prototype was implemented as an interactive visualization tool for the exploratory analysis of complex datasets. We found that the tool makes it convenient to localize variances in a multidimensional dataset and allows differentiation between explainable and unexplainable variance. Starting with one global difference descriptor per sample, the analysis ends with high-resolution, time-dependent difference descriptor values, intended as a starting point for the detailed analysis of the underlying raw data.
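The idea of moving from one global descriptor per sample to time-resolved descriptors can be illustrated with a moving-window principal component analysis, used here as a simple, generic stand-in for the moving window factor models. In each window the chromatograms (samples x time points) are centered across samples, and the absolute score on the first principal component serves as a per-sample difference descriptor; a window spanning the whole run gives the global descriptor. The window size, step, and descriptor definition are assumptions, not the authors' settings.

```python
import numpy as np


def moving_window_pc1_scores(X, window, step):
    """X: samples x time points. Returns window starts and |PC1 score| per sample and window."""
    n_samples, n_points = X.shape
    starts = list(range(0, n_points - window + 1, step))
    scores = np.zeros((n_samples, len(starts)))
    for j, s in enumerate(starts):
        block = X[:, s:s + window]
        centered = block - block.mean(axis=0)          # differences between samples only
        u, sv, _ = np.linalg.svd(centered, full_matrices=False)
        scores[:, j] = np.abs(u[:, 0] * sv[0])         # sign of a PC is arbitrary
    return starts, scores


# Hypothetical data: five identical chromatograms, sample 2 has a local deviation.
base = np.sin(np.linspace(0.0, 3.0, 100))
X = np.tile(base, (5, 1))
X[2, 40:45] += 5.0
starts, scores = moving_window_pc1_scores(X, window=10, step=10)
global_starts, global_scores = moving_window_pc1_scores(X, window=100, step=100)
print(starts[int(np.argmax(scores.max(axis=0)))], int(np.argmax(global_scores[:, 0])))
```

Reading the global descriptor first and then the windowed scores mirrors the top-down order described above: the outlying sample is flagged globally, and the windows then localize the deviation in time.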


Subject(s)
Chromatography , Data Interpretation, Statistical , Multivariate Analysis , Algorithms , Databases, Factual
9.
PLoS Negl Trop Dis ; 12(1): e0006182, 2018 01.
Article in English | MEDLINE | ID: mdl-29357361

ABSTRACT

Rabies is caused by lyssaviruses and is one of the oldest known zoonoses. In recent years, more than 21,000 nucleotide sequences of rabies viruses (RABV), from the prototype species rabies lyssavirus, have been deposited in public databases. Subsequent phylogenetic analyses in combination with metadata suggest geographic distributions of RABV. However, these analyses face technical difficulties in defining verifiable criteria for cluster allocation in phylogenetic trees, which invites a more rational approach. Therefore, we applied a relatively new mathematical clustering algorithm named 'affinity propagation clustering' (AP) to propose a standardized sub-species classification based on full-genome RABV sequences. Because AP is computationally fast and works with any meaningful measure of similarity between data samples, it has previously been applied successfully in bioinformatics for the analysis of microarray and gene expression data; however, cluster analysis of sequences is still in its infancy. A total of 516 existing and 46 original full-genome RABV sequences were used to demonstrate the application of AP for RABV clustering. On a global scale, AP proposed four clusters, namely New World, Arctic/Arctic-like, Cosmopolitan, and Asian, as previously assigned by phylogenetic studies. By combining AP with established phylogenetic analyses, it is possible to resolve phylogenetic relationships between verifiably determined clusters and sequences. This workflow will be useful for confirming cluster distributions in a uniform, transparent manner, not only for RABV but also for other comparative sequence analyses.
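Affinity propagation works on any similarity matrix: it alternately passes "responsibilities" (how well suited point k is to be the exemplar of point i) and "availabilities" (how appropriate it is for i to choose k) until a set of exemplars emerges. The NumPy sketch below implements the standard Frey and Dueck update rules with damping. The negative Hamming distance between equal-length toy sequences, the median preference, the damping factor, and the fixed iteration count are all assumptions for illustration; the study's actual similarity measure for full genomes is not stated in the abstract.

```python
import numpy as np


def hamming_similarity_matrix(seqs):
    """Similarity = -Hamming distance; diagonal (preference) = median off-diagonal similarity."""
    n = len(seqs)
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            S[i, j] = -sum(a != b for a, b in zip(seqs[i], seqs[j]))
    np.fill_diagonal(S, np.median(S[~np.eye(n, dtype=bool)]))
    return S


def affinity_propagation(S, damping=0.7, iterations=300, seed=0):
    n = S.shape[0]
    S = S + 1e-9 * np.random.default_rng(seed).standard_normal((n, n))  # break ties
    R, A, rows = np.zeros((n, n)), np.zeros((n, n)), np.arange(n)
    for _ in range(iterations):
        # Responsibility r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = np.argmax(AS, axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new
        # Availability a(i,k) = min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k})
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, np.diag(R))
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = np.diag(A_new).copy()  # a(k,k) is not clipped at zero
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, self_avail)
        A = damping * A + (1 - damping) * A_new
    exemplars = np.flatnonzero(np.diag(A) + np.diag(R) > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    labels = np.argmax(S[:, exemplars], axis=1)
    labels[exemplars] = np.arange(exemplars.size)
    return exemplars, labels


seqs = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATA", "CCCCCCCCCC", "CCCCCCCCCG", "CCCCCCCCGC"]
exemplars, labels = affinity_propagation(hamming_similarity_matrix(seqs))
print(exemplars, labels)
```

Unlike k-means, the number of clusters is not fixed in advance; it is driven by the preferences on the diagonal, and each cluster is represented by an actual member sequence (its exemplar).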


Subject(s)
Cluster Analysis , Computational Biology/methods , Phylogeny , RNA, Viral/genetics , Rabies virus/classification , Rabies virus/genetics , Sequence Homology, Nucleic Acid , Algorithms
10.
J Immunol ; 199(8): 2985-2997, 2017 10 15.
Article in English | MEDLINE | ID: mdl-28924003

ABSTRACT

Recent studies have revealed that immune repertoires contain a substantial fraction of public clones, which may be defined as Ab or TCR clonal sequences shared across individuals. It has remained unclear whether public clones possess predictable sequence features that differentiate them from private clones, which are believed to be generated largely stochastically. This knowledge gap reflects a lack of insight into how immune repertoire diversity is shaped. Leveraging a machine learning approach capable of capturing the high-dimensional compositional information of each clonal sequence (defined by CDR3), we detected predictive immunogenomic differences between public and private clones, concentrated in the N1-D-N2 region of CDR3, which allowed public and private status to be predicted with 80% accuracy in humans and mice. Our results unexpectedly demonstrate that public, as well as private, clones possess predictable high-dimensional immunogenomic features. Our support vector machine model could be trained effectively on large published datasets (3 million clonal sequences) and was sufficiently robust for public clone prediction across individuals and studies prepared with different library preparation and high-throughput sequencing protocols. In summary, we have uncovered the existence of high-dimensional immunogenomic rules that shape immune repertoire diversity in a predictable fashion. Our approach may pave the way for the construction of a comprehensive atlas of public mouse and human immune repertoires, with potential applications in rational vaccine design and immunotherapeutics.


Subject(s)
B-Lymphocytes/physiology , Complementarity Determining Regions/genetics , Immunotherapy/methods , Receptors, Antigen, B-Cell/genetics , Receptors, Antigen, T-Cell/genetics , T-Lymphocytes/physiology , Vaccines/immunology , Animals , Antibody Diversity , Clonal Selection, Antigen-Mediated , Clone Cells , Datasets as Topic , High-Throughput Nucleotide Sequencing , Humans , Mice , Mice, Inbred BALB C , Mice, Inbred C57BL
11.
Antimicrob Agents Chemother ; 60(8): 4722-33, 2016 08.
Article in English | MEDLINE | ID: mdl-27216077

ABSTRACT

Emerging resistance to antimicrobials and the lack of new antibiotic drug candidates underscore the need to optimize current diagnostics and therapies to curb the evolution and spread of multidrug resistance. As the antibiotic resistance status of a bacterial pathogen is defined by its genome, resistance profiling with next-generation sequencing (NGS) technologies may in the future enable pathogen identification, prompt initiation of targeted individualized treatment, and the implementation of optimized infection control measures. In this study, qualitative RNA sequencing was used to identify key genetic determinants of antibiotic resistance in 135 clinical Pseudomonas aeruginosa isolates from diverse geographic and infection site origins. By applying transcriptome-wide association studies, adaptive variations associated with resistance to the antibiotic classes fluoroquinolones, aminoglycosides, and β-lactams were identified. Besides potential novel biomarkers with a direct correlation to resistance, predictive machine learning approaches identified global patterns of phenotype-associated gene expression and sequence variations. Our research helps establish genotype-based molecular diagnostic tools for identifying the current resistance profiles of bacterial pathogens and paves the way for faster diagnostics and more efficient, targeted treatment strategies that also mitigate the future potential for resistance evolution.


Subject(s)
Anti-Bacterial Agents/pharmacology , Drug Resistance, Multiple, Bacterial/genetics , Pseudomonas aeruginosa/drug effects , Pseudomonas aeruginosa/genetics , Transcriptome/genetics , Aminoglycosides/pharmacology , Fluoroquinolones/pharmacology , Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , Humans , Microbial Sensitivity Tests/methods , Pseudomonas Infections/drug therapy , Pseudomonas Infections/microbiology , beta-Lactams/pharmacology
12.
Clin Epigenetics ; 8: 28, 2016.
Article in English | MEDLINE | ID: mdl-26962366

ABSTRACT

BACKGROUND: A long-term analysis by the Early Breast Cancer Trialists' Collaborative Group (EBCTCG) revealed a strong correlation between local control and cancer-specific mortality. MicroRNAs (miRs), short (20-25 nucleotides) non-coding RNAs, have been described as prognosticators and predictors for breast cancer in recent years. The aim of the current study was to identify miRs that can predict local control after breast-conserving therapy (BCT) in early stage breast cancer. RESULTS: Clinical data of 46 early stage breast cancer patients with local relapse after BCT were selected from the institutional database. These patients were matched to 101 control patients with identical clinical features but without local relapse. The study was conducted in two steps. (1) In the pilot study, 32 patients (16 relapses versus 16 controls) were screened for the most de-regulated microRNAs (candidate microRNAs) in a panel of 1250 miRs by microarray technology. Eight miRs were found to be significantly de-regulated. (2) In the validation study, the candidate microRNAs were analyzed in an independent cohort of 115 patients (30 relapses versus 85 controls) with reverse transcription quantitative polymerase chain reaction (RT-qPCR). Of these eight candidates, hsa-miR-375 was validated. Its median fold change was 2.28 (Mann-Whitney U test, corrected p value = 0.008). In the log-rank analysis, high expression levels of hsa-miR-375 correlated with a significantly higher risk of local relapse (p = 0.003). In a multivariate analysis (forward stepwise regression) including established predictors and prognosticators, hsa-miR-375 was the only variable that significantly distinguished between the relapse and control groups (raw p value = 0.000195, HR = 0.76, 95% CI 0.66-0.88; corrected p value = 0.005). CONCLUSIONS: Hsa-miR-375 predicts local control in patients with early stage breast cancer, especially in estrogen receptor α (ER-α)-positive patients.
It can therefore serve as an additional molecular marker for treatment choice independently from known predictors and prognosticators. Validation in larger prospective studies is warranted.
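The Mann-Whitney U test compares expression levels between relapse and control groups without assuming normality: all values are ranked together (tied values share their average rank), and U counts how often one group's values exceed the other's. The stdlib sketch below uses the normal approximation without tie correction, which is adequate only for moderately large groups; it also applies no multiple-testing correction, since the abstract does not state which correction produced its corrected p values. The inputs are hypothetical values.

```python
import math


def rank_with_ties(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for group x, two-sided p-value from the normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = rank_with_ties(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u1 - mean) / sd
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical expression values (e.g. fold changes) for two small groups.
u, p = mann_whitney_u([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(u, p)
```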


Subject(s)
Breast Neoplasms/genetics , MicroRNAs/genetics , Adult , Aged , Breast Neoplasms/diagnosis , Case-Control Studies , Female , Genetic Markers , Humans , Middle Aged , Neoplasm Recurrence, Local/genetics , Oligonucleotide Array Sequence Analysis , Prognosis , Reverse Transcriptase Polymerase Chain Reaction
13.
Bioinformatics ; 31(24): 3997-9, 2015 Dec 15.
Article in English | MEDLINE | ID: mdl-26315911

ABSTRACT

UNLABELLED: Although the R platform and the add-on packages of the Bioconductor project are widely used in bioinformatics, the standard task of multiple sequence alignment has been neglected so far. The msa package, for the first time, provides a unified R interface to the popular multiple sequence alignment algorithms ClustalW, ClustalOmega and MUSCLE. The package requires no additional software and runs on all major platforms. Moreover, the msa package provides an R interface to the powerful package shade which allows for flexible and customizable plotting of multiple sequence alignments. AVAILABILITY AND IMPLEMENTATION: msa is available via the Bioconductor project: http://bioconductor.org/packages/release/bioc/html/msa.html. Further information and the R code of the example presented in this paper are available at http://www.bioinf.jku.at/software/msa/.


Subject(s)
Sequence Alignment/methods , Software , Algorithms , Animals
14.
Bioinformatics ; 31(15): 2574-6, 2015 Aug 01.
Article in English | MEDLINE | ID: mdl-25812745

ABSTRACT

KeBABS provides a powerful, flexible and easy-to-use framework for Kernel-Based Analysis of Biological Sequences in R. It includes efficient implementations of the most important sequence kernels, including variants that take sequence annotations and positional information into account. KeBABS seamlessly integrates three common support vector machine (SVM) implementations with a unified interface. It allows hyperparameter selection by cross validation and nested cross validation, and also supports grouped cross validation. The biological interpretation of SVM models is supported by (1) the computation of weights of sequence patterns and (2) prediction profiles that highlight the contributions of individual sequence positions or sections.
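One of the most widely used sequence kernels is the spectrum kernel: each sequence is mapped to its vector of k-mer counts, and the kernel is the dot product of those vectors, optionally normalized to 1 for identical sequences. The sketch below is an independent Python illustration of that idea; it does not reproduce the KeBABS R API or its position-dependent and annotation-aware variants.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k, normalize=True):
    """Dot product of k-mer count vectors; cosine-normalized if requested."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    value = sum(count * cy[kmer] for kmer, count in cx.items())  # Counter returns 0 if absent
    if not normalize:
        return value
    norm = math.sqrt(sum(c * c for c in cx.values()) * sum(c * c for c in cy.values()))
    return value / norm if norm else 0.0


print(spectrum_kernel("ACGT", "ACGA", 2, normalize=False), spectrum_kernel("ACGT", "ACGA", 2))
```

A Gram matrix of such kernel values can be passed to any SVM implementation that accepts precomputed kernels, which is what makes the kernel choice independent of the SVM back end.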


Subject(s)
HLA-A2 Antigen/metabolism , Models, Theoretical , Peptide Fragments/metabolism , Sequence Analysis, Protein/methods , Software , Support Vector Machine , Algorithms , Artificial Intelligence , Computer Simulation , Humans
15.
PLoS One ; 9(1): e85934, 2014.
Article in English | MEDLINE | ID: mdl-24454946

ABSTRACT

The glycosylphosphatidylinositol (GPI)-anchored molecule CD59 has been implicated in the modulation of T cell responses, but the underlying molecular mechanism of CD59 influencing T cell signaling remained unclear. Here we analyzed Jurkat T cells stimulated via anti-CD3ε- or anti-CD59-coated surfaces, using time-resolved single-cell Ca(2+) imaging as a read-out for stimulation. This analysis revealed a heterogeneous Ca(2+) response of the cell population in a stimulus-dependent manner. Further analysis of T cell receptor (TCR)/CD3 deficient or overexpressing cells showed that CD59-mediated signaling is strongly dependent on TCR/CD3 surface expression. In protein co-patterning and fluorescence recovery after photobleaching experiments no direct physical interaction was observed between CD59 and CD3 at the plasma membrane upon anti-CD59 stimulation. However, siRNA-mediated protein knock-downs of downstream signaling molecules revealed that the Src family kinase Lck and the adaptor molecule linker of activated T cells (LAT) are essential for both signaling pathways. Furthermore, flow cytometry measurements showed that knock-down of Lck accelerates CD3 re-expression at the cell surface after anti-CD59 stimulation similar to what has been observed upon direct TCR/CD3 stimulation. Finally, physically linking Lck to CD3ζ completely abolished CD59-triggered Ca(2+) signaling, while signaling was still functional upon direct TCR/CD3 stimulation. Altogether, we demonstrate that Lck mediates signal transmission from CD59 to the TCR/CD3 pathway in Jurkat T cells, and propose that CD59 may act via Lck to modulate T cell responses.


Subject(s)
CD3 Complex/metabolism , CD59 Antigens/metabolism , Calcium Signaling , Lymphocyte Specific Protein Tyrosine Kinase p56(lck)/physiology , Receptors, Antigen, T-Cell/metabolism , Cell Membrane/metabolism , Humans , Jurkat Cells
16.
PLoS One ; 7(11): e47924, 2012.
Article in English | MEDLINE | ID: mdl-23144837

ABSTRACT

To gain deeper insights into the principles of cell biology, it is essential to understand how cells reorganize their genomes by chromatin remodeling. We analyzed chromatin remodeling in next-generation sequencing data from resting and activated T cells to determine a whole-genome chromatin remodeling landscape. We consider chromatin remodeling in terms of nucleosome repositioning, which can be observed most robustly in long nucleosome-free regions (LNFRs) that are occupied by nucleosomes in another cell state. We found that LNFR sequences are either AT-rich or GC-rich, and that nucleosome repositioning was observed much more prominently in GC-rich LNFRs, a considerable proportion of which lie outside promoter regions. Using support vector machines with string kernels, we identified a GC-rich DNA sequence pattern indicating loci of nucleosome repositioning in resting T cells. This pattern also appears to be typical of CpG islands. We found that nucleosome repositioning in GC-rich LNFRs is indeed associated with CpG islands and with binding sites of the CpG-island-binding ZF-CXXC proteins KDM2A and CFP1. That this association is prominent both inside and outside promoter regions hints at a mechanism governing nucleosome repositioning that acts on a whole-genome scale.
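The distinction between GC-rich and AT-rich regions, and their resemblance to CpG islands, rests on two simple sequence statistics: GC content and the CpG observed/expected ratio (CpG count times length divided by the product of C and G counts). The sketch below computes both and applies the widely used Gardiner-Garden and Frommer style thresholds (length at least 200 bp, GC content above 0.5, observed/expected above 0.6) as assumed defaults; the abstract does not state which criteria the authors used.

```python
def gc_content(seq):
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq):
    """Observed CpG dinucleotides relative to the count expected from C and G frequencies."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    cg = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")
    return cg * len(seq) / (c * g) if c and g else 0.0


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_oe=0.6):
    """Assumed thresholds, not necessarily those used in the study."""
    return len(seq) >= min_len and gc_content(seq) > min_gc and cpg_obs_exp(seq) > min_oe


print(gc_content("CGCGCGCG"), cpg_obs_exp("CGCGCGCG"), looks_like_cpg_island("CG" * 100))
```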


Subject(s)
Chromatin Assembly and Disassembly/genetics , GC Rich Sequence , Animals , Base Composition , Base Sequence , CD4-Positive T-Lymphocytes/metabolism , Chromatin/metabolism , Chromosome Mapping , Consensus Sequence , CpG Islands , Genome, Human , Humans , Mice , Models, Genetic , Molecular Sequence Annotation , Nucleosomes/genetics , Sequence Analysis, DNA
17.
Nucleic Acids Res ; 40(9): e69, 2012 May.
Article in English | MEDLINE | ID: mdl-22302147

ABSTRACT

Quantitative analyses of next-generation sequencing (NGS) data, such as the detection of copy number variations (CNVs), remain challenging. Current methods detect CNVs as changes in the depth of coverage along chromosomes. Technological or genomic variations in the depth of coverage thus lead to a high false discovery rate (FDR), even upon correction for GC content. In the context of association studies between CNVs and disease, a high FDR means many false CNVs, thereby decreasing the discovery power of the study after correction for multiple testing. We propose 'Copy Number estimation by a Mixture Of PoissonS' (cn.MOPS), a data processing pipeline for CNV detection in NGS data. In contrast to previous approaches, cn.MOPS incorporates modeling of depths of coverage across samples at each genomic position. Therefore, cn.MOPS is not affected by read count variations along chromosomes. Using a Bayesian approach, cn.MOPS decomposes variations in the depth of coverage across samples into integer copy numbers and noise by means of its mixture components and Poisson distributions, respectively. The noise estimate allows for reducing the FDR by filtering out detections having high noise that are likely to be false detections. We compared cn.MOPS with the five most popular methods for CNV detection in NGS data using four benchmark datasets: (i) simulated data, (ii) NGS data from a male HapMap individual with implanted CNVs from the X chromosome, (iii) data from HapMap individuals with known CNVs, (iv) high coverage data from the 1000 Genomes Project. cn.MOPS outperformed its five competitors in terms of precision (1-FDR) and recall for both gains and losses in all benchmark data sets. The software cn.MOPS is publicly available as an R package at http://www.bioinf.jku.at/software/cnmops/ and at Bioconductor.
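The key idea of modeling read counts across samples at one genomic position can be illustrated with a deliberately simplified sketch that is not cn.MOPS itself: the diploid read depth λ is taken as the median count across samples (assuming most samples are copy number 2), each sample's count is compared with Poisson distributions whose means are scaled by copy number i as (i/2)·λ, and the most likely integer copy number is reported. cn.MOPS instead fits a Bayesian mixture model across samples and derives a noise estimate; the median-based λ, the small mean assumed for copy number 0, and the maximum copy number here are assumptions.

```python
import math
from statistics import median


def poisson_logpmf(x, mu):
    return x * math.log(mu) - mu - math.lgamma(x + 1)


def most_likely_copy_numbers(counts, max_cn=8, cn0_factor=0.05):
    """Per-sample maximum-likelihood integer copy number for one genomic segment."""
    lam = max(median(counts), 1e-9)  # assume most samples are diploid (CN2)
    result = []
    for x in counts:
        best_cn, best_ll = 0, -math.inf
        for cn in range(max_cn + 1):
            # CN0 gets a small nonzero mean (assumed fraction of lambda) to allow some noise reads.
            mu = lam * (cn / 2 if cn > 0 else cn0_factor)
            ll = poisson_logpmf(x, mu)
            if ll > best_ll:
                best_cn, best_ll = cn, ll
        result.append(best_cn)
    return result


# Hypothetical read counts of one segment in five samples.
print(most_likely_copy_numbers([100, 98, 102, 50, 151]))
```

Because λ is estimated from the same position in all samples, a region that is consistently under- or over-covered (for example due to GC bias) shifts λ for every sample rather than producing false calls, which is the motivation for modeling across samples instead of along chromosomes.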


Subject(s)
DNA Copy Number Variations , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Software , Chromosomes, Human, X/chemistry , HapMap Project , Humans , Male , Poisson Distribution
18.
Bioinformatics ; 27(17): 2463-4, 2011 Sep 01.
Article in English | MEDLINE | ID: mdl-21737437

ABSTRACT

SUMMARY: Affinity propagation (AP) clustering has recently gained increasing popularity in bioinformatics. AP clustering has the advantage that it allows for determining typical cluster members, the so-called exemplars. We provide an R implementation of this promising new clustering technique to account for the ubiquity of R in bioinformatics. This article introduces the package and presents an application from structural biology. AVAILABILITY: The R package apcluster is available via CRAN (The Comprehensive R Archive Network): http://cran.r-project.org/web/packages/apcluster CONTACT: apcluster@bioinf.jku.at; bodenhofer@bioinf.jku.at.


Subject(s)
Cluster Analysis , Software , Algorithms , Amino Acid Motifs , Computational Biology/methods , Sequence Analysis, Protein
19.
Mol Cell Proteomics ; 10(5): M110.004994, 2011 May.
Article in English | MEDLINE | ID: mdl-21311038

ABSTRACT

Understanding the relationship between protein sequence and structure is one of the great challenges in biology. In the case of the ubiquitous coiled-coil motif, structure and occurrence have been described in extensive detail, but there is a lack of insight into the rules that govern oligomerization, i.e. how many α-helices form a given coiled coil. To shed new light on the formation of two- and three-stranded coiled coils, we developed a machine learning approach to identify rules in the form of weighted amino acid patterns. These rules form the basis of our classification tool, PrOCoil, which also visualizes the contribution of each individual amino acid to the overall oligomeric tendency of a given coiled-coil sequence. We discovered that sequence positions previously thought irrelevant to direct coiled-coil interaction have an undeniable impact on stoichiometry. Our rules also demystify the oligomerization behavior of the yeast transcription factor GCN4, which can now be described as a hybrid (part dimer and part trimer) with both theoretical and experimental justification.
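A classifier built from weighted amino acid patterns scores a sequence by summing the weights of the patterns it contains; spreading each pattern's weight evenly over the residues it covers yields a per-residue contribution profile whose sum equals the score minus a constant offset. The sketch below illustrates that principle with contiguous k-mers. PrOCoil's actual patterns, weights, and offset are not given in the abstract, so the pattern table and offset here are purely hypothetical.

```python
def pattern_score_and_profile(seq, weights, k, offset=0.0):
    """Score = offset + sum of weights of all k-mers in seq; profile spreads each weight over k positions."""
    profile = [0.0] * len(seq)
    score = offset
    for i in range(len(seq) - k + 1):
        w = weights.get(seq[i:i + k], 0.0)  # patterns without a weight contribute nothing
        score += w
        for j in range(i, i + k):
            profile[j] += w / k
    return score, profile


# Hypothetical pattern weights (positive = one oligomeric state, negative = the other).
toy_weights = {"LK": 1.0, "EL": -0.5}
score, profile = pattern_score_and_profile("LKEL", toy_weights, k=2)
print(score, profile)
```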


Subject(s)
Amino Acid Motifs , Artificial Intelligence , Computer Simulation , Models, Molecular , Protein Multimerization , Algorithms , Area Under Curve , Basic-Leucine Zipper Transcription Factors/chemistry , Molecular Sequence Annotation , Mutant Proteins/chemistry , ROC Curve , Saccharomyces cerevisiae Proteins/chemistry
20.
Bioinformatics ; 26(12): 1520-7, 2010 Jun 15.
Article in English | MEDLINE | ID: mdl-20418340

ABSTRACT

MOTIVATION: Biclustering of transcriptomic data groups genes and samples simultaneously. It is emerging as a standard tool for extracting knowledge from gene expression measurements. We propose a novel generative approach for biclustering called 'FABIA: Factor Analysis for Bicluster Acquisition'. FABIA is based on a multiplicative model, which accounts for linear dependencies between gene expression and conditions and also captures heavy-tailed distributions as observed in real-world transcriptomic data. The generative framework allows the use of well-founded model selection methods and the application of Bayesian techniques. RESULTS: On 100 simulated datasets with known, artificially implanted true biclusters, FABIA clearly outperformed all 11 competitors. On these datasets, FABIA was able to separate spurious biclusters from true biclusters by ranking biclusters according to their information content. FABIA was also tested on three microarray datasets with known subclusters, where it ranked best twice and second best once among the compared biclustering approaches. AVAILABILITY: FABIA is available as an R package on Bioconductor (http://www.bioconductor.org). All datasets, results and software are available at http://www.bioinf.jku.at/software/fabia/fabia.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
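The multiplicative model behind this kind of biclustering can be summarized as a sum of sparse outer products plus noise. In the notation below (our own choice of symbols), X is the n x l expression matrix (n genes, l samples), each bicluster i contributes the outer product of a sparse gene loading vector λ_i and a sparse sample factor vector z_i, and Υ is additive noise; the sparseness-inducing, heavy-tailed priors on λ_i and z_i are not shown.

```latex
X \;=\; \sum_{i=1}^{p} \lambda_i \, z_i^{\top} \;+\; \Upsilon \;=\; \Lambda Z + \Upsilon,
\qquad X \in \mathbb{R}^{n \times l},\;
\Lambda = [\lambda_1, \dots, \lambda_p] \in \mathbb{R}^{n \times p},\;
Z = [z_1, \dots, z_p]^{\top} \in \mathbb{R}^{p \times l}
```

Bicluster i then consists of the genes with nonzero entries in λ_i and the samples with nonzero entries in z_i; because both vectors are sparse, each outer product is nonzero only on that gene-by-sample block.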


Subject(s)
Gene Expression Profiling/methods , Software , Algorithms , Factor Analysis, Statistical , Gene Expression , Oligonucleotide Array Sequence Analysis/methods , Pattern Recognition, Automated