1.
Brief Bioinform ; 24(6), 2023 09 22.
Article in English | MEDLINE | ID: mdl-37874950

ABSTRACT

Cluster analysis is a crucial stage in the analysis and interpretation of single-cell gene expression (scRNA-seq) data. It is an inherently ill-posed problem whose solutions depend heavily on hyper-parameter and algorithmic choice. The popular approach of K-means clustering, for example, depends heavily on the choice of K and the convergence of the expectation-maximization algorithm to local minima of the objective. Exhaustive search of the space for multiple good-quality solutions is known to be a complex problem. Here, we show that quantum computing offers a solution to exploring the cost function of clustering by quantum annealing, implemented on a quantum computing facility offered by D-Wave [1]. Our formulation extracts a minimum vertex cover of an affinity graph to sub-sample the cell population and uses quantum annealing to optimise the cost function. A distribution of low-energy solutions can thus be extracted, offering alternate hypotheses about how genes group together in their space of expressions.
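The sensitivity of K-means to initialisation mentioned in this abstract can be illustrated with a minimal sketch. It is a plain Lloyd's-algorithm loop on hypothetical one-dimensional toy data, not the quantum annealing formulation of the paper; the function name `kmeans_1d`, the data and the seeds are all assumptions made for illustration.

```python
import random

def kmeans_1d(points, k, seed, iters=50):
    """Lloyd's algorithm on 1-D data; returns (sorted centroids, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initialisation from the data points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: (p - centroids[j]) ** 2)
            clusters[nearest].append(p)
        # Recompute each centroid as the cluster mean; keep the old one if a cluster is empty.
        centroids = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
    cost = sum(min((p - c) ** 2 for c in centroids) for p in points)
    return sorted(centroids), cost

# Hypothetical toy data: three tight groups of three points each.
points = [0.0, 0.2, 0.4, 5.0, 5.2, 5.4, 10.0, 10.2, 10.4]

# Restart from several seeds and collect the distinct objective values reached.
runs = [kmeans_1d(points, k=3, seed=s) for s in range(20)]
distinct_costs = sorted({round(cost, 6) for _, cost in runs})
best_cost = min(cost for _, cost in runs)
print(distinct_costs, best_cost)
```

If `distinct_costs` holds more than one value, some restarts have stopped in a local minimum of the objective, which is the behaviour the abstract refers to.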


Subject(s)
Computing Methodologies , Quantum Theory , RNA-Seq , RNA Sequence Analysis , Algorithms , Cluster Analysis , Gene Expression Profiling
2.
Bioinformatics ; 40(2), 2024 02 01.
Article in English | MEDLINE | ID: mdl-38310333

ABSTRACT

MOTIVATION: Protein language models (PLMs), which borrowed ideas for modelling and inference from natural language processing, have demonstrated the ability to extract meaningful representations in an unsupervised way. This led to significant performance improvement in several downstream tasks. Clustering amino acids based on their physical-chemical properties to achieve reduced alphabets has been of interest in past research, but their application to PLMs or folding models is unexplored. RESULTS: Here, we investigate the efficacy of PLMs trained on reduced amino acid alphabets in capturing evolutionary information, and we explore how the loss of protein sequence information impacts learned representations and downstream task performance. Our empirical work shows that PLMs trained on the full alphabet and a large number of sequences capture fine details that are lost in alphabet reduction methods. We further show the ability of a structure prediction model (ESMFold) to fold CASP14 protein sequences translated using a reduced alphabet. For 10 proteins out of the 50 targets, reduced alphabets improve structural predictions with LDDT-Cα differences of up to 19%. AVAILABILITY AND IMPLEMENTATION: Trained models and code are available at github.com/Ieremie/reduced-alph-PLM.


Subject(s)
Protein Folding , Proteins , Proteins/chemistry , Amino Acids/chemistry , Amino Acid Sequence , Amines
3.
Bioinformatics ; 38(8): 2269-2277, 2022 04 12.
Article in English | MEDLINE | ID: mdl-35176146

ABSTRACT

MOTIVATION: Protein-protein interactions (PPIs) play a key role in diverse biological processes but only a small subset of the interactions has been experimentally identified. Additionally, high-throughput experimental techniques that detect PPIs are known to suffer various limitations, such as exaggerated false positive and false negative rates. The semantic similarity derived from the Gene Ontology (GO) annotation is regarded as one of the most powerful indicators for protein interactions. However, while computational approaches for prediction of PPIs have gained popularity in recent years, most methods fail to capture the specificity of GO terms. RESULTS: We propose TransformerGO, a model that is capable of capturing the semantic similarity between GO sets dynamically using an attention mechanism. We generate dense graph embeddings for GO terms using an algorithmic framework for learning continuous representations of nodes in networks called node2vec. TransformerGO learns deep semantic relations between annotated terms and can distinguish between negative and positive interactions with high accuracy. TransformerGO outperforms classic semantic similarity measures on gold standard PPI datasets and state-of-the-art machine-learning-based approaches on large datasets from Saccharomyces cerevisiae and Homo sapiens. We show how the neural attention mechanism embedded in the transformer architecture detects relevant functional terms when predicting interactions. AVAILABILITY AND IMPLEMENTATION: https://github.com/Ieremie/TransformerGO. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Machine Learning , Humans , Gene Ontology , Saccharomyces cerevisiae/genetics , Molecular Sequence Annotation , Computational Biology/methods
4.
Aging Clin Exp Res ; 35(7): 1449-1457, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37202598

ABSTRACT

BACKGROUND: Osteoarthritis is the most prevalent type of arthritis. Many approaches exist for characterising radiographic knee OA, including machine learning (ML). AIMS: To examine Kellgren and Lawrence (K&L) scores from ML and expert observation, minimum joint space and osteophyte in relation to pain and function. METHODS: Participants from the Hertfordshire Cohort Study, comprising individuals born in Hertfordshire from 1931 to 1939, were analysed. Radiographs were assessed by clinicians and ML (convolutional neural networks) for K&L scoring. Medial minimum joint space and osteophyte area were ascertained using the knee OA computer-aided diagnosis (KOACAD) program. The Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) was administered. Receiver operating characteristic analysis was implemented for minimum joint space, osteophyte, and observer- and ML-derived K&L scores in relation to pain (WOMAC pain score > 0) and impaired function (WOMAC function score > 0). RESULTS: 359 participants (aged 71-80) were analysed. Among both sexes, discriminative capacity regarding pain and function was fairly high for observer-derived K&L scores [area under curve (AUC): 0.65 (95% CI 0.57, 0.72) to 0.70 (0.63, 0.77)]; results were similar among women for ML-derived K&L scores. Discriminative capacity was moderate among men for minimum joint space in relation to pain [0.60 (0.51, 0.67)] and function [0.62 (0.54, 0.69)]. AUC < 0.60 for other sex-specific associations. DISCUSSION: Observer-derived K&L scores had higher discriminative capacity regarding pain and function compared to minimum joint space and osteophyte. Among women, discriminative capacity was similar for observer- and ML-derived K&L scores. CONCLUSION: ML as an adjunct to expert observation for K&L scoring may be beneficial due to the efficiency and objectivity of ML.
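The area under the ROC curve reported in this abstract has a direct probabilistic reading: it is the chance that a randomly chosen case with the outcome receives a higher score than a randomly chosen case without it. The sketch below computes that quantity by pairwise comparison. The K&L grades are hypothetical values invented for illustration and are not data from the study.

```python
def auc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative (ties count as 1/2)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical K&L grades (0-4) for participants with and without knee pain.
kl_with_pain = [2, 3, 3, 4, 1]
kl_without_pain = [0, 1, 1, 2, 0]

print(auc(kl_with_pain, kl_without_pain))  # 22.5 favourable pairs out of 25 -> 0.9
```

An AUC of 0.5 corresponds to no discrimination, which puts the reported values of 0.60 to 0.70 in context.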


Subject(s)
Knee Osteoarthritis , Osteophyte , Male , Humans , Female , Knee Osteoarthritis/diagnostic imaging , Cohort Studies , Osteophyte/diagnostic imaging , Knee Joint , Pain , Severity of Illness Index
5.
Sensors (Basel) ; 23(23), 2023 Nov 28.
Article in English | MEDLINE | ID: mdl-38067827

ABSTRACT

Understanding how the human body works during sleep and how this varies in the population is a task with significant implications for medicine. Polysomnographic studies, or sleep studies, are a common diagnostic method that produces a significant quantity of time-series sensor data. This study seeks to learn the causal structure from data from polysomnographic studies carried out on 600 adult volunteers in the United States. Two methods are used to learn the causal structure of these data: the well-established Granger causality and "DYNOTEARS", a modern approach that uses continuous optimisation to learn dynamic Bayesian networks (DBNs). The results from the two methods are then compared. Both methods produce graphs that have a number of similarities, including the mutual causation between electrooculogram (EOG) and electroencephalogram (EEG) signals and between sleeping position and SpO2 (blood oxygen level). However, DYNOTEARS, unlike Granger causality, frequently finds a causal link to sleeping position from the other variables. Following the creation of these causal graphs, the relationship between the discovered causal structure and the characteristics of the participants is explored. It is found that there is an association between the waist size of a participant and whether a causal link is found between the electrocardiogram (ECG) measurement and the EOG and EEG measurements. It is concluded that a person's body shape appears to impact the relationship between their heart and brain during sleep and that Granger causality and DYNOTEARS can produce differing results on real-world data.


Subject(s)
Brain , Sleep , Adult , Humans , Bayes Theorem , Causality
6.
Entropy (Basel) ; 23(10), 2021 Oct 18.
Article in English | MEDLINE | ID: mdl-34682084

ABSTRACT

In solving challenging pattern recognition problems, deep neural networks have shown excellent performance by forming powerful mappings between inputs and targets, learning representations (features) and making subsequent predictions. A recent tool to help understand how representations are formed is based on observing the dynamics of learning on an information plane using mutual information, linking the input to the representation (I(X;T)) and the representation to the target (I(T;Y)). In this paper, we use an information theoretical approach to understand how Cascade Learning (CL), a method to train deep neural networks layer-by-layer, learns representations, as CL has shown comparable results while saving computation and memory costs. We observe that performance is not linked to information-compression, which differs from observation on End-to-End (E2E) learning. Additionally, CL can inherit information about targets, and gradually specialise extracted features layer-by-layer. We evaluate this effect by proposing an information transition ratio, I(T;Y)/I(X;T), and show that it can serve as a useful heuristic in setting the depth of a neural network that achieves satisfactory accuracy of classification.
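The information transition ratio I(T;Y)/I(X;T) named in this abstract can be made concrete with a small discrete example. The sketch below computes mutual information from joint probability tables. Both tables are hypothetical, and estimating these quantities for a real network layer requires a binning or density-estimation step that is not shown here.

```python
from math import log2

def mutual_information(joint):
    """I(A;B) in bits, from a joint probability table joint[a][b]."""
    pa = [sum(row) for row in joint]          # marginal of A
    pb = [sum(col) for col in zip(*joint)]    # marginal of B
    return sum(
        p * log2(p / (pa[i] * pb[j]))
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0
    )

# Hypothetical tables: X takes 4 values and T is a 2-valued deterministic summary of X,
# while T predicts a binary label Y correctly 80% of the time.
p_xt = [[0.25, 0.0], [0.25, 0.0], [0.0, 0.25], [0.0, 0.25]]
p_ty = [[0.4, 0.1], [0.1, 0.4]]

i_xt = mutual_information(p_xt)   # 1 bit: T keeps one of the two bits of X
i_ty = mutual_information(p_ty)   # about 0.278 bits
ratio = i_ty / i_xt               # information transition ratio
print(i_xt, i_ty, ratio)
```

A ratio near 1 would indicate that most of what the representation retains about the input is relevant to the target.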

7.
J Immunol ; 201(1): 251-263, 2018 07 01.
Article in English | MEDLINE | ID: mdl-29769273

ABSTRACT

MicroRNAs are small noncoding RNAs that inhibit gene expression posttranscriptionally, implicated in virtually all biological processes. Although the effect of individual microRNAs is generally studied, the genome-wide role of multiple microRNAs is less investigated. We assessed paired genome-wide expression of microRNAs with total (cytoplasmic) and translational (polyribosome-bound) mRNA levels employing subcellular fractionation and RNA sequencing (Frac-seq) in human primary bronchoepithelium from healthy controls and severe asthmatics. Severe asthma is a chronic inflammatory disease of the airways characterized by poor response to therapy. We found genes (i.e., isoforms of a gene) and mRNA isoforms differentially expressed in asthma, with novel inflammatory and structural pathophysiological mechanisms related to bronchoepithelium disclosed solely by polyribosome-bound mRNAs (e.g., IL1A and LTB genes or ITGA6 and ITGA2 alternatively spliced isoforms). Gene expression (i.e., isoforms of a gene) and mRNA expression analysis revealed different molecular candidates and biological pathways, with differentially expressed polyribosome-bound and total mRNAs also showing little overlap. We reveal a hub of six dysregulated microRNAs accounting for ∼90% of all microRNA targeting, displaying preference for polyribosome-bound mRNAs. Transfection of this hub in bronchial epithelial cells from healthy donors mimicked asthma characteristics. Our work demonstrates extensive posttranscriptional gene dysregulation in human asthma, in which microRNAs play a central role, illustrating the feasibility and importance of assessing posttranscriptional gene expression when investigating human disease.


Subject(s)
Asthma/genetics , Epithelial Cells/metabolism , Gene Expression Regulation/genetics , MicroRNAs/genetics , RNA Isoforms/genetics , Respiratory Mucosa/cytology , Adolescent , Adult , Aged , Alternative Splicing/genetics , Base Sequence , Female , Humans , Male , Middle Aged , Messenger RNA/genetics , RNA Sequence Analysis , Surveys and Questionnaires , Young Adult
8.
BMC Bioinformatics ; 20(1): 536, 2019 Oct 29.
Article in English | MEDLINE | ID: mdl-31664894

ABSTRACT

BACKGROUND: Analysis of high-throughput multi-'omics interactions across the hierarchy of expression has wide interest in making inferences with regard to biological function and biomarker discovery. Expression levels across different scales are determined by robust synthesis, regulation and degradation processes, and hence transcript (mRNA) measurements made by microarray/RNA-Seq only show modest correlation with corresponding protein levels. RESULTS: In this work we are interested in quantitative modelling of correlation across such gene products. Building on recent work, we develop computational models spanning transcript, translation and protein levels at different stages of the H. sapiens cell cycle. We enhance this analysis by incorporating 25+ sequence-derived features which are likely determinants of cellular protein concentration and quantitatively select for relevant features, producing a vast dataset with thousands of genes. We reveal insights into the complex interplay between expression levels across time, using machine learning methods to highlight outliers with respect to such models as proteins associated with post-translationally regulated modes of action. CONCLUSIONS: We uncover quantitative separation between modified and degraded proteins that have roles in cell cycle regulation, chromatin remodelling and protein catabolism according to Gene Ontology; and highlight the opportunities for providing biological insights in future model systems.


Subject(s)
Cell Division , Gene Expression Profiling/methods , Genomics , Humans , Protein Biosynthesis , Proteins/genetics , Formal Social Control
9.
Neural Comput ; 29(8): 2164-2176, 2017 08.
Article in English | MEDLINE | ID: mdl-28562212

ABSTRACT

Nonnegative matrix factorization (NMF) is primarily a linear dimensionality reduction technique that factorizes a nonnegative data matrix into two smaller nonnegative matrices: one that represents the basis of the new subspace and the second that holds the coefficients of all the data points in that new space. In principle, the nonnegativity constraint forces the representation to be sparse and parts based. Instead of extracting holistic features from the data, real parts are extracted that should be significantly easier to interpret and analyze. The size of the new subspace selects how many features will be extracted from the data. An effective choice should minimize the noise while extracting the key features. We propose a mechanism for selecting the subspace size by using a minimum description length technique. We demonstrate that our technique provides plausible estimates for real data as well as accurately predicting the known size of synthetic data. We provide an implementation of our code in a Matlab format.
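The factorization described in this abstract can be sketched with the widely used multiplicative update rules for the squared-error objective. This is an illustrative pure-Python version on a hypothetical 3 x 3 matrix. It does not implement the minimum description length criterion that the paper proposes for choosing the subspace size, and the helper names are assumptions.

```python
import random

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, rank, iters=500, seed=0, eps=1e-9):
    """Factorize nonnegative V (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(n)]
    return W, H

def frobenius_error(V, W, H):
    WH = matmul(W, H)
    return sum((v - a) ** 2 for rv, ra in zip(V, WH) for v, a in zip(rv, ra)) ** 0.5

# Hypothetical data: the third row is the sum of the first two, so two parts suffice.
V = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 2.0]]
W, H = nmf(V, rank=2)
err = frobenius_error(V, W, H)
print(err)
```

Because every update multiplies by a nonnegative ratio, W and H remain nonnegative throughout, which is what produces the parts-based representation the abstract describes.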

10.
Int J Mol Sci ; 18(2), 2017 Feb 01.
Article in English | MEDLINE | ID: mdl-28157153

ABSTRACT

Reverse vaccinology (RV) is a bioinformatics approach that can predict antigens with protective potential from the protein coding genomes of bacterial pathogens for subunit vaccine design. RV has become firmly established following the development of the BEXSERO® vaccine against Neisseria meningitidis serogroup B. RV studies have begun to incorporate machine learning (ML) techniques to distinguish bacterial protective antigens (BPAs) from non-BPAs. This research contributes significantly to the RV field by using permutation analysis to demonstrate that a signal for protective antigens can be curated from published data. Furthermore, the effects of the following on an ML approach to RV were also assessed: nested cross-validation, balancing selection of non-BPAs for subcellular localization, increasing the training data, and incorporating greater numbers of protein annotation tools for feature generation. These enhancements yielded a support vector machine (SVM) classifier that could discriminate BPAs (n = 200) from non-BPAs (n = 200) with an area under the curve (AUC) of 0.787. In addition, hierarchical clustering of BPAs revealed that intracellular BPAs clustered separately from extracellular BPAs. However, no immediate benefit was derived when training SVM classifiers on data sets exclusively containing intra- or extracellular BPAs. In conclusion, this work demonstrates that ML classifiers have great utility in RV approaches and will lead to new subunit vaccines in the future.


Subject(s)
Bacterial Antigens/immunology , Bacterial Vaccines/immunology , Computational Biology/methods , Machine Learning , Subunit Vaccines/immunology , Bacterial Antigens/genetics , Area Under Curve , Bacterial Proteins/genetics , Bacterial Proteins/immunology , Bacterial Vaccines/genetics , Epitope Mapping , Epitopes/genetics , Epitopes/immunology , Humans , Mutagenesis , ROC Curve , Support Vector Machine , Subunit Vaccines/genetics
11.
Bioinformatics ; 31(7): 1060-6, 2015 Apr 01.
Article in English | MEDLINE | ID: mdl-25416748

ABSTRACT

MOTIVATION: Transcriptional regulatory networks controlling cell fate decisions in mammalian embryonic development remain elusive despite a long time of research. The recent emergence of single-cell RNA profiling technology raises hope for new discovery. Although experimental works have obtained intriguing insights into the mouse early development, a holistic and systematic view is still missing. Mathematical models of cell fates tend to be concept-based, not designed to learn from real data. To elucidate the regulatory mechanisms behind cell fate decisions, it is highly desirable to synthesize the data-driven and knowledge-driven modeling approaches. RESULTS: We propose a novel method that integrates the structure of a cell lineage tree with transcriptional patterns from single-cell data. This method adopts probabilistic Boolean network (PBN) for network modeling, and genetic algorithm as search strategy. Guided by the 'directionality' of cell development along branches of the cell lineage tree, our method is able to accurately infer the regulatory circuits from single-cell gene expression data, in a holistic way. Applied on the single-cell transcriptional data of mouse preimplantation development, our algorithm outperforms conventional methods of network inference. Given the network topology, our method can also identify the operational interactions in the gene regulatory network (GRN), corresponding to specific cell fate determination. This is one of the first attempts to infer GRNs from single-cell transcriptional data, incorporating dynamics of cell development along a cell lineage tree. AVAILABILITY AND IMPLEMENTATION: Implementation of our algorithm is available from the authors upon request. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Cell Differentiation/genetics , Cell Lineage/genetics , Mammalian Embryo/cytology , Mammalian Embryo/metabolism , Gene Regulatory Networks , Single-Cell Analysis/methods , Animals , Gene Expression Regulation , Mice , Theoretical Models
12.
Bioinformatics ; 31(15): 2530-6, 2015 Aug 01.
Article in English | MEDLINE | ID: mdl-25819671

ABSTRACT

BACKGROUND: In high-throughput experimental biology, it is widely acknowledged that while expression levels measured at the levels of the transcriptome and the corresponding proteome do not, in general, correlate well, messenger RNA levels are used as convenient proxies for protein levels. Our interest is in developing data-driven computational models that can bridge the gap between these two levels of measurement at which different mechanisms of regulation may act on different molecular species causing any observed lack of correlations. To this end, we build data-driven predictors of protein levels using mRNA levels and known proxies of translation efficiencies as covariates. Previous work showed that in such a setting, outliers with respect to the model are reliable candidates for post-translational regulation. RESULTS: Here, we present and compare two novel formulations of deriving a protein concentration predictor from which outliers may be extracted in a systematic manner. The first approach, outlier rejecting regression, allows explicit specification of a certain fraction of the data as outliers. In a regression setting, this is a non-convex optimization problem which we solve by deriving a difference of convex functions algorithm (DCA). With post-translationally regulated proteins, one expects their concentrations to be affected primarily by disruption of protein stability. Our second algorithm exploits this observation by minimizing an asymmetric loss using quantile regression and extracts outlier proteins whose measured concentrations are lower than what a genome-wide regression would predict. We validate the two approaches on a dataset of yeast transcriptome and proteome. Functional annotation checks on detected outliers demonstrate that the methods are able to identify post-translationally regulated genes with high statistical confidence.
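The asymmetric loss minimized in quantile regression is commonly written as the pinball loss, sketched below. The parameter name `tau` and the toy numbers are assumptions, and the abstract does not state which quantile level the authors used.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean quantile (pinball) loss; tau in (0, 1) sets the asymmetry."""
    total = 0.0
    for y, f in zip(y_true, y_pred):
        r = y - f
        # Under-prediction (r >= 0) costs tau * r; over-prediction costs (1 - tau) * |r|.
        total += tau * r if r >= 0 else (tau - 1.0) * r
    return total / len(y_true)

# The same one-unit error is penalised differently depending on its sign.
under = pinball_loss([3.0], [2.0], tau=0.9)   # measured value above the prediction
over = pinball_loss([1.0], [2.0], tau=0.9)    # measured value below the prediction

# With tau = 0.5 the best constant predictor is the median, which ignores the extreme value.
data = [1.0, 2.0, 3.0, 4.0, 100.0]
best_constant = min(data, key=lambda c: pinball_loss(data, [c] * len(data), 0.5))
print(under, over, best_constant)
```

Moving `tau` away from 0.5 shifts the fitted curve towards an upper or lower quantile, so points lying on one side of it can be flagged systematically.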


Subject(s)
Computational Biology/methods , Proteome/metabolism , Messenger RNA/genetics , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/genetics , Transcriptome , Algorithms , Biomarkers/analysis , Fungal Gene Expression Regulation , Fungal Genome , Saccharomyces cerevisiae/metabolism
13.
Bioinformatics ; 29(23): 3060-6, 2013 Dec 01.
Article in English | MEDLINE | ID: mdl-24045772

ABSTRACT

MOTIVATION: Despite much dynamical cellular behaviour being achieved by accurate regulation of protein concentrations, messenger RNA abundances, measured by microarray technology, and more recently by deep sequencing techniques, are widely used as proxies for protein measurements. Although for some species and under some conditions, there is good correlation between transcriptome and proteome level measurements, such correlation is by no means universal due to post-transcriptional and post-translational regulation, both of which are highly prevalent in cells. Here, we seek to develop a data-driven machine learning approach to bridging the gap between these two levels of high-throughput omic measurements on Saccharomyces cerevisiae and deploy the model in a novel way to uncover mRNA-protein pairs that are candidates for post-translational regulation. RESULTS: The application of feature selection by sparsity inducing regression (l1 norm regularization) leads to a stable set of features: i.e. mRNA, ribosomal occupancy, ribosome density, tRNA adaptation index and codon bias while achieving a feature reduction from 37 to 5. A linear predictor used with these features is capable of predicting protein concentrations fairly accurately (R² = 0.86). Proteins whose concentration cannot be predicted accurately, taken as outliers with respect to the predictor, are shown to have annotation evidence of post-translational modification, significantly more than random subsets of similar size (P < 0.02). In a data mining sense, this work also shows a wider point that outliers with respect to a learning method can carry meaningful information about a problem domain.
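How l1 norm regularization removes features can be seen in a minimal coordinate-descent sketch of the lasso. The design matrix, response and penalty `lam` below are hypothetical, and the solver is a generic textbook scheme rather than the implementation used in the paper.

```python
def soft_threshold(z, t):
    """Shrink z towards zero by t; values within [-t, t] become exactly zero."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, sweeps=200):
    """Minimise (1/2n) * ||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            rho = 0.0  # correlation of feature j with the residual that excludes feature j
            zj = 0.0   # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                zj += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (zj / n) if zj > 0 else 0.0
    return w

# Hypothetical orthogonal design: y depends strongly on feature 0, weakly on feature 1,
# and not at all on feature 2.
X = [[1.0, 1.0, 1.0], [1.0, -1.0, -1.0], [-1.0, 1.0, -1.0], [-1.0, -1.0, 1.0]]
y = [2.05, 1.95, -1.95, -2.05]

w = lasso_cd(X, y, lam=0.1)
print(w)  # the weak and irrelevant coefficients are set exactly to zero
```

The soft-threshold step is what distinguishes the l1 penalty from ridge regression: small coefficients are zeroed out rather than merely shrunk, which yields a reduced feature set of the kind the abstract reports.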


Subject(s)
Computational Biology/methods , Fungal Gene Expression Regulation , Post-Translational Protein Processing , Proteome/analysis , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/genetics , Transcriptome , Artificial Intelligence , Codon/metabolism , Messenger RNA/genetics , Messenger RNA/metabolism , Transfer RNA/genetics , Transfer RNA/metabolism , Ribosomes/metabolism , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism
14.
JCI Insight ; 9(8), 2024 Mar 21.
Article in English | MEDLINE | ID: mdl-38512356

ABSTRACT

BACKGROUND: Novel biomarkers to identify infectious patients transmitting Mycobacterium tuberculosis are urgently needed to control the global tuberculosis (TB) pandemic. We hypothesized that proteins released into the plasma in active pulmonary TB are clinically useful biomarkers to distinguish TB cases from healthy individuals and patients with other respiratory infections. METHODS: We applied a highly sensitive non-depletion tandem mass spectrometry discovery approach to investigate plasma protein expression in pulmonary TB cases compared to healthy controls in South African and Peruvian cohorts. Bioinformatic analysis using linear modeling and network correlation analyses identified 118 differentially expressed proteins, significant through 3 complementary analytical pipelines. Candidate biomarkers were subsequently analyzed in 2 validation cohorts of differing ethnicity using antibody-based proximity extension assays. RESULTS: TB-specific host biomarkers were confirmed. A 6-protein diagnostic panel, comprising FETUB, FCGR3B, LRG1, SELL, CD14, and ADA2, differentiated patients with pulmonary TB from healthy controls and patients with other respiratory infections with high sensitivity and specificity in both cohorts. CONCLUSION: This biomarker panel exceeds the World Health Organization Target Product Profile specificity criteria for a triage test for TB. The new biomarkers have potential for further development as near-patient TB screening assays, thereby helping to close the case-detection gap that fuels the global pandemic. FUNDING: Medical Research Council (MRC) (MR/R001065/1, MR/S024220/1, MR/P023754/1, and MR/W025728/1); the MRC and the UK Foreign Commonwealth and Development Office; the UK National Institute for Health Research (NIHR); the Wellcome Trust (094000, 203135, and CC2112); Starter Grant for Clinical Lecturers (Academy of Medical Sciences UK); the British Infection Association; the Program for Advanced Research Capacities for AIDS in Peru at Universidad Peruana Cayetano Heredia (D43TW00976301) from the Fogarty International Center at the US NIH; the UK Technology Strategy Board/Innovate UK (101556); the Francis Crick Institute, which receives funding from UKRI-MRC (CC2112); Cancer Research UK (CC2112); and the NIHR Biomedical Research Centre of Imperial College NHS.


Subject(s)
Biomarkers , Proteomics , Pulmonary Tuberculosis , Humans , Biomarkers/blood , Proteomics/methods , Male , Female , Adult , Pulmonary Tuberculosis/diagnosis , Pulmonary Tuberculosis/blood , Mycobacterium tuberculosis , Middle Aged , Peru/epidemiology , South Africa/epidemiology , Case-Control Studies , Sensitivity and Specificity
15.
Bioinformatics ; 28(3): 366-72, 2012 Feb 01.
Article in English | MEDLINE | ID: mdl-22130592

ABSTRACT

MOTIVATION: Bicoid protein molecules, translated from maternally provided bicoid mRNA, establish a concentration gradient in Drosophila early embryonic development. There is experimental evidence that the synthesis and subsequent destruction of this protein is regulated at source by precise control of the stability of the maternal mRNA. Can we infer the driving function at the source from noisy observations of the spatio-temporal protein profile? We use non-parametric Gaussian process regression for modelling the propagation of Bicoid in the embryo and infer aspects of source regulation as a posterior function. RESULTS: With synthetic data from a 1D diffusion model with a source simulated to model mRNA stability regulation, our results establish that the Gaussian process method can accurately infer the driving function and capture the spatio-temporal dynamics of embryonic Bicoid propagation. On real data from the FlyEx database, too, the reconstructed source function is indicative of stability regulation, but is temporally smoother than what we expected, partly due to the fact that the dataset is only partially observed. To be in line with recent thinking on the subject, we also analyse this model with a spatial gradient of maternal mRNA, rather than being fixed at only the anterior pole. CONTACT: m.niranjan@southampton.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Drosophila/embryology , Drosophila/metabolism , Homeodomain Proteins/genetics , Genetic Models , Trans-Activators/genetics , Animals , Diffusion , Drosophila/genetics , Drosophila Proteins , Nonmammalian Embryo/metabolism , Embryonic Development , Female , RNA Stability , Messenger RNA/genetics , Messenger RNA/metabolism
16.
Bioinformatics ; 28(11): 1501-7, 2012 Jun 01.
Article in English | MEDLINE | ID: mdl-22539674

ABSTRACT

MOTIVATION: Traditional models of systems biology describe dynamic biological phenomena as solutions to ordinary differential equations, which, when parameters in them are set to correct values, faithfully mimic observations. Often parameter values are tweaked by hand until desired results are achieved, or computed from biochemical experiments carried out in vitro. Of interest in this article is the use of probabilistic modelling tools with which parameters and unobserved variables, modelled as hidden states, can be estimated from limited noisy observations of parts of a dynamical system. RESULTS: Here we focus on sequential filtering methods and take a detailed look at the capabilities of three members of this family: (i) extended Kalman filter (EKF), (ii) unscented Kalman filter (UKF) and (iii) the particle filter, in estimating parameters and unobserved states of cellular response to sudden temperature elevation of the bacterium Escherichia coli. While previous literature has studied this system with the EKF, we show that parameter estimation is only possible with this method when the initial guesses are sufficiently close to the true values. The same turns out to be true for the UKF. In this thorough empirical exploration, we show that the non-parametric method of particle filtering is able to reliably estimate parameters and states, converging from initial distributions relatively far away from the underlying true values. AVAILABILITY AND IMPLEMENTATION: Software implementation of the three filters on this problem can be freely downloaded from http://users.ecs.soton.ac.uk/mn/HeatShock
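The particle filtering idea in this abstract can be sketched with a bootstrap filter on a deliberately simple model. The random-walk state equation, the noise levels `q` and `r`, and the broad prior are all assumptions chosen for illustration. This is not the heat-shock model of the paper, and it tracks only a hidden state rather than estimating parameters.

```python
import math
import random

def bootstrap_particle_filter(observations, n_particles=500, q=0.1, r=0.5, seed=0):
    """Posterior-mean estimates for x_t = x_{t-1} + N(0, q^2), y_t = x_t + N(0, r^2)."""
    rng = random.Random(seed)
    # Broad prior, so the filter starts far from the true state.
    particles = [rng.gauss(0.0, 5.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # 1. Propagate every particle through the state equation.
        particles = [p + rng.gauss(0.0, q) for p in particles]
        # 2. Weight each particle by the Gaussian likelihood of the observation.
        weights = [math.exp(-0.5 * ((y - p) / r) ** 2) for p in particles]
        total = sum(weights)
        if total == 0.0:  # every weight underflowed; fall back to uniform weights
            weights = [1.0] * n_particles
            total = float(n_particles)
        weights = [w / total for w in weights]
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # 3. Resample in proportion to the weights to avoid degeneracy.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates

# Hypothetical noise-free observations of a state held at 3.0.
estimates = bootstrap_particle_filter([3.0] * 30)
print(estimates[-1])
```

Unlike the EKF and UKF, this scheme never linearises the model or assumes a Gaussian posterior, which is one reason it can recover from initial distributions that are far from the true values.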


Subject(s)
Algorithms , Heat-Shock Response , Biological Models , Systems Biology , Animals , Computer Simulation , Escherichia coli/physiology , Statistical Models , Regression Analysis , Saccharomycetales/cytology , Saccharomycetales/physiology , Software
17.
Bone ; 168: 116653, 2023 03.
Artículo en Inglés | MEDLINE | ID: mdl-36581259

RESUMEN

BACKGROUND: Traditional analysis of High Resolution peripheral Quantitative Computed Tomography (HR-pQCT) images results in a multitude of cortical and trabecular parameters which would be potentially cumbersome to interpret for clinicians compared to user-friendly tools utilising clinical parameters. A computer vision approach (by which the entire scan is 'read' by a computer algorithm) to ascertain fracture risk, would be far simpler. We therefore investigated whether a computer vision and machine learning technique could improve upon selected clinical parameters in assessing fracture risk. METHODS: Participants of the Hertfordshire Cohort Study (HCS) attended research visits at which height and weight were measured; fracture history was determined via self-report and vertebral fracture assessment. Bone microarchitecture was assessed via HR-pQCT scans of the non-dominant distal tibia (Scanco XtremeCT), and bone mineral density measurement and lateral vertebral assessment were performed using dual-energy X-ray absorptiometry (DXA) (Lunar Prodigy Advanced). Images were cropped, pre-processed and texture analysis was performed using a three-dimensional local binary pattern method. These image data, together with age, sex, height, weight, BMI, dietary calcium and femoral neck BMD, were used in a random-forest classification algorithm. Receiver operating characteristic (ROC) analysis was used to compare fracture risk identification methods. RESULTS: Overall, 180 males and 165 females were included in this study with a mean age of approximately 76 years and 97 (28 %) participants had sustained a previous fracture. Using clinical risk factors alone resulted in an area under the curve (AUC) of 0.70 (95 % CI: 0.56-0.84), which improved to 0.71 (0.57-0.85) with the addition of DXA-measured BMD. 
The addition of HR-pQCT image data to the machine learning classifier with clinical risk factors and DXA-measured BMD as inputs led to an improved AUC of 0.90 (0.83-0.96) with a sensitivity of 0.83 and specificity of 0.74. CONCLUSION: These results suggest that applying a three-dimensional computer vision method to HR-pQCT scans may enhance the identification of those at risk of fracture beyond that afforded by clinical risk factors and DXA-measured BMD. This approach has the potential to make the information offered by HR-pQCT more accessible (and therefore applicable) to healthcare professionals in the clinic if the technology becomes more widely available.
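
The AUC, sensitivity and specificity figures reported in this abstract can be illustrated with a minimal sketch. It does not reproduce the study's random-forest pipeline; the function names, the toy labels and scores, and the 0.5 threshold are all assumptions made for illustration. AUC is computed here as the fraction of (fracture, no-fracture) pairs in which the fracture case receives the higher score.

```python
def roc_auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold):
    """Classify score >= threshold as positive, then return (sensitivity, specificity)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 1 = previous fracture, 0 = no fracture.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.5, 0.2, 0.1, 0.7]

auc = roc_auc(labels, scores)
sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarises ranking quality across all thresholds, which is why the abstract reports both.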


Subject(s)
Fractures, Bone , Male , Female , Humans , Aged , Absorptiometry, Photon/methods , Cohort Studies , Fractures, Bone/diagnostic imaging , Bone Density , Risk Factors , Femur Neck , Radius
18.
Neural Comput ; 24(6): 1462-86, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22364499

ABSTRACT

This letter considers how a number of modern Markov chain Monte Carlo (MCMC) methods can be applied for parameter estimation and inference in state-space models with point process observations. We quantified the efficiencies of these MCMC methods on synthetic data, and our results suggest that the Riemannian manifold Hamiltonian Monte Carlo method offers the best performance. We further compared this method with a previously tested variational Bayes method on two experimental data sets. Results indicate similar performance on the large data sets and superior performance on small ones. The work offers an extensive suite of MCMC algorithms evaluated on an important class of models for physiological signal analysis.
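
As a rough illustration of MCMC parameter estimation from point process observations, the sketch below runs a random-walk Metropolis sampler on the log firing rate of a Poisson count model. This is a deliberately simplified stand-in, not the method evaluated in the letter: the rate is constant rather than driven by a latent state, and the sampler is plain Metropolis rather than Riemannian manifold Hamiltonian Monte Carlo. The spike counts, bin width, prior and step size are all assumed values.

```python
import math
import random


def log_posterior(theta, counts, dt):
    """Unnormalised log posterior of theta = log(rate) given Poisson bin counts."""
    rate = math.exp(theta)
    # Poisson log-likelihood per bin, dropping terms that do not depend on theta.
    log_lik = sum(c * theta - rate * dt for c in counts)
    # Assumed weak Gaussian prior on theta (mean 0, standard deviation 10).
    log_prior = -0.5 * (theta / 10.0) ** 2
    return log_lik + log_prior


def metropolis(counts, dt, n_iter=5000, step=0.2, seed=0):
    """Random-walk Metropolis over theta; returns the full chain."""
    rng = random.Random(seed)
    theta = 0.0
    lp = log_posterior(theta, counts, dt)
    samples = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        lp_prop = log_posterior(proposal, counts, dt)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            theta, lp = proposal, lp_prop
        samples.append(theta)
    return samples


# Hypothetical spike counts in eight bins of 0.1 s (20 spikes in 0.8 s).
counts = [2, 3, 1, 4, 2, 3, 2, 3]
dt = 0.1

samples = metropolis(counts, dt)
kept = samples[1000:]  # discard burn-in
posterior_mean_log_rate = sum(kept) / len(kept)
```

With these toy counts the chain should settle near the log of the empirical rate (20 spikes / 0.8 s = 25 Hz). Gradient-based samplers such as the one favoured in the letter aim to explore the same kind of posterior more efficiently.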


Subject(s)
Markov Chains , Models, Neurological , Monte Carlo Method , Algorithms , Bayes Theorem , Computer Simulation , Neurons/physiology
19.
PLoS One ; 17(6): e0269159, 2022.
Article in English | MEDLINE | ID: mdl-35657932

ABSTRACT

BACKGROUND: It is estimated that up to 50% of all disease-causing variants disrupt splicing. Because splicing is complex, our ability to predict which variants disrupt it is limited, leading to missed diagnoses for patients. The emergence of machine learning for targeted medicine holds great potential to improve prediction of splice-disrupting variants. The recently published SpliceAI algorithm utilises deep neural networks and has been reported to have greater accuracy than other commonly used methods. METHODS AND FINDINGS: The original SpliceAI was trained on splice sites included in primary isoforms combined with novel junctions observed in GTEx data, which might introduce noise and de-correlate the machine learning input from its output. Limiting the training data to validated and manually annotated primary and alternatively spliced GENCODE sites may improve predictive ability. All of these gene isoforms were collapsed (aggregated into one pseudo-isoform) and the SpliceAI architecture was retrained (CI-SpliceAI). Predictive performance on a newly curated dataset of 1,316 functionally validated variants from the literature was compared with the original SpliceAI, alongside MMSplice, MaxEntScan, and SQUIRLS. Both SpliceAI algorithms outperformed the other methods, with the original SpliceAI achieving an accuracy of ∼91%, and CI-SpliceAI showing an improvement at ∼92% overall. Predictive accuracy increased in the majority of curated variants. CONCLUSIONS: We show that including only manually annotated alternatively spliced sites in training data improves prediction of clinically relevant variants, and highlight avenues for further performance improvements.


Subject(s)
RNA Splice Sites , RNA Splicing , Alternative Splicing , Humans , Machine Learning , Mutation , Neural Networks, Computer , RNA Splice Sites/genetics
20.
IEEE/ACM Trans Comput Biol Bioinform ; 19(6): 3340-3352, 2022.
Article in English | MEDLINE | ID: mdl-34705655

ABSTRACT

Recent advances in high-throughput technologies have made large amounts of biomedical omics data accessible to the scientific community. Single-omic data clustering has proved its impact in the biomedical and biological research fields. Multi-omic data clustering and multi-omic data integration techniques have shown improved clustering performance and biological insight. Cancer subtype clustering is an important task in the medical field, as it helps identify a suitable treatment procedure and prognosis for cancer patients. State-of-the-art multi-view clustering methods are based on non-convex objectives, which guarantee only non-global solutions at high computational cost. Only a few convex multi-view methods exist, and their models do not take into account the intrinsic manifold structure of the data. In this paper, we introduce a convex graph-regularized multi-view clustering method that is robust to outliers. We compare our algorithm to state-of-the-art convex and non-convex multi-view and single-view clustering methods, and show its superiority in clustering cancer subtypes on publicly available cancer genomic datasets from the TCGA repository. We also show that our method is better able to discover potential cancer subtypes than other state-of-the-art multi-view methods.


Subject(s)
Multiomics , Neoplasms , Humans , Genomics/methods , Algorithms , Cluster Analysis , Neoplasms/genetics