Results 1 - 20 of 39
1.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37874950

ABSTRACT

Cluster analysis is a crucial stage in the analysis and interpretation of single-cell gene expression (scRNA-seq) data. It is an inherently ill-posed problem whose solutions depend heavily on hyper-parameter and algorithmic choice. The popular approach of K-means clustering, for example, depends heavily on the choice of K and the convergence of the expectation-maximization algorithm to local minima of the objective. Exhaustive search of the space for multiple good quality solutions is known to be a complex problem. Here, we show that quantum computing offers a solution to exploring the cost function of clustering by quantum annealing, implemented on a quantum computing facility offered by D-Wave [1]. Our formulation extracts a minimum vertex cover of an affinity graph to sub-sample the cell population and uses quantum annealing to optimise the cost function. A distribution of low-energy solutions can thus be extracted, offering alternate hypotheses about how genes group together in their space of expressions.
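
The dependence of K-means on its starting point, noted in the abstract above, can be illustrated with a minimal sketch. The toy 1-D data, the function name `kmeans_1d` and the two initialisations are assumptions made for illustration; this is plain Lloyd's iteration, not the quantum annealing formulation described in the paper.

```python
def kmeans_1d(points, init_centroids, max_iters=100):
    """Lloyd's iteration on 1-D data; returns (centroids, within-cluster sum of squares)."""
    centroids = list(init_centroids)
    k = len(centroids)
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: (p - centroids[j]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    cost = sum(min((p - c) ** 2 for c in centroids) for p in points)
    return centroids, cost

# Toy data (made up): three well-separated groups.
points = [0.0, 0.2, 0.4, 5.0, 5.2, 5.4, 10.0, 10.2, 10.4]

_, good_cost = kmeans_1d(points, [0.0, 5.0, 10.0])  # one seed per group
_, bad_cost = kmeans_1d(points, [0.0, 0.2, 0.4])    # all seeds in one group

print(round(good_cost, 2), round(bad_cost, 2))
```

With the second initialisation the iteration stalls in a poor local minimum, which is the kind of behaviour that motivates sampling many low-cost solutions rather than trusting a single run.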


Subjects
Computing Methodologies , Quantum Theory , RNA-Seq , Sequence Analysis, RNA , Algorithms , Cluster Analysis , Gene Expression Profiling
2.
Bioinformatics ; 40(2)2024 02 01.
Article in English | MEDLINE | ID: mdl-38310333

ABSTRACT

MOTIVATION: Protein language models (PLMs), which borrowed ideas for modelling and inference from natural language processing, have demonstrated the ability to extract meaningful representations in an unsupervised way. This led to significant performance improvement in several downstream tasks. Clustering amino acids based on their physical-chemical properties to achieve reduced alphabets has been of interest in past research, but their application to PLMs or folding models is unexplored. RESULTS: Here, we investigate the efficacy of PLMs trained on reduced amino acid alphabets in capturing evolutionary information, and we explore how the loss of protein sequence information impacts learned representations and downstream task performance. Our empirical work shows that PLMs trained on the full alphabet and a large number of sequences capture fine details that are lost in alphabet reduction methods. We further show the ability of a structure prediction model (ESMFold) to fold CASP14 protein sequences translated using a reduced alphabet. For 10 proteins out of the 50 targets, reduced alphabets improve structural predictions with LDDT-Cα differences of up to 19%. AVAILABILITY AND IMPLEMENTATION: Trained models and code are available at github.com/Ieremie/reduced-alph-PLM.
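
As a rough illustration of what "translating" a sequence into a reduced alphabet means, the sketch below maps each residue to a group representative. The four-group scheme, the names `GROUPS`, `REDUCE` and `reduce_sequence`, and the example sequence are all hypothetical; the alphabets actually evaluated in the paper are not reproduced here.

```python
# Hypothetical 4-letter reduction, for illustration only: each key is the
# representative letter and each value lists the residues merged into it.
GROUPS = {
    "A": "AVLIMC",   # aliphatic / sulfur-containing
    "F": "FWYH",     # aromatic (histidine placed here by assumption)
    "S": "STNQGP",   # small / polar
    "K": "KRDE",     # charged
}
REDUCE = {residue: rep for rep, members in GROUPS.items() for residue in members}

def reduce_sequence(sequence):
    """Translate a protein sequence into the reduced alphabet.

    Characters outside the 20 standard residues (e.g. 'X') pass through unchanged.
    """
    return "".join(REDUCE.get(residue, residue) for residue in sequence.upper())

print(reduce_sequence("MKVLAT"))  # AKAAAS
```

Any many-to-one mapping of this kind discards sequence information, which is the trade-off the abstract examines.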


Subjects
Protein Folding , Proteins , Proteins/chemistry , Amino Acids/chemistry , Amino Acid Sequence , Amines
3.
Bioinformatics ; 38(8): 2269-2277, 2022 04 12.
Article in English | MEDLINE | ID: mdl-35176146

ABSTRACT

MOTIVATION: Protein-protein interactions (PPIs) play a key role in diverse biological processes but only a small subset of the interactions has been experimentally identified. Additionally, high-throughput experimental techniques that detect PPIs are known to suffer various limitations, such as inflated false-positive and false-negative rates. The semantic similarity derived from the Gene Ontology (GO) annotation is regarded as one of the most powerful indicators for protein interactions. However, while computational approaches for prediction of PPIs have gained popularity in recent years, most methods fail to capture the specificity of GO terms. RESULTS: We propose TransformerGO, a model that is capable of capturing the semantic similarity between GO sets dynamically using an attention mechanism. We generate dense graph embeddings for GO terms using an algorithmic framework for learning continuous representations of nodes in networks called node2vec. TransformerGO learns deep semantic relations between annotated terms and can distinguish between negative and positive interactions with high accuracy. TransformerGO outperforms classic semantic similarity measures on gold standard PPI datasets and state-of-the-art machine-learning-based approaches on large datasets from Saccharomyces cerevisiae and Homo sapiens. We show how the neural attention mechanism embedded in the transformer architecture detects relevant functional terms when predicting interactions. AVAILABILITY AND IMPLEMENTATION: https://github.com/Ieremie/TransformerGO. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Machine Learning , Humans , Gene Ontology , Saccharomyces cerevisiae/genetics , Molecular Sequence Annotation , Computational Biology/methods
4.
Aging Clin Exp Res ; 35(7): 1449-1457, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37202598

ABSTRACT

BACKGROUND: Osteoarthritis is the most prevalent type of arthritis. Many approaches exist for characterising radiographic knee OA, including machine learning (ML). AIMS: To examine Kellgren and Lawrence (K&L) scores from ML and expert observation, minimum joint space and osteophyte in relation to pain and function. METHODS: Participants from the Hertfordshire Cohort Study, comprising individuals born in Hertfordshire from 1931 to 1939, were analysed. Radiographs were assessed by clinicians and ML (convolutional neural networks) for K&L scoring. Medial minimum joint space and osteophyte area were ascertained using the knee OA computer-aided diagnosis (KOACAD) program. The Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) was administered. Receiver operating characteristic analysis was implemented for minimum joint space, osteophyte, and observer- and ML-derived K&L scores in relation to pain (WOMAC pain score > 0) and impaired function (WOMAC function score > 0). RESULTS: 359 participants (aged 71-80) were analysed. Among both sexes, discriminative capacity regarding pain and function was fairly high for observer-derived K&L scores [area under curve (AUC): 0.65 (95% CI 0.57, 0.72) to 0.70 (0.63, 0.77)]; results were similar among women for ML-derived K&L scores. Discriminative capacity was moderate among men for minimum joint space in relation to pain [0.60 (0.51, 0.67)] and function [0.62 (0.54, 0.69)]. AUC < 0.60 for other sex-specific associations. DISCUSSION: Observer-derived K&L scores had higher discriminative capacity regarding pain and function compared to minimum joint space and osteophyte. Among women, discriminative capacity was similar for observer- and ML-derived K&L scores. CONCLUSION: ML as an adjunct to expert observation for K&L scoring may be beneficial due to the efficiency and objectivity of ML.
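
The area under the ROC curve used above to quantify discriminative capacity can be computed directly from its rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half. The sketch below assumes this pairwise definition; the function name `roc_auc` and the grades and pain labels are made-up illustrative values, not study data.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Made-up example: ordinal grades (0-4) against a binary pain indicator.
grades = [0, 1, 2, 3, 4, 2, 1, 3]
pain = [0, 0, 1, 1, 1, 0, 1, 0]
print(roc_auc(grades, pain))  # 0.71875
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which gives a scale for reading the reported values.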


Subjects
Osteoarthritis, Knee , Osteophyte , Male , Humans , Female , Osteoarthritis, Knee/diagnostic imaging , Cohort Studies , Osteophyte/diagnostic imaging , Knee Joint , Pain , Severity of Illness Index
5.
Sensors (Basel) ; 23(23)2023 Nov 28.
Article in English | MEDLINE | ID: mdl-38067827

ABSTRACT

Understanding how the human body works during sleep and how this varies in the population is a task with significant implications for medicine. Polysomnographic studies, or sleep studies, are a common diagnostic method that produces a significant quantity of time-series sensor data. This study seeks to learn the causal structure from data from polysomnographic studies carried out on 600 adult volunteers in the United States. Two methods are used to learn the causal structure of these data: the well-established Granger causality and "DYNOTEARS", a modern approach that uses continuous optimisation to learn dynamic Bayesian networks (DBNs). The results from the two methods are then compared. Both methods produce graphs that have a number of similarities, including the mutual causation between electrooculogram (EOG) and electroencephalogram (EEG) signals and between sleeping position and SpO2 (blood oxygen level). However, DYNOTEARS, unlike Granger causality, frequently finds a causal link to sleeping position from the other variables. Following the creation of these causal graphs, the relationship between the discovered causal structure and the characteristics of the participants is explored. It is found that there is an association between the waist size of a participant and whether a causal link is found between the electrocardiogram (ECG) measurement and the EOG and EEG measurements. It is concluded that a person's body shape appears to impact the relationship between their heart and brain during sleep and that Granger causality and DYNOTEARS can produce differing results on real-world data.


Subjects
Brain , Sleep , Adult , Humans , Bayes Theorem , Causality
6.
Entropy (Basel) ; 23(10)2021 Oct 18.
Article in English | MEDLINE | ID: mdl-34682084

ABSTRACT

In solving challenging pattern recognition problems, deep neural networks have shown excellent performance by forming powerful mappings between inputs and targets, learning representations (features) and making subsequent predictions. A recent tool to help understand how representations are formed is based on observing the dynamics of learning on an information plane using mutual information, linking the input to the representation (I(X;T)) and the representation to the target (I(T;Y)). In this paper, we use an information theoretical approach to understand how Cascade Learning (CL), a method to train deep neural networks layer-by-layer, learns representations, as CL has shown comparable results while saving computation and memory costs. We observe that performance is not linked to information-compression, which differs from observation on End-to-End (E2E) learning. Additionally, CL can inherit information about targets, and gradually specialise extracted features layer-by-layer. We evaluate this effect by proposing an information transition ratio, I(T;Y)/I(X;T), and show that it can serve as a useful heuristic in setting the depth of a neural network that achieves satisfactory accuracy of classification.
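
The information transition ratio I(T;Y)/I(X;T) named above can be made concrete for discrete variables. The sketch below assumes the joint distributions are available as small probability tables; the function name `mutual_information` and the toy tables are illustrative assumptions, and estimating these quantities for real network activations requires binning or other estimators that are not shown.

```python
from math import log2

def mutual_information(joint):
    """Mutual information in bits from a joint probability table (list of rows)."""
    row_marginals = [sum(row) for row in joint]
    col_marginals = [sum(col) for col in zip(*joint)]
    return sum(
        p * log2(p / (row_marginals[i] * col_marginals[j]))
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0
    )

# Toy tables (made up). Rows index the first variable, columns the second.
joint_xt = [[0.25, 0.0], [0.25, 0.0], [0.0, 0.25], [0.0, 0.25]]  # T is a function of X
joint_ty = [[0.4, 0.1], [0.1, 0.4]]                              # Y is a noisy copy of T

i_xt = mutual_information(joint_xt)
i_ty = mutual_information(joint_ty)
ratio = i_ty / i_xt
print(round(i_xt, 3), round(i_ty, 3), round(ratio, 3))
```

In this toy case the representation keeps one bit about the input, of which only a fraction is informative about the target.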

7.
J Immunol ; 201(1): 251-263, 2018 07 01.
Article in English | MEDLINE | ID: mdl-29769273

ABSTRACT

MicroRNAs are small noncoding RNAs that inhibit gene expression posttranscriptionally, implicated in virtually all biological processes. Although the effect of individual microRNAs is generally studied, the genome-wide role of multiple microRNAs is less investigated. We assessed paired genome-wide expression of microRNAs with total (cytoplasmic) and translational (polyribosome-bound) mRNA levels employing subcellular fractionation and RNA sequencing (Frac-seq) in human primary bronchoepithelium from healthy controls and severe asthmatics. Severe asthma is a chronic inflammatory disease of the airways characterized by poor response to therapy. We found genes (i.e., isoforms of a gene) and mRNA isoforms differentially expressed in asthma, with novel inflammatory and structural pathophysiological mechanisms related to bronchoepithelium disclosed solely by polyribosome-bound mRNAs (e.g., IL1A and LTB genes or ITGA6 and ITGA2 alternatively spliced isoforms). Gene expression (i.e., isoforms of a gene) and mRNA expression analysis revealed different molecular candidates and biological pathways, with differentially expressed polyribosome-bound and total mRNAs also showing little overlap. We reveal a hub of six dysregulated microRNAs accounting for ∼90% of all microRNA targeting, displaying preference for polyribosome-bound mRNAs. Transfection of this hub in bronchial epithelial cells from healthy donors mimicked asthma characteristics. Our work demonstrates extensive posttranscriptional gene dysregulation in human asthma, in which microRNAs play a central role, illustrating the feasibility and importance of assessing posttranscriptional gene expression when investigating human disease.


Subjects
Asthma/genetics , Epithelial Cells/metabolism , Gene Expression Regulation/genetics , MicroRNAs/genetics , RNA Isoforms/genetics , Respiratory Mucosa/cytology , Adolescent , Adult , Aged , Alternative Splicing/genetics , Base Sequence , Female , Humans , Male , Middle Aged , RNA, Messenger/genetics , Sequence Analysis, RNA , Surveys and Questionnaires , Young Adult
8.
BMC Bioinformatics ; 20(1): 536, 2019 Oct 29.
Article in English | MEDLINE | ID: mdl-31664894

ABSTRACT

BACKGROUND: Analysis of high-throughput multi-'omics interactions across the hierarchy of expression has wide interest in making inferences with regard to biological function and biomarker discovery. Expression levels across different scales are determined by robust synthesis, regulation and degradation processes, and hence transcript (mRNA) measurements made by microarray/RNA-Seq only show modest correlation with corresponding protein levels. RESULTS: In this work we are interested in quantitative modelling of correlation across such gene products. Building on recent work, we develop computational models spanning transcript, translation and protein levels at different stages of the H. sapiens cell cycle. We enhance this analysis by incorporating 25+ sequence-derived features which are likely determinants of cellular protein concentration and quantitatively select for relevant features, producing a vast dataset with thousands of genes. We reveal insights into the complex interplay between expression levels across time, using machine learning methods to highlight outliers with respect to such models as proteins associated with post-translationally regulated modes of action. CONCLUSIONS: We uncover quantitative separation between modified and degraded proteins that have roles in cell cycle regulation, chromatin remodelling and protein catabolism according to Gene Ontology; and highlight the opportunities for providing biological insights in future model systems.


Subjects
Cell Division , Gene Expression Profiling/methods , Genomics , Humans , Protein Biosynthesis , Proteins/genetics , Social Control, Formal
9.
Neural Comput ; 29(8): 2164-2176, 2017 08.
Article in English | MEDLINE | ID: mdl-28562212

ABSTRACT

Nonnegative matrix factorization (NMF) is primarily a linear dimensionality reduction technique that factorizes a nonnegative data matrix into two smaller nonnegative matrices: one that represents the basis of the new subspace and the second that holds the coefficients of all the data points in that new space. In principle, the nonnegativity constraint forces the representation to be sparse and parts based. Instead of extracting holistic features from the data, real parts are extracted that should be significantly easier to interpret and analyze. The size of the new subspace selects how many features will be extracted from the data. An effective choice should minimize the noise while extracting the key features. We propose a mechanism for selecting the subspace size by using a minimum description length technique. We demonstrate that our technique provides plausible estimates for real data as well as accurately predicting the known size of synthetic data. We provide an implementation of our code in a Matlab format.

10.
Int J Mol Sci ; 18(2)2017 Feb 01.
Article in English | MEDLINE | ID: mdl-28157153

ABSTRACT

Reverse vaccinology (RV) is a bioinformatics approach that can predict antigens with protective potential from the protein coding genomes of bacterial pathogens for subunit vaccine design. RV has become firmly established following the development of the BEXSERO® vaccine against Neisseria meningitidis serogroup B. RV studies have begun to incorporate machine learning (ML) techniques to distinguish bacterial protective antigens (BPAs) from non-BPAs. This research contributes significantly to the RV field by using permutation analysis to demonstrate that a signal for protective antigens can be curated from published data. Furthermore, the effects of the following on an ML approach to RV were also assessed: nested cross-validation, balancing selection of non-BPAs for subcellular localization, increasing the training data, and incorporating greater numbers of protein annotation tools for feature generation. These enhancements yielded a support vector machine (SVM) classifier that could discriminate BPAs (n = 200) from non-BPAs (n = 200) with an area under the curve (AUC) of 0.787. In addition, hierarchical clustering of BPAs revealed that intracellular BPAs clustered separately from extracellular BPAs. However, no immediate benefit was derived when training SVM classifiers on data sets exclusively containing intra- or extracellular BPAs. In conclusion, this work demonstrates that ML classifiers have great utility in RV approaches and will lead to new subunit vaccines in the future.


Subjects
Antigens, Bacterial/immunology , Bacterial Vaccines/immunology , Computational Biology/methods , Machine Learning , Vaccines, Subunit/immunology , Antigens, Bacterial/genetics , Area Under Curve , Bacterial Proteins/genetics , Bacterial Proteins/immunology , Bacterial Vaccines/genetics , Epitope Mapping , Epitopes/genetics , Epitopes/immunology , Humans , Mutagenesis , ROC Curve , Support Vector Machine , Vaccines, Subunit/genetics
11.
Bioinformatics ; 31(7): 1060-6, 2015 Apr 01.
Article in English | MEDLINE | ID: mdl-25416748

ABSTRACT

MOTIVATION: Transcriptional regulatory networks controlling cell fate decisions in mammalian embryonic development remain elusive despite a long time of research. The recent emergence of single-cell RNA profiling technology raises hope for new discovery. Although experimental works have obtained intriguing insights into the mouse early development, a holistic and systematic view is still missing. Mathematical models of cell fates tend to be concept-based, not designed to learn from real data. To elucidate the regulatory mechanisms behind cell fate decisions, it is highly desirable to synthesize the data-driven and knowledge-driven modeling approaches. RESULTS: We propose a novel method that integrates the structure of a cell lineage tree with transcriptional patterns from single-cell data. This method adopts probabilistic Boolean network (PBN) for network modeling, and genetic algorithm as search strategy. Guided by the 'directionality' of cell development along branches of the cell lineage tree, our method is able to accurately infer the regulatory circuits from single-cell gene expression data, in a holistic way. Applied on the single-cell transcriptional data of mouse preimplantation development, our algorithm outperforms conventional methods of network inference. Given the network topology, our method can also identify the operational interactions in the gene regulatory network (GRN), corresponding to specific cell fate determination. This is one of the first attempts to infer GRNs from single-cell transcriptional data, incorporating dynamics of cell development along a cell lineage tree. AVAILABILITY AND IMPLEMENTATION: Implementation of our algorithm is available from the authors upon request. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Cell Differentiation/genetics , Cell Lineage/genetics , Embryo, Mammalian/cytology , Embryo, Mammalian/metabolism , Gene Regulatory Networks , Single-Cell Analysis/methods , Animals , Gene Expression Regulation , Mice , Models, Theoretical
12.
Bioinformatics ; 31(15): 2530-6, 2015 Aug 01.
Article in English | MEDLINE | ID: mdl-25819671

ABSTRACT

BACKGROUND: In high-throughput experimental biology, it is widely acknowledged that while expression levels measured at the levels of transcriptome and the corresponding proteome do not, in general, correlate well, messenger RNA levels are used as convenient proxies for protein levels. Our interest is in developing data-driven computational models that can bridge the gap between these two levels of measurement at which different mechanisms of regulation may act on different molecular species causing any observed lack of correlations. To this end, we build data-driven predictors of protein levels using mRNA levels and known proxies of translation efficiencies as covariates. Previous work showed that in such a setting, outliers with respect to the model are reliable candidates for post-translational regulation. RESULTS: Here, we present and compare two novel formulations of deriving a protein concentration predictor from which outliers may be extracted in a systematic manner. The first approach, outlier rejecting regression, allows explicit specification of a certain fraction of the data as outliers. In a regression setting, this is a non-convex optimization problem which we solve by deriving a difference of convex functions algorithm (DCA). With post-translationally regulated proteins, one expects their concentrations to be affected primarily by disruption of protein stability. Our second algorithm exploits this observation by minimizing an asymmetric loss using quantile regression and extracts outlier proteins whose measured concentrations are lower than what a genome-wide regression would predict. We validate the two approaches on a dataset of yeast transcriptome and proteome. A functional annotation check on detected outliers demonstrates that the methods are able to identify post-translationally regulated genes with high statistical confidence.
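
The asymmetric loss minimised in quantile regression is usually written as the pinball (check) loss. The sketch below shows that standard loss under the assumption of a single quantile level `tau`; the function name and the value of `tau` are illustrative, and the quantile level and fitting procedure used in the paper are not stated in the abstract.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean pinball loss at quantile level tau (0 < tau < 1).

    Residuals r = y - f are weighted by tau when the prediction is too low
    (r >= 0) and by (1 - tau) when it is too high (r < 0).
    """
    total = 0.0
    for y, f in zip(y_true, y_pred):
        r = y - f
        total += tau * r if r >= 0 else (tau - 1.0) * r
    return total / len(y_true)

# With tau = 0.1, over-prediction by one unit costs nine times more
# than under-prediction by one unit.
print(pinball_loss([1.0], [0.0], 0.1))  # prediction too low
print(pinball_loss([0.0], [1.0], 0.1))  # prediction too high
```

Because the two sides of the residual are weighted differently, the fitted line is not pulled symmetrically towards every point, and observations on one side of it can be singled out systematically.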


Subjects
Computational Biology/methods , Proteome/metabolism , RNA, Messenger/genetics , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/genetics , Transcriptome , Algorithms , Biomarkers/analysis , Gene Expression Regulation, Fungal , Genome, Fungal , Saccharomyces cerevisiae/metabolism
13.
Bioinformatics ; 29(23): 3060-6, 2013 Dec 01.
Article in English | MEDLINE | ID: mdl-24045772

ABSTRACT

MOTIVATION: Despite much dynamical cellular behaviour being achieved by accurate regulation of protein concentrations, messenger RNA abundances, measured by microarray technology, and more recently by deep sequencing techniques, are widely used as proxies for protein measurements. Although for some species and under some conditions, there is good correlation between transcriptome and proteome level measurements, such correlation is by no means universal due to post-transcriptional and post-translational regulation, both of which are highly prevalent in cells. Here, we seek to develop a data-driven machine learning approach to bridging the gap between these two levels of high-throughput omic measurements on Saccharomyces cerevisiae and deploy the model in a novel way to uncover mRNA-protein pairs that are candidates for post-translational regulation. RESULTS: The application of feature selection by sparsity inducing regression (l1 norm regularization) leads to a stable set of features: i.e. mRNA, ribosomal occupancy, ribosome density, tRNA adaptation index and codon bias while achieving a feature reduction from 37 to 5. A linear predictor used with these features is capable of predicting protein concentrations fairly accurately (R² = 0.86). Proteins whose concentration cannot be predicted accurately, taken as outliers with respect to the predictor, are shown to have annotation evidence of post-translational modification, significantly more than random subsets of similar size P < 0.02. In a data mining sense, this work also shows a wider point that outliers with respect to a learning method can carry meaningful information about a problem domain.
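
Why l1 regularization yields a sparse feature set can be seen from the soft-thresholding operator. For the objective (1/2)||y - Xb||^2 + lam * ||b||_1 with orthonormal columns in X, the solution is the least-squares coefficient vector passed through this operator. The sketch below illustrates that special case only; the coefficient values and `lam` are made up, and this is not the solver or data used in the paper.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam, setting it to exactly zero if |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

# Made-up least-squares coefficients for five features.
ols_coefficients = [2.5, -0.3, 0.1, -1.2, 0.05]
lam = 0.5

lasso_coefficients = [soft_threshold(b, lam) for b in ols_coefficients]
selected = [i for i, b in enumerate(lasso_coefficients) if b != 0.0]
print(lasso_coefficients, selected)
```

Coefficients smaller in magnitude than `lam` are removed outright rather than merely shrunk, which is how a large candidate set can collapse to a handful of features.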


Subjects
Computational Biology/methods , Gene Expression Regulation, Fungal , Protein Processing, Post-Translational , Proteome/analysis , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/genetics , Transcriptome , Artificial Intelligence , Codon/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , RNA, Transfer/genetics , RNA, Transfer/metabolism , Ribosomes/metabolism , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism
14.
JCI Insight ; 9(8)2024 Mar 21.
Article in English | MEDLINE | ID: mdl-38512356

ABSTRACT

BACKGROUND: Novel biomarkers to identify infectious patients transmitting Mycobacterium tuberculosis are urgently needed to control the global tuberculosis (TB) pandemic. We hypothesized that proteins released into the plasma in active pulmonary TB are clinically useful biomarkers to distinguish TB cases from healthy individuals and patients with other respiratory infections. METHODS: We applied a highly sensitive non-depletion tandem mass spectrometry discovery approach to investigate plasma protein expression in pulmonary TB cases compared to healthy controls in South African and Peruvian cohorts. Bioinformatic analysis using linear modeling and network correlation analyses identified 118 differentially expressed proteins, significant through 3 complementary analytical pipelines. Candidate biomarkers were subsequently analyzed in 2 validation cohorts of differing ethnicity using antibody-based proximity extension assays. RESULTS: TB-specific host biomarkers were confirmed. A 6-protein diagnostic panel, comprising FETUB, FCGR3B, LRG1, SELL, CD14, and ADA2, differentiated patients with pulmonary TB from healthy controls and patients with other respiratory infections with high sensitivity and specificity in both cohorts. CONCLUSION: This biomarker panel exceeds the World Health Organization Target Product Profile specificity criteria for a triage test for TB. The new biomarkers have potential for further development as near-patient TB screening assays, thereby helping to close the case-detection gap that fuels the global pandemic. FUNDING: Medical Research Council (MRC) (MR/R001065/1, MR/S024220/1, MR/P023754/1, and MR/W025728/1); the MRC and the UK Foreign Commonwealth and Development Office; the UK National Institute for Health Research (NIHR); the Wellcome Trust (094000, 203135, and CC2112); Starter Grant for Clinical Lecturers (Academy of Medical Sciences UK); the British Infection Association; the Program for Advanced Research Capacities for AIDS in Peru at Universidad Peruana Cayetano Heredia (D43TW00976301) from the Fogarty International Center at the US NIH; the UK Technology Strategy Board/Innovate UK (101556); the Francis Crick Institute, which receives funding from UKRI-MRC (CC2112); Cancer Research UK (CC2112); and the NIHR Biomedical Research Centre of Imperial College NHS.


Subjects
Biomarkers , Proteomics , Tuberculosis, Pulmonary , Humans , Biomarkers/blood , Proteomics/methods , Male , Female , Adult , Tuberculosis, Pulmonary/diagnosis , Tuberculosis, Pulmonary/blood , Mycobacterium tuberculosis , Middle Aged , Peru/epidemiology , South Africa/epidemiology , Case-Control Studies , Sensitivity and Specificity
15.
Bioinformatics ; 28(3): 366-72, 2012 Feb 01.
Article in English | MEDLINE | ID: mdl-22130592

ABSTRACT

MOTIVATION: Bicoid protein molecules, translated from maternally provided bicoid mRNA, establish a concentration gradient in Drosophila early embryonic development. There is experimental evidence that the synthesis and subsequent destruction of this protein is regulated at source by precise control of the stability of the maternal mRNA. Can we infer the driving function at the source from noisy observations of the spatio-temporal protein profile? We use non-parametric Gaussian process regression for modelling the propagation of Bicoid in the embryo and infer aspects of source regulation as a posterior function. RESULTS: With synthetic data from a 1D diffusion model with a source simulated to model mRNA stability regulation, our results establish that the Gaussian process method can accurately infer the driving function and capture the spatio-temporal dynamics of embryonic Bicoid propagation. On real data from the FlyEx database, too, the reconstructed source function is indicative of stability regulation, but is temporally smoother than what we expected, partly due to the fact that the dataset is only partially observed. To be in line with recent thinking on the subject, we also analyse this model with a spatial gradient of maternal mRNA, rather than being fixed at only the anterior pole. CONTACT: m.niranjan@southampton.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Drosophila/embryology , Drosophila/metabolism , Homeodomain Proteins/genetics , Models, Genetic , Trans-Activators/genetics , Animals , Diffusion , Drosophila/genetics , Drosophila Proteins , Embryo, Nonmammalian/metabolism , Embryonic Development , Female , RNA Stability , RNA, Messenger/genetics , RNA, Messenger/metabolism
16.
Bioinformatics ; 28(11): 1501-7, 2012 Jun 01.
Article in English | MEDLINE | ID: mdl-22539674

ABSTRACT

MOTIVATION: Traditional models of systems biology describe dynamic biological phenomena as solutions to ordinary differential equations, which, when parameters in them are set to correct values, faithfully mimic observations. Often parameter values are tweaked by hand until desired results are achieved, or computed from biochemical experiments carried out in vitro. Of interest in this article is the use of probabilistic modelling tools with which parameters and unobserved variables, modelled as hidden states, can be estimated from limited noisy observations of parts of a dynamical system. RESULTS: Here we focus on sequential filtering methods and take a detailed look at the capabilities of three members of this family: (i) extended Kalman filter (EKF), (ii) unscented Kalman filter (UKF) and (iii) the particle filter, in estimating parameters and unobserved states of cellular response to sudden temperature elevation of the bacterium Escherichia coli. While previous literature has studied this system with the EKF, we show that parameter estimation is only possible with this method when the initial guesses are sufficiently close to the true values. The same turns out to be true for the UKF. In this thorough empirical exploration, we show that the non-parametric method of particle filtering is able to reliably estimate parameters and states, converging from initial distributions relatively far away from the underlying true values. AVAILABILITY AND IMPLEMENTATION: Software implementation of the three filters on this problem can be freely downloaded from http://users.ecs.soton.ac.uk/mn/HeatShock
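
The propagate, weight and resample cycle of a particle filter can be sketched on a much simpler problem than the heat-shock model. The example below assumes a scalar state that drifts slowly and is observed with Gaussian noise; the model, the function name `particle_filter`, every parameter value and the simulated data are illustrative assumptions and are unrelated to the downloadable implementation mentioned in the abstract.

```python
import math
import random

def particle_filter(observations, n_particles=500, obs_sd=0.5,
                    jitter_sd=0.05, init_range=(-20.0, 20.0), seed=0):
    """Bootstrap particle filter for a slowly drifting scalar state.

    Assumed toy model: x_t = x_{t-1} + N(0, jitter_sd^2), y_t = x_t + N(0, obs_sd^2).
    Returns the posterior-mean estimate after each observation.
    """
    rng = random.Random(seed)
    # Deliberately broad initial distribution, far from any particular value.
    particles = [rng.uniform(*init_range) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # Propagate: apply the state transition with process noise.
        particles = [x + rng.gauss(0.0, jitter_sd) for x in particles]
        # Weight: Gaussian likelihood of the observation under each particle.
        weights = [math.exp(-0.5 * ((y - x) / obs_sd) ** 2) for x in particles]
        if sum(weights) == 0.0:
            weights = [1.0] * n_particles  # guard against total underflow
        # Resample: draw particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        estimates.append(sum(particles) / n_particles)
    return estimates

# Simulated noisy observations of a constant true value (made up).
data_rng = random.Random(1)
true_value = 5.0
observations = [true_value + data_rng.gauss(0.0, 0.5) for _ in range(50)]

estimates = particle_filter(observations)
print(round(estimates[-1], 2))
```

Because the particles represent the whole distribution rather than a single linearised guess, the filter in this toy setting can recover from a broad starting distribution, which mirrors the behaviour the abstract reports for the particle filter.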


Subjects
Algorithms , Heat-Shock Response , Models, Biological , Systems Biology , Animals , Computer Simulation , Escherichia coli/physiology , Models, Statistical , Regression Analysis , Saccharomycetales/cytology , Saccharomycetales/physiology , Software
17.
Bone ; 168: 116653, 2023 03.
Article in English | MEDLINE | ID: mdl-36581259

ABSTRACT

BACKGROUND: Traditional analysis of High Resolution peripheral Quantitative Computed Tomography (HR-pQCT) images yields a multitude of cortical and trabecular parameters, which may be cumbersome for clinicians to interpret compared with user-friendly tools that use clinical parameters. A computer vision approach (by which the entire scan is 'read' by a computer algorithm) to ascertain fracture risk would be far simpler. We therefore investigated whether a computer vision and machine learning technique could improve upon selected clinical parameters in assessing fracture risk. METHODS: Participants of the Hertfordshire Cohort Study (HCS) attended research visits at which height and weight were measured; fracture history was determined via self-report and vertebral fracture assessment. Bone microarchitecture was assessed via HR-pQCT scans of the non-dominant distal tibia (Scanco XtremeCT), and bone mineral density measurement and lateral vertebral assessment were performed using dual-energy X-ray absorptiometry (DXA) (Lunar Prodigy Advanced). Images were cropped and pre-processed, and texture analysis was performed using a three-dimensional local binary pattern method. These image data, together with age, sex, height, weight, BMI, dietary calcium and femoral neck BMD, were used in a random-forest classification algorithm. Receiver operating characteristic (ROC) analysis was used to compare fracture risk identification methods. RESULTS: Overall, 180 males and 165 females were included in this study, with a mean age of approximately 76 years; 97 (28 %) participants had sustained a previous fracture. Using clinical risk factors alone resulted in an area under the curve (AUC) of 0.70 (95 % CI: 0.56-0.84), which improved to 0.71 (0.57-0.85) with the addition of DXA-measured BMD. The addition of HR-pQCT image data to the machine learning classifier with clinical risk factors and DXA-measured BMD as inputs led to an improved AUC of 0.90 (0.83-0.96) with a sensitivity of 0.83 and specificity of 0.74. CONCLUSION: These results suggest that applying a three-dimensional computer vision method to HR-pQCT scanning may enhance the identification of those at risk of fracture beyond that afforded by clinical risk factors and DXA-measured BMD. This approach has the potential to make the information offered by HR-pQCT more accessible (and therefore applicable) to healthcare professionals in the clinic if the technology becomes more widely available.


Subjects
Bone Fractures , Male , Female , Humans , Aged , Photon Absorptiometry/methods , Cohort Studies , Bone Fractures/diagnostic imaging , Bone Density , Risk Factors , Femur Neck , Radius (Anatomy)
18.
Neural Comput ; 24(6): 1462-86, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22364499

ABSTRACT

This letter considers how a number of modern Markov chain Monte Carlo (MCMC) methods can be applied for parameter estimation and inference in state-space models with point process observations. We quantified the efficiencies of these MCMC methods on synthetic data, and our results suggest that the Riemannian manifold Hamiltonian Monte Carlo method offers the best performance. We further compared this method with a previously tested variational Bayes method on two experimental data sets. Results indicate similar performance on the large data sets and superior performance on small ones. The work offers an extensive suite of MCMC algorithms evaluated on an important class of models for physiological signal analysis.


Subjects
Markov Chains , Neurological Models , Monte Carlo Method , Algorithms , Bayes Theorem , Computer Simulation , Neurons/physiology
19.
PLoS One ; 17(6): e0269159, 2022.
Article in English | MEDLINE | ID: mdl-35657932

ABSTRACT

BACKGROUND: It is estimated that up to 50% of all disease-causing variants disrupt splicing. Because splicing is complex, our ability to predict which variants disrupt it is limited, leading to missed diagnoses for patients. The emergence of machine learning for targeted medicine holds great potential to improve prediction of splice-disrupting variants. The recently published SpliceAI algorithm utilises deep neural networks and has been reported to have greater accuracy than other commonly used methods. METHODS AND FINDINGS: The original SpliceAI was trained on splice sites included in primary isoforms combined with novel junctions observed in GTEx data, which might introduce noise and de-correlate the machine learning input from its output. Limiting the training data to only validated and manually annotated primary and alternatively spliced GENCODE sites may improve predictive abilities. All of these gene isoforms were collapsed (aggregated into one pseudo-isoform) and the SpliceAI architecture was retrained (CI-SpliceAI). Predictive performance on a newly curated dataset of 1,316 functionally validated variants from the literature was compared with the original SpliceAI, alongside MMSplice, MaxEntScan, and SQUIRLS. Both SpliceAI algorithms outperformed the other methods, with the original SpliceAI achieving an accuracy of ∼91%, and CI-SpliceAI showing an improvement at ∼92% overall. Predictive accuracy increased in the majority of curated variants. CONCLUSIONS: We show that including only manually annotated alternatively spliced sites in training data improves prediction of clinically relevant variants, and highlight avenues for further performance improvements.


Subjects
RNA Splice Sites , RNA Splicing , Alternative Splicing , Humans , Machine Learning , Mutation , Computer Neural Networks , RNA Splice Sites/genetics
20.
IEEE/ACM Trans Comput Biol Bioinform ; 19(6): 3340-3352, 2022.
Article in English | MEDLINE | ID: mdl-34705655

ABSTRACT

Recent advances in high-throughput technologies have made large amounts of biomedical omics data accessible to the scientific community. Single-omic data clustering has proved its impact in the biomedical and biological research fields. Multi-omic data clustering and multi-omic data integration techniques have shown improved clustering performance and biological insight. Cancer subtype clustering is an important task in the medical field, as it helps identify a suitable treatment procedure and prognosis for cancer patients. State-of-the-art multi-view clustering methods are based on non-convex objectives, which guarantee only non-global solutions at high computational cost. Only a few convex multi-view methods exist, and their models do not take into account the intrinsic manifold structure of the data. In this paper, we introduce a convex graph-regularized multi-view clustering method that is robust to outliers. We compare our algorithm to state-of-the-art convex and non-convex multi-view and single-view clustering methods, and show its superiority in clustering cancer subtypes on publicly available cancer genomic datasets from the TCGA repository. We also show that our method is better able to discover potential cancer subtypes than other state-of-the-art multi-view methods.


Subjects
Multiomics , Neoplasms , Humans , Genomics/methods , Algorithms , Cluster Analysis , Neoplasms/genetics