ABSTRACT
Haematopoiesis in the bone marrow (BM) maintains blood and immune cell production throughout postnatal life. Haematopoiesis first emerges in human BM at 11-12 weeks after conception1,2, yet almost nothing is known about how fetal BM (FBM) evolves to meet the highly specialized needs of the fetus and newborn. Here we detail the development of FBM, including stroma, using multi-omic assessment of mRNA and multiplexed protein epitope expression. We find that the full blood and immune cell repertoire is established in FBM in a short time window of 6-7 weeks early in the second trimester. FBM promotes rapid and extensive diversification of myeloid cells, with granulocytes, eosinophils and dendritic cell subsets emerging for the first time. The substantial expansion of B lymphocytes in FBM contrasts with fetal liver at the same gestational age. Haematopoietic progenitors from fetal liver, FBM and cord blood exhibit transcriptional and functional differences that contribute to tissue-specific identity and cellular diversification. Endothelial cell types form distinct vascular structures that we show are regionally compartmentalized within FBM. Finally, we reveal selective disruption of B lymphocyte, erythroid and myeloid development owing to a cell-intrinsic differentiation bias as well as extrinsic regulation through an altered microenvironment in Down syndrome (trisomy 21).
Subject(s)
Bone Marrow Cells/cytology , Bone Marrow , Down Syndrome/blood , Down Syndrome/immunology , Fetus/cytology , Hematopoiesis , Immune System/cytology , B-Lymphocytes/cytology , Dendritic Cells/cytology , Down Syndrome/metabolism , Down Syndrome/pathology , Endothelial Cells/pathology , Eosinophils/cytology , Erythroid Cells/cytology , Granulocytes/cytology , Humans , Immunity , Myeloid Cells/cytology , Stromal Cells/cytology
ABSTRACT
Definitive haematopoiesis in the fetal liver supports self-renewal and differentiation of haematopoietic stem cells and multipotent progenitors (HSC/MPPs) but remains poorly defined in humans. Here, using single-cell transcriptome profiling of approximately 140,000 liver and 74,000 skin, kidney and yolk sac cells, we identify the repertoire of human blood and immune cells during development. We infer differentiation trajectories from HSC/MPPs and evaluate the influence of the tissue microenvironment on blood and immune cell development. We reveal physiological erythropoiesis in fetal skin and the presence of mast cells, natural killer and innate lymphoid cell precursors in the yolk sac. We demonstrate a shift in the haemopoietic composition of fetal liver during gestation away from being predominantly erythroid, accompanied by a parallel change in differentiation potential of HSC/MPPs, which we functionally validate. Our integrated map of fetal liver haematopoiesis provides a blueprint for the study of paediatric blood and immune disorders, and a reference for harnessing the therapeutic potential of HSC/MPPs.
Subject(s)
Fetus/cytology , Hematopoiesis , Liver/cytology , Liver/embryology , Blood Cells/cytology , Cellular Microenvironment , Female , Fetus/metabolism , Flow Cytometry , Gene Expression Profiling , Humans , Liver/metabolism , Lymphoid Tissue/cytology , Single-Cell Analysis , Stem Cells/metabolism
ABSTRACT
Foreign proteins are produced by introducing synthetic constructs into host bacteria for biotechnology applications. This process can cause resource competition between synthetic circuits and host cells, placing a metabolic burden on the host cells which may result in load stress and detrimental physiological changes. Consequently, the host bacteria can experience slow growth, and the synthetic system may suffer from suboptimal function. To help in the detection of bacterial load stress, we developed machine-learning strategies to select a minimal number of genes that could serve as biomarkers for the design of load stress reporters. We identified pairs of biomarkers that showed discriminative capacity to detect the load stress states induced in 41 engineered Escherichia coli strains.
Subject(s)
Biotechnology , Escherichia coli , Escherichia coli/metabolism , Bacteria
ABSTRACT
BACKGROUND: Perceived age (PA) has been associated with mortality, genetic variants linked to ageing and several age-related morbidities. However, estimating PA in large datasets is laborious and costly to generate, limiting its practical applicability. OBJECTIVES: To determine if estimating PA using deep learning-based algorithms results in the same associations with morbidities and genetic variants as human-estimated perceived age. METHODS: Self-supervised learning (SSL) and deep feature transfer (DFT) deep learning (DL) approaches were trained and tested on human-estimated PAs and their corresponding frontal face images of middle-aged to elderly Dutch participants (n = 2679) from a population-based study in the Netherlands. We compared the DL-estimated PAs with morbidities previously associated with human-estimated PA as well as genetic variants in the gene MC1R; we additionally tested the PA associations with MC1R in a new validation cohort (n = 1158). RESULTS: The DL approaches predicted PA in this population with a mean absolute error of 2.84 years (DFT) and 2.39 years (SSL). In the training-test dataset, we found the same significant (p < 0.05) associations for DL PA with osteoporosis, ARHL, cognition, COPD and cataracts and MC1R, as with human PA. We also found a similar but less significant association for SSL and DFT PAs (0.69 and 0.71 years per allele, p = 0.008 and 0.011, respectively) with MC1R variants in the validation dataset as that found with human, SSL and DFT PAs in the training-test dataset (0.79, 0.78 and 0.71 years per allele respectively; all p < 0.0001). CONCLUSIONS: Deep learning methods can automatically estimate PA from facial images with enough accuracy to replicate known links between human-estimated perceived age and several age-related morbidities. Furthermore, DL predicted perceived age associated with MC1R gene variants in a validation cohort. 
Hence, such DL PA techniques may be used instead of human estimations in perceived age studies thereby reducing time and costs.
ABSTRACT
OBJECTIVES: To identify highly ranked features related to clinicians' diagnosis of clinically relevant knee OA. METHODS: General practitioners (GPs) and secondary care physicians (SPs) were recruited to evaluate 5-10 years follow-up clinical and radiographic data of knees from the CHECK cohort for the presence of clinically relevant OA. GPs and SPs were gathered in pairs; each pair consisted of one GP and one SP, and the paired clinicians independently evaluated the same subset of knees. A diagnosis was made for each knee by the GP and SP before and after viewing radiographic data. Nested 5-fold cross-validation enhanced random forest models were built to identify the top 10 features related to the diagnosis. RESULTS: Seventeen clinician pairs evaluated 1106 knees with 139 clinical and 36 radiographic features. GPs diagnosed clinically relevant OA in 42% and 43% knees, before and after viewing radiographic data, respectively. SPs diagnosed in 43% and 51% knees, respectively. Models containing top 10 features had good performance for explaining clinicians' diagnosis with area under the curve ranging from 0.76-0.83. Before viewing radiographic data, quantitative symptomatic features (i.e. WOMAC scores) were the most important ones related to the diagnosis of both GPs and SPs; after viewing radiographic data, radiographic features appeared in the top lists for both, but seemed to be more important for SPs than GPs. CONCLUSIONS: Random forest models presented good performance in explaining clinicians' diagnosis, which helped to reveal typical features of patients recognized as clinically relevant knee OA by clinicians from two different care settings.
Subject(s)
Osteoarthritis, Knee , Humans , Osteoarthritis, Knee/diagnostic imaging , Osteoarthritis, Knee/complications , Knee Joint
ABSTRACT
OBJECTIVES: Osteoarthritis (OA) patient stratification is an important challenge to design tailored treatments and drive drug development. Biochemical markers reflecting joint tissue turnover were measured in the IMI-APPROACH cohort at baseline and analysed using a machine learning approach in order to study OA-dominant phenotypes driven by the endotype-related clusters and discover the driving features and their disease-context meaning. METHOD: Data quality assessment was performed to design appropriate data preprocessing techniques. The k-means clustering algorithm was used to find dominant subgroups of patients based on the biochemical markers data. Classification models were trained to predict cluster membership, and Explainable AI techniques were used to interpret these to reveal the driving factors behind each cluster and identify phenotypes. Statistical analysis was performed to compare differences between clusters with respect to other markers in the IMI-APPROACH cohort and the longitudinal disease progression. RESULTS: Three dominant endotypes were found, associated with three phenotypes: C1) low tissue turnover (low repair and articular cartilage/subchondral bone turnover), C2) structural damage (high bone formation/resorption, cartilage degradation) and C3) systemic inflammation (joint tissue degradation, inflammation, cartilage degradation). The method achieved consistent results in the FNIH/OAI cohort. C1 had the highest proportion of non-progressors. C2 was mostly linked to longitudinal structural progression, and C3 was linked to sustained or progressive pain. CONCLUSIONS: This work supports the existence of differential phenotypes in OA. The biomarker approach could potentially drive stratification for OA clinical trials and contribute to precision medicine strategies for OA progression in the future. TRIAL REGISTRATION NUMBER: NCT03883568.
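The clustering step named in the METHOD section can be illustrated with a minimal k-means sketch. The abstract only states that k-means was applied to the biochemical marker data. The marker values, the farthest-first initialisation (chosen here so the run is reproducible) and the absence of any scaling step are all assumptions made for illustration, not details of the study's pipeline.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic initialisation (an assumption, not from the abstract):
    start at the first point, then repeatedly add the point that is
    farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update
    until the cluster labels stop changing."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each patient goes to the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical two-marker profiles for nine patients (illustrative values only).
profiles = [
    (0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
    (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),
    (10.0, 0.1), (9.8, 0.0), (10.1, 0.2),
]
labels, centroids = kmeans(profiles, k=3)
print(labels)
```

In practice markers measured on different scales would normally be standardised before clustering, since k-means relies on Euclidean distance.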
Subject(s)
Bone Resorption , Cartilage, Articular , Osteoarthritis, Knee , Biomarkers , Cluster Analysis , Disease Progression , Humans , Inflammation , Osteoarthritis, Knee/drug therapy
ABSTRACT
OBJECTIVES: The IMI-APPROACH knee osteoarthritis study used machine learning (ML) to predict structural and/or pain progression, expressed by a structural (S) and pain (P) predicted-progression score, to select patients from existing cohorts. This study evaluates the actual 2-year progression within the IMI-APPROACH, in relation to the predicted-progression scores. METHODS: Actual structural progression was measured using minimum joint space width (minJSW). Actual pain (progression) was evaluated using the Knee injury and Osteoarthritis Outcomes Score (KOOS) pain questionnaire. Progression was presented as actual change (Δ) after 2 years, and as progression over 2 years based on a per-patient fitted regression line using 0, 0.5, 1 and 2-year values. Differences in predicted-progression scores between actual progressors and non-progressors were evaluated. Receiver operating characteristic (ROC) curves were constructed and the corresponding area under the curve (AUC) reported. Using Youden's index, optimal cut-offs were chosen to enable evaluation of both predicted-progression scores to identify actual progressors. RESULTS: Actual structural progressors were initially assigned higher S predicted-progression scores compared with structural non-progressors. Likewise, actual pain progressors were assigned higher P predicted-progression scores compared with pain non-progressors. The AUC-ROC for the S predicted-progression score to identify actual structural progressors was poor (0.612 and 0.599 for Δ and regression minJSW, respectively). The AUC-ROC for the P predicted-progression score to identify actual pain progressors was good (0.817 and 0.830 for Δ and regression KOOS pain, respectively). CONCLUSION: The S and P predicted-progression scores as provided by the ML models developed and used for the selection of IMI-APPROACH patients were to some degree able to distinguish between actual progressors and non-progressors.
TRIAL REGISTRATION: ClinicalTrials.gov, https://clinicaltrials.gov, NCT03883568.
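The ROC-based evaluation described in the METHODS can be sketched as follows. Youden's index is J = sensitivity + specificity - 1, and the optimal cut-off is the threshold that maximises J. The scores and progression labels below are invented for illustration and are not study data. The rule "score >= threshold means predicted progressor" is likewise an assumption.

```python
def roc_points(scores, labels):
    """Return (threshold, sensitivity, specificity) for every distinct score
    used as a cut-off. A patient is called a progressor when score >= threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
        points.append((threshold, tp / positives, tn / negatives))
    return points


def youden_cutoff(scores, labels):
    """Pick the threshold that maximises J = sensitivity + specificity - 1."""
    threshold, sens, spec = max(
        roc_points(scores, labels), key=lambda p: p[1] + p[2] - 1
    )
    return threshold, sens + spec - 1


def auc(scores, labels):
    """AUC as the probability that a random progressor scores higher than a
    random non-progressor (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted-progression scores and actual progression (1 = progressor).
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
actual = [0, 0, 0, 1, 1, 0, 1, 1]
print(youden_cutoff(scores, actual), auc(scores, actual))
```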
Subject(s)
Osteoarthritis, Knee , Humans , Disease Progression , Pain/etiology , Joints , Knee Joint
ABSTRACT
BACKGROUND: The IMI-APPROACH cohort is an exploratory, 5-centre, 2-year prospective follow-up study of knee osteoarthritis (OA). The aim was to describe baseline multi-tissue semiquantitative MRI evaluation of index knees and to describe change for different MRI features based on number of subregion-approaches and change in maximum grades over a 24-month period. METHODS: MRIs were acquired using 1.5 T or 3 T MRI systems and assessed using the semi-quantitative MRI OA Knee Scoring (MOAKS) system. MRIs were read at baseline and 24 months for cartilage damage, bone marrow lesions (BML), osteophytes, meniscal damage and extrusion, and Hoffa- and effusion-synovitis. In descriptive fashion, the frequencies of MRI features at baseline and change in these imaging biomarkers over time are presented for the entire sample in a subregional and maximum score approach for most features. Differences between knees without and with structural radiographic (R) OA are analyzed in addition. RESULTS: Two hundred eighty-nine participants had readable baseline MRI examinations. Mean age was 66.6 ± 7.1 years and participants had a mean BMI of 28.1 ± 5.3 kg/m². The majority (55.3%) of included knees had radiographic OA. Any change in total cartilage MOAKS score was observed in 53.1% considering full-grade changes only, and in 73.9% including full-grade and within-grade changes. Any medial cartilage progression was seen in 23.9% and any lateral progression in 22.1%. While for the medial and lateral compartments numbers of subregions with improvement and worsening of BMLs were very similar, for the PFJ more improvement was observed compared to worsening (15.5% vs. 9.0%). Including within-grade changes, the number of knees showing BML worsening increased from 42.2% to 55.6%. While for some features 24-month change was rare, frequency of change was much more common in knees with vs. without ROA (e.g. worsening of total cartilage MOAKS score in 68.4% of ROA knees vs. 36.7% of no-ROA knees, and 60.7% vs. 21.8% for an increase in maximum BML score per knee). CONCLUSIONS: A wide range of MRI-detected structural pathologies was present in the IMI-APPROACH cohort. Baseline prevalence and change of features were substantially more common in the ROA subgroup compared to the knees without ROA. TRIAL REGISTRATION: Clinicaltrials.gov identification: NCT03883568.
Subject(s)
Cartilage Diseases , Cartilage, Articular , Osteoarthritis, Knee , Aged , Humans , Middle Aged , Biomarkers , Cartilage Diseases/pathology , Cartilage, Articular/diagnostic imaging , Cartilage, Articular/pathology , Follow-Up Studies , Magnetic Resonance Imaging , Osteoarthritis, Knee/diagnostic imaging , Osteoarthritis, Knee/pathology , Prospective Studies
ABSTRACT
BACKGROUND: Late-life depression (LLD) is associated with poor social functioning. However, previous research uses bias-prone self-report scales to measure social functioning and a more objective measure is lacking. We tested a novel wearable device to measure speech that participants encounter as an indicator of social interaction. METHODS: Twenty nine participants with LLD and 29 age-matched controls wore a wrist-worn device continuously for seven days, which recorded their acoustic environment. Acoustic data were automatically analysed using deep learning models that had been developed and validated on an independent speech dataset. Total speech activity and the proportion of speech produced by the device wearer were both detected whilst maintaining participants' privacy. Participants underwent a neuropsychological test battery and clinical and self-report scales to measure severity of depression, general and social functioning. RESULTS: Compared to controls, participants with LLD showed poorer self-reported social and general functioning. Total speech activity was much lower for participants with LLD than controls, with no overlap between groups. The proportion of speech produced by the participants was smaller for LLD than controls. In LLD, both speech measures correlated with attention and psychomotor speed performance but not with depression severity or self-reported social functioning. CONCLUSIONS: Using this device, LLD was associated with lower levels of speech than controls and speech activity was related to psychomotor retardation. We have demonstrated that speech activity measured by wearable technology differentiated LLD from controls with high precision and, in this study, provided an objective measure of an aspect of real-world social functioning in LLD.
Subject(s)
Aging/psychology , Deep Learning , Depressive Disorder, Major/psychology , Social Interaction , Speech , Aged , Aged, 80 and over , Attention , Case-Control Studies , England , Female , Humans , Male , Neuropsychological Tests , Social Adjustment , Wearable Electronic Devices
ABSTRACT
A goal of the biotechnology industry is to be able to recognise detrimental cellular states that may lead to suboptimal or anomalous growth in a bacterial population. Our current knowledge of how different environmental treatments modulate gene regulation and bring about physiology adaptations is limited, and hence it is difficult to determine the mechanisms that lead to their effects. Patterns of gene expression, revealed using technologies such as microarrays or RNA-seq, can provide useful biomarkers of different gene regulatory states indicative of a bacterium's physiological status. It is desirable to have only a few key genes as the biomarkers to reduce the costs of determining the transcriptional state by opening the way for methods such as quantitative RT-PCR and amplicon panels. In this paper, we used unsupervised machine learning to construct a transcriptional landscape model from condition-dependent transcriptome data, from which we have identified 10 clusters of samples with differentiated gene expression profiles and linked to different cellular growth states. Using an iterative feature elimination strategy, we identified a minimal panel of 10 biomarker genes that achieved 100% cross-validation accuracy in predicting the cluster assignment. Moreover, we designed and evaluated a variety of data processing strategies to ensure our methods were able to generate meaningful transcriptional landscape models, capturing relevant biological processes. Overall, the computational strategies introduced in this study facilitate the identification of a detailed set of relevant cellular growth states, and how to sense them using a reduced biomarker panel.
Subject(s)
Bacillus subtilis , Gene Expression Profiling , Bacillus subtilis/genetics , Biomarkers , Microarray Analysis
ABSTRACT
We designed and evaluated an assumption-free, deep learning-based methodology for animal health monitoring, specifically for the early detection of respiratory disease in growing pigs based on environmental sensor data. Two recurrent neural networks (RNNs), each comprising gated recurrent units (GRUs), were used to create an autoencoder (GRU-AE) into which environmental data, collected from a variety of sensors, was processed to detect anomalies. An autoencoder is a type of network trained to reconstruct the patterns it is fed as input. By training the GRU-AE using environmental data that did not lead to an occurrence of respiratory disease, data that did not fit the pattern of "healthy environmental data" had a greater reconstruction error. All reconstruction errors were labelled as either normal or anomalous using threshold-based anomaly detection optimised with particle swarm optimisation (PSO), from which alerts are raised. The results from the GRU-AE method outperformed state-of-the-art techniques, raising alerts when such predictions deviated from the actual observations. The results show that a change in the environment can result in occurrences of pigs showing symptoms of respiratory disease within 1-7 days, meaning that there is a period of time during which their keepers can act to mitigate the negative effect of respiratory diseases, such as porcine reproductive and respiratory syndrome (PRRS), a common and destructive disease endemic in pigs.
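The thresholding of reconstruction errors described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the GRU autoencoder itself is not implemented (its reconstructions are taken as given), the error measure is assumed to be mean squared error, and the threshold is set with a simple mean-plus-three-standard-deviations rule on "healthy" errors rather than the particle swarm optimisation used in the paper. All numbers are invented.

```python
from statistics import mean, pstdev


def reconstruction_error(observed, reconstructed):
    """Mean squared error between a sensor window and its reconstruction."""
    return mean((o - r) ** 2 for o, r in zip(observed, reconstructed))


def fit_threshold(healthy_errors, n_sigma=3.0):
    """Stand-in for the PSO-optimised threshold (an assumption): errors more
    than n_sigma standard deviations above the healthy mean are anomalous."""
    return mean(healthy_errors) + n_sigma * pstdev(healthy_errors)


def label_errors(errors, threshold):
    """Label each reconstruction error as 'normal' or 'anomalous'."""
    return ["anomalous" if e > threshold else "normal" for e in errors]


# Hypothetical errors from windows that did not precede respiratory disease.
healthy_errors = [0.10, 0.12, 0.08, 0.11, 0.09]
threshold = fit_threshold(healthy_errors)

# Hypothetical errors from new sensor windows; an alert would be raised
# for any window labelled 'anomalous'.
print(label_errors([0.09, 0.50, 0.13], threshold))
```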
ABSTRACT
BACKGROUND: Current -omics technologies are able to sense the state of a biological sample in a very wide variety of ways. Given the high dimensionality that typically characterises these data, relevant knowledge is often hidden and hard to identify. Machine learning methods, and particularly feature selection algorithms, have proven very effective over the years at identifying small but relevant subsets of variables from a variety of application domains, including -omics data. Many methods exist with varying trade-offs between the size of the identified variable subsets and the predictive power of such subsets. In this paper we focus on a heuristic for the identification of biomarkers called RGIFE: Rank Guided Iterative Feature Elimination. RGIFE is guided in its biomarker identification process by the information extracted from machine learning models and incorporates several mechanisms to ensure that it creates minimal and highly predictive feature sets. RESULTS: We compare RGIFE against five well-known feature selection algorithms using both synthetic and real (cancer-related transcriptomics) datasets. First, we assess the ability of the methods to identify relevant and highly predictive features. Then, using a prostate cancer dataset as a case study, we look at the biological relevance of the identified biomarkers. CONCLUSIONS: We propose RGIFE, a heuristic for the inference of reduced panels of biomarkers that obtains similar predictive performance to widely adopted feature selection methods while selecting significantly fewer features. Furthermore, focusing on the case study, we show the higher biological relevance of the biomarkers selected by our approach. The RGIFE source code is available at: http://ico2s.org/software/rgife.html .
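The general idea of rank-guided iterative elimination can be sketched as below. This is not the RGIFE implementation, whose internals the abstract does not describe. It is a generic backward-elimination loop in which the nearest-centroid classifier, the leave-one-out accuracy estimate, the centroid-gap ranking and the toy data are all assumptions chosen to keep the example self-contained.

```python
def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def predict(train_x, train_y, sample, features):
    """Nearest-centroid prediction using only the given feature indices."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        centre = centroid([x for x, y in zip(train_x, train_y) if y == label])
        dist = sum((sample[f] - centre[f]) ** 2 for f in features)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(x, y, features):
    """Leave-one-out accuracy of the nearest-centroid model."""
    hits = 0
    for i in range(len(x)):
        train_x, train_y = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        hits += predict(train_x, train_y, x[i], features) == y[i]
    return hits / len(x)


def rank_features(x, y, features):
    """Order features from least to most discriminative, scored by the gap
    between the two class centroids (classes are assumed to be 0 and 1)."""
    c0 = centroid([r for r, lab in zip(x, y) if lab == 0])
    c1 = centroid([r for r, lab in zip(x, y) if lab == 1])
    return sorted(features, key=lambda f: abs(c0[f] - c1[f]))


def eliminate(x, y):
    """Drop the lowest-ranked feature whose removal does not reduce accuracy;
    repeat until no feature can be removed or only one remains."""
    features = list(range(len(x[0])))
    best = loo_accuracy(x, y, features)
    removed = True
    while removed and len(features) > 1:
        removed = False
        for candidate in rank_features(x, y, features):
            trial = [f for f in features if f != candidate]
            score = loo_accuracy(x, y, trial)
            if score >= best:
                features, best, removed = trial, score, True
                break
    return features, best


# Toy data: feature 0 separates the classes, features 1 and 2 are noise.
x = [[0.0, 0.52, 0.31], [0.2, 0.48, 0.29], [0.1, 0.50, 0.33], [0.3, 0.49, 0.30],
     [1.0, 0.51, 0.30], [1.2, 0.49, 0.32], [1.1, 0.50, 0.28], [0.9, 0.47, 0.31]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(eliminate(x, y))
```

Accepting a removal whenever accuracy does not fall favours the smallest panel, which mirrors the stated goal of minimal yet predictive feature sets.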
Subject(s)
Algorithms , Biomarkers/analysis , User-Computer Interface , Biomarkers/metabolism , Databases, Factual , Humans , Internet , Neoplasms/diagnosis , Neoplasms/genetics , Neoplasms/metabolism
ABSTRACT
BACKGROUND: Investigations into novel biomarkers using omics techniques generate large amounts of data. Due to their size and numbers of attributes, these data are suitable for analysis with machine learning methods. A key component of typical machine learning pipelines for omics data is feature selection, which is used to reduce the raw high-dimensional data into a tractable number of features. Feature selection needs to balance the objective of using as few features as possible, while maintaining high predictive power. This balance is crucial when the goal of data analysis is the identification of highly accurate but small panels of biomarkers with potential clinical utility. In this paper we propose a heuristic for the selection of very small feature subsets, via an iterative feature elimination process that is guided by rule-based machine learning, called RGIFE (Rule-guided Iterative Feature Elimination). We use this heuristic to identify putative biomarkers of osteoarthritis (OA), articular cartilage degradation and synovial inflammation, using both proteomic and transcriptomic datasets. RESULTS AND DISCUSSION: Our RGIFE heuristic increased the classification accuracies achieved for all datasets when no feature selection is used, and performed well in a comparison with other feature selection methods. Using this method the datasets were reduced to a smaller number of genes or proteins, including those known to be relevant to OA, cartilage degradation and joint inflammation. The results have shown the RGIFE feature reduction method to be suitable for analysing both proteomic and transcriptomics data. Methods that generate large 'omics' datasets are increasingly being used in the area of rheumatology. CONCLUSIONS: Feature reduction methods are advantageous for the analysis of omics data in the field of rheumatology, as the applications of such techniques are likely to result in improvements in diagnosis, treatment and drug discovery.
Subject(s)
Biomarkers/metabolism , Gene Expression Profiling , Heuristics , Machine Learning , Proteomics , Algorithms , Animals , Cartilage/metabolism , Databases, Genetic , Databases, Protein , Dogs , Extracellular Matrix/metabolism , Humans , Inflammation/metabolism , Inflammation/pathology
ABSTRACT
The protein structure prediction problem continues to elude scientists. Despite the introduction of many methods, only modest gains were made over the last decade for certain classes of prediction targets. To address this challenge, a social-media based worldwide collaborative effort, named WeFold, was undertaken by 13 labs. During the collaboration, the laboratories were simultaneously competing with each other. Here, we present the first attempt at "coopetition" in scientific research applied to the protein structure prediction and refinement problems. The coopetition was made possible by allowing the participating labs to contribute different components of their protein structure prediction pipelines and create new hybrid pipelines that they tested during CASP10. This manuscript describes both successes and areas needing improvement as identified throughout the first WeFold experiment and discusses the efforts that are underway to advance this initiative. A footprint of all contributions and structures is publicly accessible at http://www.wefold.org.
Subject(s)
Computational Biology/methods , Computer Simulation , Cooperative Behavior , Protein Structure, Tertiary , Proteins/ultrastructure , Humans , Models, Molecular , Research Design , Video Games
ABSTRACT
The meta-analysis of large-scale postgenomics data sets within public databases promises to provide important novel biological knowledge. Statistical approaches including correlation analyses in coexpression studies of gene expression have emerged as tools to elucidate gene function using these data sets. Here, we present a powerful and novel alternative methodology to computationally identify functional relationships between genes from microarray data sets using rule-based machine learning. This approach, termed "coprediction," is based on the collective ability of groups of genes co-occurring within rules to accurately predict the developmental outcome of a biological system. We demonstrate the utility of coprediction as a powerful analytical tool using publicly available microarray data generated exclusively from Arabidopsis thaliana seeds to compute a functional gene interaction network, termed Seed Co-Prediction Network (SCoPNet). SCoPNet predicts functional associations between genes acting in the same developmental and signal transduction pathways irrespective of the similarity in their respective gene expression patterns. Using SCoPNet, we identified four novel regulators of seed germination (ALTERED SEED GERMINATION5, 6, 7, and 8), and predicted interactions at the level of transcript abundance between these novel and previously described factors influencing Arabidopsis seed germination. An online Web tool to query SCoPNet has been developed as a community resource to dissect seed biology and is available at http://www.vseed.nottingham.ac.uk/.
Subject(s)
Arabidopsis/genetics , Artificial Intelligence , Computational Biology , Germination/genetics , Transcriptome , Algorithms , Gene Expression Regulation, Plant , Gene Regulatory Networks , Internet , Likelihood Functions , Oligonucleotide Array Sequence Analysis , Seeds/genetics , Seeds/growth & development
ABSTRACT
The amount of grey literature and 'softer' intelligence from social media or websites is vast. Given the long lead-times of producing high-quality peer-reviewed health information, this is causing a demand for new ways to provide prompt input for secondary research. To our knowledge, this is the first review of automated data extraction methods or tools for health-related grey literature and soft data, with a focus on (semi)automating horizon scans, health technology assessments (HTA), evidence maps, or other literature reviews. We searched six databases to cover both health- and computer-science literature. After deduplication, 10% of the search results were screened by two reviewers, the remainder was single-screened up to an estimated 95% sensitivity; screening was stopped early after screening an additional 1000 results with no new includes. All full texts were retrieved, screened, and extracted by a single reviewer and 10% were checked in duplicate. We included 84 papers covering automation for health-related social media, internet fora, news, patents, government agencies and charities, or trial registers. From each paper, we extracted data about important functionalities for users of the tool or method; information about the level of support and reliability; and about practical challenges and research gaps. Poor availability of code, data, and usable tools leads to low transparency regarding performance and duplication of work. Financial implications, scalability, integration into downstream workflows, and meaningful evaluations should be carefully planned before starting to develop a tool, given the vast amounts of data and opportunities those tools offer to expedite research.
Subject(s)
Gray Literature , Technology Assessment, Biomedical , Humans , Reproducibility of Results , Automation , Internet
ABSTRACT
INTRODUCTION: Osteoarthritis (OA) affects over 500 million people worldwide. OA patients are symptomatically treated, and current therapies exhibit marginal efficacy and frequently carry safety-risks associated with chronic use. No disease-modifying therapies have been approved to date leaving surgical joint replacement as a last resort. To enable effective patient care and successful drug development there is an urgent need to uncover the pathobiological drivers of OA and how these translate into disease endotypes. Endotypes provide a more precise and mechanistic definition of disease subgroups than observable phenotypes, and a panel of tissue- and pathology-specific biochemical markers may uncover treatable endotypes of OA. AREAS COVERED: We have searched PubMed for full-text articles written in English to provide an in-depth narrative review of a panel of validated biochemical markers utilized for endotyping of OA and their association to key OA pathologies. EXPERT OPINION: As utilized in IMI-APPROACH and validated in OAI-FNIH, a panel of biochemical markers may uncover disease subgroups and facilitate the enrichment of treatable molecular endotypes for recruitment in therapeutic clinical trials. Understanding the link between biochemical markers and patient-reported outcomes and treatable endotypes that may respond to given therapies will pave the way for new drug development in OA.
Subject(s)
Osteoarthritis , Humans , Osteoarthritis/diagnosis , Osteoarthritis/pathology , Biomarkers , Phenotype
ABSTRACT
MOTIVATION: The prediction of a protein's contact map has become, in recent years, a crucial stepping stone for the prediction of the complete 3D structure of a protein. In this article, we describe a methodology for this problem that was shown to be successful in CASP8 and CASP9. The methodology is based on (i) the fusion of the prediction of a variety of structural aspects of protein residues, (ii) an ensemble strategy used to facilitate the training process and (iii) a rule-based machine learning system from which we can extract human-readable explanations of the predictor and derive useful information about the contact map representation. RESULTS: The main part of the evaluation is the comparison against the sequence-based contact prediction methods from CASP9, where our method achieved the best rank in five out of the six evaluated metrics. We also assess the impact of the size of the ensemble used in our predictor to show the trade-off between performance and training time of our method. Finally, we also study the rule sets generated by our machine learning system. From this analysis, we are able to estimate the contribution of the attributes in our representation and how these interact to derive contact predictions. AVAILABILITY: http://icos.cs.nott.ac.uk/servers/psp.html. CONTACT: natalio.krasnogor@nottingham.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Artificial Intelligence , Computational Biology/methods , Proteins/chemistry , Algorithms , Caspase 8/chemistry , Caspase 9/chemistry , Databases, Protein , Humans , Protein Interaction Domains and Motifs
ABSTRACT
BACKGROUND: Osteoarthritis (OA) is an inflammatory disease of synovial joints involving the loss and degeneration of articular cartilage. The gold standard for evaluating cartilage loss in OA is the measurement of joint space width on standard radiographs. However, in most cases the diagnosis is made well after the onset of the disease, when the symptoms are well established. Identification of early biomarkers of OA can facilitate earlier diagnosis, improve disease monitoring and predict responses to therapeutic interventions. METHODS: This study describes the bioinformatic analysis of data generated from high-throughput proteomics for identification of potential biomarkers of OA. The mass spectrometry data were generated using a canine explant model of articular cartilage treated with the pro-inflammatory cytokine interleukin 1β (IL-1β). The bioinformatics analysis involved the application of machine learning and network analysis to the proteomic mass spectrometry data. A rule-based machine learning technique, BioHEL, was used to create a model that classified the samples into their relevant treatment groups by identifying those proteins that separated samples into their respective groups. The proteins identified were considered to be potential biomarkers. Protein networks were also generated; from these networks, proteins pivotal to the classification were identified. RESULTS: BioHEL correctly classified eighteen out of twenty-three samples, giving a classification accuracy of 78.3% for the dataset. The dataset included the four classes of control, IL-1β, carprofen, and IL-1β and carprofen together. This accuracy exceeded that of the other machine learners used for comparison on the same dataset, with the exception of another rule-based method, JRip, which performed equally well. The proteins most frequently used in rules generated by BioHEL included a number of relevant proteins, among them matrix metalloproteinase 3, interleukin 8 and matrix Gla protein. CONCLUSIONS: Using this protocol, combining an in vitro model of OA with bioinformatics analysis, a number of relevant extracellular matrix proteins were identified, thereby supporting the application of these bioinformatics tools for analysis of proteomic data from in vitro models of cartilage degradation.
Subject(s)
Cartilage, Articular/metabolism , Proteins/metabolism , Animals , Artificial Intelligence , Dogs , Interleukin-1beta , Male , Mass Spectrometry , Osteoarthritis/etiology , Proteome
ABSTRACT
One challenge in the engineering of biological systems is to be able to recognise the cellular stress states of bacterial hosts, as these stress states can lead to suboptimal growth and lower yields of target products. To enable the design of genetic circuits for reporting or mitigating the stress states, it is important to identify a relatively small set of gene biomarkers that can reliably indicate relevant cellular growth states in bacteria. Recent advances in high-throughput omics technologies have enhanced the identification of molecular biomarkers of specific states in bacteria, motivating computational methods that can identify robust biomarkers for experimental characterisation and verification. Focused on identifying gene expression biomarkers to sense various stress states in Bacillus subtilis, this study aimed to design a knowledge integration strategy for the selection of a robust biomarker panel that generalises to external datasets and experiments. We developed a recommendation system that ranks the candidate biomarker panels based on complementary information from a machine learning model, a gene regulatory network and a co-expression network. We identified a recommended biomarker panel showing high stress sensing power for a variety of conditions both in the dataset used for biomarker identification (mean F1-score of 0.99) and in a range of independent datasets (mean F1-score of 0.98). We discovered a significant correlation between stress sensing power and evaluation metrics such as the number of associated regulators in a B. subtilis gene regulatory network (GRN) and the number of associated modules in a B. subtilis co-expression network (CEN). GRNs and CENs provide information relevant to the diversity of biological processes encoded by biomarker genes. We demonstrate that quantitatively relating meaningful evaluation metrics with stress sensing power has the potential for recognising biomarkers that show better sensitivity and robustness to an extended set of stress conditions and enable a more reliable biomarker panel selection.
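The idea of ranking candidate panels from complementary evidence can be illustrated with a minimal sketch. The abstract does not state how the three information sources are combined, so the mean-rank aggregation below is an assumption made purely for illustration, and the panel names and values are hypothetical, not results from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Panel:
    """Hypothetical candidate biomarker panel with three evidence scores."""
    name: str
    mean_f1: float      # stress sensing power from a classifier
    n_regulators: int   # associated regulators in a GRN
    n_modules: int      # associated modules in a CEN


def descending_ranks(values):
    """Competition ranks: the highest value gets rank 1, ties share a rank."""
    return [1 + sum(other > v for other in values) for v in values]


def recommend(panels):
    """Order panel names by mean rank across the three metrics (assumed scheme)."""
    f1 = descending_ranks([p.mean_f1 for p in panels])
    reg = descending_ranks([p.n_regulators for p in panels])
    mod = descending_ranks([p.n_modules for p in panels])
    scored = [((a + b + c) / 3, p.name)
              for a, b, c, p in zip(f1, reg, mod, panels)]
    # Lower mean rank is better; ties fall back to alphabetical order.
    return [name for _, name in sorted(scored)]


# Illustrative values only.
panels = [
    Panel("panel_A", mean_f1=0.99, n_regulators=5, n_modules=4),
    Panel("panel_B", mean_f1=0.95, n_regulators=7, n_modules=2),
    Panel("panel_C", mean_f1=0.90, n_regulators=2, n_modules=1),
]
ranking = recommend(panels)
```

Rank aggregation is used here because it avoids choosing weights for metrics on different scales; a weighted score or another scheme could equally be substituted.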