ABSTRACT
In the enduring fight against disease, advances in medical technology have equipped clinicians with novel diagnostic platforms. Whilst a single test may sometimes provide a confident diagnosis, additional tests are often required. To strike a balance between diagnostic accuracy and cost-effectiveness, such clinical pathways must be constructed rigorously. Here, we developed a framework to build multi-platform precision pathways in an automated, unbiased way, recommending the key steps a clinician would take to reach a diagnosis. We achieve this by developing a confidence score that is used to simulate a clinical scenario in which, at each stage, either a confident diagnosis is made or another test is performed. Our framework provides a range of tools to interpret, visualize and compare the pathways, improving communication and enabling their evaluation on accuracy and cost in different contexts. This framework will guide the development of novel diagnostic pathways for different diseases, accelerating the implementation of precision medicine into clinical practice.
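The stop-or-continue logic of such a pathway can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' framework: the confidence score here is simply abs(p - 0.5) * 2 for a predicted disease probability p, and the test names, predictors, costs and the 0.8 threshold are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A hypothetical test: (name, predictor returning P(disease), cost per use).
Step = Tuple[str, Callable[[Dict[str, float]], float], float]


def confidence(prob: float) -> float:
    """Assumed confidence score: 0 at p = 0.5, 1 at p = 0 or p = 1."""
    return abs(prob - 0.5) * 2.0


def run_pathway(patient: Dict[str, float], steps: Sequence[Step],
                threshold: float = 0.8) -> Tuple[bool, bool, List[str], float]:
    """Apply tests in order, stopping at the first confident call.

    Returns (diagnosis, confident, tests_used, total_cost). If no test reaches
    the threshold, the last test's call is returned with confident=False.
    """
    if not steps:
        raise ValueError("pathway needs at least one test")
    used: List[str] = []
    cost = 0.0
    prob = 0.5
    for name, predictor, test_cost in steps:
        prob = predictor(patient)
        used.append(name)
        cost += test_cost
        if confidence(prob) >= threshold:
            return prob >= 0.5, True, used, cost
    return prob >= 0.5, False, used, cost


# Hypothetical two-step pathway: a cheap clinical test, then costlier imaging.
STEPS: List[Step] = [
    ("clinical", lambda p: 0.6 if p["score"] > 1 else 0.3, 10.0),
    ("imaging", lambda p: 0.95 if p["score"] > 2 else 0.05, 200.0),
]

if __name__ == "__main__":
    print(run_pathway({"score": 3.0}, STEPS))
```

For this toy patient the clinical test is not confident (score 0.2), so imaging is performed and the call returns (True, True, ['clinical', 'imaging'], 210.0); the accumulated cost is what allows pathways to be compared on cost as well as accuracy.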
Subjects
Communication , Precision Medicine , Mental Processes
ABSTRACT
Inferring gene regulatory networks (GRNs) is a fundamental challenge in biology that aims to unravel the complex relationships between genes and their regulators. Deciphering these networks plays a critical role in understanding the underlying regulatory crosstalk that drives many cellular processes and diseases. Recent advances in sequencing technology have led to the development of state-of-the-art GRN inference methods that exploit matched single-cell multi-omic data. By employing diverse mathematical and statistical methodologies, these methods aim to reconstruct more comprehensive and precise gene regulatory networks. In this review, we give a brief overview of the statistical and methodological foundations commonly used in GRN inference methods. We then compare and contrast the latest state-of-the-art GRN inference methods for matched single-cell multi-omics data, and discuss their assumptions, limitations and opportunities. Finally, we discuss the challenges and future directions that hold promise for further advancements in this rapidly developing field.
Subjects
Gene Regulatory Networks , Multiomics , Gene Regulatory Networks/genetics
ABSTRACT
Organ shortage is a major barrier in transplantation, and the rules governing organ allocation decisions should be robust, transparent, ethical and fair. Whilst numerous allocation strategies have been proposed, it is often unrealistic to evaluate all of them in real-life settings. Hence, the ability to conduct simulations prior to deployment is important. Here, we developed a kidney allocation simulation framework (simKAP) that aims to evaluate both the allocation process and the complex clinical decision-making process of organ acceptance in kidney transplantation. Our findings showed that incorporating both the clinical decision-making process and a dynamic wait-listing process resulted in the best agreement between the actual and simulated data in almost all scenarios. Additionally, we generated several hypothetical risk-based allocation strategies and found that they improved recipients' long-term post-transplant patient survival and reduced wait time for transplantation. The importance of simKAP lies in enabling policymakers in any transplant community to evaluate any proposed allocation algorithm using in silico simulation.
Subjects
Kidney Transplantation , Tissue and Organ Procurement , Transplants , Humans , Kidney , Decision Making , Tissue Donors , Resource Allocation
ABSTRACT
Accurate prediction of allograft survival after kidney transplantation allows early identification of recipients at risk of adverse outcomes and initiation of preventive interventions to optimize post-transplant care. Many prediction algorithms do not model cohort heterogeneity and may lead to inaccurate assessment of longer-term graft outcomes among minority groups. Using data from a national Australian kidney transplant cohort (2008-2017) as the derivation set, we developed P-Cube, a multi-step precision prediction pathway model for predicting overall graft survival in three ethnic subgroups: European Australians, Asian Australians, and Aboriginal and Torres Strait Islander Peoples. The concordance indices for the European Australian, Asian Australian, and Aboriginal and Torres Strait Islander Peoples subpopulations were 0.99 (0.98-0.99), 0.93 (0.92-0.94) and 0.92 (0.91-0.93), respectively. Similar findings were observed when validating P-Cube using an external dataset [Scientific Registry of Transplant Recipients (2006-2020)]. Six sub-categories of recipients with distinct risk factor profiles were identified. Some factors, such as blood group compatibility, were considered important across the entire transplant population. Other factors, such as human leukocyte antigen (HLA)-DR mismatches, were unique to older recipients. The P-Cube model identifies allograft survival-specific risk factors within a heterogeneous population and offers personalized survival predictions in a diverse cohort.
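The abstract reports concordance indices without defining them. Assuming the usual Harrell's C (the study may use a variant with different handling of ties or censoring), the statistic is the fraction of comparable recipient pairs whose predicted risk ordering agrees with the observed graft-failure ordering. A minimal pure-Python sketch, which for simplicity does not compare pairs with tied times:

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk_scores: Sequence[float]) -> float:
    """Harrell's concordance index (simplified: tied times are not compared).

    A pair (i, j) is comparable when subject i had an event (events[i] == 1)
    strictly before subject j's observed time. It is concordant when i also
    has the higher predicted risk; tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): follow-up times, event indicators, model risk scores.
print(harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.1, 0.4, 0.7]))  # 0.6
```

A value of 1.0 means every comparable pair is ranked correctly and 0.5 corresponds to random ranking, which is the scale on which the subgroup values above (0.92 to 0.99) should be read.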
Subjects
Kidney Transplantation , Humans , Kidney Transplantation/adverse effects , Transplant Recipients , Australia/epidemiology , Transplantation, Homologous , Allografts
ABSTRACT
MOTIVATION: Recent advances in multimodal single-cell omics technologies enable multiple modalities of molecular attributes, such as gene expression, chromatin accessibility, and protein abundance, to be profiled simultaneously at a global level in individual cells. While the increasing availability of multiple data modalities is expected to provide a more accurate clustering and characterization of cells, the development of computational methods that are capable of extracting information embedded across data modalities is still in its infancy. RESULTS: We propose SnapCCESS for clustering cells by integrating data modalities in multimodal single-cell omics data using an unsupervised ensemble deep learning framework. By creating snapshots of embeddings of multimodality using variational autoencoders, SnapCCESS can be coupled with various clustering algorithms for generating consensus clustering of cells. We applied SnapCCESS with several clustering algorithms to various datasets generated from popular multimodal single-cell omics technologies. Our results demonstrate that SnapCCESS is effective and more efficient than conventional ensemble deep learning-based clustering methods and outperforms other state-of-the-art multimodal embedding generation methods in integrating data modalities for clustering cells. The improved clustering of cells from SnapCCESS will pave the way for more accurate characterization of cell identity and types, an essential step for various downstream analyses of multimodal single-cell omics data. AVAILABILITY AND IMPLEMENTATION: SnapCCESS is implemented as a Python package and is freely available from https://github.com/PYangLab/SnapCCESS under the GPL-3 open-source license. The data used in this study are publicly available (see section 'Data availability').
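SnapCCESS combines clusterings obtained from several embedding snapshots into a consensus. The abstract does not specify how the consensus is formed, so the sketch below uses a generic co-association approach as a stand-in: count how often each pair of cells is co-clustered across snapshots, link cells whose co-clustering frequency exceeds a threshold (0.5 here, an assumption), and report connected components as consensus clusters. The snapshot labelings are hypothetical.

```python
from typing import List, Sequence


def co_association(labelings: Sequence[Sequence[int]]) -> List[List[float]]:
    """Fraction of labelings in which each pair of cells shares a cluster."""
    n = len(labelings[0])
    m = len(labelings)
    mat = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    mat[i][j] += 1.0 / m
    return mat


def consensus_labels(labelings: Sequence[Sequence[int]],
                     threshold: float = 0.5) -> List[int]:
    """Connected components of the graph linking cells co-clustered > threshold."""
    mat = co_association(labelings)
    n = len(mat)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first search over strongly co-clustered cells
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and mat[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three hypothetical snapshot clusterings of five cells (label IDs are arbitrary).
SNAPSHOTS = [[0, 0, 1, 1, 2], [1, 1, 0, 0, 0], [0, 0, 1, 1, 1]]
print(consensus_labels(SNAPSHOTS))  # [0, 0, 1, 1, 1]
```

Because only pairwise co-membership is used, the arbitrary label IDs of each snapshot do not need to be matched, which is the main appeal of co-association consensus.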
Subjects
Deep Learning , Algorithms , Cluster Analysis , Chromatin , Single-Cell Analysis
ABSTRACT
Potential benefits of precision medicine in cardiovascular disease (CVD) include more accurate phenotyping of individual patients with the same condition or presentation, using multiple clinical, imaging, molecular and other variables to guide diagnosis and treatment. An approach to realising this potential is the digital twin concept, whereby a virtual representation of a patient is constructed and receives real-time updates of a range of data variables in order to predict disease and optimise treatment selection for the real-life patient. We explored the term digital twin, its defining concepts, the challenges as an emerging field, and potentially important applications in CVD. A mapping review was undertaken using a systematic search of peer-reviewed literature. Industry-based participants and patent applications were identified through web-based sources. Searches of Compendex, EMBASE, Medline, ProQuest and Scopus databases yielded 88 papers related to cardiovascular conditions (28%, n = 25), non-cardiovascular conditions (41%, n = 36), and general aspects of the health digital twin (31%, n = 27). Fifteen companies with a commercial interest in health digital twin or simulation modelling had products focused on CVD. The patent search identified 18 applications from 11 applicants, of which 73% were companies and 27% were universities. Three applicants had cardiac-related inventions. For CVD, digital twin research within industry and academia is recent, interdisciplinary, and established globally. Overall, the applications were numerical simulation models, although precursor models exist for the real-time cyber-physical system characteristic of a true digital twin. Implementation challenges include ethical constraints and clinical barriers to the adoption of decision tools derived from artificial intelligence systems.
ABSTRACT
MOTIVATION: With the recent surge of large-cohort single-cell research, it is critically important that analytical methods can fully utilize the comprehensive characterization of cellular systems that single-cell technologies produce, in order to provide insights into samples from individuals. Currently, there is little consensus on the best ways to compress information from the complex data structures of these technologies into summary statistics that represent each sample (e.g. individuals). RESULTS: Here, we present scFeatures, an approach that creates interpretable cellular and molecular representations of single-cell and spatial data at the sample level. We demonstrate that summarizing a broad collection of features at the sample level is important both for understanding underlying disease mechanisms in different experimental studies and for accurately classifying the disease status of individuals. AVAILABILITY AND IMPLEMENTATION: scFeatures is publicly available as an R package at https://github.com/SydneyBioX/scFeatures. All data used in this study are publicly available, with accession IDs reported in Section 2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
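One of the simplest sample-level summaries of this kind is the proportion of each cell type within each sample. The standard-library sketch below illustrates only that idea; it is not the scFeatures API (which is an R package), and the sample IDs and cell type labels are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence


def cell_type_proportions(sample_ids: Sequence[str],
                          cell_types: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Collapse per-cell labels into one cell-type composition vector per sample."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sample, cell_type in zip(sample_ids, cell_types):
        counts[sample][cell_type] += 1
    all_types = sorted(set(cell_types))  # same feature columns for every sample
    table: Dict[str, Dict[str, float]] = {}
    for sample, counter in counts.items():
        total = sum(counter.values())
        # Counter returns 0 for absent types, so missing types get proportion 0.0
        table[sample] = {ct: counter[ct] / total for ct in all_types}
    return table


# Hypothetical per-cell annotations from two individuals.
print(cell_type_proportions(["p1", "p1", "p1", "p2"], ["B", "T", "T", "B"]))
```

Each sample then becomes a fixed-length feature vector, which is the form needed to classify the disease status of individuals rather than of cells.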
Subjects
Software , Humans
ABSTRACT
Single-cell RNA-seq (scRNA-seq) data simulation is critical for evaluating computational methods for analysing scRNA-seq data, especially when ground truth is experimentally unattainable. The reliability of such evaluation depends on the ability of simulation methods to capture the properties of experimental data. However, while many scRNA-seq data simulation methods have been proposed, a systematic evaluation of these methods is lacking. We develop a comprehensive evaluation framework, SimBench, including a kernel density estimation measure, to benchmark 12 simulation methods across 35 scRNA-seq experimental datasets. We evaluate the simulation methods on a panel of data properties, their ability to maintain biological signals, scalability and applicability. Our benchmark uncovers performance differences among the methods and highlights the varying difficulty of simulating different data characteristics. Furthermore, we identify several limitations, including difficulty in maintaining the heterogeneity of distributions. These results, together with the framework and datasets made publicly available as R packages, will guide the selection of simulation methods and their future development.
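As a simplified, hypothetical stand-in for a kernel density estimation measure (not the statistic used in SimBench), the sketch below fits univariate Gaussian KDEs to one data property, such as per-gene mean expression, from real and simulated data and approximates the integrated squared difference between the two densities with the trapezoid rule. The fixed bandwidth and grid size are assumptions.

```python
import math
from typing import Callable, Sequence


def gaussian_kde(data: Sequence[float], bandwidth: float) -> Callable[[float], float]:
    """Return a univariate Gaussian kernel density estimate as a function."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x: float) -> float:
        return norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)

    return density


def kde_discrepancy(real: Sequence[float], simulated: Sequence[float],
                    bandwidth: float = 0.5, grid_points: int = 512) -> float:
    """Approximate the integral of (f_real - f_sim)^2 on a padded grid."""
    lo = min(min(real), min(simulated)) - 4.0 * bandwidth
    hi = max(max(real), max(simulated)) + 4.0 * bandwidth
    f = gaussian_kde(real, bandwidth)
    g = gaussian_kde(simulated, bandwidth)
    step = (hi - lo) / (grid_points - 1)
    values = [(f(lo + k * step) - g(lo + k * step)) ** 2 for k in range(grid_points)]
    # trapezoid rule: interior points weighted 1, the two end points weighted 1/2
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


# Hypothetical property values: smaller discrepancy means a more faithful simulation.
REAL = [1.0, 2.0, 2.5, 3.0]
print(kde_discrepancy(REAL, [x + 0.5 for x in REAL]))
```

A density-based score like this compares whole distributions rather than single summary statistics, which is why it can expose failures such as a simulator that matches the mean of a property but not its spread or shape.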
Subjects
Benchmarking/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Computer Simulation , Data Analysis , Models, Statistical , Reproducibility of Results , Research Design , Spatial Analysis
ABSTRACT
Liquid chromatography-mass spectrometry-based metabolomics studies are increasingly applied to large population cohorts, whose data acquisition can run for several weeks or even years. This inevitably introduces unwanted intra- and inter-batch variation over time that can overshadow true biological signals and thus hinder potential biological discoveries. To date, normalisation approaches have struggled to mitigate the variability introduced by technical factors whilst preserving biological variance, especially for protracted acquisitions. Here, we propose a study design framework with an arrangement for embedding biological sample replicates to quantify variance within and between batches, and a workflow that uses these replicates to remove unwanted variation in a hierarchical manner (hRUV). We use this design to produce a dataset of more than 1000 human plasma samples run over an extended period of time. We demonstrate significant improvement of hRUV over existing methods in preserving biological signals whilst removing unwanted variation for large-scale metabolomics studies. Our tools not only provide a strategy for large-scale data normalisation, but also provide guidance on the design strategy for large omics studies.
Subjects
Metabolomics/methods , Chromatography, Liquid , Humans , Mass Spectrometry/methods , Models, Biological , Workflow
ABSTRACT
BACKGROUND: There is increasing awareness that perinatal psychosocial adversity experienced by mothers, children and their families may influence health and well-being across the life course. To maximise the impact of population-based interventions for optimising perinatal wellbeing, health services can use empirical methods to identify subgroups at highest risk of poor outcomes relative to the overall population. METHODS: This study sought to identify subgroups using latent class analysis within a population of mothers in Sydney, Australia, based on their differing experience of self-reported indicators of psychosocial adversity. Subgroup differences in antenatal and postnatal depressive symptoms were assessed using the Edinburgh Postnatal Depression Scale. RESULTS: Latent class analysis identified four distinct subgroups within the cohort, which were distinguished empirically on the basis of native language, current smoking status, previous involvement with Family and Community Services (FaCS), history of child abuse, presence of a supportive partner, and history of intimate partner psychological violence. One group consisted of socially supported 'local' women who speak English as their primary language (Group L), another of socially supported 'migrant' women who speak a language other than English as their primary language (Group M), a third of socially stressed 'local' women who speak English as their primary language (Group Ls), and a fourth of socially stressed 'migrant' women who speak a language other than English as their primary language (Group Ms).
Compared with local, not socially stressed residents (Group L), the odds of antenatal depression were nearly three times higher in the socially stressed local group (Ls OR: 2.87, 95% CI 2.10-3.94) and nearly nine times higher in the Ms group (Ms OR: 8.78, 95% CI 5.13-15.03). Antenatal symptoms of depression were also higher in the not socially stressed migrant group (M OR: 1.70, 95% CI 1.47-1.97) compared with non-migrants. In the postnatal period, Group M was 1.5 times more likely, and Group Ms over five times more likely, to experience suboptimal mental health compared with Group L (OR 1.50, 95% CI 1.22-1.84; and OR 5.28, 95% CI 2.63-10.63, for M and Ms, respectively). CONCLUSIONS: The application of empirical subgrouping analysis permits an informed approach to targeted interventions and resource allocation for optimising perinatal maternal wellbeing.
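The odds ratios above are reported with 95% confidence intervals. The abstract does not state whether they are adjusted, so as a minimal illustration only, an unadjusted odds ratio and its Woolf (log-scale Wald) interval can be computed from a 2x2 table as OR = ad/(bc), with limits exp(ln OR ± 1.96 * sqrt(1/a + 1/b + 1/c + 1/d)). The counts below are hypothetical and are not taken from the study.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio with a Woolf (log-scale Wald) confidence interval.

    a: exposed with outcome, b: exposed without outcome,
    c: reference group with outcome, d: reference group without outcome.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell count; apply a continuity correction first")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Hypothetical counts: 30/100 with symptoms in a subgroup vs 10/100 in the reference.
print(odds_ratio_ci(30, 70, 10, 90))  # roughly (3.86, 1.77, 8.42)
```

Because the interval is symmetric on the log scale, it is asymmetric on the OR scale, which matches the pattern of the reported intervals (e.g. 8.78 with limits 5.13 to 15.03).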
Subjects
Depression, Postpartum/prevention & control , Mass Screening/organization & administration , Maternal Health/statistics & numerical data , Mental Health/statistics & numerical data , Adult , Australia/epidemiology , Depression, Postpartum/diagnosis , Depression, Postpartum/epidemiology , Depression, Postpartum/psychology , Electronic Health Records/statistics & numerical data , Female , Health Care Rationing , Humans , Infant, Newborn , Latent Class Analysis , Mass Screening/methods , Perinatal Care/methods , Perinatal Care/organization & administration , Pregnancy , Psychiatric Status Rating Scales/statistics & numerical data , Retrospective Studies , Risk Assessment/methods , Self Report/statistics & numerical data , Social Determinants of Health/statistics & numerical data , Young Adult
ABSTRACT
The molecular mechanisms underlying development of the pentameral body of adult echinoderms are poorly understood, but resolving them is important for understanding the evolution of a unique body plan that contrasts with the bilateral body plan of other deuterostomes. As Nodal and BMP2/4 signalling are involved in larval axis formation and in development of the echinoderm body plan, we used the developmental transcriptome generated for the asterinid sea star Parvulastra exigua to investigate the temporal expression patterns of Nodal and BMP2/4 genes from the embryo and across metamorphosis to the juvenile. Among echinoderms, the Asteroidea represents the basal-type body architecture with a distinct (separated) ray structure. Parvulastra exigua has lecithotrophic development, forming the juvenile soon after gastrulation and thus providing ready access to the developing adult stage. We identified 39 genes associated with the Nodal and BMP2/4 network in the P. exigua developmental transcriptome. Clustering analysis of these genes resulted in 6 clusters with similar temporal expression patterns across development. A co-expression analysis revealed genes with expression profiles similar to those of Nodal and BMP2/4. These results point to genes that may have a regulatory relationship in patterning morphogenesis of the juvenile sea star. Developmental RNA-seq analyses of Parvulastra exigua show changes in Nodal and BMP2/4 signalling genes across the metamorphic transition. We provide the foundation for detailed analyses of this cascade in the evolution of the unusual pentameral echinoderm body and its deuterostome affinities.
Subjects
Starfish , Transcriptome , Animals , Echinodermata/genetics , Gene Expression Profiling , Gene Expression Regulation, Developmental
ABSTRACT
Kidney transplant recipients and transplant physicians face important clinical questions where machine learning methods may help improve the decision-making process. This mini-review explores potential applications of machine learning methods to key stages of a kidney transplant recipient's journey, from initial waitlisting and donor selection, to personalization of immunosuppression and prediction of post-transplantation events. Both unsupervised and supervised machine learning methods are presented, including k-means clustering, principal components analysis, k-nearest neighbors, and random forests. The various challenges of these approaches are also discussed.
Subjects
Kidney Transplantation , Machine Learning , Humans , Kidney Transplantation/adverse effects , Transplant Recipients
ABSTRACT
Single-cell genomics has transformed our ability to examine cell fate choice. Examining cells along a computationally ordered 'pseudotime' offers the potential to unpick subtle changes in variability and covariation among key genes. We describe an approach, scHOT (single-cell higher-order testing), which provides a flexible and statistically robust framework for identifying changes in higher-order interactions among genes. scHOT can be applied to cells along a continuous trajectory or across space, and accommodates various higher-order measurements, including variability or correlation. We demonstrate the use of scHOT by studying coordinated changes in higher-order interactions during embryonic development of the mouse liver. Additionally, scHOT identifies subtle changes in gene-gene correlations across space using spatially resolved transcriptomics data from the mouse olfactory bulb. scHOT meaningfully adds to first-order differential expression testing and provides a framework for interrogating higher-order interactions using single-cell data.
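The kind of higher-order quantity scHOT examines, such as a gene-gene correlation that changes along pseudotime, can be illustrated with a Gaussian-weighted Pearson correlation computed in a window centred at different pseudotime positions. This is a simplified sketch of the idea only, not scHOT's weighting schemes or its statistical test; the kernel width and the toy gene profiles are assumptions.

```python
import math
from typing import List, Sequence


def weighted_pearson(x: Sequence[float], y: Sequence[float],
                     w: Sequence[float]) -> float:
    """Pearson correlation in which each cell contributes with weight w."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(vx * vy)


def local_correlations(pseudotime: Sequence[float], x: Sequence[float],
                       y: Sequence[float], centres: Sequence[float],
                       width: float) -> List[float]:
    """Weighted correlation of two genes at each pseudotime centre."""
    out = []
    for c in centres:
        # Gaussian kernel: cells near the centre dominate the estimate
        w = [math.exp(-0.5 * ((t - c) / width) ** 2) for t in pseudotime]
        out.append(weighted_pearson(x, y, w))
    return out


# Toy genes that co-vary positively early in pseudotime and negatively late.
T = [float(t) for t in range(10)]
GENE_A = T[:]
GENE_B = [t if t < 5 else -t for t in T]
print(local_correlations(T, GENE_A, GENE_B, [0.0, 9.0], 1.0))
```

Here the correlation is close to +1 near the start of pseudotime and close to -1 near the end, although neither gene's mean-based differential expression alone would summarise that switch in co-regulation.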
Subjects
Liver/embryology , Single-Cell Analysis/methods , Animals , Computational Biology , Databases, Nucleic Acid , Hepatocytes/physiology , Liver/cytology , Mice , Oligonucleotide Array Sequence Analysis , Sequence Analysis, RNA , Software
ABSTRACT
Automated cell type identification is a key computational challenge in single-cell RNA-sequencing (scRNA-seq) data. To capitalise on the large collection of well-annotated scRNA-seq datasets, we developed scClassify, a multiscale classification framework based on ensemble learning and cell type hierarchies constructed from single or multiple annotated datasets as references. scClassify enables the estimation of sample size required for accurate classification of cell types in a cell type hierarchy and allows joint classification of cells when multiple references are available. We show that scClassify consistently performs better than other supervised cell type classification methods across 114 pairs of reference and testing data, representing a diverse combination of sizes, technologies and levels of complexity, and further demonstrate the unique components of scClassify through simulations and compendia of experimental datasets. Finally, we demonstrate the scalability of scClassify on large single-cell atlases and highlight a novel application of identifying subpopulations of cells from the Tabula Muris data that were unidentified in the original publication. Together, scClassify represents state-of-the-art methodology in automated cell type identification from scRNA-seq data.
Subjects
Cells/metabolism , Animals , Cluster Analysis , Databases as Topic , Humans , Leukocytes, Mononuclear/metabolism , Machine Learning , Mice , Pancreas/metabolism , Sample Size , Software
ABSTRACT
MOTIVATION: Multi-modal profiling of single cells represents one of the latest technological advancements in molecular biology. Among various single-cell multi-modal strategies, cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) allows simultaneous quantification of two distinct molecular species: RNA and cell-surface proteins. Here, we introduce CiteFuse, a streamlined package consisting of a suite of tools for doublet detection, modality integration, clustering, differential RNA and protein expression analysis, antibody-derived tag evaluation, ligand-receptor interaction analysis and interactive web-based visualization of CITE-seq data. RESULTS: Using both simulations and real-world CITE-seq data, we demonstrate the capacity of CiteFuse to integrate the two data modalities and its relative advantage over single-modality profiling. Furthermore, we illustrate a novel doublet detection method based on a combined index of cell hashing and transcriptome data. Finally, we demonstrate the use of CiteFuse for predicting ligand-receptor interactions from multi-modal CITE-seq data. Collectively, we demonstrate the utility and effectiveness of CiteFuse for the integrative analysis of transcriptome and epitope profiles from CITE-seq data. AVAILABILITY AND IMPLEMENTATION: CiteFuse is freely available at http://shiny.maths.usyd.edu.au/CiteFuse/ as an online web service and at https://github.com/SydneyBioX/CiteFuse/ as an R package. CONTACT: pengyi.yang@sydney.edu.au. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Software , Transcriptome , Epitopes , Gene Expression Profiling , RNA , Sequence Analysis, RNA , Single-Cell Analysis
ABSTRACT
The Echinodermata is characterized by a secondarily evolved pentameral body plan. While the evolutionary origin of this body plan has been the subject of debate, the molecular mechanisms underlying its development are poorly understood. We assembled a de novo developmental transcriptome from the embryo through metamorphosis in the sea star Parvulastra exigua. We use the asteroid model as it represents the basal-type echinoderm body architecture. Global variation in gene expression distinguished the gastrula profile and showed that metamorphic and juvenile stages were more similar to each other than to the pre-metamorphic stages, pointing to the marked changes that occur during metamorphosis. Differential expression and gene ontology (GO) analyses revealed dynamic changes in gene expression throughout development and the transition to pentamery. Many GO terms enriched during late metamorphosis were related to neurogenesis and signalling. Neural transcription factor genes exhibited clusters with distinct expression patterns. A suite of these genes was up-regulated during metamorphosis (e.g. Pax6, Eya, Hey, NeuroD, FoxD, Mbx, and Otp). In situ hybridization showed expression of neural genes in the CNS and sensory structures. Our results provide a foundation to understand the metamorphic transition in echinoderms and the genes involved in development and evolution of pentamery.
Subjects
Neurogenesis/genetics , Starfish/growth & development , Transcription Factors/metabolism , Animals , Evolution, Molecular , Gene Expression Profiling , Gene Expression Regulation, Developmental , Starfish/genetics
ABSTRACT
BACKGROUND: Single-cell RNA-sequencing (scRNA-seq) is a fast-emerging technology allowing global transcriptome profiling at the single-cell level. Cell type identification from scRNA-seq data is a critical task in a variety of research areas such as developmental biology, cell reprogramming and cancer. Typically, cell type identification relies on human inspection using a combination of prior biological knowledge (e.g. marker genes and morphology) and computational techniques (e.g. PCA and clustering). Due to the incompleteness of our current knowledge and the subjectivity involved in this process, a small number of cells may be mislabelled. RESULTS: Here, we propose a semi-supervised learning framework, named scReClassify, for 'post hoc' cell type identification from scRNA-seq datasets. Starting from an initial cell type annotation with potentially mislabelled cells, scReClassify first performs dimension reduction using PCA and then applies a semi-supervised learning method to learn from the data and reclassify cells that are likely to have been mislabelled initially to their most probable cell types. Using both simulated and real-world experimental datasets that profiled various tissues and biological systems, we demonstrate that scReClassify is able to accurately identify misclassified cells and reclassify them to their correct cell types. CONCLUSIONS: scReClassify can be used for scRNA-seq data as a post hoc cell type classification tool to fine-tune cell type annotations generated by any cell type classification procedure. It is implemented as an R package and is freely available from https://github.com/SydneyBioX/scReClassify.
Subjects
RNA-Seq/methods , Animals , Humans , Machine Learning , Mice , Single-Cell Analysis/methods , Software
ABSTRACT
The increasing role played by liquid chromatography-mass spectrometry (LC-MS)-based proteomics in biological discovery has led to a growing need for quality control (QC) of LC-MS systems. While numerous quality control tools have been developed to track the performance of LC-MS systems based on a pre-defined set of performance factors (e.g., mass error, retention time), the precise influence and contribution of these performance factors, and their generalizability to different biological samples, are less well characterized. Here, a web-based application (QCMAP) is developed for interactive diagnosis and prediction of the performance of LC-MS systems across different biological sample types. Leveraging a standardized HeLa cell sample run as QC within a multi-user facility, predictive models are trained on a panel of commonly used performance factors to pinpoint the precise conditions leading to (un)satisfactory performance in three LC-MS systems. It is demonstrated that the learned model can be applied to predict LC-MS system performance for brain samples generated from an independent study. By compiling these predictive models into our web application, QCMAP allows users to benchmark the performance of their LC-MS systems using their own samples and identify key factors for instrument optimization. QCMAP is freely available from: http://shiny.maths.usyd.edu.au/QCMAP/.
Subjects
Chromatography, Liquid/methods , Proteomics/methods , Quality Control , Tandem Mass Spectrometry/methods , Cell Line, Tumor , HeLa Cells , Humans , Internet
ABSTRACT
Concerted examination of multiple collections of single-cell RNA sequencing (RNA-seq) data promises further biological insights that cannot be uncovered with individual datasets. Here we present scMerge, an algorithm that integrates multiple single-cell RNA-seq datasets using factor analysis of stably expressed genes and pseudoreplicates across datasets. Using a large collection of public datasets, we benchmark scMerge against published methods and demonstrate that it consistently provides improved cell type separation by removing unwanted factors; scMerge can also enhance biological discovery through robust data integration, which we show through the inference of a developmental trajectory in a liver dataset collection.