Results 1 - 20 of 165
1.
Genome Res ; 34(1): 119-133, 2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38190633

ABSTRACT

Single-cell technologies offer unprecedented opportunities to dissect gene regulatory mechanisms in context-specific ways. Although there are computational methods for extracting gene regulatory relationships from scRNA-seq and scATAC-seq data, the data integration problem, essential for accurate cell type identification, has been mostly treated as a standalone challenge. Here we present scTIE, a unified method that integrates temporal multimodal data and infers regulatory relationships predictive of cellular state changes. scTIE uses an autoencoder to embed cells from all time points into a common space by using iterative optimal transport, followed by extracting interpretable information to predict cell trajectories. Using a variety of synthetic and real temporal multimodal data sets, we show scTIE achieves effective data integration while preserving more biological signals than existing methods, particularly in the presence of batch effects and noise. Furthermore, on the exemplar multiome data set we generated from differentiating mouse embryonic stem cells over time, we show scTIE captures regulatory elements highly predictive of cell transition probabilities, providing new potentials to understand the regulatory landscape driving developmental processes.


Subjects
Gene Expression Profiling; Single-Cell Analysis; Animals; Mice; Gene Expression Profiling/methods; Single-Cell Analysis/methods; Gene Expression Regulation
2.
Brief Bioinform ; 24(3), 2023 05 19.
Article in English | MEDLINE | ID: mdl-37096588

ABSTRACT

Advances in single-cell transcriptomic technologies have led to increasing use of single-cell RNA sequencing (scRNA-seq) data in large-scale patient cohort studies. The resulting high-dimensional data can be summarized and incorporated into patient outcome prediction models in several ways; however, there is a pressing need to understand the impact of analytical decisions on the quality of such models. In this study, we evaluate the impact of analytical choices, namely model choice, ensemble learning strategy and integration approach, on patient outcome prediction using five scRNA-seq COVID-19 datasets. First, we examine the difference in performance between a single-view feature space and a multi-view feature space. Next, we survey multiple learning platforms, from classical machine learning to modern deep learning methods. Lastly, we compare different integration approaches when combining datasets is necessary. Through benchmarking such analytical combinations, our study highlights the power of ensemble learning, consistency among different learning methods and robustness to dataset normalization when using multiple datasets as the model input.


Subjects
Benchmarking; COVID-19; Humans; Gene Expression Profiling; Machine Learning; Sequence Analysis, RNA/methods
3.
Brief Bioinform ; 24(2), 2023 03 19.
Article in English | MEDLINE | ID: mdl-36813563

ABSTRACT

Cell-state transition can reveal additional information from single-cell ribonucleic acid (RNA)-sequencing data in time-resolved biological phenomena. However, most of the current methods are based on the time derivative of the gene expression state, which restricts them to the short-term evolution of cell states. Here, we present single-cell State Transition Across-samples of RNA-seq data (scSTAR), which overcomes this limitation by constructing a paired-cell projection between biological conditions with an arbitrary time span by maximizing the covariance between two feature spaces using partial least square and minimum squared error methods. In mouse ageing data, the response to stress in CD4+ memory T cell subtypes was found to be associated with ageing. A novel Treg subtype characterized by mTORC activation was identified to be associated with antitumour immune suppression, which was confirmed by immunofluorescence microscopy and survival analysis in 11 cancers from The Cancer Genome Atlas Program. On melanoma data, scSTAR improved immunotherapy-response prediction accuracy from 0.8 to 0.96.


Subjects
Gene Expression Profiling; RNA; Animals; Mice; RNA/genetics; Gene Expression Profiling/methods; Sequence Analysis, RNA/methods; Genome
4.
Bioinformatics ; 39(6), 2023 06 01.
Article in English | MEDLINE | ID: mdl-37314966

ABSTRACT

MOTIVATION: Recent advances in multimodal single-cell omics technologies enable multiple modalities of molecular attributes, such as gene expression, chromatin accessibility, and protein abundance, to be profiled simultaneously at a global level in individual cells. While the increasing availability of multiple data modalities is expected to provide a more accurate clustering and characterization of cells, the development of computational methods that are capable of extracting information embedded across data modalities is still in its infancy. RESULTS: We propose SnapCCESS for clustering cells by integrating data modalities in multimodal single-cell omics data using an unsupervised ensemble deep learning framework. By creating snapshots of embeddings of multimodality using variational autoencoders, SnapCCESS can be coupled with various clustering algorithms for generating consensus clustering of cells. We applied SnapCCESS with several clustering algorithms to various datasets generated from popular multimodal single-cell omics technologies. Our results demonstrate that SnapCCESS is effective and more efficient than conventional ensemble deep learning-based clustering methods and outperforms other state-of-the-art multimodal embedding generation methods in integrating data modalities for clustering cells. The improved clustering of cells from SnapCCESS will pave the way for more accurate characterization of cell identity and types, an essential step for various downstream analyses of multimodal single-cell omics data. AVAILABILITY AND IMPLEMENTATION: SnapCCESS is implemented as a Python package and is freely available from https://github.com/PYangLab/SnapCCESS under the open-source license of GPL-3. The data used in this study are publicly available (see section 'Data availability').


Subjects
Deep Learning; Algorithms; Cluster Analysis; Chromatin; Single-Cell Analysis
5.
PLoS Biol ; 19(10): e3001419, 2021 10.
Article in English | MEDLINE | ID: mdl-34618807

ABSTRACT

Evolving in sync with the computation revolution over the past 30 years, computational biology has emerged as a mature scientific field. While the field has made major contributions toward improving scientific knowledge and human health, individual computational biology practitioners at various institutions often languish in career development. As optimistic biologists passionate about the future of our field, we propose solutions for both eager and reluctant individual scientists, institutions, publishers, funding agencies, and educators to fully embrace computational biology. We believe that in order to pave the way for the next generation of discoveries, we need to improve recognition for computational biologists and better align pathways of career success with pathways of scientific progress. With 10 outlined steps, we call on all adjacent fields to move away from the traditional individual, single-discipline investigator research model and embrace multidisciplinary, data-driven, team science.


Subjects
Computational Biology; Budgets; Cooperative Behavior; Humans; Interdisciplinary Research; Mentoring; Motivation; Publications; Reward; Software
6.
Nat Methods ; 17(8): 799-806, 2020 08.
Article in English | MEDLINE | ID: mdl-32661426

ABSTRACT

Single-cell genomics has transformed our ability to examine cell fate choice. Examining cells along a computationally ordered 'pseudotime' offers the potential to unpick subtle changes in variability and covariation among key genes. We describe an approach, scHOT (single-cell higher-order testing), which provides a flexible and statistically robust framework for identifying changes in higher-order interactions among genes. scHOT can be applied to cells along a continuous trajectory or across space and accommodates various higher-order measurements, including variability or correlation. We demonstrate the use of scHOT by studying coordinated changes in higher-order interactions during embryonic development of the mouse liver. Additionally, scHOT identifies subtle changes in gene-gene correlations across space using spatially resolved transcriptomics data from the mouse olfactory bulb. scHOT meaningfully adds to first-order differential expression testing and provides a framework for interrogating higher-order interactions using single-cell data.
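
To make the idea of a higher-order statistic that changes along a trajectory concrete, here is a minimal Python sketch. It is not the scHOT implementation: it assumes the cells are already ordered by pseudotime, and the triangular weighting kernel, the weighted Pearson correlation, the function names and the toy data are all illustrative assumptions.

```python
import numpy as np

def triangular_weights(n_cells, center, span):
    """Hypothetical kernel: weight decays linearly to zero `span` cells from `center`."""
    dist = np.abs(np.arange(n_cells) - center)
    return np.clip(1.0 - dist / span, 0.0, None)

def weighted_pearson(x, y, w):
    """Pearson correlation of x and y under non-negative weights w."""
    w = w / w.sum()
    mx, my = np.sum(w * x), np.sum(w * y)
    cov = np.sum(w * (x - mx) * (y - my))
    var_x = np.sum(w * (x - mx) ** 2)
    var_y = np.sum(w * (y - my) ** 2)
    return cov / np.sqrt(var_x * var_y)

def local_correlation(x, y, span):
    """Gene-gene correlation evaluated around every cell in the ordering."""
    n = len(x)
    return np.array([
        weighted_pearson(x, y, triangular_weights(n, c, span))
        for c in range(n)
    ])

# Toy data (made up): 40 cells ordered by pseudotime. The two genes are
# perfectly co-expressed in the first half and anti-correlated in the second.
n = 40
gene_a = np.tile([0.0, 1.0, 3.0, 2.0], 10)
gene_b = np.where(np.arange(n) < 20, gene_a, -gene_a)

local = local_correlation(gene_a, gene_b, span=10)
print(local[0], local[-1])  # correlation near the start and end of the ordering
```

A first-order test on the mean of either gene would not describe this change in sign of the correlation, which is the kind of signal a higher-order test is meant to pick up.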


Subjects
Liver/embryology; Single-Cell Analysis/methods; Animals; Computational Biology; Databases, Nucleic Acid; Hepatocytes/physiology; Liver/cytology; Mice; Oligonucleotide Array Sequence Analysis; Sequence Analysis, RNA; Software
7.
Mod Pathol ; 36(8): 100190, 2023 08.
Article in English | MEDLINE | ID: mdl-37080394

ABSTRACT

Squamous cell carcinoma is the most common head and neck malignancy arising from the oral mucosa and the skin. The histologic and immunohistochemical features of oral squamous cell carcinoma (OSCC) and head and neck cutaneous squamous cell carcinoma (HNcSCC) are similar, making it difficult to identify the primary site in cases of metastases. With the advent of immunotherapy, reliable distinction of OSCC and HNcSCC at metastatic sites has important treatment and prognostic implications. Here, we investigate and compare the genomic landscape of OSCC and HNcSCC to identify diagnostically useful biomarkers. Whole-genome sequencing data from 57 OSCC and 41 HNcSCC patients were obtained for tumor and matched normal samples. Tumor mutation burden (TMB), Catalogue of Somatic Mutations in Cancer (COSMIC) mutational signatures, frequent chromosomal alterations, somatic single nucleotide, and copy number variations were analyzed. The median TMB of 3.75 in primary OSCC was significantly lower (P < .001) than that of 147.51 mutations/Mb in primary HNcSCC. The COSMIC mutation signatures were significantly different (P < .001) between OSCC and HNcSCC. OSCC showed COSMIC single-base substitution (SBS) mutation signature 1 and AID/APOBEC activity-associated signature 2 and/or 13. All except 1 HNcSCC from hair-bearing scalp showed UV damage-associated COSMIC SBS mutation signature 7. Both OSCC and HNcSCC demonstrated a predominance of tumor suppressor gene mutations, predominantly TP53. The most frequently mutated oncogenes were PIK3CA and MUC4 in OSCC and HNcSCC, respectively. The metastases of OSCC and HNcSCC demonstrated TMB and COSMIC SBS mutation signatures similar to their primary counterparts. The combination of high TMB and UV signature in a metastatic keratinizing squamous cell carcinoma suggests HNcSCC as the primary site and may also facilitate decisions regarding immunotherapy. 
HNcSCC and OSCC show distinct genomic profiles despite histologic and immunohistochemical similarities. Their genomic characteristics may underlie differences in behavior and guide treatment decisions in recurrent and metastatic settings.


Subjects
Carcinoma, Squamous Cell; Head and Neck Neoplasms; Mouth Neoplasms; Skin Neoplasms; Humans; Carcinoma, Squamous Cell/genetics; Carcinoma, Squamous Cell/pathology; Squamous Cell Carcinoma of Head and Neck/genetics; DNA Copy Number Variations; Mouth Neoplasms/pathology; Skin Neoplasms/genetics; Skin Neoplasms/pathology; Head and Neck Neoplasms/genetics; Mutation; Genomics; Biomarkers, Tumor/genetics
8.
Bioinformatics ; 38(20): 4745-4753, 2022 10 14.
Article in English | MEDLINE | ID: mdl-36040148

ABSTRACT

MOTIVATION: With the recent surge of large-cohort scale single cell research, it is of critical importance that analytical methods can fully utilize the comprehensive characterization of cellular systems that single cell technologies produce to provide insights into samples from individuals. Currently, there is little consensus on the best ways to compress information from the complex data structures of these technologies to summary statistics that represent each sample (e.g. individuals). RESULTS: Here, we present scFeatures, an approach that creates interpretable cellular and molecular representations of single-cell and spatial data at the sample level. We demonstrate that summarizing a broad collection of features at the sample level is both important for understanding underlying disease mechanisms in different experimental studies and for accurately classifying disease status of individuals. AVAILABILITY AND IMPLEMENTATION: scFeatures is publicly available as an R package at https://github.com/SydneyBioX/scFeatures. All data used in this study are publicly available with accession ID reported in the Section 2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Software; Humans
9.
PLoS Comput Biol ; 18(10): e1010495, 2022 10.
Article in English | MEDLINE | ID: mdl-36197936

ABSTRACT

COVID-19 patients display a wide range of disease severity, ranging from asymptomatic to critical symptoms with high mortality risk. Our ability to understand the interaction of SARS-CoV-2 infected cells within the lung, and of protective or dysfunctional immune responses to the virus, is critical to effectively treat these patients. Currently, our understanding of cell-cell interactions across different disease states, and how such interactions may drive pathogenic outcomes, is incomplete. Here, we developed a generalizable and scalable workflow for identifying cells that are differentially interacting across COVID-19 patients with distinct disease outcomes and use this to examine eight public single-cell RNA-seq datasets (six from peripheral blood mononuclear cells, one from bronchoalveolar lavage and one from nasopharyngeal), with a total of 211 individual samples. By characterizing the cell-cell interaction patterns across epithelial and immune cells in lung tissues for patients with varying disease severity, we illustrate diverse communication patterns across individuals, and discover heterogeneous communication patterns among moderate and severe patients. We further illustrate patterns derived from cell-cell interactions are potential signatures for discriminating between moderate and severe patients. Overall, this workflow can be generalized and scaled to combine multiple scRNA-seq datasets to uncover cell-cell interactions.


Subjects
COVID-19; Cell Communication; Humans; Leukocytes, Mononuclear; SARS-CoV-2; Workflow
10.
Arterioscler Thromb Vasc Biol ; 42(3): 352-361, 2022 03.
Article in English | MEDLINE | ID: mdl-35045730

ABSTRACT

BACKGROUND: Treating known risk factors for coronary artery disease (CAD) has substantially reduced CAD morbidity and mortality. However, a significant burden of CAD remains unexplained. Immunoglobulin E sensitization to mammalian oligosaccharide galactose-α-1,3-galactose (α-Gal) was recently associated with CAD in a small observational study. We sought to confirm that α-Gal sensitization is associated with CAD burden, in particular noncalcified plaque. Additionally, we sought to assess whether α-Gal sensitization is associated with ST-segment-elevated myocardial infarction (STEMI). METHODS: We performed a cross-sectional analysis of participants enrolled in the BioHEART cohort study. We measured α-Gal specific-immunoglobulin E antibodies in serum of 1056 patients referred for CT coronary angiography for suspected CAD and 100 selected patients presenting with STEMI, enriched for patients without standard modifiable risk factors. CT coronary angiograms were assessed using coronary artery calcium scores and segmental plaque scores. RESULTS: α-Gal sensitization was associated with presence of noncalcified plaque (odds ratio, 1.62 [95% CI, 1.04-2.53], P=0.03) and obstructive CAD (odds ratio, 2.05 [95% CI, 1.29-3.25], P=0.002), independent of age, sex, and traditional risk factors. The α-Gal sensitization rate was 12.8-fold higher in patients with STEMI compared with matched healthy controls and 2.2-fold higher in the patients with STEMI compared with matched stable CAD patients (17% versus 1.3%, P=0.01 and 20% versus 9%, P=0.03, respectively). CONCLUSIONS: α-Gal sensitization is independently associated with noncalcified plaque burden and obstructive CAD and occurs at higher frequency in patients with STEMI than those with stable or no CAD. These findings may have implications for individuals exposed to ticks, as well as public health policy. REGISTRATION: URL: https://www.anzctr.org.au; Unique identifier: ACTRN12618001322224.


Subjects
Coronary Artery Disease/etiology; Coronary Artery Disease/immunology; Food Hypersensitivity/complications; Plaque, Atherosclerotic/etiology; Plaque, Atherosclerotic/immunology; ST Elevation Myocardial Infarction/etiology; ST Elevation Myocardial Infarction/immunology; Aged; Animals; Cohort Studies; Computed Tomography Angiography; Coronary Angiography; Coronary Artery Disease/diagnostic imaging; Cross-Sectional Studies; Disaccharides/immunology; Female; Food Hypersensitivity/immunology; Humans; Immunoglobulin E/blood; Immunoglobulin E/immunology; Male; Middle Aged; Plaque, Atherosclerotic/diagnostic imaging; Prospective Studies; Risk Factors; Severity of Illness Index; Vascular Calcification/diagnostic imaging
11.
Transpl Int ; 36: 11338, 2023.
Article in English | MEDLINE | ID: mdl-37767525

ABSTRACT

Accurate prediction of allograft survival after kidney transplantation allows early identification of recipients at risk of adverse outcomes and initiation of preventive interventions to optimize post-transplant care. Many prediction algorithms do not model cohort heterogeneity and may lead to inaccurate assessment of longer-term graft outcomes among minority groups. Using data from a national Australian kidney transplant cohort (2008-2017) as the derivation set, we developed P-Cube, a multi-step precision prediction pathway model for predicting overall graft survival in three ethnic subgroups: European Australians, Asian Australians and Aboriginal and Torres Strait Islander Peoples. The concordance indices for the European Australian, Asian Australian, and Aboriginal and Torres Strait Islander Peoples subpopulations were 0.99 (0.98-0.99), 0.93 (0.92-0.94) and 0.92 (0.91-0.93), respectively. Similar findings were observed when validating P-Cube using an external dataset [Scientific Registry of Transplant Recipients (2006-2020)]. Six sub-categories of recipients with distinct risk factor profiles were identified. Some factors, such as blood group compatibility, were considered important across the entire transplant population. Other factors, such as human leukocyte antigen (HLA)-DR mismatches, were unique to older recipients. The P-Cube model identifies allograft survival-specific risk factors within a heterogeneous population and offers personalized survival predictions in a diverse cohort.
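
For readers unfamiliar with the concordance index reported above, the following Python sketch shows one common way such a statistic can be computed (a simplified Harrell-style estimator). The abstract does not say which estimator the authors used, so this is an assumption; the function name and the four-patient toy data are made up for illustration.

```python
def concordance_index(times, events, risk_scores):
    """Simplified Harrell-style C-index (ties in follow-up time are skipped).

    times       -- follow-up time for each recipient
    events      -- 1 if graft loss was observed, 0 if censored
    risk_scores -- model output, higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i had the event strictly before j's time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier failure had the higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical toy cohort of four recipients.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.5, 0.1]

c_index = concordance_index(times, events, risk)
print(c_index)  # 4 concordant pairs out of 5 comparable pairs
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of failure times by predicted risk.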


Subjects
Kidney Transplantation; Humans; Kidney Transplantation/adverse effects; Transplant Recipients; Australia/epidemiology; Transplantation, Homologous; Allografts
12.
Nature ; 545(7653): 175-180, 2017 05 11.
Article in English | MEDLINE | ID: mdl-28467829

ABSTRACT

Melanoma of the skin is a common cancer only in Europeans, whereas it arises in internal body surfaces (mucosal sites) and on the hands and feet (acral sites) in people throughout the world. Here we report analysis of whole-genome sequences from cutaneous, acral and mucosal subtypes of melanoma. The heavily mutated landscape of coding and non-coding mutations in cutaneous melanoma resolved novel signatures of mutagenesis attributable to ultraviolet radiation. However, acral and mucosal melanomas were dominated by structural changes and mutation signatures of unknown aetiology, not previously identified in melanoma. The number of genes affected by recurrent mutations disrupting non-coding sequences was similar to that affected by recurrent mutations to coding sequences. Significantly mutated genes included BRAF, CDKN2A, NRAS and TP53 in cutaneous melanoma, BRAF, NRAS and NF1 in acral melanoma and SF3B1 in mucosal melanoma. Mutations affecting the TERT promoter were the most frequent of all; however, neither they nor ATRX mutations, which correlate with alternative telomere lengthening, were associated with greater telomere length. Most melanomas had potentially actionable mutations, most in components of the mitogen-activated protein kinase and phosphoinositol kinase pathways. The whole-genome mutation landscape of melanoma reveals diverse carcinogenic processes across its subtypes, some unrelated to sun exposure, and extends potential involvement of the non-coding genome in its pathogenesis.


Subjects
Genome, Human/genetics; Melanoma/genetics; Mutation/genetics; DNA Helicases/genetics; GTP Phosphohydrolases/genetics; Genes, p16; Humans; Melanoma/classification; Membrane Proteins/genetics; Mitogen-Activated Protein Kinases/genetics; Neurofibromatosis 1/genetics; Nuclear Proteins/genetics; Phosphoproteins/genetics; Proto-Oncogene Proteins B-raf/genetics; RNA Splicing Factors/genetics; Signal Transduction/drug effects; Telomerase/genetics; Telomere/genetics; Tumor Suppressor Protein p53/genetics; Ultraviolet Rays/adverse effects; X-linked Nuclear Protein
13.
Genes Chromosomes Cancer ; 61(9): 561-571, 2022 09.
Article in English | MEDLINE | ID: mdl-35670448

ABSTRACT

INTRODUCTION: Oral squamous cell carcinoma (OSCC) in the young (<50 years), without known carcinogenic risk factors, is on the rise globally. Whole genome duplication (WGD) has been shown to occur at higher rates in cancers without an identifiable carcinogenic agent. We aimed to evaluate the prevalence of WGD in a cohort of OSCC patients under the age of 50 years. METHODS: Whole genome sequencing (WGS) was performed on 28 OSCC patients from the Sydney Head and Neck Cancer Institute (SHNCI) biobank. An additional nine cases were obtained from The Cancer Genome Atlas (TCGA). RESULTS: WGD was seen in 27 of 37 (73%) cases. Non-synonymous, somatic TP53 mutations occurred in 25 of 27 (93%) cases of WGD and were predicted to precede WGD in 21 (77%). WGD was significantly associated with larger tumor size (p = 0.01) and was frequent in patients with recurrences (87%, p = 0.36). Overall survival was significantly worse in those with WGD (p = 0.05). CONCLUSIONS: Our data, based on one of the largest WGS datasets of young patients with OSCC, demonstrate a high frequency of WGD and its association with adverse pathologic characteristics and clinical outcomes. TP53 mutations also preceded WGD, as has been described in other tumors without a clear mutagenic driver.


Subjects
Carcinoma, Squamous Cell; Head and Neck Neoplasms; Mouth Neoplasms; Carcinoma, Squamous Cell/genetics; Gene Duplication; Head and Neck Neoplasms/genetics; Humans; Middle Aged; Mouth Neoplasms/genetics; Squamous Cell Carcinoma of Head and Neck/genetics
14.
Am J Kidney Dis ; 79(4): 549-560, 2022 Apr.
Article in English | MEDLINE | ID: mdl-34461168

ABSTRACT

RATIONALE & OBJECTIVE: The risk of developing colorectal cancer in patients with chronic kidney disease (CKD) is twice that of the general population, but the factors associated with colorectal cancer are poorly understood. The aim of this study was to identify factors associated with advanced colorectal neoplasia in patients with CKD. STUDY DESIGN: Prospective cohort study. SETTING & PARTICIPANTS: Patients with CKD stages 3-5, including those treated with maintenance dialysis or transplantation across 11 sites in Australia, New Zealand, Canada, and Spain, were screened for colorectal neoplasia using a fecal immunochemical test (FIT) as part of the Detecting Bowel Cancer in CKD (DETECT) Study. EXPOSURE: Baseline characteristics for patients at the time of study enrollment were ascertained, including duration of CKD, comorbidities, and medications. OUTCOME: Advanced colorectal neoplasia was identified through a 2-step verification process with colonoscopy following positive FIT and 2-year clinical follow-up for all patients. ANALYTICAL APPROACH: Potential factors associated with advanced colorectal neoplasia were explored using multivariable logistic regression. Sensitivity analyses were performed using grouped LASSO (least absolute shrinkage and selection operator) logistic regression. RESULTS: Among 1,706 patients who received FIT-based screening (791 with CKD stages 3-5 not receiving kidney replacement therapy [KRT], 418 receiving dialysis, and 497 patients with a functioning kidney transplant), 117 patients (6.9%) were detected to have advanced colorectal neoplasia (54 with CKD stages 3-5 without KRT, 34 receiving dialysis, and 29 transplant recipients), including 9 colorectal cancers.
The factors found to be associated with advanced colorectal neoplasia included older age (OR per year older, 1.05 [95% CI, 1.03-1.07], P<0.001), male sex (OR, 2.27 [95% CI, 1.45-3.54], P<0.001), azathioprine use (OR, 2.99 [95% CI, 1.40-6.37], P=0.005), and erythropoiesis-stimulating agent use (OR, 1.92 [95% CI, 1.22-3.03], P=0.005). Grouped LASSO logistic regression revealed similar associations between these factors and advanced colorectal neoplasia. LIMITATIONS: Unmeasured confounding factors. CONCLUSIONS: Older age, male sex, erythropoiesis-stimulating agents, and azathioprine were found to be significantly associated with advanced colorectal neoplasia in patients with CKD.
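
As a worked illustration of how odds ratios like those above relate to logistic regression coefficients, the Python sketch below applies the standard identities OR = exp(beta) and 95% CI = exp(beta ± 1.96 × SE). The helper names are hypothetical, the standard error is back-calculated from the published interval rather than taken from the study, and the 10-year figure assumes the log-odds are linear in age, as a logistic model with a single linear age term does.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval

def coef_from_or(odds_ratio, ci_low, ci_high, z=Z_95):
    """Recover the log-odds coefficient and its standard error from an OR and CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se

def or_from_coef(beta, se, z=Z_95):
    """Convert a coefficient and standard error back to an OR with its CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Male sex, as reported in the abstract: OR 2.27 [95% CI, 1.45-3.54].
beta, se = coef_from_or(2.27, 1.45, 3.54)
odds_ratio, ci_low, ci_high = or_from_coef(beta, se)

# Age, as reported: OR 1.05 per year older. Under the log-linear assumption,
# the odds ratio for a 10-year difference is the per-year value compounded.
or_per_decade = 1.05 ** 10

print(round(beta, 3), round(se, 3))
print(round(ci_low, 2), round(ci_high, 2), round(or_per_decade, 2))
```

The round trip reproduces the published interval only approximately, because the reported values are themselves rounded to two decimals.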


Subjects
Colorectal Neoplasms; Renal Insufficiency, Chronic; Colonoscopy; Colorectal Neoplasms/diagnosis; Colorectal Neoplasms/epidemiology; Feces; Humans; Male; Occult Blood; Prospective Studies; Renal Insufficiency, Chronic/complications; Renal Insufficiency, Chronic/epidemiology; Renal Insufficiency, Chronic/therapy; Risk Factors
15.
Proc Natl Acad Sci U S A ; 116(20): 9775-9784, 2019 05 14.
Article in English | MEDLINE | ID: mdl-31028141

ABSTRACT

Concerted examination of multiple collections of single-cell RNA sequencing (RNA-seq) data promises further biological insights that cannot be uncovered with individual datasets. Here we present scMerge, an algorithm that integrates multiple single-cell RNA-seq datasets using factor analysis of stably expressed genes and pseudoreplicates across datasets. Using a large collection of public datasets, we benchmark scMerge against published methods and demonstrate that it consistently provides improved cell type separation by removing unwanted factors; scMerge can also enhance biological discovery through robust data integration, which we show through the inference of development trajectory in a liver dataset collection.


Subjects
Meta-Analysis as Topic; Sequence Analysis, RNA; Single-Cell Analysis; Software; Algorithms; Animals; Embryonic Development; Factor Analysis, Statistical; Gene Expression; Humans; Mice
16.
Kidney Int ; 99(4): 817-823, 2021 04.
Article in English | MEDLINE | ID: mdl-32916179

ABSTRACT

Kidney transplant recipients and transplant physicians face important clinical questions where machine learning methods may help improve the decision-making process. This mini-review explores potential applications of machine learning methods to key stages of a kidney transplant recipient's journey, from initial waitlisting and donor selection, to personalization of immunosuppression and prediction of post-transplantation events. Both unsupervised and supervised machine learning methods are presented, including k-means clustering, principal components analysis, k-nearest neighbors, and random forests. The various challenges of these approaches are also discussed.


Subjects
Kidney Transplantation; Machine Learning; Humans; Kidney Transplantation/adverse effects; Transplant Recipients
17.
Brief Bioinform ; 20(6): 2316-2326, 2019 11 27.
Article in English | MEDLINE | ID: mdl-30137247

ABSTRACT

Advances in high-throughput sequencing on single-cell gene expressions [single-cell RNA sequencing (scRNA-seq)] have enabled transcriptome profiling on individual cells from complex samples. A common goal in scRNA-seq data analysis is to discover and characterise cell types, typically through clustering methods. The quality of the clustering therefore plays a critical role in biological discovery. While numerous clustering algorithms have been proposed for scRNA-seq data, fundamentally they all rely on a similarity metric for categorising individual cells. Although several studies have compared the performance of various clustering algorithms for scRNA-seq data, currently there is no benchmark of different similarity metrics and their influence on scRNA-seq data clustering. Here, we compared a panel of similarity metrics on clustering a collection of annotated scRNA-seq datasets. Within each dataset, a stratified subsampling procedure was applied and an array of evaluation measures was employed to assess the similarity metrics. This produced a highly reliable and reproducible consensus on their performance assessment. Overall, we found that correlation-based metrics (e.g. Pearson's correlation) outperformed distance-based metrics (e.g. Euclidean distance). To test if the use of correlation-based metrics can benefit the recently published clustering techniques for scRNA-seq data, we modified a state-of-the-art kernel-based clustering algorithm (SIMLR) using Pearson's correlation as a similarity measure and found significant performance improvement over Euclidean distance on scRNA-seq data clustering. These findings demonstrate the importance of similarity metrics in clustering scRNA-seq data and highlight Pearson's correlation as a favourable choice. Further comparison on different scRNA-seq library preparation protocols suggests that they may also affect clustering performance. 
Finally, the benchmarking framework is available at http://www.maths.usyd.edu.au/u/SMS/bioinformatics/software.html.
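
The contrast between correlation-based and distance-based similarity can be seen on a very small example. The Python sketch below is not the benchmarking framework referenced above; it uses made-up expression values and hypothetical helper names to show one property that may contribute to the difference, namely that Pearson correlation ignores a uniform rescaling of a cell's profile (for example from sequencing depth) whereas Euclidean distance does not.

```python
import numpy as np

def pearson_distance(x):
    """1 - Pearson correlation between every pair of rows (cells)."""
    return 1.0 - np.corrcoef(x)

def euclidean_distance(x):
    """Pairwise Euclidean distance between rows (cells)."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Toy expression matrix: 3 cells x 4 genes (hypothetical values).
# Cell 1 has the same profile as cell 0 at ten times the depth;
# cell 2 has the opposite expression pattern.
cells = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [10.0, 20.0, 30.0, 40.0],
    [4.0, 3.0, 2.0, 1.0],
])

d_cor = pearson_distance(cells)
d_euc = euclidean_distance(cells)

# Correlation places cell 0 next to cell 1 and far from cell 2;
# Euclidean distance places cell 0 closer to cell 2 than to cell 1.
print(d_cor[0, 1], d_cor[0, 2])
print(d_euc[0, 1], d_euc[0, 2])
```

Either matrix could then be passed to a clustering method that accepts precomputed dissimilarities; which metric works better on real data is an empirical question of the kind the abstract addresses.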


Subjects
Sequence Analysis, RNA; Algorithms; Cluster Analysis; Humans
18.
Bioinformatics ; 36(14): 4137-4143, 2020 08 15.
Article in English | MEDLINE | ID: mdl-32353146

ABSTRACT

MOTIVATION: Multi-modal profiling of single cells represents one of the latest technological advancements in molecular biology. Among various single-cell multi-modal strategies, cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) allows simultaneous quantification of two distinct species: RNA and cell-surface proteins. Here, we introduce CiteFuse, a streamlined package consisting of a suite of tools for doublet detection, modality integration, clustering, differential RNA and protein expression analysis, antibody-derived tag evaluation, ligand-receptor interaction analysis and interactive web-based visualization of CITE-seq data. RESULTS: We demonstrate the capacity of CiteFuse to integrate the two data modalities and its relative advantage against data generated from single-modality profiling using both simulations and real-world CITE-seq data. Furthermore, we illustrate a novel doublet detection method based on a combined index of cell hashing and transcriptome data. Finally, we demonstrate CiteFuse for predicting ligand-receptor interactions by using multi-modal CITE-seq data. Collectively, we demonstrate the utility and effectiveness of CiteFuse for the integrative analysis of transcriptome and epitope profiles from CITE-seq data. AVAILABILITY AND IMPLEMENTATION: CiteFuse is freely available at http://shiny.maths.usyd.edu.au/CiteFuse/ as an online web service and at https://github.com/SydneyBioX/CiteFuse/ as an R package. CONTACT: pengyi.yang@sydney.edu.au. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Software , Transcriptome , Epitopes , Gene Expression Profiling , RNA , RNA Sequence Analysis , Single-Cell Analysis
19.
Mol Syst Biol ; 16(6): e9389, 2020 06.
Article in English | MEDLINE | ID: mdl-32567229

ABSTRACT

Automated cell type identification is a key computational challenge in single-cell RNA-sequencing (scRNA-seq) data. To capitalise on the large collection of well-annotated scRNA-seq datasets, we developed scClassify, a multiscale classification framework based on ensemble learning and cell type hierarchies constructed from single or multiple annotated datasets as references. scClassify enables the estimation of sample size required for accurate classification of cell types in a cell type hierarchy and allows joint classification of cells when multiple references are available. We show that scClassify consistently performs better than other supervised cell type classification methods across 114 pairs of reference and testing data, representing a diverse combination of sizes, technologies and levels of complexity, and further demonstrate the unique components of scClassify through simulations and compendia of experimental datasets. Finally, we demonstrate the scalability of scClassify on large single-cell atlases and highlight a novel application of identifying subpopulations of cells from the Tabula Muris data that were unidentified in the original publication. Together, scClassify represents state-of-the-art methodology in automated cell type identification from scRNA-seq data.


Subjects
Cells/metabolism , Animals , Cluster Analysis , Databases as Topic , Humans , Mononuclear Leukocytes/metabolism , Machine Learning , Mice , Pancreas/metabolism , Sample Size , Software
20.
BMC Pregnancy Childbirth ; 21(1): 277, 2021 Apr 06.
Article in English | MEDLINE | ID: mdl-33823838

ABSTRACT

BACKGROUND: There is increasing awareness that perinatal psychosocial adversity experienced by mothers, children, and their families may influence health and well-being across the life course. To maximise the impact of population-based interventions for optimising perinatal wellbeing, health services can utilise empirical methods to identify subgroups at highest risk of poor outcomes relative to the overall population. METHODS: This study sought to identify subgroups using latent class analysis within a population of mothers in Sydney, Australia, based on their differing experience of self-reported indicators of psychosocial adversity. Subgroup differences in antenatal and postnatal depressive symptoms were assessed using the Edinburgh Postnatal Depression Scale. RESULTS: Latent class analysis identified four distinct subgroups within the cohort, who were distinguished empirically on the basis of their native language, current smoking status, previous involvement with Family-and-Community Services (FaCS), history of child abuse, presence of a supportive partner, and a history of intimate partner psychological violence. One group consisted of socially supported 'local' women who speak English as their primary language (Group L), another of socially supported 'migrant' women who speak a language other than English as their primary language (Group M), another of socially stressed 'local' women who speak English as their primary language (Group Ls), and another of socially stressed 'migrant' women who speak a language other than English as their primary language (Group Ms). Compared to local and not socially stressed residents (Group L), the odds of antenatal depression were nearly three times higher in the socially stressed local group (Ls OR: 2.87, 95% CI 2.10-3.94) and nearly nine times higher in the socially stressed migrant group (Ms OR: 8.78, 95% CI 5.13-15.03). Antenatal symptoms of depression were also higher in the not socially stressed migrant group (M OR: 1.70, 95% CI 1.47-1.97) compared to non-migrants. In the postnatal period, Group M was 1.5 times more likely, and Group Ms over five times more likely, to experience suboptimal mental health compared to Group L (OR 1.50, 95% CI 1.22-1.84; and OR 5.28, 95% CI 2.63-10.63, for M and Ms respectively). CONCLUSIONS: The application of empirical subgrouping analysis permits an informed approach to targeted interventions and resource allocation for optimising perinatal maternal wellbeing.
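As a reading aid, the odds ratios and confidence intervals quoted above can be checked for internal consistency. The sketch below assumes, without confirmation from the abstract, that the intervals are Wald-type intervals symmetric on the log-odds scale. Under that assumption the point estimate should sit at the geometric mean of the interval bounds, and the standard error of the log odds ratio can be recovered from the interval width. The function names and the `reported` dictionary are illustrative, not part of the study.

```python
import math

def ci_geometric_midpoint(lower, upper):
    """Geometric mean of the CI bounds; equals the OR for a log-symmetric CI."""
    return math.sqrt(lower * upper)

def log_or_standard_error(lower, upper, z=1.96):
    """Standard error of log(OR) implied by a 95% Wald interval (assumed)."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)

# (odds ratio, lower bound, upper bound) exactly as quoted in the abstract.
reported = {
    "Ls antenatal": (2.87, 2.10, 3.94),
    "Ms antenatal": (8.78, 5.13, 15.03),
    "M antenatal": (1.70, 1.47, 1.97),
    "M postnatal": (1.50, 1.22, 1.84),
    "Ms postnatal": (5.28, 2.63, 10.63),
}

for label, (odds_ratio, lower, upper) in reported.items():
    midpoint = ci_geometric_midpoint(lower, upper)
    se = log_or_standard_error(lower, upper)
    print(f"{label}: OR={odds_ratio:.2f} midpoint={midpoint:.2f} SE(log OR)={se:.3f}")
```

The wider intervals for Group Ms correspond to larger implied standard errors. This would be expected if that group is small, although the abstract does not report group sizes.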


Subjects
Postpartum Depression/prevention & control , Mass Screening/organization & administration , Maternal Health/statistics & numerical data , Mental Health/statistics & numerical data , Adult , Australia/epidemiology , Postpartum Depression/diagnosis , Postpartum Depression/epidemiology , Postpartum Depression/psychology , Electronic Health Records/statistics & numerical data , Female , Health Care Resource Allocation , Humans , Newborn Infant , Latent Class Analysis , Mass Screening/methods , Perinatal Care/methods , Perinatal Care/organization & administration , Pregnancy , Psychiatric Rating Scales/statistics & numerical data , Retrospective Studies , Risk Assessment/methods , Self-Report/statistics & numerical data , Social Determinants of Health/statistics & numerical data , Young Adult