Search | Fiocruz Virtual Health Library (Biblioteca Virtual em Saúde)

1.

sgcocaller and comapr: personalised haplotype assembly and comparative crossover map analysis using single-gamete sequencing data.

Lyu, Ruqian; Tsui, Vanessa; Crismani, Wayne; Liu, Ruijie; Shim, Heejung; McCarthy, Davis J.

Nucleic Acids Res ; 50(20): e118, 2022 11 11.

Article in English | MEDLINE | ID: mdl-36107768

ABSTRACT

Profiling gametes of an individual enables the construction of personalised haplotypes and meiotic crossover landscapes, now achievable at larger scale than ever through the availability of high-throughput single-cell sequencing technologies. However, high-throughput single-gamete data commonly have low depth of coverage per gamete, which challenges existing gamete-based haplotype phasing methods. In addition, haplotyping a large number of single gametes from high-throughput single-cell DNA sequencing data and constructing meiotic crossover profiles using existing methods requires intensive processing. Here, we introduce efficient software tools for the essential tasks of generating personalised haplotypes and calling crossovers in gametes from single-gamete DNA sequencing data (sgcocaller), and constructing, visualising, and comparing individualised crossover landscapes from single gametes (comapr). With additional data pre-processing, the tools can also be applied to bulk-sequenced samples. We demonstrate that sgcocaller is able to generate impeccable phasing results for high-coverage datasets, on which it is more accurate and stable than existing methods, and also performs well on low-coverage single-gamete sequencing datasets for which current methods fail. Our tools achieve highly accurate results with user-friendly installation, comprehensive documentation, efficient computation times and minimal memory usage.

Subjects

High-Throughput Nucleotide Sequencing; Polymorphism, Single Nucleotide; Sequence Analysis, DNA; Algorithms; Germ Cells; Haplotypes; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods; Single-Cell Gene Expression Analysis; Software; Crossing Over, Genetic

2.

Cardelino: computational integration of somatic clonal substructure and single-cell transcriptomes.

McCarthy, Davis J; Rostom, Raghd; Huang, Yuanhua; Kunz, Daniel J; Danecek, Petr; Bonder, Marc Jan; Hagai, Tzachi; Lyu, Ruqian; Wang, Wenyi; Gaffney, Daniel J; Simons, Benjamin D; Stegle, Oliver; Teichmann, Sarah A.

Nat Methods ; 17(4): 414-421, 2020 04.

Article in English | MEDLINE | ID: mdl-32203388

ABSTRACT

Bulk and single-cell DNA sequencing has enabled reconstructing clonal substructures of somatic tissues from frequency and cooccurrence patterns of somatic variants. However, approaches to characterize phenotypic variations between clones are not established. Here we present cardelino (https://github.com/single-cell-genetics/cardelino), a computational method for inferring the clonal tree configuration and the clone of origin of individual cells assayed using single-cell RNA-seq (scRNA-seq). Cardelino flexibly integrates information from imperfect clonal trees inferred based on bulk exome-seq data, and sparse variant alleles expressed in scRNA-seq data. We apply cardelino to a published cancer dataset and to newly generated matched scRNA-seq and exome-seq data from 32 human dermal fibroblast lines, identifying hundreds of differentially expressed genes between cells from different somatic clones. These genes are frequently enriched for cell cycle and proliferation pathways, indicating a role for cell division genes in somatic evolution in healthy skin.

Subjects

Fibroblasts/metabolism; Gene Expression Profiling/methods; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Software; Algorithms; Cell Cycle; Cell Proliferation; Humans; Melanoma; Mutation; Transcriptome

3.

Common genetic variation drives molecular heterogeneity in human iPSCs.

Kilpinen, Helena; Goncalves, Angela; Leha, Andreas; Afzal, Vackar; Alasoo, Kaur; Ashford, Sofie; Bala, Sendu; Bensaddek, Dalila; Casale, Francesco Paolo; Culley, Oliver J; Danecek, Petr; Faulconbridge, Adam; Harrison, Peter W; Kathuria, Annie; McCarthy, Davis; McCarthy, Shane A; Meleckyte, Ruta; Memari, Yasin; Moens, Nathalie; Soares, Filipa; Mann, Alice; Streeter, Ian; Agu, Chukwuma A; Alderton, Alex; Nelson, Rachel; Harper, Sarah; Patel, Minal; White, Alistair; Patel, Sharad R; Clarke, Laura; Halai, Reena; Kirton, Christopher M; Kolb-Kokocinski, Anja; Beales, Philip; Birney, Ewan; Danovi, Davide; Lamond, Angus I; Ouwehand, Willem H; Vallier, Ludovic; Watt, Fiona M; Durbin, Richard; Stegle, Oliver; Gaffney, Daniel J.

Nature ; 546(7658): 370-375, 2017 06 15.

Article in English | MEDLINE | ID: mdl-28489815

ABSTRACT

Technology utilizing human induced pluripotent stem cells (iPS cells) has enormous potential to provide improved cellular models of human disease. However, variable genetic and phenotypic characterization of many existing iPS cell lines limits their potential use for research and therapy. Here we describe the systematic generation, genotyping and phenotyping of 711 iPS cell lines derived from 301 healthy individuals by the Human Induced Pluripotent Stem Cells Initiative. Our study outlines the major sources of genetic and phenotypic variation in iPS cells and establishes their suitability as models of complex human traits and cancer. Through genome-wide profiling we find that 5-46% of the variation in different iPS cell phenotypes, including differentiation capacity and cellular morphology, arises from differences between individuals. Additionally, we assess the phenotypic consequences of genomic copy-number alterations that are repeatedly observed in iPS cells. In addition, we present a comprehensive map of common regulatory variants affecting the transcriptome of human pluripotent cells.

Subjects

Genetic Variation/genetics; Induced Pluripotent Stem Cells/metabolism; Cells, Cultured; Cellular Reprogramming/genetics; DNA Copy Number Variations/genetics; Gene Expression Regulation/genetics; Genotype; Humans; Organ Specificity; Phenotype; Quality Control; Quantitative Trait Loci/genetics; Transcriptome/genetics

4.

Corrigendum: Common genetic variation drives molecular heterogeneity in human iPSCs.

Kilpinen, Helena; Goncalves, Angela; Leha, Andreas; Afzal, Vackar; Alasoo, Kaur; Ashford, Sofie; Bala, Sendu; Bensaddek, Dalila; Casale, Francesco Paolo; Culley, Oliver J; Danecek, Petr; Faulconbridge, Adam; Harrison, Peter W; Kathuria, Annie; McCarthy, Davis; McCarthy, Shane A; Meleckyte, Ruta; Memari, Yasin; Moens, Nathalie; Soares, Filipa; Mann, Alice; Streeter, Ian; Agu, Chukwuma A; Alderton, Alex; Nelson, Rachel; Harper, Sarah; Patel, Minal; White, Alistair; Patel, Sharad R; Clarke, Laura; Halai, Reena; Kirton, Christopher M; Kolb-Kokocinski, Anja; Beales, Philip; Birney, Ewan; Danovi, Davide; Lamond, Angus I; Ouwehand, Willem H; Vallier, Ludovic; Watt, Fiona M; Durbin, Richard; Stegle, Oliver; Gaffney, Daniel J.

Nature ; 546(7660): 686, 2017 06 29.

Article in English | MEDLINE | ID: mdl-28614302

ABSTRACT

This corrects the article DOI: 10.1038/nature22403.

5.

Trade-off between conservation of biological variation and batch effect removal in deep generative modeling for single-cell transcriptomics.

Li, Hui; McCarthy, Davis J; Shim, Heejung; Wei, Susan.

BMC Bioinformatics ; 23(1): 460, 2022 Nov 03.

Article in English | MEDLINE | ID: mdl-36329399

ABSTRACT

BACKGROUND: Single-cell RNA sequencing (scRNA-seq) technology has contributed significantly to diverse research areas in biology, from cancer to development. Since scRNA-seq data is high-dimensional, a common strategy is to learn low-dimensional latent representations to better understand overall structure in the data. In this work, we build upon scVI, a powerful deep generative model which can learn biologically meaningful latent representations, but which has limited explicit control of batch effects. Rather than prioritizing batch effect removal over conservation of biological variation, or vice versa, our goal is to provide a bird's eye view of the trade-offs between these two conflicting objectives. Specifically, using the well-established concept of Pareto front from economics and engineering, we seek to learn the entire trade-off curve between conservation of biological variation and removal of batch effects. RESULTS: A multi-objective optimisation technique known as Pareto multi-task learning (Pareto MTL) is used to obtain the Pareto front between conservation of biological variation and batch effect removal. Our results indicate Pareto MTL can obtain a better Pareto front than the naive scalarization approach typically encountered in the literature. In addition, we propose to measure batch effect by applying a neural-network based estimator called Mutual Information Neural Estimation (MINE) and show benefits over the more standard maximum mean discrepancy measure. CONCLUSION: The Pareto front between conservation of biological variation and batch effect removal is a valuable tool for researchers in computational biology. Our results demonstrate the efficacy of applying Pareto MTL to estimate the Pareto front in conjunction with applying MINE to measure the batch effect.

Subjects

Algorithms; Transcriptome; Computational Biology/methods; Single-Cell Analysis
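The Pareto front mentioned in this abstract can be illustrated with a small sketch. It is not the Pareto MTL procedure from the paper; it only filters a list of already-trained candidate models down to the non-dominated ones. The two scores per candidate (biological conservation, batch removal; higher is better for both) and their values are hypothetical.

```python
from typing import List, Tuple

# Hypothetical scores: (bio_conservation, batch_removal), higher is better.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b on both objectives and strictly better on one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only points that no other point dominates, sorted by the first objective."""
    front = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(set(front))


# Made-up candidates, e.g. models trained with different objective weightings.
candidates = [(0.90, 0.20), (0.80, 0.50), (0.70, 0.40), (0.60, 0.80), (0.50, 0.70)]
front = pareto_front(candidates)
print(front)
```

Each point left on the front represents a different trade-off; none can be improved on one objective without losing on the other.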

6.

Key signaling networks are dysregulated in patients with the adipose tissue disorder, lipedema.

Ishaq, Musarat; Bandara, Nadeeka; Morgan, Steven; Nowell, Cameron; Mehdi, Ahmad M; Lyu, Ruqian; McCarthy, Davis; Anderson, Dovile; Creek, Darren J; Achen, Marc G; Shayan, Ramin; Karnezis, Tara.

Int J Obes (Lond) ; 46(3): 502-514, 2022 03.

Article in English | MEDLINE | ID: mdl-34764426

ABSTRACT

OBJECTIVES: Lipedema, a poorly understood chronic disease of adipose hyper-deposition, is often mistaken for obesity and causes significant impairment to mobility and quality-of-life. To identify molecular mechanisms underpinning lipedema, we employed comprehensive omics-based comparative analyses of whole tissue, adipocyte precursors (adipose-derived stem cells (ADSCs)), and adipocytes from patients with or without lipedema. METHODS: We compared whole-tissues, ADSCs, and adipocytes from body mass index-matched lipedema (n = 14) and unaffected (n = 10) patients using comprehensive global lipidomic and metabolomic analyses, transcriptional profiling, and functional assays. RESULTS: Transcriptional profiling revealed >4400 significant differences in lipedema tissue, with altered levels of mRNAs involved in critical signaling and cell function-regulating pathways (e.g., lipid metabolism and cell-cycle/proliferation). Functional assays showed accelerated ADSC proliferation and differentiation in lipedema. Profiling lipedema adipocytes revealed >900 changes in lipid composition and >600 differentially altered metabolites. Transcriptional profiling of lipedema ADSCs and non-lipedema ADSCs revealed significant differential expression of >3400 genes including some involved in extracellular matrix and cell-cycle/proliferation signaling pathways. One upregulated gene in lipedema ADSCs, Bub1, encodes a cell-cycle regulator, central to the kinetochore complex, which regulates several histone proteins involved in cell proliferation. Downstream signaling analysis of lipedema ADSCs demonstrated enhanced activation of histone H2A, a key cell proliferation driver and Bub1 target. Critically, hyperproliferation exhibited by lipedema ADSCs was inhibited by the small molecule Bub1 inhibitor 2OH-BNPP1 and by CRISPR/Cas9-mediated Bub1 gene depletion. 
CONCLUSION: We found significant differences in gene expression, and lipid and metabolite profiles, in tissue, ADSCs, and adipocytes from lipedema patients compared to non-affected controls. Functional assays demonstrated that dysregulated Bub1 signaling drives increased proliferation of lipedema ADSCs, suggesting a potential mechanism for enhanced adipogenesis in lipedema. Importantly, our characterization of signaling networks driving lipedema identifies potential molecular targets, including Bub1, for novel lipedema therapeutics.

Subjects

Lipedema; Adipocytes/metabolism; Adipogenesis/genetics; Adipose Tissue/metabolism; Cell Differentiation/physiology; Humans; Lipedema/genetics; Lipids

7.

The genetic architecture of type 2 diabetes.

Fuchsberger, Christian; Flannick, Jason; Teslovich, Tanya M; Mahajan, Anubha; Agarwala, Vineeta; Gaulton, Kyle J; Ma, Clement; Fontanillas, Pierre; Moutsianas, Loukas; McCarthy, Davis J; Rivas, Manuel A; Perry, John R B; Sim, Xueling; Blackwell, Thomas W; Robertson, Neil R; Rayner, N William; Cingolani, Pablo; Locke, Adam E; Tajes, Juan Fernandez; Highland, Heather M; Dupuis, Josee; Chines, Peter S; Lindgren, Cecilia M; Hartl, Christopher; Jackson, Anne U; Chen, Han; Huyghe, Jeroen R; van de Bunt, Martijn; Pearson, Richard D; Kumar, Ashish; Müller-Nurasyid, Martina; Grarup, Niels; Stringham, Heather M; Gamazon, Eric R; Lee, Jaehoon; Chen, Yuhui; Scott, Robert A; Below, Jennifer E; Chen, Peng; Huang, Jinyan; Go, Min Jin; Stitzel, Michael L; Pasko, Dorota; Parker, Stephen C J; Varga, Tibor V; Green, Todd; Beer, Nicola L; Day-Williams, Aaron G; Ferreira, Teresa; Fingerlin, Tasha.

Nature ; 536(7614): 41-47, 2016 08 04.

Article in English | MEDLINE | ID: mdl-27398621

ABSTRACT

The genetic architecture of common traits, including the number, frequency, and effect sizes of inherited variants that contribute to individual risk, has been long debated. Genome-wide association studies have identified scores of common variants associated with type 2 diabetes, but in aggregate, these explain only a fraction of the heritability of this disease. Here, to test the hypothesis that lower-frequency variants explain much of the remainder, the GoT2D and T2D-GENES consortia performed whole-genome sequencing in 2,657 European individuals with and without diabetes, and exome sequencing in 12,940 individuals from five ancestry groups. To increase statistical power, we expanded the sample size via genotyping and imputation in a further 111,548 subjects. Variants associated with type 2 diabetes after sequencing were overwhelmingly common and most fell within regions previously identified by genome-wide association studies. Comprehensive enumeration of sequence variation is necessary to identify functional alleles that provide important clues to disease pathophysiology, but large-scale sequencing does not support the idea that lower-frequency variants have a major role in predisposition to type 2 diabetes.

Subjects

Diabetes Mellitus, Type 2/genetics; Genetic Predisposition to Disease/genetics; Genetic Variation/genetics; Alleles; DNA Mutational Analysis; Europe/ethnology; Exome; Genome-Wide Association Study; Genotyping Techniques; Humans; Sample Size

8.

Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R.

McCarthy, Davis J; Campbell, Kieran R; Lun, Aaron T L; Wills, Quin F.

Bioinformatics ; 33(8): 1179-1186, 2017 04 15.

Article in English | MEDLINE | ID: mdl-28088763

ABSTRACT

Motivation: Single-cell RNA sequencing (scRNA-seq) is increasingly used to study gene expression at the level of individual cells. However, preparing raw sequence data for further analysis is not a straightforward process. Biases, artifacts and other sources of unwanted variation are present in the data, requiring substantial time and effort to be spent on pre-processing, quality control (QC) and normalization. Results: We have developed the R/Bioconductor package scater to facilitate rigorous pre-processing, quality control, normalization and visualization of scRNA-seq data. The package provides a convenient, flexible workflow to process raw sequencing reads into a high-quality expression dataset ready for downstream analysis. scater provides a rich suite of plotting tools for single-cell data and a flexible data structure that is compatible with existing tools and can be used as infrastructure for future software development. Availability and Implementation: The open-source code, along with installation instructions, vignettes and case studies, is available through Bioconductor at http://bioconductor.org/packages/scater . Contact: davis@ebi.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.

Subjects

Programming Languages; Sequence Analysis, RNA/methods; Sequence Analysis, RNA/standards; Single-Cell Analysis/methods; Software; Cell Line; Humans; Principal Component Analysis; Quality Control; RNA/genetics; Statistics as Topic

9.

MOZ and BMI1 play opposing roles during Hox gene activation in ES cells and in body segment identity specification in vivo.

Sheikh, Bilal N; Downer, Natalie L; Phipson, Belinda; Vanyai, Hannah K; Kueh, Andrew J; McCarthy, Davis J; Smyth, Gordon K; Thomas, Tim; Voss, Anne K.

Proc Natl Acad Sci U S A ; 112(17): 5437-42, 2015 Apr 28.

Article in English | MEDLINE | ID: mdl-25922517

ABSTRACT

Hox genes underlie the specification of body segment identity in the anterior-posterior axis. They are activated during gastrulation and undergo a dynamic shift from a transcriptionally repressed to an active chromatin state in a sequence that reflects their chromosomal location. Nevertheless, the precise role of chromatin modifying complexes during the initial activation phase remains unclear. In the current study, we examined the role of chromatin regulators during Hox gene activation. Using embryonic stem cell lines lacking the transcriptional activator MOZ and the polycomb-family repressor BMI1, we showed that MOZ and BMI1, respectively, promoted and repressed Hox genes during the shift from the transcriptionally repressed to the active state. Strikingly however, MOZ but not BMI1 was required to regulate Hox mRNA levels after the initial activation phase. To determine the interaction of MOZ and BMI1 in vivo, we interrogated their role in regulating Hox genes and body segment identity using Moz;Bmi1 double deficient mice. We found that the homeotic transformations and shifts in Hox gene expression boundaries observed in single Moz and Bmi1 mutant mice were rescued to a wild type identity in Moz;Bmi1 double knockout animals. Together, our findings establish that MOZ and BMI1 play opposing roles during the onset of Hox gene expression in the ES cell model and during body segment identity specification in vivo. We propose that chromatin-modifying complexes have a previously unappreciated role during the initiation phase of Hox gene expression, which is critical for the correct specification of body segment identity.

Subjects

Body Patterning/physiology; Embryo, Mammalian/embryology; Embryonic Stem Cells/metabolism; Histone Acetyltransferases/metabolism; Homeodomain Proteins/biosynthesis; Polycomb Repressive Complex 1/metabolism; Proto-Oncogene Proteins/metabolism; Animals; Embryo, Mammalian/cytology; Embryonic Stem Cells/cytology; Gene Expression Regulation, Developmental/physiology; Histone Acetyltransferases/genetics; Homeodomain Proteins/genetics; Mice; Mice, Inbred BALB C; Mice, Knockout; Polycomb Repressive Complex 1/genetics; Proto-Oncogene Proteins/genetics

10.

Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation.

McCarthy, Davis J; Chen, Yunshun; Smyth, Gordon K.

Nucleic Acids Res ; 40(10): 4288-97, 2012 May.

Article in English | MEDLINE | ID: mdl-22287627

ABSTRACT

A flexible statistical framework is developed for the analysis of read counts from RNA-Seq gene expression studies. It provides the ability to analyse complex experiments involving multiple treatment conditions and blocking variables while still taking full account of biological variation. Biological variation between RNA samples is estimated separately from the technical variation associated with sequencing technologies. Novel empirical Bayes methods allow each gene to have its own specific variability, even when there are relatively few biological replicates from which to estimate such variability. The pipeline is implemented in the edgeR package of the Bioconductor project. A case study analysis of carcinoma data demonstrates the ability of generalized linear model methods (GLMs) to detect differential expression in a paired design, and even to detect tumour-specific expression changes. The case study demonstrates the need to allow for gene-specific variability, rather than assuming a common dispersion across genes or a fixed relationship between abundance and variability. Genewise dispersions de-prioritize genes with inconsistent results and allow the main analysis to focus on changes that are consistent between biological replicates. Parallel computational approaches are developed to make non-linear model fitting faster and more reliable, making the application of GLMs to genomic data more convenient and practical. Simulations demonstrate the ability of adjusted profile likelihood estimators to return accurate estimators of biological variability in complex situations. When variation is gene-specific, empirical Bayes estimators provide an advantageous compromise between the extremes of assuming common dispersion or separate genewise dispersion. The methods developed here can also be applied to count data arising from DNA-Seq applications, including ChIP-Seq for epigenetic marks and DNA methylation analyses.

Subjects

Gene Expression Profiling; Genetic Variation; Sequence Analysis, RNA; Algorithms; Bayes Theorem; Carcinoma, Squamous Cell/genetics; Carcinoma, Squamous Cell/metabolism; High-Throughput Nucleotide Sequencing; Linear Models; Mouth Neoplasms/genetics; Mouth Neoplasms/metabolism
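The compromise this abstract describes, between a single common dispersion and fully separate genewise dispersions, can be sketched with a toy calculation. This is not the edgeR estimator (the abstract refers to adjusted profile likelihood and empirical Bayes methods). As a simplifying assumption, the sketch uses a method-of-moments estimate under the negative binomial relation Var = mu + phi * mu^2, ignores library sizes, and shrinks with a plain weighted average. The counts and the `prior_weight` parameter are hypothetical.

```python
from statistics import mean, variance
from typing import Dict, List


def moment_dispersion(counts: List[int]) -> float:
    """Method-of-moments NB dispersion: phi = (s^2 - m) / m^2, floored at zero."""
    m = mean(counts)
    if m == 0:
        return 0.0
    return max((variance(counts) - m) / (m * m), 0.0)


def shrink(genewise: float, common: float, prior_weight: float) -> float:
    """Weighted average pulling a genewise estimate toward the common value."""
    return (genewise + prior_weight * common) / (1.0 + prior_weight)


# Made-up replicate counts for two genes with similar means.
counts: Dict[str, List[int]] = {
    "geneA": [10, 12, 8, 10],   # consistent replicates
    "geneB": [10, 40, 2, 28],   # inconsistent replicates
}
genewise = {g: moment_dispersion(c) for g, c in counts.items()}
common = mean(genewise.values())
shrunk = {g: shrink(phi, common, prior_weight=1.0) for g, phi in genewise.items()}
print(genewise, common, shrunk)
```

With few replicates the raw genewise values are noisy; a larger `prior_weight` moves every gene closer to the common value, while a weight of zero leaves the genewise estimates untouched.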

11.

A comparison of marker gene selection methods for single-cell RNA sequencing data.

Pullin, Jeffrey M; McCarthy, Davis J.

Genome Biol ; 25(1): 56, 2024 02 26.

Article in English | MEDLINE | ID: mdl-38409056

ABSTRACT

BACKGROUND: The development of single-cell RNA sequencing (scRNA-seq) has enabled scientists to catalog and probe the transcriptional heterogeneity of individual cells in unprecedented detail. A common step in the analysis of scRNA-seq data is the selection of so-called marker genes, most commonly to enable annotation of the biological cell types present in the sample. In this paper, we benchmark 59 computational methods for selecting marker genes in scRNA-seq data. RESULTS: We compare the performance of the methods using 14 real scRNA-seq datasets and over 170 additional simulated datasets. Methods are compared on their ability to recover simulated and expert-annotated marker genes, the predictive performance and characteristics of the gene sets they select, their memory usage and speed, and their implementation quality. In addition, various case studies are used to scrutinize the most commonly used methods, highlighting issues and inconsistencies. CONCLUSIONS: Overall, we present a comprehensive evaluation of methods for selecting marker genes in scRNA-seq data. Our results highlight the efficacy of simple methods, especially the Wilcoxon rank-sum test, Student's t-test, and logistic regression.

Subjects

Benchmarking; Single-Cell Analysis; Single-Cell Analysis/methods; Exome Sequencing; Sequence Analysis, RNA; Gene Expression Profiling; Software
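As a rough illustration of the Wilcoxon rank-sum approach highlighted in this abstract, the sketch below ranks genes for one cluster by the rank-sum (Mann-Whitney U) statistic, scaled to the range 0 to 1. It is a minimal standard-library version that computes the statistic only, with no p-values or multiple-testing correction. The gene names, labels and expression values are made up, and the function names are not taken from any of the benchmarked packages.

```python
from typing import Dict, List


def midranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_score(inside: List[float], outside: List[float]) -> float:
    """Mann-Whitney U for the cluster divided by n1*n2 (1.0 = always higher inside)."""
    n1, n2 = len(inside), len(outside)
    ranks = midranks(inside + outside)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    return u1 / (n1 * n2)


def rank_markers(expr: Dict[str, List[float]], labels: List[str], cluster: str) -> List[str]:
    """Order genes from most to least elevated in the given cluster."""
    scores = {}
    for gene, values in expr.items():
        inside = [v for v, lab in zip(values, labels) if lab == cluster]
        outside = [v for v, lab in zip(values, labels) if lab != cluster]
        scores[gene] = rank_sum_score(inside, outside)
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: six cells, two clusters, three hypothetical genes.
labels = ["c1", "c1", "c1", "c2", "c2", "c2"]
expr = {
    "geneA": [5, 6, 7, 0, 1, 0],   # high in c1
    "geneB": [0, 0, 1, 6, 5, 7],   # high in c2
    "geneC": [4, 5, 4, 5, 4, 5],   # similar in both
}
markers_c1 = rank_markers(expr, labels, "c1")
print(markers_c1)
```

A score near 1 means the gene is almost always higher inside the cluster, a score near 0.5 means no separation, and a score near 0 marks genes that are lower inside the cluster.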

12.

De novo transcriptome assembly and genome annotation of the fat-tailed dunnart (Sminthopsis crassicaudata).

Ibeh, Neke; Feigin, Charles Y; Frankenberg, Stephen R; McCarthy, Davis J; Pask, Andrew J; Gallego Romero, Irene.

GigaByte ; 2024: gigabyte118, 2024.

Article in English | MEDLINE | ID: mdl-38746537

ABSTRACT

Marsupials exhibit distinctive modes of reproduction and early development that set them apart from their eutherian counterparts and render them invaluable for comparative studies. However, marsupial genomic resources still lag far behind those of eutherian mammals. We present a series of novel genomic resources for the fat-tailed dunnart (Sminthopsis crassicaudata), a mouse-like marsupial that, due to its ease of husbandry and ex-utero development, is emerging as a laboratory model. We constructed a highly representative multi-tissue de novo transcriptome assembly of dunnart RNA-seq reads spanning 12 tissues. The transcriptome includes 2,093,982 assembled transcripts and has a mammalian transcriptome BUSCO completeness score of 93.3%, the highest amongst currently published marsupial transcriptomes. This global transcriptome, along with ab initio predictions, supported annotation of the existing dunnart genome, revealing 21,622 protein-coding genes. Altogether, these resources will enable wider use of the dunnart as a model marsupial and deepen our understanding of mammalian genome evolution.

13.

An Interpretable and Accurate Deep-Learning Diagnosis Framework Modeled With Fully and Semi-Supervised Reciprocal Learning.

Wang, Chong; Chen, Yuanhong; Liu, Fengbei; Elliott, Michael; Kwok, Chun Fung; Pena-Solorzano, Carlos; Frazer, Helen; McCarthy, Davis James; Carneiro, Gustavo.

IEEE Trans Med Imaging ; 43(1): 392-404, 2024 Jan.

Article in English | MEDLINE | ID: mdl-37603481

ABSTRACT

The deployment of automated deep-learning classifiers in clinical practice has the potential to streamline the diagnosis process and improve the diagnosis accuracy, but the acceptance of those classifiers relies on both their accuracy and interpretability. In general, accurate deep-learning classifiers provide little model interpretability, while interpretable models do not have competitive classification accuracy. In this paper, we introduce a new deep-learning diagnosis framework, called InterNRL, that is designed to be highly accurate and interpretable. InterNRL consists of a student-teacher framework, where the student model is an interpretable prototype-based classifier (ProtoPNet) and the teacher is an accurate global image classifier (GlobalNet). The two classifiers are mutually optimised with a novel reciprocal learning paradigm in which the student ProtoPNet learns from optimal pseudo labels produced by the teacher GlobalNet, while GlobalNet learns from ProtoPNet's classification performance and pseudo labels. This reciprocal learning paradigm enables InterNRL to be flexibly optimised under both fully- and semi-supervised learning scenarios, reaching state-of-the-art classification performance in both scenarios for the tasks of breast cancer and retinal disease diagnosis. Moreover, relying on weakly-labelled training images, InterNRL also achieves superior breast cancer localisation and brain tumour segmentation results than other competing methods.

Subjects

Breast Neoplasms; Deep Learning; Retinal Diseases; Humans; Female; Retina; Supervised Machine Learning

14.

BRAIxDet: Learning to detect malignant breast lesion with incomplete annotations.

Chen, Yuanhong; Liu, Yuyuan; Wang, Chong; Elliott, Michael; Kwok, Chun Fung; Peña-Solorzano, Carlos; Tian, Yu; Liu, Fengbei; Frazer, Helen; McCarthy, Davis J; Carneiro, Gustavo.

Med Image Anal ; 96: 103192, 2024 Aug.

Article in English | MEDLINE | ID: mdl-38810516

ABSTRACT

Methods to detect malignant lesions from screening mammograms are usually trained with fully annotated datasets, where images are labelled with the localisation and classification of cancerous lesions. However, real-world screening mammogram datasets commonly have a subset that is fully annotated and another subset that is weakly annotated with just the global classification (i.e., without lesion localisation). Given the large size of such datasets, researchers usually face a dilemma with the weakly annotated subset: to not use it or to fully annotate it. The first option will reduce detection accuracy because it does not use the whole dataset, and the second option is too expensive given that the annotation needs to be done by expert radiologists. In this paper, we propose a middle-ground solution for the dilemma, which is to formulate the training as a weakly- and semi-supervised learning problem that we refer to as malignant breast lesion detection with incomplete annotations. To address this problem, our new method comprises two stages, namely: (1) pre-training a multi-view mammogram classifier with weak supervision from the whole dataset, and (2) extending the trained classifier to become a multi-view detector that is trained with semi-supervised student-teacher learning, where the training set contains fully and weakly-annotated mammograms. We provide extensive detection results on two real-world screening mammogram datasets containing incomplete annotations and show that our proposed approach achieves state-of-the-art results in the detection of malignant breast lesions with incomplete annotations.

Subjects

Breast Neoplasms; Mammography; Radiographic Image Interpretation, Computer-Assisted; Humans; Breast Neoplasms/diagnostic imaging; Mammography/methods; Female; Radiographic Image Interpretation, Computer-Assisted/methods; Algorithms; Supervised Machine Learning

15.

Demuxafy: improvement in droplet assignment by integrating multiple single-cell demultiplexing and doublet detection methods.

Neavin, Drew; Senabouth, Anne; Arora, Himanshi; Lee, Jimmy Tsz Hang; Ripoll-Cladellas, Aida; Franke, Lude; Prabhakar, Shyam; Ye, Chun Jimmie; McCarthy, Davis J; Melé, Marta; Hemberg, Martin; Powell, Joseph E.

Genome Biol ; 25(1): 94, 2024 04 15.

Article in English | MEDLINE | ID: mdl-38622708

ABSTRACT

Recent innovations in single-cell RNA-sequencing (scRNA-seq) provide the technology to investigate biological questions at cellular resolution. Pooling cells from multiple individuals has become a common strategy, and droplets can subsequently be assigned to a specific individual by leveraging their inherent genetic differences. An implicit challenge with scRNA-seq is the occurrence of doublets, that is, droplets containing two or more cells. We develop Demuxafy, a framework to enhance donor assignment and doublet removal through the consensus intersection of multiple demultiplexing and doublet detecting methods. Demuxafy significantly improves droplet assignment by separating singlets from doublets and classifying the correct individual.

Subjects

Single-Cell Analysis; Humans; Single-Cell Analysis/methods; Sequence Analysis, RNA/methods
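The idea of combining calls from several demultiplexing and doublet detection methods can be sketched as a simple vote per droplet. The strict-majority rule below is an assumption made for illustration; the abstract does not spell out the combination rule Demuxafy uses. The barcodes, labels and the `consensus_call` function are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def consensus_call(calls: List[str], min_agree: Optional[int] = None) -> str:
    """Return the label most methods agree on, or 'unassigned' without enough agreement."""
    if min_agree is None:
        min_agree = len(calls) // 2 + 1  # strict majority by default
    label, votes = Counter(calls).most_common(1)[0]
    return label if votes >= min_agree else "unassigned"


# Made-up calls from three methods for three droplet barcodes.
droplets: Dict[str, List[str]] = {
    "AAAC": ["donor1", "donor1", "donor1"],
    "AAAG": ["donor2", "doublet", "doublet"],
    "AAAT": ["donor1", "donor2", "doublet"],
}
assignments = {barcode: consensus_call(calls) for barcode, calls in droplets.items()}
print(assignments)
```

Raising `min_agree` to the number of methods would require unanimous agreement, which trades more unassigned droplets for fewer wrong assignments.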

16.

Cell-type-specific and disease-associated expression quantitative trait loci in the human lung.

Natri, Heini M; Del Azodi, Christina B; Peter, Lance; Taylor, Chase J; Chugh, Sagrika; Kendle, Robert; Chung, Mei-I; Flaherty, David K; Matlock, Brittany K; Calvi, Carla L; Blackwell, Timothy S; Ware, Lorraine B; Bacchetta, Matthew; Walia, Rajat; Shaver, Ciara M; Kropski, Jonathan A; McCarthy, Davis J; Banovich, Nicholas E.

Nat Genet ; 56(4): 595-604, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38548990

ABSTRACT

Common genetic variants confer substantial risk for chronic lung diseases, including pulmonary fibrosis. Defining the genetic control of gene expression in a cell-type-specific and context-dependent manner is critical for understanding the mechanisms through which genetic variation influences complex traits and disease pathobiology. To this end, we performed single-cell RNA sequencing of lung tissue from 66 individuals with pulmonary fibrosis and 48 unaffected donors. Using a pseudobulk approach, we mapped expression quantitative trait loci (eQTLs) across 38 cell types, observing both shared and cell-type-specific regulatory effects. Furthermore, we identified disease interaction eQTLs and demonstrated that this class of associations is more likely to be cell-type-specific and linked to cellular dysregulation in pulmonary fibrosis. Finally, we connected lung disease risk variants to their regulatory targets in disease-relevant cell types. These results indicate that cellular context determines the impact of genetic variation on gene expression and implicates context-specific eQTLs as key regulators of lung homeostasis and disease.

Subjects

Pulmonary Fibrosis , Quantitative Trait Loci , Humans , Quantitative Trait Loci/genetics , Pulmonary Fibrosis/genetics , Gene Expression Regulation/genetics , Lung , Multifactorial Inheritance , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide

17.

Detecting differential expression in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates.

Lund, Steven P; Nettleton, Dan; McCarthy, Davis J; Smyth, Gordon K.

Stat Appl Genet Mol Biol ; 11(5), 2012 Oct 22.

Article in English | MEDLINE | ID: mdl-23104842

ABSTRACT

Next-generation sequencing technology provides a powerful tool for measuring gene expression (mRNA) levels in the form of RNA-sequence data. Method development for identifying differentially expressed (DE) genes from RNA-seq data, which frequently include many low-count integers and can exhibit severe overdispersion relative to Poisson or binomial distributions, is a popular area of ongoing research. Here we present quasi-likelihood methods with shrunken dispersion estimates based on an adaptation of Smyth's (2004) approach to estimating gene-specific error variances for microarray data. Our suggested methods are computationally simple, analogous to ANOVA, and compare favorably with competing methods in detecting DE genes and estimating false discovery rates across a variety of simulations based on real data.

Subjects

Gene Expression Profiling/statistics & numerical data , Sequence Analysis, RNA/methods , Base Sequence , Databases, Genetic , Gene Expression Profiling/methods , Likelihood Functions , RNA, Messenger/metabolism

18.

Fancm has dual roles in the limiting of meiotic crossovers and germ cell maintenance in mammals.

Tsui, Vanessa; Lyu, Ruqian; Novakovic, Stevan; Stringer, Jessica M; Dunleavy, Jessica E M; Granger, Elissah; Semple, Tim; Leichter, Anna; Martelotto, Luciano G; Merriner, D Jo; Liu, Ruijie; McNeill, Lucy; Zerafa, Nadeen; Hoffmann, Eva R; O'Bryan, Moira K; Hutt, Karla; Deans, Andrew J; Heierhorst, Jörg; McCarthy, Davis J; Crismani, Wayne.

Cell Genom ; 3(8): 100349, 2023 Aug 09.

Article in English | MEDLINE | ID: mdl-37601968

ABSTRACT

Meiotic crossovers are required for accurate chromosome segregation and producing new allelic combinations. Meiotic crossover numbers are tightly regulated within a narrow range, despite an excess of initiating DNA double-strand breaks. Here, we reveal the tumor suppressor FANCM as a meiotic anti-crossover factor in mammals. We use unique large-scale crossover analyses with both single-gamete sequencing and pedigree-based bulk-sequencing datasets to identify a genome-wide increase in crossover frequencies in Fancm-deficient mice. Gametogenesis is heavily perturbed in Fancm loss-of-function mice, which is consistent with the reproductive defects reported in humans with biallelic FANCM mutations. A portion of the gametogenesis defects can be attributed to the cGAS-STING pathway after birth. Despite the gametogenesis phenotypes in Fancm mutants, both sexes are capable of producing offspring. We propose that the anti-crossover function and role in gametogenesis of Fancm are separable and will inform diagnostic pathways for human genomic instability disorders.

19.

Cell type-specific and disease-associated eQTL in the human lung.

Natri, Heini M; Del Azodi, Christina B; Peter, Lance; Taylor, Chase J; Chugh, Sagrika; Kendle, Robert; Chung, Mei-I; Flaherty, David K; Matlock, Brittany K; Calvi, Carla L; Blackwell, Timothy S; Ware, Lorraine B; Bacchetta, Matthew; Walia, Rajat; Shaver, Ciara M; Kropski, Jonathan A; McCarthy, Davis J; Banovich, Nicholas E.

bioRxiv ; 2023 Jun 29.

Article in English | MEDLINE | ID: mdl-36993211

ABSTRACT

Common genetic variants confer substantial risk for chronic lung diseases, including pulmonary fibrosis (PF). Defining the genetic control of gene expression in a cell-type-specific and context-dependent manner is critical for understanding the mechanisms through which genetic variation influences complex traits and disease pathobiology. To this end, we performed single-cell RNA-sequencing of lung tissue from 67 PF and 49 unaffected donors. Employing a pseudo-bulk approach, we mapped expression quantitative trait loci (eQTL) across 38 cell types, observing both shared and cell type-specific regulatory effects. Further, we identified disease-interaction eQTL and demonstrated that this class of associations is more likely to be cell-type specific and linked to cellular dysregulation in PF. Finally, we connected PF risk variants to their regulatory targets in disease-relevant cell types. These results indicate that cellular context determines the impact of genetic variation on gene expression, and implicate context-specific eQTL as key regulators of lung homeostasis and disease.

20.

ADMANI: Annotated Digital Mammograms and Associated Non-Image Datasets.

Frazer, Helen M L; Tang, Jennifer S N; Elliott, Michael S; Kunicki, Katrina M; Hill, Brendan; Karthik, Ravishankar; Kwok, Chun Fung; Peña-Solorzano, Carlos A; Chen, Yuanhong; Wang, Chong; Al-Qershi, Osamah; Fox, Samantha K; Li, Shuai; Makalic, Enes; Nguyen, Tuong L; Schmidt, Daniel F; Basnayake Ralalage, Prabhathi; Lippey, Jocelyn F; Brotchie, Peter; Hopper, John L; Carneiro, Gustavo; McCarthy, Davis J.

Radiol Artif Intell ; 5(2): e220072, 2023 Mar.

Article in English | MEDLINE | ID: mdl-37035431

ABSTRACT

Supplemental material is available for this article. Keywords: Mammography, Screening, Convolutional Neural Network (CNN) Published under a CC BY 4.0 license. See also the commentary by Cadrin-Chênevert in this issue.
