1.
Genome Res ; 33(2): 218-231, 2023 02.
Article in English | MEDLINE | ID: mdl-36653120

ABSTRACT

The true benefits of large single-cell transcriptome and epigenome data sets can be realized only with the development of new approaches and search tools for annotating individual cells. Matching a single-cell epigenome profile to a large pool of reference cells remains a major challenge. Here, we present scEpiSearch, which enables searching, comparison, and independent classification of single-cell open-chromatin profiles against a large reference of single-cell expression and open-chromatin data sets. Across performance benchmarks, scEpiSearch outperformed multiple methods in accuracy of search and low-dimensional coembedding of single-cell profiles, irrespective of platforms and species. Here we also demonstrate the unconventional utilities of scEpiSearch by applying it on single-cell epigenome profiles of K562 cells and samples from patients with acute leukaemia to reveal different aspects of their heterogeneity, multipotent behavior, and dedifferentiated states. Applying scEpiSearch on our single-cell open-chromatin profiles from embryonic stem cells (ESCs), we identified ESC subpopulations with more activity and poising for endoplasmic reticulum stress and unfolded protein response. Thus, scEpiSearch solves the nontrivial problem of amalgamating information from a large pool of single cells to identify and study the regulatory states of cells using their single-cell epigenomes.


Subjects
Chromatin , Transcriptome , Humans , Chromatin/metabolism , Epigenome , Embryonic Stem Cells/metabolism , Single-Cell Analysis
2.
Genome Res ; 33(1): 80-95, 2023 01.
Article in English | MEDLINE | ID: mdl-36414416

ABSTRACT

The identification and characterization of circulating tumor cells (CTCs) are important for gaining insights into the biology of metastatic cancers, monitoring disease progression, and medical management of the disease. The limiting factor in the enrichment of purified CTC populations is their sparse availability, heterogeneity, and altered phenotypes relative to the primary tumor. Intensive research both at the technical and molecular fronts led to the development of assays that ease CTC detection and identification from peripheral blood. Most CTC detection methods based on single-cell RNA sequencing (scRNA-seq) use a mix of size selection, marker-based white blood cell (WBC) depletion, and antibodies targeting tumor-associated antigens. However, the majority of these methods either miss out on atypical CTCs or suffer from WBC contamination. We present unCTC, an R package for unbiased identification and characterization of CTCs from single-cell transcriptomic data. unCTC features many standard and novel computational and statistical modules for various analyses. These include a novel method of scRNA-seq clustering, named deep dictionary learning using k-means clustering cost (DDLK), expression-based copy number variation (CNV) inference, and combinatorial, marker-based verification of the malignant phenotypes. DDLK enables robust segregation of CTCs and WBCs in the pathway space, as opposed to the gene expression space. We validated the utility of unCTC on scRNA-seq profiles of breast CTCs from six patients, captured and profiled using an integrated ClearCell FX and Polaris workflow that works by the principles of size-based separation of CTCs and marker-based WBC depletion.


Subjects
Neoplastic Cells, Circulating , Humans , Neoplastic Cells, Circulating/metabolism , Transcriptome , DNA Copy Number Variations , Gene Expression Profiling , Biomarkers, Tumor
3.
Bioinformatics ; 40(4)2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38608194

ABSTRACT

MOTIVATION: Dysregulation of a gene's function, either due to mutations or impairments in regulatory networks, often triggers pathological states in the affected tissue. Comprehensive mapping of these apparent gene-pathology relationships is an ever-daunting task, primarily due to genetic pleiotropy and lack of suitable computational approaches. With the advent of high throughput genomics platforms and community scale initiatives such as the Human Cell Landscape project, researchers have been able to create gene expression portraits of healthy tissues resolved at the level of single cells. However, a similar wealth of knowledge is currently not at our finger-tip when it comes to diseases. This is because the genetic manifestation of a disease is often quite diverse and is confounded by several clinical and demographic covariates. RESULTS: To circumvent this, we mined ∼18 million PubMed abstracts published till May 2019 and automatically selected ∼4.5 million of them that describe roles of particular genes in disease pathogenesis. Further, we fine-tuned the pretrained bidirectional encoder representations from transformers (BERT) for language modeling from the domain of natural language processing to learn vector representation of entities such as genes, diseases, tissues, cell-types, etc., in a way such that their relationship is preserved in a vector space. The repurposed BERT predicted disease-gene associations that are not cited in the training data, thereby highlighting the feasibility of in silico synthesis of hypotheses linking different biological entities such as genes and conditions. AVAILABILITY AND IMPLEMENTATION: PathoBERT pretrained model: https://github.com/Priyadarshini-Rai/Pathomap-Model. BioSentVec-based abstract classification model: https://github.com/Priyadarshini-Rai/Pathomap-Model. Pathomap R package: https://github.com/Priyadarshini-Rai/Pathomap.


Subjects
Data Mining , Humans , Data Mining/methods , Computational Biology/methods , Natural Language Processing
4.
Genome Res ; 31(4): 689-697, 2021 04.
Article in English | MEDLINE | ID: mdl-33674351

ABSTRACT

Systematic delineation of complex biological systems is an ever-challenging and resource-intensive process. Single-cell transcriptomics allows us to study cell-to-cell variability in complex tissues at an unprecedented resolution. Accurate modeling of gene expression plays a critical role in the statistical determination of tissue-specific gene expression patterns. In the past few years, considerable efforts have been made to identify appropriate parametric models for single-cell expression data. The zero-inflated version of Poisson/negative binomial and log-normal distributions have emerged as the most popular alternatives owing to their ability to accommodate high dropout rates, as commonly observed in single-cell data. Although the majority of the parametric approaches directly model expression estimates, we explore the potential of modeling expression ranks, as robust surrogates for transcript abundance. Here we examined the performance of the discrete generalized beta distribution (DGBD) on real data and devised a Wald-type test for comparing gene expression across two phenotypically divergent groups of single cells. We performed a comprehensive assessment of the proposed method to understand its advantages compared with some of the existing best-practice approaches. We concluded that besides striking a reasonable balance between Type I and Type II errors, ROSeq, the proposed differential expression test, is exceptionally robust to expression noise and scales rapidly with increasing sample size. For wider dissemination and adoption of the method, we created an R package called ROSeq and made it available on the Bioconductor platform.


Subjects
Gene Expression Profiling , RNA-Seq , Single-Cell Analysis , Transcriptome
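
The abstract above models expression ranks with the discrete generalized beta distribution (DGBD) but does not give its parametrization. A commonly used form is f(r) = A (N + 1 - r)^b / r^a for ranks r = 1..N, where A normalizes the probabilities. The sketch below assumes that form; the function name and the values of `a` and `b` are illustrative assumptions, and this is not the ROSeq implementation.

```python
from typing import List


def dgbd_pmf(n_ranks: int, a: float, b: float) -> List[float]:
    """Return normalized DGBD probabilities for ranks 1..n_ranks.

    Assumed form: f(r) = A * (N + 1 - r)**b / r**a, with A chosen so the
    probabilities sum to 1. The parameters a and b are hypothetical here.
    """
    if n_ranks < 1:
        raise ValueError("n_ranks must be at least 1")
    # Unnormalized weight for each rank r.
    weights = [((n_ranks + 1 - r) ** b) / (r ** a) for r in range(1, n_ranks + 1)]
    total = sum(weights)
    # Dividing by the total plays the role of the normalizing constant A.
    return [w / total for w in weights]


# Illustrative parameter values only (not taken from the paper).
pmf = dgbd_pmf(n_ranks=5, a=1.0, b=0.5)
```

With positive `a` and `b`, probability mass decreases monotonically from the first rank to the last, which is the shape a rank-based model of expression would be expected to capture.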
5.
Chembiochem ; 25(1): e202300577, 2024 01 02.
Article in English | MEDLINE | ID: mdl-37874183

ABSTRACT

The cellular genome is considered a dynamic blueprint of a cell since it encodes genetic information that gets temporally altered due to various endogenous and exogenous insults. Largely, the extent of genomic dynamicity is controlled by the trade-off between DNA repair processes and the genotoxic potential of the causative agent (genotoxins or potential carcinogens). A subset of genotoxins form DNA adducts by covalently binding to the cellular DNA, triggering structural or functional changes that lead to significant alterations in cellular processes via genetic (e.g., mutations) or non-genetic (e.g., epigenome) routes. Identification, quantification, and characterization of DNA adducts are indispensable for their comprehensive understanding and could expedite the ongoing efforts in predicting carcinogenicity and their mode of action. In this review, we elaborate on using Artificial Intelligence (AI)-based modeling in adduct biology and present multiple computational strategies to gain advancements in decoding DNA adducts. The proposed AI-based strategies encompass predictive modeling for adduct formation via metabolic activation, novel adducts' identification, prediction of biochemical routes for adduct formation, adducts' half-life predictions within biological ecosystems, and establishing methods to predict the link between adduct chemistry and its location within the genomic DNA. In summary, we discuss some futuristic AI-based approaches in DNA adduct biology.


Subjects
DNA Adducts , Ecosystem , Artificial Intelligence , Mutagens , DNA/genetics
6.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35868454

ABSTRACT

Artificial intelligence (AI)-based computational techniques allow rapid exploration of the chemical space. However, representation of the compounds into computational-compatible and detailed features is one of the crucial steps for quantitative structure-activity relationship (QSAR) analysis. Recently, graph-based methods are emerging as a powerful alternative to chemistry-restricted fingerprints or descriptors for modeling. Although graph-based modeling offers multiple advantages, its implementation demands in-depth domain knowledge and programming skills. Here we introduce deepGraphh, an end-to-end web service featuring a conglomerate of established graph-based methods for model generation for classification or regression tasks. The graphical user interface of deepGraphh supports highly configurable parameter support for model parameter tuning, model generation, cross-validation and testing of the user-supplied query molecules. deepGraphh supports four widely adopted methods for QSAR analysis, namely, graph convolution network, graph attention network, directed acyclic graph and Attentive FP. Comparative analysis revealed that deepGraphh supported methods are comparable to the descriptors-based machine learning techniques. Finally, we used deepGraphh models to predict the blood-brain barrier permeability of human and microbiome-generated metabolites. In summary, deepGraphh offers a one-stop web service for graph-based methods for chemoinformatics.


Subjects
Artificial Intelligence , Quantitative Structure-Activity Relationship , Humans , Machine Learning
7.
Nat Chem Biol ; 18(11): 1204-1213, 2022 11.
Article in English | MEDLINE | ID: mdl-35953549

ABSTRACT

The genome of a eukaryotic cell is often vulnerable to both intrinsic and extrinsic threats owing to its constant exposure to a myriad of heterogeneous compounds. Despite the availability of innate DNA damage responses, some genomic lesions trigger malignant transformation of cells. Accurate prediction of carcinogens is an ever-challenging task owing to the limited information about bona fide (non-)carcinogens. We developed Metabokiller, an ensemble classifier that accurately recognizes carcinogens by quantitatively assessing their electrophilicity, their potential to induce proliferation, oxidative stress, genomic instability, epigenome alterations, and anti-apoptotic response. Concomitant with the carcinogenicity prediction, Metabokiller is fully interpretable and outperforms existing best-practice methods for carcinogenicity prediction. Metabokiller unraveled potential carcinogenic human metabolites. To cross-validate Metabokiller predictions, we performed multiple functional assays using Saccharomyces cerevisiae and human cells with two Metabokiller-flagged human metabolites, namely 4-nitrocatechol and 3,4-dihydroxyphenylacetic acid, and observed high synergy between Metabokiller predictions and experimental validations.


Subjects
Artificial Intelligence , Carcinogens , Humans , Carcinogens/toxicity , 3,4-Dihydroxyphenylacetic Acid , Cell Transformation, Neoplastic/genetics , Genomic Instability
8.
PLoS Comput Biol ; 19(4): e1010995, 2023 04.
Article in English | MEDLINE | ID: mdl-37068117

ABSTRACT

Our understanding of how speed and persistence of cell migration affects the growth rate and size of tumors remains incomplete. To address this, we developed a mathematical model wherein cells migrate in two-dimensional space, divide, die or intravasate into the vasculature. Exploring a wide range of speed and persistence combinations, we find that tumor growth positively correlates with increasing speed and higher persistence. As a biologically relevant example, we focused on Golgi fragmentation, a phenomenon often linked to alterations of cell migration. Golgi fragmentation was induced by depletion of Giantin, a Golgi matrix protein, the downregulation of which correlates with poor patient survival. Applying the experimentally obtained migration and invasion traits of Giantin depleted breast cancer cells to our mathematical model, we predict that loss of Giantin increases the number of intravasating cells. This prediction was validated, by showing that circulating tumor cells express significantly less Giantin than primary tumor cells. Altogether, our computational model identifies cell migration traits that regulate tumor progression and uncovers a role of Giantin in breast cancer progression.


Subjects
Breast Neoplasms , Membrane Proteins , Humans , Female , Membrane Proteins/metabolism , Golgi Matrix Proteins/metabolism , Breast Neoplasms/metabolism , Golgi Apparatus/metabolism , Golgi Apparatus/pathology
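
The abstract above describes cells migrating in two-dimensional space with a given speed and persistence, but it does not state the update rules. As a minimal sketch of how those two traits can be parametrized, the code below simulates a single persistent random walk. The reorientation rule (keep the current heading with probability `persistence`) and all names are assumptions for illustration; division, death and intravasation, which the published model includes, are left out.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def persistent_walk(steps: int, speed: float, persistence: float, seed: int = 0) -> List[Point]:
    """Simulate one cell moving in 2D and return its positions.

    Assumed rule: at each step the cell keeps its heading with probability
    `persistence` and otherwise picks a new uniformly random direction.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    heading = rng.uniform(0.0, 2.0 * math.pi)
    path = [(x, y)]
    for _ in range(steps):
        # Reorient with probability (1 - persistence).
        if rng.random() >= persistence:
            heading = rng.uniform(0.0, 2.0 * math.pi)
        x += speed * math.cos(heading)
        y += speed * math.sin(heading)
        path.append((x, y))
    return path


def net_displacement(path: List[Point]) -> float:
    """Straight-line distance between the first and last positions."""
    (x0, y0), (x1, y1) = path[0], path[-1]
    return math.hypot(x1 - x0, y1 - y0)


# A fully persistent cell travels in a straight line; a non-persistent one wanders.
straight = net_displacement(persistent_walk(steps=100, speed=2.0, persistence=1.0))
wandering = net_displacement(persistent_walk(steps=100, speed=2.0, persistence=0.0))
```

Under this rule, higher speed and higher persistence both increase how far a cell gets from its origin in a fixed number of steps, which is the kind of trait combination the abstract says was explored.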
9.
J Biol Chem ; 298(8): 102177, 2022 08.
Article in English | MEDLINE | ID: mdl-35753349

ABSTRACT

Cancers are caused by genomic alterations that may be inherited, induced by environmental carcinogens, or caused due to random replication errors. Postinduction of carcinogenicity, mutations further propagate and drastically alter the cancer genomes. Although a subset of driver mutations has been identified and characterized to date, most cancer-related somatic mutations are indistinguishable from germline variants or other noncancerous somatic mutations. Thus, such overlap impedes appreciation of many deleterious but previously uncharacterized somatic mutations. The major bottleneck arises due to patient-to-patient variability in mutational profiles, making it difficult to associate specific mutations with a given disease outcome. Here, we describe a newly developed technique Continuous Representation of Codon Switches (CRCS), a deep learning-based method that allows us to generate numerical vector representations of mutations, thereby enabling numerous machine learning-based tasks. We demonstrate three major applications of CRCS; first, we show how CRCS can help detect cancer-related somatic mutations in the absence of matched normal samples, which has applications in cell-free DNA-based assessment of tumor mutation burden. Second, the proposed approach also enables identification and exploration of driver genes; our analyses implicate DMD, RSK4, OFD1, WDR44, and AFF2 as potential cancer drivers. Finally, we used CRCS to score individual mutations in a tumor sample, which was found to be predictive of patient survival in bladder urothelial carcinoma, hepatocellular carcinoma, and lung adenocarcinoma. Taken together, we propose CRCS as a valuable computational tool for analysis of the functional significance of individual cancer mutations.


Subjects
Carcinoma, Transitional Cell , Deep Learning , Neoplasms , Urinary Bladder Neoplasms , Genomics/methods , Humans , Mutation , Neoplasms/genetics
10.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34184038

ABSTRACT

Dramatic genomic alterations, either inducible or in a pathological state, dismantle the core regulatory networks, leading to the activation of normally silent genes. Despite possessing immense therapeutic potential, accurate detection of these transcripts is an ever-challenging task, as it requires prior knowledge of the physiological gene expression levels. Here, we introduce EcTracker, an R-/Shiny-based single-cell data analysis web server that bestows a plethora of functionalities that collectively enable the quantitative and qualitative assessments of bona fide cell types or tissue-specific transcripts and, conversely, the ectopically expressed genes in the single-cell ribonucleic acid sequencing datasets. Moreover, it also allows regulon analysis to identify the key transcriptional factors regulating the user-selected gene signatures. To demonstrate the EcTracker functionality, we reanalyzed the CRISPR interference (CRISPRi) dataset of the human embryonic stem cells differentiated into endoderm lineage and identified the prominent enrichment of a specific gene signature in the SMAD2 knockout cells whose identity was ambiguous in the original study. The key distinguishing features of EcTracker lie within its processing speed, availability of multiple add-on modules, interactive graphical user interface and comprehensiveness. In summary, EcTracker provides an easy-to-perform, integrative and end-to-end single-cell data analysis platform that allows decoding of cellular identities, identification of ectopically expressed genes and their regulatory networks, and therefore, collectively imparts a novel dimension for analyzing single-cell datasets.


Subjects
Computational Biology , Ectopic Gene Expression , RNA-Seq , Single-Cell Analysis , Software , Computational Biology/methods , Databases, Genetic , Gene Expression Profiling , Gene Regulatory Networks , Humans , Organ Specificity , Single-Cell Analysis/methods , Transcription Factors/metabolism , User-Computer Interface , Web Browser
11.
Brief Bioinform ; 22(2): 873-881, 2021 03 22.
Article in English | MEDLINE | ID: mdl-32810867

ABSTRACT

A prominent clinical symptom of 2019-novel coronavirus (nCoV) infection is hyposmia/anosmia (decrease or loss of sense of smell), along with general symptoms such as fatigue, shortness of breath, fever and cough. The identity of the cell lineages that underpin the infection-associated loss of olfaction could be critical for the clinical management of 2019-nCoV-infected individuals. Recent research has confirmed the role of angiotensin-converting enzyme 2 (ACE2) and transmembrane protease serine 2 (TMPRSS2) as key host-specific cellular moieties responsible for the cellular entry of the virus. Accordingly, the ongoing medical examinations and the autopsy reports of the deceased individuals indicate that organs/tissues with high expression levels of ACE2, TMPRSS2 and other putative viral entry-associated genes are most vulnerable to the infection. We studied if anosmia in 2019-nCoV-infected individuals can be explained by the expression patterns associated with these host-specific moieties across the known olfactory epithelial cell types, identified from a recently published single-cell expression study. Our findings underscore selective expression of these viral entry-associated genes in a subset of sustentacular cells (SUSs), Bowman's gland cells (BGCs) and stem cells of the olfactory epithelium. Co-expression analysis of ACE2 and TMPRSS2 and protein-protein interaction among the host and viral proteins elected regulatory cytoskeleton protein-enriched SUSs as the most vulnerable cell type of the olfactory epithelium. Furthermore, expression, structural and docking analyses of ACE2 revealed the potential risk of olfactory dysfunction in four additional mammalian species, revealing an evolutionarily conserved infection susceptibility. In summary, our findings provide a plausible cellular basis for the loss of smell in 2019-nCoV-infected patients.


Subjects
Anosmia/pathology , COVID-19/complications , Angiotensin-Converting Enzyme 2/metabolism , COVID-19/pathology , COVID-19/virology , Humans , SARS-CoV-2/isolation & purification , Viral Proteins/metabolism , Virus Internalization
12.
Nucleic Acids Res ; 49(3): e13, 2021 02 22.
Article in English | MEDLINE | ID: mdl-33275158

ABSTRACT

Recent advances in single-cell open-chromatin and transcriptome profiling have created a challenge of exploring novel applications with a meaningful transformation of read-counts, which often have high variability in noise and drop-out among cells. Here, we introduce UniPath, for representing single-cells using pathway and gene-set enrichment scores by a transformation of their open-chromatin or gene-expression profiles. The robust statistical approach of UniPath provides high accuracy, consistency and scalability in estimating gene-set enrichment scores for every cell. Its framework provides an easy solution for handling variability in drop-out rate, which can sometimes create artefact due to systematic patterns. UniPath provides an alternative approach of dimension reduction of single-cell open-chromatin profiles. UniPath's approach of predicting temporal-order of single-cells using their pathway enrichment scores enables suppression of covariates to achieve correct order of cells. Analysis of mouse cell atlas using our approach yielded surprising, albeit biologically-meaningful co-clustering of cell-types from distant organs. By enabling an unconventional method of exploiting pathway co-occurrence to compare two groups of cells, our approach also proves to be useful in inferring context-specific regulations in cancer cells. Available at https://reggenlab.github.io/UniPathWeb/.


Subjects
Epigenomics/methods , RNA-Seq/methods , Single-Cell Analysis/methods , Animals , Cell Line, Tumor , Chromatin , Cluster Analysis , Epigenome , Genes , Humans , Mice , Neoplasms/genetics
13.
J Biol Chem ; 297(2): 100956, 2021 08.
Article in English | MEDLINE | ID: mdl-34265305

ABSTRACT

The molecular mechanisms of olfaction, or the sense of smell, are relatively underexplored compared with other sensory systems, primarily because of its underlying molecular complexity and the limited availability of dedicated predictive computational tools. Odorant receptors (ORs) allow the detection and discrimination of a myriad of odorant molecules and therefore mediate the first step of the olfactory signaling cascade. To date, odorant (or agonist) information for the majority of these receptors is still unknown, limiting our understanding of their functional relevance in odor-induced behavioral responses. In this study, we introduce OdoriFy, a Web server featuring powerful deep neural network-based prediction engines. OdoriFy enables (1) identification of odorant molecules for wildtype or mutant human ORs (Odor Finder); (2) classification of user-provided chemicals as odorants/nonodorants (Odorant Predictor); (3) identification of responsive ORs for a query odorant (OR Finder); and (4) interaction validation using Odorant-OR Pair Analysis. In addition, OdoriFy provides the rationale behind every prediction it makes by leveraging explainable artificial intelligence. This module highlights the basis of the prediction of odorants/nonodorants at atomic resolution and for the ORs at amino acid levels. A key distinguishing feature of OdoriFy is that it is built on a comprehensive repertoire of manually curated information of human ORs with their known agonists and nonagonists, making it a highly interactive and resource-enriched Web server. Moreover, comparative analysis of OdoriFy predictions with an alternative structure-based ligand interaction method revealed comparable results. OdoriFy is available freely as a web service at https://odorify.ahujalab.iiitd.edu.in/olfy/.


Subjects
Artificial Intelligence , Odorants , Ligands , Olfactory Receptor Neurons/metabolism , Signal Transduction
14.
Bioinformatics ; 37(12): 1769-1771, 2021 Jul 19.
Article in English | MEDLINE | ID: mdl-33416866

ABSTRACT

SUMMARY: Machine Learning-based techniques are emerging as state-of-the-art methods in chemoinformatics to selectively, effectively and speedily identify biologically relevant molecules from large databases. So far, a multitude of such techniques have been proposed, but unfortunately, due to their sparse availability and their dependency on high-end computational literacy, their wider adoption faces challenges, at least in the context of G-Protein Coupled Receptors (GPCRs)-associated chemosensory research. Here, we report Machine-OlF-Action (MOA), a user-friendly, open-source computational framework that utilizes user-supplied SMILES (simplified molecular input line entry system) of the chemicals, along with their activation status, to synthesize classification models. MOA integrates a number of popular chemical databases collectively harboring approximately 103 million chemical moieties. MOA also facilitates customized screening of user-supplied chemical datasets. A key feature of MOA is its ability to embed molecules based on the similarity of their local neighborhood, by utilizing a state-of-the-art model interpretability framework LIME. We demonstrate the utility of MOA in identifying previously unreported agonists for human and mouse olfactory receptors OR1A1 and MOR174-9 by leveraging the chemical features of their known agonists and non-agonists. In summary, here we develop an ML-powered software playground for performing supervised learning tasks involving chemical compounds. AVAILABILITY AND IMPLEMENTATION: MOA is available for Windows, Mac and Linux operating systems. It is accessible at https://ahuja-lab.in/. Source code, user manual, step-by-step guide and support are available at GitHub (https://github.com/the-ahuja-lab/Machine-Olf-Action). For results, reproducibility and hyperparameters, refer to Supplementary Notes. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

15.
BMC Genomics ; 21(1): 744, 2020 Oct 27.
Article in English | MEDLINE | ID: mdl-33287695

ABSTRACT

BACKGROUND: Early diagnosis is crucial for effective medical management of cancer patients. Tissue biopsy has been widely used for cancer diagnosis, but its invasive nature limits its application, especially when repeated biopsies are needed. Over the past few years, genomic explorations have led to the discovery of various blood-based biomarkers. Tumor Educated Platelets (TEPs) have, of late, generated considerable interest due to their ability to infer tumor existence and subtype accurately. So far, a majority of the studies involving TEPs have offered marker-panels consisting of several hundreds of genes. Profiling large numbers of genes incurs a significant cost, impeding its diagnostic adoption. As such, it is important to construct minimalistic molecular signatures comprising a small number of genes. RESULTS: To address the aforesaid challenges, we analyzed publicly available TEP expression profiles and identified a panel of 11 platelet-genes that reliably discriminates between cancer and healthy samples. To validate its efficacy, we chose non-small cell lung cancer (NSCLC), the most prevalent type of lung malignancy. When applied to platelet-gene expression data from a published study, our machine learning model could accurately discriminate between non-metastatic NSCLC cases and healthy samples. We further experimentally validated the panel on an in-house cohort of metastatic NSCLC patients and healthy controls via real-time quantitative Polymerase Chain Reaction (RT-qPCR) (AUC = 0.97). Model performance was boosted significantly after artificial data-augmentation using the EigenSample method (AUC = 0.99). Lastly, we demonstrated the cancer-specificity of the proposed gene-panel by benchmarking it on platelet transcriptomes from patients with Myocardial Infarction (MI).
CONCLUSION: We demonstrated an end-to-end bioinformatic plus experimental workflow for identifying a minimal set of TEP associated marker-genes that are predictive of the existence of cancers. We also discussed a strategy for boosting the predictive model performance by artificial augmentation of gene expression data.


Subjects
Carcinoma, Non-Small-Cell Lung , Lung Neoplasms , Biomarkers, Tumor/genetics , Blood Platelets , Carcinoma, Non-Small-Cell Lung/diagnosis , Carcinoma, Non-Small-Cell Lung/genetics , Gene Expression Profiling , Humans , Lung Neoplasms/diagnosis , Lung Neoplasms/genetics
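
The abstract above reports classifier performance as AUC (area under the ROC curve). For readers unfamiliar with the metric, the sketch below computes it from its rank-based definition: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as half. The scores are made-up illustrative values, not data from the study, and the function name is hypothetical.

```python
from typing import Sequence


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier scores for three cancer and three healthy samples.
cancer_scores = [0.9, 0.8, 0.4]
healthy_scores = [0.5, 0.2, 0.1]
auc = roc_auc(cancer_scores, healthy_scores)
```

In this toy example 8 of the 9 positive-negative pairs are ordered correctly, so the AUC is 8/9. The pairwise loop is quadratic in the number of samples, which is acceptable for small cohorts; a sort-based rank-sum computation would be the usual choice for larger ones.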
16.
BMC Genomics ; 21(1): 877, 2020 Dec 08.
Article in English | MEDLINE | ID: mdl-33292182

ABSTRACT

An amendment to this paper has been published and can be accessed via the original article.

17.
Bioinformatics ; 2019 Nov 06.
Article in English | MEDLINE | ID: mdl-31693086

ABSTRACT

SUMMARY: DropClust leverages Locality Sensitive Hashing (LSH) to speed up clustering of large-scale single-cell expression data. Here we present the improved dropClust, a complete R package that is fast, interoperable and minimally resource intensive. The new dropClust features a novel batch effect removal algorithm that allows integrative analysis of single-cell RNA-seq (scRNA-seq) datasets. AVAILABILITY AND IMPLEMENTATION: dropClust is freely available at https://github.com/debsin/dropClust as an R package. A lightweight online version of dropClust is available at https://debsinha.shinyapps.io/dropClust/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

18.
Nucleic Acids Res ; 46(6): e36, 2018 04 06.
Article in English | MEDLINE | ID: mdl-29361178

ABSTRACT

Droplet based single cell transcriptomics has recently enabled parallel screening of tens of thousands of single cells. Clustering methods that scale for such high dimensional data without compromising accuracy are scarce. We exploit Locality Sensitive Hashing, an approximate nearest neighbour search technique to develop a de novo clustering algorithm for large-scale single cell data. On a number of real datasets, dropClust outperformed the existing best practice methods in terms of execution time, clustering accuracy and detectability of minor cell sub-types.


Subjects
Algorithms , Cluster Analysis , Computational Biology/methods , Gene Expression Profiling/methods , RNA, Small Cytoplasmic/genetics , Cells, Cultured , HEK293 Cells , Humans , Jurkat Cells , Leukocytes, Mononuclear/cytology , Leukocytes, Mononuclear/metabolism , Megakaryocyte Progenitor Cells/cytology , Megakaryocyte Progenitor Cells/metabolism , RNA, Small Cytoplasmic/classification , Reproducibility of Results , Sequence Analysis, RNA , Single-Cell Analysis/methods
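
The abstract above relies on Locality Sensitive Hashing for approximate nearest-neighbour search but does not say which LSH family is used. The sketch below illustrates the general idea with random-hyperplane hashing, one well-known LSH scheme for angular similarity, chosen here purely for illustration. It is not the dropClust implementation; all function names and the toy expression vectors are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Signature = Tuple[int, ...]


def make_hyperplanes(n_bits: int, dim: int, seed: int = 0) -> List[List[float]]:
    """Draw n_bits random Gaussian hyperplane normals of length dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec: Sequence[float], planes: List[List[float]]) -> Signature:
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(v * w for v, w in zip(vec, plane)) >= 0.0 else 0
        for plane in planes
    )


def build_index(vectors: Sequence[Sequence[float]], planes: List[List[float]]) -> Dict[Signature, List[int]]:
    """Group vector indices into buckets keyed by their bit signature."""
    buckets: Dict[Signature, List[int]] = defaultdict(list)
    for i, vec in enumerate(vectors):
        buckets[signature(vec, planes)].append(i)
    return dict(buckets)


def candidates(query: Sequence[float], index: Dict[Signature, List[int]], planes: List[List[float]]) -> List[int]:
    """Return indices sharing the query's bucket (approximate neighbours)."""
    return index.get(signature(query, planes), [])


# Toy "expression profiles": cells 0 and 1 point in the same direction, cell 2 opposite.
cells = [[1.0, 0.0, 2.0], [2.0, 0.0, 4.0], [-1.0, 0.0, -2.0]]
planes = make_hyperplanes(n_bits=8, dim=3)
index = build_index(cells, planes)
neighbours_of_cell0 = candidates(cells[0], index, planes)
```

Vectors separated by a small angle agree on most bits and tend to share a bucket, so a query only needs to be compared against its bucket rather than the whole dataset. In practice several hash tables are combined to reduce missed neighbours.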
19.
Nucleic Acids Res ; 46(W1): W141-W147, 2018 07 02.
Article in English | MEDLINE | ID: mdl-29788498

ABSTRACT

Owing to the advent of high-throughput single-cell transcriptomics, the past few years have seen exponential growth in the production of gene expression data. Recently, efforts have been made by various research groups to homogenize and store single-cell expression data from a large number of studies. The true value of this ever-increasing data deluge can be unlocked by making it searchable. To this end, we propose CellAtlasSearch, a novel search architecture for high-dimensional expression data, which is massively parallel as well as light-weight, thus infinitely scalable. In CellAtlasSearch, we use a Graphical Processing Unit (GPU) friendly version of Locality Sensitive Hashing (LSH) for unmatched speedup in data processing and query. Currently, CellAtlasSearch features over 300 000 reference expression profiles including both bulk and single-cell data. It enables the user to query individual single-cell transcriptomes and find matching samples from the database along with necessary meta information. CellAtlasSearch aims to assist researchers and clinicians in characterizing unannotated single cells. It also facilitates noise-free, low-dimensional representation of single-cell expression profiles by projecting them on a wide variety of reference samples. The web-server is accessible at: http://www.cellatlassearch.com.


Subjects
Gene Expression Profiling/methods , Search Engine , Single-Cell Analysis/methods , Animals , Cell Line , Humans , Internet , Mice , Neoplastic Cells, Circulating/metabolism , User-Computer Interface