1.
BMC Bioinformatics ; 22(1): 557, 2021 Nov 19.
Article in English | MEDLINE | ID: mdl-34798805

ABSTRACT

BACKGROUND: The research landscape of single-cell and single-nuclei RNA-sequencing is evolving rapidly. In particular, this technology has greatly facilitated the detection of rare cells. However, automated, unbiased, and accurate annotation of rare subpopulations remains challenging. Once rare cells are identified in one dataset, it is usually necessary to generate further specific datasets to enrich the analysis (e.g., with samples from other tissues). From a machine learning perspective, the challenge arises because rare-cell subpopulations constitute an imbalanced classification problem. Here we introduce a Machine Learning (ML)-based oversampling method that takes gene expression counts of already identified rare cells as input and generates synthetic cells, which are then used to identify similar (rare) cells in other publicly available experiments. We utilize single-cell synthetic oversampling (sc-SynO), which is based on the Localized Random Affine Shadowsampling (LoRAS) algorithm. The algorithm corrects for the overall imbalance ratio of the minority and majority class. RESULTS: We demonstrate the effectiveness of our method for three independent use cases, each consisting of already published datasets. The first use case identifies cardiac glial cells in snRNA-Seq data (17 nuclei out of 8635). This use case was designed to take a larger imbalance ratio (~1 to 500) into account and only uses single-nuclei data. The second use case jointly uses snRNA-Seq and scRNA-Seq data at a lower imbalance ratio (~1 to 26) in the training step, to investigate whether the algorithm can handle both single-cell capture procedures and to assess the impact of "less" rare cell types. The third dataset refers to the murine data of the Allen Brain Atlas, including more than 1 million cells. For validation purposes only, all datasets have also been analyzed traditionally using common data analysis approaches, such as the Seurat workflow. 
CONCLUSIONS: In comparison to baseline testing without oversampling, our approach identifies rare cells with a robust precision-recall balance, including high accuracy and a low false-positive detection rate. A practical benefit of our algorithm is that it can be readily implemented in other and existing workflows. The code base in R and Python is publicly available at FairdomHub, as well as GitHub, and can easily be transferred to identify other rare-cell types.
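
The oversampling idea described in this abstract can be illustrated with a minimal sketch. This is not the sc-SynO or LoRAS code: it is a simplified stand-in, assuming that synthetic minority samples are built as random convex combinations of noisy "shadow" copies of the known rare cells. As its name suggests, LoRAS additionally works within local neighbourhoods of the minority class, which this sketch omits. All function names and default parameters here are hypothetical.

```python
import random


def imbalance_ratio(n_minority, n_total):
    """Majority-to-minority ratio, e.g. 17 rare nuclei out of 8635."""
    return (n_total - n_minority) / n_minority


def shadow_samples(points, n_shadow, sigma, rng):
    """Create noisy copies of each minority point (Gaussian jitter)."""
    return [
        [value + rng.gauss(0.0, sigma) for value in point]
        for point in points
        for _ in range(n_shadow)
    ]


def random_convex_weights(n, rng):
    """Draw non-negative weights that sum to 1."""
    raw = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(raw)
    return [w / total for w in raw]


def oversample(minority, n_synthetic, n_shadow=5, n_combine=3, sigma=0.05, seed=0):
    """Generate synthetic minority samples (simplified, assumption-laden sketch)."""
    rng = random.Random(seed)
    shadows = shadow_samples(minority, n_shadow, sigma, rng)
    n_features = len(minority[0])
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a few shadow points and blend them with random convex weights.
        chosen = rng.sample(shadows, n_combine)
        weights = random_convex_weights(n_combine, rng)
        synthetic.append([
            sum(w * point[j] for w, point in zip(weights, chosen))
            for j in range(n_features)
        ])
    return synthetic
```

For the first use case in the abstract, `imbalance_ratio(17, 8635)` gives roughly 507, which matches the stated "~1 to 500". In practice each point would be a vector of gene expression counts rather than the two-feature toy vectors used for testing.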


Subjects
RNA , Single-Cell Analysis , Animals , Cluster Analysis , Machine Learning , Mice , RNA/genetics , RNA Sequence Analysis
2.
J Allergy Clin Immunol ; 145(4): 1208-1218, 2020 04.
Article in English | MEDLINE | ID: mdl-31707051

ABSTRACT

BACKGROUND: Fifteen percent of atopic dermatitis (AD) liability-scale heritability could be attributed to 31 susceptibility loci identified by using genome-wide association studies, with only 3 of them (IL13, IL-6 receptor [IL6R], and filaggrin [FLG]) resolved to protein-coding variants. OBJECTIVE: We examined whether a significant portion of unexplained AD heritability is further explained by low-frequency and rare variants in the gene-coding sequence. METHODS: We evaluated common, low-frequency, and rare protein-coding variants using exome chip and replication genotype data of 15,574 patients and 377,839 control subjects combined with whole-transcriptome data on lesional, nonlesional, and healthy skin samples of 27 patients and 38 control subjects. RESULTS: An additional 12.56% (SE, 0.74%) of AD heritability is explained by rare protein-coding variation. We identified docking protein 2 (DOK2) and CD200 receptor 1 (CD200R1) as novel genome-wide significant susceptibility genes. Rare coding variants associated with AD are further enriched in 5 genes (IL-4 receptor [IL4R], IL13, Janus kinase 1 [JAK1], JAK2, and tyrosine kinase 2 [TYK2]) of the IL13 pathway, all of which are targets for novel systemic AD therapeutics. Multiomics-based network and RNA sequencing analysis revealed DOK2 as a central hub interacting with, among others, CD200R1, IL6R, and signal transducer and activator of transcription 3 (STAT3). Multitissue gene expression profile analysis for 53 tissue types from the Genotype-Tissue Expression project showed that disease-associated protein-coding variants exert their greatest effect in skin tissues. CONCLUSION: Our discoveries highlight a major role of rare coding variants in AD acting independently of common variants. Further extensive functional studies are required to detect all potential causal variants and to specify the contribution of the novel susceptibility genes DOK2 and CD200R1 to overall disease susceptibility.


Subjects
Signal Transducing Adaptor Proteins/genetics , Atopic Dermatitis/genetics , Genotype , Orexin Receptors/genetics , Phosphoproteins/genetics , Skin/metabolism , Adult , Cohort Studies , Filaggrin Proteins , Gene Frequency , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Organ Specificity , Genetic Polymorphism , Risk , Transcriptome
3.
JHEP Rep ; 6(2): 100988, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38304234

ABSTRACT

Background & Aims: Genetic and microbiome studies across patients with primary sclerosing cholangitis (PSC) and ulcerative colitis (UC) have indicated that UC in PSC is a separate disease entity from primary UC, but expression studies for PSC are lacking. Methods: We conducted whole blood RNA sequencing experiments for 495 patients with UC, 220 patients with PSC (including 177 with UC), and 320 healthy controls from Germany and Norway. Differential expression analyses, gene ontology and coexpression analyses, and random forest machine learning were performed to identify genes, ontologies and transcriptional features that discriminate diagnoses. Results: The blood transcriptome in UC and PSC is dominated by neutrophil activation genes (e.g. S100A12). In UC, but not in PSC (neither PSC alone nor patients with an additional diagnosis of UC [PSC/UC]), ribosomal, mitochondrial, and energy metabolism genes are upregulated in conjunction with antibody transcript expression (MZB1, IGJ). In PSC, there is an increase in modules related to apoptosis and expression of genes of interferon-I-related ontologies. Random forest analysis discriminated poorly between PSC alone and PSC/UC (AUROC 0.56), but discriminated PSC, UC, and controls with high accuracy (AUROC UC vs. controls 0.95, PSC vs. controls 0.88, UC vs. PSC 0.986). The main coexpression modules relevant for distinguishing PSC, UC, and controls are enriched in neutrophil degranulation and antibody production genes. Conclusions: Supported by machine learning results, PSC and UC appear to be separate entities on a molecular level, while PSC/UC and PSC are indistinguishable. Impact and implications: Clinical and genetic studies suggest that the colitis-like symptoms in primary sclerosing cholangitis (PSC) represent a different disease entity from primary ulcerative colitis (UC). 
The present study supports this assumption with transcriptomic data from whole blood and describes notable differences in gene expression between primary UC and PSC, providing insights into the still unclear pathophysiology of both diseases. These findings are of interest to scientists seeking to decipher the molecular pathophysiology of both diseases and provide evidence that a redefinition of the PSC-UC phenotype should be considered. The study practically supports future molecular research by providing a large transcriptomic whole blood reference cohort.
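
The AUROC values quoted in this abstract can be read as the probability that a randomly chosen positive sample receives a higher classifier score than a randomly chosen negative one, with ties counted as one half. The sketch below computes that quantity directly from pairwise comparisons. It is a generic illustration with made-up inputs, not the study's analysis code, and the function name is hypothetical.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: classifier scores, higher meaning "more likely positive".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half
    return wins / (len(positives) * len(negatives))
```

On this scale 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why an AUROC of 0.56 for PSC alone versus PSC/UC is read as near-indistinguishable while 0.95 for UC versus controls indicates strong discrimination. The pairwise loop is quadratic in sample count; a rank-based formulation would be preferable for large cohorts.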

4.
Nutr Diabetes ; 12(1): 27, 2022 05 27.
Article in English | MEDLINE | ID: mdl-35624098

ABSTRACT

BACKGROUND: Studies on Type-2 Diabetes Mellitus (T2DM) have revealed heterogeneous sub-populations in terms of underlying pathologies. However, the identification of sub-populations in epidemiological datasets remains unexplored. Here we focus on the detection of T2DM clusters in epidemiological data, specifically analysing the National Family Health Survey-4 (NFHS-4) dataset from India, which contains a wide spectrum of features, including medical history, dietary and addiction habits, and socio-economic and lifestyle patterns of 10,125 T2DM patients. METHODS: Epidemiological data are challenging to analyse because of the diverse types of features they contain. For this reason, applying the state-of-the-art dimension reduction tool UMAP conventionally was found to be ineffective for the NFHS-4 dataset. We implemented a distributed clustering workflow combining different similarity measure settings of UMAP, clustering continuous, ordinal and nominal features separately. We integrated the reduced dimensions from each feature-type-distributed clustering to obtain interpretable and unbiased clustering of the data. RESULTS: Our analysis reveals four significant clusters, two of which comprise mainly non-obese T2DM patients. These non-obese clusters have a lower mean age and consist mostly of rural residents. Surprisingly, in one of the obese clusters, 90% of the T2DM patients practised a non-vegetarian diet, though they did not show an increased intake of plant-based protein-rich foods. CONCLUSIONS: From a methodological perspective, we show that for diverse data types, frequent in epidemiological datasets, feature-type-distributed clustering using UMAP is effective as opposed to the conventional use of the UMAP algorithm. The application of a UMAP-based clustering workflow to this type of dataset is novel in itself. 
Our findings demonstrate the presence of heterogeneity among Indian T2DM patients with regard to socio-demography and dietary patterns. From our analysis, we conclude that the existence of significant non-obese T2DM sub-populations characterized by younger age groups and economic disadvantage raises the need for different screening criteria for T2DM among rural Indian residents.


Subjects
Type 2 Diabetes Mellitus , Unsupervised Machine Learning , Type 2 Diabetes Mellitus/epidemiology , Diet , Humans , India/epidemiology , Obesity
5.
J Pers Med ; 12(8)2022 Aug 04.
Article in English | MEDLINE | ID: mdl-36013227

ABSTRACT

AI model development for synthetic data generation to improve Machine Learning (ML) methodologies is an integral part of research in Computer Science and is currently being transferred to related medical fields, such as Systems Medicine and Medical Informatics. In general, the idea of personalized decision-making support based on patient data has motivated researchers in the medical domain for more than a decade, but the overall sparsity and scarcity of data are still major limitations. This is in contrast to currently applied technology that allows us to generate and analyze patient data in diverse forms, such as tabular data on health records, medical images, genomics data, or even audio and video. One solution arising to overcome these data limitations in relation to medical records is the synthetic generation of tabular data based on real-world data. Consequently, ML-assisted decision support can be interpreted more conveniently, using more relevant patient data at hand. At a methodological level, several state-of-the-art ML algorithms generate and derive decisions from such data. However, there remain key issues that hinder a broad practical implementation in real-life clinical settings. In this review, we give, for the first time, insights into current perspectives and potential impacts of using synthetic data generation in palliative care screening, a challenging prime example of highly individualized, sparsely available patient information. Taken together, the reader will obtain initial starting points and suitable solutions relevant for generating and using synthetic data for ML-based screenings in palliative care and beyond.

6.
Commun Biol ; 5(1): 80, 2022 01 20.
Article in English | MEDLINE | ID: mdl-35058554

ABSTRACT

Genetic correlations and an increased incidence of psychiatric disorders in inflammatory bowel disease have been reported, but shared molecular mechanisms are unknown. We performed cross-tissue and multiple-gene conditioned transcriptome-wide association studies for 23 tissues of the gut-brain axis using genome-wide association study data sets (total 180,592 patients) for Crohn's disease, ulcerative colitis, primary sclerosing cholangitis, schizophrenia, bipolar disorder, major depressive disorder and attention-deficit/hyperactivity disorder. We identified NR5A2, SATB2, and PPP3CA (encoding a target for calcineurin inhibitors in refractory ulcerative colitis) as shared susceptibility genes with transcriptome-wide significance for Crohn's disease, ulcerative colitis, and schizophrenia, largely explaining fine-mapped association signals at nearby genome-wide association study susceptibility loci. Analysis of bulk and single-cell RNA-sequencing data showed that PPP3CA expression was strongest in neurons and in enteroendocrine and Paneth-like cells of the ileum, colon, and rectum, indicating a possible link to the gut-brain axis. PPP3CA together with three further suggestive loci can be linked to calcineurin-related signaling pathways such as NFAT activation or Wnt.


Subjects
Genetic Predisposition to Disease , Genome-Wide Association Study , Inflammatory Bowel Diseases/genetics , Schizophrenia/genetics , Transcriptome , Brain-Gut Axis/physiology , Humans , Inflammatory Bowel Diseases/metabolism , Registries , Schizophrenia/metabolism , Tissue Distribution
7.
Biomolecules ; 11(11)2021 10 27.
Article in English | MEDLINE | ID: mdl-34827589

ABSTRACT

For any molecule, network, or process of interest, keeping up with new publications is becoming increasingly difficult. For many cellular processes, the number of molecules and interactions that need to be considered can be very large. Automated mining of publications can support large-scale molecular interaction maps and database curation. Text mining and Natural-Language-Processing (NLP)-based techniques are finding applications in mining the biological literature, handling problems such as Named Entity Recognition (NER) and Relationship Extraction (RE). Both rule-based and Machine-Learning (ML)-based NLP approaches have been popular in this context, with multiple research and review articles examining the scope of such models in Biological Literature Mining (BLM). In this review article, we explore self-attention-based models, a special type of Neural-Network (NN)-based architecture that has recently revitalized the field of NLP, applied to biological texts. We cover self-attention models operating at either the sentence level or the abstract level, in the context of molecular interaction extraction, published from 2019 onwards. We conducted a comparative study of the models in terms of their architecture. Moreover, we discuss some limitations in the field of BLM that point to opportunities for the extraction of molecular interactions from biological text.
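
The self-attention mechanism underlying the models reviewed in this article can be sketched in a few lines. The example below implements scaled dot-product self-attention in plain Python. As a simplifying assumption it uses the token vectors themselves as queries, keys, and values, whereas real models apply learned projection matrices and multiple heads. It is a generic illustration, not code from any of the reviewed models.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def self_attention(tokens):
    """Scaled dot-product self-attention with Q = K = V = tokens.

    tokens: list of equal-length embedding vectors, one per token.
    Returns (outputs, weights): the attended vectors and the
    attention weight each token assigns to every token.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in tokens:
        # Similarity of this token to every token, scaled by sqrt(dim).
        scores = [dot(query, key) / scale for key in tokens]
        attn = softmax(scores)
        weights.append(attn)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([
            sum(w * value[j] for w, value in zip(attn, tokens))
            for j in range(dim)
        ])
    return outputs, weights
```

Each row of `weights` sums to 1, so every output vector is a weighted average over the whole sequence. In an interaction-extraction setting this lets the representation of one entity mention draw on a distant mention in the same sentence or abstract.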


Subjects
Data Mining , Natural Language Processing , Machine Learning
8.
Cells ; 10(9)2021 09 02.
Article in English | MEDLINE | ID: mdl-34571938

ABSTRACT

Promising efforts are ongoing to extend genomics resources for pikeperch (Sander lucioperca), a species of high interest for the sustainable European aquaculture sector. Although previous work, including reference genome assembly, transcriptome sequence, and single-nucleotide polymorphism genotyping, added a great wealth of genomic tools, a comprehensive characterization of gene expression across major tissues in pikeperch remains an unmet research need. Here, we used deep RNA-Sequencing of ten vital tissues collected from eight animals to build a high-confidence, annotated transcriptome atlas, to detect the tissue specificity of gene expression and co-expression network modules, and to investigate genome-wide selective signatures in the Percidae fish family. Pathway enrichment and protein-protein interaction network analyses were performed to characterize the unique biological functions of tissue-specific genes and co-expression modules. We detected strong functional correlations and similarities of tissues with respect to their expression patterns, but also significant differences in the complexity and composition of their transcriptomes. Moreover, functional analyses revealed that tissue-specific genes play key roles in the specific physiological functions of the respective tissues. Identified network modules were also functionally coherent with the tissues' main physiological functions. Although tissue specificity was not associated with positive selection, several genes under selection were found to be involved in hypoxia, immunity, and gene regulation processes that are crucial for fish adaptation and welfare. Overall, these new resources and insights will not only enhance the understanding of mechanisms of organ biology in pikeperch, but also add to the genomic resources for this commercial species.
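
The abstract does not say which measure was used to quantify tissue specificity. Purely as an illustration, the sketch below computes the tau index, one commonly used specificity score, for a single gene's expression across tissues. Treating tau as the measure is an assumption made here, and the function name and example values are hypothetical.

```python
def tau_index(expression):
    """Tau tissue-specificity index for one gene.

    expression: non-negative expression values, one per tissue.
    Returns a value in [0, 1]: 0 means equal expression in all
    tissues, 1 means expression in exactly one tissue.
    """
    if len(expression) < 2:
        raise ValueError("need expression values for at least two tissues")
    peak = max(expression)
    if peak <= 0:
        return 0.0  # gene not expressed anywhere; specificity undefined, report 0
    # Sum of each tissue's shortfall relative to the peak tissue.
    shortfall = sum(1.0 - value / peak for value in expression)
    return shortfall / (len(expression) - 1)
```

With ten tissues, as in this study, a gene expressed in only one tissue scores 1.0 and a gene expressed uniformly scores 0.0. Applications of this index usually compute it on normalized, often log-transformed, expression values, which this sketch leaves to the caller.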


Subjects
Gene Expression Regulation , Gene Regulatory Networks , Perches/metabolism , Protein Interaction Maps , Genetic Selection , Transcriptome , Animals , Genome , Molecular Sequence Annotation , Organ Specificity , Perches/genetics