1.
Bioinformatics ; 40(6)2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38889273

ABSTRACT

MOTIVATION: Identifying rare cell types is an important task to capture the heterogeneity of single-cell data, such as scRNA-seq. The widespread availability of such data makes it possible to aggregate multiple samples, corresponding for example to different donors, into the same study. Yet, such aggregated data is often subject to batch effects between samples. Clustering it therefore generally requires the use of data integration methods, which can lead to overcorrection, making the identification of rare cells difficult. We present scCross, a biclustering method identifying rare subpopulations of cells present across multiple single-cell samples. It jointly identifies a group of cells with specific marker genes by relying on a global sum criterion, computed over an entire subpopulation of cells, rather than pairwise comparisons between individual cells. This proves robust with respect to the high variability of scRNA-seq data, in particular batch effects. RESULTS: We show through several case studies that scCross is able to identify rare subpopulations across multiple samples without performing prior data integration. Namely, it identifies a cilium subpopulation with potential new ciliary genes from lung cancer cells, which is not detected by typical alternatives. It also highlights rare subpopulations in human pancreas samples sequenced with different protocols, despite visible shifts in expression levels between batches. We further show that scCross outperforms typical alternatives at identifying a target rare cell type in a controlled experiment with artificially created batch effects. This shows the ability of scCross to efficiently identify rare cell subpopulations characterized by specific genes despite the presence of batch effects. AVAILABILITY AND IMPLEMENTATION: The R and Scala implementation of scCross is freely available on GitHub, at https://github.com/agerniers/scCross/.
A snapshot of the code and the data underlying this article are available on Zenodo, at https://zenodo.org/doi/10.5281/zenodo.10471063.


Subjects
Single-Cell Analysis , Single-Cell Analysis/methods , Humans , Software , Lung Neoplasms/genetics , Algorithms , Cluster Analysis , Sequence Analysis, RNA/methods
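
As a rough illustration of the "global sum criterion" mentioned in the scCross abstract above, the Python sketch below scores a candidate group of cells and genes by a single sum over the whole submatrix, instead of comparing cells pairwise. It is not the scCross implementation (which is in R and Scala), and the abstract does not give the exact objective: the thresholded-sum score, the function names and the toy matrix are all assumptions made for illustration.

```python
def global_sum_score(matrix, cells, genes, threshold):
    """Sum of (expression - threshold) over the selected cells x genes block.

    Hypothetical score: a block is good when its entries are, in total,
    above the threshold, even if individual entries are noisy.
    """
    return sum(matrix[c][g] - threshold for c in cells for g in genes)


def select_cells(matrix, genes, threshold):
    """For a fixed gene set, keep every cell whose row adds a positive amount
    to the global sum. No cell-to-cell distance is ever computed."""
    return [
        c for c, row in enumerate(matrix)
        if sum(row[g] - threshold for g in genes) > 0
    ]


# Toy expression matrix: rows are cells, columns are genes (made-up values).
expr = [
    [5, 6, 0, 1],
    [4, 7, 1, 0],
    [0, 1, 3, 2],
    [1, 0, 2, 3],
]

marker_genes = [0, 1]
rare_cells = select_cells(expr, marker_genes, threshold=2)
print(rare_cells)                                              # [0, 1]
print(global_sum_score(expr, rare_cells, marker_genes, 2))     # 14
```

Because the decision for each cell depends only on its own contribution to the sum, a constant shift affecting one batch changes scores gradually rather than reshuffling nearest-neighbour relations, which is one plausible reading of why a sum-based criterion tolerates batch effects.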
2.
Global Health ; 18(1): 41, 2022 04 18.
Article in English | MEDLINE | ID: mdl-35436927

ABSTRACT

BACKGROUND: Assessing the impact of government responses to Covid-19 is crucial to contain the pandemic and improve preparedness for future crises. We investigate here the impact of non-pharmaceutical interventions (NPIs) and infection threats on the daily evolution of cross-border movements of people during the Covid-19 pandemic. We use a unique database on Facebook users' mobility, and rely on regression and machine learning models to identify the role of infection threats and containment policies. Permutation techniques allow us to compare the impact and predictive power of these two categories of variables. RESULTS: In contrast with studies on within-border mobility, our models point to a stronger importance of containment policies in explaining changes in cross-border traffic as compared with international travel bans and fears of being infected. The latter are proxied by the numbers of Covid-19 cases and deaths at destination. Although the ranking among coercive policies varies across modelling techniques, containment measures in the destination country (such as cancelling of events, restrictions on internal movements and public gatherings), and school closures in the origin country (influencing parental leave) have the strongest impacts on cross-border movements. CONCLUSION: While descriptive in nature, our findings have policy-relevant implications. Cross-border movements of people predominantly consist of labor commuting flows and business travel. These economic and essential flows are marginally influenced by the fear of infection and international travel bans. They are mostly governed by the stringency of internal containment policies and the ability to travel.


Subjects
COVID-19 , Social Media , COVID-19/epidemiology , COVID-19/prevention & control , Europe/epidemiology , Humans , Pandemics/prevention & control , Travel
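
The abstract above says that permutation techniques were used to compare the predictive power of two groups of variables. A minimal sketch of generic permutation importance follows, using only the Python standard library. It is not the authors' code: the fixed toy model, the two features and the mean-squared-error metric are assumptions chosen to keep the example self-contained.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of a callable model over a list of feature rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_features, repeats=10, seed=0):
    """Average increase in error when one feature column is shuffled.

    A large increase means the model relied on that feature; an increase
    near zero means the feature carried little predictive power.
    """
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    importances = []
    for j in range(n_features):
        increases = []
        for _ in range(repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the target
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mse(model, shuffled, targets) - baseline)
        importances.append(sum(increases) / repeats)
    return importances


# Toy data: feature 0 drives the target, feature 1 is pure noise.
data_rng = random.Random(1)
rows = [[float(i), data_rng.random()] for i in range(50)]
targets = [2.0 * r[0] for r in rows]


def toy_model(row):
    """Hypothetical fitted model that uses feature 0 only."""
    return 2.0 * row[0]


print(permutation_importance(toy_model, rows, targets, n_features=2))
```

To compare categories of variables, as the abstract describes, one would shuffle all columns of a category together and compare the resulting error increases; the single-column version above is the simplest case.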
3.
Bioinformatics ; 32(17): i445-i454, 2016 09 01.
Article in English | MEDLINE | ID: mdl-27587661

ABSTRACT

MOTIVATION: Subtyping cancer is key to an improved and more personalized prognosis/treatment. The increasing availability of tumor-related molecular data provides the opportunity to identify molecular subtypes in a data-driven way. Molecular subtypes are defined as groups of samples that have a similar molecular mechanism at the origin of the carcinogenesis. The molecular mechanisms are reflected by subtype-specific mutational and expression features. Data-driven subtyping is a complex problem as subtyping and identifying the molecular mechanisms that drive carcinogenesis are confounded problems. Many current integrative subtyping methods use global mutational and/or expression tumor profiles to group tumor samples in subtypes but do not explicitly extract the subtype-specific features. We therefore present a method that solves both tasks of subtyping and identification of subtype-specific features simultaneously. To this end, our method integrates mutational and expression data while taking into account the clonal properties of carcinogenesis. Key to our method is a formalization of the problem as a rank matrix factorization of ranked data that approaches the subtyping problem as multi-view bi-clustering. RESULTS: We introduce a novel integrative framework to identify subtypes by combining mutational and expression features. The incomparable measurement data is integrated by transformation into ranked data and subtypes are defined as multi-view bi-clusters. We formalize the model using rank matrix factorization, resulting in the SRF algorithm. Experiments on simulated data and the TCGA breast cancer data demonstrate that SRF is able to capture subtle differences that existing methods may miss. AVAILABILITY AND IMPLEMENTATION: The implementation is available at: https://github.com/rankmatrixfactorisation/SRF CONTACT: kathleen.marchal@intec.ugent.be, siegfried.nijssen@cs.kuleuven.be SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Breast Neoplasms/genetics , Mutation , Algorithms , Carcinogenesis , Cluster Analysis , Genetic Association Studies , Humans , Prognosis
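
The abstract above states that incomparable measurements are integrated by transforming them into ranked data. The sketch below shows one common way to do such a transformation in Python: each sample's values are replaced by their within-sample ranks, so that views measured on different scales become comparable. The abstract does not say which ranking scheme SRF uses; average ranks for ties, the function name and the toy values are assumptions.

```python
def rank_transform(values):
    """Replace values by 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of positions holding the same value.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


# Two hypothetical views of the same sample, on very different scales.
expression_view = [120.5, 3.2, 48.0]   # e.g. expression levels of three genes
mutation_view = [0.0, 1.0, 0.0]        # e.g. mutation indicators for three genes

print(rank_transform(expression_view))  # [3.0, 1.0, 2.0]
print(rank_transform(mutation_view))    # [1.5, 3.0, 1.5]
```

After this step both views live on the same rank scale, which is the precondition for factorizing them jointly; the factorization itself is beyond what the abstract specifies and is not sketched here.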
4.
Nucleic Acids Res ; 40(12): e90, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22422841

ABSTRACT

Computationally retrieving biologically relevant cis-regulatory modules (CRMs) is not straightforward. Because of the large number of candidates and the imperfection of the screening methods, many spurious CRMs are detected that are as high scoring as the biologically true ones. Using ChIP information not only reduces the regions in which the binding sites of the assayed transcription factor (TF) should be located, but also restricts the valid CRMs to those that contain the assayed TF (here referred to as applying CRM detection in a query-based mode). In this study, we show that exploiting ChIP information in a query-based way makes in silico CRM detection a much more feasible endeavor. To be able to handle the large datasets, the query-based setting and other specificities proper to CRM detection on ChIP-Seq based data, we developed a novel, powerful CRM detection method, 'CPModule'. By applying it to a well-studied ChIP-Seq data set involved in self-renewal of mouse embryonic stem cells, we demonstrate how our tool can recover combinatorial regulation of five known TFs that are key in the self-renewal of mouse embryonic stem cells. Additionally, we make a number of new predictions on combinatorial regulation of these five key TFs with other TFs documented in TRANSFAC.


Subjects
Chromatin Immunoprecipitation , Regulatory Elements, Transcriptional , Software , Algorithms , Animals , Computer Simulation , Embryonic Stem Cells/metabolism , Gene Expression Regulation , Mice , Nucleotide Motifs , Sequence Analysis, DNA , Transcription Factors/metabolism
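
The "query-based mode" described in the abstract above restricts valid CRMs to those that contain the assayed TF. The Python sketch below shows only that filtering idea on already-enumerated candidates. It is not CPModule, whose search procedure is not described in the abstract; the function name and the placeholder TF labels are hypothetical.

```python
def query_based_filter(candidate_modules, query_tf):
    """Keep only candidate modules (sets of TF names) that include the query TF."""
    return [module for module in candidate_modules if query_tf in module]


# Placeholder candidates; "TF_A" stands in for the TF assayed by ChIP.
candidates = [
    frozenset({"TF_A", "TF_B"}),
    frozenset({"TF_B", "TF_C"}),
    frozenset({"TF_A", "TF_C", "TF_D"}),
]

valid = query_based_filter(candidates, "TF_A")
print(len(valid))  # 2
```

In a real search the constraint would more likely be enforced while candidates are generated, so that modules lacking the query TF are never enumerated; filtering afterwards, as here, only illustrates the effect on the result set.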
5.
Front Artif Intell ; 6: 1124553, 2023.
Article in English | MEDLINE | ID: mdl-37565044

ABSTRACT

This article provides a bird's-eye view of the role of decision trees in machine learning and data science over roughly four decades. It sketches the evolution of decision tree research over the years, describes the broader context in which the research is situated, and summarizes strengths and weaknesses of decision trees in this context. The main goal of the article is to clarify the broad relevance to machine learning and artificial intelligence, both practical and theoretical, that decision trees still have today.

6.
J Chem Inf Model ; 46(2): 597-605, 2006.
Article in English | MEDLINE | ID: mdl-16562988

ABSTRACT

Substructure mining algorithms are important drug discovery tools since they can find substructures that affect physicochemical and biological properties. Current methods, however, only consider a part of all chemical information that is present within a data set of compounds. Therefore, the overall aim of our study was to enable more exhaustive data mining by designing methods that detect all substructures of any size, shape, and level of chemical detail. A means of chemical representation was developed that uses atomic hierarchies, thus enabling substructure mining to consider general and/or highly specific features. As a proof-of-concept, the efficient, multipurpose graph mining system Gaston learned substructures of any size and shape from a mutagenicity data set that was represented in this manner. From these substructures, we extracted a set of only six nonredundant, discriminative substructures that represent relevant biochemical knowledge. Our results demonstrate the individual and synergistic importance of elaborate chemical representation and mining for nonlinear substructures. We conclude that the combination of elaborate chemical representation and Gaston provides an excellent method for 2D substructure mining as this recipe systematically explores all substructures in different levels of chemical detail.


Subjects
Databases as Topic , Models, Theoretical , Mutagens/chemistry , Pharmaceutical Preparations/chemistry , Structure-Activity Relationship , Molecular Structure