1.
BioData Min ; 16(1): 13, 2023 Mar 27.
Article in English | MEDLINE | ID: mdl-36973746

ABSTRACT

MOTIVATION: Clustering of genetic sequences is one of the key parts of bioinformatics analyses. The resulting phylogenetic trees help answer many research questions, including tracing the history of species, studying past migration, or tracing the source of a virus outbreak. At the same time, biologists increasingly provide data as raw reads or only as contig-level assemblies. Therefore, tools that can process such data without supervision need to be developed. RESULTS: In this paper, we present a tool for reference-free phylogeny capable of handling data where no mature-level assembly is available. The tool allows distance calculation for raw reads, contigs, and a combination of the two. The tool provides an estimate of the Levenshtein distance between the sequences, which in turn estimates the number of mutations between the organisms. Compared to previous research, the novelty of the method lies in a newly proposed combination of the read and contig measures, a new method for read-contig mapping, and an efficient embedding of contigs.
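
For orientation, the quantity that the tool estimates can be computed exactly on short sequences. The following is a minimal sketch of the textbook dynamic-programming Levenshtein distance, not the estimation procedure from the paper; the function name `levenshtein` is chosen here for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Exact edit distance computed with the classic two-row dynamic program."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]  # deleting i characters of a to reach the empty prefix of b
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]
```

The exact computation takes time proportional to the product of the two sequence lengths, which is impractical at genome scale and is a plausible reason to work with an estimate instead.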

2.
BioData Min ; 13: 13, 2020.
Article in English | MEDLINE | ID: mdl-32905086

ABSTRACT

BACKGROUND: Identification of non-trivial and meaningful patterns in omics data is one of the most important biological tasks. The patterns help to better understand biological systems and interpret experimental outcomes. A well-established method serving to explain such biological data is Gene Set Enrichment Analysis. However, this type of analysis is restricted to a specific type of evaluation. Abstracting from details, the analyst provides a sorted list of genes and ontological annotations of the individual genes; the method outputs a subset of ontological terms enriched in the gene list. Here, in contrast to enrichment analysis, we introduce a new tool/framework that allows the induction of more complex patterns in 2-dimensional binary omics data. This extension allows semantically coherent biclusters to be discovered and described. RESULTS: We present a new rapid method called sem1R that reveals interpretable hidden rules in omics data. These rules capture semantic differences between two classes: a target class as a collection of positive examples and a non-target class containing negative examples. The method is inspired by the CN2 rule learner and introduces a new refinement operator that exploits prior knowledge in the form of ontologies. In our work this knowledge serves to create accurate and interpretable rules. The novel refinement operator uses two reduction procedures, Redundant Generalization and Redundant Non-potential, both of which help to dramatically prune the rule space and consequently speed up the entire process of rule induction in comparison with the traditional refinement operator presented in CN2. CONCLUSIONS: The efficiency and effectiveness of the novel refinement operator were tested on three different real gene expression datasets: the Dresden Ovary Dataset, DISC, and m2816. The experiments show that the ontology-based refinement operator speeds up pattern induction drastically.
The algorithm is written in C++ and is published as an R package available at http://github.com/fmalinka/sem1r.
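
To make the idea of ontology-based rules concrete, the sketch below shows how a rule made of ontology terms can cover positive and negative examples, and how generalizing a term changes that coverage. It is an illustration only and does not reproduce sem1R or its refinement operator: the toy ontology, the term names, and the functions `covers` and `rule_accuracy` are all hypothetical, and plain accuracy is assumed as the scoring measure.

```python
# Hypothetical toy ontology: child term -> parent term (None marks the root).
PARENT = {
    "biological_process": None,
    "development": "biological_process",
    "oogenesis": "development",
    "neurogenesis": "development",
    "metabolism": "biological_process",
}

def ancestors_or_self(term):
    """Return the term together with all of its ancestors."""
    result = set()
    while term is not None:
        result.add(term)
        term = PARENT[term]
    return result

def covers(rule, annotations):
    """A rule (a set of terms) covers an example if every rule term
    equals or generalizes at least one of the example's annotations."""
    reachable = set()
    for annotation in annotations:
        reachable |= ancestors_or_self(annotation)
    return rule <= reachable

def rule_accuracy(rule, positives, negatives):
    """Fraction of covered examples that are positive (0.0 if none are covered)."""
    tp = sum(covers(rule, example) for example in positives)
    fp = sum(covers(rule, example) for example in negatives)
    return tp / (tp + fp) if tp + fp else 0.0

# Each example is the set of ontology terms annotating one gene (made-up data).
positives = [{"oogenesis"}, {"oogenesis", "metabolism"}]
negatives = [{"neurogenesis"}, {"metabolism"}]
```

In this toy data the specific rule `{"oogenesis"}` covers only positives, while the generalized rule `{"development"}` also covers one negative example. A learner that can recognize such unhelpful generalizations early has fewer candidate rules to evaluate.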

3.
BMC Genomics ; 18(Suppl 7): 752, 2017 10 16.
Article in English | MEDLINE | ID: mdl-29513193

ABSTRACT

BACKGROUND: One of the major challenges in the analysis of gene expression data is to identify local patterns composed of genes showing coherent expression across subsets of experimental conditions. Such patterns may provide an understanding of underlying biological processes related to these conditions. This understanding can further be improved by providing concise characterizations of the genes and situations delimiting the pattern. RESULTS: We propose a method called semantic biclustering with the aim of detecting interpretable rectangular patterns in binary data matrices. As usual in biclustering, we seek homogeneous submatrices; however, we also require that the included elements can be jointly described in terms of semantic annotations pertaining to both rows (genes) and columns (samples). To find such interpretable biclusters, we explore two strategies. The first endows an existing biclustering algorithm with the semantic ingredients. The other is based on rule and tree learning known from machine learning. CONCLUSIONS: The two alternatives are tested in experiments with two Drosophila melanogaster gene expression datasets. Both strategies are shown to detect sets of compact biclusters with semantic descriptions that also remain largely valid for unseen (testing) data. This desirable generalization aspect is more pronounced in the strategy stemming from conventional biclustering, although this is traded off against the complexity of the descriptions (number of ontology terms employed), which, on the other hand, is lower for the alternative strategy.


Subjects
Data Mining/methods, Gene Expression Profiling, Semantics, Animals, Cluster Analysis, Drosophila melanogaster/genetics, Machine Learning, Molecular Sequence Annotation
4.
BMC Bioinformatics ; 16: 348, 2015 Oct 28.
Article in English | MEDLINE | ID: mdl-26511329

ABSTRACT

BACKGROUND: Set-level classification of gene expression data has received significant attention recently. In this setting, high-dimensional vectors of features corresponding to genes are converted into lower-dimensional vectors of features corresponding to biologically interpretable gene sets. The dimensionality reduction brings the promise of a decreased risk of overfitting, potentially resulting in improved accuracy of the learned classifiers. However, recent empirical research has not confirmed this expectation. Here we hypothesize that the reported unfavorable classification results in the set-level framework were due to the adoption of unsuitable gene sets, defined typically on the basis of the Gene Ontology and the KEGG database of metabolic networks. We explore an alternative approach to defining gene sets, based on regulatory interactions, which we expect to collect genes with more correlated expression. We hypothesize that such more correlated gene sets will make it possible to learn more accurate classifiers. METHODS: We define two families of gene sets using information on regulatory interactions, and evaluate them on phenotype-classification tasks using public prokaryotic gene expression data sets. From each of the two gene-set families, we first select the best-performing subtype. The two selected subtypes are then evaluated on independent (testing) data sets against state-of-the-art gene sets and against the conventional gene-level approach. RESULTS: The novel gene sets are indeed more correlated than the conventional ones, and lead to significantly more accurate classifiers. CONCLUSION: Novel gene sets defined on the basis of regulatory interactions improve set-level classification of gene expression data.
The experimental scripts and other material needed to reproduce the experiments are available at http://ida.felk.cvut.cz/novelgenesets.tar.gz.
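
The claim that one gene set is "more correlated" than another can be quantified in several ways. The sketch below assumes one simple choice, the mean pairwise Pearson correlation of expression profiles within a set; the abstract does not state which measure the authors used, and the gene names and values are invented for illustration.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long profiles (both must vary)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)

def mean_pairwise_correlation(expression, gene_set):
    """Average Pearson correlation over all gene pairs in the set.

    expression maps a gene name to its expression values across samples;
    the set must contain at least two genes.
    """
    pairs = list(combinations(sorted(gene_set), 2))
    total = sum(pearson(expression[g], expression[h]) for g, h in pairs)
    return total / len(pairs)

# Invented expression profiles over four samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
```

Here `{"geneA", "geneB"}` is perfectly co-expressed, whereas `{"geneA", "geneC"}` is perfectly anti-correlated; whether to use signed or absolute correlations is a further design choice that this sketch leaves open.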


Subjects
Gene Expression Regulation, Prokaryotic Cells/metabolism, Gene Ontology, Machine Learning, Metabolic Networks and Pathways/genetics, Operon/genetics, Transcription Factors/genetics, Transcription Factors/metabolism
5.
Transplantation ; 97(2): 176-83, 2014 Jan 27.
Article in English | MEDLINE | ID: mdl-24092381

ABSTRACT

BACKGROUND: Delayed graft function (DGF) caused by ischemia/reperfusion injury (I/RI) negatively influences the outcome of kidney transplantation. This prospective single-center study characterized the intrarenal transcriptome during I/RI as a means of identifying genes associated with DGF development. METHODS: Characterization of the intrarenal transcription profile associated with I/RI was carried out on three sequential graft biopsies from respective allografts before and during transplantation. The intragraft expression of 92 candidate genes was measured using quantitative real-time reverse transcriptase polymerase chain reaction (2^(-ΔΔCt)) in delayed (n=9) and primary function allografts (n=26). RESULTS: Cold storage was not associated with significant changes to the expression profile of the target gene transcripts; however, up-regulation of 16 genes associated with enhanced activation of innate and adaptive immune responses and apoptosis was observed after reperfusion. Multivariate logistic regression analysis revealed that higher tubular atrophy scores (ct) together with a lower expression of Netrin-1 might predict DGF development (training area under the receiver operating curve=0.89, cross-validated area under the receiver operating curve=0.81). CONCLUSIONS: Poor baseline tubular cell quality (defined by a higher rate of tubular atrophy) combined with the reduced potential of apoptotic survival factors represented by decreased Netrin-1 gene expression were associated with delayed kidney graft function.


Subjects
Delayed Graft Function/etiology, Kidney Transplantation/adverse effects, Kidney Tubules/pathology, Nerve Growth Factors/genetics, Tumor Suppressor Proteins/genetics, Atrophy, Biopsy, Delayed Graft Function/metabolism, Delayed Graft Function/pathology, Gene Expression Regulation, Humans, Immunohistochemistry, Logistic Models, Nerve Growth Factors/analysis, Netrin-1, Principal Component Analysis, Prospective Studies, Reperfusion Injury/complications, Tumor Suppressor Proteins/analysis
6.
Proteome Sci ; 10(1): 66, 2012 Nov 12.
Article in English | MEDLINE | ID: mdl-23146001

ABSTRACT

BACKGROUND: The process of protein-DNA binding has an essential role in the biological processing of genetic information. We use relational machine learning to predict the DNA-binding propensity of proteins from their structures. Automatically discovered structural features are able to capture some characteristic spatial configurations of amino acids in proteins. RESULTS: Prediction based only on structural relational features already achieves results competitive with existing methods based on physicochemical properties on several protein datasets. Predictive performance is further improved when structural features are combined with physicochemical features. Moreover, the structural features provide some insights not revealed by physicochemical features. Our method is able to detect common spatial substructures. We demonstrate this in experiments with zinc finger proteins. CONCLUSIONS: We introduced a novel approach for DNA-binding propensity prediction using relational machine learning, which could potentially also be used for protein function prediction in general.

7.
BMC Bioinformatics ; 13 Suppl 10: S15, 2012 Jun 25.
Article in English | MEDLINE | ID: mdl-22759420

ABSTRACT

BACKGROUND: Analysis of gene expression data in terms of a priori-defined gene sets has recently received significant attention, as this approach typically yields more compact and interpretable results than those produced by traditional methods that rely on individual genes. The set-level strategy can also be adopted with similar benefits in predictive classification tasks accomplished with machine learning algorithms. Initial studies into the predictive performance of set-level classifiers have yielded rather controversial results. The goal of this study is to provide a more conclusive evaluation by testing various components of the set-level framework within a large collection of machine learning experiments. RESULTS: Genuine curated gene sets constitute better features for classification than sets assembled without biological relevance. For identifying the best gene sets for classification, the Global test outperforms the gene-set methods GSEA and SAM-GS as well as two generic feature selection methods. To aggregate expressions of genes into a feature value, the singular value decomposition (SVD) method as well as the SetSig technique improve on simple arithmetic averaging. Set-level classifiers learned with 10 features constituted by the Global test slightly outperform baseline gene-level classifiers learned with all original data features, although they are slightly less accurate than gene-level classifiers learned with a prior feature-selection step. CONCLUSION: Set-level classifiers do not boost predictive accuracy; however, they do achieve competitive accuracy if learned with the right combination of ingredients. AVAILABILITY: Open-source, publicly available software was used for classifier learning and testing. The gene expression datasets and the gene set database used are also publicly available. The full tabulation of experimental results is available at http://ida.felk.cvut.cz/CESLT.
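
The aggregation step described in the abstract can be illustrated with its simplest variant, arithmetic averaging, which the study uses as the baseline that SVD and SetSig improve on. The sketch below assumes a sample is a mapping from gene name to expression value; the function name `set_level_features` and the gene and pathway names are hypothetical, and the SVD and SetSig aggregators are not shown.

```python
def set_level_features(sample, gene_sets):
    """Convert one sample (gene -> expression) into gene-set features
    by averaging the expression of each set's measured member genes."""
    features = {}
    for set_name, genes in gene_sets.items():
        measured = [sample[gene] for gene in genes if gene in sample]
        if measured:  # skip sets with no measured member genes
            features[set_name] = sum(measured) / len(measured)
    return features

# Invented sample and gene sets.
sample = {"g1": 2.0, "g2": 4.0, "g3": 9.0}
gene_sets = {
    "pathway_x": ["g1", "g2"],
    "pathway_y": ["g3", "g4"],  # g4 is not measured in this sample
    "pathway_z": ["g5"],        # no member gene is measured
}
```

Applying the function to every sample yields a feature table with one column per gene set, which can then be passed to any standard classifier in place of the gene-level table.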


Subjects
Algorithms, Artificial Intelligence, Gene Expression Profiling/methods, Bayes Theorem, Computational Biology/methods, Decision Trees, Support Vector Machine
8.
BMC Bioinformatics ; 13 Suppl 10: S3, 2012 Jun 25.
Article in English | MEDLINE | ID: mdl-22759427

ABSTRACT

We contribute a novel, ball-histogram approach to DNA-binding propensity prediction of proteins. Unlike state-of-the-art methods based on constructing an ad-hoc set of features describing physicochemical properties of the proteins, the ball-histogram technique enables a systematic, Monte-Carlo exploration of the spatial distribution of amino acids complying with automatically selected properties. This exploration yields a model for the prediction of DNA binding propensity. We validate our method in prediction experiments, improving on state-of-the-art accuracies. Moreover, our method also provides interpretable features involving spatial distributions of selected amino acids.
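
The Monte-Carlo exploration mentioned above can be pictured with a rough sketch: balls of a fixed radius are dropped at random positions, and the counts of residues with selected properties inside each ball are tallied into a histogram. This is an interpretation for illustration, not the authors' algorithm; the sampling region (the bounding box of the residues), the property labels, the coordinates, and the function name `ball_histogram` are all assumptions.

```python
import random
from collections import Counter

def ball_histogram(residues, properties, radius, n_samples=1000, seed=0):
    """Estimate how often a random ball contains each combination of counts.

    residues: list of (x, y, z, property_label) tuples.
    properties: ordered tuple of labels to count inside each ball.
    Returns a dict mapping a tuple of counts to the fraction of sampled balls.
    """
    rng = random.Random(seed)
    xs = [r[0] for r in residues]
    ys = [r[1] for r in residues]
    zs = [r[2] for r in residues]
    histogram = Counter()
    for _ in range(n_samples):
        # Assumption: ball centers are uniform over the residues' bounding box.
        cx = rng.uniform(min(xs), max(xs))
        cy = rng.uniform(min(ys), max(ys))
        cz = rng.uniform(min(zs), max(zs))
        inside = Counter(
            label
            for x, y, z, label in residues
            if label in properties
            and (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 <= radius ** 2
        )
        histogram[tuple(inside[p] for p in properties)] += 1
    return {counts: hits / n_samples for counts, hits in histogram.items()}

# Invented coordinates and property labels.
residues = [
    (0.0, 0.0, 0.0, "positive"),
    (1.0, 0.0, 0.0, "positive"),
    (0.0, 1.0, 0.0, "polar"),
    (5.0, 5.0, 5.0, "negative"),
]
```

The resulting fractions form a fixed-length description of a protein that a standard classifier could consume, which is one way to read the abstract's remark that the features stay interpretable as spatial distributions of selected amino acids.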


Subjects
Computational Biology/methods, DNA/analysis, Proteins/analysis, Algorithms, Amino Acids/analysis, Theoretical Models, Monte Carlo Method, Protein Binding, Secondary Protein Structure
9.
Transplantation ; 93(6): 589-96, 2012 Mar 27.
Article in English | MEDLINE | ID: mdl-22334040

ABSTRACT

BACKGROUND: Induction therapy is associated with excellent short-term kidney graft outcome. The aim of this study was to evaluate differences in the intragraft transcriptome after successful induction therapy using two rabbit antithymocyte globulins. METHODS: The expression of 376 target genes involved in tolerance, inflammation, T- and B-cell immune response, and apoptosis was evaluated using the quantitative real-time reverse-transcriptase polymerase chain reaction (2^(-ΔΔCt)) method in kidney graft biopsies with normal histological findings and stable renal function, 3 months posttransplantation after induction therapy with Thymoglobulin, ATG-Fresenius S (ATG-F), and a control group without induction therapy. RESULTS: The transcriptional pattern induced by Thymoglobulin differed from ATG-F in 18 differentially expressed genes. Down-regulation of genes involved in the nuclear factor-κB pathway (TLR4, MYD88, and CD209), costimulation (CD80 and CTLA4), apoptosis (NLRP1), chemoattraction (CCR10), and dendritic cell function (CLEC4C) was observed in the biopsies from patients treated with Thymoglobulin. A hierarchical clustering analysis clearly separated the Thymoglobulin group from the ATG-F group, while the control group had a profile similar to that of the Thymoglobulin group. CONCLUSIONS: Despite normal morphology in graft biopsy taken 3 months posttransplantation, the intrarenal transcriptome differed in patients treated with induction therapy using different rATGs. In the Thymoglobulin high-risk group, the transcriptome profile was identical to the low-risk group. Therefore, the down-regulation of the nuclear factor-κB pathway after Thymoglobulin induction in vivo is likely to explain the clinical success of this biologic.
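
The 2^(-ΔΔCt) calculation named in the methods is a standard way to express relative gene expression from qPCR threshold cycles. The sketch below shows the arithmetic under the usual assumption of near-perfect amplification efficiency (a doubling per cycle); the function and parameter names are illustrative, and the choice of reference gene and control group is not taken from the abstract.

```python
def fold_change(ct_target_sample, ct_reference_sample,
                ct_target_control, ct_reference_control):
    """Relative expression of a target gene by the 2^(-ΔΔCt) method.

    Each argument is a threshold cycle (Ct): the target gene and a
    reference (housekeeping) gene, measured in the sample of interest
    and in the control. Assumes the amount doubles every cycle.
    """
    delta_sample = ct_target_sample - ct_reference_sample      # ΔCt, sample
    delta_control = ct_target_control - ct_reference_control   # ΔCt, control
    delta_delta = delta_sample - delta_control                 # ΔΔCt
    return 2.0 ** -delta_delta
```

For example, a target that reaches threshold two cycles earlier in the sample than in the control, with an unchanged reference gene, gives a four-fold up-regulation, while one that crosses a cycle later gives a fold change of 0.5.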


Subjects
Antilymphocyte Serum/pharmacology, Immunosuppression Therapy/methods, Kidney Transplantation/immunology, Kidney Transplantation/physiology, NF-kappa B/metabolism, Signal Transduction/drug effects, Signal Transduction/physiology, Adult, Aged, Animals, Apoptosis, Biopsy, Down-Regulation/drug effects, Down-Regulation/immunology, Down-Regulation/physiology, Female, Follow-Up Studies, Gene Expression Profiling, Graft Rejection/immunology, Graft Rejection/pathology, Graft Rejection/prevention & control, Humans, Kidney/metabolism, Kidney/pathology, Kidney Transplantation/pathology, Male, Middle Aged, NF-kappa B/genetics, Messenger RNA/metabolism, Rabbits, Signal Transduction/immunology
10.
J Biomed Inform ; 37(4): 269-84, 2004 Aug.
Article in English | MEDLINE | ID: mdl-15465480

ABSTRACT

Finding disease markers (classifiers) from gene expression data by machine learning algorithms is characterized by a high risk of overfitting the data due to the abundance of attributes (simultaneously measured gene expression values) and the shortage of available examples (observations). To avoid this pitfall and achieve predictor robustness, state-of-the-art approaches construct complex classifiers that combine relatively weak contributions of up to thousands of genes (attributes) to classify a disease. The complexity of such classifiers limits their transparency and consequently the biological insights they can provide. The goal of this study is to apply to this domain the methodology of constructing simple yet robust logic-based classifiers amenable to direct expert interpretation. On two well-known, publicly available gene expression classification problems, the paper shows the feasibility of this approach, employing a recently developed subgroup discovery methodology. Some of the discovered classifiers allow for novel biological interpretations.
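
Subgroup discovery searches for simple rules whose covered examples have an unusual class distribution, and it needs a quality measure that balances rule coverage against precision. The abstract does not name the measure used in the study; the sketch below assumes weighted relative accuracy (WRAcc), a measure commonly used in subgroup discovery, purely as an illustration.

```python
def wracc(tp, fp, n_pos, n_neg):
    """Weighted relative accuracy of a rule.

    tp, fp: positive and negative examples covered by the rule.
    n_pos, n_neg: all positive and negative examples in the data set.
    WRAcc = coverage * (precision of the rule - overall positive rate).
    """
    n = n_pos + n_neg
    covered = tp + fp
    if covered == 0:
        return 0.0  # a rule that covers nothing carries no information
    return (covered / n) * (tp / covered - n_pos / n)
```

A rule such as "gene X is highly expressed" that covers 8 of 10 diseased and 2 of 10 healthy samples scores about 0.15, while a rule covering both classes equally scores 0, which reflects the preference for rules that are both general and discriminative.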


Subjects
Algorithms, Artificial Intelligence, Gene Expression Profiling/methods, Neoplasms/diagnosis, Neoplasms/genetics, Oligonucleotide Array Sequence Analysis/methods, Automated Pattern Recognition/methods, Tumor Biomarkers/genetics, Computer-Assisted Diagnosis/methods, Genetic Markers/genetics, Genetic Testing/methods, Genetic Variation/genetics, Humans, Genetic Models, Neoplasms/classification, Reproducibility of Results, Sensitivity and Specificity, DNA Sequence Analysis/methods