Search | Biblioteca Virtual en Salud Fronteriza

1.

CamurWeb: a classification software and a large knowledge base for gene expression data of cancer.

Weitschek, Emanuel; Lauro, Silvia Di; Cappelli, Eleonora; Bertolazzi, Paola; Felici, Giovanni.

BMC Bioinformatics ; 19(Suppl 10): 354, 2018 Oct 15.

Article in English | MEDLINE | ID: mdl-30367574

ABSTRACT

BACKGROUND: The high growth of Next Generation Sequencing data currently demands new knowledge extraction methods. In particular, the RNA sequencing gene expression experimental technique stands out for case-control studies on cancer, which can be addressed with supervised machine learning techniques able to extract human interpretable models composed of genes, and their relation to the investigated disease. State of the art rule-based classifiers are designed to extract a single classification model, possibly composed of few relevant genes. Conversely, we aim to create a large knowledge base composed of many rule-based models, and thus determine which genes could be potentially involved in the analyzed tumor. This comprehensive and open access knowledge base is required to disseminate novel insights about cancer. RESULTS: We propose CamurWeb, a new method and web-based software that is able to extract multiple and equivalent classification models in the form of logic formulas ("if then" rules) and to create a knowledge base of these rules that can be queried and analyzed. The method is based on an iterative classification procedure and an adaptive feature elimination technique that enables the computation of many rule-based models related to the cancer under study. Additionally, CamurWeb includes a user-friendly interface for running the software, querying the results, and managing the performed experiments. The user can create her profile, upload her gene expression data, run the classification analyses, and interpret the results with predefined queries. In order to validate the software we apply it to all publicly available RNA sequencing datasets from The Cancer Genome Atlas database, obtaining a large open access knowledge base about cancer. CamurWeb is available at http://bioinformatics.iasi.cnr.it/camurweb.
CONCLUSIONS: The experiments prove the validity of CamurWeb, obtaining many classification models and thus several genes that are associated to 21 different cancer types. Finally, the comprehensive knowledge base about cancer and the software tool are released online; interested researchers have free access to them for further studies and to design biological experiments in cancer research.

Subject(s)

Neoplastic Gene Expression Regulation , Knowledge Bases , Neoplasms/genetics , Software , Base Sequence , Neoplasm Genes , Human Genome , Humans , RNA Sequence Analysis

2.

Combining EEG signal processing with supervised methods for Alzheimer's patients classification.

Fiscon, Giulia; Weitschek, Emanuel; Cialini, Alessio; Felici, Giovanni; Bertolazzi, Paola; De Salvo, Simona; Bramanti, Alessia; Bramanti, Placido; De Cola, Maria Cristina.

BMC Med Inform Decis Mak ; 18(1): 35, 2018 May 31.

Article in English | MEDLINE | ID: mdl-29855305

ABSTRACT

BACKGROUND: Alzheimer's Disease (AD) is a neurodegenerative disorder characterized by a progressive dementia, for which no cure is currently known. An early detection of patients affected by AD can be obtained by analyzing their electroencephalography (EEG) signals, which show a reduction of the complexity, a perturbation of the synchrony, and a slowing down of the rhythms. METHODS: In this work, we apply a procedure that exploits feature extraction and classification techniques to EEG signals, whose aim is to distinguish patients affected by AD from the ones affected by Mild Cognitive Impairment (MCI) and healthy control (HC) samples. Specifically, we perform a time-frequency analysis by applying both the Fourier and Wavelet Transforms on 109 samples belonging to AD, MCI, and HC classes. The classification procedure is designed with the following steps: (i) preprocessing of EEG signals; (ii) feature extraction by means of the Discrete Fourier and Wavelet Transforms; and (iii) classification with tree-based supervised methods. RESULTS: By applying our procedure, we are able to extract reliable human-interpretable classification models that automatically assign patients to their class. In particular, by exploiting a Wavelet feature extraction we achieve 83%, 92%, and 79% of accuracy when dealing with HC vs AD, HC vs MCI, and MCI vs AD classification problems, respectively. CONCLUSIONS: Finally, by comparing the classification performances with both feature extraction methods, we find that Wavelet analysis outperforms Fourier analysis. Hence, we suggest it in combination with supervised methods for automatic patient classification based on EEG signals for aiding the medical diagnosis of dementia.

Subject(s)

Alzheimer Disease/diagnosis , Classification/methods , Cognitive Dysfunction/diagnosis , Electroencephalography/methods , Computer-Assisted Signal Processing , Aged , Aged 80 and over , Alzheimer Disease/physiopathology , Cognitive Dysfunction/physiopathology , Female , Humans , Male , Middle Aged
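
The Fourier-based feature extraction step (ii) in the abstract above can be illustrated with a minimal sketch. The abstract does not say which spectral features the authors use, so the relative band powers, the band limits, and the function names `dft_power` and `band_powers` are assumptions chosen for illustration. A plain O(n²) discrete Fourier transform is used so the example needs only the standard library.

```python
import cmath
import math

def dft_power(signal):
    """Squared DFT magnitude for the non-negative frequency bins."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal))) ** 2
        for k in range(n // 2 + 1)
    ]

def band_powers(signal, sampling_rate, bands):
    """Relative power per frequency band (hypothetical feature set).

    bands maps a band name to a (low_hz, high_hz) half-open interval.
    """
    n = len(signal)
    power = dft_power(signal)
    raw = {
        name: sum(p for k, p in enumerate(power)
                  if low <= k * sampling_rate / n < high)
        for name, (low, high) in bands.items()
    }
    total = sum(raw.values())
    return {name: (value / total if total else 0.0)
            for name, value in raw.items()}

# Commonly used EEG band limits in Hz; exact boundaries vary between studies.
EEG_BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

# Synthetic 10 Hz sine sampled at 128 Hz for one second (illustrative data only).
signal = [math.sin(2 * math.pi * 10 * t / 128) for t in range(128)]
features = band_powers(signal, 128, EEG_BANDS)
```

A shift of relative power from faster to slower bands is one way the "slowing down of the rhythms" could show up in such features; the resulting feature vectors would then be passed to a tree-based classifier.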

3.

TCGA2BED: extracting, extending, integrating, and querying The Cancer Genome Atlas.

Cumbo, Fabio; Fiscon, Giulia; Ceri, Stefano; Masseroli, Marco; Weitschek, Emanuel.

BMC Bioinformatics ; 18(1): 6, 2017 Jan 03.

Article in English | MEDLINE | ID: mdl-28049410

ABSTRACT

BACKGROUND: Data extraction and integration methods are becoming essential to effectively access and take advantage of the huge amounts of heterogeneous genomics and clinical data increasingly available. In this work, we focus on The Cancer Genome Atlas, a comprehensive archive of tumoral data containing the results of high-throughput experiments, mainly Next Generation Sequencing, for more than 30 cancer types. RESULTS: We propose TCGA2BED, a software tool to search and retrieve TCGA data, and convert them into the structured BED format for their seamless use and integration. Additionally, it supports the conversion into CSV, GTF, JSON, and XML standard formats. Furthermore, TCGA2BED extends TCGA data with information extracted from other genomic databases (i.e., NCBI Entrez Gene, HGNC, UCSC, and miRBase). We also provide and maintain an automatically updated data repository with publicly available Copy Number Variation, DNA-methylation, DNA-seq, miRNA-seq, and RNA-seq (V1,V2) experimental data of TCGA converted into the BED format, and their associated clinical and biospecimen metadata in attribute-value text format. CONCLUSIONS: The availability of the valuable TCGA data in BED format reduces the time spent in taking advantage of them: it is possible to efficiently and effectively deal with huge amounts of cancer genomic data integratively, and to search, retrieve and extend them with additional information. The BED format helps investigators by enabling several knowledge discovery analyses on all tumor types in TCGA with the final aim of understanding pathological mechanisms and aiding cancer treatments.

Subject(s)

Neoplasms/genetics , User-Computer Interface , DNA Copy Number Variations , DNA Methylation , Genetic Databases , High-Throughput Nucleotide Sequencing , Humans , Internet , MicroRNAs/chemistry , MicroRNAs/metabolism , Neoplasms/pathology , DNA Sequence Analysis
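
To make the BED conversion mentioned in the abstract above more concrete, here is a minimal sketch of writing one genomic region as a six-column BED line. It follows the general BED convention (tab-separated fields, 0-based start, exclusive end); the exact columns TCGA2BED emits are not given in the abstract, so the function name `to_bed_line` and the choice of input fields are assumptions.

```python
def to_bed_line(chrom, start_1based, end_1based, name, score=0, strand="."):
    """Format a region given in 1-based inclusive coordinates as a BED6 line.

    BED uses a 0-based start and an exclusive end, so only the start shifts.
    """
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError("coordinates must satisfy 1 <= start <= end")
    if strand not in ("+", "-", "."):
        raise ValueError("strand must be '+', '-' or '.'")
    fields = [chrom, str(start_1based - 1), str(end_1based),
              name, str(score), strand]
    return "\t".join(fields)

# Hypothetical region, for illustration only.
line = to_bed_line("chr17", 100, 200, "example_gene", strand="-")
```

Keeping the coordinate shift in one place avoids the off-by-one errors that commonly appear when regions from 1-based sources are mixed with BED files.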

4.

CAMUR: Knowledge extraction from RNA-seq cancer data through equivalent classification rules.

Cestarelli, Valerio; Fiscon, Giulia; Felici, Giovanni; Bertolazzi, Paola; Weitschek, Emanuel.

Bioinformatics ; 32(5): 697-704, 2016 Mar 01.

Article in English | MEDLINE | ID: mdl-26519501

ABSTRACT

MOTIVATION: Nowadays, knowledge extraction methods from Next Generation Sequencing data are highly requested. In this work, we focus on RNA-seq gene expression analysis and specifically on case-control studies with rule-based supervised classification algorithms that build a model able to discriminate cases from controls. State of the art algorithms compute a single classification model that contains few features (genes). On the contrary, our goal is to elicit a higher amount of knowledge by computing many classification models, and therefore to identify most of the genes related to the predicted class. RESULTS: We propose CAMUR, a new method that extracts multiple and equivalent classification models. CAMUR iteratively computes a rule-based classification model, calculates the power set of the genes present in the rules, iteratively eliminates those combinations from the data set, and performs again the classification procedure until a stopping criterion is verified. CAMUR includes an ad-hoc knowledge repository (database) and a querying tool. We analyze three different types of RNA-seq data sets (Breast, Head and Neck, and Stomach Cancer) from The Cancer Genome Atlas (TCGA) and we validate CAMUR and its models also on non-TCGA data. Our experimental results show the efficacy of CAMUR: we obtain several reliable equivalent classification models, from which the most frequent genes, their relationships, and the relation with a particular cancer are deduced. AVAILABILITY AND IMPLEMENTATION: dmb.iasi.cnr.it/camur.php CONTACT: emanuel@iasi.cnr.it SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Neoplasms , Algorithms , High-Throughput Nucleotide Sequencing , Humans , RNA , RNA Sequence Analysis
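
The iterative procedure described in the CAMUR abstract above (fit a rule-based model, remove the genes it used, fit again until a stopping criterion holds) can be sketched in simplified form. This is not CAMUR's algorithm: the learner here is a stand-in that finds a single one-gene threshold rule, so the power set of the genes in a rule reduces to that one gene, and the names `fit_stump`, `extract_models`, and the `min_accuracy` stopping criterion are hypothetical.

```python
def fit_stump(rows, labels, genes):
    """Stand-in learner: best rule of the form 'gene > threshold => case'.

    Returns (gene, threshold, accuracy), or None if no threshold exists.
    """
    best = None
    for gene in sorted(genes):
        values = sorted({row[gene] for row in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            correct = sum((row[gene] > threshold) == label
                          for row, label in zip(rows, labels))
            accuracy = correct / len(rows)
            if best is None or accuracy > best[2]:
                best = (gene, threshold, accuracy)
    return best

def extract_models(rows, labels, min_accuracy=0.9):
    """Repeatedly fit a model and eliminate the gene it relied on."""
    available = set(rows[0])          # gene names still in the data set
    models = []
    while available:
        model = fit_stump(rows, labels, available)
        if model is None or model[2] < min_accuracy:
            break                     # assumed stopping criterion
        models.append(model)
        available.discard(model[0])   # feature elimination step
    return models
```

Each pass yields an alternative model of comparable accuracy, which is the sense in which the extracted models are "equivalent". A real rule learner produces multi-gene rules, where eliminating combinations from the power set matters.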

5.

Classification of selectively constrained DNA elements using feature vectors and rule-based classifiers.

Polychronopoulos, Dimitris; Weitschek, Emanuel; Dimitrieva, Slavica; Bucher, Philipp; Felici, Giovanni; Almirantis, Yannis.

Genomics ; 104(2): 79-86, 2014 Aug.

Article in English | MEDLINE | ID: mdl-25058025

ABSTRACT

Scarce work has been done in the analysis of the composition of conserved non-coding elements (CNEs) that are identified by comparisons of two or more genomes and are found to exist in all metazoan genomes. Here we present the analysis of CNEs with a methodology that takes into account word occurrence at various lengths scales in the form of feature vector representation and rule based classifiers. We implement our approach on both protein-coding exons and CNEs, originating from human, insect (Drosophila melanogaster) and worm (Caenorhabditis elegans) genomes, that are either identified in the present study or obtained from the literature. Alignment free feature vector representation of sequences combined with rule-based classification methods leads to successful classification of the different CNEs classes. Biologically meaningful results are derived by comparison with the genomic signatures approach, and classification rates for a variety of functional elements of the genomes along with surrogates are presented.

Subject(s)

Caenorhabditis elegans/genetics , Intergenic DNA/genetics , Drosophila melanogaster/genetics , DNA Sequence Analysis/methods , Animals , Conserved Sequence/genetics , Molecular Evolution , Exons , Genomics , Humans , Sequence Alignment

6.

Human polyomaviruses identification by logic mining techniques.

Weitschek, Emanuel; Lo Presti, Alessandra; Drovandi, Guido; Felici, Giovanni; Ciccozzi, Massimo; Ciotti, Marco; Bertolazzi, Paola.

Virol J ; 9: 58, 2012 Mar 02.

Article in English | MEDLINE | ID: mdl-22385517

ABSTRACT

BACKGROUND: Differences in genomic sequences are crucial for the classification of viruses into different species. In this work, viral DNA sequences belonging to the human polyomaviruses BKPyV, JCPyV, KIPyV, WUPyV, and MCPyV are analyzed using a logic data mining method in order to identify the nucleotides which are able to distinguish the five different human polyomaviruses. RESULTS: The approach presented in this work is successful as it discovers several logic rules that effectively characterize the different five studied polyomaviruses. The individuated logic rules are able to separate precisely one viral type from the other and to assign an unknown DNA sequence to one of the five analyzed polyomaviruses. CONCLUSIONS: The data mining analysis is performed by considering the complete sequences of the viruses and the sequences of the different gene regions separately, obtaining in both cases extremely high correct recognition rates.

Subject(s)

Computational Biology/methods , Viral DNA/chemistry , Data Mining , Polyomavirus/classification , Polyomavirus/genetics , Base Sequence , Humans

7.

Learning to classify species with barcodes.

Bertolazzi, Paola; Felici, Giovanni; Weitschek, Emanuel.

BMC Bioinformatics ; 10 Suppl 14: S7, 2009 Nov 10.

Article in English | MEDLINE | ID: mdl-19900303

ABSTRACT

BACKGROUND: According to many field experts, specimens classification based on morphological keys needs to be supported with automated techniques based on the analysis of DNA fragments. The most successful results in this area are those obtained from a particular fragment of mitochondrial DNA, the gene cytochrome c oxidase I (COI) (the "barcode"). Since 2004 the Consortium for the Barcode of Life (CBOL) promotes the collection of barcode specimens and the development of methods to analyze the barcode for several tasks, among which the identification of rules to correctly classify an individual into its species by reading its barcode. RESULTS: We adopt a Logic Mining method based on two optimization models and present the results obtained on two datasets where a number of COI fragments are used to describe the individuals that belong to different species. The method proposed exhibits high correct recognition rates on a training-testing split of the available data using a small proportion of the information available (e.g., correct recognition approx. 97% when only 20 sites of the 648 available are used). The method is able to provide compact formulas on the values (A, C, G, T) at the selected sites that synthesize the characteristic of each species, a relevant information for taxonomists. CONCLUSION: We have presented a Logic Mining technique designed to analyze barcode data and to provide detailed output of interest to the taxonomists and the barcode community represented in the CBOL Consortium. The method has proven to be effective, efficient and precise.

Subject(s)

Classification/methods , Computational Biology/methods , Electronic Data Processing , DNA Sequence Analysis/methods , Animals , Humans
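
The "compact formulas on the values (A, C, G, T) at the selected sites" mentioned in the abstract above can be pictured as disjunctions of conjunctions over barcode positions. The sketch below only evaluates such formulas; it does not learn them, and the rule contents, species names, and the helpers `satisfies` and `classify` are invented for illustration.

```python
def satisfies(sequence, clause):
    """True if the sequence has the required nucleotide at every site.

    clause maps a 0-based site index to a nucleotide (a conjunction).
    """
    return all(site < len(sequence) and sequence[site] == nucleotide
               for site, nucleotide in clause.items())

def classify(sequence, species_rules):
    """Return the single species whose formula matches, else None.

    species_rules maps a species name to a list of clauses; a species
    matches when any one of its clauses is satisfied (a disjunction).
    """
    hits = [species for species, clauses in species_rules.items()
            if any(satisfies(sequence, clause) for clause in clauses)]
    return hits[0] if len(hits) == 1 else None

# Hypothetical rules: species_a needs A at site 2 AND T at site 5;
# species_b needs G at site 2 OR C at site 5.
RULES = {
    "species_a": [{2: "A", 5: "T"}],
    "species_b": [{2: "G"}, {5: "C"}],
}
```

For example, `classify("CCATGTAA", RULES)` checks sites 2 and 5 only, which reflects why a small number of diagnostic sites can be enough to assign a specimen. Sequences matching no formula, or more than one, are left unassigned in this sketch.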

8.

Combining DNA methylation and RNA sequencing data of cancer for supervised knowledge extraction.

Cappelli, Eleonora; Felici, Giovanni; Weitschek, Emanuel.

BioData Min ; 11: 22, 2018.

Article in English | MEDLINE | ID: mdl-30386434

ABSTRACT

BACKGROUND: In the Next Generation Sequencing (NGS) era a large amount of biological data is being sequenced, analyzed, and stored in many public databases, whose interoperability is often required to allow an enhanced accessibility. The combination of heterogeneous NGS genomic data is an open challenge: the analysis of data from different experiments is a fundamental practice for the study of diseases. In this work, we propose to combine DNA methylation and RNA sequencing NGS experiments at gene level for supervised knowledge extraction in cancer. METHODS: We retrieve DNA methylation and RNA sequencing datasets from The Cancer Genome Atlas (TCGA), focusing on the Breast Invasive Carcinoma (BRCA), the Thyroid Carcinoma (THCA), and the Kidney Renal Papillary Cell Carcinoma (KIRP). We combine the RNA sequencing gene expression values with the gene methylation quantity, as a new measure that we define for representing the methylation quantity associated to a gene. Additionally, we propose to analyze the combined data through tree- and rule-based classification algorithms (C4.5, Random Forest, RIPPER, and CAMUR). RESULTS: We extract more than 15,000 classification models (composed of gene sets), which allow to distinguish the tumoral samples from the normal ones with an average accuracy of 95%. From the integrated experiments we obtain about 5000 classification models that consider both the gene measures related to the RNA sequencing and the DNA methylation experiments. CONCLUSIONS: We compare the sets of genes obtained from the classifications on RNA sequencing and DNA methylation data with the genes obtained from the integration of the two experiments. The comparison results in several genes that are in common among the single experiments and the integrated ones (733 for BRCA, 35 for KIRP, and 861 for THCA) and 509 genes that are in common among the different experiments. 
Finally, we investigate the possible relationships among the different analyzed tumors by extracting a core set of 13 genes that appear in all tumors. A preliminary functional analysis confirms the relation of part of those genes (5 out of 13 and 279 out of 509) with cancer, suggesting to focus further studies on the new individuated ones.

9.

MISSEL: a method to identify a large number of small species-specific genomic subsequences and its application to viruses classification.

Fiscon, Giulia; Weitschek, Emanuel; Cella, Eleonora; Lo Presti, Alessandra; Giovanetti, Marta; Babakir-Mina, Muhammed; Ciotti, Marco; Ciccozzi, Massimo; Pierangeli, Alessandra; Bertolazzi, Paola; Felici, Giovanni.

BioData Min ; 9: 38, 2016.

Article in English | MEDLINE | ID: mdl-27980679

ABSTRACT

BACKGROUND: Continuous improvements in next generation sequencing technologies led to ever-increasing collections of genomic sequences, which have not been easily characterized by biologists, and whose analysis requires huge computational effort. The classification of species emerged as one of the main applications of DNA analysis and has been addressed with several approaches, e.g., multiple alignments-, phylogenetic trees-, statistical- and character-based methods. RESULTS: We propose a supervised method based on a genetic algorithm to identify small genomic subsequences that discriminate among different species. The method identifies multiple subsequences of bounded length with the same information power in a given genomic region. The algorithm has been successfully evaluated through its integration into a rule-based classification framework and applied to three different biological data sets: Influenza, Polyoma, and Rhino virus sequences. CONCLUSIONS: We discover a large number of small subsequences that can be used to identify each virus type with high accuracy and low computational time, and moreover help to characterize different genomic regions. Bounding their length to 20, our method found 1164 characterizing subsequences for all the Influenza virus subtypes, 194 for all the Polyoma viruses, and 11 for Rhino viruses. The abundance of small separating subsequences extracted for each genomic region may be an important support for quick and robust virus identification. Finally, useful biological information can be derived by the relative location and abundance of such subsequences along the different regions.

10.

LAF: Logic Alignment Free and its application to bacterial genomes classification.

Weitschek, Emanuel; Cunial, Fabio; Felici, Giovanni.

BioData Min ; 8: 39, 2015.

Article in English | MEDLINE | ID: mdl-26664519

ABSTRACT

Alignment-free algorithms can be used to estimate the similarity of biological sequences and hence are often applied to the phylogenetic reconstruction of genomes. Most of these algorithms rely on comparing the frequency of all the distinct substrings of fixed length (k-mers) that occur in the analyzed sequences. In this paper, we present Logic Alignment Free (LAF), a method that combines alignment-free techniques and rule-based classification algorithms in order to assign biological samples to their taxa. This method searches for a minimal subset of k-mers whose relative frequencies are used to build classification models as disjunctive-normal-form logic formulas (if-then rules). We apply LAF successfully to the classification of bacterial genomes to their corresponding taxonomy. In particular, we succeed in obtaining reliable classification at different taxonomic levels by extracting a handful of rules, each one based on the frequency of just few k-mers. State of the art methods to adjust the frequency of k-mers to the character distribution of the underlying genomes have negligible impact on classification performance, suggesting that the signal of each class is strong and that LAF is effective in identifying it.

11.

Supervised DNA Barcodes species classification: analysis, comparisons and results.

Weitschek, Emanuel; Fiscon, Giulia; Felici, Giovanni.

BioData Min ; 7(1): 4, 2014 Apr 11.

Article in English | MEDLINE | ID: mdl-24721333

ABSTRACT

BACKGROUND: Specific fragments, coming from short portions of DNA (e.g., mitochondrial, nuclear, and plastid sequences), have been defined as DNA Barcode and can be used as markers for organisms of the main life kingdoms. Species classification with DNA Barcode sequences has been proven effective on different organisms. Indeed, specific gene regions have been identified as Barcode: COI in animals, rbcL and matK in plants, and ITS in fungi. The classification problem assigns an unknown specimen to a known species by analyzing its Barcode. This task has to be supported with reliable methods and algorithms. METHODS: In this work the efficacy of supervised machine learning methods to classify species with DNA Barcode sequences is shown. The Weka software suite, which includes a collection of supervised classification methods, is adopted to address the task of DNA Barcode analysis. Classifier families are tested on synthetic and empirical datasets belonging to the animal, fungus, and plant kingdoms. In particular, the function-based method Support Vector Machines (SVM), the rule-based RIPPER, the decision tree C4.5, and the Naïve Bayes method are considered. Additionally, the classification results are compared with respect to ad-hoc and well-established DNA Barcode classification methods. RESULTS: A software that converts the DNA Barcode FASTA sequences to the Weka format is released, to adapt different input formats and to allow the execution of the classification procedure. The analysis of results on synthetic and real datasets shows that SVM and Naïve Bayes outperform on average the other considered classifiers, although they do not provide a human interpretable classification model. Rule-based methods have slightly inferior classification performances, but deliver the species specific positions and nucleotide assignments. 
On synthetic data the supervised machine learning methods obtain superior classification performances with respect to the traditional DNA Barcode classification methods. On empirical data their classification performances are at a comparable level to the other methods. CONCLUSIONS: The classification analysis shows that supervised machine learning methods are promising candidates for handling with success the DNA Barcoding species classification problem, obtaining excellent performances. To conclude, a powerful tool to perform species identification is now available to the DNA Barcoding community.

12.

Next generation sequencing reads comparison with an alignment-free distance.

Weitschek, Emanuel; Santoni, Daniele; Fiscon, Giulia; De Cola, Maria Cristina; Bertolazzi, Paola; Felici, Giovanni.

BMC Res Notes ; 7: 869, 2014 Dec 03.

Article in English | MEDLINE | ID: mdl-25465386

ABSTRACT

BACKGROUND: Next Generation Sequencing (NGS) machines extract from a biological sample a large number of short DNA fragments (reads). These reads are then used for several applications, e.g., sequence reconstruction, DNA assembly, gene expression profiling, mutation analysis. METHODS: We propose a method to evaluate the similarity between reads. This method does not rely on the alignment of the reads and it is based on the distance between the frequencies of their substrings of fixed dimensions (k-mers). We compare this alignment-free distance with the similarity measures derived from two alignment methods: Needleman-Wunsch and Blast. The comparison is based on a simple assumption: the most correct distance is obtained by knowing in advance the reference sequence. Therefore, we first align the reads on the original DNA sequence, compute the overlap between the aligned reads, and use this overlap as an ideal distance. We then verify how the alignment-free and the alignment-based distances reproduce this ideal distance. The ability of correctly reproducing the ideal distance is evaluated over samples of read pairs from Saccharomyces cerevisiae, Escherichia coli, and Homo sapiens. The comparison is based on the correctness of threshold predictors cross-validated over different samples. RESULTS: We exhibit experimental evidence that the proposed alignment-free distance is a potentially useful read-to-read distance measure and performs better than the more time consuming distances based on alignment. CONCLUSIONS: Alignment-free distances may be used effectively for reads comparison, and may provide a significant speed-up in several processes based on NGS sequencing (e.g., DNA assembly, reads classification).

Subject(s)

Algorithms , Bacterial DNA/genetics , Fungal DNA/genetics , Sequence Alignment/statistics & numerical data , DNA Sequence Analysis/statistics & numerical data , Bacterial DNA/chemistry , Fungal DNA/chemistry , Escherichia coli/genetics , High-Throughput Nucleotide Sequencing , Humans , Saccharomyces cerevisiae/genetics , Sequence Alignment/methods , DNA Sequence Analysis/methods
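
The read-to-read distance in the abstract above is described only as a distance between k-mer frequencies, so the following is a minimal sketch under assumptions: relative frequencies are compared with the Euclidean metric, and both the metric and the names `kmer_profile` and `alignment_free_distance` are illustrative choices rather than the authors' definition.

```python
from collections import Counter
from math import sqrt

def kmer_profile(read, k):
    """Relative frequency of each k-mer (substring of length k) in a read."""
    total = len(read) - k + 1
    if total <= 0:
        return {}
    counts = Counter(read[i:i + k] for i in range(total))
    return {kmer: count / total for kmer, count in counts.items()}

def alignment_free_distance(read_a, read_b, k=3):
    """Euclidean distance between two k-mer frequency profiles (assumed metric)."""
    profile_a = kmer_profile(read_a, k)
    profile_b = kmer_profile(read_b, k)
    kmers = set(profile_a) | set(profile_b)
    return sqrt(sum((profile_a.get(m, 0.0) - profile_b.get(m, 0.0)) ** 2
                    for m in kmers))
```

Because each read is scanned once and no alignment matrix is filled in, the cost grows linearly with read length, which is the source of the speed-up over alignment-based measures that the abstract points to.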

13.

BLOG 2.0: a software system for character-based species classification with DNA Barcode sequences. What it does, how to use it.

Weitschek, Emanuel; Van Velzen, Robin; Felici, Giovanni; Bertolazzi, Paola.

Mol Ecol Resour ; 13(6): 1043-6, 2013 Nov.

Article in English | MEDLINE | ID: mdl-23350601

ABSTRACT

BLOG (Barcoding with LOGic) is a diagnostic and character-based DNA Barcode analysis method. Its aim is to classify specimens to species based on DNA Barcode sequences and on a supervised machine learning approach, using classification rules that compactly characterize species in terms of DNA Barcode locations of key diagnostic nucleotides. The BLOG 2.0 software, its fundamental modules, online/offline user interfaces and recent improvements are described. These improvements affect both methodology and software design, and lead to the availability of different releases on the website http://dmb.iasi.cnr.it/blog-downloads.php. Previous and new experimental tests show that BLOG 2.0 outperforms previous versions as well as other DNA Barcode analysis methods.

Subject(s)

Taxonomic DNA Barcoding , Software , Classification/methods , Species Specificity , User-Computer Interface

14.

DNA barcoding of recently diverged species: relative performance of matching methods.

van Velzen, Robin; Weitschek, Emanuel; Felici, Giovanni; Bakker, Freek T.

PLoS One ; 7(1): e30490, 2012.

Article in English | MEDLINE | ID: mdl-22272356

ABSTRACT

Recently diverged species are challenging for identification, yet they are frequently of special interest scientifically as well as from a regulatory perspective. DNA barcoding has proven instrumental in species identification, especially in insects and vertebrates, but for the identification of recently diverged species it has been reported to be problematic in some cases. Problems are mostly due to incomplete lineage sorting or simply lack of a 'barcode gap' and probably related to large effective population size and/or low mutation rate. Our objective was to compare six methods in their ability to correctly identify recently diverged species with DNA barcodes: neighbor joining and parsimony (both tree-based), nearest neighbor and BLAST (similarity-based), and the diagnostic methods DNA-BAR, and BLOG. We analyzed simulated data assuming three different effective population sizes as well as three selected empirical data sets from published studies. Results show, as expected, that success rates are significantly lower for recently diverged species (~75%) than for older species (~97%) (P<0.00001). Similarity-based and diagnostic methods significantly outperform tree-based methods, when applied to simulated DNA barcode data (P<0.00001). The diagnostic method BLOG had highest correct query identification rate based on simulated (86.2%) as well as empirical data (93.1%), indicating that it is a consistently better method overall. Another advantage of BLOG is that it offers species-level information that can be used outside the realm of DNA barcoding, for instance in species description or molecular detection assays. Even though we can confirm that identification success based on DNA barcoding is generally high in our data, recently diverged species remain difficult to identify. Nevertheless, our results contribute to improved solutions for their accurate identification.

Subject(s)

Computational Biology/methods , Taxonomic DNA Barcoding/methods , Genetic Variation , Phylogeny , Animals , Base Sequence , Computer Simulation , Drosophila/classification , Drosophila/genetics , Fabaceae/classification , Fabaceae/genetics , Mollusca/classification , Mollusca/genetics , Reproducibility of Results , Nucleic Acid Sequence Homology , Species Specificity

15.

Gene expression biomarkers in the brain of a mouse model for Alzheimer's disease: mining of microarray data by logic classification and feature selection.

Arisi, Ivan; D'Onofrio, Mara; Brandi, Rossella; Felsani, Armando; Capsoni, Simona; Drovandi, Guido; Felici, Giovanni; Weitschek, Emanuel; Bertolazzi, Paola; Cattaneo, Antonino.

J Alzheimers Dis ; 24(4): 721-38, 2011.

Article in English | MEDLINE | ID: mdl-21321390

ABSTRACT

The identification of early and stage-specific biomarkers for Alzheimer's disease (AD) is critical, as the development of disease-modification therapies may depend on the discovery and validation of such markers. The identification of early reliable biomarkers depends on the development of new diagnostic algorithms to computationally exploit the information in large biological datasets. To identify potential biomarkers from mRNA expression profile data, we used the Logic Mining method for the unbiased analysis of a large microarray expression dataset from the anti-NGF AD11 transgenic mouse model. The gene expression profile of AD11 brain regions was investigated at different neurodegeneration stages by whole genome microarrays. A new implementation of the Logic Mining method was applied both to early (1-3 months) and late stage (6-15 months) expression data, coupled to standard statistical methods. A small number of "fingerprinting" formulas was isolated, encompassing mRNAs whose expression levels were able to discriminate between diseased and control mice. We selected three differential "signature" genes specific for the early stage (Nudt19, Arl16, Aph1b), five common to both groups (Slc15a2, Agpat5, Sox2ot, 2210015D19Rik, Wdfy1), and seven specific for late stage (D14Ertd449, Tia1, Txnl4, 1810014B01Rik, Snhg3, Actl6a, Rnf25). We suggest these genes as potential biomarkers for the early and late stage of AD-like neurodegeneration in this model and conclude that Logic Mining is a powerful and reliable approach for large scale expression data analysis. Its application to large expression datasets from brain or peripheral human samples may facilitate the discovery of early and stage-specific AD biomarkers.

Subject(s)

Alzheimer Disease/genetics , Brain Chemistry/genetics , Data Mining/methods , Animal Disease Models , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Alzheimer Disease/metabolism , Alzheimer Disease/pathology , Animals , Female , Genetic Markers/genetics , Mice , Inbred C57BL Mice , Transgenic Mice
