1.
Entropy (Basel); 22(12), 2020 Dec 09.
Article in English | MEDLINE | ID: mdl-33316972

ABSTRACT

Most common machine-learning methods solve supervised and unsupervised problems on datasets whose features belong to a numerical space. However, many problems involve data in which numerical and categorical variables coexist, which makes them challenging to handle. Transforming categorical data into numeric form requires a preprocessing step. One-hot encoding and feature hashing have been the most widely used encoding approaches, at the expense of a significant increase in the dimensionality of the dataset. This increase introduces additional challenges in dealing with an overabundance of variables and/or noisy data. In this paper we propose a novel encoding approach that maps mixed-type data into an information space, using Shannon's theory to model the amount of information contained in the original data. We evaluated our proposal on ten mixed-type datasets from the UCI repository and two datasets representing real-world problems, obtaining promising results. To demonstrate its performance, we applied the proposal to prepare these datasets for classification, regression, and clustering tasks. We show that our encoding is remarkably superior to one-hot and feature-hashing encoding in terms of memory efficiency, and that it can preserve the information conveyed by the original data.
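
The dimensionality trade-off described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the authors' encoding, so the code below is not their method: it only contrasts one-hot encoding with a simple stand-in that replaces each category by its Shannon self-information, -log2 p(value), with p estimated from observed frequencies. The function names and the toy `colors` column are hypothetical.

```python
import math
from collections import Counter


def one_hot(values):
    """Encode each value as a 0/1 vector with one column per distinct category."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return rows


def self_information(values):
    """Encode each value as -log2 p(value), with p estimated from frequencies.

    Illustrative stand-in only; this is NOT the encoding proposed in the paper.
    """
    counts = Counter(values)
    n = len(values)
    return [-math.log2(counts[v] / n) for v in values]


colors = ["red", "red", "blue", "green"]  # hypothetical toy column
wide = one_hot(colors)             # one column per distinct category
narrow = self_information(colors)  # a single numeric column

print(len(wide[0]))  # 3
print(narrow)        # [1.0, 1.0, 2.0, 2.0]
```

One-hot encoding needs as many columns as there are distinct categories, while the information-based stand-in always produces one. In this naive sketch, categories with equal frequency ("blue" and "green") collapse to the same value, which is one reason a practical encoding would need more care than this.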

2.
Pathol Int; 61(1): 1-6, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21166936

ABSTRACT

Standard Guthrie cards have been widely used to collect blood samples from essentially all neonates in the USA and Japan for newborn screening programs. Archival blood spot samples are therefore a unique and comprehensive resource for molecular pathology studies. The challenge in using these samples, however, is the presumed low quantity and degraded quality of the nucleic acids that can be isolated from them, particularly RNA. Here, we report a new assay using Agilent 4x44K microarrays to acquire genome-wide gene expression profiles from blood spots on Guthrie cards. Because only a small amount of RNA is obtained from each sample, major modifications were made, such as concentrating and amplifying the RNA and using a different labeling procedure. Approximately 9000 expressed genes can be detected after normalization of the data, a 260% increase in detection power compared with previously reported cDNA microarrays made in-house with standard procedures. The correlation coefficients in technical and biological replicates were 0.92 and 0.85, respectively, confirming the reproducibility of this study. This new and comprehensive assay will add value to the utility of archival Guthrie cards (e.g. neonatal blood spot cards) and open new opportunities for molecular epidemiology, pathology, genomic, and diagnostic studies of perinatal diseases.
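
The replicate correlation coefficients reported above can be illustrated with a short sketch. The abstract does not say which coefficient was used; Pearson's r is assumed here, and the two replicate vectors are hypothetical values chosen for illustration, not data from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations.
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


# Hypothetical log-expression values for the same genes in two replicates.
replicate_a = [1.0, 2.0, 3.0, 4.0]
replicate_b = [1.1, 1.9, 3.2, 3.8]

r = pearson(replicate_a, replicate_b)
print(round(r, 2))
```

A value close to 1 indicates that the two replicates rank and scale the genes similarly, which is the sense in which such coefficients are used as a reproducibility check.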


Subjects
Blood Specimen Collection/methods; Gene Expression Profiling/methods; Neonatal Screening/methods; Oligonucleotide Array Sequence Analysis/methods; Humans; Infant, Newborn; Polymerase Chain Reaction