2.
J Appl Genet ; 63(2): 361-368, 2022 May.
Article in English | MEDLINE | ID: mdl-35322332

ABSTRACT

Rare disease datasets are typically structured such that a small number of patients (cases) are represented by multidimensional feature vectors. In this report, we considered a rare disease, mucopolysaccharidosis (MPS). This disease is divided into 11 types and subtypes, depending on the genetic defect, type of deficient enzyme, and nature of accumulated glycosaminoglycan(s). Among them, 7 types are known as possibly neuronopathic and 4 are non-neuronopathic, and in the case of the former group, prediction of the course of the disease is crucial for the patient's treatment and management. Here, we have used transcriptomic data available for one patient from each MPS type/subtype. Our approach to gene grouping was based on the minimization of the perceptron criterion in the form of a convex and piecewise linear (CPL) function. This approach allows designing complexes of linear classifiers on the basis of small samples of multivariate vectors. As a result, distinguishing neuronopathic and non-neuronopathic forms of MPS was possible on the basis of bioinformatic analysis of gene expression patterns where each MPS type was represented by only one patient. This approach could potentially also be used for assessing other features of patients suffering from rare diseases, for which a large body of data (like transcriptomic data) is available from only one or a few representatives.
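
As a rough illustration of the perceptron criterion mentioned in this abstract, the sketch below writes it as a sum of convex, piecewise linear penalties, one per labelled feature vector, and minimizes it by plain subgradient descent. This is a minimal sketch under stated assumptions, not the authors' procedure: the unit margin, the function names (`augment`, `perceptron_criterion`, `minimize_cpl`), the step size and the toy data are all hypothetical, and subgradient descent is swapped in here for whatever minimization technique the paper uses.

```python
def augment(x):
    """Append a constant 1.0 so the threshold is learned as an extra weight."""
    return list(x) + [1.0]


def perceptron_criterion(X, y, w):
    """Sum of CPL penalties max(0, 1 - y_j * <w, x_j>) over all samples.

    Labels y_j are +1 or -1. The unit margin is an assumption of this sketch.
    """
    total = 0.0
    for xj, yj in zip(X, y):
        margin = yj * sum(wi * xi for wi, xi in zip(w, augment(xj)))
        total += max(0.0, 1.0 - margin)
    return total


def minimize_cpl(X, y, lr=0.1, max_iter=1000):
    """Subgradient descent on the criterion (an illustrative optimizer only)."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(max_iter):
        grad = [0.0] * len(w)
        violated = False
        for xj, yj in zip(X, y):
            xa = augment(xj)
            margin = yj * sum(wi * xi for wi, xi in zip(w, xa))
            if margin < 1.0:
                # Only samples inside the margin contribute to the subgradient.
                violated = True
                for i, xi in enumerate(xa):
                    grad[i] -= yj * xi
        if not violated:
            break  # every penalty is zero, so the criterion is at its minimum
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


# Hypothetical toy data: two feature vectors per class, linearly separable.
X_toy = [[2.0, 2.0], [3.0, 1.0], [-1.0, -1.0], [-2.0, 0.0]]
y_toy = [1, 1, -1, -1]
w_toy = minimize_cpl(X_toy, y_toy)
print(perceptron_criterion(X_toy, y_toy, [0.0, 0.0, 0.0]))
print(perceptron_criterion(X_toy, y_toy, w_toy))
```

When the two groups are linearly separable, as in the toy data, the criterion can be driven to zero, and the resulting weight vector defines a separating hyperplane.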


Subject(s)
Mucopolysaccharidoses , Rare Diseases , Cluster Analysis , Humans , Mucopolysaccharidoses/genetics , Transcriptome/genetics
3.
Nephrol Dial Transplant ; 31(12): 2033-2040, 2016 12.
Article in English | MEDLINE | ID: mdl-27190335

ABSTRACT

BACKGROUND: In complex diseases such as chronic kidney disease (CKD), the risk of clinical complications is determined by interactions between phenotypic and genotypic factors. However, clinical epidemiological studies rarely attempt to analyse the combined effect of large numbers of phenotype and genotype features. We have recently shown that the relaxed linear separability (RLS) model of feature selection can address such complex issues. Here, it is applied to identify risk factors for inflammation in CKD. METHODS: The RLS model was applied in 225 CKD stage 5 patients sampled in conjunction with dialysis initiation. Fifty-seven anthropometric or biochemical measurements and 79 genetic polymorphisms were entered into the model. The model was asked to identify phenotypes and genotypes that, when combined, could separate inflamed from non-inflamed patients. Inflammation was defined as a high-sensitivity C-reactive protein concentration above the median (5 mg/L). RESULTS: Among the 60 genotypic and phenotypic features predicting inflammation, 31 were genetic. Among the 10 strongest predictors of inflammation, 8 were single nucleotide polymorphisms located in the NAMPT, CIITA, BMP2 and PIK3CB genes, whereas fibrinogen and bone mineral density were the only phenotypic biomarkers. CONCLUSION: These results indicate a larger involvement of hereditary factors in inflammation than might have been expected and suggest that inclusion of genotype features in risk assessment studies is critical. The RLS model demonstrates that inflammation in CKD is determined by an extensive panel of factors and may prove to be a suitable tool that could enable a much-needed multifactorial approach as opposed to the commonly utilized single-factor analysis.


Subject(s)
Biomarkers/metabolism , Bone Density , Inflammation/diagnosis , Polymorphism, Single Nucleotide/genetics , Renal Insufficiency, Chronic/complications , Adult , Aged , Female , Genotype , Humans , Inflammation/etiology , Inflammation/metabolism , Male , Middle Aged , Phenotype , Risk Factors , Young Adult
4.
Artif Intell Med ; 66: 63-71, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26674595

ABSTRACT

OBJECTIVE: Feature selection is a technique widely used in data mining. The aim is to select the best subset of features relevant to the problem being considered. In this paper, we consider feature selection for the classification of gene datasets. Gene data is usually composed of just a few dozen objects described by thousands of features. For this kind of data, it is easy to find a model that fits the learning data. However, it is not easy to find one that will perform as well on new data as on the learning data. This overfitting issue is well known as regards classification and regression, but it also applies to feature selection. METHODS AND MATERIALS: We address this problem and investigate its importance in an empirical study of four feature selection methods applied to seven high-dimensional gene datasets. We chose datasets that are well studied in the literature: colon cancer, leukemia and breast cancer. All the datasets are characterized by a significant number of features and the presence of exactly two decision classes. The feature selection methods used are ReliefF, minimum redundancy maximum relevance, support vector machine-recursive feature elimination and relaxed linear separability. RESULTS: Our main result reveals the existence of positive feature selection bias in all 28 experiments (7 datasets and 4 feature selection methods). Bias was calculated as the difference between validation and test accuracies and ranges from 2.6% to as much as 41.67%. The validation accuracy (biased accuracy) was calculated on the same dataset on which the feature selection was performed. The test accuracy was calculated for data that was not used for feature selection (by so-called external cross-validation). CONCLUSIONS: This work provides evidence that using the same dataset for feature selection and learning is not appropriate. We recommend using cross-validation for feature selection in order to reduce selection bias.
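
The difference between the biased validation accuracy and the externally cross-validated test accuracy described in this abstract can be sketched on synthetic data. The example below is only an illustration under stated assumptions: it uses pure-noise features, a simple class-mean-difference filter and a nearest-centroid classifier, none of which are among the four methods studied in the paper, and all names (`select_features`, `nearest_centroid_predict`, `loo_accuracy`) and parameter values are hypothetical.

```python
import random


def select_features(X, y, k):
    """Keep the k features with the largest absolute class-mean difference."""
    scores = []
    for f in range(len(X[0])):
        pos = [row[f] for row, label in zip(X, y) if label == 1]
        neg = [row[f] for row, label in zip(X, y) if label == 0]
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    ranked = sorted(range(len(X[0])), key=lambda f: scores[f], reverse=True)
    return ranked[:k]


def nearest_centroid_predict(X_train, y_train, x, feats):
    """Predict the class whose centroid (on the chosen features) is closest."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        centroid = [sum(r[f] for r in rows) / len(rows) for f in feats]
        dist = sum((x[f] - c) ** 2 for f, c in zip(feats, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X, y, k, external):
    """Leave-one-out accuracy.

    external=False: features are selected once on ALL data (biased estimate).
    external=True:  features are reselected inside each fold, so the held-out
                    sample never influences the selection.
    """
    feats_all = select_features(X, y, k)
    correct = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        feats = select_features(X_tr, y_tr, k) if external else feats_all
        correct += nearest_centroid_predict(X_tr, y_tr, X[i], feats) == y[i]
    return correct / len(X)


# Synthetic "gene" data: 20 objects, 500 noise features, labels unrelated to X.
rng = random.Random(0)
X_noise = [[rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(20)]
y_noise = [0] * 10 + [1] * 10
validation_acc = loo_accuracy(X_noise, y_noise, k=10, external=False)
test_acc = loo_accuracy(X_noise, y_noise, k=10, external=True)
print("validation (biased):", validation_acc)
print("test (external CV): ", test_acc)
print("estimated bias:     ", validation_acc - test_acc)
```

Because the labels here carry no signal, an honest accuracy estimate should hover around chance, so any large gap between the two printed accuracies comes from the selection step having seen the held-out samples.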


Subject(s)
Biomarkers, Tumor/genetics , Computational Biology/methods , Data Mining/methods , Databases, Genetic , Decision Support Techniques , Support Vector Machine , Algorithms , Bias , Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic , Humans , Linear Models , Oligonucleotide Array Sequence Analysis , Pattern Recognition, Automated , Reproducibility of Results
5.
PLoS One ; 9(1): e86630, 2014.
Article in English | MEDLINE | ID: mdl-24489753

ABSTRACT

Identification of risk factors in patients with a particular disease can be analyzed in clinical data sets by using feature selection procedures of pattern recognition and data mining methods. The applicability of the relaxed linear separability (RLS) method of feature subset selection was checked for high-dimensional and mixed type (genetic and phenotypic) clinical data of patients with end-stage renal disease. The RLS method allowed for substantial reduction of the dimensionality through omitting redundant features while maintaining the linear separability of data sets of patients with high and low levels of an inflammatory biomarker. The synergy between genetic and phenotypic features in differentiation between these two subgroups was demonstrated.


Subject(s)
Algorithms , Inflammation/genetics , Inflammation/pathology , Renal Dialysis , Humans , Phenotype , Reproducibility of Results