Results 1 - 5 of 5
2.
J Appl Genet ; 63(2): 361-368, 2022 May.
Article in English | MEDLINE | ID: mdl-35322332

ABSTRACT

Rare disease datasets are typically structured such that a small number of patients (cases) are represented by multidimensional feature vectors. In this report, we considered a rare disease, mucopolysaccharidosis (MPS). This disease is divided into 11 types and subtypes, depending on the genetic defect, the type of deficient enzyme, and the nature of the accumulated glycosaminoglycan(s). Among them, 7 types are known to be possibly neuronopathic and 4 are non-neuronopathic. For the former group, predicting the course of the disease is crucial for patient treatment and management. Here, we used transcriptomic data available for one patient from each MPS type/subtype. Our approach to gene grouping was based on minimizing the perceptron criterion in the form of a convex and piecewise linear (CPL) function. This approach allows complexes of linear classifiers to be designed on the basis of small samples of multivariate vectors. As a result, neuronopathic and non-neuronopathic forms of MPS could be distinguished through bioinformatic analysis of gene expression patterns in which each MPS type was represented by only one patient. This approach could potentially also be used to assess other features of patients with rare diseases for which a large body of data (such as transcriptomic data) is available from only one or a few representatives.


Subjects
Mucopolysaccharidoses , Rare Diseases , Cluster Analysis , Humans , Mucopolysaccharidoses/genetics , Transcriptome/genetics
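
The abstract above mentions minimizing a perceptron criterion built from convex and piecewise linear (CPL) penalty functions. The following is only a minimal sketch of that idea, assuming unit-margin penalties of the form max(0, 1 - y * <v, x>) with equal weights and a plain batch subgradient step. The abstract does not state the exact penalties, weights, or optimizer, so the function names (`augment`, `cpl_criterion`, `train_cpl`, `predict`) and the toy data are hypothetical.

```python
def augment(X):
    """Append a constant 1.0 so the threshold is learned as an extra weight."""
    return [list(x) + [1.0] for x in X]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def cpl_criterion(v, Xa, y):
    """Sum of CPL penalties max(0, 1 - y_j * <v, x_j>) over augmented vectors."""
    return sum(max(0.0, 1.0 - yj * dot(v, xj)) for xj, yj in zip(Xa, y))


def train_cpl(X, y, lr=0.1, max_epochs=1000):
    """Batch subgradient descent on the criterion (an assumed optimizer)."""
    Xa = augment(X)
    v = [0.0] * len(Xa[0])
    for _ in range(max_epochs):
        # Vectors with margin below 1 are the only ones with a nonzero penalty.
        violators = [(xj, yj) for xj, yj in zip(Xa, y) if yj * dot(v, xj) < 1.0]
        if not violators:
            break  # criterion is zero: the two sets are separated with margin
        for xj, yj in violators:
            v = [vi + lr * yj * xi for vi, xi in zip(v, xj)]
    return v


def predict(v, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if dot(v, list(x) + [1.0]) >= 0 else -1


# Hypothetical toy data: two small, linearly separable groups.
toy_X = [[2.0, 2.0], [3.0, 1.0], [2.5, 3.0], [-1.0, -1.0], [-2.0, 0.0], [0.0, -2.0]]
toy_y = [1, 1, 1, -1, -1, -1]
toy_v = train_cpl(toy_X, toy_y)
```

On separable data such as the toy set, the criterion can reach zero, which is the condition the loop uses to stop. With one patient per class, as in the study, each penalty would correspond to a single feature vector.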
3.
Nephrol Dial Transplant ; 31(12): 2033-2040, 2016 Dec.
Article in English | MEDLINE | ID: mdl-27190335

ABSTRACT

BACKGROUND: In complex diseases such as chronic kidney disease (CKD), the risk of clinical complications is determined by interactions between phenotypic and genotypic factors. However, clinical epidemiological studies rarely attempt to analyse the combined effect of large numbers of phenotype and genotype features. We have recently shown that the relaxed linear separability (RLS) model of feature selection can address such complex issues. Here, it is applied to identify risk factors for inflammation in CKD. METHODS: The RLS model was applied in 225 CKD stage 5 patients sampled in conjunction with dialysis initiation. Fifty-seven anthropometric or biochemical measurements and 79 genetic polymorphisms were entered into the model. The model was asked to identify phenotypes and genotypes that, when combined, could separate inflamed from non-inflamed patients. Inflammation was defined as a high-sensitivity C-reactive protein concentration above the median (5 mg/L). RESULTS: Among the 60 genotypic and phenotypic features predicting inflammation, 31 were genetic. Among the 10 strongest predictors of inflammation, 8 were single nucleotide polymorphisms located in the NAMPT, CIITA, BMP2 and PIK3CB genes, whereas fibrinogen and bone mineral density were the only phenotypic biomarkers. CONCLUSION: These results indicate a larger involvement of hereditary factors in inflammation than might have been expected and suggest that inclusion of genotype features in risk assessment studies is critical. The RLS model demonstrates that inflammation in CKD is determined by an extensive panel of factors and may prove to be a suitable tool that could enable a much-needed multifactorial approach as opposed to the commonly utilized single-factor analysis.


Subjects
Biomarkers/metabolism , Bone Density , Inflammation/diagnosis , Polymorphism, Single Nucleotide/genetics , Renal Insufficiency, Chronic/complications , Adult , Aged , Female , Genotype , Humans , Inflammation/etiology , Inflammation/metabolism , Male , Middle Aged , Phenotype , Risk Factors , Young Adult
4.
Artif Intell Med ; 66: 63-71, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26674595

ABSTRACT

OBJECTIVE: Feature selection is a technique widely used in data mining. The aim is to select the best subset of features relevant to the problem being considered. In this paper, we consider feature selection for the classification of gene datasets. Gene data is usually composed of just a few dozen objects described by thousands of features. For this kind of data, it is easy to find a model that fits the learning data. However, it is not easy to find one that performs as well on new data as on the learning data. This overfitting issue is well known in classification and regression, but it also applies to feature selection. METHODS AND MATERIALS: We address this problem and investigate its importance in an empirical study of four feature selection methods applied to seven high-dimensional gene datasets. We chose datasets that are well studied in the literature: colon cancer, leukemia and breast cancer. All the datasets are characterized by a significant number of features and the presence of exactly two decision classes. The feature selection methods used are ReliefF, minimum redundancy maximum relevance, support vector machine-recursive feature elimination and relaxed linear separability. RESULTS: Our main result reveals the existence of positive feature selection bias in all 28 experiments (7 datasets and 4 feature selection methods). Bias was calculated as the difference between validation and test accuracies and ranges from 2.6% to as much as 41.67%. The validation accuracy (biased accuracy) was calculated on the same dataset on which the feature selection was performed. The test accuracy was calculated for data that was not used for feature selection (by so-called external cross-validation). CONCLUSIONS: This work provides evidence that using the same dataset for feature selection and learning is not appropriate. We recommend using cross-validation for feature selection in order to reduce selection bias.


Subjects
Biomarkers, Tumor/genetics , Computational Biology/methods , Data Mining/methods , Databases, Genetic , Decision Support Techniques , Support Vector Machine , Algorithms , Bias , Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic , Humans , Linear Models , Oligonucleotide Array Sequence Analysis , Pattern Recognition, Automated , Reproducibility of Results
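
The selection bias described in the abstract above (validation accuracy minus test accuracy under external cross-validation) can be illustrated with a small simulation. This is a hedged sketch, not the study's protocol: it uses pure-noise data, a simple mean-difference filter, a nearest-centroid classifier, and leave-one-out cross-validation, all of which are stand-ins chosen for brevity rather than the four methods or datasets named in the abstract. All function names are hypothetical.

```python
import random


def make_noise_data(n_samples, n_features, seed):
    """Pure-noise features with alternating labels: no real signal exists."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y


def class_means(X, y, feats):
    """Per-class mean of each feature in feats (same order as feats)."""
    means = {}
    for label in (0, 1):
        rows = [x for x, lab in zip(X, y) if lab == label]
        means[label] = [sum(r[f] for r in rows) / len(rows) for f in feats]
    return means


def select_features(X, y, k):
    """Keep the k features with the largest gap between class means."""
    all_feats = range(len(X[0]))
    m = class_means(X, y, all_feats)
    gap = [abs(a - b) for a, b in zip(m[0], m[1])]
    return sorted(all_feats, key=lambda f: gap[f], reverse=True)[:k]


def predict(X_train, y_train, x, feats):
    """Nearest-centroid prediction restricted to the selected features."""
    m = class_means(X_train, y_train, feats)
    dist = {lab: sum((x[f] - c) ** 2 for f, c in zip(feats, m[lab])) for lab in (0, 1)}
    return min(dist, key=dist.get)


def loocv_accuracy(X, y, k, select_inside_fold):
    """Leave-one-out accuracy; selection is either global (biased) or per fold."""
    fixed = None if select_inside_fold else select_features(X, y, k)
    hits = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        feats = select_features(X_tr, y_tr, k) if select_inside_fold else fixed
        hits += predict(X_tr, y_tr, X[i], feats) == y[i]
    return hits / len(X)


X, y = make_noise_data(n_samples=20, n_features=2000, seed=0)
biased_acc = loocv_accuracy(X, y, k=10, select_inside_fold=False)
external_acc = loocv_accuracy(X, y, k=10, select_inside_fold=True)
selection_bias = biased_acc - external_acc
```

Because the labels carry no signal, any accuracy well above chance in `biased_acc` comes from the held-out sample having already influenced which features were kept. Repeating the selection inside each fold removes that leak, which is the idea behind the external cross-validation the abstract recommends.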
5.
PLoS One ; 9(1): e86630, 2014.
Article in English | MEDLINE | ID: mdl-24489753

ABSTRACT

Identification of risk factors in patients with a particular disease can be analyzed in clinical data sets by using feature selection procedures of pattern recognition and data mining methods. The applicability of the relaxed linear separability (RLS) method of feature subset selection was checked for high-dimensional and mixed type (genetic and phenotypic) clinical data of patients with end-stage renal disease. The RLS method allowed for substantial reduction of the dimensionality through omitting redundant features while maintaining the linear separability of data sets of patients with high and low levels of an inflammatory biomarker. The synergy between genetic and phenotypic features in differentiation between these two subgroups was demonstrated.


Subjects
Algorithms , Inflammation/genetics , Inflammation/pathology , Renal Dialysis , Humans , Phenotype , Reproducibility of Results
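
The abstract above describes dropping redundant features while keeping two patient groups linearly separable. One way to picture this, offered only as a rough sketch, is to add an L1-style cost on the weights to a margin-based penalty, so that weights of unneeded features are driven to exactly zero. This assumes the feature-cost term behaves like lam * sum(|w_i|). The abstract does not give the criterion or the optimizer, so the proximal subgradient loop, the parameter values, and the toy data below are all hypothetical stand-ins.

```python
def soft_threshold(w, t):
    """Shrink w toward zero by t; values within [-t, t] become exactly 0.0."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def rls_sketch(X, y, lam=0.1, lr=0.05, epochs=200):
    """Margin penalties plus an assumed L1 feature cost, minimized by a
    proximal subgradient loop (a stand-in, not the published procedure)."""
    n_feat = len(X[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * n_feat, 0.0
        for xj, yj in zip(X, y):
            score = sum(wi * xi for wi, xi in zip(w, xj)) + b
            if yj * score < 1.0:  # margin violated: penalty is active
                gw = [g - yj * xi for g, xi in zip(gw, xj)]
                gb -= yj
        # Gradient step on the penalties, then shrinkage for the L1 cost.
        w = [soft_threshold(wi - lr * g, lr * lam) for wi, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def selected_features(w):
    """Indices of features whose weight survived the shrinkage."""
    return [i for i, wi in enumerate(w) if wi != 0.0]


def classify(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical toy data: feature 0 separates the groups, feature 1 is weak
# noise, feature 2 is constant and therefore carries no information.
toy_X = [[2.0, 0.1, 0.0], [2.5, -0.2, 0.0], [1.5, 0.3, 0.0],
         [-2.0, 0.2, 0.0], [-1.5, -0.1, 0.0], [-2.5, 0.0, 0.0]]
toy_y = [1, 1, 1, -1, -1, -1]
toy_w, toy_b = rls_sketch(toy_X, toy_y)
toy_kept = selected_features(toy_w)
```

In this sketch a larger `lam` discards more features, and checking that `classify` still labels every training vector correctly is a simple way to confirm that separability has been kept after the reduction.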