Results 1 - 20 of 41
1.
Hum Genet ; 141(9): 1515-1528, 2022 Sep.
Article in English | MEDLINE | ID: mdl-34862561

ABSTRACT

Genetic data have become increasingly complex within the past decade, leading researchers to pursue increasingly complex questions, such as those involving epistatic interactions and protein prediction. Traditional methods are ill-suited to answer these questions, but machine learning (ML) techniques offer an alternative solution. ML algorithms are commonly used in genetics to predict or classify subjects, but some methods evaluate which features (variables) are responsible for creating a good prediction; this is called feature importance. This is critical in genetics, as researchers are often interested in which features (e.g., SNP genotype or environmental exposure) are responsible for a good prediction. This allows for deeper analysis beyond simple prediction, including the determination of risk factors associated with a given phenotype. Feature importance further permits the researcher to peer inside the black box of many ML algorithms to see how they work and which features are critical in informing a good prediction. This review focuses on ML methods that provide feature importance metrics for the analysis of genetic data. Five major categories of ML algorithms are described: k nearest neighbors, artificial neural networks, deep learning, support vector machines, and random forests. The review ends with a discussion of how to choose the best machine for a data set. This review will be particularly useful for genetic researchers looking to use ML methods to answer questions beyond basic prediction and classification.


Subjects
Machine Learning , Support Vector Machine , Algorithms , Humans , Neural Networks, Computer
2.
Genome Res ; 24(7): 1209-23, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24985915

ABSTRACT

Accurate gene model annotation of reference genomes is critical for making them useful. The modENCODE project has improved the D. melanogaster genome annotation by using deep and diverse high-throughput data. Since transcriptional activity that has been evolutionarily conserved is likely to have an advantageous function, we have performed large-scale interspecific comparisons to increase confidence in predicted annotations. To support comparative genomics, we filled in divergence gaps in the Drosophila phylogeny by generating draft genomes for eight new species. For comparative transcriptome analysis, we generated mRNA expression profiles on 81 samples from multiple tissues and developmental stages of 15 Drosophila species, and we performed cap analysis of gene expression in D. melanogaster and D. pseudoobscura. We also describe conservation of four distinct core promoter structures composed of combinations of elements at three positions. Overall, each type of genomic feature shows a characteristic divergence rate relative to neutral models, highlighting the value of multispecies alignment in annotating a target genome that should prove useful in the annotation of other high priority genomes, especially human and other mammalian genomes that are rich in noncoding sequences. We report that the vast majority of elements in the annotation are evolutionarily conserved, indicating that the annotation will be an important springboard for functional genetic testing by the Drosophila community.


Subjects
Computational Biology/methods , Drosophila melanogaster/genetics , Gene Expression Profiling , Molecular Sequence Annotation , Transcriptome , Animals , Cluster Analysis , Drosophila melanogaster/classification , Evolution, Molecular , Exons , Female , Genome, Insect , Humans , Male , Nucleotide Motifs , Phylogeny , Position-Specific Scoring Matrices , Promoter Regions, Genetic , RNA Editing , RNA Splice Sites , RNA Splicing , Reproducibility of Results , Transcription Initiation Site
3.
Genet Epidemiol ; 38(3): 209-19, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24535726

ABSTRACT

As the cost of genome-wide genotyping decreases, the number of genome-wide association studies (GWAS) has increased considerably. However, the transition from GWAS findings to the underlying biology of various phenotypes remains challenging. As a result, due to its system-level interpretability, pathway analysis has become a popular tool for gaining insights on the underlying biology from high-throughput genetic association data. In pathway analyses, gene sets representing particular biological processes are tested for significant associations with a given phenotype. Most existing pathway analysis approaches rely on single-marker statistics and assume that pathways are independent of each other. As biological systems are driven by complex biomolecular interactions, embracing the complex relationships between single-nucleotide polymorphisms (SNPs) and pathways needs to be addressed. To incorporate the complexity of gene-gene interactions and pathway-pathway relationships, we propose a system-level pathway analysis approach, synthetic feature random forest (SF-RF), which is designed to detect pathway-phenotype associations without making assumptions about the relationships among SNPs or pathways. In our approach, the genotypes of SNPs in a particular pathway are aggregated into a synthetic feature representing that pathway via Random Forest (RF). Multiple synthetic features are analyzed using RF simultaneously and the significance of a synthetic feature indicates the significance of the corresponding pathway. We further complement SF-RF with pathway-based Statistical Epistasis Network (SEN) analysis that evaluates interactions among pathways. By investigating the pathway SEN, we hope to gain additional insights into the genetic mechanisms contributing to the pathway-phenotype association. We apply SF-RF to a population-based genetic study of bladder cancer and further investigate the mechanisms that help explain the pathway-phenotype associations using SEN. 
The bladder cancer associated pathways we found are both consistent with existing biological knowledge and reveal novel and plausible hypotheses for future biological validations.


Subjects
Epistasis, Genetic/genetics , Models, Genetic , Phenotype , Genome-Wide Association Study , Genotype , Humans , Logistic Models , Polymorphism, Single Nucleotide/genetics , Reproducibility of Results , Urinary Bladder Neoplasms/genetics
4.
Behav Res Methods ; 47(1): 235-50, 2015 Mar.
Article in English | MEDLINE | ID: mdl-24706080

ABSTRACT

The System for Continuous Observation of Rodents in Home-cage Environment (SCORHE) was developed to demonstrate the viability of compact and scalable designs for quantifying activity levels and behavior patterns for mice housed within a commercial ventilated cage rack. The SCORHE in-rack design provides day- and night-time monitoring with the consistency and convenience of the home-cage environment. The dual-video camera custom hardware design makes efficient use of space, does not require home-cage modification, and is animal-facility user-friendly. Given the system's low cost and suitability for use in existing vivariums without modification to the animal husbandry procedures or housing setup, SCORHE opens up the potential for the wider use of automated video monitoring in animal facilities. SCORHE's potential uses include day-to-day health monitoring, as well as advanced behavioral screening and ethology experiments, ranging from the assessment of the short- and long-term effects of experimental cancer treatments to the evaluation of mouse models. When used for phenotyping and animal model studies, SCORHE aims to eliminate the concerns often associated with many mouse-monitoring methods, such as circadian rhythm disruption, acclimation periods, lack of night-time measurements, and short monitoring periods. Custom software integrates two video streams to extract several mouse activity and behavior measures. Studies comparing the activity levels of ABCB5 knockout and HMGN1 overexpresser mice with their respective C57BL parental strains demonstrate SCORHE's efficacy in characterizing the activity profiles for singly- and doubly-housed mice. Another study was conducted to demonstrate the ability of SCORHE to detect a change in activity resulting from administering a sedative.


Subjects
Behavior, Animal/drug effects , Housing, Animal , Hypnotics and Sedatives/pharmacology , Video Recording/methods , Adaptation, Psychological , Animals , Circadian Rhythm , Computer-Aided Design , Mice , Mice, Inbred C57BL , Models, Animal
5.
Phys Lett A ; 378(35): 2611-2613, 2014 Jul 11.
Article in English | MEDLINE | ID: mdl-25197159

ABSTRACT

We show that a reduced form of the structural requirements for deterministic hidden variables used in Bell-Kochen-Specker theorems is already sufficient for the no-go results. Those requirements are captured by the following principle: an observable takes a spectral value x if and only if the spectral projector associated with x takes the value 1. We show that the "only if" part of this condition suffices. The proof identifies an important structural feature behind the no-go results; namely, if at least one projector is assigned the value 1 in any resolution of the identity, then at most one is.

6.
Biom J ; 56(4): 534-63, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24478134

ABSTRACT

Probability estimation for binary and multicategory outcome using logistic and multinomial logistic regression has a long-standing tradition in biostatistics. However, biases may occur if the model is misspecified. In contrast, outcome probabilities for individuals can be estimated consistently with machine learning approaches, including k-nearest neighbors (k-NN), bagged nearest neighbors (b-NN), random forests (RF), and support vector machines (SVM). Because machine learning methods are rarely used by applied biostatisticians, the primary goal of this paper is to explain the concept of probability estimation with these methods and to summarize recent theoretical findings. Probability estimation in k-NN, b-NN, and RF can be embedded into the class of nonparametric regression learning machines; therefore, we start with the construction of nonparametric regression estimates and review results on consistency and rates of convergence. In SVMs, outcome probabilities for individuals are estimated consistently by repeatedly solving classification problems. For SVMs we review the classification problem and then dichotomous probability estimation. Next we extend the algorithms for estimating probabilities using k-NN, b-NN, and RF to multicategory outcomes and discuss approaches for the multicategory probability estimation problem using SVM. In simulation studies for dichotomous and multicategory dependent variables we demonstrate the general validity of the machine learning methods and compare them with logistic regression. However, each method fails in at least one simulation scenario. We conclude with a discussion of the failures and give recommendations for selecting and tuning the methods. Applications to real data and example code are provided in a companion article (doi:10.1002/bimj.201300077).


Subjects
Artificial Intelligence , Probability , Models, Theoretical , Regression Analysis , Statistics, Nonparametric , Support Vector Machine
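The k-NN probability estimation mentioned in the abstract above can be illustrated with a minimal sketch: the estimated probability of each class at a query point is simply the share of that class among the k nearest training points. The function name, the toy data, and the use of Euclidean distance are assumptions made for illustration; this is not the companion article's code.

```python
import math
from collections import Counter

def knn_class_probabilities(train_x, train_y, query, k):
    """Estimate P(class | query) as class shares among the k nearest neighbours."""
    # Rank training points by Euclidean distance to the query (assumed metric).
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    neighbours = [train_y[i] for i in order[:k]]
    counts = Counter(neighbours)
    # Report every class seen in training, including those with zero neighbours.
    return {c: counts.get(c, 0) / k for c in sorted(set(train_y))}

# Hypothetical toy data: two clusters with labels 0 and 1.
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = [0, 0, 0, 1, 1, 1]
probs = knn_class_probabilities(train_x, train_y, query=(0.3, 0.3), k=5)
print(probs)
```

Because the same counting works for any number of classes, the sketch covers both the dichotomous and the multicategory case; choosing k remains a tuning decision.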
7.
Nat Genet ; 32(1): 166-74, 2002 Sep.
Article in English | MEDLINE | ID: mdl-12185365

ABSTRACT

Retroviral insertional mutagenesis in BXH2 and AKXD mice induces a high incidence of myeloid leukemia and B- and T-cell lymphoma, respectively. The retroviral integration sites (RISs) in these tumors thus provide powerful genetic tags for the discovery of genes involved in cancer. Here we report the first large-scale use of retroviral tagging for cancer gene discovery in the post-genome era. Using high throughput inverse PCR, we cloned and analyzed the sequences of 884 RISs from a tumor panel composed primarily of B-cell lymphomas. We then compared these sequences, and another 415 RIS sequences previously cloned from BXH2 myeloid leukemias and from a few AKXD lymphomas, against the recently assembled mouse genome sequence. These studies identified 152 loci that are targets of retroviral integration in more than one tumor (common retroviral integration sites, CISs) and therefore likely to encode a cancer gene. Thirty-six CISs encode genes that are known or predicted to be genes involved in human cancer or their homologs, whereas others encode candidate genes that have not yet been examined for a role in human cancer. Our studies demonstrate the power of retroviral tagging for cancer gene discovery in the post-genome era and indicate a largely unrecognized complexity in mouse and presumably human cancer.


Subjects
Leukemia, Myeloid/genetics , Lymphoma, B-Cell/genetics , Retroviridae/genetics , Virus Integration/genetics , Animals , Genes, Tumor Suppressor , Humans , Mice , Mice, Inbred Strains , Oncogenes , Polymerase Chain Reaction , Proviruses/genetics
8.
Genet Epidemiol ; 35 Suppl 1: S5-11, 2011.
Article in English | MEDLINE | ID: mdl-22128059

ABSTRACT

Genetics Analysis Workshop 17 provided common and rare genetic variants from exome sequencing data and simulated binary and quantitative traits in 200 replicates. We provide a brief review of the machine learning and regression-based methods used in the analyses of these data. Several regression and machine learning methods were used to address different problems inherent in the analyses of these data, which are high-dimension, low-sample-size data typical of many genetic association studies. Unsupervised methods, such as cluster analysis, were used for data segmentation and subset selection. Supervised learning methods, which include regression-based methods (e.g., generalized linear models, logic regression, and regularized regression) and tree-based methods (e.g., decision trees and random forests), were used for variable selection (selecting genetic and clinical features most associated or predictive of outcome) and prediction (developing models using common and rare genetic variants to accurately predict outcome), with the outcome being case-control status or quantitative trait value. We include a discussion of cross-validation for model selection and assessment, and a description of available software resources for these methods.


Subjects
Molecular Epidemiology/methods , Regression Analysis , Algorithms , Artificial Intelligence , Cluster Analysis , Congresses as Topic , Decision Trees , Genetics , Humans
9.
J Am Assoc Lab Anim Sci ; 60(3): 298-305, 2021 05 01.
Article in English | MEDLINE | ID: mdl-33653438

ABSTRACT

Over the past 2 decades, zebrafish, Danio rerio, have become a mainstream laboratory animal model, yet zebrafish husbandry practices remain far from standardized. Feeding protocols play a critical role in the health, wellbeing, and productivity of zebrafish laboratories, yet they vary significantly between facilities. In this study, we compared our current feeding protocol for juvenile zebrafish (30 dpf to 75 dpf), a 3:1 mixture of fish flake and freeze-dried krill fed twice per day with live artemia twice per day (FKA), to a diet of Gemma Micro 300 fed once per day with live artemia once per day (GMA). Our results showed that juvenile EK wild-type zebrafish fed GMA were longer and heavier than juveniles fed FKA. As compared with FKA-fed juveniles, fish fed GMA as juveniles showed better reproductive performance as measured by spawning success, fertilization rate, and clutch size. As adults, fish from both feeding protocols were acclimated to our standard adult feeding protocol, and the long-term effects of juvenile diet were assessed. At 2 y of age, the groups showed no difference in mortality or fecundity. Reproductive performance is a crucial aspect of zebrafish research, as much of the research focuses on the developing embryo. Here we show that switching juvenile zebrafish from a mixture of flake and krill to Gemma Micro 300 improves reproductive performance, even with fewer feedings of live artemia, thus simplifying husbandry practices.


Subjects
Reproduction , Zebrafish , Animal Feed , Animals , Artemia , Diet/veterinary , Fertility
10.
BMC Bioinformatics ; 11: 110, 2010 Feb 27.
Article in English | MEDLINE | ID: mdl-20187966

ABSTRACT

BACKGROUND: Random forests (RF) have been increasingly used in applications such as genome-wide association and microarray studies where predictor correlation is frequently observed. Recent works on permutation-based variable importance measures (VIMs) used in RF have come to apparently contradictory conclusions. We present an extended simulation study to synthesize results. RESULTS: In the case when both predictor correlation was present and predictors were associated with the outcome (HA), the unconditional RF VIM attributed a higher share of importance to correlated predictors, while under the null hypothesis that no predictors are associated with the outcome (H0) the unconditional RF VIM was unbiased. Conditional VIMs showed a decrease in VIM values for correlated predictors versus the unconditional VIMs under HA and were unbiased under H0. Scaled VIMs were clearly biased under HA and H0. CONCLUSIONS: Unconditional unscaled VIMs are a computationally tractable choice for large datasets and are unbiased under the null hypothesis. Whether the observed increased VIMs for correlated predictors may be considered a "bias" - because they do not directly reflect the coefficients in the generating model - or if it is a beneficial attribute of these VIMs is dependent on the application. For example, in genetic association studies, where correlation between markers may help to localize the functionally relevant variant, the increased importance of correlated predictors may be an advantage. On the other hand, we show examples where this increased importance may result in spurious signals.


Subjects
Genome-Wide Association Study/methods , Genomics/methods , Algorithms , Genome
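As a rough illustration of the unconditional, unscaled permutation-based VIM discussed in the abstract above, the sketch below shuffles one predictor column at a time and records the average increase in error. It is a simplified, model-agnostic version under stated assumptions: the `predict` function stands in for a fitted model, the error is computed on the same data rather than on out-of-bag samples as a random forest would, and all names and data are hypothetical.

```python
import random

def error_rate(predict, X, y):
    """Fraction of rows the model misclassifies."""
    return sum(predict(row) != label for row, label in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in error after shuffling each predictor column in turn."""
    rng = random.Random(seed)
    baseline = error_rate(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between predictor j and the outcome
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(error_rate(predict, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Hypothetical toy data: the outcome depends on predictor 0 only; predictor 1 is noise.
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]
predict = lambda row: int(row[0] > 0.5)  # stands in for a fitted model

vim = permutation_importance(predict, X, y)
print(vim)
```

In this toy setup the noise predictor receives an importance of exactly zero because the stand-in model never reads it. With correlated predictors, which the abstract is concerned with, a shuffled column also loses its correlation with the others, which is one reason unconditional and conditional VIMs can differ.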
11.
Bioinformatics ; 25(15): 1884-90, 2009 Aug 01.
Article in English | MEDLINE | ID: mdl-19460890

ABSTRACT

MOTIVATION: The advent of high-throughput genomics has produced studies with large numbers of predictors (e.g. genome-wide association, microarray studies). Machine learning algorithms (MLAs) are a computationally efficient way to identify phenotype-associated variables in high-dimensional data. There are important results from mathematical theory and numerous practical results documenting their value. One attractive feature of MLAs is that many operate in a fully multivariate environment, allowing for small-importance variables to be included when they act cooperatively. However, certain properties of MLAs under conditions common in genomic-related data have not been well-studied; in particular, correlations among predictors pose a problem. RESULTS: Using extensive simulation, we showed that considering correlation within predictors is crucial in making valid inferences using variable importance measures (VIMs) from three MLAs: random forest (RF), conditional inference forest (CIF) and Monte Carlo logic regression (MCLR). Using a case-control illustration, we showed that the RF VIMs, even permutation-based ones, were less able to detect association than other algorithms at effect sizes encountered in complex disease studies. This reduction occurred when 'causal' predictors were correlated with other predictors, and was sharpest when RF tree building used the Gini index. Indeed, RF Gini VIMs are biased under correlation, dependent on predictor correlation strength/number and over-trained to random fluctuations in data when tree terminal node size was small. Permutation-based VIM distributions were less variable for correlated predictors and are unbiased, thus may be preferred when predictors are correlated. MLAs are a powerful tool for high-dimensional data analysis, but well-considered use of algorithms is necessary to draw valid conclusions. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Artificial Intelligence , Genome , Genomics/methods , Genome-Wide Association Study , Oligonucleotide Array Sequence Analysis , Phenotype , Polymorphism, Single Nucleotide
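The Gini index that the abstract above refers to can be made concrete with a small sketch. Gini impurity of a node is one minus the sum of squared class proportions, and a split is scored by how much it reduces the size-weighted impurity; a Gini-based VIM accumulates these decreases over the splits that use a given predictor. The function names and toy labels below are illustrative assumptions, not code from the paper.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted

# Hypothetical node with four cases and four controls.
parent = [0, 0, 0, 0, 1, 1, 1, 1]
perfect_split = gini_decrease(parent, [0, 0, 0, 0], [1, 1, 1, 1])
useless_split = gini_decrease(parent, [0, 0, 1, 1], [0, 0, 1, 1])
print(perfect_split, useless_split)
```

A split that separates the classes completely removes all of the parent's impurity, while a split that leaves both children as mixed as the parent contributes nothing.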
12.
Medicine (Baltimore) ; 85(2): 111-127, 2006 Mar.
Article in English | MEDLINE | ID: mdl-16609350

ABSTRACT

The idiopathic inflammatory myopathies (IIM) are systemic connective tissue diseases defined by chronic muscle inflammation and weakness associated with autoimmunity. We have performed low to high resolution molecular typing to assess the genetic variability of major histocompatibility complex loci (HLA-A, -B, -Cw, -DRB1, and -DQA1) in a large population of European American patients with IIM (n = 571) representing the major myositis autoantibody groups. We established that alleles of the 8.1 ancestral haplotype (8.1 AH) are important risk factors for the development of IIM in patients producing anti-synthetase/anti-Jo-1, -La, -PM/Scl, and -Ro autoantibodies. Moreover, a random forests classification analysis suggested that 8.1 AH-associated alleles B*0801 and DRB1*0301 are the principal HLA risk markers. In addition, we have identified several novel HLA susceptibility factors associated distinctively with particular myositis-specific (MSA) and myositis-associated autoantibody (MAA) groups of the IIM. IIM patients with anti-PL-7 (anti-threonyl-tRNA synthetase) autoantibodies have a unique HLA Class I risk allele, Cw*0304 (pcorr = 0.046), and lack the 8.1 AH markers associated with other anti-synthetase autoantibodies (for example, anti-Jo-1 and anti-PL-12). In addition, HLA-B*5001 and DQA1*0104 are novel potential risk factors among anti-signal recognition particle autoantibody-positive IIM patients (pcorr = 0.024 and p = 0.010, respectively). Among those patients with MAA, HLA DRB1*11 and DQA1*06 alleles were identified as risk factors for myositis patients with anti-Ku (pcorr = 0.041) and anti-La (pcorr = 0.023) autoantibodies, respectively. Amino acid sequence analysis of the HLA DRB1 third hypervariable region identified a consensus motif, 70D (hydrophilic)/71R (basic)/74A (hydrophobic), conferring protection among patients producing anti-synthetase/anti-Jo-1 and -PM/Scl autoantibodies. 
Together, these data demonstrate that HLA signatures, comprising both risk and protective alleles or motifs, distinguish IIM patients with different myositis autoantibodies and may have diagnostic and pathogenic implications. Variations in associated polymorphisms for these immune response genes may reflect divergent pathogenic mechanisms and/or responses to unique environmental triggers in different groups of subjects resulting in the heterogeneous syndromes of the IIM.


Subjects
Autoantibodies/analysis , HLA Antigens/genetics , HLA Antigens/immunology , Myositis/genetics , Myositis/immunology , Alleles , Amino Acid Motifs , Autoimmune Diseases/genetics , Autoimmune Diseases/immunology , Autoimmune Diseases/pathology , Case-Control Studies , Female , Genetic Predisposition to Disease , HLA Antigens/classification , HLA-A Antigens/genetics , HLA-A Antigens/immunology , HLA-B Antigens/genetics , HLA-B Antigens/immunology , HLA-C Antigens/genetics , HLA-C Antigens/immunology , HLA-DQ Antigens/genetics , HLA-DQ Antigens/immunology , HLA-DQ alpha-Chains , HLA-DR Antigens/genetics , HLA-DR Antigens/immunology , HLA-DRB1 Chains , Haplotypes , Humans , Immunity, Innate , Male , Myositis/pathology , Protein Binding , Risk Factors , White People/genetics
13.
Eur Cytokine Netw ; 17(2): 90-7, 2006 Jun.
Article in English | MEDLINE | ID: mdl-16840027

ABSTRACT

PFAPA syndrome is characterized by periodic episodes of high fever, aphthous stomatitis, pharyngitis, and/or cervical adenitis. It is of unknown etiology and manifests usually before 5 years of age. We determined serum and intracellular cytokine levels in six PFAPA patients (4 males, 2 females, mean age 8 years (+/- 1.2 SEM), range 4-13) during the symptom-free period as well as 6-12 hours and 18-24 hours after fever onset. Values were compared to age-matched, healthy controls. Febrile PFAPA attacks led to a significant increase in IL-6 and IFN-gamma serum concentrations compared to symptom-free periods and to controls, with IL-1beta, TNF-alpha and IL-12p70 levels being significantly higher than in controls. Lymphocytic IFN-gamma and CD8+ IL-2 production was consistently significantly elevated compared to healthy children. During the asymptomatic period, serum concentrations of IL-1beta, IL-6, TNF-alpha and IL-12p70 were significantly increased compared to controls. Intracellular TNF-alpha synthesis was not elevated at any time point. Soluble TNFRp55 levels were even lower in between febrile episodes, reaching values comparable to controls during attacks, whereas soluble TNFRp75 levels increased during attacks compared to healthy children. Anti-inflammatory IL-4 in serum was at all times lower in PFAPA patients compared to controls with no difference in levels of intracellular IL-4 and IL-10 or serum IL-10. The observed increase of pro-inflammatory mediators, even between febrile attacks, suggests a dysregulation of the immune response in PFAPA syndrome, with continuous pro-inflammatory cytokine activation and a reduced anti-inflammatory response.


Subjects
Cytokines/biosynthesis , Fever/metabolism , Lymphadenitis/metabolism , Pharyngitis/metabolism , Stomatitis, Aphthous/metabolism , Adolescent , Child , Child, Preschool , Cytokines/blood , Female , Humans , Inflammation/blood , Inflammation/metabolism , Interleukin-10/biosynthesis , Interleukin-10/blood , Interleukin-4/biosynthesis , Interleukin-4/blood , Male , Syndrome
14.
BioData Min ; 9: 14, 2016.
Article in English | MEDLINE | ID: mdl-27053949

ABSTRACT

BACKGROUND: Identifying gene-gene interactions is essential to understand disease susceptibility and to detect genetic architectures underlying complex diseases. Here, we aimed at developing a permutation-based methodology relying on a machine learning method, random forest (RF), to detect gene-gene interactions. Our approach, called permuted random forest (pRF), identified the top interacting single nucleotide polymorphism (SNP) pairs by estimating how much the power of a random forest classification model is influenced by removing pairwise interactions. RESULTS: We systematically tested our approach on a simulation study with datasets possessing various genetic constraints including heritability, number of SNPs, sample size, etc. Our methodology showed high success rates for detecting the interaction SNP pair. We also applied our approach to two bladder cancer datasets, which showed consistent results with well-studied methodologies, such as multifactor dimensionality reduction (MDR) and statistical epistasis network (SEN). Furthermore, we built permuted random forest networks (PRFN), in which we used nodes to represent SNPs and edges to indicate interactions. CONCLUSIONS: We successfully developed a scale-invariant methodology to detect pure gene-gene interactions based on permutation strategies and the machine learning method random forest. This methodology showed great potential to be used for detecting gene-gene interactions to study underlying genetic architectures in a scale-free way, which could help to uncover the complex disease mechanisms.

15.
BioData Min ; 9: 7, 2016.
Article in English | MEDLINE | ID: mdl-26839594

ABSTRACT

BACKGROUND: Machine learning methods and in particular random forests (RFs) are a promising alternative to standard single SNP analyses in genome-wide association studies (GWAS). RFs provide variable importance measures (VIMs) to rank SNPs according to their predictive power. However, in contrast to the established genome-wide significance threshold, no clear criteria exist to determine how many SNPs should be selected for downstream analyses. RESULTS: We propose a new variable selection approach, recurrent relative variable importance measure (r2VIM). Importance values are calculated relative to an observed minimal importance score for several runs of RF and only SNPs with large relative VIMs in all of the runs are selected as important. Evaluations on simulated GWAS data show that the new method controls the number of false-positives under the null hypothesis. Under a simple alternative hypothesis with several independent main effects it is only slightly less powerful than logistic regression. In an experimental GWAS data set, the same strong signal is identified while the approach selects none of the SNPs in an underpowered GWAS. CONCLUSIONS: The novel variable selection method r2VIM is a promising extension to standard RF for objectively selecting relevant SNPs in GWAS while controlling the number of false-positive results.
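The selection rule summarized in the abstract above (importance expressed relative to the observed minimal importance, with a SNP kept only if its relative value is large in every run) might be sketched as follows. This is a reading of the abstract, not the authors' implementation: it assumes the minimal importance in each run is negative and uses its absolute value as the scale, and the threshold `factor`, the function names, and the toy importance values are all hypothetical.

```python
def relative_vims(vims):
    """Divide each importance by the absolute value of the run's minimal importance."""
    scale = abs(min(vims.values()))
    if scale == 0:
        raise ValueError("minimal importance is zero; relative VIM is undefined")
    return {snp: value / scale for snp, value in vims.items()}

def select_recurrent(vim_runs, factor=3.0):
    """Keep SNPs whose relative VIM reaches `factor` in every run."""
    selected = set(vim_runs[0])
    for vims in vim_runs:
        rel = relative_vims(vims)
        selected &= {snp for snp, r in rel.items() if r >= factor}
    return sorted(selected)

# Hypothetical importance scores from three random forest runs.
runs = [
    {"rs1": 0.050, "rs2": 0.004, "rs3": -0.010, "rs4": 0.031},
    {"rs1": 0.045, "rs2": 0.020, "rs3": -0.005, "rs4": 0.012},
    {"rs1": 0.060, "rs2": -0.008, "rs3": 0.001, "rs4": 0.030},
]
chosen = select_recurrent(runs, factor=3.0)
print(chosen)
```

Requiring the threshold to be met in all runs is what makes the rule conservative: a SNP that looks important in one run by chance is dropped if it falls below the threshold in any other.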

16.
Medicine (Baltimore) ; 84(6): 338-349, 2005 Nov.
Article in English | MEDLINE | ID: mdl-16267409

ABSTRACT

The idiopathic inflammatory myopathies (IIM) are systemic connective tissue diseases in which autoimmune pathology is suspected to promote chronic muscle inflammation and weakness. We have performed low to high resolution genotyping to characterize the allelic profiles of HLA-A, -B, -Cw, -DRB1, and -DQA1 loci in a large population of North American Caucasian patients with IIM representing the major clinicopathologic groups (n = 571). We confirmed that alleles of the 8.1 ancestral haplotype were important risk markers for the development of IIM, and a random forests classification analysis suggested that within this haplotype, HLA-B*0801, DRB1*0301 and/or closely linked genes are the principal HLA risk factors. In addition, we identified several novel HLA factors associated distinctly with 1 or more clinicopathologic groups of IIM. The DQA1*0201 allele and associated peptide-binding motif (KLPLFHRL) were exclusive protective factors for the CD8+ T cell-mediated IIM forms of polymyositis (PM) and inclusion body myositis (IBM) (pc < 0.005). In contrast, HLA-A*68 alleles were significant risk factors for dermatomyositis (DM) (pc = 0.0021), a distinct clinical group thought to involve a humorally mediated immunopathology. While the DQA1*0301 allele was detected as a possible risk factor for IIM, PM, and DM patients (p < 0.05), DQA1*03 alleles were protective factors for IBM (pc = 0.0002). Myositis associated with malignancies was the most distinctive group of IIM wherein HLA Class I alleles were the only identifiable susceptibility factors and a shared HLA-Cw peptide-binding motif (AGSHTLQWM) conferred significant risk (pc = 0.019). Together, these data suggest that HLA susceptibility markers distinguish different myositis phenotypes with divergent pathogenetic mechanisms. These variations in associated HLA polymorphisms may reflect responses to unique environmental triggers resulting in the tissue pathospecificity and distinct clinicopathologic syndromes of the IIM.


Subjects
Myositis/genetics , White People/genetics , Adult , Alleles , Biomarkers , Case-Control Studies , Disease Susceptibility , Female , Genetic Variation , HLA-B Antigens , HLA-C Antigens , HLA-DQ Antigens , HLA-DR Antigens , Humans , Inflammation/genetics , Inflammation/immunology , Male , Myositis/immunology , Polymorphism, Genetic , Risk Assessment , Risk Factors
17.
Acad Radiol ; 12(4): 479-86, 2005 Apr.
Article in English | MEDLINE | ID: mdl-15831422

ABSTRACT

RATIONALE AND OBJECTIVES: A new classification scheme for the computer-aided detection of colonic polyps in computed tomographic colonography is proposed. MATERIALS AND METHODS: The scheme involves an ensemble of support vector machines (SVMs) for classification, a smoothed leave-one-out (SLOO) cross-validation method for obtaining error estimates, and use of a bootstrap aggregation method for training and model selection. Our use of an ensemble of SVM classifiers with bagging (bootstrap aggregation), built on different feature subsets, is intended to improve classification performance compared with single SVMs and reduce the number of false-positive detections. The bootstrap-based model-selection technique is used for tuning SVM parameters. In our first experiment, two independent data sets were used: the first, for feature and model selection, and the second, for testing to evaluate the generalizability of our model. In the second experiment, the test set that contained higher resolution data was used for training and testing (using the SLOO method) to compare SVM committee and single SVM performance. RESULTS: The overall sensitivity on the independent test set was 75%, with 1.5 false-positive detections/study, compared with 76%-78% sensitivity and 4.5 false-positive detections/study estimated using the SLOO method on the training set. The sensitivity of the SVM ensemble retrained on the former test set estimated using the SLOO method was 81%, which is 7%-10% greater than the sensitivity of a single SVM. The number of false-positive detections per study was 2.6, a 1.5 times reduction compared with a single SVM. CONCLUSION: Training an SVM ensemble on one data set and testing it on the independent data has shown that the SVM committee classification method has good generalizability and achieves high sensitivity and a low false-positive rate. The model selection and improved error estimation method are effective for computer-aided polyp detection.


Subjects
Colonic Polyps/diagnostic imaging , Computed Tomographic Colonography , Neural Networks (Computer) , Algorithms , Colonic Polyps/classification , Colonoscopy/methods , Computer-Assisted Diagnosis , False Positive Reactions , Humans , Sensitivity and Specificity
18.
Neuropsychology ; 17(3): 402-9, 2003 Jul.
Article in English | MEDLINE | ID: mdl-12959506

ABSTRACT

Patients with schizophrenia display numerous memory impairments. Examination of autobiographical memory distribution across the life span can constrain theories of how schizophrenia affects memory. Previously, schizophrenic patients were shown to produce fewer memories from early adulthood than from childhood or the recent past (A. Feinstein, T. E. Goldberg, B. Nowlin, & D. R. Weinberger, 1998), this temporal paucity corresponding with illness onset. The current study examined this issue further using a different (noncued) method. Age-matched schizophrenic patients (n = 21) and controls (n = 21) were to freely generate 50 episodes, after which they dated these memories. Patients generated fewer memories than did controls, especially from the recent decade. When the overall lower production of memories was controlled for, the groups displayed equivalent recency effects. It was concluded that patients' paucity of memories generated from the recent decade reflects encoding or acquisition problems, which may be associated with the illness period.


Subjects
Retrograde Amnesia/psychology , Schizophrenia , Schizophrenic Psychology , Adult , Autobiographies as Topic , Brain/physiopathology , Case-Control Studies , Female , Humans , Male , Memory , Memory Disorders/psychology , Schizophrenia/physiopathology
19.
Med Phys ; 30(1): 52-60, 2003 Jan.
Article in English | MEDLINE | ID: mdl-12557979

ABSTRACT

Detection of colonic polyps in CT colonography is problematic due to the complexities of polyp shape and the surface of the normal colon. Published results indicate the feasibility of computer-aided detection of polyps, but better classifiers are needed to improve specificity. In this paper we compare the classification results of two approaches: neural networks and recursive binary trees. As our starting point we collect surface geometry information from three-dimensional reconstruction of the colon, followed by a filter based on selected variables such as region density, Gaussian and average curvature, and sphericity. The filter returns sites that are candidate polyps, based on earlier work using detection thresholds, to which the neural nets or the binary trees are applied. A data set of 39 polyps from 3 to 25 mm in size was used in our investigation. For both the neural nets and the binary trees we use tenfold cross-validation to better estimate the true error rates. The backpropagation neural net with one hidden layer trained with the Levenberg-Marquardt algorithm achieved the best results: sensitivity 90% and specificity 95% with 16 false positives per study.
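
As a rough illustration of the evaluation described in this abstract, the sketch below pools held-out predictions from k-fold cross-validation and computes sensitivity and specificity from them. It is a minimal sketch under stated assumptions: the midpoint-threshold classifier is a toy stand-in for the neural nets and binary trees, the single feature and the data are invented, the folds are contiguous (real data would normally be shuffled first), and both classes are assumed to be present in every training split.

```python
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def train_threshold(xs: Sequence[float], ys: Sequence[int]) -> Callable[[float], int]:
    """Toy stand-in classifier: threshold halfway between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: int(x > cut)


def cross_validate(train_fn, xs, ys, k: int = 10) -> Tuple[float, float]:
    """Train on k-1 folds, predict the held-out fold, and score the pooled predictions."""
    preds = [0] * len(xs)
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = train_fn(train_x, train_y)
        for i in fold:
            preds[i] = model(xs[i])
    return sensitivity_specificity(ys, preds)


# Hypothetical, perfectly separable toy data: one feature, alternating labels.
xs = [0.2 if i % 2 == 0 else 0.8 for i in range(20)]
ys = [i % 2 for i in range(20)]
print(cross_validate(train_threshold, xs, ys, k=10))  # (1.0, 1.0)
```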


Subjects
Algorithms , Colonic Polyps/diagnostic imaging , Computed Tomographic Colonography/methods , Neural Networks (Computer) , Computer-Assisted Radiographic Image Interpretation/methods , Cluster Analysis , False Positive Reactions , Humans , Automated Pattern Recognition , Radiographic Image Enhancement/methods , Reference Values , Regression Analysis , Reproducibility of Results , Sensitivity and Specificity
20.
Acad Radiol ; 10(2): 154-60, 2003 Feb.
Article in English | MEDLINE | ID: mdl-12583566

ABSTRACT

RATIONALE AND OBJECTIVES: A new classification system for colonic polyp detection, designed to increase sensitivity and reduce the number of false-positive findings with computed tomographic colonography, was developed and tested in this study. MATERIALS AND METHODS: The system involves classification by a committee of neural networks (NNs), each using largely distinct subsets of features selected from a general set. Back-propagation NNs trained with the Levenberg-Marquardt algorithm were used as primary classifiers (committee members). The set of features included region density, Gaussian and mean curvature and sphericity, lesion size, colon wall thickness, and the means and standard deviations of all of these values. Subsets of variables were initially selected because of their effectiveness according to training and test sample misclassification rates. The final decision for each case is based on the majority vote across the networks and reflects the weighted votes of all networks. The authors also introduce a smoothed cross-validation method designed to improve estimation of the true misclassification rates by reducing bias and variance. RESULTS: This committee method reduced the false-positive rate by 36%, a clinically meaningful reduction, and improved sensitivity by an average of 6.9% compared with decisions made by any single NN. The overall sensitivity and specificity were 82.9% and 95.3%, respectively, when sensitivity was estimated by means of smoothed cross-validation. CONCLUSION: The proposed method of using multiple classifiers and majority voting is recommended for classification tasks with large sets of input features, particularly when selected feature subsets may not be equally effective and do not provide satisfactory true- and false-positive rates. This approach reduces variance in estimates of misclassification rates.
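
The committee decision described in this abstract, where each member sees its own feature subset and the final label comes from a weighted majority vote, might be sketched as follows. This is only an illustration under assumptions: the simple threshold rules stand in for trained back-propagation networks, and the feature indices, weights, and candidate values are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# A committee member is a feature-index subset plus a classifier that maps
# the selected feature values to a 0/1 label.
Member = Tuple[Sequence[int], Callable[[List[float]], int]]


def committee_predict(members: Sequence[Member],
                      weights: Sequence[float],
                      features: Sequence[float]) -> int:
    """Return 1 when the weighted positive votes exceed half of the total weight."""
    positive = 0.0
    for (idx, clf), w in zip(members, weights):
        subset = [features[i] for i in idx]
        if clf(subset) == 1:
            positive += w
    return 1 if positive > sum(weights) / 2 else 0


# Hypothetical threshold rules standing in for trained networks.
members: List[Member] = [
    ((0, 1), lambda x: int(x[0] + x[1] > 1.0)),
    ((1, 2), lambda x: int(x[0] > 0.5)),
    ((0, 2), lambda x: int(x[1] > 0.8)),
]
weights = [1.0, 1.0, 1.0]
candidate = [0.7, 0.6, 0.2]
print(committee_predict(members, weights, candidate))  # 1 (two of three members vote positive)
```

With equal weights this reduces to a plain majority vote; raising the weight of the third member to 5.0 would flip the decision for this candidate to 0.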


Subjects
Colonic Polyps/diagnostic imaging , Computed Tomographic Colonography , Neural Networks (Computer) , Algorithms , Colonic Polyps/classification , Computer-Assisted Diagnosis , Humans , Sensitivity and Specificity