Results 1 - 20 of 6,461
1.
Hum Mol Genet ; 33(4): 342-354, 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-37944069

ABSTRACT

Peripheral blood mononuclear cells (PBMCs) reflect systemic immune response during cancer progression. However, a comprehensive understanding of the composition and function of PBMCs in cancer patients is lacking, and the potential of these features to assist cancer diagnosis is also unclear. Here, the compositional and status differences between cancer patients and healthy donors in PBMCs were investigated by single-cell RNA sequencing (scRNA-seq), involving 262,025 PBMCs from 68 cancer samples and 14 healthy samples. We observed an enhanced activation and differentiation of most immune subsets in cancer patients, along with reduction of naïve T cells, expansion of macrophages, impairment of NK cells and myeloid cells, as well as tumor promotion and immunosuppression. Based on characteristics including differential cell type abundances and/or hub genes identified from weighted gene co-expression network analysis (WGCNA) modules of each major cell type, we applied logistic regression to construct cancer diagnosis models. Furthermore, we found that the above models can distinguish cancer patients from healthy donors with high sensitivity. Our study provided new insights into using the features of PBMCs in non-invasive cancer diagnosis.


Subjects
Leukocytes, Mononuclear; Neoplasms; Humans; Single-Cell Gene Expression Analysis; Neoplasms/diagnosis; Neoplasms/genetics; Cell Differentiation; Cell Transformation, Neoplastic
2.
Hum Mol Genet ; 33(8): 724-732, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38271184

ABSTRACT

Since the first publication of the American College of Medical Genetics and Genomics/Association for Molecular Pathology (ACMG/AMP) variant classification guidelines, additional recommendations for application of certain criteria have been released (https://clinicalgenome.org/docs/), to improve their application in the diagnostic setting. However, none have addressed use of the PS4 and PP4 criteria, capturing patient presentation as evidence towards pathogenicity. Application of PS4 can be done through traditional case-control studies, or "proband counting" within or across clinical testing cohorts. Review of the existing PS4 and PP4 specifications for Hereditary Cancer Gene Variant Curation Expert Panels revealed substantial differences in the approach to defining specifications. Using BRCA1, BRCA2 and TP53 as exemplar genes, we calibrated different methods proposed for applying the "PS4 proband counting" criterion. For each approach, we considered limitations, non-independence with other ACMG/AMP criteria, broader applicability, and variability in results for different datasets. Our findings highlight inherent overlap of proband-counting methods with ACMG/AMP frequency codes, and the importance of calibration to derive dataset-specific code weights that can account for potential between-dataset differences in ascertainment and other factors. Our work emphasizes the advantages and generalizability of logistic regression analysis over simple proband-counting approaches to empirically determine the relative predictive capacity and weight of various personal clinical features in the context of multigene panel testing, for improved variant interpretation. We also provide a general protocol, including instructions for data formatting and a web-server for analysis of personal history parameters, to facilitate dataset-specific calibration analyses required to use such data for germline variant classification.


Subjects
Genetic Variation; Neoplasms; Humans; Genetic Variation/genetics; Genetic Testing/methods; Genome, Human; Phenotype; Genes, Neoplasm; Neoplasms/genetics
3.
Am J Hum Genet ; 110(5): 762-773, 2023 05 04.
Article in English | MEDLINE | ID: mdl-37019109

ABSTRACT

The ongoing release of large-scale sequencing data in the UK Biobank allows for the identification of associations between rare variants and complex traits. SAIGE-GENE+ is a valid approach to conducting set-based association tests for quantitative and binary traits. However, for ordinal categorical phenotypes, applying SAIGE-GENE+ by treating the trait as quantitative or by binarizing it can cause inflated type I error rates or power loss. In this study, we propose a scalable and accurate method for rare-variant association tests, POLMM-GENE, in which we used a proportional odds logistic mixed model to characterize ordinal categorical phenotypes while adjusting for sample relatedness. POLMM-GENE fully utilizes the categorical nature of phenotypes and can therefore control type I error rates well while remaining powerful. In the analyses of UK Biobank 450k whole-exome-sequencing data for five ordinal categorical traits, POLMM-GENE identified 54 gene-phenotype associations.
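
As a rough illustration of the proportional odds idea mentioned above, the following sketch computes category probabilities for an ordinal outcome from cumulative logits, P(Y <= j) = sigmoid(theta_j - eta). It is not the POLMM-GENE implementation: it omits the random effect for sample relatedness, and the cutpoints and linear predictor values are hypothetical numbers chosen only for demonstration.

```python
import math


def sigmoid(z: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probabilities(cutpoints, eta):
    """Return P(Y = j) for each ordinal category under proportional odds.

    cutpoints: increasing thresholds theta_1 < ... < theta_{J-1}
    eta: linear predictor (for example a genotype effect plus covariates)
    The cumulative probability is P(Y <= j) = sigmoid(theta_j - eta).
    """
    cumulative = [sigmoid(theta - eta) for theta in cutpoints] + [1.0]
    probabilities = []
    previous = 0.0
    for value in cumulative:
        probabilities.append(value - previous)
        previous = value
    return probabilities


# Hypothetical cutpoints for a four-level ordinal trait.
CUTPOINTS = [-1.0, 0.5, 2.0]
baseline = category_probabilities(CUTPOINTS, eta=0.0)
shifted = category_probabilities(CUTPOINTS, eta=0.8)
```

A positive linear predictor shifts probability mass toward the higher categories while the same effect applies to every cumulative split, which is what treating the trait as quantitative or binary would fail to capture.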


Subjects
Exome; Genome-Wide Association Study; Genome-Wide Association Study/methods; Exome/genetics; Biological Specimen Banks; Phenotype; Data Analysis; United Kingdom
4.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38888457

ABSTRACT

Large sample datasets have been regarded as the primary basis for innovative discoveries and the solution to missing heritability in genome-wide association studies. However, owing to computational complexity, existing analyses cannot account for all effects and all polygenic backgrounds, which reduces the effectiveness of large datasets. To address these challenges, we included all effects and polygenic backgrounds in a mixed logistic model for binary traits and compressed four variance components into two. The compressed model combined three computational algorithms to develop an innovative method, called FastBiCmrMLM, for large data analysis. These algorithms were tailored to sample size, computational speed, and reduced memory requirements. To mine additional genes, linkage disequilibrium markers were replaced by bin-based haplotypes, which are analyzed by FastBiCmrMLM, named FastBiCmrMLM-Hap. Simulation studies highlighted the superiority of FastBiCmrMLM over GMMAT, SAIGE and fastGWA-GLMM in identifying dominant, small α (allele substitution effect), and rare variants. In the UK Biobank-scale dataset, we demonstrated that FastBiCmrMLM could detect variants as small as 0.03% and with α ≈ 0. In re-analyses of seven diseases in the WTCCC datasets, 29 candidate genes with both functional and TWAS evidence, located around 36 variants identified only by the new methods, strongly supported the new methods. These methods offer a new way to decipher the genetic architecture of binary traits and address the challenges outlined above.


Subjects
Algorithms; Genome-Wide Association Study; Genome-Wide Association Study/methods; Humans; Logistic Models; Case-Control Studies; Linkage Disequilibrium; Polymorphism, Single Nucleotide; Genomics/methods; Computer Simulation; Haplotypes; Models, Genetic
5.
Proc Natl Acad Sci U S A ; 120(13): e2221311120, 2023 Mar 28.
Article in English | MEDLINE | ID: mdl-36940328

ABSTRACT

Leveraging a scientific infrastructure for exploring how students learn, we have developed cognitive and statistical models of skill acquisition and used them to understand fundamental similarities and differences across learners. Our primary question was why do some students learn faster than others? Or, do they? We model data from student performance on groups of tasks that assess the same skill component and that provide follow-up instruction on student errors. Our models estimate, for both students and skills, initial correctness and learning rate, that is, the increase in correctness after each practice opportunity. We applied our models to 1.3 million observations across 27 datasets of student interactions with online practice systems in the context of elementary to college courses in math, science, and language. Despite the availability of up-front verbal instruction, like lectures and readings, students demonstrate modest initial prepractice performance, at about 65% accuracy. Despite being in the same course, students' initial performance varies substantially from about 55% correct for those in the lower half to 75% for those in the upper half. In contrast, and much to our surprise, we found students to be astonishingly similar in estimated learning rate, typically increasing by about 0.1 log odds or 2.5% in accuracy per opportunity. These findings pose a challenge for theories of learning to explain the odd combination of large variation in student initial performance and striking regularity in student learning rate.
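
The relationship between a learning rate in log odds and a gain in accuracy can be sketched with a few lines of arithmetic. The example below is only an illustration of the logit scale, not the authors' statistical model; the function names are hypothetical, and the inputs (0.1 log odds per opportunity, starting accuracies of 50% and 65%) are taken from or chosen around the figures quoted in the abstract.

```python
import math


def logit(p: float) -> float:
    """Convert a probability to log odds."""
    return math.log(p / (1.0 - p))


def inv_logit(z: float) -> float:
    """Convert log odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def accuracy_after_practice(initial_accuracy, rate_log_odds, opportunities):
    """Accuracy after a number of practice opportunities, assuming the
    log odds of a correct answer rise linearly with each opportunity."""
    return inv_logit(logit(initial_accuracy) + rate_log_odds * opportunities)


# Gain in accuracy from one practice opportunity at 0.1 log odds.
gain_at_half = accuracy_after_practice(0.5, 0.1, 1) - 0.5
gain_at_65 = accuracy_after_practice(0.65, 0.1, 1) - 0.65
```

Because the logistic curve is steepest at 50% accuracy, the same 0.1 log odds yields close to 2.5 percentage points there and somewhat less at a starting accuracy of 65%.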

6.
Genet Epidemiol ; 48(4): 164-189, 2024 06.
Article in English | MEDLINE | ID: mdl-38420714

ABSTRACT

Gene-environment (GxE) interactions play a crucial role in understanding the complex etiology of various traits, but assessing them using observational data can be challenging due to unmeasured confounders for lifestyle and environmental risk factors. Mendelian randomization (MR) has emerged as a valuable method for assessing causal relationships based on observational data. This approach utilizes genetic variants as instrumental variables (IVs) with the aim of providing a valid statistical test and estimation of causal effects in the presence of unmeasured confounders. MR has gained substantial popularity in recent years largely due to the success of genome-wide association studies. Many methods have been developed for MR; however, limited work has been done on evaluating GxE interaction. In this paper, we focus on two primary IV approaches: the two-stage predictor substitution and the two-stage residual inclusion, and extend them to accommodate GxE interaction under both the linear and logistic regression models for continuous and binary outcomes, respectively. Comprehensive simulation study and analytical derivations reveal that resolving the linear regression model is relatively straightforward. In contrast, the logistic regression model presents a considerably more intricate challenge, which demands additional effort.


Subjects
Gene-Environment Interaction; Genome-Wide Association Study; Mendelian Randomization Analysis; Humans; Logistic Models; Linear Models; Polymorphism, Single Nucleotide; Models, Genetic; Genetic Variation; Computer Simulation
7.
Brief Bioinform ; 24(5)2023 09 20.
Article in English | MEDLINE | ID: mdl-37738402

ABSTRACT

Understanding the function of the human microbiome is important but the development of statistical methods specifically for the microbial gene expression (i.e. metatranscriptomics) is in its infancy. Many currently employed differential expression analysis methods have been designed for different data types and have not been evaluated in metatranscriptomics settings. To address this gap, we undertook a comprehensive evaluation and benchmarking of 10 differential analysis methods for metatranscriptomics data. We used a combination of real and simulated data to evaluate performance (i.e. type I error, false discovery rate and sensitivity) of the following methods: log-normal (LN), logistic-beta (LB), MAST, DESeq2, metagenomeSeq, ANCOM-BC, LEfSe, ALDEx2, Kruskal-Wallis and two-part Kruskal-Wallis. The simulation was informed by supragingival biofilm microbiome data from 300 preschool-age children enrolled in a study of childhood dental disease (early childhood caries, ECC), whereas validations were sought in two additional datasets from the ECC study and an inflammatory bowel disease study. The LB test showed the highest sensitivity in both small and large samples and reasonably controlled type I error. Contrarily, MAST was hampered by inflated type I error. Upon application of the LN and LB tests in the ECC study, we found that genes C8PHV7 and C8PEV7, harbored by the lactate-producing Campylobacter gracilis, had the strongest association with childhood dental disease. This comprehensive model evaluation offers practical guidance for selection of appropriate methods for rigorous analyses of differential expression in metatranscriptomics. Selection of an optimal method increases the possibility of detecting true signals while minimizing the chance of claiming false ones.


Subjects
Benchmarking; Stomatognathic Diseases; Child; Humans; Child, Preschool; Biofilms; Computer Simulation; Lactic Acid
8.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-37183449

ABSTRACT

Undoubtedly, single-cell RNA sequencing (scRNA-seq) has changed the research landscape by providing insights into heterogeneous, complex and rare cell populations. Given that more such data sets will become available in the near future, their accurate assessment with compatible and robust models for cell type annotation is a prerequisite. Considering this, herein, we developed scAnno (scRNA-seq data annotation), an automated annotation tool for scRNA-seq data sets primarily based on the single-cell cluster levels, using a joint deconvolution strategy and logistic regression. We explicitly constructed a reference profile for human (30 cell types and 50 human tissues) and a reference profile for mouse (26 cell types and 50 mouse tissues) to support this novel methodology (scAnno). scAnno offers a possibility to obtain genes with high expression and specificity in a given cell type as cell type-specific genes (marker genes) by combining co-expression genes with seed genes as a core. Of importance, scAnno can accurately identify cell type-specific genes based on cell type reference expression profiles without any prior information. Particularly, in the peripheral blood mononuclear cell data set, the marker genes identified by scAnno showed cell type-specific expression, and the majority of marker genes matched exactly with those included in the CellMarker database. Besides validating the flexibility and interpretability of scAnno in identifying marker genes, we also proved its superiority in cell type annotation over other cell type annotation tools (SingleR, scPred, CHETAH and scmap-cluster) through internal validation of data sets (average annotation accuracy: 99.05%) and cross-platform data sets (average annotation accuracy: 95.56%). Taken together, we established the first novel methodology that utilizes a deconvolution strategy for automated cell typing and is capable of being a significant application in broader scRNA-seq analysis. 
scAnno is available at https://github.com/liuhong-jia/scAnno.


Subjects
Algorithms; Software; Animals; Mice; Humans; Gene Expression Profiling/methods; Leukocytes, Mononuclear; Single-Cell Analysis/methods; RNA/genetics; Sequence Analysis, RNA/methods
9.
Stat Appl Genet Mol Biol ; 23(1)2024 Jan 01.
Article in English | MEDLINE | ID: mdl-38563699

ABSTRACT

Simulation frameworks are useful to stress-test predictive models when data is scarce, or to assess model sensitivity to specific data distributions. Such frameworks often need to recapitulate several layers of data complexity, including emergent properties that arise implicitly from the interaction between simulation components. Antibody-antigen binding is a complex mechanism by which an antibody sequence wraps itself around an antigen with high affinity. In this study, we use a synthetic simulation framework for antibody-antigen folding and binding on a 3D lattice that includes full details on the spatial conformation of both molecules. We investigate how emergent properties arise in this framework, in particular the physical proximity of amino acids, their presence on the binding interface, or the binding status of a sequence, and relate that to the individual and pairwise contributions of amino acids in statistical models for binding prediction. We show that weights learnt from a simple logistic regression model align with some but not all features of amino acids involved in the binding, and that predictive sequence binding patterns can be enriched. In particular, main effects correlated with the capacity of a sequence to bind any antigen, while statistical interactions were related to sequence specificity.


Subjects
Antibodies; Antifibrinolytic Agents; Feasibility Studies; Vaccines, Synthetic; Amino Acids
10.
Cereb Cortex ; 34(4)2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38679479

ABSTRACT

Normative ferret brain development was characterized using magnetic resonance imaging. Brain growth was longitudinally monitored in 10 ferrets (equal numbers of males and females) from postnatal day 8 (P8) through P38 in 6-d increments. Template T2-weighted images were constructed at each age, and these were manually segmented into 12 to 14 brain regions. A logistic growth model was used to fit data from whole brain volumes and 8 of the individual regions in both males and females. More protracted growth was found in males, which results in larger brains; however, sex differences were not apparent when results were corrected for body weight. Additionally, surface models of the developing cortical plate were registered to one another using the anatomically-constrained Multimodal Surface Matching algorithm. This, in turn, enabled local logistic growth parameters to be mapped across the cortical surface. A close similarity was observed between surface area expansion timing and previous reports of the transverse neurogenic gradient in ferrets. Regional variation in the extent of surface area expansion and the maximum expansion rate was also revealed. This characterization of normative brain growth over the period of cerebral cortex folding may serve as a reference for ferret studies of brain development.
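
For readers unfamiliar with logistic growth curves, a minimal sketch of the three-parameter form may help. The parameterization below (asymptote, rate, midpoint) is a common textbook choice and is an assumption here; the abstract does not state which parameterization the authors fitted, and the numeric values are hypothetical rather than ferret data.

```python
import math


def logistic_growth(t, asymptote, rate, midpoint):
    """Volume (or surface area) at age t under a three-parameter logistic curve."""
    return asymptote / (1.0 + math.exp(-rate * (t - midpoint)))


def max_growth_rate(asymptote, rate):
    """Steepest slope of the curve, reached at the midpoint age."""
    return asymptote * rate / 4.0


# Hypothetical parameters: normalized final size, rate per day, midpoint in days.
ASYMPTOTE, RATE, MIDPOINT = 1.0, 0.2, 20.0

size_at_midpoint = logistic_growth(MIDPOINT, ASYMPTOTE, RATE, MIDPOINT)

# Central-difference check that the slope at the midpoint equals asymptote * rate / 4.
STEP = 1e-5
numeric_slope = (
    logistic_growth(MIDPOINT + STEP, ASYMPTOTE, RATE, MIDPOINT)
    - logistic_growth(MIDPOINT - STEP, ASYMPTOTE, RATE, MIDPOINT)
) / (2.0 * STEP)
```

Under this form the midpoint gives the timing of fastest expansion and the asymptote gives the total extent, which correspond to the kinds of local parameters the abstract describes mapping across the cortical surface.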


Subjects
Brain; Ferrets; Magnetic Resonance Imaging; Animals; Ferrets/growth & development; Magnetic Resonance Imaging/methods; Male; Female; Brain/growth & development; Brain/diagnostic imaging; Brain/anatomy & histology; Longitudinal Studies; Sex Characteristics
11.
Cereb Cortex ; 34(1)2024 01 14.
Article in English | MEDLINE | ID: mdl-37991271

ABSTRACT

Neuroimaging markers for risk and protective factors related to type 2 diabetes mellitus are critical for clinical prevention and intervention. In this work, the individual metabolic brain networks were constructed with Jensen-Shannon divergence for 4 groups (elderly type 2 diabetes mellitus and healthy controls, and middle-aged type 2 diabetes mellitus and healthy controls). Regional network properties were used to identify hub regions. Rich-club, feeder, and local connections were subsequently obtained, and intergroup differences in connections and correlations between them and age (or fasting plasma glucose) were analyzed. Multinomial logistic regression was performed to explore effects of network changes on the probability of type 2 diabetes mellitus. Among patients with type 2 diabetes mellitus, the elderly had increased rich-club and feeder connections and decreased local connections compared with the middle-aged; patients with type 2 diabetes mellitus had decreased rich-club and feeder connections compared with healthy controls. Protective factors including glucose metabolism in the triangular part of the inferior frontal gyrus, metabolic connectivity between the triangular part of the inferior frontal gyrus and anterior cingulate cortex, degree centrality of putamen, and risk factors including metabolic connectivities between the triangular part of the inferior frontal gyrus and Heschl's gyri were identified for the probability of type 2 diabetes mellitus. Metabolic interactions among critical brain regions increased in type 2 diabetes mellitus with aging. Individual metabolic network changes co-affected by type 2 diabetes mellitus and aging were identified as protective and risk factors for the likelihood of type 2 diabetes mellitus, providing guiding evidence for clinical interventions.


Subjects
Diabetes Mellitus, Type 2; Middle Aged; Aged; Humans; Magnetic Resonance Imaging/methods; Brain/diagnostic imaging; Risk Factors; Aging; Metabolic Networks and Pathways
12.
Proc Natl Acad Sci U S A ; 119(47): e2213879119, 2022 11 22.
Article in English | MEDLINE | ID: mdl-36383746

ABSTRACT

The main mathematical result in this paper is that a change of variables in the ordinary differential equation (ODE) for the competition of two infections in a Susceptible-Infected-Removed (SIR) model shows that the fraction of cases due to the new variant satisfies the logistic differential equation, which models selective sweeps. Fitting the logistic to data from the Global Initiative on Sharing All Influenza Data (GISAID) shows that this correctly predicts the rapid turnover from one dominant variant to another. In addition, our fitting gives sensible estimates of the increase in infectivity. These arguments are applicable to any epidemic modeled by SIR equations.
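
The link between two competing variants and a logistic curve can be checked numerically in a simplified setting. The sketch below assumes both variants grow exponentially at constant rates (roughly, a susceptible pool that changes little over the window considered); it does not reproduce the paper's change of variables for the full SIR system, and all rates and counts are hypothetical.

```python
import math


def variant_fraction(t, initial_fraction, selection):
    """Logistic solution for the new variant's share of cases:
    the odds of the new variant grow as exp(selection * t)."""
    odds = initial_fraction / (1.0 - initial_fraction) * math.exp(selection * t)
    return odds / (1.0 + odds)


def fraction_from_counts(t, old_initial, new_initial, old_rate, new_rate):
    """Share of the new variant computed directly from two exponentially
    growing case counts (a simplifying assumption, not the full SIR model)."""
    old_cases = old_initial * math.exp(old_rate * t)
    new_cases = new_initial * math.exp(new_rate * t)
    return new_cases / (old_cases + new_cases)


# Hypothetical setting: 1 new-variant case among 100, growth rates per day.
times = [0.0, 10.0, 30.0, 46.0, 80.0]
logistic_values = [variant_fraction(t, 0.01, 0.15 - 0.05) for t in times]
direct_values = [fraction_from_counts(t, 99.0, 1.0, 0.05, 0.15) for t in times]
```

In this simplified case the two calculations agree, and the selection coefficient is just the difference in growth rates, which is the quantity a logistic fit to sequence-share data would estimate.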


Subjects
COVID-19; Epidemics; Influenza, Human; Humans; SARS-CoV-2/genetics; Disease Susceptibility
13.
BMC Bioinformatics ; 25(1): 253, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39090608

ABSTRACT

BACKGROUND: Conditional logistic regression trees have been proposed as a flexible alternative to the standard method of conditional logistic regression for the analysis of matched case-control studies. While they avoid the strict assumption of linearity and automatically incorporate interactions, conditional logistic regression trees may suffer from a relatively high variability. Further machine learning methods for the analysis of matched case-control studies are missing because conventional machine learning methods cannot handle the matched structure of the data. RESULTS: A random forest method for the analysis of matched case-control studies based on conditional logistic regression trees is proposed, which overcomes the issue of high variability. It provides an accurate estimation of exposure effects while being more flexible in the functional form of covariate effects. The efficacy of the method is illustrated in a simulation study and within an application to real-world data from a matched case-control study on the effect of regular participation in cervical cancer screening on the development of cervical cancer. CONCLUSIONS: The proposed random forest method is a promising add-on to the toolbox for the analysis of matched case-control studies and addresses the need for machine-learning methods in this field. It provides a more flexible approach compared to the standard method of conditional logistic regression, but also compared to conditional logistic regression trees. It allows for non-linearity and the automatic inclusion of interaction effects and is suitable both for exploratory and explanatory analyses.
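
The standard conditional logistic regression that these tree and forest methods build on can be illustrated with its likelihood. The sketch below handles a single covariate and one case per matched set; it is a generic textbook formulation, not the proposed random forest method, and the matched sets are hypothetical toy data.

```python
import math


def conditional_log_likelihood(beta, matched_sets):
    """Conditional log-likelihood for matched case-control data.

    matched_sets: list of (case_x, [control_x, ...]) with one covariate.
    Each set contributes exp(beta * case_x) divided by the sum of
    exp(beta * x) over every member of the set, so set-specific
    intercepts cancel out.
    """
    total = 0.0
    for case_x, control_xs in matched_sets:
        scores = [beta * case_x] + [beta * x for x in control_xs]
        largest = max(scores)  # subtract the maximum for numerical stability
        log_denominator = largest + math.log(
            sum(math.exp(s - largest) for s in scores)
        )
        total += scores[0] - log_denominator
    return total


# Hypothetical 1:1 matched pairs with a binary exposure:
# two pairs where only the case is exposed, one where only the control is.
PAIRS = [(1.0, [0.0]), (1.0, [0.0]), (0.0, [1.0])]

at_zero = conditional_log_likelihood(0.0, PAIRS)
at_log_two = conditional_log_likelihood(math.log(2.0), PAIRS)
at_two = conditional_log_likelihood(2.0, PAIRS)
```

For 1:1 pairs with a binary exposure the likelihood peaks at the log of the ratio of discordant pairs (here log 2), which is why the value at log 2 exceeds the values at 0 and at 2.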


Subjects
Machine Learning; Random Forest; Female; Humans; Case-Control Studies; Logistic Models; Uterine Cervical Neoplasms
14.
BMC Bioinformatics ; 25(1): 226, 2024 Jun 27.
Article in English | MEDLINE | ID: mdl-38937668

ABSTRACT

BACKGROUND: The matched case-control design, up until recently mostly pertinent to epidemiological studies, is becoming customary in biomedical applications as well. For instance, in omics studies, it is quite common to compare cancer and healthy tissue from the same patient. Furthermore, researchers today routinely collect data from various and variable sources that they wish to relate to the case-control status. This highlights the need to develop and implement statistical methods that can take these tendencies into account. RESULTS: We present an R package penalizedclr, that provides an implementation of the penalized conditional logistic regression model for analyzing matched case-control studies. It allows for different penalties for different blocks of covariates, and it is therefore particularly useful in the presence of multi-source omics data. Both L1 and L2 penalties are implemented. Additionally, the package implements stability selection for variable selection in the considered regression model. CONCLUSIONS: The proposed method fills a gap in the available software for fitting high-dimensional conditional logistic regression models accounting for the matched design and block structure of predictors/features. The output consists of a set of selected variables that are significantly associated with case-control status. These variables can then be investigated in terms of functional interpretation or validation in further, more targeted studies.


Subjects
Software; Logistic Models; Case-Control Studies; Humans; Genomics/methods; Computational Biology/methods
15.
BMC Bioinformatics ; 25(1): 139, 2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38553698

ABSTRACT

BACKGROUND: MicroRNA (miRNA) has been shown to play a key role in the occurrence and progression of diseases, making uncovering miRNA-disease associations vital for disease prevention and therapy. However, traditional laboratory methods for detecting these associations are slow, strenuous, expensive, and uncertain. Although numerous advanced algorithms have emerged, it is still a challenge to develop more effective methods to explore underlying miRNA-disease associations. RESULTS: In the study, we designed a novel approach on the basis of deep autoencoder and combined feature representation (DAE-CFR) to predict possible miRNA-disease associations. We began by creating integrated similarity matrices of miRNAs and diseases, performing a logistic function transformation, balancing positive and negative samples with k-means clustering, and constructing training samples. Then, deep autoencoder was used to extract low-dimensional feature from two kinds of feature representations for miRNAs and diseases, namely, original association information-based and similarity information-based. Next, we combined the resulting features for each miRNA-disease pair and used a logistic regression (LR) classifier to infer all unknown miRNA-disease interactions. Under five and tenfold cross-validation (CV) frameworks, DAE-CFR not only outperformed six popular algorithms and nine classifiers, but also demonstrated superior performance on an additional dataset. Furthermore, case studies on three diseases (myocardial infarction, hypertension and stroke) confirmed the validity of DAE-CFR in practice. CONCLUSIONS: DAE-CFR achieved outstanding performance in predicting miRNA-disease associations and can provide evidence to inform biological experiments and clinical therapy.


Subjects
MicroRNAs; Humans; MicroRNAs/genetics; Computational Biology/methods; Algorithms; Genetic Predisposition to Disease
16.
BMC Bioinformatics ; 25(1): 67, 2024 Feb 12.
Article in English | MEDLINE | ID: mdl-38347472

ABSTRACT

BACKGROUND: Recording and analyzing microbial growth is a routine task in the life sciences. Microplate readers that record dozens to hundreds of growth curves simultaneously are increasingly used for this task, raising the demand for their rapid and reliable analysis. RESULTS: Here, we present Dashing Growth Curves, an interactive web application (http://dashing-growth-curves.ethz.ch/) that enables researchers to quickly visualize and analyze growth curves without the requirement for coding knowledge and independent of operating system. Growth curves can be fitted with parametric and non-parametric models or manually. The application extracts maximum growth rates as well as other features such as lag time, length of exponential growth phase and maximum population size among others. Furthermore, Dashing Growth Curves automatically groups replicate samples and generates downloadable summary plots for all growth parameters. CONCLUSIONS: Dashing Growth Curves is an open-source web application that reduces the time required to analyze microbial growth curves from hours to minutes.


Subjects
Software; Data Interpretation, Statistical
17.
BMC Bioinformatics ; 25(1): 57, 2024 Feb 05.
Article in English | MEDLINE | ID: mdl-38317067

ABSTRACT

BACKGROUND: Controlling the False Discovery Rate (FDR) in Multiple Comparison Procedures (MCPs) has widespread applications in many scientific fields. Previous studies show that the correlation structure between test statistics increases the variance and bias of FDR. The objective of this study is to modify the effect of correlation in MCPs based on the information theory. We proposed three modified procedures (M1, M2, and M3) under strong, moderate, and mild assumptions based on the conditional Fisher Information of the consecutive sorted test statistics for controlling the false discovery rate under arbitrary correlation structure. The performance of the proposed procedures was compared with the Benjamini-Hochberg (BH) and Benjamini-Yekutieli (BY) procedures in a simulation study and in real high-dimensional data of colorectal cancer gene expressions. In the simulation study, we generated 1000 differential multivariate Gaussian features with different levels of the correlation structure and screened the significant features by the FDR controlling procedures, with strong control on the family-wise error rate. RESULTS: When there was no correlation between 1000 simulated features, the performance of the BH procedure was similar to the three proposed procedures. In low to medium correlation structures the BY procedure is too conservative. The BH procedure is too liberal, and the mean number of screened features was constant at the different levels of the correlation between features. The mean number of features screened by the proposed procedures was between those of the BY and BH procedures and decreased as the correlations increased. Where the features are highly correlated, the number of features screened by the proposed procedures reached that of the Bonferroni (BF) procedure, as expected. In real data analysis the BY, BH, M1, M2, and M3 procedures were done to screen gene expressions of colorectal cancer. To fit a predictive model based on the screened features the Efficient Bayesian Logistic Regression (EBLR) model was used. The fitted EBLR models based on the features screened by the M1 and M2 procedures have minimum entropies and are more efficient than those based on the BY and BH procedures. CONCLUSION: The modified proposed procedures based on information theory are much more flexible than BH and BY procedures for the amount of correlation between test statistics. The modified procedures avoided screening the non-informative features and so the number of screened features reduced with the increase in the level of correlation.
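
The two reference procedures named above, Benjamini-Hochberg (BH) and Benjamini-Yekutieli (BY), are both step-up rules and differ only in a correction factor. The sketch below implements those two standard procedures; it does not implement the M1, M2, or M3 procedures proposed in the paper, and the p-values are made-up illustrative numbers.

```python
def step_up_rejections(p_values, alpha, correction=1.0):
    """Indices rejected by a step-up FDR rule.

    Finds the largest rank k such that the k-th smallest p-value is at most
    k * alpha / (m * correction), then rejects the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank * alpha / (m * correction):
            largest_rank = rank
    return sorted(order[:largest_rank])


def benjamini_hochberg(p_values, alpha=0.05):
    """BH procedure: no correction factor."""
    return step_up_rejections(p_values, alpha)


def benjamini_yekutieli(p_values, alpha=0.05):
    """BY procedure: divides alpha by the harmonic sum 1 + 1/2 + ... + 1/m,
    which keeps FDR control under arbitrary dependence."""
    harmonic = sum(1.0 / i for i in range(1, len(p_values) + 1))
    return step_up_rejections(p_values, alpha, correction=harmonic)


# Hypothetical p-values for ten features.
P_VALUES = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
bh_selected = benjamini_hochberg(P_VALUES)
by_selected = benjamini_yekutieli(P_VALUES)
```

On these illustrative values BH keeps the two smallest p-values while BY keeps only the smallest, which mirrors the abstract's description of BY as the more conservative of the two.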


Subjects
Colorectal Neoplasms; Information Theory; Humans; Bayes Theorem; Genomics; Computer Simulation
18.
Genet Epidemiol ; 47(4): 332-357, 2023 06.
Article in English | MEDLINE | ID: mdl-36808763

ABSTRACT

Mendelian randomization is a statistical method for inferring the causal relationship between exposures and outcomes using an economics-derived instrumental variable approach. The research results are relatively complete when both exposures and outcomes are continuous variables. However, due to the noncollapsing nature of the logistic model, the existing methods inherited from the linear model for exploring binary outcomes cannot take the effect of confounding factors into account, which leads to biased estimates of the causal effect. In this article, we propose an integrated likelihood method, MR-BOIL, to investigate causal relationships for binary outcomes by treating confounders as latent variables in one-sample Mendelian randomization. Under the assumption of a joint normal distribution of the confounders, we use the expectation maximization algorithm to estimate the causal effect. Extensive simulations demonstrate that the estimator of MR-BOIL is asymptotically unbiased and that our method improves statistical power without inflating type I error rate. We then apply this method to analyze the data from the Atherosclerosis Risk in Communities Study. The results show that MR-BOIL can better identify plausible causal relationships with high reliability, compared with the unreliable results of existing methods. MR-BOIL is implemented in R and the corresponding R code is provided for free download.


Subjects
Mendelian Randomization Analysis , Models, Genetic , Humans , Likelihood Functions , Mendelian Randomization Analysis/methods , Reproducibility of Results , Causality
19.
Stroke ; 55(7): 1798-1807, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38836360

ABSTRACT

BACKGROUND: Hemodynamic impairment of blood pressure may play a crucial role in determining the mechanisms of stroke in symptomatic intracranial atherosclerotic stenosis. We aimed to elucidate this issue and assess the impacts of modifications to blood pressure on hemodynamic impairment. METHODS: From the Third China National Stroke Registry, computational fluid dynamics modeling was performed using the Newton-Krylov-Schwarz method in 339 patients with symptomatic intracranial atherosclerotic stenosis during 2015 to 2018. The major exposures were translesional systolic blood pressure (SBP) drop and poststenotic mean arterial pressure (MAP), and the major study outcomes were cortex-involved infarcts and borderzone-involved infarcts, respectively. Multivariate logistic regression models and the bootstrap resampling method were utilized, adjusting for demographics and medical histories. RESULTS: In all, 184 (54.3%) cortex-involved infarcts and 70 (20.6%) borderzone-involved infarcts were identified. In the multivariate logistic model, the upper quartile of SBP drop correlated with increased cortex-involved infarcts (odds ratio, 1.92 [95% CI, 1.03-3.57]; bootstrap analysis odds ratio, 2.07 [95% CI, 1.09-3.93]), and the lower quartile of poststenotic MAP may correlate with increased borderzone-involved infarcts (odds ratio, 2.07 [95% CI, 0.95-4.51]; bootstrap analysis odds ratio, 2.38 [95% CI, 1.04-5.45]). Restricted cubic spline analysis revealed a consistent upward trajectory in the relationship between translesional SBP drop and cortex-involved infarcts, and a downward trajectory in the relationship between poststenotic MAP and borderzone-involved infarcts. SBP drop correlated negatively with poststenotic MAP (rs=-0.765; P<0.001). In generating hemodynamic impairment, simulating blood pressure modifications suggested that ensuring adequate blood pressure to maintain sufficient poststenotic MAP appears preferable to the reverse approach, due to the prolonged plateau period in the association between the translesional SBP drop and cortex-involved infarcts and the relatively short plateau period characterizing the correlation between poststenotic MAP and borderzone-involved infarcts. CONCLUSIONS: This research elucidates the role of hemodynamic impairment of blood pressure in symptomatic intracranial atherosclerotic stenosis-related stroke mechanisms, underscoring the necessity to conduct hemodynamic assessments when managing blood pressure in symptomatic intracranial atherosclerotic stenosis.
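The difference between "correlated" and "may correlate" in the results can be checked from the reported intervals. The sketch below assumes that the two non-bootstrap intervals are Wald intervals that are symmetric on the log-odds scale, which the abstract does not state; the helper name `wald_from_ci` is hypothetical. Under that assumption, the standard error and an approximate two-sided p-value follow from the odds ratio and its 95% CI.

```python
import math


def wald_from_ci(or_point, lower, upper, z_crit=1.959964):
    """Back out SE(log OR), z, and a two-sided p-value from an OR and its 95% CI.

    Assumes the interval is exp(log OR +/- z_crit * SE).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(or_point) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail
    return se, z, p


# Values reported in the abstract (multivariate logistic model, not bootstrap).
se_sbp, z_sbp, p_sbp = wald_from_ci(1.92, 1.03, 3.57)  # upper quartile of SBP drop
se_map, z_map, p_map = wald_from_ci(2.07, 0.95, 4.51)  # lower quartile of poststenotic MAP

print(f"SBP drop: SE={se_sbp:.3f}, z={z_sbp:.2f}, p={p_sbp:.3f}")
print(f"MAP:      SE={se_map:.3f}, z={z_map:.2f}, p={p_map:.3f}")
```

The first interval excludes 1 and the second includes it, which matches the more cautious wording used for poststenotic MAP.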


Subjects
Blood Pressure , Hemodynamics , Intracranial Arteriosclerosis , Stroke , Humans , Male , Intracranial Arteriosclerosis/physiopathology , Intracranial Arteriosclerosis/complications , Female , Middle Aged , Aged , Blood Pressure/physiology , Hemodynamics/physiology , Stroke/physiopathology , Stroke/epidemiology , Registries , Constriction, Pathologic/physiopathology , China/epidemiology
20.
Am J Hum Genet ; 108(5): 825-839, 2021 05 06.
Article in English | MEDLINE | ID: mdl-33836139

ABSTRACT

In genome-wide association studies, ordinal categorical phenotypes are widely used to measure human behaviors, satisfaction, and preferences. However, because of the lack of analysis tools, methods designed for binary or quantitative traits are commonly used inappropriately to analyze categorical phenotypes. To accurately model the dependence of an ordinal categorical phenotype on covariates, we propose an efficient mixed model association test, proportional odds logistic mixed model (POLMM). POLMM is computationally efficient to analyze large datasets with hundreds of thousands of samples, can control type I error rates at a stringent significance level regardless of the phenotypic distribution, and is more powerful than alternative methods. In contrast, the standard linear mixed model approaches cannot control type I error rates for rare variants when the phenotypic distribution is unbalanced, although they performed well when testing common variants. We applied POLMM to 258 ordinal categorical phenotypes on array genotypes and imputed samples from 408,961 individuals in UK Biobank. In total, we identified 5,885 genome-wide significant variants, of which, 424 variants (7.2%) are rare variants with MAF < 0.01.
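The proportional odds structure underlying such a model can be sketched as follows. This is a fixed-effects illustration only, not the POLMM implementation: it omits the random effect that a mixed model adds, and the cutpoints, effect size, and function names are hypothetical. It uses the parameterization logit P(Y <= j | g) = cutpoint_j - beta * g, where g is the genotype dosage.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probs(cutpoints, eta):
    """P(Y = j), j = 1..J, under logit P(Y <= j) = cutpoint_j - eta."""
    # Cumulative probabilities for the first J-1 levels, then 1.0 for the last.
    cumulative = [expit(c - eta) for c in cutpoints] + [1.0]
    # Category probabilities are successive differences of the cumulative ones.
    return [cumulative[0]] + [
        cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))
    ]


cutpoints = [-1.0, 0.5, 2.0]  # hypothetical; must be increasing (4 ordered levels)
beta = 0.3                    # hypothetical per-allele effect on the logit scale

# Category probabilities for genotype dosages 0, 1, and 2.
probs = {g: category_probs(cutpoints, beta * g) for g in (0, 1, 2)}
for g, pr in probs.items():
    print(g, [round(x, 3) for x in pr])
```

A single `beta` shifts every cumulative logit by the same amount, so the cumulative odds ratio per allele is `exp(-beta)` at every cutpoint. That shared shift is the "proportional odds" assumption, and it is what allows one association test per variant regardless of the number of phenotype levels.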


Subjects
Computer Simulation , Genome-Wide Association Study , Models, Genetic , Phenotype , Biological Specimen Banks , Child , Female , Humans , Male , Research Design , United Kingdom