Results 1 - 13 of 13
1.
Sci Rep ; 13(1): 8476, 2023 05 25.
Article in English | MEDLINE | ID: mdl-37231056

ABSTRACT

We propose a resampling-based fast variable selection technique for detecting relevant single nucleotide polymorphisms (SNPs) in a multi-marker mixed effect model. Due to computational complexity, current practice primarily involves testing the effect of one SNP at a time, commonly termed 'single SNP association analysis'. Joint modeling of genetic variants within a gene or pathway may have better power to detect associated genetic variants, especially those with weak effects. In this paper, we propose a computationally efficient model selection approach, based on the e-values framework, for single SNP detection in families while utilizing information on multiple SNPs simultaneously. To overcome the computational bottleneck of traditional model selection methods, our method trains a single model and uses a fast and scalable bootstrap procedure. We illustrate through numerical studies that our proposed method is more effective in detecting SNPs associated with a trait than either single-marker analysis using family data or model selection methods that ignore the familial dependency structure. Further, we perform gene-level analysis on the Minnesota Center for Twin and Family Research (MCTFR) dataset using our method and detect several SNPs that have been implicated in alcohol consumption.


Subject(s)
Genome-Wide Association Study; Polymorphism, Single Nucleotide; Humans; Genome-Wide Association Study/methods; Phenotype; Alcohol Drinking; Minnesota; Models, Genetic
3.
Pharmaceuticals (Basel) ; 12(4), 2019 Oct 16.
Article in English | MEDLINE | ID: mdl-31623241

ABSTRACT

Human life has been at the edge of catastrophe for millennia due to diseases that emerge and reemerge at random. The recent outbreak of the Zika virus (ZIKV) is one such menace that shook the global public health community abruptly. Modern technologies, including computational tools as well as experimental approaches, need to be harnessed quickly and effectively in a coordinated manner to properly address such challenges. In this paper, based on our earlier research, we propose a four-pronged approach to tackle emerging pathogens like ZIKV: (a) epidemiological modelling of the spread mechanisms of ZIKV; (b) assessment of the public health risk of newly emerging strains of the pathogens by comparing them with existing strains/pathogens using fast computational sequence comparison methods; (c) implementation of vaccine design methods to produce a set of probable peptide vaccine candidates for quick synthesis/production and testing in the laboratory; and (d) design of novel therapeutic molecules and their laboratory testing, as well as validation of new drugs or repurposing of drugs for use against ZIKV. For each of these stages, we provide an extensive review of the technical challenges and the current state of the art. Further, we outline future areas of research and discuss how they can work together to proactively combat ZIKV or future emerging pathogens.

4.
Mol Inform ; 38(8-9): e1800164, 2019 08.
Article in English | MEDLINE | ID: mdl-31322827

ABSTRACT

In this paper we used two sets of calculated molecular descriptors to predict blood-brain barrier (BBB) entry of a collection of 415 chemicals. The first set of 579 descriptors was calculated with Schrodinger and TopoCluj software. Polly and Triplet software were used to calculate the second set of 198 descriptors. Following this, modelling and a two-deep, repeated external validation method were used for QSAR formulation. Results show that both sets of descriptors, individually and in combination, give models of reasonable prediction accuracy. We also demonstrate the effectiveness of a variable selection approach by showing that, for one of our descriptor sets, the top 5% of predictors in terms of random forest variable importance provide a better performing model than the model with all predictors. The most influential descriptors indicate important aspects of the molecular structural features that govern BBB entry of chemicals.
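The "top 5% of predictors by variable importance" step can be illustrated with a minimal sketch. It assumes the importance scores have already been computed by some random forest implementation; the function name `top_fraction`, the descriptor names, the scores, and the round-up rule are all hypothetical and are not taken from the paper.

```python
import math

def top_fraction(importances, fraction=0.05):
    """Return the names of the top `fraction` of predictors, ranked by importance."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Keep at least one predictor; rounding up is an assumption of this sketch.
    keep = max(1, math.ceil(fraction * len(importances)))
    ranked = sorted(importances, key=importances.get, reverse=True)
    return ranked[:keep]

# Hypothetical importance scores for 40 descriptors named d0..d39.
scores = {f"d{i}": 1.0 / (i + 1) for i in range(40)}
selected = top_fraction(scores, 0.05)  # the two highest-scoring descriptors
```

A reduced model would then be refit on the `selected` descriptors only and compared with the full model under the same validation scheme.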


Subject(s)
Blood-Brain Barrier/metabolism; Machine Learning; Organic Chemicals/chemistry; Organic Chemicals/pharmacokinetics; Algorithms; Models, Molecular; Quantitative Structure-Activity Relationship; Software
5.
Epidemics ; 27: 59-65, 2019 06.
Article in English | MEDLINE | ID: mdl-30902616

ABSTRACT

The recent Zika virus (ZIKV) epidemic in the Americas ranks among the largest outbreaks in modern times. Like other mosquito-borne flaviviruses, ZIKV circulates in sylvatic cycles among primates that can serve as reservoirs of spillover infection to humans. Identifying sylvatic reservoirs is critical to mitigating spillover risk, but relevant surveillance and biological data remain limited for this and most other zoonoses. We confronted this data sparsity by combining a machine learning method, Bayesian multi-label learning, with a multiple imputation method on primate traits. The resulting models distinguished flavivirus-positive primates with 82% accuracy and suggest that species posing the greatest spillover risk are also among the best adapted to human habitations. Given pervasive data sparsity describing animal hosts, and the virtual guarantee of data sparsity in scenarios involving novel or emerging zoonoses, we show that computational methods can be useful in extracting actionable inference from available data to support improved epidemiological response and prevention.


Subject(s)
Primates/virology; Zika Virus Infection/epidemiology; Zika Virus/pathogenicity; Zoonoses/epidemiology; Zoonoses/virology; Animals; Bayes Theorem; Humans; Risk; Zika Virus Infection/pathology; Zoonoses/pathology
6.
Curr Comput Aided Drug Des ; 14(4): 284-291, 2018.
Article in English | MEDLINE | ID: mdl-29701159

ABSTRACT

BACKGROUND: Proper validation is an important aspect of QSAR modelling. External validation is one of the most widely used validation methods in QSAR: the model is built on a subset of the data and validated on the remaining samples. However, its effectiveness for datasets with a small number of samples but a large number of predictors remains suspect. OBJECTIVE: Calculating hundreds or thousands of molecular descriptors using currently available software has become the norm in QSAR research, owing to computational advances in the past few decades. Thus, for n chemical compounds and p descriptors calculated for each molecule, the typical chemometric dataset today has a high value of p but a small n (i.e. n << p). Motivated by evidence in the recent literature that external validation is inadequate for estimating the true predictive capability of a statistical model, this paper performs an extensive comparative study of this method against several other validation techniques. METHODOLOGY: We compared four validation methods: leave-one-out (LOO), K-fold, external, and multi-split validation, using statistical models built with LASSO regression, which simultaneously performs variable selection and modelling. We used 300 simulated datasets and one real dataset of 95 congeneric amine mutagens for this evaluation. RESULTS: External validation metrics vary widely among different random splits of the data and hence are not recommended for predictive QSAR models. LOO has the best overall performance among all validation methods applied in our scenario. CONCLUSION: Results from external validation are too unstable for the datasets we analyzed. Based on our findings, we recommend using the LOO procedure for validating QSAR predictive models built on high-dimensional small-sample data.
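To make the contrast between the validation schemes concrete, here is a minimal sketch of how leave-one-out, K-fold, and external validation partition the samples. It is illustrative only: the helper names, the toy response values, the 20% test fraction, and the mean-only placeholder model are assumptions, and the study itself used LASSO regression rather than this placeholder.

```python
import random

def loo_splits(n):
    """Leave-one-out: each sample is the test set exactly once."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]

def kfold_splits(n, k, seed=0):
    """K-fold: shuffle once, then each fold is the test set exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        train = [j for g in range(k) if g != f for j in folds[g]]
        yield train, folds[f]

def external_split(n, test_fraction=0.2, seed=0):
    """External validation: a single random train/test partition."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(test_fraction * n))
    return idx[n_test:], idx[:n_test]

def mean_squared_error(y, splits):
    """Score a mean-only predictor (a placeholder model) over the given splits."""
    errors = []
    for train, test in splits:
        pred = sum(y[j] for j in train) / len(train)
        errors.extend((y[j] - pred) ** 2 for j in test)
    return sum(errors) / len(errors)

# Toy response values (hypothetical); 95 matches the size of the amine dataset.
rng = random.Random(1)
y = [rng.gauss(0, 1) for _ in range(95)]

loo_mse = mean_squared_error(y, loo_splits(len(y)))
kfold_mse = mean_squared_error(y, kfold_splits(len(y), k=5))
# External validation yields a different estimate for every random split.
external_mses = [mean_squared_error(y, [external_split(len(y), 0.2, seed=s)])
                 for s in range(20)]
spread = max(external_mses) - min(external_mses)
```

LOO is deterministic for a given dataset, whereas `spread` shows how much a single external split can move the error estimate depending on which samples happen to be held out.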


Subject(s)
Quantitative Structure-Activity Relationship; Amines/chemistry; Amines/pharmacology; Computer Simulation; Models, Biological; Models, Statistical; Mutagens/chemistry; Mutagens/pharmacology; Regression Analysis; Salmonella typhimurium/drug effects; Salmonella typhimurium/genetics; Software
8.
Indian Pediatr ; 54(1): 65, 2017 Jan 15.
Article in English | MEDLINE | ID: mdl-28141574
9.
Curr Comput Aided Drug Des ; 12(4): 294-301, 2016.
Article in English | MEDLINE | ID: mdl-27600878

ABSTRACT

BACKGROUND: Computed mathematical descriptors of molecules are used for the prediction of their properties and bioactivity. In the 1970s only a few descriptors could be calculated; currently available software can calculate a large number of descriptors for molecules or biomolecules such as DNA, RNA, and proteins. OBJECTIVE: When p molecular descriptors are calculated for n molecules, the data set can be viewed as n vectors in p dimensions, each chemical being represented as a point in R^p. Because many of the descriptors are strongly correlated, the n points in R^p will lie on a subspace of dimension lower than p. Methods like principal components analysis (PCA) can be used to characterize the intrinsic dimensionality of chemical spaces. Taking motivation from the work of Basak et al. in the 1980s, which applied PCA to descriptors calculated for various congeneric and structurally diverse sets of chemicals relevant to new drug discovery and predictive toxicology, this paper explores the intrinsic dimensionality of chemical spaces for robust QSAR model development. METHODOLOGY: Intrinsic dimensionality of chemical spaces was studied using three new statistical approaches and two data sets, viz. a congeneric set of 95 aromatic and heteroaromatic amine mutagens and a structurally diverse set of 508 chemical mutagens. RESULTS: The new outlier-robust methods applied here yield favorable prediction results compared to previous studies on the same datasets. CONCLUSION: We conclude that when analyzing data on a large number of chemical descriptors, it is advisable to build QSAR models that are outlier-robust and take into consideration the underlying correlations among predictors.


Subject(s)
Amines/chemistry; DNA/chemistry; Heterocyclic Compounds/chemistry; Models, Molecular; Models, Statistical; Mutagens/chemistry; Quantitative Structure-Activity Relationship; Amines/toxicity; Heterocyclic Compounds/toxicity; Humans; Mutagens/pharmacology; Nucleic Acid Conformation; Principal Component Analysis
12.
Curr Comput Aided Drug Des ; 11(2): 117-23, 2015.
Article in English | MEDLINE | ID: mdl-26202887

ABSTRACT

Variation in high-dimensional data is often caused by a few latent factors, and hence dimension reduction or variable selection techniques are often useful in extracting information from the data. In this paper we consider two such recent methods: interrelated two-way clustering and envelope models. We couple these methods with traditional statistical procedures like ridge regression and linear discriminant analysis, and apply them to two data sets which have more predictors than samples (i.e. the n << p scenario) and several types of molecular descriptors. One of these datasets consists of a congeneric group of amines, while the other has a much more diverse collection of compounds. The difference in prediction results between these two datasets for both methods supports the hypothesis that for a congeneric set of compounds, descriptors of a certain type are enough to provide good QSAR models, but as the data set grows more diverse, including a variety of descriptors can improve model quality considerably.


Subject(s)
Mutagens/chemistry; Mutagens/toxicity; Amines/chemistry; Amines/toxicity; Animals; Cluster Analysis; Datasets as Topic; Discriminant Analysis; Humans; Models, Biological; Quantitative Structure-Activity Relationship
13.
Curr Comput Aided Drug Des ; 9(4): 463-71, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24138420

ABSTRACT

Interrelated Two-way Clustering (ITC) is an unsupervised clustering method developed to divide samples into two groups in gene expression data obtained through microarrays, selecting important genes simultaneously in the process. It has been found to be a better approach than conventional clustering methods like K-means or self-organizing maps for scenarios in which the number of samples is much smaller than the number of variables (n << p). In this paper we used the ITC approach for classification of a diverse set of 508 chemicals regarding mutagenicity. A large number of topological indices (TIs), 3-dimensional and quantum chemical descriptors, as well as atom pairs (APs), have been used as explanatory variables. Here, ITC has been used only for predictor selection, after which ridge regression is employed to build the final predictive model. The proper leave-one-out (LOO) method of cross-validation in this scenario is to hold out each of the 508 compounds before predictor thinning and compare the predicted values with the experimental data. The ITC-based results obtained here are comparable to those developed earlier.
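The ordering described above, holding out each compound before predictor thinning, can be sketched as follows. This is a simplified illustration under stated assumptions: correlation screening stands in for ITC, a one-predictor least-squares fit stands in for ridge regression, and all function names and the toy data are hypothetical.

```python
def abs_correlation(x, y):
    """Absolute Pearson correlation; 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0

def select_predictor(X, y):
    """Stand-in for predictor thinning: pick the column most correlated with y."""
    p = len(X[0])
    return max(range(p), key=lambda j: abs_correlation([row[j] for row in X], y))

def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my - slope * mx, slope

def proper_loo_predictions(X, y):
    """Hold out each sample BEFORE selection, so selection never sees it."""
    preds = []
    for i in range(len(y)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        j = select_predictor(X_train, y_train)  # selection on training data only
        b0, b1 = fit_line([row[j] for row in X_train], y_train)
        preds.append(b0 + b1 * X[i][j])  # predict the held-out sample
    return preds

# Toy data (hypothetical): y depends linearly on column 0; column 1 is noise.
X = [[0.0, 3.0], [1.0, 1.0], [2.0, 4.0], [3.0, 1.0], [4.0, 5.0]]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
preds = proper_loo_predictions(X, y)
```

Running the selection once on all samples and only then cross-validating the final model would let the held-out compound influence which predictors are kept, which tends to make the error estimate optimistic.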


Subject(s)
Gene Expression Profiling/methods; Models, Chemical; Models, Molecular; Cluster Analysis; Gene Expression; Humans; Molecular Structure; Mutagens/chemistry; Mutagens/toxicity; Oligonucleotide Array Sequence Analysis/methods; Quantitative Structure-Activity Relationship