Search | BVS Bolivia

1.

Simultaneous selection of multiple important single nucleotide polymorphisms in familial genome wide association studies data.

Majumdar, Subhabrata; Basu, Saonli; McGue, Matt; Chatterjee, Snigdhansu.

Sci Rep ; 13(1): 8476, 2023 05 25.

Article in English | MEDLINE | ID: mdl-37231056

ABSTRACT

We propose a fast, resampling-based variable selection technique for detecting relevant single nucleotide polymorphisms (SNPs) in a multi-marker mixed effect model. Due to computational complexity, current practice primarily involves testing the effect of one SNP at a time, commonly termed 'single SNP association analysis'. Joint modeling of genetic variants within a gene or pathway may have better power to detect associated genetic variants, especially those with weak effects. In this paper, we propose a computationally efficient model selection approach, based on the e-values framework, for single SNP detection in families while utilizing information on multiple SNPs simultaneously. To overcome the computational bottleneck of traditional model selection methods, our method trains a single model and uses a fast and scalable bootstrap procedure. We illustrate through numerical studies that our proposed method is more effective in detecting SNPs associated with a trait than either single-marker analysis using family data or model selection methods that ignore the familial dependency structure. Further, we perform a gene-level analysis of the Minnesota Center for Twin and Family Research (MCTFR) dataset and use our method to detect several SNPs that have been implicated in alcohol consumption.

Subject(s)

Genome-Wide Association Study, Single Nucleotide Polymorphism, Humans, Genome-Wide Association Study/methods, Phenotype, Alcohol Drinking, Minnesota, Genetic Models

2.

A Generic Computer-Assisted Four-Pronged Approach for the Management of Emerging Global Pathogens: Some Comments on COVID-19.

Basak, Subhash C; Majumdar, Subhabrata; Vracko, Marjan; Nandy, Ashesh; Bhattacharjee, Apurba.

Curr Comput Aided Drug Des ; 16(4): 351-353, 2020.

Article in English | MEDLINE | ID: mdl-32174284

Subject(s)

Computer-Aided Design, Coronavirus Infections/drug therapy, Drug Design, Viral Pneumonia/drug therapy, COVID-19, Coronavirus Infections/epidemiology, Humans, Pandemics, Viral Pneumonia/epidemiology, Viral Vaccines/administration & dosage, COVID-19 Drug Treatment

3.

Computer-Assisted and Data Driven Approaches for Surveillance, Drug Discovery, and Vaccine Design for the Zika Virus.

Basak, Subhash C; Majumdar, Subhabrata; Nandy, Ashesh; Roy, Proyasha; Dutta, Tathagata; Vracko, Marjan; Bhattacharjee, Apurba K.

Pharmaceuticals (Basel) ; 12(4), 2019 Oct 16.

Article in English | MEDLINE | ID: mdl-31623241

ABSTRACT

Human life has been at the edge of catastrophe for millennia due to diseases which emerge and reemerge at random. The recent outbreak of the Zika virus (ZIKV) is one such menace that shook the global public health community abruptly. Modern technologies, including computational tools as well as experimental approaches, need to be harnessed quickly and effectively in a coordinated manner in order to properly address such challenges. In this paper, based on our earlier research, we propose a four-pronged approach to tackle emerging pathogens like ZIKV: (a) epidemiological modelling of the spread mechanisms of ZIKV; (b) assessment of the public health risk of newly emerging strains of the pathogens by comparing them with existing strains/pathogens using fast computational sequence comparison methods; (c) implementation of vaccine design methods in order to produce a set of probable peptide vaccine candidates for quick synthesis/production and testing in the laboratory; and (d) design of novel therapeutic molecules and their laboratory testing, as well as validation of new drugs or repurposing of drugs for use against ZIKV. For each of these stages, we provide an extensive review of the technical challenges and the current state of the art. Further, we outline future areas of research and discuss how they can work together to proactively combat ZIKV or future emerging pathogens.

4.

Finding Needles in a Haystack: Determining Key Molecular Descriptors Associated with the Blood-brain Barrier Entry of Chemical Compounds Using Machine Learning.

Majumdar, Subhabrata; Basak, Subhash C; Lungu, Claudiu N; Diudea, Mircea V; Grunwald, Gregory D.

Mol Inform ; 38(8-9): e1800164, 2019 08.

Article in English | MEDLINE | ID: mdl-31322827

ABSTRACT

In this paper we used two sets of calculated molecular descriptors to predict blood-brain barrier (BBB) entry of a collection of 415 chemicals. The first set of 579 descriptors was calculated by Schrodinger and TopoCluj software. Polly and Triplet software were used to calculate the second set of 198 descriptors. Following this, modelling and a two-deep, repeated external validation method were used for QSAR formulation. Results show that both sets of descriptors, individually and in combination, give models of reasonable prediction accuracy. We also uncover the effectiveness of a variable selection approach by showing that, for one of our descriptor sets, the top 5% of predictors in terms of random forest variable importance provide a better performing model than the model with all predictors. The top influential descriptors indicate important aspects of the molecular structural features that govern BBB entry of chemicals.

Subject(s)

Blood-Brain Barrier/metabolism, Machine Learning, Organic Chemicals/chemistry, Organic Chemicals/pharmacokinetics, Algorithms, Molecular Models, Quantitative Structure-Activity Relationship, Software

5.

Confronting data sparsity to identify potential sources of Zika virus spillover infection among primates.

Han, Barbara A; Majumdar, Subhabrata; Calmon, Flavio P; Glicksberg, Benjamin S; Horesh, Raya; Kumar, Abhishek; Perer, Adam; von Marschall, Elisa B; Wei, Dennis; Mojsilovic, Aleksandra; Varshney, Kush R.

Epidemics ; 27: 59-65, 2019 06.

Article in English | MEDLINE | ID: mdl-30902616

ABSTRACT

The recent Zika virus (ZIKV) epidemic in the Americas ranks among the largest outbreaks in modern times. Like other mosquito-borne flaviviruses, ZIKV circulates in sylvatic cycles among primates that can serve as reservoirs of spillover infection to humans. Identifying sylvatic reservoirs is critical to mitigating spillover risk, but relevant surveillance and biological data remain limited for this and most other zoonoses. We confronted this data sparsity by combining a machine learning method, Bayesian multi-label learning, with a multiple imputation method on primate traits. The resulting models distinguished flavivirus-positive primates with 82% accuracy and suggest that species posing the greatest spillover risk are also among the best adapted to human habitations. Given pervasive data sparsity describing animal hosts, and the virtual guarantee of data sparsity in scenarios involving novel or emerging zoonoses, we show that computational methods can be useful in extracting actionable inference from available data to support improved epidemiological response and prevention.

Subject(s)

Primates/virology, Zika Virus Infection/epidemiology, Zika Virus/pathogenicity, Zoonoses/epidemiology, Zoonoses/virology, Animals, Bayes Theorem, Humans, Risk, Zika Virus Infection/pathology, Zoonoses/pathology

6.

Beware of External Validation! - A Comparative Study of Several Validation Techniques used in QSAR Modelling.

Majumdar, Subhabrata; Basak, Subhash C.

Curr Comput Aided Drug Des ; 14(4): 284-291, 2018.

Article in English | MEDLINE | ID: mdl-29701159

ABSTRACT

BACKGROUND: Proper validation is an important aspect of QSAR modelling. External validation is one of the widely used validation methods in QSAR where the model is built on a subset of the data and validated on the rest of the samples. However, its effectiveness for datasets with a small number of samples but a large number of predictors remains suspect. OBJECTIVE: Calculating hundreds or thousands of molecular descriptors using currently available software has become the norm in QSAR research, owing to computational advances in the past few decades. Thus, for n chemical compounds and p descriptors calculated for each molecule, the typical chemometric dataset today has a high value of p but small n (i.e. n << p). Motivated by the evidence of inadequacies of external validation in estimating the true predictive capability of a statistical model in recent literature, this paper performs an extensive and comparative study of this method with several other validation techniques. METHODOLOGY: We compared four validation methods: Leave-one-out, K-fold, external and multi-split validation, using statistical models built using the LASSO regression, which simultaneously performs variable selection and modelling. We used 300 simulated datasets and one real dataset of 95 congeneric amine mutagens for this evaluation. RESULTS: External validation metrics have high variation among different random splits of the data, hence are not recommended for predictive QSAR models. LOO has the overall best performance among all validation methods applied in our scenario. CONCLUSION: Results from external validation are too unstable for the datasets we analyzed. Based on our findings, we recommend using the LOO procedure for validating QSAR predictive models built on high-dimensional small-sample data.
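The contrast drawn in this abstract between a single external split and leave-one-out (LOO) can be illustrated with a small sketch. It is not the authors' code: it assumes a toy one-predictor dataset and ordinary least squares in place of LASSO, and the names `fit_line`, `loo_q2` and `external_r2` are hypothetical. It shows only that the external-validation score changes from one random split to the next, while LOO gives a single value for a given dataset.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def loo_q2(xs, ys):
    """Leave-one-out q2: each point is predicted by a model fitted without it."""
    press = ss_tot = 0.0
    for i in range(len(xs)):
        tx, ty = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        a, b = fit_line(tx, ty)
        press += (ys[i] - (a + b * xs[i])) ** 2
        ss_tot += (ys[i] - statistics.fmean(ty)) ** 2
    return 1.0 - press / ss_tot


def external_r2(xs, ys, test_fraction, rng):
    """Predictive R2 on one random train/test split (external validation)."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    n_test = max(2, int(len(xs) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    tx, ty = [xs[i] for i in train], [ys[i] for i in train]
    a, b = fit_line(tx, ty)
    ref = statistics.fmean(ty)
    press = sum((ys[i] - (a + b * xs[i])) ** 2 for i in test)
    ss_tot = sum((ys[i] - ref) ** 2 for i in test)
    return 1.0 - press / ss_tot


# Toy data (an assumption for illustration): 30 samples, one noisy predictor.
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(30)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]

q2 = loo_q2(xs, ys)                                            # one number per dataset
splits = [external_r2(xs, ys, 0.2, rng) for _ in range(200)]   # one number per split
spread = statistics.stdev(splits)                              # split-to-split variation
print(f"LOO q2 = {q2:.3f}; external R2 std over 200 splits = {spread:.3f}")
```

The size of `spread` relative to the LOO value depends on the data and the split fraction; the sketch makes no claim about the magnitudes reported in the paper.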

Subject(s)

Quantitative Structure-Activity Relationship, Amines/chemistry, Amines/pharmacology, Computer Simulation, Biological Models, Statistical Models, Mutagens/chemistry, Mutagens/pharmacology, Regression Analysis, Salmonella typhimurium/drug effects, Salmonella typhimurium/genetics, Software

7.

Editorial: Beware of Naïve q2, use True q2: Some Comments on QSAR Model Building and Cross Validation.

Majumdar, Subhabrata; Basak, Subhash C.

Curr Comput Aided Drug Des ; 14(1): 5-6, 2018.

Article in English | MEDLINE | ID: mdl-29624158

Subject(s)

Molecular Models, Quantitative Structure-Activity Relationship, Validation Studies as Topic, Humans, Statistics as Topic

8.

Statistical Methods: Need for a Rethink.

Biswas, Tamoghna; Majumdar, Subhabrata.

Indian Pediatr ; 54(1): 65, 2017 Jan 15.

Article in English | MEDLINE | ID: mdl-28141574

9.

Exploring Intrinsic Dimensionality of Chemical Spaces for Robust QSAR Model Development: A Comparison of Several Statistical Approaches.

Majumdar, Subhabrata; Basak, Subhash C.

Curr Comput Aided Drug Des ; 12(4): 294-301, 2016.

Article in English | MEDLINE | ID: mdl-27600878

ABSTRACT

BACKGROUND: Computed mathematical descriptors of molecules are used for the prediction of their property/bioactivity. In the 1970s only a few descriptors could be calculated; currently available software can calculate a large number of descriptors for molecules or biomolecules like DNA/RNA and proteins. OBJECTIVE: When p molecular descriptors are calculated for n molecules, the data set can be viewed as n vectors in p dimensions, each chemical being represented as a point in ℝ^p. Because many of the descriptors are strongly correlated, the n points in ℝ^p will lie on a subspace of dimension lower than p. Methods like principal components analysis (PCA) can be used to characterize the intrinsic dimensionality of chemical spaces. Taking motivation from the work of Basak et al. in the 1980s on using PCA of descriptors calculated for various congeneric and structurally diverse sets of chemicals relevant to new drug discovery and predictive toxicology, this paper explores the intrinsic dimensionality of chemical spaces for robust QSAR model development. METHODOLOGY: Intrinsic dimensionality of chemical spaces was studied using three new statistical approaches and two data sets, viz. a congeneric set of 95 aromatic and heteroaromatic amine mutagens and a structurally diverse set of 508 chemical mutagens. RESULTS: The new outlier-robust methods applied here yield favorable prediction results compared to previous studies on the same datasets. CONCLUSION: We conclude that when analyzing data on a large number of chemical descriptors, it is advisable to build QSAR models that are outlier-robust and take into consideration the underlying correlations among predictors.
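One common way to make the PCA notion of intrinsic dimensionality concrete is the cumulative explained-variance criterion, written out below. This is a standard textbook formulation, not necessarily any of the three approaches studied in the paper; the threshold τ and the symbols X, S, λ_j and d(τ) are assumptions introduced here for illustration.

```latex
% X is the n x p column-centered descriptor matrix (n chemicals, p descriptors).
\[
  S = \frac{1}{n-1} X^{\top} X ,
  \qquad
  \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0
  \quad \text{(eigenvalues of } S\text{)} ,
\]
\[
  d(\tau) = \min\left\{ k \in \{1,\dots,p\} :
    \frac{\sum_{j=1}^{k} \lambda_j}{\sum_{j=1}^{p} \lambda_j} \ge \tau \right\},
  \qquad 0 < \tau \le 1 .
\]
```

With strongly correlated descriptors the leading eigenvalues dominate, so d(τ) is much smaller than p for a typical choice such as τ = 0.9 or 0.95. Since S has rank at most min(n − 1, p), d(τ) also cannot exceed n − 1 when n << p.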

Subject(s)

Amines/chemistry, DNA/chemistry, Heterocyclic Compounds/chemistry, Molecular Models, Statistical Models, Mutagens/chemistry, Quantitative Structure-Activity Relationship, Amines/toxicity, Heterocyclic Compounds/toxicity, Humans, Mutagens/pharmacology, Nucleic Acid Conformation, Principal Component Analysis

10.

EDITORIAL: Chemodescriptor Based QSARs of Structurally Homogeneous Versus Heterogeneous Chemical Data Sets: Some Comments on the Congenericity Principle vis-à-vis Diversity Begets Diversity Principle.

Basak, Subhash C; Majumdar, Subhabrata.

Curr Comput Aided Drug Des ; 12(2): 84-6, 2016.

Article in English | MEDLINE | ID: mdl-27484117

Subject(s)

Chemical Databases, Mutagens/chemistry, Quantitative Structure-Activity Relationship, Humans, Molecular Structure

11.

The Importance of Rigorous Statistical Practice in the Current Landscape of QSAR Modelling.

Basak, Subhash C; Majumdar, Subhabrata.

Curr Comput Aided Drug Des ; 11(1): 2-4, 2015.

Article in English | MEDLINE | ID: mdl-26205831

Subject(s)

Drug Discovery, Quantitative Structure-Activity Relationship, Algorithms, Drug Discovery/methods, Humans, Biological Models

12.

Prediction of Mutagenicity of Chemicals from Their Calculated Molecular Descriptors: A Case Study with Structurally Homogeneous versus Diverse Datasets.

Basak, Subhash C; Majumdar, Subhabrata.

Curr Comput Aided Drug Des ; 11(2): 117-23, 2015.

Article in English | MEDLINE | ID: mdl-26202887

ABSTRACT

Variation in high-dimensional data is often caused by a few latent factors, and hence dimension reduction or variable selection techniques are often helpful in extracting useful information from the data. In this paper we consider two such recent methods: interrelated two-way clustering and envelope models. We couple these methods with traditional statistical procedures like ridge regression and linear discriminant analysis, and apply them to two data sets which have more predictors than samples (i.e. the n << p scenario) and several types of molecular descriptors. One of these datasets consists of a congeneric group of amines, while the other has a much more diverse collection of compounds. The difference in prediction results between these two datasets for both methods supports the hypothesis that for a congeneric set of compounds, descriptors of a certain type are enough to provide good QSAR models, but as the data set grows more diverse, including a variety of descriptors can improve model quality considerably.

Subject(s)

Mutagens/chemistry, Mutagens/toxicity, Amines/chemistry, Amines/toxicity, Animals, Cluster Analysis, Datasets as Topic, Discriminant Analysis, Humans, Biological Models, Quantitative Structure-Activity Relationship

13.

Adapting interrelated two-way clustering method for quantitative structure-activity relationship (QSAR) modeling of mutagenicity/non-mutagenicity of a diverse set of chemicals.

Majumdar, Subhabrata; Basak, Subhash C; Grunwald, Gregory D.

Curr Comput Aided Drug Des ; 9(4): 463-71, 2013 Dec.

Article in English | MEDLINE | ID: mdl-24138420

ABSTRACT

Interrelated Two-way Clustering (ITC) is an unsupervised clustering method developed to divide samples into two groups in gene expression data obtained through microarrays, selecting important genes simultaneously in the process. This has been found to be a better approach than conventional clustering methods like K-means or self-organizing maps for scenarios where the number of samples is much smaller than the number of variables (n << p). In this paper we used the ITC approach for classification of a diverse set of 508 chemicals regarding mutagenicity. A large number of topological indices (TIs), 3-dimensional and quantum chemical descriptors, as well as atom pairs (APs), have been used as explanatory variables. In this paper, ITC has been used only for predictor selection, after which ridge regression is employed to build the final predictive model. The proper leave-one-out (LOO) method of cross-validation in this scenario is to take as holdout each of the 508 compounds before predictor thinning and compare the predicted values with the experimental data. ITC-based results obtained here are comparable to those developed earlier.
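The point about holding out each compound before predictor thinning can be sketched as follows. This is a simplified illustration under stated assumptions, not the ITC and ridge regression pipeline of the paper: predictor selection is replaced by picking the single column most correlated with the response, the model is a one-variable least-squares line, the data are pure noise, and all function names are hypothetical. With `select_inside=True` the selection step is repeated inside every LOO fold; with `select_inside=False` it is done once on all samples, so the held-out point has already influenced the choice of predictor.

```python
import random
import statistics


def corr(u, v):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5 if suu and svv else 0.0


def best_predictor(X, y, rows):
    """Index of the column with the largest |correlation| with y on the given rows."""
    ysub = [y[i] for i in rows]
    return max(range(len(X[0])),
               key=lambda j: abs(corr([X[i][j] for i in rows], ysub)))


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (t - my) for x, t in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def loo_q2(X, y, select_inside):
    """LOO q2 with predictor selection inside each fold or once on all data."""
    rows = list(range(len(y)))
    fixed_j = None if select_inside else best_predictor(X, y, rows)
    press = ss_tot = 0.0
    for i in rows:
        train = [r for r in rows if r != i]
        # Proper LOO: the held-out sample plays no part in choosing the predictor.
        j = best_predictor(X, y, train) if select_inside else fixed_j
        ty = [y[r] for r in train]
        a, b = fit_line([X[r][j] for r in train], ty)
        press += (y[i] - (a + b * X[i][j])) ** 2
        ss_tot += (y[i] - statistics.fmean(ty)) ** 2
    return 1.0 - press / ss_tot


# Pure-noise toy data (an assumption): no predictor carries real signal.
rng = random.Random(1)
n, p = 20, 100
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [rng.gauss(0, 1) for _ in range(n)]

naive_q2 = loo_q2(X, y, select_inside=False)   # selection sees the holdout
proper_q2 = loo_q2(X, y, select_inside=True)   # selection repeated per fold
print(f"selection outside loop: {naive_q2:.3f}; inside loop: {proper_q2:.3f}")
```

On noise data of this kind, selecting outside the loop tends to give the more optimistic figure, because the chosen predictor was picked partly for its chance agreement with the held-out responses.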

Subject(s)

Gene Expression Profiling/methods, Chemical Models, Molecular Models, Cluster Analysis, Gene Expression, Humans, Molecular Structure, Mutagens/chemistry, Mutagens/toxicity, Oligonucleotide Array Sequence Analysis/methods, Quantitative Structure-Activity Relationship
