Results 1 - 19 of 19
1.
Microsc Microanal ; 30(3): 456-465, 2024 Jul 04.
Article in English | MEDLINE | ID: mdl-38758983

ABSTRACT

Traditionally, materials discovery has been driven more by evidence and intuition than by systematic design. However, the advent of "big data" and an exponential increase in computational power have reshaped the landscape. Today, we use simulations, artificial intelligence (AI), and machine learning (ML) to predict materials characteristics, which dramatically accelerates the discovery of novel materials. For instance, combinatorial megalibraries, where millions of distinct nanoparticles are created on a single chip, have spurred the need for automated characterization tools. This paper presents an ML model specifically developed to perform real-time binary classification of grayscale high-angle annular dark-field images of nanoparticles sourced from these megalibraries. Given the high costs associated with downstream processing errors, a primary requirement for our model was to minimize false positives while maintaining efficacy on unseen images. We elaborate on the computational challenges and our solutions, including managing memory constraints, optimizing training time, and utilizing Neural Architecture Search tools. The final model outperformed our expectations, achieving over 95% precision and a weighted F-score of more than 90% on our test data set. This paper discusses the development, challenges, and successful outcomes of this significant advancement in the application of AI and ML to materials discovery.
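The two metrics quoted in this abstract can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the abstract does not say how the F-score was weighted, so `beta=0.5` (which favours precision over recall) and the confusion-matrix counts are assumptions chosen only for the example.

```python
def precision_recall_fbeta(tp, fp, fn, beta=0.5):
    """Return (precision, recall, F-beta) from confusion-matrix counts.

    beta < 1 weights precision more heavily than recall, which suits a
    classifier whose main requirement is to avoid false positives.
    The default beta is an assumption, not a value from the abstract.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    b2 = beta * beta
    denom = b2 * precision + recall
    fbeta = (1 + b2) * precision * recall / denom if denom else 0.0
    return precision, recall, fbeta


# Hypothetical counts, for illustration only.
p, r, f = precision_recall_fbeta(tp=96, fp=4, fn=24)
print(f"precision={p:.3f} recall={r:.3f} F0.5={f:.3f}")
```

With a precision-weighted score, a model can miss some true positives and still score well, as long as the images it does flag are almost always correct.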

2.
J Hum Genet ; 68(6): 409-417, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36813834

ABSTRACT

Structural variants contribute to genetic variability in human genomes and can occur in population-specific patterns. We aimed to understand the landscape of structural variants in the genomes of healthy Indian individuals and explore their potential implications in genetic disease conditions. To identify structural variants, we analysed a whole genome sequencing dataset of 1029 self-declared healthy Indian individuals from the IndiGen project. These variants were then evaluated for potential pathogenicity and their associations with genetic diseases. We also compared the identified variants with existing global datasets. We generated a compendium of 38,560 high-confidence structural variants, comprising 28,393 deletions, 5030 duplications, 5038 insertions, and 99 inversions. Notably, around 55% of these variants were unique to the studied population. Further analysis revealed 134 deletions with predicted pathogenic/likely pathogenic effects; the affected genes were mainly enriched for neurological disease conditions, such as intellectual disability and neurodegenerative diseases. The IndiGenomes dataset helped us to understand the unique spectrum of structural variants in the Indian population. More than half of the identified variants were not present in the publicly available global dataset on structural variants. Clinically important deletions identified in IndiGenomes might aid in improving the diagnosis of unsolved genetic diseases, particularly in neurological conditions. Along with basal allele frequency data and clinically important deletions, IndiGenomes data might serve as a baseline resource for future studies on genomic structural variant analysis in the Indian population.


Subject(s)
Asian People, Human Genome, Humans, Gene Frequency, Whole Genome Sequencing, Human Genome/genetics
3.
J Chem Inf Model ; 63(7): 1865-1871, 2023 Apr 10.
Article in English | MEDLINE | ID: mdl-36972592

ABSTRACT

The applications of artificial intelligence, machine learning, and deep learning techniques in the field of materials science are becoming increasingly common due to their promising abilities to extract and utilize data-driven information from available data and accelerate materials discovery and design for future applications. In an attempt to assist with this process, we deploy predictive models for multiple material properties, given the composition of the material. The deep learning models described here are built using a cross-property deep transfer learning technique, which leverages source models trained on large data sets to build target models on small data sets with different properties. We deploy these models in an online software tool that takes a number of material compositions as input, performs preprocessing to generate composition-based attributes for each material, and feeds them into the predictive models to obtain up to 41 different material property values. The material property predictor is available online at http://ai.eecs.northwestern.edu/MPpredictor.


Subject(s)
Artificial Intelligence, Software, Machine Learning
4.
Nucleic Acids Res ; 49(D1): D1225-D1232, 2021 Jan 08.
Article in English | MEDLINE | ID: mdl-33095885

ABSTRACT

With the advent of next-generation sequencing, large-scale initiatives for mining whole genomes and exomes have been employed to better understand global or population-level genetic architecture. India encompasses more than 17% of the world population with extensive genetic diversity, but is under-represented in the global sequencing datasets. This gave us the impetus to perform and analyze the whole genome sequencing of 1029 healthy Indian individuals under the pilot phase of the 'IndiGen' program. We generated a compendium of 55,898,122 single allelic genetic variants from geographically distinct Indian genomes and calculated the allele frequency, allele count, and allele number, along with the number of heterozygous or homozygous individuals. In the present study, these variants were systematically annotated using publicly available population databases and can be accessed through a browsable online database named 'IndiGenomes' http://clingen.igib.res.in/indigen/. The IndiGenomes database will help clinicians and researchers in exploring the genetic component underlying medical conditions. To date, this is the most comprehensive genetic variant resource for the Indian population and is made freely available for academic utility. The resource has also been accessed extensively by the worldwide community since its launch.
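The per-variant summary statistics named in this abstract (allele count, allele number, allele frequency, and heterozygous/homozygous counts) can be illustrated with a small sketch. It assumes a biallelic site with diploid genotypes encoded as 0 (reference) and 1 (alternate) and `None` for a missing call; the function name and the sample genotypes are hypothetical and do not come from the IndiGenomes pipeline.

```python
def allele_stats(genotypes):
    """Summarise one biallelic site from diploid genotype calls.

    genotypes: list of (a1, a2) tuples, 0 = reference allele,
    1 = alternate allele, None = missing call (excluded from counts).
    """
    called = [g for g in genotypes if None not in g]
    an = 2 * len(called)                      # allele number: alleles actually called
    ac = sum(a1 + a2 for a1, a2 in called)    # allele count: alternate alleles seen
    het = sum(1 for a1, a2 in called if a1 != a2)
    hom_alt = sum(1 for a1, a2 in called if a1 == 1 and a2 == 1)
    af = ac / an if an else 0.0               # allele frequency
    return {"AC": ac, "AN": an, "AF": af, "het": het, "hom_alt": hom_alt}


# Hypothetical calls for five individuals, one of them missing.
stats = allele_stats([(0, 0), (0, 1), (1, 1), (0, 1), (None, None)])
print(stats)
```

Excluding missing calls from the allele number keeps the frequency an estimate over observed chromosomes rather than over the whole cohort.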


Subject(s)
Genetic Databases, Genetic Variation, Human Genome, Human Genome Project, Software, Adult, Exome, Female, Population Genetics/statistics & numerical data, Humans, India, Internet, Male, Molecular Sequence Annotation, Whole Genome Sequencing
6.
J Cheminform ; 16(1): 17, 2024 Feb 16.
Article in English | MEDLINE | ID: mdl-38365691

ABSTRACT

Modern data mining techniques using machine learning (ML) and deep learning (DL) algorithms have been shown to excel in the regression-based task of materials property prediction using various materials representations. To improve the predictive performance of deep neural network models, researchers have added more layers and developed new architectural components, creating sophisticated, deeper models that can aid the training process and improve the predictive ability of the final model. However, these modifications usually require substantial computational resources and further increase the already long model training time, which is often not feasible and limits usage for most researchers. In this paper, we study and propose a deep neural network framework for regression-based problems comprising fully connected layers that can work with any numerical vector-based materials representation as model input. We present a novel deep regression neural network, iBRNet, with branched skip connections and multiple schedulers, which can reduce the number of parameters used to construct the model, improve the accuracy, and decrease the training time of the predictive model. We perform the model training using composition-based numerical vectors representing the elemental fractions of the respective materials and compare their performance against other traditional ML and several known DL architectures. Using multiple datasets with varying data sizes for training and testing, we show that the proposed iBRNet models outperform the state-of-the-art ML and DL models for all data sizes. We also show that the branched structure and usage of multiple schedulers lead to fewer parameters and faster model training time with better convergence than other neural networks.
Scientific contribution: The combination of multiple callback functions in deep neural networks minimizes training time and maximizes accuracy in a controlled computational environment with parametric constraints for the task of materials property prediction.

7.
Comput Biol Chem ; 112: 108118, 2024 Jun 10.
Article in English | MEDLINE | ID: mdl-38878606

ABSTRACT

Mitochondrial disorders are a class of heterogeneous disorders caused by genetic variations in the mitochondrial genome (mtDNA) as well as the nuclear genome. The spectrum of mtDNA variants remains unexplored in the Indian population. In the present study, we have cataloged 2689 high-confidence single nucleotide variants, small insertions and deletions in mtDNA in 1029 healthy Indian individuals. We found that a major proportion (76.5%) of the variants were rare (AF ≤ 0.005) in the studied population. Intriguingly, we found two 'confirmed' pathogenic variants (m.1555 A>G and m.14484 T>C) with a frequency of ∼1 in 250 individuals in our dataset. The high carrier frequency underscores the need for screening for pathogenic mtDNA mutations in newborns in India. Interestingly, our analysis also revealed 202 variants in our dataset which have been 'reported' in disease cases as per the MITOMAP database. Additionally, we found the frequency of haplogroup M (52.2%) to be the highest among all the 18 top-level haplogroups found in our dataset. In comparison to the global population datasets, 20 unique mtDNA variants were found in the Indian population. We hope the whole genome sequencing based compendium of mtDNA variants, along with their allele frequencies and heteroplasmy levels in the Indian population, will drive additional genome scale studies for mtDNA. Furthermore, the identification of clinically relevant variants in our dataset will aid in better clinical interpretation of the variants in mitochondrial disorders.

8.
Mitochondrion ; 75: 101844, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38237647

ABSTRACT

Genomic investigations on an infant who presented with a putative mitochondrial disorder led to the identification of a compound heterozygous deletion with an overlapping region of ∼142 kb encompassing two nuclear-encoded genes, namely ERCC8 and NDUFAF2. Investigations on fetal-derived fibroblast culture demonstrated impaired bioenergetics and mitochondrial dysfunction, which explains the phenotype and observed infant mortality in the present study. The genetic findings from this study extended the utility of whole-genome sequencing, as they led to the development of an MLPA-based assay for carrier screening in the extended family and for prenatal testing, aiding in the birth of two healthy children.


Subject(s)
Infant Mortality, Mitochondria, Infant, Child, Pregnancy, Female, Humans, Mitochondria/genetics, Whole Genome Sequencing, Energy Metabolism, Genomics, Transcription Factors/genetics, DNA Repair Enzymes/genetics, Molecular Chaperones/genetics, Mitochondrial Proteins/genetics
9.
Sci Rep ; 13(1): 9128, 2023 Jun 05.
Article in English | MEDLINE | ID: mdl-37277456

ABSTRACT

Modern machine learning (ML) and deep learning (DL) techniques using high-dimensional data representations have helped accelerate the materials discovery process by efficiently detecting hidden patterns in existing datasets and linking input representations to output properties for a better understanding of the scientific phenomenon. While a deep neural network comprised of fully connected layers has been widely used for materials property prediction, simply creating a deeper model with a large number of layers often runs into the vanishing gradient problem, which degrades performance and thereby limits usage. In this paper, we study and propose architectural principles to address the question of improving the performance of model training and inference under fixed parametric constraints. Here, we present a general deep-learning framework based on branched residual learning (BRNet) with fully connected layers that can work with any numerical vector-based representation as input to build accurate models to predict materials properties. We perform model training for materials properties using numerical vectors representing different composition-based attributes of the respective materials and compare the performance of the proposed models against traditional ML and existing DL architectures. We find that the proposed models are significantly more accurate than the ML/DL models for all data sizes when using different composition-based attributes as input. Further, branched learning requires fewer parameters and results in faster model training due to better convergence during the training phase than existing neural networks, thereby efficiently building accurate models for predicting materials properties.

10.
Sci Rep ; 12(1): 11953, 2022 Jul 13.
Article in English | MEDLINE | ID: mdl-35831344

ABSTRACT

While experiments and DFT-computations have been the primary means for understanding the chemical and physical properties of crystalline materials, experiments are expensive, and DFT-computations are time-consuming and show significant discrepancies with experiments. Currently, predictive modeling based on DFT-computations has provided a rapid screening method for materials candidates for further DFT-computations and experiments; however, such models inherit the large discrepancies from the DFT-based training data. Here, we demonstrate how AI can be leveraged together with DFT to compute materials properties more accurately than DFT itself by focusing on the critical materials science task of predicting "formation energy of a material given its structure and composition". On an experimental hold-out test set containing 137 entries, AI can predict formation energy from materials structure and composition with a mean absolute error (MAE) of 0.064 eV/atom; comparing this against DFT-computations, we find that AI can significantly outperform DFT computations for the same task (discrepancies of [Formula: see text] eV/atom) for the first time.
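The mean absolute error used as the headline metric in this abstract is the average magnitude of the difference between predicted and reference values. The sketch below shows the calculation only; the four formation energies are hypothetical values chosen for illustration and are not taken from the paper's 137-entry test set.

```python
def mean_absolute_error(predicted, reference):
    """Average of |predicted - reference| over paired values."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Hypothetical formation energies in eV/atom, for illustration only.
experimental = [-1.20, -0.85, -2.10, -0.40]
model_output = [-1.15, -0.95, -2.00, -0.45]

mae = mean_absolute_error(model_output, experimental)
print(f"MAE = {mae:.3f} eV/atom")
```

Because the comparison in the abstract is made against experimental measurements, the same function would be applied once to the model's predictions and once to the DFT values, each against the experimental reference.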


Subject(s)
Artificial Intelligence
11.
Hum Immunol ; 83(4): 335-345, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35074268

ABSTRACT

X-linked agammaglobulinemia (XLA) is an X-linked recessive primary immunodeficiency disorder caused by a pathogenic variant in the Bruton tyrosine kinase (BTK) gene, with an incidence of 1:379,000 live births and 1:190,000 male births. Patients affected with XLA present with recurrent infections of the gastrointestinal and respiratory tracts. Here we report the first case series of 17 XLA patients from 10 South Indian families with a wide spectrum of clinical and genetic features. In our cohort, patients presented mainly with recurrent pneumonia, gastrointestinal infection, otitis media, pyoderma, abscesses, empyema, arthritis, and osteomyelitis. Using next-generation and Sanger sequencing, we identified 10 unique pathogenic and likely pathogenic variants in 17 patients. These comprise three nonsynonymous, two stop-gain, two frameshift, two structural, and one splicing variant, two of which are novel. Based on the type of variant, patients had variable clinical features and treatment responses. We also evaluated Btk protein expression for six patients in comparison to healthy individuals and determined mosaic Btk expression patterns in four mothers. We also performed family screening in 6 families using Sanger sequencing and identified 19 carriers of the variant. The diagnosis led to proper treatment for the patients, i.e., 15 patients were on intravenous immunoglobulin (IVIG) and the other two had successful hematopoietic stem cell transplantation (HSCT). Unfortunately, two of our patients died due to sepsis while on IVIG. We envision that the present study could help in better understanding of patients with XLA and help in family screening and prenatal diagnosis. To the best of our knowledge, this is the largest case series of patients affected with XLA from South India.


Subject(s)
Agammaglobulinemia, X-Linked Genetic Diseases, Agammaglobulinaemia Tyrosine Kinase/genetics, Agammaglobulinemia/diagnosis, Agammaglobulinemia/genetics, Child, X-Linked Genetic Diseases/diagnosis, X-Linked Genetic Diseases/genetics, X-Linked Genetic Diseases/therapy, Humans, Intravenous Immunoglobulins/therapeutic use, Male, Mutation
12.
Adv Genet ; 107: 121-152, 2021.
Article in English | MEDLINE | ID: mdl-33641745

ABSTRACT

Human migration and community-specific cultural practices have contributed to founder events and enrichment of the variants associated with genetic diseases. While many founder events in isolated populations have remained uncharacterized, the application of genomics in clinical settings as well as in population-scale studies in recent years has provided an unprecedented push towards identification of founder variants associated with human health and disease. The discovery and characterization of founder variants could have far-reaching implications not only in understanding the history or genealogy of the disease, but also in implementing evidence-based policies and genetic testing frameworks. This further enables precise diagnosis and prevention as a step towards precision medicine. This review provides an overview of founder variants along with methods and resources cataloging them. We also discuss the public health implications and examples of prevalent disease-associated founder variants in specific populations.


Subject(s)
Genetic Databases, Founder Effect, Mutation, Finland, Inborn Genetic Diseases/genetics, Genetic Markers, Population Genetics, Human Genome, Humans, Precision Medicine/methods, Public Health
13.
Nat Commun ; 12(1): 6595, 2021 Nov 15.
Article in English | MEDLINE | ID: mdl-34782631

ABSTRACT

Artificial intelligence (AI) and machine learning (ML) have been increasingly used in materials science to build predictive models and accelerate discovery. For selected properties, availability of large databases has also facilitated application of deep learning (DL) and transfer learning (TL). However, unavailability of large datasets for a majority of properties prohibits widespread application of DL/TL. We present a cross-property deep-transfer-learning framework that leverages models trained on large datasets to build models on small datasets of different properties. We test the proposed framework on 39 computational and two experimental datasets and find that the TL models with only elemental fractions as input outperform ML/DL models trained from scratch even when they are allowed to use physical attributes as input, for 27/39 (≈ 69%) computational and both the experimental datasets. We believe that the proposed framework can be widely useful to tackle the small data challenge in applying AI/ML in materials science.

14.
Sci Rep ; 11(1): 4244, 2021 Feb 19.
Article in English | MEDLINE | ID: mdl-33608599

ABSTRACT

The application of machine learning (ML) techniques in materials science has attracted significant attention in recent years, due to their impressive ability to efficiently extract data-driven linkages from various input materials representations to their output properties. While the application of traditional ML techniques has become quite ubiquitous, there have been limited applications of more advanced deep learning (DL) techniques, primarily because big materials datasets are relatively rare. Given the demonstrated potential and advantages of DL and the increasing availability of big materials datasets, it is attractive to go for deeper neural networks in a bid to boost model performance, but in reality, it leads to performance degradation due to the vanishing gradient problem. In this paper, we address the question of how to enable deeper learning for cases where big materials data is available. Here, we present a general deep learning framework based on Individual Residual learning (IRNet) composed of very deep neural networks that can work with any vector-based materials representation as input to build accurate property prediction models. We find that the proposed IRNet models can not only successfully alleviate the vanishing gradient problem and enable deeper learning, but also lead to significantly (up to 47%) better model accuracy as compared to plain deep neural networks and traditional ML techniques for a given input materials representation in the presence of big data.
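The idea of a residual (shortcut) connection around a fully connected layer can be sketched in a few lines. This is a toy illustration under stated assumptions, not the IRNet architecture: it assumes the shortcut wraps each individual layer and that input and output widths match, and it omits training and any normalisation layers. All function names are hypothetical.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def dense(x, weights, bias):
    """Fully connected layer; weights holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def plain_stack(x, layers):
    """Ordinary deep network: each layer sees only the previous output."""
    for weights, bias in layers:
        x = relu(dense(x, weights, bias))
    return x


def residual_stack(x, layers):
    """Each layer's input is added back to its output (shortcut path)."""
    for weights, bias in layers:
        h = relu(dense(x, weights, bias))
        x = [hi + xi for hi, xi in zip(h, x)]
    return x


# Ten layers with all-zero weights: an extreme case where layers carry no signal.
zero_layer = ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
layers = [zero_layer] * 10
x = [1.0, -2.0]

print(plain_stack(x, layers))     # signal is lost
print(residual_stack(x, layers))  # input survives via the shortcut
```

The shortcut gives every layer an identity path, so the input (and, during training, the gradient) can pass through many layers even when individual layers contribute little. This is the general mechanism by which residual learning is understood to ease the vanishing gradient problem in deep stacks.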

15.
PLoS One ; 16(7): e0254407, 2021.
Article in English | MEDLINE | ID: mdl-34252140

ABSTRACT

X-linked agammaglobulinemia (XLA, OMIM #300755) is a primary immunodeficiency disorder caused by pathogenic variations in the BTK gene, characterized by failure of development and maturation of B lymphocytes. The estimated prevalence worldwide is 1 in 190,000 male births. Recently, genome sequencing has been widely used in difficult-to-diagnose and familial cases. We report a large Indian family suffering from XLA with five affected individuals. We performed complete blood count, immunoglobulin assay, and lymphocyte subset analysis for all patients and analyzed Btk expression for one patient and his mother. Whole exome sequencing (WES) was performed for four patients and whole genome sequencing (WGS) for two patients. Carrier screening was done for 17 family members using Multiplex Ligation-dependent Probe Amplification (MLPA), and haplotype ancestry mapping was performed using fineSTRUCTURE. All patients had hypogammaglobulinemia and low CD19+ B cells. The one patient who underwent Btk estimation had low expression, and his mother showed a mosaic pattern. We could not identify any single nucleotide variants or small insertions/deletions in the WES dataset that correlated with the patients' clinical features. Structural variant analysis of the WGS data identified a novel large deletion of 5,296 bp at locus chrX:100,624,323-100,629,619 encompassing exons 3-5 of the BTK gene. Family screening revealed seven carriers of the deletion. Two patients had a successful HSCT. Haplotype mapping revealed a South Asian ancestry. WGS led to identification of the causative genetic mutation, which could help in early diagnosis leading to improved outcomes, prevention of permanent organ damage and improved quality of life, as well as enabling genetic counselling and prenatal diagnosis in the family.
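As a quick consistency check on the deletion reported in this abstract, the size can be recomputed from the quoted coordinates. The helper below is a hypothetical parser written for this example; the abstract does not state which coordinate convention the authors used, so both common ones are shown.

```python
def parse_region(region):
    """Split a 'chrom:start-end' string (commas allowed) into its parts."""
    chrom, span = region.split(":")
    start, end = (int(part.replace(",", "")) for part in span.split("-"))
    return chrom, start, end


chrom, start, end = parse_region("chrX:100,624,323-100,629,619")

half_open_length = end - start       # end coordinate excluded
closed_length = end - start + 1      # both coordinates included

print(chrom, half_open_length, closed_length)
```

The reported size of 5,296 bp equals the simple difference between the two coordinates; counting both endpoints would give one base more.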


Subject(s)
Agammaglobulinemia/genetics, DNA Mutational Analysis/methods, Exome Sequencing/methods, Exome/genetics, Exons/genetics, Flow Cytometry, Haplotypes/genetics, Hematopoietic Stem Cell Transplantation, Humans, Male, Mutation/genetics
16.
Pharmacogenomics ; 22(10): 603-618, 2021 Jul.
Article in English | MEDLINE | ID: mdl-34142560

ABSTRACT

Aim: Numerous drugs are being widely prescribed for COVID-19 treatment without any direct evidence for drug safety/efficacy in patients across diverse ethnic populations. Materials & methods: We analyzed whole genomes of 1029 Indian individuals (IndiGen) to understand the extent of drug-gene (pharmacogenetic), drug-drug and drug-drug-gene interactions associated with COVID-19 therapy in the Indian population. Results: We identified 30 clinically significant pharmacogenetic variants and 73 predicted deleterious pharmacogenetic variants. COVID-19-associated pharmacogenes overlapped substantially with those of metabolic disorder therapeutics. CYP3A4, ABCB1 and ALB are the most shared pharmacogenes. Fifteen COVID-19 therapeutics were predicted as likely drug-drug interaction candidates when used with four CYP inhibitor drugs. Conclusion: Our findings provide actionable insights for future validation studies and improved clinical decisions for COVID-19 therapy in Indians.


Subject(s)
COVID-19 Drug Treatment, COVID-19/genetics, Antiviral Agents/therapeutic use, Asian People, Drug Interactions/genetics, Genome/genetics, Genotype, Humans, India, Pharmacogenetics/methods, Pharmacogenomic Testing/methods, Pharmacogenomic Variants/genetics, SARS-CoV-2/drug effects
17.
J Genet Eng Biotechnol ; 19(1): 183, 2021 Dec 14.
Article in English | MEDLINE | ID: mdl-34905135

ABSTRACT

BACKGROUND: Autoinflammatory disorders are a group of inherited inflammatory disorders caused by genetic defects in genes that regulate the innate immune system. They have been clinically characterized based on the duration and occurrence of unprovoked fever, skin rash, and the patient's ancestry. Several autoinflammatory disorders are prevalent in specific populations, and their genetic epidemiology within those populations is well understood. However, only a limited number of genetic studies of autoinflammatory disorders have been reported from India to date. The whole genome sequencing and analysis of 1029 Indian individuals performed under the IndiGen project prompted us to study the genetic epidemiology of autoinflammatory disorders in India. RESULTS: We systematically annotated the genetic variants of 56 genes implicated in autoinflammatory disorders. These genetic variants were reclassified into five categories (i.e., pathogenic, likely pathogenic, benign, likely benign, and variant of uncertain significance (VUS)) according to the American College of Medical Genetics and Association for Molecular Pathology (ACMG-AMP) guidelines. Our analysis revealed 20 pathogenic and likely pathogenic variants with significant differences in allele frequency compared with the global population. We also found six causal founder variants in the IndiGen dataset belonging to different ancestries. We performed haplotype prediction analysis for the founder mutations, which reveals the admixture of the South Asian population with other populations. The cumulative carrier frequency of autoinflammatory disorders in India was found to be 3.5%, which is much higher than previously reported. CONCLUSION: With such a frequency in the Indian population, there is a great need for awareness of autoinflammatory disorders among clinicians as well as the general public. To the best of our knowledge, this is the first and most comprehensive population-scale genetic epidemiological study reported from India.
