Results 1 - 15 of 15
1.
J Am Chem Soc ; 145(51): 28284-28295, 2023 Dec 27.
Article in English | MEDLINE | ID: mdl-38090755

ABSTRACT

We construct a data set of metal-organic framework (MOF) linkers and employ a fine-tuned GPT assistant to propose MOF linker designs by mutating and modifying the existing linker structures. This strategy allows the GPT model to learn the intricate language of chemistry in molecular representations, thereby achieving an enhanced accuracy in generating linker structures compared with its base models. Aiming to highlight the significance of linker design strategies in advancing the discovery of water-harvesting MOFs, we conducted a systematic MOF variant expansion upon state-of-the-art MOF-303 utilizing a multidimensional approach that integrates linker extension with multivariate tuning strategies. We synthesized a series of isoreticular aluminum MOFs, termed Long-Arm MOFs (LAMOF-1 to LAMOF-10), featuring linkers that bear various combinations of heteroatoms in their five-membered ring moiety, replacing pyrazole with either thiophene, furan, or thiazole rings or a combination of two. Beyond their consistent and robust architecture, as demonstrated by permanent porosity and thermal stability, the LAMOF series offers a generalizable synthesis strategy. Importantly, these 10 LAMOFs establish new benchmarks for water uptake (up to 0.64 g g⁻¹) and operational humidity ranges (between 13 and 53%), thereby expanding the diversity of water-harvesting MOFs.

2.
ACS Cent Sci ; 9(11): 2161-2170, 2023 Nov 22.
Article in English | MEDLINE | ID: mdl-38033801

ABSTRACT

We leveraged the power of ChatGPT and Bayesian optimization in the development of a multi-AI-driven system, backed by seven large language model-based assistants and equipped with machine learning algorithms, that seamlessly orchestrates a multitude of research aspects in a chemistry laboratory (termed the ChatGPT Research Group). Our approach accelerated the discovery of optimal microwave synthesis conditions, enhancing the crystallinity of MOF-321, MOF-322, and COF-323 and achieving the desired porosity and water capacity. In this system, human researchers gained assistance from these diverse AI collaborators, each with a unique role within the laboratory environment, spanning strategy planning, literature search, coding, robotic operation, labware design, safety inspection, and data analysis. Such a comprehensive approach enables a single researcher working in concert with AI to achieve productivity levels analogous to those of an entire traditional scientific team. Furthermore, by reducing human biases in screening experimental conditions and deftly balancing the exploration and exploitation of synthesis parameters, our Bayesian search approach precisely zeroed in on optimal synthesis conditions from a pool of 6 million within a significantly shortened time scale. This work serves as a compelling proof of concept for an AI-driven revolution in the chemistry laboratory, painting a future where AI becomes an efficient collaborator, liberating us from routine tasks to focus on pushing the boundaries of innovation.

3.
Angew Chem Int Ed Engl ; 62(46): e202311983, 2023 Nov 13.
Article in English | MEDLINE | ID: mdl-37798813

ABSTRACT

We present a new framework integrating the AI model GPT-4 into the iterative process of reticular chemistry experimentation, leveraging a cooperative workflow of interaction between AI and a human researcher. This GPT-4 Reticular Chemist is an integrated system composed of three phases. Each of these utilizes GPT-4 in various capacities, wherein GPT-4 provides detailed instructions for chemical experimentation and the human provides feedback on the experimental outcomes, including both successes and failures, for the in-context learning of AI in the next iteration. This iterative human-AI interaction enabled GPT-4 to learn from the outcomes, much like an experienced chemist, by a prompt-learning strategy. Importantly, the system is based on natural language for both development and operation, eliminating the need for coding skills and thus making it accessible to all chemists. Our collaboration with GPT-4 Reticular Chemist guided the discovery of an isoreticular series of MOFs, with each synthesis fine-tuned through iterative feedback and expert suggestions. This workflow presents a potential for broader applications in scientific research by harnessing the capability of large language models like GPT-4 to enhance the feasibility and efficiency of research activities.

4.
J Am Chem Soc ; 145(32): 18048-18062, 2023 Aug 16.
Article in English | MEDLINE | ID: mdl-37548379

ABSTRACT

We use prompt engineering to guide ChatGPT in the automation of text mining of metal-organic framework (MOF) synthesis conditions from diverse formats and styles of the scientific literature. This effectively mitigates ChatGPT's tendency to hallucinate information, an issue that previously made the use of large language models (LLMs) in scientific fields challenging. Our approach involves the development of a workflow implementing three different processes for text mining, programmed by ChatGPT itself. All of them enable parsing, searching, filtering, classification, summarization, and data unification with different trade-offs among labor, speed, and accuracy. We deploy this system to extract 26 257 distinct synthesis parameters pertaining to approximately 800 MOFs sourced from peer-reviewed research articles. This process incorporates our ChemPrompt Engineering strategy to instruct ChatGPT in text mining, resulting in impressive precision, recall, and F1 scores of 90-99%. Furthermore, with the data set built by text mining, we constructed a machine-learning model with over 87% accuracy in predicting MOF experimental crystallization outcomes and preliminarily identifying important factors in MOF crystallization. We also developed a reliable data-grounded MOF chatbot to answer questions about chemical reactions and synthesis procedures. Given that the process of using ChatGPT reliably mines and tabulates diverse MOF synthesis information in a unified format while using only narrative language requiring no coding expertise, we anticipate that our ChatGPT Chemistry Assistant will be very useful across various other chemistry subdisciplines.
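
The precision, recall, and F1 scores quoted above are the standard measures for comparing extracted items against a hand-labelled reference. The following is a minimal sketch of that calculation, not the authors' evaluation code; the function name, the tuple layout, and the sample records (including the placeholder name "MOF-A") are all hypothetical.

```python
def precision_recall_f1(extracted, reference):
    """Score a set of extracted items against a hand-labelled reference set."""
    extracted, reference = set(extracted), set(reference)
    true_pos = len(extracted & reference)  # items found in both sets
    precision = true_pos / len(extracted) if extracted else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical data: (compound, parameter, value) triples for one paper.
mined = {
    ("MOF-A", "temperature", "100 C"),
    ("MOF-A", "solvent", "DMF"),
    ("MOF-A", "time", "24 h"),
    ("MOF-A", "metal", "Cu"),   # wrong value, counts as a false positive
}
gold = {
    ("MOF-A", "temperature", "100 C"),
    ("MOF-A", "solvent", "DMF"),
    ("MOF-A", "time", "24 h"),
    ("MOF-A", "metal", "Zn"),   # missed by the extraction
}

precision, recall, f1 = precision_recall_f1(mined, gold)
```

Treating each extracted parameter as an exact-match tuple is a simplifying assumption; a real evaluation would likely need value normalization before comparison.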

5.
Proc Natl Acad Sci U S A ; 119(44): e2208975119, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36279463

ABSTRACT

Randomized experiments are widely used to estimate the causal effects of a proposed treatment in many areas of science, from medicine and healthcare to the physical and biological sciences, from the social sciences to engineering, and from public policy to the technology industry. Here we consider situations where classical methods for estimating the total treatment effect on a target population are considerably biased due to confounding network effects, i.e., the fact that the treatment of an individual may impact its neighbors' outcomes, an issue referred to as network interference or as nonindividualized treatment response. A key challenge in these situations is that the network is often unknown and difficult or costly to measure. We assume a potential outcomes model with heterogeneous additive network effects, encompassing a broad class of network interference sources, including spillover, peer effects, and contagion. First, we characterize the limitations in estimating the total treatment effect without knowledge of the network that drives interference. By contrast, we subsequently develop a simple estimator and efficient randomized design that outputs an unbiased estimate with low variance in situations where one is given access to average historical baseline measurements prior to the experiment. Our solution does not require knowledge of the underlying network structure, and it comes with statistical guarantees for a broad class of models. Due to their ease of interpretation and implementation, and their theoretical guarantees, we believe our results will have significant impact on the design of randomized experiments.


Subjects
Randomized Controlled Trials as Topic; Causality
6.
BMC Health Serv Res ; 18(1): 678, 2018 Sep 03.
Article in English | MEDLINE | ID: mdl-30176856

ABSTRACT

BACKGROUND: Record linkage is an important tool for epidemiologists and health planners. Record linkage studies will generally contain some level of residual record linkage error, where individual records are either incorrectly marked as belonging to the same individual, or incorrectly marked as belonging to separate individuals. A key question is whether errors in linkage quality are distributed evenly throughout the population, or whether certain subgroups will exhibit higher rates of error. Previous investigations of this issue have typically compared linked and un-linked records, which can conflate bias caused by record linkage error, with bias caused by missing records (data capture errors). METHODS: Four large administrative datasets were individually de-duplicated, with results compared to an available 'gold-standard' benchmark, allowing us to avoid methodological issues with comparing linked and un-linked records. Results were compared by gender, age, geographic remoteness (major cities, regional or remote) and socioeconomic status. RESULTS: Results varied between datasets, and by sociodemographic characteristic. The most consistent findings were worse linkage quality for younger individuals (seen in all four datasets) and worse linkage quality for those living in remote areas (seen in three of four datasets). The linkage quality within sociodemographic categories varied between datasets, with the associations with linkage error reversed across different datasets due to quirks of the specific data collection mechanisms and data sharing practices. CONCLUSIONS: These results suggest caution should be taken both when linking younger individuals and those in remote areas, and when analysing linked data from these subgroups. Further research is required to determine the ramifications of worse linkage quality in these subpopulations on research outcomes.


Subjects
Information Storage and Retrieval/standards; Medical Record Linkage/standards; Social Class; Adolescent; Adult; Aged; Australia; Benchmarking/standards; Benchmarking/statistics & numerical data; Bias; Child; Child, Preschool; Cities; Data Collection/standards; Data Collection/statistics & numerical data; Female; Humans; Infant; Infant, Newborn; Information Dissemination; Information Storage and Retrieval/statistics & numerical data; Male; Medical Record Linkage/methods; Middle Aged; Residence Characteristics/statistics & numerical data; Young Adult
7.
Stud Health Technol Inform ; 253: 91-95, 2018.
Article in English | MEDLINE | ID: mdl-30147048

ABSTRACT

Linking information across databases fosters new research in the medical sciences. Recent European privacy regulations recommend encrypting personal identifiers used for linking. Bloom filter-based methods are increasingly popular for record linkage. However, basic Bloom filter encodings are prone to cryptographic attacks. Therefore, methods for hardening them against these attacks are required. In this paper, a new hardening method for privacy-preserving record linkage (PPRL) techniques is presented. By using a Markov chain-based language model of bigrams of identifiers during the encryption, protection against attacks is increased. Based on real-world mortality data, we compare unencrypted and state-of-the-art PPRL methods with the results of the proposed hardening method.
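
For orientation, the basic Bloom filter encoding that the abstract describes as attack-prone can be sketched as follows. This is only the unhardened baseline, not the Markov chain-based method proposed in the paper; the filter size, number of hash functions, padding character, and secret key are all assumed values chosen for illustration.

```python
import hashlib
import hmac


def bigrams(name):
    """Split a padded, lower-cased identifier into overlapping character bigrams."""
    padded = f"_{name.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(name, size=1024, num_hashes=4, secret=b"demo-key"):
    """Return the set of bit positions switched on for one identifier.

    Each bigram is hashed `num_hashes` times with a keyed HMAC-SHA256;
    `size`, `num_hashes`, and `secret` are illustrative assumptions.
    """
    bits = set()
    for gram in bigrams(name):
        for k in range(num_hashes):
            message = f"{k}:{gram}".encode("utf-8")
            digest = hmac.new(secret, message, hashlib.sha256).digest()
            bits.add(int.from_bytes(digest[:8], "big") % size)
    return bits


def dice(a, b):
    """Dice coefficient between two sets of bit positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


smith = bloom_encode("Smith")
smyth = bloom_encode("Smyth")
jones = bloom_encode("Jones")
```

Because similar names share most of their bigrams, `dice(smith, smyth)` comes out well above `dice(smith, jones)`, which is what lets two parties compare encoded identifiers without exchanging clear text. The same regularity in bigram frequencies is what the attacks mentioned in the abstract exploit.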


Subjects
Databases, Factual; Names; Privacy; Computer Security; Confidentiality; Humans; Language; Markov Chains; Medical Record Linkage
8.
PLoS One ; 13(12): e0208422, 2018.
Article in English | MEDLINE | ID: mdl-30596661

ABSTRACT

Checkpoint inhibitor immunotherapies have had major success in treating patients with late-stage cancers, yet only a minority of patients benefit. Mutation load and PD-L1 staining are leading biomarkers associated with response, but each is an imperfect predictor. A key challenge to predicting response is modeling the interaction between the tumor and immune system. We begin to address this challenge with a multifactorial model for response to anti-PD-L1 therapy. We train a model to predict immune response in patients after treatment based on 36 clinical, tumor, and circulating features collected prior to treatment. We analyze data from 21 bladder cancer patients using the elastic net high-dimensional regression procedure and, as training set error is a biased and overly optimistic measure of prediction error, we use leave-one-out cross-validation to obtain unbiased estimates of accuracy on held-out patients. In held-out patients, the model explains 79% of the variance in T cell clonal expansion. This predicted immune response is multifactorial, as the variance explained is at most 23% if clinical, tumor, or circulating features are excluded. Moreover, if patients are triaged according to predicted expansion, only 38% of non-durable clinical benefit (DCB) patients need be treated to ensure that 100% of DCB patients are treated. In contrast, using mutation load or PD-L1 staining alone, one must treat at least 77% of non-DCB patients to ensure that all DCB patients receive treatment. Thus, integrative models of immune response may improve our ability to anticipate clinical benefit of immunotherapy.
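
The leave-one-out procedure mentioned above can be illustrated with a small sketch. To keep it dependency-free, the elastic net is swapped for ordinary least squares on a single feature, so this shows only the cross-validation loop and the held-out variance-explained calculation, not the paper's model. The function names and the toy data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def loocv_predictions(xs, ys):
    """Predict each sample from a model trained on all the other samples."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]   # drop sample i from the training set
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        preds.append(slope * xs[i] + intercept)
    return preds


def variance_explained(ys, preds):
    """1 - SS_res / SS_tot, computed on held-out predictions."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical toy data, exactly linear so every held-out prediction is exact.
feature = [0.0, 1.0, 2.0, 3.0, 4.0]
response = [1.0, 3.0, 5.0, 7.0, 9.0]
held_out = loocv_predictions(feature, response)
score = variance_explained(response, held_out)
```

With noisy data the held-out score falls below the training-set score, and it can even be negative, which is why the abstract treats training error as overly optimistic.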


Subjects
B7-H1 Antigen/antagonists & inhibitors; Cell Proliferation; Immunotherapy/methods; Lymphocytes, Tumor-Infiltrating/physiology; Models, Statistical; Protein Kinase Inhibitors/therapeutic use; T-Lymphocytes/physiology; Adult; Antibodies, Monoclonal/therapeutic use; Antibodies, Monoclonal, Humanized; B7-H1 Antigen/immunology; Biomarkers, Pharmacological/analysis; Biomarkers, Tumor/analysis; Carcinoma, Transitional Cell/drug therapy; Carcinoma, Transitional Cell/immunology; Carcinoma, Transitional Cell/pathology; Cell Proliferation/drug effects; Cell Proliferation/genetics; Clonal Evolution/drug effects; Clonal Evolution/genetics; Female; Humans; Lymphocytes, Tumor-Infiltrating/drug effects; Male; Mutation; Risk Assessment; T-Lymphocytes/drug effects; Treatment Outcome; Urinary Bladder Neoplasms/drug therapy; Urinary Bladder Neoplasms/immunology; Urinary Bladder Neoplasms/pathology
9.
BMC Med Inform Decis Mak ; 17(1): 83, 2017 Jun 08.
Article in English | MEDLINE | ID: mdl-28595638

ABSTRACT

BACKGROUND: Integrating medical data using databases from different sources by record linkage is a powerful technique increasingly used in medical research. Under many jurisdictions, unique personal identifiers needed for linking the records are unavailable. Since sensitive attributes, such as names, have to be used instead, privacy regulations usually demand encrypting these identifiers. The corresponding set of techniques for privacy-preserving record linkage (PPRL) has received widespread attention. One recent method is based on Bloom filters. Due to superior resilience against cryptographic attacks, composite Bloom filters (cryptographic long-term keys, CLKs) are considered best practice for privacy in PPRL. Real-world performance of these techniques using large-scale data is unknown up to now. METHODS: Using a large subset of Australian hospital admission data, we tested the performance of an innovative PPRL technique (CLKs using multibit trees) against a gold-standard derived from clear-text probabilistic record linkage. Linkage time and linkage quality (recall, precision and F-measure) were evaluated. RESULTS: Clear text probabilistic linkage resulted in marginally higher precision and recall than CLKs. PPRL required more computing time but 5 million records could still be de-duplicated within one day. However, the PPRL approach required fine tuning of parameters. CONCLUSIONS: We argue that increased privacy of PPRL comes with the price of small losses in precision and recall and a large increase in computational burden and setup time. These costs seem to be acceptable in most applied settings, but they have to be considered in the decision to apply PPRL. Further research on the optimal automatic choice of parameters is needed.


Subjects
Databases, Factual/standards; Medical Record Linkage/standards; Privacy; Australia; Hospitalization/statistics & numerical data; Humans
10.
Stud Health Technol Inform ; 235: 161-165, 2017.
Article in English | MEDLINE | ID: mdl-28423775

ABSTRACT

Record linkage (RL) is the process of identifying pairs of records that correspond to the same entity, for example the same patient. The basic approach assigns to each pair of records a similarity weight, and then determines a certain threshold, above which the two records are considered to be a match. Three different RL methods were applied under privacy-preserving conditions on hospital admission data: deterministic RL (DRL), probabilistic RL (PRL), and Bloom filters. Patient characteristics such as names were one-way encrypted (DRL, PRL) or transformed to a cryptographic long-term key (Bloom filters). Based on one year of hospital admissions, the data set was split randomly into 30 thousand new and 1.5 million known patients. With the combination of the three RL methods, a positive predictive value of 83% (95% confidence interval 65%-94%) was attained. Thus, the presented combination of RL methods seems to be suited for other applications of population-based research.
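
The weight-and-threshold idea described in the abstract can be sketched in a few lines. This is a clear-text illustration only: it does not encrypt anything and is not one of the three methods evaluated in the paper. The field names, the use of `difflib.SequenceMatcher` as the string comparator, the unweighted average, and the threshold of 0.85 are all assumptions.

```python
from difflib import SequenceMatcher

FIELDS = ("first_name", "last_name", "birth_date")  # hypothetical record layout


def similarity_weight(rec_a, rec_b, fields=FIELDS):
    """Average per-field string similarity, in the range [0, 1]."""
    scores = [SequenceMatcher(None, rec_a[f], rec_b[f]).ratio() for f in fields]
    return sum(scores) / len(scores)


def is_match(rec_a, rec_b, threshold=0.85):
    """Declare a match when the similarity weight reaches the threshold."""
    return similarity_weight(rec_a, rec_b) >= threshold


# Hypothetical records: the first two differ by one letter in the surname.
rec_1 = {"first_name": "anna", "last_name": "schmidt", "birth_date": "1980-02-11"}
rec_2 = {"first_name": "anna", "last_name": "schmitt", "birth_date": "1980-02-11"}
rec_3 = {"first_name": "peter", "last_name": "maier", "birth_date": "1975-09-30"}
```

Here `is_match(rec_1, rec_2)` holds while `is_match(rec_1, rec_3)` does not. Raising the threshold trades recall for precision, which is the balance the positive predictive value in the abstract reflects.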


Subjects
Hospitalization/statistics & numerical data; Medical Record Linkage/methods; Privacy; Germany; Hospitals, University; Humans
12.
Proc Natl Acad Sci U S A ; 113(48): E7655-E7662, 2016 Nov 29.
Article in English | MEDLINE | ID: mdl-27856745

ABSTRACT

In artificial neural networks, learning from data is a computationally demanding task in which a large number of connection weights are iteratively tuned through stochastic-gradient-based heuristic processes over a cost function. It is not well understood how learning occurs in these systems, in particular how they avoid getting trapped in configurations with poor computational performance. Here, we study the difficult case of networks with discrete weights, where the optimization landscape is very rough even for simple architectures, and provide theoretical and numerical evidence of the existence of rare (but extremely dense and accessible) regions of configurations in the network weight space. We define a measure, the robust ensemble (RE), which suppresses trapping by isolated configurations and amplifies the role of these dense regions. We analytically compute the RE in some exactly solvable models and also provide a general algorithmic scheme that is straightforward to implement: define a cost function given by a sum of a finite number of replicas of the original cost function, with a constraint centering the replicas around a driving assignment. To illustrate this, we derive several powerful algorithms, ranging from Markov Chains to message passing to gradient descent processes, where the algorithms target the robust dense states, resulting in substantial improvements in performance. The weak dependence on the number of precision bits of the weights leads us to conjecture that very similar reasoning applies to more conventional neural networks. Analogous algorithmic schemes can also be applied to other optimization problems.

13.
Pac Symp Biocomput ; : 39-50, 2014.
Article in English | MEDLINE | ID: mdl-24297532

ABSTRACT

Advances in experimental techniques resulted in abundant genomic, transcriptomic, epigenomic, and proteomic data that have the potential to reveal critical drivers of human diseases. Complementary algorithmic developments enable researchers to map these data onto protein-protein interaction networks and infer which signaling pathways are perturbed by a disease. Despite this progress, integrating data across different biological samples or patients remains a substantial challenge because samples from the same disease can be extremely heterogeneous. Somatic mutations in cancer are an infamous example of this heterogeneity. Although the same signaling pathways may be disrupted in a cancer patient cohort, the distribution of mutations is long-tailed, and many driver mutations may only be detected in a small fraction of patients. We developed a computational approach to account for heterogeneous data when inferring signaling pathways by sharing information across the samples. Our technique builds upon the prize-collecting Steiner forest problem, a network optimization algorithm that extracts pathways from a protein-protein interaction network. We recover signaling pathways that are similar across all samples yet still reflect the unique characteristics of each biological sample. Leveraging data from related tumors improves our ability to recover the disrupted pathways and reveals patient-specific pathway perturbations in breast cancer.


Subjects
Algorithms; Neoplasms/genetics; Neoplasms/metabolism; Protein Interaction Maps; Signal Transduction; Breast Neoplasms/genetics; Breast Neoplasms/metabolism; Computational Biology; Databases, Genetic; ErbB Receptors/genetics; ErbB Receptors/metabolism; Female; Humans; Models, Biological; Mutation
14.
J Comput Biol ; 20(2): 124-36, 2013 Feb.
Article in English | MEDLINE | ID: mdl-23383998

ABSTRACT

Signaling and regulatory networks are essential for cells to control processes such as growth, differentiation, and response to stimuli. Although many "omic" data sources are available to probe signaling pathways, these data are typically sparse and noisy. Thus, it has been difficult to use these data to discover the cause of the diseases and to propose new therapeutic strategies. We overcome these problems and use "omic" data to reconstruct simultaneously multiple pathways that are altered in a particular condition by solving the prize-collecting Steiner forest problem. To evaluate this approach, we use the well-characterized yeast pheromone response. We then apply the method to human glioblastoma data, searching for a forest of trees, each of which is rooted in a different cell-surface receptor. This approach discovers both overlapping and independent signaling pathways that are enriched in functionally and clinically relevant proteins, which could provide the basis for new therapeutic strategies. Although the algorithm was not provided with any information about the phosphorylation status of receptors, it identifies a small set of clinically relevant receptors among hundreds present in the interactome.


Subjects
Algorithms; Brain Neoplasms/genetics; Glioblastoma/genetics; Neoplasm Proteins/genetics; Pheromones/genetics; Receptors, Cell Surface/genetics; Saccharomyces cerevisiae/genetics; Cell Communication; Gene Expression Profiling; Gene Regulatory Networks; Humans; Models, Biological; Pharmacogenetics; Protein Interaction Mapping/statistics & numerical data; Signal Transduction
15.
Proc Natl Acad Sci U S A ; 104(15): 6112-7, 2007 Apr 10.
Article in English | MEDLINE | ID: mdl-17395721

ABSTRACT

We show how preferential attachment can emerge in an optimization framework, resolving a long-standing theoretical controversy. We also show that the preferential attachment model so obtained has two novel features, saturation and viability, which have natural interpretations in the underlying network and lead to a power-law degree distribution with exponential cutoff. Moreover, we consider a generalized version of this preferential attachment model with independent saturation and viability, leading to a broader class of power laws again with exponential cutoff. We present a collection of empirical observations from social, biological, physical, and technological networks, for which such degree distributions give excellent fits. We suggest that, in general, optimization models that give rise to preferential attachment with saturation and viability effects form a good starting point for the analysis of many networks.
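
A power-law degree distribution with exponential cutoff is commonly written as p(k) proportional to k^(-gamma) * exp(-k / kappa). The sketch below evaluates that generic form numerically; it is not derived from the paper's optimization model, and the symbol names `gamma` and `kappa`, the truncation at `k_max`, and the parameter values are all assumptions made for illustration.

```python
import math


def degree_pmf(gamma, kappa, k_max):
    """Normalized p(k) ~ k**(-gamma) * exp(-k / kappa) for k = 1..k_max."""
    weights = [k ** (-gamma) * math.exp(-k / kappa) for k in range(1, k_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Assumed parameters: exponent 2, cutoff scale 50, degrees truncated at 1000.
pmf = degree_pmf(gamma=2.0, kappa=50.0, k_max=1000)
pure_power_law = degree_pmf(gamma=2.0, kappa=float("inf"), k_max=1000)
```

Index 0 of the returned list holds p(1). Comparing `pmf` with `pure_power_law` shows the effect of the cutoff: for degrees well above `kappa` the exponential factor suppresses the tail far below the pure power law.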


Subjects
Data Interpretation, Statistical; Models, Theoretical; Internet; Metabolic Networks and Pathways; Statistical Distributions