ABSTRACT
Machine learning methods hold the promise to reduce the costs and the failure rates of conventional drug discovery pipelines. This issue is especially pressing for neurodegenerative diseases, where the development of disease-modifying drugs has been particularly challenging. To address this problem, we describe here a machine learning approach to identify small molecule inhibitors of α-synuclein aggregation, a process implicated in Parkinson's disease and other synucleinopathies. Because the proliferation of α-synuclein aggregates takes place through autocatalytic secondary nucleation, we aim to identify compounds that bind the catalytic sites on the surface of the aggregates. To achieve this goal, we use structure-based machine learning in an iterative manner to first identify and then progressively optimize secondary nucleation inhibitors. Our results demonstrate that this approach leads to the facile identification of compounds two orders of magnitude more potent than previously reported ones.
Subject(s)
Drug Discovery, Machine Learning, Protein Aggregates, alpha-Synuclein, alpha-Synuclein/antagonists & inhibitors, alpha-Synuclein/metabolism, alpha-Synuclein/chemistry, Humans, Drug Discovery/methods, Protein Aggregates/drug effects, Small Molecule Libraries/pharmacology, Small Molecule Libraries/chemistry, Parkinson Disease/drug therapy, Parkinson Disease/metabolism, Structure-Activity Relationship
ABSTRACT
New scientific understanding is catalyzed by novel technologies that enhance measurement precision, resolution or type, and that provide new tools to test and develop theory. Over the last 50 years, technology has transformed the hydrologic sciences by enabling direct measurements of watershed fluxes (evapotranspiration, streamflow) at time scales and spatial extents aligned with variation in physical drivers. High frequency water quality measurements, increasingly obtained by in situ water quality sensors, are extending that transformation. Widely available sensors for some physical (temperature) and chemical (conductivity, dissolved oxygen) attributes have become integral to aquatic science, and emerging sensors for nutrients, dissolved CO2, turbidity, algal pigments, and dissolved organic matter are now enabling observations of watersheds and streams at time scales commensurate with their fundamental hydrological, energetic, elemental, and biological drivers. Here we synthesize insights from emerging technologies across a suite of applications, and envision future advances, enabled by sensors, in our ability to understand, predict, and restore watershed and stream systems.
Subject(s)
Hydrology, Rivers, Temperature, Water Quality
ABSTRACT
Multiprotein complexes regulate most if not all cellular functions. Elucidating the structure and function of these complex cellular machines is essential for understanding biology. Moreover, multiprotein complexes by themselves constitute powerful reagents as biologics for the prevention and treatment of human diseases. Recombinant production by the baculovirus/insect cell expression system is particularly useful for expressing proteins of eukaryotic origin and their complexes. MultiBac, an advanced baculovirus/insect cell system, has been widely adopted in the last decade to produce multiprotein complexes with many subunits that were hitherto inaccessible, for academic and industrial research and development. The MultiBac system, its development and numerous applications are presented. Future opportunities for utilizing MultiBac to catalyze discovery are outlined.
Subject(s)
Baculoviridae/metabolism, Protein Engineering/methods, Recombinant Proteins/biosynthesis, Viral Proteins/biosynthesis, Animals, Baculoviridae/genetics, Computational Biology, Protein Databases, Drug Discovery/methods, Viral Gene Expression Regulation, Genetic Vectors, Humans, Molecular Models, Multiprotein Complexes, Protein Multimerization, Quaternary Protein Structure, Protein Subunits, Recombinant Proteins/chemistry, Recombinant Proteins/genetics, Structure-Activity Relationship, Genetic Transcription, Transfection, Viral Proteins/chemistry, Viral Proteins/genetics
ABSTRACT
Recently developed measurement technologies can monitor surface water quality almost continuously, creating high-frequency multiparameter time series and raising the question of how best to extract insights from such rich data sets. Here we use spectral analysis to characterize the variability of water quality at the AgrHys observatory (Western France) over time scales ranging from 20 min to 12 years. Three years of daily sampling at the intensively farmed Kervidy-Naizin watershed reveal universal 1/f scaling for all 36 solutes, yielding spectral slopes of 1.05 ± 0.11 (mean ± standard deviation). These 36 solute concentrations show varying degrees of annual cycling, suggesting different controls on watershed export processes. Twelve years of daily samples of SO4, NO3, and dissolved organic carbon (DOC) show that 1/f scaling does not continue at frequencies below 1/year in those constituents, whereas a 12-year daily record of Cl shows a general 1/f trend down to the lowest measurable frequencies. Conversely, approximately 12 months of 20 min NO3 and DOC measurements show that at frequencies higher than 1/day, the spectra of these solutes steepen to slopes of roughly 3, and at time scales shorter than 2-3 h, the spectra flatten to slopes near zero, reflecting analytical noise. These results confirm and extend the recent discovery of universal fractal 1/f scaling in water quality at the relatively pristine Plynlimon watershed in Wales, further demonstrating the importance of advective-dispersive transport mixing in catchments. However, the steeper scaling at subdaily time scales suggests additional short-term damping of solute concentrations, potentially due to in-stream or riparian processes.
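The spectral slope quoted above is the exponent beta in power ∝ 1/f^beta. As a rough illustration only, and not necessarily the estimator the authors used, the sketch below computes a plain periodogram with a direct discrete Fourier transform and fits a straight line in log-log space. The function names and the synthetic test series (white noise and a random walk) are assumptions made for this example; real water quality records would also need handling of gaps and uneven sampling, which this sketch ignores.

```python
import cmath
import math
import random


def periodogram(x):
    """Return (frequencies, power) for k = 1 .. n//2 using a direct DFT."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(centered))
        freqs.append(k / n)              # cycles per sampling interval
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


def spectral_slope(x):
    """Estimate beta in power ~ 1/f**beta by least squares in log-log space."""
    freqs, power = periodogram(x)
    log_f = [math.log10(f) for f in freqs]
    log_p = [math.log10(p) for p in power]
    mean_f = sum(log_f) / len(log_f)
    mean_p = sum(log_p) / len(log_p)
    sxy = sum((a - mean_f) * (b - mean_p) for a, b in zip(log_f, log_p))
    sxx = sum((a - mean_f) ** 2 for a in log_f)
    return -sxy / sxx                    # positive beta = power falls with f


# Synthetic series (assumed for illustration, not data from the study).
random.seed(0)
white = [random.gauss(0, 1) for _ in range(512)]   # flat spectrum expected
walk, total = [], 0.0
for step in white:                                 # cumulative sum: steep spectrum
    total += step
    walk.append(total)

beta_white = spectral_slope(white)
beta_walk = spectral_slope(walk)
```

White noise should give a slope near 0 and a random walk a slope approaching 2, so a value near 1, as reported for the solutes, sits between those two reference cases.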
Subject(s)
Agriculture, Chemical Elements, Fractals, Water Quality, Carbon/analysis, France, Nitrates/analysis, Spectrum Analysis, Time Factors, Water Supply
ABSTRACT
We developed four online interfaces supporting citizen participation in decision-making. We included (1) learning loops (LLs), good practice in decision analysis, and (2) gamification, to enliven an otherwise long and tedious survey. We investigated the effects of these features on drop-out rate, perceived experience, and basic psychological needs (BPNs): autonomy, competence, and relatedness, all from self-determination theory. We also investigated how BPNs and individual causality orientation influence experience of the four interfaces. Answers from 785 respondents, representative of the Swiss German-speaking population in age and gender, provided insightful results. LLs and gamification increased drop-out rate. Experience was better explained by the BPN satisfaction than by the interface, and this was moderated by respondents' causality orientations. LLs increased the challenge, and gamification enhanced the social experience and playfulness. LLs frustrated all three needs, and gamification satisfied relatedness. Autonomy and relatedness both positively influenced the social experience, but competence was negatively correlated with challenge. All observed effects were small. Hence, using gamification for decision-making is questionable, and understanding individual variability is a prerequisite; this study has helped disentangle the diversity of responses to survey design options.
Subject(s)
Learning, Personal Autonomy, Personal Satisfaction, Surveys and Questionnaires, Play and Playthings
ABSTRACT
G-quadruplexes (G4s) are nucleic acid secondary structures that form within guanine-rich DNA or RNA sequences. G4 formation can affect chromatin architecture and gene regulation, and has been associated with genomic instability, genetic diseases, and cancer progression. The experimental data produced by the G4-seq experiment provides unprecedented details on G4 formation in the genome. Still, running the experimental protocol on a whole genome is an expensive and time-consuming process. Thus, it is highly desirable to have a computational method to predict G4 formation in new DNA sequences or whole genomes. Here, we present G4detector, a new method based on a convolutional neural network to predict G4s from DNA sequences. On top of the sequence information, we improved prediction accuracy by the addition of RNA secondary structure information. To train and test G4detector, we compiled novel high-throughput benchmarks over multiple species genomes measured by the G4-seq protocol. We show that G4detector outperforms extant methods for the same task on all benchmark datasets, can detect G4s genome-wide with high accuracy, and is able to extrapolate human-trained measurements to various non-human species. The code and benchmarks are publicly available on github.com/OrensteinLab/G4detector.
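To make the idea of a convolutional network reading DNA more concrete, the sketch below shows the basic building block such models rely on: one-hot encoding of the sequence, one convolutional filter slid along it, a ReLU, and global max pooling. It is not the G4detector architecture, whose layers and parameters are not described here; the function names and the hand-written guanine-run filter are assumptions made for illustration, whereas a trained network would learn many filters from data.

```python
ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element vectors (A, C, G, T).

    Any other symbol (for example N) becomes an all-zero vector.
    """
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET]
            for base in seq.upper()]


def conv1d_max(encoded, kernel, bias=0.0):
    """Slide one filter over the sequence, apply ReLU, then global max pooling."""
    width = len(kernel)
    best = 0.0  # ReLU floor: activations below zero are clipped to zero
    for start in range(len(encoded) - width + 1):
        activation = bias + sum(encoded[start + i][c] * kernel[i][c]
                                for i in range(width)
                                for c in range(len(ALPHABET)))
        best = max(best, activation)
    return best


# Hypothetical hand-written filter that responds to a run of three guanines.
g_run_filter = [[0.0, 0.0, 1.0, 0.0]] * 3

score_with_run = conv1d_max(one_hot("ATGGGTA"), g_run_filter)
score_without_run = conv1d_max(one_hot("ATATAT"), g_run_filter)
```

The pooled score is highest where the window matches the filter exactly, which is how a convolutional layer turns a local sequence motif into a feature for the later layers.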
Subject(s)
G-Quadruplexes, DNA/chemistry, DNA/genetics, Genome, Computer Neural Networks, RNA/chemistry
ABSTRACT
OBJECTIVE: Standardized face-to-face interviews are widely used in low- and middle-income countries to collect data for social science and health research. Such interviews can be long and tedious. In an attempt to improve the respondents' experience of interviews, we developed a gamified interview format by including a game element. Gamification is reported to increase engagement in tasks, but results from rigorously developed research are equivocal, and a theory of gamification is still needed. MATERIALS & METHODS: We evaluated the proposed gamification with a randomized controlled trial based on self-determination theory, specifically on the basic psychological needs theory. In total, 1266 respondents were interviewed. Single and multiple mediation analyses were used to understand the effects of the gamified interview format. RESULTS: Our evaluation showed that the gamification we had developed did not improve the outcome, namely the experience of the interview as reported by respondents. The effect of the gamified interview format depended on the ability of respondents: gamification can be counterproductive if it overburdens the respondents. However, the basic psychological needs theory explained the mechanisms of action of gamification well: feeling competent and related to others improved the reported experience of the interview. CONCLUSION: We emphasize the need to develop context-specific gamification and invite researchers to conduct equivalently rigorous evaluations of gamification in future studies.
Subject(s)
Psychological Theory, Adult, Female, Game Theory, Humans, India, Interviews as Topic, Male, Mediation Analysis, Personal Autonomy, Rural Population
ABSTRACT
The recent development of in-situ monitoring devices, such as UV spectrometers, makes the study of short-term stream chemistry variation relevant, especially the study of diurnal cycles, which are not yet fully understood. Our study is based on high-frequency data from an agricultural catchment (Studienlandschaft Schwingbachtal, Germany). We propose a novel approach, the combination of cluster analysis and Linear Discriminant Analysis, to mine nitrate behavior patterns from these data. As a result, we observe a seasonality of nitrate diurnal cycles that differs from the most common cycle seasonality described in the literature, i.e. pre-dawn peaks in spring. Our cycles appear in summer, and the maximum and minimum shift to a later time in late summer/autumn. This is observed for both water- and energy-limited years, thus potentially stressing the role of evapotranspiration. This concluding hypothesis on the role of evapotranspiration in stream nitrate concentration, which was obtained through data mining, broadens the perspective on the diurnal cycling of stream nitrate concentrations.
Subject(s)
Environmental Monitoring, Nitrates/analysis, Periodicity, Rivers/chemistry, Seasons, Water/analysis, Agriculture, Data Mining, Germany
ABSTRACT
High-frequency, in-situ monitoring provides large environmental datasets. These datasets will likely bring new insights in landscape functioning and process scale understanding. However, tailoring data analysis methods is necessary. Here, we detach our analysis from the usual temporal analysis performed in hydrology to determine if it is possible to infer general rules regarding hydrochemistry from available large datasets. We combined a 2-year in-stream nitrate concentration time series (time resolution of 15 min) with concurrent hydrological, meteorological and soil moisture data. We removed the low-frequency variations through low-pass filtering, which suppressed seasonality. We then analyzed the high-frequency variability component using Pareto Density Estimation, which to our knowledge has not been applied to hydrology. The resulting distribution of nitrate concentrations revealed three normally distributed modes: low, medium and high. Studying the environmental conditions for each mode revealed the main control of nitrate concentration: the saturation state of the riparian zone. We found low nitrate concentrations under conditions of hydrological connectivity and dominant denitrifying biological processes, and we found high nitrate concentrations under hydrological recession conditions and dominant nitrifying biological processes. These results generalize our understanding of hydro-biogeochemical nitrate flux controls and bring useful information to the development of nitrogen process-based models at the landscape scale.
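One common way to separate slow seasonal variation from the fast component is to estimate the slow part with a low-pass filter and subtract it. The sketch below does this with a centered moving average. It is a minimal illustration under stated assumptions: the abstract does not specify the filter or its window, the function names and the synthetic series are invented for this example, and Pareto Density Estimation itself is not implemented here.

```python
def moving_average(series, half_width):
    """Centered moving average; the window shrinks symmetrically near the edges."""
    n = len(series)
    smoothed = []
    for i in range(n):
        h = min(half_width, i, n - 1 - i)   # keep the window centered on i
        window = series[i - h:i + h + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def remove_low_frequency(series, half_width):
    """Subtract the smoothed (low-pass) component, keeping the fast residual."""
    trend = moving_average(series, half_width)
    return [value - slow for value, slow in zip(series, trend)]


# Hypothetical series: a slow linear drift, with and without one short spike.
drift = [0.5 * t + 3.0 for t in range(41)]
spiky = list(drift)
spiky[20] += 10.0

residual_drift = remove_low_frequency(drift, half_width=5)
residual_spiky = remove_low_frequency(spiky, half_width=5)
```

The slow drift is removed entirely while the short spike survives almost unchanged, and it is this residual that would then be passed to a density estimate. With 15 min data, the half-width would have to span days to weeks to suppress seasonality, and that choice is left open here.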
Subject(s)
Factual Databases, Environmental Monitoring, Nitrates/analysis, Rivers/chemistry
ABSTRACT
We organized a crowdsourcing experiment in the form of a snapshot sampling campaign to assess the spatial distribution of nitrogen solutes, namely, nitrate, ammonium and dissolved organic nitrogen (DON), in German surface waters. In particular, we investigated (i) whether crowdsourcing is a reasonable sampling method in hydrology and (ii) what the effects of population density, soil humus content and arable land were on actual nitrogen solute concentrations and surface water quality. The statistical analyses revealed a significant correlation between nitrate and arable land (0.46), as well as soil humus content (0.37) but a weak correlation with population density (0.12). DON correlations were weak but significant with humus content (0.14) and arable land (0.13). The mean contribution of DON to total dissolved nitrogen was 22%. Samples were classified as water quality class II or above, following the European Water Framework Directive for nitrate and ammonium (53% and 82%, respectively). Crowdsourcing turned out to be a useful method to assess the spatial distribution of stream solutes, as considerable amounts of samples were collected with comparatively little effort.
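The coefficients quoted above summarize how strongly two variables vary together. The abstract does not say which coefficient was used, so the sketch below assumes the Pearson product-moment correlation purely for illustration; the function name and the toy numbers are hypothetical and are not data from the sampling campaign.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Toy values chosen only to exercise the function.
toy_x = [1, 2, 3, 4, 5]
toy_y = [2, 1, 4, 3, 5]
toy_r = pearson_r(toy_x, toy_y)
```

A rank-based coefficient such as Spearman's would be computed the same way after replacing each value by its rank, which is often preferred for skewed concentration data.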
ABSTRACT
Proteomics research has revealed the impressive complexity of eukaryotic proteomes in unprecedented detail. It is now commonly accepted that proteins in cells mostly do not exist as isolated entities but exert their biological activity in association with many other proteins, in humans ten or more, forming assembly lines in the cell for most if not all vital functions.(1,2) Knowledge of the function and architecture of these multiprotein assemblies requires their provision in superior quality and sufficient quantity for detailed analysis. The paucity of many protein complexes in cells, in particular in eukaryotes, prohibits their extraction from native sources and necessitates recombinant production. The baculovirus expression vector system (BEVS) has proven to be particularly useful for producing eukaryotic proteins, the activity of which often relies on post-translational processing that other commonly used expression systems often cannot support.(3) BEVS use a recombinant baculovirus, into which the gene of interest has been inserted, to infect insect cell cultures, which in turn produce the protein of choice. MultiBac is a BEVS that has been particularly tailored for the production of eukaryotic protein complexes that contain many subunits.(4) A vital prerequisite for efficient production of proteins and their complexes is a set of robust protocols for all steps involved in an expression experiment, which ideally can be implemented as standard operating procedures (SOPs) and followed with comparative ease even by non-specialist users.
The MultiBac platform at the European Molecular Biology Laboratory (EMBL) uses SOPs for all steps involved in a multiprotein complex expression experiment, starting from insertion of the genes into an engineered baculoviral genome optimized for heterologous protein production properties to small-scale analysis of the protein specimens produced.(5-8) The platform is installed in an open-access mode at EMBL Grenoble and has supported many scientists from academia and industry to accelerate protein complex research projects.