ABSTRACT
MOTIVATION: Molecular dynamics (MD) simulations have become routine tools for studying protein dynamics and function. Thanks to faster GPU-based algorithms, atomistic and coarse-grained simulations are now used to explore biological functions over microsecond timescales, yielding terabytes of data spanning multiple trajectories. Extracting relevant protein conformations from these data without losing important information is therefore often challenging. RESULTS: We present MDSubSampler, a Python library and toolkit for a posteriori subsampling of data from multiple trajectories. The toolkit provides uniform, random, stratified, weighted, and bootstrap sampling methods. Sampling can be performed under the constraint of preserving the original distribution of relevant geometrical properties. Possible applications include simulation post-processing, noise reduction, and structure selection for ensemble docking. AVAILABILITY AND IMPLEMENTATION: MDSubSampler is freely available at https://github.com/alepandini/MDSubSampler, along with installation guidance and tutorials on its use.
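To make the sampling strategies named in the abstract concrete, the sketch below shows uniform, random, and stratified subsampling of frame indices in plain Python. It is an illustration only and does not use or reproduce the MDSubSampler API: the function names, the `values` list (one hypothetical geometrical property per frame, such as an RMSD), and the binning scheme are all assumptions made for this example.

```python
import random
from collections import defaultdict


def uniform_subsample(n_frames, size):
    """Return `size` evenly spaced frame indices (hypothetical helper)."""
    step = n_frames / size
    return [int(i * step) for i in range(size)]


def random_subsample(n_frames, size, rng):
    """Return `size` distinct frame indices drawn at random, sorted."""
    return sorted(rng.sample(range(n_frames), size))


def stratified_subsample(values, size, n_bins, rng):
    """Draw frames from equal-width bins of a per-frame property.

    Each non-empty bin contributes frames in proportion to its share of
    the trajectory (at least one), so the subsample roughly follows the
    original distribution. Because of rounding, the number of frames
    returned can differ slightly from `size`.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant property
    bins = defaultdict(list)
    for idx, value in enumerate(values):
        bin_id = min(int((value - lo) / width), n_bins - 1)
        bins[bin_id].append(idx)

    chosen = []
    for bin_id in sorted(bins):
        members = bins[bin_id]
        k = max(1, round(size * len(members) / len(values)))
        chosen.extend(rng.sample(members, min(k, len(members))))
    return sorted(chosen)
```

With a per-frame property already computed, a call such as `stratified_subsample(values, size=100, n_bins=5, rng=random.Random(0))` would return roughly 100 frame indices. Equal-width bins are one simple choice here; other binning schemes or weighting rules would fit the same pattern.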
Subjects
Algorithms, Molecular Dynamics Simulation, Protein Conformation
ABSTRACT
PURPOSE: Real-world data are a valuable tool for pregnancy research. However, an algorithmic approach is needed to ascertain pregnancy timings from these complex data. The Clinical Practice Research Datalink (CPRD) GOLD Pregnancy Register, based on UK primary care data, has proven to be a valuable research tool. The same algorithmic approach was applied to CPRD Aurum data to generate an equivalent register in the larger database. METHODS: Records of female patients registered with a CPRD Aurum contributing practice between 1 January 1987 and 30 April 2021 were searched for evidence of pregnancy. The algorithm used to generate the CPRD GOLD Pregnancy Register was redeveloped and applied first to CPRD GOLD and then to CPRD Aurum. The resulting CPRD Aurum Pregnancy Register was validated against the CPRD GOLD register, linked Hospital Episode Statistics (HES), and Office for National Statistics (ONS) live birth data. RESULTS: The CPRD Aurum Pregnancy Register contains 16 833 427 pregnancy episodes from 6 724 615 women, more than double the number in CPRD GOLD. The distribution of pregnancy outcome types was comparable between the registers. Across the whole register, there was good concordance between pregnancy episodes found in CPRD Aurum and in linked HES. However, both CPRD registers showed a decline in the number of pregnancy episodes from 2007 onwards that was steeper than in HES or the ONS birth data. CONCLUSIONS: A pregnancy register has been created in CPRD Aurum. Changes in antenatal care policies in the UK have led to declining numbers of pregnancies recorded in primary care electronic health record (EHR) data. Nevertheless, the creation of this register has tripled the number of patients in the CPRD Pregnancy Registers and will increase the capacity to study pregnancy in CPRD data, particularly rare or emerging exposures and outcomes.