Results 1 - 20 of 72
1.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38632951

ABSTRACT

In cancer genomics, variant calling has advanced, but traditional mean accuracy evaluations are inadequate for biomarkers like tumor mutation burden, which vary significantly across samples, affecting immunotherapy patient selection and threshold settings. In this study, we introduce TMBstable, an innovative method that dynamically selects optimal variant calling strategies for specific genomic regions using a meta-learning framework, distinguishing it from traditional callers with uniform sample-wide strategies. The process begins with segmenting the sample into windows and extracting meta-features for clustering, followed by using a pre-trained meta-model to select suitable algorithms for each cluster, thereby addressing strategy-sample mismatches, reducing performance fluctuations and ensuring consistent performance across various samples. We evaluated TMBstable using both simulated and real non-small cell lung cancer and nasopharyngeal carcinoma samples, comparing it with advanced callers. The assessment, focusing on stability measures, such as the variance and coefficient of variation in false positive rate, false negative rate, precision and recall, involved 300 simulated and 106 real tumor samples. Benchmark results showed TMBstable's superior stability with the lowest variance and coefficient of variation across performance metrics, highlighting its effectiveness in analyzing the counting-based biomarker. The TMBstable algorithm can be accessed at https://github.com/hello-json/TMBstable for academic usage only.
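The stability measures named in the abstract (variance and coefficient of variation of a performance metric across samples) can be illustrated with a minimal sketch. The function name `stability` and the per-sample recall values are hypothetical and chosen only for illustration; they are not taken from the paper or from the TMBstable code.

```python
from statistics import mean, pstdev, pvariance

def stability(values):
    """Return (variance, coefficient of variation) for one metric across samples."""
    m = mean(values)
    var = pvariance(values, m)      # population variance across samples
    cv = pstdev(values, m) / m      # coefficient of variation = std / mean
    return var, cv

# Hypothetical per-sample recall values for two callers (illustrative only).
recall_caller_a = [0.90, 0.91, 0.89, 0.90]
recall_caller_b = [0.99, 0.80, 0.95, 0.86]

var_a, cv_a = stability(recall_caller_a)
var_b, cv_b = stability(recall_caller_b)
print(f"caller A: variance={var_a:.5f}, CV={cv_a:.4f}")
print(f"caller B: variance={var_b:.5f}, CV={cv_b:.4f}")
```

Both hypothetical callers have the same mean recall, so a mean-accuracy comparison would not separate them, while the variance and coefficient of variation do.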


Subject(s)
Carcinoma, Non-Small-Cell Lung , Lung Neoplasms , Humans , High-Throughput Nucleotide Sequencing/methods , Genomics/methods , Genome , Algorithms
2.
Genome Res ; 32(10): 1918-1929, 2022 10.
Article in English | MEDLINE | ID: mdl-36220609

ABSTRACT

Extensive evidence indicates that the pathobiological processes of a complex disease are associated with perturbation in specific neighborhoods of the human protein-protein interaction (PPI) network (also known as the interactome), often referred to as the disease module. Many computational methods have been developed to integrate the interactome and omics profiles to extract context-dependent disease modules. Yet, existing methods all have fundamental limitations in terms of rigor and/or efficiency. Here, we developed a statistical physics approach based on the random-field Ising model (RFIM) for disease module detection, which is both mathematically rigorous and computationally efficient. We applied our RFIM approach to genome-wide association studies (GWAS) of ten complex diseases to examine its performance for disease module detection. We found that our RFIM approach outperforms existing methods in terms of computational efficiency, connectivity of disease modules, and robustness to the interactome incompleteness.
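For orientation, the textbook random-field Ising model assigns each node i a spin and a local field, with the energy written below. This is the standard statistical-physics form and not necessarily the exact parameterization used in the paper. Reading the edges as PPI links, the spin as module membership, and the local field h_i as a node-level score derived from GWAS data is an assumption made here for illustration.

```latex
\mathcal{H}(\sigma) = -J \sum_{\langle i,j \rangle} \sigma_i \sigma_j - \sum_i h_i \sigma_i ,
\qquad \sigma_i \in \{-1, +1\}
```

Under that reading, the coupling term favors connected modules and the field term favors nodes with strong individual evidence.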


Subject(s)
Genome-Wide Association Study , Protein Interaction Maps , Humans , Genome-Wide Association Study/methods , Physics , Algorithms
3.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-36056740

ABSTRACT

Copy number variation (CNV) is a class of key biomarkers in many complex traits and diseases. Detecting CNVs from sequencing data is a substantial bioinformatics problem and a standard requirement in clinical practice. Although many proposed CNV detection approaches exist, the core statistical model at their foundation is weakened by two critical computational issues: (i) identifying the optimal setting on the sliding window and (ii) correcting for bias and noise. We designed a statistical process model to overcome these limitations by calculating regional read depths via an exponentially weighted moving average strategy. A one-run detection of CNVs of various lengths is then achieved by a dynamic sliding window, whose size is self-adapted according to the weighted averages. We also designed a novel bias/noise reduction model, accompanied by the moving average, which can handle complicated patterns and extend training data. This model, called PEcnv, accurately detects CNVs ranging from kb-scale to chromosome-arm level. The model performance was validated with simulation samples and real samples. Comparative analysis showed that PEcnv outperforms current popular approaches. Notably, PEcnv provided considerable advantages in detecting small CNVs (1 kb-1 Mb) in panel sequencing data. Thus, PEcnv fills the gap left by existing methods focusing on large CNVs. PEcnv may have broad applications in clinical testing where panel sequencing is the dominant strategy. Availability and implementation: Source code is freely available at https://github.com/Sherwin-xjtu/PEcnv.
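The exponentially weighted moving average mentioned in the abstract can be sketched generically as follows. This is a plain EWMA over a list of read depths, not the PEcnv implementation; the function name `ewma`, the smoothing factor `alpha`, and the example depths are assumptions, and the dynamic window sizing described in the abstract is not reproduced here.

```python
def ewma(depths, alpha=0.3):
    """Exponentially weighted moving average of a sequence of read depths.

    alpha is a hypothetical smoothing factor in (0, 1]; larger values
    weight recent positions more heavily.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    s = None
    for d in depths:
        # The first value seeds the average; later values are blended in.
        s = float(d) if s is None else alpha * d + (1 - alpha) * s
        smoothed.append(s)
    return smoothed

# Hypothetical read depths with an abrupt drop (illustrative only).
print(ewma([30, 31, 29, 15, 14, 16], alpha=0.5))
```

A sustained drop in the smoothed depth, rather than a single noisy position, is the kind of signal a depth-based detector would then examine.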


Subject(s)
DNA Copy Number Variations , High-Throughput Nucleotide Sequencing , Algorithms , Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Software
4.
Cereb Cortex ; 33(3): 811-822, 2023 01 05.
Article in English | MEDLINE | ID: mdl-35253859

ABSTRACT

Nonsuicidal self-injury (NSSI) generally occurs in youth and probably progresses to suicide. An examination of cortical thickness differences (ΔCT) between NSSI individuals and controls is crucial to investigate potential neurobiological correlates. Notably, ΔCT are influenced by specific genetic factors, and a large proportion of cortical thinning is associated with the expression of genes that overlap in astrocytes and pyramidal cells. However, in NSSI youth, the mechanisms underlying the relations between the genetic and cell type-specific transcriptional signatures to ΔCT are unclear. Here, we studied the genetic association of ΔCT in NSSI youth by performing a partial least-squares regression (PLSR) analysis of gene expression data and 3D-T1 brain images of 45 NSSI youth and 75 controls. We extracted the top-10 Gene Ontology terms for the enrichment results of upregulated PLS component 1 genes related to ΔCT to conduct the cell-type classification and enrichment analysis. Enrichment of cell type-specific genes shows that cellular component morphogenesis of astrocytes and excitatory neurons accounts for the observed NSSI-specific ΔCT. We validated the main results in independent datasets to verify the robustness and specificity. We concluded that the brain ΔCT is associated with cellular component morphogenesis of astrocytes and excitatory neurons in NSSI youth.


Subject(s)
Astrocytes , Self-Injurious Behavior , Humans , Adolescent , Brain , Neurons , Morphogenesis
5.
Acta Radiol ; 65(5): 489-498, 2024 May.
Article in English | MEDLINE | ID: mdl-38644751

ABSTRACT

BACKGROUND: The grading of adult isocitrate dehydrogenase (IDH)-mutant astrocytomas is a crucial prognostic factor. PURPOSE: To investigate the value of conventional magnetic resonance imaging (MRI) features and apparent diffusion coefficient (ADC) in the grading of adult IDH-mutant astrocytomas, and to analyze the correlation between ADC and the Ki-67 proliferation index. MATERIAL AND METHODS: The clinical and MRI data of 82 patients with adult IDH-mutant astrocytoma who underwent surgical resection and molecular genetic testing with IDH and 1p/19q were retrospectively analyzed. The conventional MRI features, ADCmin, ADCmean, and nADC of the tumors were compared using the Kruskal-Wallis single factor ANOVA and chi-square tests. Receiver operating characteristic (ROC) curves were drawn to evaluate conventional MRI and ADC accuracy in differentiating tumor grades. Pearson correlation analysis was performed to determine the correlation between ADC and the Ki-67 proliferation index. RESULTS: The difference in enhancement, ADCmin, ADCmean, and nADC among WHO grade 2, 3, and 4 tumors was statistically significant (all P <0.05). ADCmin showed the preferable diagnostic accuracy for grading WHO grade 2 and 3 tumors (AUC=0.724, sensitivity=63.4%, specificity=80%, positive predictive value (PPV)=62.0%; negative predictive value (NPV)=82.5%), and distinguishing grade 3 from grade 4 tumors (AUC=0.764, sensitivity=70%, specificity=76.2%, PPV=75.0%, NPV=71.4%). Enhancement + ADC model showed an optimal predictive accuracy (grade 2 vs. 3: AUC = 0.759; grade 3 vs. 4: AUC = 0.799). The Ki-67 proliferation index was negatively correlated with ADCmin, ADCmean, and nADC (all P <0.05), and positively correlated with tumor grade. CONCLUSION: Conventional MRI features and ADC are valuable to predict pathological grading of adult IDH-mutant astrocytomas.
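The accuracy measures reported in the abstract (sensitivity, specificity, PPV, NPV) follow from the four cells of a 2x2 confusion table. The sketch below shows the standard definitions; the function name `diagnostic_metrics` and the counts are hypothetical and are not the study's data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic accuracy measures from confusion-table counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all diseased
        "specificity": tn / (tn + fp),  # true negatives among all non-diseased
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

# Hypothetical counts (illustrative only).
metrics = diagnostic_metrics(tp=7, fp=2, tn=8, fn=3)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity depend only on the test and its threshold, whereas PPV and NPV also shift with the proportion of each grade in the sample.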


Subject(s)
Astrocytoma , Brain Neoplasms , Diffusion Magnetic Resonance Imaging , Isocitrate Dehydrogenase , Ki-67 Antigen , Neoplasm Grading , Humans , Astrocytoma/diagnostic imaging , Astrocytoma/genetics , Astrocytoma/pathology , Male , Female , Isocitrate Dehydrogenase/genetics , Ki-67 Antigen/metabolism , Adult , Middle Aged , Brain Neoplasms/diagnostic imaging , Brain Neoplasms/genetics , Brain Neoplasms/pathology , Retrospective Studies , Diffusion Magnetic Resonance Imaging/methods , Aged , Mutation , Cell Proliferation , Young Adult , Sensitivity and Specificity
6.
Respir Res ; 24(1): 63, 2023 Feb 26.
Article in English | MEDLINE | ID: mdl-36842969

ABSTRACT

BACKGROUND: Asthma is a heterogeneous disease with high morbidity. Advances in high-throughput multi-omics approaches have enabled the collection of molecular assessments at different layers, providing a complementary perspective on complex diseases. Numerous computational methods have been developed for omics-based patient classification or disease outcome prediction. Yet, a systematic benchmarking of those methods using various combinations of omics data for the prediction of asthma development is still lacking. OBJECTIVE: We aimed to investigate computational methods for disease status prediction using multi-omics data. METHOD: We systematically benchmarked 18 computational methods using all 63 combinations of six omics data types (GWAS, miRNA, mRNA, microbiome, metabolome, DNA methylation) collected in The Vitamin D Antenatal Asthma Reduction Trial (VDAART) cohort. We evaluated each method using standard performance metrics for each of the 63 omics combinations. RESULTS: Our results indicate that overall Logistic Regression, Multi-Layer Perceptron, and MOGONET display superior performance, and the combination of transcriptional, genomic and microbiome data achieves the best prediction. Moreover, we find that including the clinical data can further improve the prediction performance for some but not all of the omics combinations. CONCLUSIONS: Specific omics combinations can reach the optimal prediction of asthma development in children, and certain computational methods outperformed the others.


Subject(s)
Asthma , MicroRNAs , Pregnancy , Humans , Female , Child , Benchmarking , Genomics/methods , Asthma/diagnosis , Asthma/epidemiology , Asthma/genetics , Prognosis
7.
Epidemiol Infect ; 151: e125, 2023 07 20.
Article in English | MEDLINE | ID: mdl-37469289

ABSTRACT

Varicella vaccination is optional and requires self-payment. On 1 December 2018, Wuxi City launched a free varicella vaccination program for children. This study aimed to evaluate the changes in varicella incidence before and after the implementation of the policy. The data were obtained from official information systems and statistical yearbooks. We divided the period into chargeable (January 2017 to November 2018) and free (December 2018 to December 2021) periods. Interrupted time series analysis was used to conduct a generalised least-squares regression analysis for the two periods. A total of 51,071 varicella cases were reported between January 2017 and December 2021. After the implementation of the policy, there was a statistically significant decrease in the incidence of varicella (β2 = -0.140, P = 0.017), and the slope of the incidence also decreased by 0.012 (P = 0.015). Following policy implementation, the incidence decreased in all age groups, with the largest decline observed among children aged 8-14 years (β2 = -1.109, P = 0.009), followed by children aged ≤7 years (β2 = -0.894, P = 0.013). Our study found a significant reduction in the incidence of varicella in the total population after the introduction of free varicella vaccination in Wuxi City.
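An interrupted time series analysis of this kind is commonly written as a segmented regression. The form below is the usual textbook model, given here as an assumption about the general setup rather than the authors' exact specification. The abstract's level-change coefficient corresponds to the second beta; the symbols for the slope-change coefficient, the intervention time t_0 and the indicator X_t are introduced here for illustration.

```latex
Y_t = \beta_0 + \beta_1 t + \beta_2 X_t + \beta_3 (t - t_0) X_t + \varepsilon_t ,
\qquad
X_t = \begin{cases} 0, & t < t_0 \\ 1, & t \ge t_0 \end{cases}
```

Here \beta_1 is the pre-policy trend, \beta_2 the immediate change in level at the policy start, and \beta_3 the change in slope afterwards; generalised least squares allows the errors \varepsilon_t to be autocorrelated.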


Subject(s)
Chickenpox , Child , Humans , Infant , Chickenpox/epidemiology , Chickenpox/prevention & control , Interrupted Time Series Analysis , Incidence , Vaccination , China/epidemiology , Policy , Chickenpox Vaccine
8.
BMC Pulm Med ; 23(1): 115, 2023 Apr 11.
Article in English | MEDLINE | ID: mdl-37041558

ABSTRACT

BACKGROUND: Chronic obstructive pulmonary disease (COPD) is a highly morbid and heterogenous disease. While COPD is defined by spirometry, many COPD characteristics are seen in cigarette smokers with normal spirometry. The extent to which COPD and COPD heterogeneity is captured in omics of lung tissue is not known. METHODS: We clustered gene expression and methylation data in 78 lung tissue samples from former smokers with normal lung function or severe COPD. We applied two integrative omics clustering methods: (1) Similarity Network Fusion (SNF) and (2) Entropy-Based Consensus Clustering (ECC). RESULTS: SNF clusters were not significantly different by the percentage of COPD cases (48.8% vs. 68.6%, p = 0.13), though were different according to median forced expiratory volume in one second (FEV1) % predicted (82 vs. 31, p = 0.017). In contrast, the ECC clusters showed stronger evidence of separation by COPD case status (48.2% vs. 81.8%, p = 0.013) and similar stratification by median FEV1% predicted (82 vs. 30.5, p = 0.0059). ECC clusters using both gene expression and methylation were identical to the ECC clustering solution generated using methylation data alone. Both methods selected clusters with differentially expressed transcripts enriched for interleukin signaling and immunoregulatory interactions between lymphoid and non-lymphoid cells. CONCLUSIONS: Unsupervised clustering analysis from integrated gene expression and methylation data in lung tissue resulted in clusters with modest concordance with COPD, though were enriched in pathways potentially contributing to COPD-related pathology and heterogeneity.


Subject(s)
Pulmonary Disease, Chronic Obstructive , Smoking , Humans , Lung , Forced Expiratory Volume , Cluster Analysis
9.
Proc Biol Sci ; 289(1978): 20220678, 2022 07 13.
Article in English | MEDLINE | ID: mdl-35858052

ABSTRACT

Collisions between fast-moving objects often cause severe damage, but collision avoidance mechanisms of fast-moving animals remain understudied. Particularly, birds can fly fast and often in large groups, raising the question of how individuals avoid in-flight collisions that are potentially lethal. We tested the collision-avoidance hypothesis, which proposes that conspicuously contrasting ventral wings are visual signals that help birds to avoid collisions. We scored the ventral wing contrasts for a global dataset of 1780 bird species. Phylogenetic comparative analyses showed that larger species had more contrasting ventral wings than smaller species, and that in larger species, colonial breeders had more contrasting ventral wings than non-colonial breeders. Evidently, larger species have lower manoeuvrability than smaller species, and colonial-breeding species frequently encounter con- and heterospecifics, increasing their risk of in-flight collisions. Thus, more contrasting ventral wing patterns in these species are a sensory mechanism that facilitates collision avoidance.


Subject(s)
Flight, Animal , Wings, Animal , Animals , Biomechanical Phenomena , Birds , Phylogeny
10.
Neurobiol Learn Mem ; 191: 107620, 2022 05.
Article in English | MEDLINE | ID: mdl-35398514

ABSTRACT

BACKGROUND: Longitudinal studies have reported that some elderly people with normal cognition (NC) convert to mild cognitive impairment (MCI), whereas others remain in a normal state (NC_S). The factors underlying this difference in conversion are worth exploring. METHODS: Eighty-three NC participants were tracked for eight years. Thirty participants transitioned from NC to MCI (NC_MCI). The remaining 53 participants retained an NC_S. The structural brain features and gene expression of the 83 NC participants were obtained. We applied weighted gene co-expression network analysis (WGCNA) to examine the gene co-expression network. Mediation analysis of regulatory roles was conducted to examine the associations between brain measures, expression values, and clinical scores. RESULTS: The main results are: 1) 20 brain features and the expression of 740 genes differed significantly between the two groups, 2) one module including 187 genes was most strongly correlated with the cortical thickness of the left superior temporal sulcus (L.STS), 3) NFKBIA and RARA were the top two genes contributing to L.STS thickness, and 4) a mediating effect was found among L.STS thickness, NFKBIA and RARA expression levels, and clinical scores. CONCLUSION: Our results provide a theoretical foundation, based on gene expression and brain imaging, for the factors behind the different outcomes of NC.


Subject(s)
Alzheimer Disease , Cognitive Dysfunction , Aged , Alzheimer Disease/genetics , Brain/diagnostic imaging , Cognition , Cognitive Dysfunction/diagnostic imaging , Cognitive Dysfunction/genetics , Humans , Magnetic Resonance Imaging , Temporal Lobe/diagnostic imaging
11.
Acta Pharmacol Sin ; 43(9): 2429-2438, 2022 Sep.
Article in English | MEDLINE | ID: mdl-35110698

ABSTRACT

Synthetic glucocorticoids (GCs) have been widely used in the treatment of a broad range of inflammatory diseases, but their clinical use is limited by undesired side effects such as metabolic disorders, osteoporosis, skin and muscle atrophies, mood disorders and hypothalamic-pituitary-adrenal (HPA) axis suppression. Selective glucocorticoid receptor modulators (SGRMs) are expected to have promising anti-inflammatory efficacy but with fewer of the side effects caused by GCs. Here, we report HT-15, a prospective SGRM discovered by structure-based virtual screening (VS) and bioassays. HT-15 can selectively act on the NF-κB/AP1-mediated transrepression function of glucocorticoid receptor (GR) and repress the expression of pro-inflammatory cytokines (i.e., IL-1β, IL-6, COX-2, and CCL-2) as effectively as dexamethasone (Dex). Compared with Dex, HT-15 shows less transactivation potency, which is associated with the main adverse effects of synthetic GCs, and no cross activities with other nuclear receptors. Furthermore, HT-15 exhibits very weak inhibition on the ratio of OPG/RANKL. Therefore, it may reduce the side effects induced by normal GCs. The bioactive compound HT-15 can serve as a starting point for the development of novel therapeutics for high-dose or long-term anti-inflammatory treatment.


Subject(s)
Glucocorticoids , Receptors, Glucocorticoid , Anti-Inflammatory Agents/pharmacology , Biological Assay , Glucocorticoids/pharmacology , Prospective Studies , Receptors, Glucocorticoid/metabolism
12.
BMC Med Inform Decis Mak ; 21(Suppl 2): 90, 2021 07 30.
Article in English | MEDLINE | ID: mdl-34330244

ABSTRACT

BACKGROUND: Transformer is an attention-based architecture proven to be the state-of-the-art model in natural language processing (NLP). To reduce the difficulty of beginning to use transformer-based models in medical language understanding and expand the capability of the scikit-learn toolkit in deep learning, we proposed an easy-to-learn Python toolkit named transformers-sklearn. By wrapping the interfaces of transformers in only three functions (i.e., fit, score, and predict), transformers-sklearn combines the advantages of the transformers and scikit-learn toolkits. METHODS: In transformers-sklearn, three Python classes were implemented, namely, BERTologyClassifier for the classification task, BERTologyNERClassifier for the named entity recognition (NER) task, and BERTologyRegressor for the regression task. Each class contains three methods, i.e., fit for fine-tuning transformer-based models with the training dataset, score for evaluating the performance of the fine-tuned model, and predict for predicting the labels of the test dataset. transformers-sklearn is a user-friendly toolkit that (1) is customizable via a few parameters (e.g., model_name_or_path and model_type), (2) supports multilingual NLP tasks, and (3) requires less coding. The input data format is automatically generated by transformers-sklearn with the annotated corpus. Newcomers only need to prepare the dataset. The model framework and training methods are predefined in transformers-sklearn. RESULTS: We collected four open-source medical language datasets, including TrialClassification for Chinese medical trial text multi-label classification, BC5CDR for English biomedical text named entity recognition, DiabetesNER for Chinese diabetes entity recognition and BIOSSES for English biomedical sentence similarity estimation. In the four medical NLP tasks, the average code size of our script is 45 lines/task, which is one-sixth the size of transformers' script. The experimental results show that transformers-sklearn based on pretrained BERT models achieved macro F1 scores of 0.8225, 0.8703 and 0.6908, respectively, on the TrialClassification, BC5CDR and DiabetesNER tasks and a Pearson correlation of 0.8260 on the BIOSSES task, which is consistent with the results of transformers. CONCLUSIONS: The proposed toolkit could help newcomers address medical language understanding tasks easily using the scikit-learn coding style. The code and tutorials of transformers-sklearn are available at https://doi.org/10.5281/zenodo.4453803 . In future, more medical language understanding tasks will be supported to improve the applications of transformers_sklearn.
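The three-method contract described in the abstract (fit, score, predict) is the usual scikit-learn estimator style. The sketch below illustrates that contract with a toy stand-in written in plain Python; `MajorityClassifier` is hypothetical, does not use or reproduce the transformers-sklearn API, and performs no fine-tuning. The example texts and labels are invented.

```python
from collections import Counter

class MajorityClassifier:
    """Toy stand-in exposing a scikit-learn-style fit / predict / score contract."""

    def fit(self, X, y):
        # "Training" here only memorizes the most frequent label.
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        # Predict the memorized label for every input.
        return [self.label_ for _ in X]

    def score(self, X, y):
        # Accuracy of the predictions against the reference labels.
        preds = self.predict(X)
        return sum(p == t for p, t in zip(preds, y)) / len(y)

# Hypothetical usage with invented texts and labels.
clf = MajorityClassifier().fit(["text a", "text b", "text c"], [1, 1, 0])
print(clf.predict(["new text"]))
print(clf.score(["text d", "text e"], [1, 0]))
```

The appeal of this style is that a user only supplies data to three calls, while model construction and the training loop stay hidden behind the class.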


Subject(s)
Multilingualism , Natural Language Processing , Humans , Language
13.
Bioinformatics ; 35(10): 1777-1779, 2019 05 15.
Article in English | MEDLINE | ID: mdl-30329012

ABSTRACT

SUMMARY: Protein-protein interactions (PPIs) have been regarded as an attractive emerging class of therapeutic targets for the development of new treatments. Computational approaches, especially molecular docking, have been extensively employed to predict the binding structures of PPI-inhibitors or discover novel small molecule PPI inhibitors. However, due to the relatively 'undruggable' features of PPI interfaces, accurate predictions of the binding structures for ligands towards PPI targets are quite challenging for most docking algorithms. Here, we constructed a non-redundant pose ranking benchmark dataset for small-molecule PPI inhibitors, which contains 900 binding poses for 184 protein-ligand complexes. Then, we evaluated the performance of MM/PB(GB)SA approaches to identify the correct binding poses for PPI inhibitors, including two Prime MM/GBSA procedures from the Schrödinger suite and seven different MM/PB(GB)SA procedures from the Amber package. Our results showed that MM/PBSA outperformed the Glide SP scoring function (success rate of 58.6%) and MM/GBSA in most cases, especially the PB3 procedure which could achieve an overall success rate of ∼74%. Moreover, the GB6 procedure (success rate of 68.9%) performed much better than the other MM/GBSA procedures, highlighting the excellent potential of the GBNSR6 implicit solvation model for pose ranking. Finally, we developed the webserver of Fast Amber Rescoring for PPI Inhibitors (farPPI), which offers a freely available service to rescore the docking poses for PPI inhibitors by using the MM/PB(GB)SA methods. AVAILABILITY AND IMPLEMENTATION: farPPI web server is freely available at http://cadd.zju.edu.cn/farppi/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
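For readers unfamiliar with the MM/PB(GB)SA approaches evaluated here, the end-point binding free energy is generally decomposed as shown below. This is the standard textbook decomposition and not a description of any specific procedure (such as PB3 or GB6) from the paper; whether the entropy term is included varies between protocols.

```latex
\begin{aligned}
\Delta G_{\mathrm{bind}} &= G_{\mathrm{complex}} - G_{\mathrm{receptor}} - G_{\mathrm{ligand}} \\
\Delta G_{\mathrm{bind}} &\approx \Delta E_{\mathrm{MM}} + \Delta G_{\mathrm{solv}} - T\Delta S \\
\Delta G_{\mathrm{solv}} &= \Delta G_{\mathrm{PB/GB}} + \Delta G_{\mathrm{SA}}
\end{aligned}
```

Here \Delta E_{\mathrm{MM}} is the gas-phase molecular mechanics energy, \Delta G_{\mathrm{PB/GB}} the polar solvation term from a Poisson-Boltzmann or generalized Born model, and \Delta G_{\mathrm{SA}} the nonpolar term estimated from surface area. In pose rescoring, candidate poses are ranked by the resulting \Delta G_{\mathrm{bind}}.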


Subject(s)
Proteins/chemistry , Algorithms , Binding Sites , Ligands , Molecular Docking Simulation , Protein Binding , Software
14.
Phys Chem Chem Phys ; 22(10): 5487-5499, 2020 Mar 14.
Article in English | MEDLINE | ID: mdl-32101223

ABSTRACT

Ubiquitin specific protease 7 (USP7) has attracted increasing attention because of its multifaceted roles in different tumor types. The crystal structures of USP7-inhibitor complexes resolved recently provide reliable models for computational structure-based drug design (SBDD) towards USP7. How to accurately estimate USP7-ligand binding affinity is quite critical to guarantee the reliability of SBDD. In this study, we assessed the reliability of multiple computational methods to the binding affinity prediction for a series of USP7 inhibitors with the pyrimidinone scaffold, including molecular docking scoring, MM/PB(GB)SA, and umbrella sampling (US). It was found that the accuracy of the evaluated computational methods for binding affinity prediction follows the order: US-based method > MM/PB(GB)SA > Glide XP scoring. The calculation results demonstrate that incorporating protein flexibility through induced-fit docking or ensemble docking cannot improve the performance of the Glide scoring based on rigid-receptor docking. For the MM/PB(GB)SA methods, the choice of the protein structure and the calculation procedure has a marked impact on the predictions. More importantly, we discovered for the first time that there are significant differences in the dissociation pathways of strong-binding inhibitors and weak-binding inhibitors of USP7, which may be used as a new criterion to judge whether an inhibitor is a strong binder or not. It is expected that our work can provide valuable guidance on the design and discovery of potent USP7 inhibitors.


Subject(s)
Computational Chemistry , Enzyme Inhibitors/metabolism , Pyrimidinones/chemistry , Ubiquitin-Specific Peptidase 7/metabolism , Drug Design , Enzyme Inhibitors/chemistry , Protein Binding
15.
BMC Med Inform Decis Mak ; 20(Suppl 3): 122, 2020 07 09.
Article in English | MEDLINE | ID: mdl-32646415

ABSTRACT

BACKGROUND: The increasing global cancer incidence has a serious health impact in countries worldwide. Knowledge-powered health systems in different languages would enhance clinicians' healthcare practice, patients' health management and public health literacy. A high-quality corpus containing cancer information is the necessary foundation of cancer education. Massive unstructured information resources exist in clinical narratives, electronic health records (EHR), etc. They can only be used for training AI models after being transformed into a structured corpus. However, the scarcity of multilingual cancer corpora limits intelligent processing, such as machine translation in medical scenarios. Thus, we created a cancer-specific cross-lingual corpus and opened it to the public for academic use. METHODS: Aiming to build an English-Chinese cancer parallel corpus, we developed a workflow of seven steps including data retrieval, data parsing, data processing, corpus implementation, assessment verification, corpus release, and application. We applied the workflow to a cross-lingual, comprehensive and authoritative cancer information resource, PDQ (Physician Data Query). We constructed, validated and released the parallel corpus, named ECCParaCorp, and made it openly accessible online. RESULTS: The proposed English-Chinese Cancer Parallel Corpus (ECCParaCorp) consists of 6685 aligned text pairs in XML, Excel and CSV formats, containing 5190 sentence pairs, 1083 phrase pairs and 412 word pairs. It covers 6 cancers (breast cancer, liver cancer, lung cancer, esophageal cancer, colorectal cancer, and stomach cancer) and 3 cancer themes (cancer prevention, screening, and treatment). All data in the parallel corpus are online, available for users to browse and download ( http://www.phoc.org.cn/ECCParaCorp/ ). CONCLUSIONS: ECCParaCorp is a parallel corpus focused on cancer in a cross-lingual form, which is openly accessible. It would help redress the scarcity of multilingual corpus resources, bridge the gap between human-readable information and machine-readable data resources, and contribute to intelligent technology applications as a preparatory data foundation, e.g., cancer-related machine translation, cancer system development towards medical education, and disease-oriented knowledge extraction.


Subject(s)
Multilingualism , Neoplasms , Humans , Information Storage and Retrieval , Language , Unified Medical Language System
16.
J Chem Inf Model ; 59(2): 842-857, 2019 02 25.
Article in English | MEDLINE | ID: mdl-30658039

ABSTRACT

Androgen receptor (AR), as a member of the nuclear receptor (NR) superfamily, regulates gene transcription in response to the sequential binding of diverse agonists and coactivators. Great progress has been made in studies on the pharmacology and structure of AR, but the atomic-level mechanism of the bidirectional communications between the ligand-binding pocket (LBP) and the activation function-2 (AF2) region of AR remains poorly understood. Therefore, in this study, molecular dynamics (MD) simulations and free energy calculations were carried out to explore the interactions among water, agonist (DHT) or antagonist (HFT), AR, and coactivator (SRC3). Upon the binding of an agonist (DHT) or antagonist (HFT), the LBP structure would transform to the agonistic or antagonistic state, and the conformational changes of the LBP would regulate the structure of the AF2 surface. As a result, the binding of the androgen DHT could promote the recruitment of the coactivator SRC3 to the AF2; conversely, the binding of the antagonist HFT would perturb the shape of the AF2 and thereby weaken its capacity to accommodate coactivators with the LXXLL motif. The simulation results illustrated that the DHT-AR binding affinity was enhanced by the association of the coactivator SRC3, which would reduce the conformational fluctuation of the AR-LBD and expand the size of the AR LBP. On the other hand, the coactivator-to-HFT allosteric pathway, which involves the SRC3, helix 3 (H3), helix 4 (H4), the loop (L1-3) between helix 1 (H1) and H3, and HFT, was characterized. The HFT's skewness and different interactions between HFT and the LBP were observed in the SRC3-present AR. The mutual communications between the AF2 surface and LBP, together with the processes involving the interplay of the ligand binding and coactivator recruitment events, would help in understanding the association of coactivators and in rationally developing potent drugs to inhibit the activity of AR.


Subject(s)
Molecular Dynamics Simulation , Receptors, Androgen/chemistry , Receptors, Androgen/metabolism , Amino Acid Motifs , Amino Acid Sequence , Binding Sites , Ligands , Nuclear Receptor Coactivator 3/metabolism , Protein Binding , Thermodynamics
17.
Phys Chem Chem Phys ; 21(45): 25276-25289, 2019 Dec 07.
Article in English | MEDLINE | ID: mdl-31701109

ABSTRACT

As a member of the bromodomain and extra terminal domain (BET) protein family, bromodomain-containing protein 4 (BRD4) is an epigenetic reader and can recognize acetylated lysine residues in histones. BRD4 has been regarded as an essential drug target for cancers, inflammatory diseases and acute heart failure, and therefore the discovery of potent BRD4 inhibitors with novel scaffolds is highly desirable. In this study, the crystalline water molecules in BRD4 involved in ligand binding were analyzed first, and the simulation results suggest that several conserved crystalline water molecules are quite essential to keep the stability of the crystalline water network and therefore they need to be reserved in structure-based drug design. Then, a docking-based virtual screening workflow with the consideration of the conserved crystalline water network in the binding pocket was utilized to identify the potential inhibitors of BRD4. The in vitro fluorescence resonance energy transfer (HTRF) binding assay illustrates that 4 hits have good inhibitory activity against BRD4 in the micromolar regime, including three compounds with IC50 values below 5 µM and one below 1 µM (0.37 µM). The structural analysis demonstrates that three active compounds possess novel scaffolds. Moreover, the interaction patterns between the hits and BRD4 were characterized by molecular dynamics simulations and binding free energy calculations, and then several suggestions for the further optimization of these hits were proposed.


Subject(s)
Molecular Docking Simulation , Nuclear Proteins/chemistry , Transcription Factors/chemistry , Water/chemistry , Cell Cycle Proteins , Crystallization , Fluorescence Resonance Energy Transfer , Humans , Molecular Dynamics Simulation , Nuclear Proteins/antagonists & inhibitors , Transcription Factors/antagonists & inhibitors
18.
Nat Ecol Evol ; 8(1): 22-31, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37974003

ABSTRACT

Previous studies suggested that microbial communities can harbour keystone species whose removal can cause a dramatic shift in microbiome structure and functioning. Yet, an efficient method to systematically identify keystone species in microbial communities is still lacking. Here we propose a data-driven keystone species identification (DKI) framework based on deep learning to resolve this challenge. Our key idea is to implicitly learn the assembly rules of microbial communities from a particular habitat by training a deep-learning model using microbiome samples collected from this habitat. The well-trained deep-learning model enables us to quantify the community-specific keystoneness of each species in any microbiome sample from this habitat by conducting a thought experiment on species removal. We systematically validated this DKI framework using synthetic data and applied DKI to analyse real data. We found that those taxa with high median keystoneness across different communities display strong community specificity. The presented DKI framework demonstrates the power of machine learning in tackling a fundamental problem in community ecology, paving the way for the data-driven management of complex microbial communities.


Subject(s)
Deep Learning , Microbiota , Machine Learning
19.
Micromachines (Basel) ; 15(1)2024 Jan 19.
Article in English | MEDLINE | ID: mdl-38276851

ABSTRACT

Titanium alloy components are often damaged by impact loads in service, which makes improving the mechanical properties of TC4 titanium alloy crucial. This paper investigates the influence of femtosecond laser scanning irradiation on the tensile properties of thin titanium alloy sheets. The results indicate that the tensile strength of the sheets first increases and then decreases. The elongation at break of the cross-section is enhanced to varying degrees. The greatest improvement in elongation at break is achieved at a laser fluence of around 8 J/cm², while the maximum increase in tensile strength occurs at approximately 10 J/cm². Relative to the titanium alloy base material, femtosecond laser surface irradiation increases the tensile strength by up to approximately 8.54% and the elongation at break by up to 25.61%. In addition, the results verify that cracks in tensile fractures of TC4 start from the middle, while the fracture cracks of laser-irradiated samples start from both ends.

20.
Nat Commun ; 15(1): 2406, 2024 Mar 16.
Article in English | MEDLINE | ID: mdl-38493186

ABSTRACT

Microbial interactions can lead to different colonization outcomes of exogenous species, be they pathogenic or beneficial in nature. Predicting the colonization of exogenous species in complex communities remains a fundamental challenge in microbial ecology, mainly due to our limited knowledge of the diverse mechanisms governing microbial dynamics. Here, we propose a data-driven approach independent of any dynamics model to predict colonization outcomes of exogenous species from the baseline compositions of microbial communities. We systematically validate this approach using synthetic data, finding that machine learning models can predict not only the binary colonization outcome but also the post-invasion steady-state abundance of the invading species. Then we conduct colonization experiments for the commensal gut bacterial species Enterococcus faecium and Akkermansia muciniphila in hundreds of human stool-derived in vitro microbial communities, confirming that the data-driven approaches can predict the colonization outcomes in experiments. Furthermore, we find that while most resident species are predicted to have a weak negative impact on the colonization of exogenous species, strongly interacting species could significantly alter the colonization outcomes, e.g., Enterococcus faecalis inhibits the invasion of E. faecium. The presented results suggest that the data-driven approaches are powerful tools to inform the ecology and management of microbial communities.


Subject(s)
Enterococcus faecium , Microbiota , Humans , Feces/microbiology , Microbial Interactions , Enterococcus faecalis