1.
J Med Virol ; 96(5): e29657, 2024 May.
Article in English | MEDLINE | ID: mdl-38727035

ABSTRACT

The H1N1pdm09 virus has been a persistent threat to public health since the 2009 pandemic. Particularly, since the relaxation of COVID-19 pandemic mitigation measures, the influenza virus and SARS-CoV-2 have been concurrently prevalent worldwide. To determine the antigenic evolution pattern of H1N1pdm09 and develop preventive countermeasures, we collected influenza sequence data and immunological data to establish a new antigenic evolution analysis framework. A machine learning model (XGBoost, accuracy = 0.86, area under the receiver operating characteristic curve = 0.89) was constructed using epitopes, physicochemical properties, receptor binding sites, and glycosylation sites as features to predict the antigenic similarity relationships between influenza strains. An antigenic correlation network was constructed, and the Markov clustering algorithm was used to identify antigenic clusters. Subsequently, the antigenic evolution pattern of H1N1pdm09 was analyzed at the global and regional scales across three continents. We found that H1N1pdm09 evolved into around five antigenic clusters between 2009 and 2023 and that their antigenic evolution trajectories were characterized by cocirculation of multiple clusters, low-level persistence of former dominant clusters, and local heterogeneity of cluster circulations. Furthermore, compared with the seasonal H1N1 virus, the potential cluster-transition determining sites of H1N1pdm09 were restricted to epitopes Sa and Sb. This study demonstrated the effectiveness of machine learning methods for characterizing antigenic evolution of viruses, developed a specific model to rapidly identify H1N1pdm09 antigenic variants, and elucidated their evolutionary patterns. Our findings may provide valuable support for the implementation of effective surveillance strategies and targeted prevention efforts to mitigate the impact of H1N1pdm09.
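The clustering step described above can be illustrated with a minimal sketch of Markov clustering on a small network. This is not the authors' pipeline: the function names, the inflation value of 2.0, and the toy network (an edge meaning "predicted antigenically similar") are all assumptions chosen for illustration.

```python
def normalize_columns(matrix):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] for j in range(n)] for i in range(n)]


def markov_cluster(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster an undirected network by alternating expansion and inflation."""
    n = len(adjacency)
    # Self-loops keep every column sum positive before normalization.
    flow = normalize_columns(
        [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    )
    for _ in range(max_iter):
        # Expansion: square the matrix (a two-step random walk).
        expanded = [[sum(flow[i][k] * flow[k][j] for k in range(n))
                     for j in range(n)] for i in range(n)]
        # Inflation: element-wise power, then renormalize the columns.
        inflated = normalize_columns(
            [[value ** inflation for value in row] for row in expanded]
        )
        change = max(abs(inflated[i][j] - flow[i][j])
                     for i in range(n) for j in range(n))
        flow = inflated
        if change < tol:
            break
    # Each non-empty row of the converged matrix lists the members of a cluster.
    clusters = {frozenset(j for j in range(n) if flow[i][j] > 1e-6)
                for i in range(n)}
    return sorted(sorted(c) for c in clusters if c)


# Hypothetical network: strains 0-2 and strains 3-5 form two similarity groups.
network = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(markov_cluster(network))  # [[0, 1, 2], [3, 4, 5]]
```

A higher inflation value generally yields finer clusters, so in practice it would be tuned against known antigenic groupings.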


Subjects
Antigens, Viral , Influenza A Virus, H1N1 Subtype , Influenza, Human , Influenza A Virus, H1N1 Subtype/genetics , Influenza A Virus, H1N1 Subtype/immunology , Humans , Influenza, Human/epidemiology , Influenza, Human/prevention & control , Influenza, Human/virology , Influenza, Human/immunology , Antigens, Viral/genetics , Antigens, Viral/immunology , Machine Learning , Evolution, Molecular , Epitopes/genetics , Epitopes/immunology , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19/virology , COVID-19/immunology , Pandemics/prevention & control , Hemagglutinin Glycoproteins, Influenza Virus/genetics , Hemagglutinin Glycoproteins, Influenza Virus/immunology , SARS-CoV-2/genetics , SARS-CoV-2/immunology
2.
Virol Sin ; 2024 Feb 27.
Article in English | MEDLINE | ID: mdl-38423254

ABSTRACT

Influenza A virus (IAV) has an extensive host range and undergoes rapid genomic variation, leading to the continuous emergence of novel viruses with significant antigenic variation and the potential for cross-species transmission. This causes global pandemics and seasonal flu outbreaks, posing sustained threats worldwide. Thus, studying the evolutionary patterns of all IAVs and their underlying mechanisms is crucial for effective prevention and control. We developed FluTyping to identify IAV genotypes and to explore overall genetic diversity patterns and their restriction factors. FluTyping groups isolates based on genetic distance and phylogenetic relationships using entire genomes, enabling identification of each isolate's genotype. Three distinct genetic diversity patterns were observed: a single-genotype domination pattern comprising only the H1N1 and H3N2 seasonal influenza subtypes, a multi-genotype co-circulation pattern including the majority of avian influenza subtypes and swine influenza H1N2, and a hybrid-circulation pattern involving H7N9 and three H5 subtypes of influenza viruses. Furthermore, the IAVs in the multi-genotype co-circulation pattern showed region-specific dominant genotypes, implying that restriction of virus transmission is a key factor contributing to distinct genetic diversity patterns, and the genomic evolution underlying the different patterns appeared to be more strongly influenced by host-specific factors. In summary, the genotypes identified by FluTyping provide a comprehensive picture of the evolutionary patterns of IAVs overall, offering an important theoretical foundation for future prevention and control of these viruses.

3.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38343322

ABSTRACT

Vaccination stands as the most effective and economical strategy for prevention and control of influenza. The primary target of neutralizing antibodies is the surface antigen hemagglutinin (HA). However, ongoing mutations in the HA sequence result in antigenic drift. The success of a vaccine is contingent on its antigenic congruence with circulating strains. Thus, predicting antigenic variants and deducing antigenic clusters of influenza viruses are pivotal for recommendation of vaccine strains. The antigenicity of influenza A viruses is determined by the interplay of amino acids in the HA1 sequence. In this study, we exploit the ability of convolutional neural networks (CNNs) to extract spatial feature representations in the convolutional layers, which can discern interactions between amino acid sites. We introduce PREDAC-CNN, a model designed to track antigenic evolution of seasonal influenza A viruses. Accessible at http://predac-cnn.cloudna.cn, PREDAC-CNN formulates a spatially oriented representation of the HA1 sequence, optimized for the convolutional framework. It effectively probes interactions among amino acid sites in the HA1 sequence. Also, PREDAC-CNN focuses exclusively on physicochemical attributes crucial for the antigenicity of influenza viruses, thereby eliminating unnecessary amino acid embeddings. Together, PREDAC-CNN is adept at capturing interactions of amino acid sites within the HA1 sequence and examining the collective impact of point mutations on antigenic variation. Through 5-fold cross-validation and retrospective testing, PREDAC-CNN has shown superior performance in predicting antigenic variants compared to its counterparts. Additionally, PREDAC-CNN has been instrumental in identifying predominant antigenic clusters for A/H3N2 (1968-2023) and A/H1N1 (1977-2023) viruses, significantly aiding in vaccine strain recommendation.


Subjects
Influenza A Virus, H1N1 Subtype , Influenza A virus , Vaccines , Influenza A virus/genetics , Influenza A Virus, H3N2 Subtype/genetics , Hemagglutinin Glycoproteins, Influenza Virus/genetics , Seasons , Retrospective Studies , Antigens, Viral/genetics , Neural Networks, Computer , Amino Acids
4.
IEEE J Biomed Health Inform ; 27(12): 6029-6038, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37703167

ABSTRACT

Medical entity normalization is an important task for medical information processing. The Unified Medical Language System (UMLS), a well-developed medical terminology system, is crucial for medical entity normalization. However, the UMLS primarily consists of English medical terms. For languages other than English, such as Chinese, a significant challenge for normalizing medical entities is the lack of robust terminology systems. To address this issue, we propose a translation-enhancing training strategy that incorporates the translation and synonym knowledge of the UMLS into a language model using a contrastive learning approach. In this work, we propose a cross-lingual pre-trained language model called TeaBERT, which can align synonymous Chinese and English medical entities across languages at the concept level. In our evaluation, the TeaBERT language model outperformed previous cross-lingual language models, with Acc@5 values of 92.54%, 87.14% and 84.77% on the ICD10-CN, CHPO and RealWorld-v2 datasets, respectively. It also achieved a new state-of-the-art cross-lingual entity mapping performance without fine-tuning. The translation-enhancing strategy is applicable to other languages that face a similar challenge due to the absence of well-developed medical terminology systems.


Subjects
Language , Unified Medical Language System , International Classification of Diseases , Natural Language Processing
5.
Virol Sin ; 38(4): 508-519, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37169126

ABSTRACT

The coronavirus disease 2019 (COVID-19) pandemic, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has seriously threatened global public health and caused huge economic losses. Omics studies of SARS-CoV-2 can help understand the interaction between the virus and host, thereby providing a new perspective in guiding the intervention and treatment of SARS-CoV-2 infection. Since a large amount of SARS-CoV-2 omics data has accumulated in public databases, this study aimed to identify key host factors involved in SARS-CoV-2 infection through systematic integration of transcriptome and interactome data. By manually curating published studies, we obtained a comprehensive SARS-CoV-2-human protein-protein interaction (PPI) network, comprising 3591 human proteins interacting with 31 SARS-CoV-2 viral proteins. Using the RobustRankAggregation method, we identified 123 multiple cell line common genes (CLCGs), of which 115 up-regulated CLCGs showed signatures of enhanced host innate immunity and chemotactic response. Combining network analysis, co-expression analysis and functional enrichment analysis, we discovered four key host factors involved in SARS-CoV-2 infection: IFITM1, SERPINE1, DDX60, and TNFAIP2. Furthermore, SERPINE1 was found to facilitate SARS-CoV-2 replication and to alleviate the endoplasmic reticulum (ER) stress induced by the ORF8 protein through interaction with ORF8. Our findings highlight the importance of systematic integration analysis in understanding SARS-CoV-2-human interactions and provide valuable insights for future research on potential therapeutic targets against SARS-CoV-2 infection.


Subjects
COVID-19 , Humans , SARS-CoV-2/genetics , Cell Line , Transcriptome , Gene Expression Profiling
6.
Virol Sin ; 38(4): 541-548, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37211247

ABSTRACT

The Influenza A (H1N1) pdm09 virus caused a global pandemic in 2009 and has circulated seasonally ever since. As the continual genetic evolution of hemagglutinin in this virus leads to antigenic drift, rapid identification of antigenic variants and characterization of the antigenic evolution are needed. In this study, we developed PREDAC-H1pdm, a model to predict antigenic relationships between H1N1pdm viruses and identify antigenic clusters for post-2009 pandemic H1N1 strains. Our model performed well in predicting antigenic variants, which was helpful in influenza surveillance. By mapping the antigenic clusters for H1N1pdm, we found that substitutions on the Sa epitope were common for H1N1pdm, whereas for the former seasonal H1N1, substitutions on the Sb epitope were more common in antigenic evolution. Additionally, the localized epidemic pattern of H1N1pdm was more obvious than that of the former seasonal H1N1, which could make vaccine recommendation more sophisticated. Overall, the antigenic relationship prediction model we developed provides a rapid determination method for identifying antigenic variants, and the further analysis of evolutionary and epidemic characteristics can facilitate vaccine recommendations and influenza surveillance for H1N1pdm.


Subjects
Influenza A Virus, H1N1 Subtype , Influenza Vaccines , Influenza, Human , Humans , Influenza A Virus, H1N1 Subtype/genetics , Influenza, Human/epidemiology , Epitopes/genetics , Evolution, Molecular , Phylogeny , Hemagglutinin Glycoproteins, Influenza Virus/genetics
7.
Microbiol Spectr ; 11(3): e0401122, 2023 06 15.
Article in English | MEDLINE | ID: mdl-37022188

ABSTRACT

Klebsiella pneumoniae is a common cause of hospital- and community-acquired infections globally, yet its population structure remains unknown for many regions, particularly in low- and middle-income countries (LMICs). Here, we report for the first time the whole-genome sequencing (WGS) of a multidrug-resistant K. pneumoniae isolate, ARM01, recovered from a patient in Armenia. Antibiotic susceptibility testing revealed that ARM01 was resistant to ampicillin, amoxicillin-clavulanic acid, ceftazidime, cefepime, norfloxacin, levofloxacin, and chloramphenicol. Genome sequencing analysis revealed that ARM01 belonged to sequence type 967 (ST967), capsule type K18, and antigen type O1. ARM01 carried 13 antimicrobial resistance (AMR) genes, including blaSHV-27, dfrA12, tet(A), sul1, sul2, catII.2, mphA, qnrS1, aadA2, aph3-Ia, strA, and strB and the extended-spectrum β-lactamase (ESBL) gene blaCTX-M-15, but only one known virulence factor gene, yagZ/ecpA, and one plasmid replicon, IncFIB(K)(pCAV1099-114), were detected. The plasmid profile, AMR genes, virulence factors, accessory gene profile, and evolutionary analyses of ARM01 showed high similarity to isolates recovered from Qatar (SRR11267909 and SRR11267906). The date of the most recent common ancestor (MRCA) of ARM01 was estimated to be around 2017 (95% confidence interval [CI], 2017 to 2018). Although we report the comparative genomic analysis of only one isolate, this study emphasizes the importance of genomic surveillance for emerging pathogens and the need to implement more effective infection prevention and control practices. IMPORTANCE Whole-genome sequencing and population genetics analyses of K. pneumoniae from LMICs are scarce, and none have been reported for Armenia. Multilevel comparative analysis revealed that ARM01 (an isolate belonging to a newly emerged K. pneumoniae ST967 lineage) was genetically similar to two isolates recovered from Qatar. ARM01 was resistant to a wide range of antibiotics, reflecting the unregulated use of antibiotics (in most LMICs, antibiotic use is typically unregulated). Understanding the genetic makeup of these newly emerging lineages will aid in optimizing antibiotic use for patient treatment and contribute to worldwide efforts in pathogen and AMR surveillance and in implementing more effective infection prevention and control strategies.


Subjects
Klebsiella Infections , Klebsiella pneumoniae , Humans , Anti-Bacterial Agents/pharmacology , Plasmids , Whole Genome Sequencing , Virulence Factors/genetics , Genomics , Klebsiella Infections/epidemiology , beta-Lactamases/genetics , Drug Resistance, Multiple, Bacterial/genetics , Microbial Sensitivity Tests
9.
Nat Genet ; 55(2): 312-323, 2023 02.
Article in English | MEDLINE | ID: mdl-36646891

ABSTRACT

Hybrid maize displays superior heterosis and contributes over 30% of total worldwide cereal production. However, the molecular mechanisms of heterosis remain obscure. Here we show that structural variants (SVs) between the parental lines have a predominant role underpinning maize heterosis. De novo assembly and analyses of 12 maize founder inbred lines (FILs) reveal abundant genetic variations among these FILs and, through expression quantitative trait loci and association analyses, we identify several SVs contributing to genomic and phenotypic differentiations of various heterotic groups. Using a set of 91 diallel-cross F1 hybrids, we found strong positive correlations between better-parent heterosis of the F1 hybrids and the numbers of SVs between the parental lines, providing concrete genomic support for a prevalent role of genetic complementation underlying heterosis. Further, we document evidence that SVs in both ZAR1 and ZmACO2 contribute to yield heterosis in an overdominance fashion. Our results should promote genomics-based breeding of hybrid maize.
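As a rough illustration of the correlation analysis mentioned above, the sketch below computes better-parent heterosis with its standard definition, (F1 - better parent) / better parent × 100, and a Pearson correlation against structural-variant counts. All numbers are invented placeholders, not data from the study, and the sketch assumes that a higher trait value is the better one.

```python
import math


def better_parent_heterosis(f1_value, parent_a, parent_b):
    """Percent gain of the F1 hybrid over the better-performing parent."""
    better = max(parent_a, parent_b)  # assumes higher trait values are better
    return (f1_value - better) / better * 100.0


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical hybrids: (F1 yield, parent 1 yield, parent 2 yield, SV count)
hybrids = [
    (11.0, 9.0, 10.0, 1200),
    (12.5, 10.0, 9.5, 2100),
    (14.0, 10.0, 8.0, 3500),
]
bph = [better_parent_heterosis(f1, p1, p2) for f1, p1, p2, _ in hybrids]
sv_counts = [sv for _, _, _, sv in hybrids]
print(bph)
print(pearson(sv_counts, bph))
```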


Subjects
Hybrid Vigor , Zea mays , Edible Grain/genetics , Hybrid Vigor/genetics , Hybridization, Genetic , Plant Breeding , Quantitative Trait Loci/genetics , Zea mays/genetics , Genome, Plant
10.
Nucleic Acids Res ; 51(D1): D262-D268, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36177882

ABSTRACT

Ribozymes are excellent systems in which to study 'sequence - structure - function' relationships in RNA molecules. Understanding these relationships may greatly help structural modeling and design of functional RNA structures and some functional structural modules could be repurposed in molecular design. At present, there is no comprehensive database summarising all the natural ribozyme families. We have therefore created Ribocentre, a database that collects together sequence, structure and mechanistic data on 21 ribozyme families. This includes available information on timelines, sequence families, secondary and tertiary structures, catalytic mechanisms, applications of the ribozymes together with key publications. The database is publicly available at https://www.ribocentre.org.


Subjects
Databases, Nucleic Acid , RNA, Catalytic , Humans , Base Sequence , Nucleic Acid Conformation , RNA, Catalytic/chemistry
11.
Health Data Sci ; 3: 0011, 2023.
Article in English | MEDLINE | ID: mdl-38487197

ABSTRACT

Background: Chinese medical entities have not been organized comprehensively due to the lack of well-developed terminology systems, which poses a challenge to processing Chinese medical texts for fine-grained medical knowledge representation. To unify Chinese medical terminologies, mapping Chinese medical entities to their English counterparts in the Unified Medical Language System (UMLS) is an efficient solution. However, these mappings have not been investigated sufficiently in previous research. In this study, we explore strategies for mapping Chinese medical entities to the UMLS and systematically evaluate the mapping performance. Methods: First, Chinese medical entities are translated to English using multiple web-based translation engines. Then, 3 mapping strategies are investigated: (a) string-based, (b) semantic-based, and (c) string and semantic similarity combined. In addition, cross-lingual pretrained language models are applied to map Chinese medical entities to UMLS concepts without translation. All of these strategies are evaluated on the ICD10-CN, Chinese Human Phenotype Ontology (CHPO), and RealWorld datasets. Results: The linear combination method based on the SapBERT and term frequency-inverse document frequency bag-of-words models performs best on all evaluation datasets, with top-5 accuracies of 91.85%, 82.44%, and 78.43% on the ICD10-CN, CHPO, and RealWorld datasets, respectively. Conclusions: In our study, we explore strategies for mapping Chinese medical entities to the UMLS and identify a satisfactory linear combination method. Our investigation will facilitate Chinese medical entity normalization and inspire research that focuses on Chinese medical ontology development.
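The combined strategy and the top-5 metric can be sketched as follows. This is only an illustration under stated assumptions: character-trigram Jaccard overlap stands in for the string-based model, cosine similarity over toy vectors stands in for a semantic encoder such as SapBERT, and the weight alpha, the concept identifiers, and the vectors are all hypothetical.

```python
import math


def trigram_set(text):
    """Character trigrams of a padded, lower-cased string."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def string_similarity(a, b):
    """Jaccard overlap of trigram sets (stand-in for a string-based model)."""
    ta, tb = trigram_set(a), trigram_set(b)
    return len(ta & tb) / len(ta | tb)


def cosine(u, v):
    """Cosine similarity of two vectors (stand-in for semantic similarity)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def rank_concepts(query, query_vec, concepts, alpha=0.5, k=5):
    """Rank candidates by alpha * string score + (1 - alpha) * semantic score."""
    scored = []
    for concept_id, (name, vec) in concepts.items():
        score = (alpha * string_similarity(query, name)
                 + (1 - alpha) * cosine(query_vec, vec))
        scored.append((score, concept_id))
    scored.sort(reverse=True)
    return [concept_id for _, concept_id in scored[:k]]


def top_k_accuracy(ranked_lists, gold_ids):
    """Fraction of queries whose gold concept appears in its ranked list."""
    hits = sum(1 for ranked, gold in zip(ranked_lists, gold_ids) if gold in ranked)
    return hits / len(gold_ids)


# Hypothetical candidate concepts: id -> (English name, toy embedding)
concepts = {
    "CUI_A": ("myocardial infarction", [1.0, 0.0]),
    "CUI_B": ("cerebral infarction", [0.0, 1.0]),
}
ranked = rank_concepts("myocardial infarction", [1.0, 0.0], concepts, k=1)
print(ranked)  # ['CUI_A']
```

Setting alpha to 1 or 0 recovers the purely string-based or purely semantic strategy, which makes the three strategies easy to compare on the same candidate set.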

12.
Brief Bioinform ; 23(6)2022 11 19.
Article in English | MEDLINE | ID: mdl-36377755

ABSTRACT

Virus-encoded small RNAs (vsRNAs) have been reported to play an important role in viral infection. Unfortunately, an effective method for vsRNA identification is still lacking. Herein, we present vsRNAfinder, a de novo method for identifying high-confidence vsRNAs from small RNA-Seq (sRNA-Seq) data based on peak calling and the Poisson distribution; it is publicly available at https://github.com/ZenaCai/vsRNAfinder. vsRNAfinder outperformed two widely used methods, miRDeep2 and ShortStack, in identifying viral miRNAs, with significantly improved sensitivity. It can also be used to identify sRNAs in animals and plants with performance similar to that of miRDeep2 and ShortStack. vsRNAfinder would greatly facilitate effective identification of vsRNAs from sRNA-Seq data.
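The abstract does not spell out how peak calling and the Poisson distribution are combined, so the following is only a generic sketch of the idea, not vsRNAfinder's actual procedure: a position is flagged when its read depth is improbably high under a Poisson background whose rate is the mean depth. The function names, the threshold, and the coverage values are assumptions.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed from the cumulative sum."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def call_peaks(coverage, threshold=0.01):
    """Return positions whose depth is unlikely under the background rate."""
    background = sum(coverage) / len(coverage)  # assumed background model
    return [pos for pos, depth in enumerate(coverage)
            if poisson_upper_tail(depth, background) < threshold]


# Hypothetical per-position read depth along a short genome segment.
coverage = [1, 1, 1, 1, 1, 1, 1, 1, 1, 30]
print(call_peaks(coverage))  # [9]
```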


Subjects
MicroRNAs , Animals , RNA-Seq , MicroRNAs/genetics , Sequence Analysis, RNA/methods
14.
Front Microbiol ; 13: 890590, 2022.
Article in English | MEDLINE | ID: mdl-35910603

ABSTRACT

Genetic mutation and recombination are driving the evolution of SARS-CoV-2, leaving many genetic imprints that can be used to track the evolutionary pathway of SARS-CoV-2 and explore the relationships among variants. Here, we constructed a complete genetic map showing the explicit evolutionary relationships among all SARS-CoV-2 variants, including 58 groups and 46 recombination types identified from 3,392,553 sequences, which enables us to stay well informed about the evolution of SARS-CoV-2 and quickly determine the parents of novel variants. We found that the 5' and 3' ends of the spike and nucleoprotein genes frequently form recombination junctions and that the RBD region in the S gene is always exchanged as a whole. Although these recombinants did not show advantages in community transmission, it is necessary to keep a wary eye on novel genetic events, in particular mutants with mutations in spike and recombinants with exchanged moieties in the spike gene.

15.
Emerg Microbes Infect ; 11(1): 2069-2079, 2022 Dec.
Article in English | MEDLINE | ID: mdl-35930371

ABSTRACT

The enteroinvasive bacterium Shigella flexneri is known as a highly host-adapted human pathogen. Until recently, no other reservoirs had been reported. Here 34 isolates obtained from animals (yaks, dairy cows and beef cattle) from 2016 to 2017 and 268 human S. flexneri isolates from China were sequenced to determine the relationships between animal and human isolates and infer the evolutionary history of animal-associated S. flexneri. The 18 animal isolates (15 yak and 3 beef cattle isolates) in PG1 were separated into 4 lineages, and the 16 animal isolates (1 yak, 5 beef cattle and 10 dairy cow isolates) in PG3 were clustered in 8 lineages. The most recent human isolates from China belonged to PG3 whereas Chinese isolates from the 1950s-1960s belonged to PG1. PG1 S. flexneri may have been transmitted to the yaks during PG1 circulation in the human population in China and has remained in the yak population since, while PG3 S. flexneri in animals were likely recent transmissions from the human population. Increased stability of the large virulence plasmid and acquisition of abundant antimicrobial resistance determinants may have enabled PG3 to expand globally and replace PG1 in China. Our study confirms that animals may act as a reservoir for S. flexneri. Genomic analysis revealed the evolutionary history of multiple S. flexneri lineages in animals and humans in China. However, further studies are required to determine the public health threat of S. flexneri from animals.


Subjects
Dysentery, Bacillary , Shigella flexneri , Animals , Anti-Bacterial Agents , Dysentery, Bacillary/epidemiology , Dysentery, Bacillary/microbiology , Genomics , Humans , Plasmids , Shigella flexneri/genetics
16.
J Virol ; 96(17): e0074122, 2022 09 14.
Article in English | MEDLINE | ID: mdl-35980206

ABSTRACT

Within the past 2 decades, three highly pathogenic human coronaviruses have emerged, namely, severe acute respiratory syndrome coronavirus (SARS-CoV), Middle East respiratory syndrome coronavirus (MERS-CoV), and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The health threats and economic burden posed by these tremendously severe coronaviruses have paved the way for research on their etiology, pathogenesis, and treatment. Compared to SARS-CoV and SARS-CoV-2, MERS-CoV genome encoded fewer accessory proteins, among which the ORF4b protein had anti-immunity ability in both the cytoplasm and nucleus. Our work for the first time revealed that ORF4b protein was unstable in the host cells and could be degraded by the ubiquitin proteasome system. After extensive screenings, it was found that UBR5 (ubiquitin protein ligase E3 component N-recognin 5), a member of the HECT E3 ubiquitin ligases, specifically regulated the ubiquitination and degradation of ORF4b. Similar to ORF4b, UBR5 can also translocate into the nucleus through its nuclear localization signal, enabling it to regulate ORF4b stability in both the cytoplasm and nucleus. Through further experiments, lysine 36 was identified as the ubiquitination site on the ORF4b protein, and this residue was highly conserved in various MERS-CoV strains isolated from different regions. When UBR5 was knocked down, the ability of ORF4b to suppress innate immunity was enhanced and MERS-CoV replication was stronger. As an anti-MERS-CoV host protein, UBR5 targets and degrades ORF4b protein through the ubiquitin proteasome system, thereby attenuating the anti-immunity ability of ORF4b and ultimately inhibiting MERS-CoV immune escape, which is a novel antagonistic mechanism of the host against MERS-CoV infection. IMPORTANCE ORF4b was an accessory protein unique to MERS-CoV and was not present in SARS-CoV and SARS-CoV-2 which can also cause severe respiratory disease. 
Moreover, ORF4b inhibited the production of antiviral cytokines in both the cytoplasm and the nucleus, which was likely to be associated with the high lethality of MERS-CoV. However, whether the host proteins regulate the function of ORF4b is unknown. Our study first determined that UBR5, a host E3 ligase, was a potential host anti-MERS-CoV protein that could reduce the protein level of ORF4b and diminish its anti-immunity ability by inducing ubiquitination and degradation. Based on the discovery of ORF4b-UBR5, a critical molecular target, further increasing the degradation of ORF4b caused by UBR5 could provide a new strategy for the clinical development of drugs for MERS-CoV.


Subjects
Coronavirus Infections , Host Microbial Interactions , Middle East Respiratory Syndrome Coronavirus , Proteolysis , Ubiquitin-Protein Ligases , Ubiquitination , Viral Proteins , Coronavirus Infections/immunology , Coronavirus Infections/prevention & control , Coronavirus Infections/virology , Cytokines/immunology , Humans , Immunity, Innate , Middle East Respiratory Syndrome Coronavirus/immunology , Middle East Respiratory Syndrome Coronavirus/metabolism , Molecular Targeted Therapy , Proteasome Endopeptidase Complex/metabolism , Severe acute respiratory syndrome-related coronavirus , SARS-CoV-2 , Ubiquitin-Protein Ligases/metabolism , Ubiquitins/metabolism , Viral Proteins/chemistry , Viral Proteins/metabolism , Virus Replication
17.
J Med Internet Res ; 24(6): e37213, 2022 06 03.
Article in English | MEDLINE | ID: mdl-35657661

ABSTRACT

BACKGROUND: Phenotype information in electronic health records (EHRs) is mainly recorded in unstructured free text, which cannot be directly used for clinical research. EHR-based deep-phenotyping methods can structure phenotype information in EHRs with high fidelity, making it the focus of medical informatics. However, developing a deep-phenotyping method for non-English EHRs (ie, Chinese EHRs) is challenging. Although numerous EHR resources exist in China, fine-grained annotation data that are suitable for developing deep-phenotyping methods are limited. It is challenging to develop a deep-phenotyping method for Chinese EHRs in such a low-resource scenario. OBJECTIVE: In this study, we aimed to develop a deep-phenotyping method with good generalization ability for Chinese EHRs based on limited fine-grained annotation data. METHODS: The core of the methodology was to identify linguistic patterns of phenotype descriptions in Chinese EHRs with a sequence motif discovery tool and perform deep phenotyping of Chinese EHRs by recognizing linguistic patterns in free text. Specifically, 1000 Chinese EHRs were manually annotated based on a fine-grained information model, PhenoSSU (Semantic Structured Unit of Phenotypes). The annotation data set was randomly divided into a training set (n=700, 70%) and a testing set (n=300, 30%). The process for mining linguistic patterns was divided into three steps. First, free text in the training set was encoded as single-letter sequences (P: phenotype, A: attribute). Second, a biological sequence analysis tool-MEME (Multiple Expectation Maximums for Motif Elicitation)-was used to identify motifs in the single-letter sequences. Finally, the identified motifs were reduced to a series of regular expressions representing linguistic patterns of PhenoSSU instances in Chinese EHRs. 
Based on the discovered linguistic patterns, we developed a deep-phenotyping method for Chinese EHRs, including a deep learning-based method for named entity recognition and a pattern recognition-based method for attribute prediction. RESULTS: In total, 51 sequence motifs with statistical significance were mined from 700 Chinese EHRs in the training set and were combined into six regular expressions. It was found that these six regular expressions could be learned from a mean of 134 (SD 9.7) annotated EHRs in the training set. The deep-phenotyping algorithm for Chinese EHRs could recognize PhenoSSU instances with an overall accuracy of 0.844 on the test set. For the subtask of entity recognition, the algorithm achieved an F1 score of 0.898 with the Bidirectional Encoder Representations from Transformers-bidirectional long short-term memory and conditional random field model; for the subtask of attribute prediction, the algorithm achieved a weighted accuracy of 0.940 with the linguistic pattern-based method. CONCLUSIONS: We developed a simple but effective strategy to perform deep phenotyping of Chinese EHRs with limited fine-grained annotation data. Our work will promote the secondary use of Chinese EHRs and may inspire similar work in other non-English-speaking countries.
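The encoding-and-matching idea in the methods can be sketched as follows. The abstract names only the letters P (phenotype) and A (attribute); the letter O for all other tokens, the pattern `A*PA*`, and the English tokens (standing in for Chinese text) are assumptions made for illustration, not the six regular expressions reported in the study.

```python
import re


def encode(tags):
    """Map per-token tags to a single-letter sequence (P, A, or O for other)."""
    return "".join(tag if tag in ("P", "A") else "O" for tag in tags)


# Hypothetical linguistic pattern: a phenotype with optional attributes around it.
PATTERN = re.compile(r"A*PA*")


def extract_units(tokens, tags, pattern=PATTERN):
    """Return token spans whose letter sequence matches the pattern."""
    letters = encode(tags)  # one letter per token, so spans align with tokens
    return [tokens[m.start():m.end()] for m in pattern.finditer(letters)]


tokens = ["patient", "reports", "severe", "headache", "and", "mild", "fever"]
tags = ["O", "O", "A", "P", "O", "A", "P"]
print(encode(tags))                  # OOAPOAP
print(extract_units(tokens, tags))   # [['severe', 'headache'], ['mild', 'fever']]
```

Because every token maps to exactly one letter, match offsets in the letter string can be used directly as token indices, which is what makes sequence-motif tools applicable to annotated text.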


Subjects
Electronic Health Records , Medical Informatics , Algorithms , Humans , Phenotype , Semantics
18.
Front Oncol ; 12: 821578, 2022.
Article in English | MEDLINE | ID: mdl-35530341

ABSTRACT

Background: The tumor microenvironment (TME), which involves infiltration of multiple immune cells into the tumor tissues, plays an essential role in the clinical benefit of therapy. Chemokines and their receptors influence the migration and functions of both tumor and immune cells. Molecular characteristics are also associated with the efficacy of melanoma therapy. However, the immune characteristics and their association with molecular characteristics have not been explored. Methods: We collected the 569 currently available melanoma samples that had both genomic and transcriptional data from the TCGA and SRA databases. We first identified TME subtypes based on the developed immune signatures, and then divided the samples into two immune cohorts based on the immune score. Next, we estimated the immune cell compositions of the two cohorts, and performed differentially expressed gene (DEG) and functional enrichment analyses. In addition, we investigated the interactions of chemokines and their receptors across immune cells. Finally, we explored the genomic characteristics of the different immune subtypes. Results: TME type D had the best prognosis among the four subtypes. Sixteen immune cell types were significantly more abundant in the high-immunity cohort. The 63 upregulated genes in the high-immunity cohort were enriched in immune-related biological processes, and the 384 downregulated genes were enriched in keratin, pigmentation and epithelial cell processes. The correlations of chemokines and their receptors with immune cell infiltration, such as the CCR5-CCL4/CCL5 and CXCR3-CXCL9/CXCL10/CXCL11/CXCL13 axes, showed that the recruitment of 11 immune cell types, such as CD4 T cells and CD8 T cells, was modulated by chemokines and their receptors. The proportions of the four TME subtypes in each molecular subtype were comparable. The two driver genes, CDKN2A and PRB2, had significantly different MAFs between the high-immunity and low-immunity cohorts. Conclusion: We dissected the characteristics of immune infiltration, the interactions of chemokines and their receptors across immune cells, and the correlation of molecular and immune characteristics. Our work will enable the reasonable selection of anti-melanoma treatments and accelerate the development of new therapeutic strategies for melanoma.

19.
F S Sci ; 3(2): 108-117, 2022 May.
Article in English | MEDLINE | ID: mdl-35560008

ABSTRACT

OBJECTIVE: To facilitate the identification of related genes and candidate biomarkers for disorders of sex development (DSD), we present the disorders of sex development atlas (http://dsd.geneworks.cn). Disorders of sex development are a spectrum of endocrine diseases with distinct mutations of genes or chromosomes, but several aspects of their pathogenesis remain elusive. High-throughput methods have allowed genomic and transcriptomic analyses of DSD; however, these data are scattered across various repositories owing to a lack of integrated online resources. DESIGN: A descriptive study of a specialized gene discovery platform designed for DSD. SETTING: Publicly available DSD omics datasets and self-produced datasets. PATIENT(S): None. INTERVENTION(S): None. MAIN OUTCOME MEASURE(S): The gene ranking result, with detailed information based on DSD terms in a gene-disease association knowledgebase, and results of differential gene expression and mutation analyses from omics datasets. RESULT(S): The disorders of sex development atlas maintains both a knowledgebase for ranking DSD candidate genes and a database for DSD-related omics data analysis and visualization. We included 4 dominant classes of DSD in the knowledgebase, comprising 15 subclasses and 44 specific disease names. Construction of the knowledgebase was centered upon Phenolyzer, with add-on seed gene databases customized with DSD-related genes collected from MalaCards, GeneCards, and DisGeNET. For the database, 25 experimental datasets related to DSD were integrated, including 24 public datasets from Gene Expression Omnibus and Sequence Read Archive and 1 self-generated dataset. A total of 474 samples from 240 DSD samples were collected for the database. CONCLUSION(S): This platform provides a user-friendly interface that integrates flexible and comprehensive analysis tools for differential expression and gene mutation analyses between DSD groups and controls.


Subjects
Disorders of Sex Development , Sexual Development , Disorders of Sex Development/genetics , Genetic Association Studies , Genomics , Humans , Mutation , Sexual Development/genetics
20.
IEEE J Biomed Health Inform ; 26(8): 4142-4152, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35609107

ABSTRACT

Electronic health record (EHR) resources are valuable but remain underexplored because most clinical information, especially phenotype information, is buried in the free text of EHRs. An intelligent annotation tool plays an important role in unlocking the full potential of EHRs by transforming free-text phenotype information into a computer-readable form. Deep phenotyping has shown its advantage in representing phenotype information in EHRs with high fidelity; however, most existing annotation tools are not suitable for the deep phenotyping task. Here, we developed an intelligent annotation tool named PIAT with a major focus on the deep phenotyping of Chinese EHRs. PIAT can improve the annotation efficiency of EHR-based deep phenotyping through a simple but effective interactive interface, automatic preannotation support, and a learning mechanism. Specifically, experts can proofread the automatic annotation results produced by the annotation algorithm in the web-based interactive interface, and EHRs reviewed by experts can be used to evolve the underlying annotation algorithm. In this way, the annotation process for deep phenotyping of EHRs becomes easier. In conclusion, we created a powerful intelligent system for the deep phenotyping of Chinese EHRs. It is hoped that our work will inspire further studies on constructing intelligent systems for the deep phenotyping of English and non-English EHRs.


Subjects
Algorithms , Electronic Health Records , China , Phenotype