Search | BVS Bolivia

TEINet: a deep learning framework for prediction of TCR-epitope binding specificity.

Jiang, Yuepeng; Huo, Miaozhe; Cheng Li, Shuai.

Brief Bioinform ; 24(2), 2023 03 19.

Article in English | MEDLINE | ID: mdl-36907658

ABSTRACT

The adaptive immune response to foreign antigens is initiated by T-cell receptor (TCR) recognition of the antigens. Recent experimental advances have enabled the generation of a large amount of data on TCRs and their cognate antigenic targets, allowing machine learning models to predict the binding specificity of TCRs. In this work, we present TEINet, a deep learning framework that utilizes transfer learning to address this prediction problem. TEINet employs two separately pretrained encoders to transform TCR and epitope sequences into numerical vectors, which are subsequently fed into a fully connected neural network to predict their binding specificities. A major challenge for binding specificity prediction is the lack of a unified approach to sampling negative data. Here, we first assess the current negative sampling approaches comprehensively and suggest that the Unified Epitope approach is the most suitable one. Subsequently, we compare TEINet with three baseline methods and observe that TEINet achieves an average AUROC of 0.760, which outperforms baseline methods by 6.4-26%. Furthermore, we investigate the impact of the pretraining step and notice that excessive pretraining may lower its transferability to the final prediction task. Our results and analysis show that TEINet can make an accurate prediction using only the TCR sequence (CDR3$\beta$) and the epitope sequence, providing novel insights into the interactions between TCRs and epitopes.
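The AUROC figure quoted in the abstract is a ranking metric: the probability that a randomly chosen binding pair receives a higher score than a randomly chosen non-binding pair. The following is a minimal sketch of that definition only, not TEINet's evaluation code; the function name `auroc` and the toy labels and scores are assumptions made for illustration.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 (1 = binding pair, 0 = non-binding pair)
    scores: iterable of predicted scores, higher means "more likely binds"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (hypothetical): two binders, two non-binders.
    toy_labels = [1, 1, 0, 0]
    toy_scores = [0.9, 0.4, 0.5, 0.1]
    print(auroc(toy_labels, toy_scores))
```

The pairwise loop is quadratic in the number of examples, which is acceptable for a sketch; a rank-based formulation gives the same value on larger datasets.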

Subject(s)

Deep Learning; Epitopes, T-Lymphocyte; Receptors, Antigen, T-Cell; Protein Binding

Both simulation and sequencing data reveal coinfections with multiple SARS-CoV-2 variants in the COVID-19 pandemic.

Li, Yinhu; Jiang, Yiqi; Li, Zhengtu; Yu, Yonghan; Chen, Jiaxing; Jia, Wenlong; Kaow Ng, Yen; Ye, Feng; Cheng Li, Shuai; Shen, Bairong.

Comput Struct Biotechnol J ; 20: 1389-1401, 2022.

Article in English | MEDLINE | ID: mdl-35342534

ABSTRACT

SARS-CoV-2 is a single-stranded RNA betacoronavirus with a high mutation rate. The rapidly emerging SARS-CoV-2 variants could increase transmissibility and diminish vaccine protection. However, whether coinfection with multiple SARS-CoV-2 variants exists remains controversial. This study collected 12,986 and 4,113 SARS-CoV-2 genomes from the GISAID database on May 11, 2020 (GISAID20May11), and Apr 1, 2021 (GISAID21Apr1), respectively. With single-nucleotide variant (SNV) and network clique analyses, we constructed single-nucleotide polymorphism (SNP) coexistence networks and discovered maximal SNP cliques of sizes 16 and 34 in the GISAID20May11 and GISAID21Apr1 datasets, respectively. By simulating the transmission routes and SNV accumulation, we discovered a linear relationship between the size of the maximal clique and the number of coinfected variants. We deduced that the COVID-19 cases in GISAID20May11 and GISAID21Apr1 were coinfections with 3.20 and 3.42 variants on average, respectively. Additionally, we performed Nanopore sequencing on 42 COVID-19 patients and discovered recurrent heterozygous SNPs in 20 of the patients, including loci 8,782 and 28,144, which were crucial for SARS-CoV-2 lineage divergence. In conclusion, our findings indicate coinfection with multiple SARS-CoV-2 variants in COVID-19 patients and an increasing number of coinfected variants.
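The clique analysis mentioned in the abstract can be pictured with a small sketch. It assumes, purely for illustration, that two SNPs are linked whenever they appear together in at least one genome; the abstract does not state the actual edge criterion. The function names, the Bron-Kerbosch search, and the toy SNP identifiers are likewise hypothetical choices and not the authors' pipeline.

```python
from itertools import combinations


def coexistence_graph(genomes):
    """Build an adjacency map: SNPs are nodes, and an edge joins two SNPs
    that co-occur in at least one genome (assumed criterion)."""
    adj = {}
    for snps in genomes:
        for s in snps:
            adj.setdefault(s, set())
        for a, b in combinations(sorted(set(snps)), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def largest_clique(adj):
    """Return the largest clique found by Bron-Kerbosch with pivoting."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            # r is a maximal clique; keep it if it beats the current best.
            if len(r) > len(best):
                best = sorted(r)
            return
        # Pivot on the vertex with the most neighbours still in p.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return best


if __name__ == "__main__":
    # Toy genomes (hypothetical SNP identifiers, not real loci).
    toy_genomes = [{1, 2, 3}, {3, 4}]
    graph = coexistence_graph(toy_genomes)
    print(largest_clique(graph))
```

Exact clique search is exponential in the worst case, so a sketch like this suits small illustrative inputs only.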

Deep learning model reveals potential risk genes for ADHD, especially Ephrin receptor gene EPHA5.

Liu, Lu; Feng, Xikang; Li, Haimei; Cheng Li, Shuai; Qian, Qiujin; Wang, Yufeng.

Brief Bioinform ; 22(6), 2021 11 05.

Article in English | MEDLINE | ID: mdl-34109382

ABSTRACT

Attention deficit hyperactivity disorder (ADHD) is a common neurodevelopmental disorder. Although genome-wide association studies (GWAS) identify ADHD-associated risk variants and genes with significant P-values, they may neglect the combined effect of multiple variants with insignificant P-values. Here, we proposed a convolutional neural network (CNN) to classify 1033 individuals diagnosed with ADHD from 950 healthy controls according to their genomic data. The model takes the single nucleotide polymorphism (SNP) loci with P-values $\le 1\times 10^{-3}$, i.e. 764 loci, as inputs, and achieved an accuracy of 0.9018, AUC of 0.9570, sensitivity of 0.8980 and specificity of 0.9055. By incorporating saliency analysis for the deep learning network, a total of 96 candidate genes were found, of which 14 genes have been reported in previous ADHD-related studies. Furthermore, joint Gene Ontology enrichment and expression Quantitative Trait Loci analysis identified a potential risk gene for ADHD, EPHA5, with a variant of rs4860671. Overall, our CNN deep learning model exhibited high accuracy for ADHD classification and demonstrated that the deep learning model could capture the combined effect of variants with insignificant P-values, which GWAS fails to capture. To the best of our knowledge, our model is the first deep learning method for the classification of ADHD with SNP data.
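The accuracy, sensitivity and specificity reported in the abstract all derive from the four cells of a binary confusion matrix. The sketch below shows only those standard definitions; the function name `classification_metrics` and the toy counts are hypothetical and are not taken from the study's data.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity and specificity from confusion-matrix counts.

    tp: cases correctly predicted as cases
    fn: cases predicted as controls
    tn: controls correctly predicted as controls
    fp: controls predicted as cases
    """
    total = tp + fn + tn + fp
    if total == 0 or (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("need at least one case and one control")
    return {
        "accuracy": (tp + tn) / total,      # fraction of all calls that are right
        "sensitivity": tp / (tp + fn),      # true positive rate among cases
        "specificity": tn / (tn + fp),      # true negative rate among controls
    }


if __name__ == "__main__":
    # Toy counts (hypothetical): 100 cases and 100 controls.
    print(classification_metrics(tp=90, fn=10, tn=80, fp=20))
```

Reporting sensitivity and specificity next to accuracy matters when cases and controls are unequal in number, since accuracy alone can hide a weak result on the smaller class.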

Subject(s)

Attention Deficit Disorder with Hyperactivity/genetics; Biomarkers; Deep Learning; Genetic Predisposition to Disease; Receptor, EphA5/genetics; Area Under Curve; Attention Deficit Disorder with Hyperactivity/diagnosis; Computational Biology/methods; Gene Ontology; Genome-Wide Association Study; Humans; Linkage Disequilibrium; Polymorphism, Single Nucleotide; Quantitative Trait Loci; ROC Curve
