1.
Comput Biol Chem ; 112: 108158, 2024 Jul 23.
Article in English | MEDLINE | ID: mdl-39053174

ABSTRACT

Studying the relationship between sequences and their corresponding three-dimensional structures assists structural biologists in solving the protein-folding problem. Despite several experimental and in-silico approaches, decoding the three-dimensional structure from the sequence remains unsolved. In such cases, the accuracy of the structure prediction plays an indispensable role. To address this issue, an updated web server (CSSP-2.0) has been created to improve the accuracy of our previous version of CSSP by deploying the existing algorithms. It takes probabilities as input and predicts the consensus secondary structure as a highly accurate three-state Q3 assignment (helix, strand, and coil). This prediction is achieved using six recent top-performing methods: MUFOLD-SS, RaptorX, PSSpred v4, PSIPRED, JPred v4, and Porter 5.0. CSSP-2.0 validation includes datasets involving various protein classes from the PDB, CullPDB, and AlphaFold databases. Our results indicate a significant improvement in the accuracy of the consensus Q3 prediction. Using CSSP-2.0, crystallographers can sort out the stable regular secondary structures from the entire complex structure, which would aid in inferring the functional annotation of hypothetical proteins. The web server is freely available at https://bioserver3.physics.iisc.ac.in/cgi-bin/cssp-2/.
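
The abstract does not state how CSSP-2.0 combines the six predictors, so the following is only a minimal sketch of one plausible consensus rule: average the per-residue probabilities from each predictor and take the most probable of the three states. The function name `consensus_q3` and the H/E/C column order are assumptions made for illustration.

```python
STATES = "HEC"  # helix, strand, coil (column order is an assumption)


def consensus_q3(prob_sets):
    """Average per-residue state probabilities across predictors.

    prob_sets: list with one entry per predictor; each entry is a list of
    [p_helix, p_strand, p_coil] rows, one row per residue.
    Returns the consensus three-state string.
    """
    n_predictors = len(prob_sets)
    length = len(prob_sets[0])
    labels = []
    for pos in range(length):
        # Mean probability of each state at this residue position.
        mean = [sum(p[pos][k] for p in prob_sets) / n_predictors for k in range(3)]
        labels.append(STATES[mean.index(max(mean))])
    return "".join(labels)
```

A weighted average or a majority vote over per-predictor labels would be equally reasonable alternatives; nothing here reflects the server's actual implementation.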

2.
BMC Cancer ; 24(1): 900, 2024 Jul 26.
Article in English | MEDLINE | ID: mdl-39060972

ABSTRACT

Leukemia is a cancer of the blood-forming cells in the bone marrow. It is either acute, progressing quickly, or chronic, progressing gradually, and each type is further classified as lymphocytic or myeloid. This work evaluates a deep convolutional neural network (CNN) classifier that improves identification precision by carefully examining concatenated peptide patterns. The study uses leukemia protein expression data in experiments with two evaluation techniques: independent testing and cross-validation. In addition to the CNN, a multilayer perceptron (MLP), a gated recurrent unit (GRU), and a recurrent neural network (RNN) are applied. The experimental results show that the CNN model surpasses its competitors in both independent and cross-validation testing on different features extracted from protein expressions, namely amino acid composition (AAC) with grouped AAC (GAAC), tripeptide composition (TPC) with grouped TPC (GTPC), and dipeptide composition (DPC), with accuracies reported alongside their receiver operating characteristic (ROC) curves. In independent testing, AAC and GAAC features are applied using the MLP and CNN modules, and an overall accuracy of 100% is achieved for the detection of protein patterns. In cross-validation testing, the AAC and GAAC features achieved 98.33% accuracy with the CNN module, the highest value. Furthermore, the ROC curves show a result of 0.965% for the GRU module. The findings show that the CNN model identifies leukemia from protein expressions with higher accuracy.
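
As an illustration of two of the feature types named above, the sketch below computes amino acid composition (AAC) and dipeptide composition (DPC) using their usual definitions. The function names are hypothetical, and the paper's own preprocessing and grouping schemes (GAAC, GTPC) are not reproduced here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(seq):
    """Amino acid composition: fraction of each standard residue (20 values)."""
    seq = seq.upper()
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dpc(seq):
    """Dipeptide composition: fraction of each ordered residue pair (400 values)."""
    seq = seq.upper()
    # Overlapping dipeptides: positions (0,1), (1,2), ...
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return [pairs.count(a + b) / len(pairs) for a, b in product(AMINO_ACIDS, repeat=2)]
```

Both functions assume a non-empty sequence (at least two residues for `dpc`); non-standard residues are simply not counted in any bin.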


Subjects
Leukemia , Neural Networks, Computer , Humans , Leukemia/metabolism , Leukemia/pathology , ROC Curve , Peptides/analysis
3.
BMC Genomics ; 25(1): 466, 2024 May 13.
Article in English | MEDLINE | ID: mdl-38741045

ABSTRACT

BACKGROUND: Protein-protein interactions (PPIs) hold significant importance in biology, with precise PPI prediction as a pivotal factor in comprehending cellular processes and facilitating drug design. However, experimental determination of PPIs is laborious, time-consuming, and often constrained by technical limitations. METHODS: We introduce a new node representation method based on initial information fusion, called FFANE, which combines PPI networks and protein sequence data to enhance the precision of PPI prediction. A Gaussian kernel similarity matrix is initially established by leveraging protein structural resemblances. Concurrently, protein sequence similarities are gauged using the Levenshtein distance, enabling the capture of diverse protein attributes. Subsequently, to construct an initial information matrix, these two feature matrices are merged by weighted fusion, combining structural and sequence details. To gain a deeper understanding of the fused features, a Stacked Autoencoder (SAE) is employed for encoding learning, thereby yielding more representative features. Ultimately, classification models are trained to predict PPIs by using the well-learned fusion feature. RESULTS: In 5-fold cross-validation experiments with an SVM, our proposed method achieved average accuracies of 94.28%, 97.69%, and 84.05% on the Saccharomyces cerevisiae, Homo sapiens, and Helicobacter pylori datasets, respectively. CONCLUSION: Experimental findings across various authentic datasets validate the efficacy and superiority of this fusion feature representation approach, underscoring its potential value in bioinformatics.
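
The three building blocks named in METHODS can be sketched as follows. This is a hedged illustration, not the FFANE implementation: the normalisation of the edit distance, the `gamma` bandwidth, the fusion `weight`, and the idea of feeding the Gaussian kernel with per-protein profile vectors are all assumptions.

```python
import math


def levenshtein(a, b):
    """Edit distance between two strings (standard dynamic programme)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def sequence_similarity(a, b):
    """Map edit distance to [0, 1]; 1.0 means identical sequences."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def gaussian_kernel(u, v, gamma=1.0):
    """Gaussian (RBF) similarity between two numeric profile vectors."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(u, v)))


def fuse(sim_a, sim_b, weight=0.5):
    """Weighted fusion of two similarity matrices of the same shape."""
    return [[weight * x + (1 - weight) * y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(sim_a, sim_b)]
```

In such a pipeline the fused matrix would then be passed to the autoencoder and classifier stages, which are omitted here.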


Subjects
Computational Biology , Protein Interaction Mapping , Protein Interaction Mapping/methods , Computational Biology/methods , Algorithms , Helicobacter pylori/metabolism , Helicobacter pylori/genetics , Support Vector Machine , Proteins/metabolism , Proteins/chemistry , Humans , Protein Interaction Maps , Databases, Protein
4.
BMC Med Inform Decis Mak ; 24(1): 122, 2024 May 13.
Article in English | MEDLINE | ID: mdl-38741115

ABSTRACT

MOTIVATION: Drug repurposing speeds up the development of new treatments, being less costly, risky, and time-consuming than de novo drug discovery. Numerous biological elements contribute to the development of diseases and, as a result, to the repurposing of drugs. METHODS: In this article, we analysed the potential role of protein sequences in drug repurposing scenarios. For this purpose, we embedded the protein sequences using four state-of-the-art methods and validated their capacity to encapsulate essential biological information through visualization. Then, we compared the differences in sequence distance between protein-drug target pairs in drug-repurposing and non-repurposing data. Thus, we were able to uncover patterns that define protein sequences in repurposing cases. RESULTS: We found statistically significant sequence distance differences between protein pairs in the repurposing data and the remaining protein pairs in the non-repurposing data. In this manner, we verified the potential of using numerical representations of sequences to generate repurposing hypotheses in the future.
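
The abstract does not name the statistical test used to compare the two sets of distances. As one hedged possibility, a two-sided permutation test on the difference of mean distances could look like the sketch below; the function name, the number of permutations, and the fixed seed are all illustrative choices.

```python
import random


def permutation_test(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference in group means.

    x, y: lists of distances for the two groups being compared.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(sum(x) / len(x) - sum(y) / len(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        gx, gy = pooled[:len(x)], pooled[len(x):]
        if abs(sum(gx) / len(gx) - sum(gy) / len(gy)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

A small p-value would suggest that the distances in the two groups differ by more than random relabelling would explain.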


Subjects
Drug Repositioning , Humans , Sequence Analysis, Protein
5.
Med Biol Eng Comput ; 62(8): 2449-2483, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38622438

ABSTRACT

Understanding protein structures is crucial for various areas of bioinformatics research, including drug discovery, disease diagnosis, and evolutionary studies. Protein structure classification is a critical aspect of structural biology, where supervised machine learning algorithms classify structures based on data from databases such as the Protein Data Bank (PDB). However, the challenge lies in designing numerical embeddings for protein structures without losing essential information. Although some effort has been made in the literature, to the best of our knowledge researchers have not effectively and rigorously combined structural and sequence-based features for efficient protein classification. To this end, we propose numerical embeddings that extract relevant features for protein sequences fetched from PDB structures from popular datasets such as PDB Bind and STCRDAB. The features are physicochemical properties such as aromaticity, instability index, flexibility, Grand Average of Hydropathy (GRAVY), isoelectric point, charge at pH, secondary structure fraction, molar extinction coefficient, and molecular weight. We also incorporate scaling features for the sliding windows (e.g., k-mers), which include the Kyte and Doolittle (KD) hydropathy scale, the Eisenberg hydrophobicity scale, the Hydrophilicity scale, the Flexibility of the amino acids, and the Hydropathy scale. Multiple-feature selection aims to improve the accuracy of protein classification models. The results showed that the selected features significantly improved the predictive performance of existing embeddings.
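
Two of the listed features are simple enough to sketch directly: GRAVY is the mean Kyte-Doolittle hydropathy over the whole sequence, and a sliding-window feature applies the same average to each k-mer. The function names and the default window size below are assumptions; the authors' actual feature extraction may differ.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand Average of Hydropathy: mean KD value over all residues."""
    return sum(KD[a] for a in seq) / len(seq)


def window_hydropathy(seq, k=3):
    """Mean KD hydropathy of every overlapping k-mer along the sequence."""
    return [gravy(seq[i:i + k]) for i in range(len(seq) - k + 1)]
```

Both functions raise `KeyError` on non-standard residue codes, which is a deliberate simplification here.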


Subjects
Databases, Protein , Proteins , Proteins/chemistry , Proteins/metabolism , Computational Biology/methods , Algorithms , Protein Conformation
6.
bioRxiv ; 2024 Apr 10.
Article in English | MEDLINE | ID: mdl-38645092

ABSTRACT

Objective biomarkers of food intake are a sought-after goal in nutrition research. Most biomarker development to date has focused on metabolites detected in blood, urine, skin or hair, but detection of consumed foods in stool has also been shown to be possible via DNA sequencing. An additional food macromolecule in stool that harbors sequence information is protein. However, the use of protein as an intake biomarker has only been explored to a very limited extent. Here, we evaluate and compare measurement of residual food-derived DNA and protein in stool as potential biomarkers of intake. We performed a pilot study of DNA sequencing-based metabarcoding (FoodSeq) and mass spectrometry-based metaproteomics in five individuals' stool sampled in short, longitudinal bursts accompanied by detailed diet records (n=27 total samples). Dietary data provided by stool DNA, stool protein, and written diet record independently identified a strong within-person dietary signature, identified similar food taxa, and had significantly similar global structure in two of the three pairwise comparisons between measurement techniques (DNA-to-protein and DNA-to-diet record). Metaproteomics identified proteins including myosin, ovalbumin, and beta-lactoglobulin that differentiated food tissue types like beef from dairy and chicken from egg, distinctions that were not possible by DNA alone. Overall, our results lay the groundwork for development of targeted metaproteomic assays for dietary assessment and demonstrate that diverse molecular components of food can be leveraged to study food intake using stool samples.

7.
Comput Struct Biotechnol J ; 23: 1244-1259, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38550974

ABSTRACT

Understanding protein-protein interactions (PPIs) at the molecular level may lead to innovations in medicine and biochemistry. The assumption that there are certain "hot spots" on protein surfaces that mediate their interactions with other proteins has led to a search for specific sequences involved in protein-protein contacts. In this work, we analyze sequential amino acid motifs, both at the single motif and at the motif-motif level, across a large and diverse dataset of biologically relevant protein-protein interfaces retrieved from the PDB, comparing their presence at interfaces and surfaces in a statistically rigorous manner. At the single motif level, our results indicate statistically significant over-representation of hydrophobic and in particular aromatic residues and under-representation of charged residues at protein-protein interfaces. Certain PPI-mediating motifs reported in the literature (e.g., the Tyrosine-based Motif YxxΦ and the PDZ-Binding Motif X-S/T-X-V/I) were confirmed to have a significant presence at interfaces. In addition, of the multiple PPI-mediating motifs reported in the ELM database that are present in our dataset, half were confirmed to have a statistically significant presence at interfaces whereas the others were not. At the single-residue motif-motif level, Cysteine-Cysteine contacts were found to be the most abundant, followed by interactions involving aromatic/hydrophobic residues. Top-ranking, longer motif-motif pairs show a predominance of Leucine and aromatic residues. Finally, preliminary energy calculations (using the MM/GBSA procedure) indicate a partial correlation between the probability that a motif pair is part of a protein-protein interface and the strength of the interactions between the motifs. In conclusion, this study points to specific characteristics of motifs that have a higher probability of mediating protein-protein interactions. Prominent motifs identified in this study may be used in the future as possible components in protein engineering.
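
The two literature motifs mentioned above can be located in a sequence with regular expressions, as in the sketch below. It only finds motif occurrences and says nothing about whether they lie at an interface. Two assumptions are made for illustration: Φ is taken to be one of F, I, L, M, or V, and the PDZ-binding motif is anchored at the C-terminus.

```python
import re

# YxxΦ: tyrosine, any two residues, then a bulky hydrophobic residue.
# The lookahead lets overlapping occurrences be reported.
YXXPHI = re.compile(r"(?=(Y..[FILMV]))")

# X-S/T-X-V/I, assumed here to sit at the very end of the sequence.
PDZ_CTERM = re.compile(r".[ST].[VI]$")


def find_yxxphi(seq):
    """Return (0-based start, tetrapeptide) for every YxxΦ occurrence."""
    return [(m.start(), m.group(1)) for m in YXXPHI.finditer(seq)]


def has_pdz_motif(seq):
    """True if the last four residues match X-S/T-X-V/I."""
    return PDZ_CTERM.search(seq) is not None
```

Scanning interface and surface residues separately, and comparing the counts statistically, would be the next step in an analysis like the one described.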

8.
Microbiol Resour Announc ; 13(3): e0077923, 2024 Mar 12.
Article in English | MEDLINE | ID: mdl-38385708

ABSTRACT

We created a database of all currently known mobile colistin resistance genes and variants (n = 115). It contains accession numbers of the gene and protein sequences, mutations between the protein variants and the main proteins, and additional metadata. It is accompanied by all genetic and protein sequences as two aggregated FASTA files.

9.
Comput Biol Med ; 170: 107956, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38217977

ABSTRACT

The classification and prediction of T-cell receptor (TCR) protein sequences are of significant interest in understanding the immune system and developing personalized immunotherapies. In this study, we propose a novel approach using Pseudo Amino Acid Composition (PseAAC) protein encoding for accurate TCR protein sequence classification. The PseAAC2Vec encoding method captures the physicochemical properties of amino acids and their local sequence information, enabling the representation of protein sequences as fixed-length feature vectors. By incorporating physicochemical properties such as hydrophobicity, polarity, charge, molecular weight, and solvent accessibility, PseAAC2Vec provides a comprehensive and informative characterization of TCR protein sequences. To evaluate the effectiveness of the proposed PseAAC2Vec encoding approach, we assembled a large dataset of TCR protein sequences with annotated classes. We applied the PseAAC2Vec encoding scheme to each sequence and generated feature vectors based on a specified window size. Subsequently, we employed state-of-the-art machine learning algorithms, such as support vector machines (SVM) and random forests (RF), to classify the TCR protein sequences. Experimental results on the benchmark dataset demonstrated the superior performance of the PseAAC2Vec-based approach compared to existing methods. The PseAAC2Vec encoding effectively captures the discriminative patterns in TCR protein sequences, leading to improved classification accuracy and robustness. Furthermore, the encoding scheme showed promising results across different window sizes, indicating its adaptability to varying sequence contexts.


Subjects
Computational Biology , Proteins , Computational Biology/methods , Proteins/chemistry , Amino Acid Sequence , Amino Acids/chemistry , Amino Acids/metabolism , Algorithms , Support Vector Machine , Sequence Analysis, Protein/methods , Databases, Protein
10.
Front Bioinform ; 3: 1227193, 2023.
Article in English | MEDLINE | ID: mdl-37900964

ABSTRACT

Understanding protein sequences and how they relate to the functions of proteins is extremely important. One of the most basic operations in bioinformatics is sequence alignment, and usually the first thing learned from an alignment is which positions are the most conserved; these are often critical parts of the structure, such as enzyme active-site residues. In addition, the contact pairs in a protein usually correspond closely to the correlations between residue positions in the multiple sequence alignment, and these usually change in a systematic and coordinated way: if one position changes, then the other member of the pair also changes to compensate. In the present work, these correlated pairs are taken as anchor points for a new type of sequence alignment. The main advantage of the method is that it combines the remote homolog detection of our method PROST with pairwise sequence substitutions from the rigorous method of Kleinjung et al. We show a few examples of the resulting sequence alignments and how they can lead to improvements in alignments for function, even for a disordered protein.

11.
Res Sq ; 2023 Aug 03.
Article in English | MEDLINE | ID: mdl-37577664

ABSTRACT

Predicting protein variant effects through machine learning is often challenged by the scarcity of experimentally measured effect labels. Recently, protein language models (pLMs) have emerged as zero-shot predictors that need no effect labels, by modeling the evolutionary distribution of functional protein sequences. However, biological contexts important to variant effects are implicitly modeled and effectively marginalized. By assessing the sequence awareness and the structure awareness of pLMs, we find that their improvements often correlate with better variant effect prediction but their tradeoff can present a barrier, as observed in over-finetuning to specific family sequences. We introduce a framework of structure-informed pLMs (SI-pLMs) to inject protein structural contexts purposely and controllably, by extending masked sequence denoising in conventional pLMs to cross-modality denoising. Our SI-pLMs are applicable to revising any sequence-only pLM through model architecture and training objectives. They do not require structure data as model inputs for variant effect prediction and only use structures as context provider and model regularizer during training. Numerical results over deep mutagenesis scanning benchmarks show that our SI-pLMs, despite relatively compact sizes, are robustly top performers against competing methods including other pLMs, regardless of the target protein family's evolutionary information content or the tendency toward overfitting / over-finetuning. Learned distributions in structural contexts could enhance sequence distributions in predicting variant effects. Ablation studies reveal major contributing factors, and analyses of sequence embeddings provide further insights. The data and scripts are available at https://github.com/Stephen2526/Structure-informed_PLM.git.

12.
J R Soc Interface ; 20(199): 20220707, 2023 02.
Article in English | MEDLINE | ID: mdl-36751926

ABSTRACT

Local and global inference methods have been developed to infer structural contacts from multiple sequence alignments of homologous proteins. They rely on correlations in amino acid usage at contacting sites. Because homologous proteins share a common ancestry, their sequences also feature phylogenetic correlations, which can impair contact inference. We investigate this effect by generating controlled synthetic data from a minimal model where the importance of contacts and of phylogeny can be tuned. We demonstrate that global inference methods, specifically Potts models, are more resilient to phylogenetic correlations than local methods, based on covariance or mutual information. This holds whether or not phylogenetic corrections are used, and may explain the success of global methods. We analyse the roles of selection strength and of phylogenetic relatedness. We show that sites that mutate early in the phylogeny yield false positive contacts. We consider natural data and realistic synthetic data, and our findings generalize to these cases. Our results highlight the impact of phylogeny on contact prediction from protein sequences and illustrate the interplay between the rich structure of biological data and inference.
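
As a concrete instance of a local method, the sketch below computes the mutual information between two columns of a multiple sequence alignment. It is a bare illustration under simplifying assumptions: no pseudocounts, no sequence reweighting, no phylogenetic correction, and gap characters are treated like any other symbol. Global methods such as Potts models are not covered here.

```python
import math
from collections import Counter


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    msa: list of equal-length aligned sequences (strings).
    """
    n = len(msa)
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    f_i, f_j = Counter(col_i), Counter(col_j)
    f_ij = Counter(zip(col_i, col_j))
    # Sum of p(a,b) * log2( p(a,b) / (p(a) p(b)) ) over observed pairs.
    return sum((c / n) * math.log2(c * n / (f_i[a] * f_j[b]))
               for (a, b), c in f_ij.items())
```

Columns that covary perfectly give high values, while independent columns give values near zero; shared ancestry can inflate the score, which is the effect the abstract investigates.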


Subjects
Algorithms , Proteins , Phylogeny , Proteins/chemistry , Amino Acid Sequence , Sequence Alignment
13.
Elife ; 12, 2023 02 03.
Article in English | MEDLINE | ID: mdl-36734516

ABSTRACT

Computational models starting from large ensembles of evolutionarily related protein sequences capture a representation of protein families and learn constraints associated with protein structure and function. They thus open the possibility of generating novel sequences belonging to protein families. Protein language models trained on multiple sequence alignments, such as MSA Transformer, are highly attractive candidates to this end. We propose and test an iterative method that directly employs the masked language modeling objective to generate sequences using MSA Transformer. We demonstrate that the resulting sequences score as well as natural sequences on homology, coevolution, and structure-based measures. For large protein families, our synthetic sequences have similar or better properties compared to sequences generated by Potts models, including experimentally validated ones. Moreover, for small protein families, our generation method based on MSA Transformer outperforms Potts models. Our method also reproduces the higher-order statistics and the distribution of sequences in sequence space of natural data more accurately than Potts models. MSA Transformer is thus a strong candidate for protein sequence generation and protein design.


Subjects
Proteins , Sequence Alignment , Proteins/chemistry , Amino Acid Sequence
14.
J Evol Biol ; 36(3): 499-506, 2023 03.
Article in English | MEDLINE | ID: mdl-36598184

ABSTRACT

Amino acid substitution models represent the substitution rates among amino acids during the evolution of protein sequences. The models are a prerequisite for maximum likelihood or Bayesian methods that analyse the phylogenetic relationships among species based on their protein sequences. Estimating amino acid substitution models requires large protein datasets and intensive computation. In this paper, we present the estimation of both a time-reversible model (Q.met) and a time non-reversible model (NQ.met) for multicellular animals (Metazoa). Analyses showed that the Q.met and NQ.met models were significantly better than existing models in analysing metazoan protein sequences. Moreover, the time non-reversible model NQ.met enables us to reconstruct the rooted phylogenetic tree for Metazoa. We recommend that researchers employ the Q.met and NQ.met models in analysing metazoan protein sequences.


Subjects
Evolution, Molecular , Proteins , Animals , Phylogeny , Amino Acid Substitution , Bayes Theorem , Models, Genetic
15.
Front Genet ; 13: 935717, 2022.
Article in English | MEDLINE | ID: mdl-36506312

ABSTRACT

SNARE proteins are of great importance, and their loss of function can lead to a variety of diseases. The SNARE protein is known as a membrane fusion protein, and it is crucial for mediating vesicle fusion. An accurate method is therefore needed to identify SNARE proteins. Through extensive experiments, we have developed a binary classification model based on the graph-regularized k-local hyperplane distance nearest neighbor model (GHKNN). The model uses a physicochemical property extraction method to extract protein sequence features and the SMOTE method to upsample them. The combination achieves the most accurate performance for identifying all protein sequences. Finally, we compare the GHKNN-based binary classification model with other classifiers and measure them using four different metrics: SN, SP, ACC, and MCC. In experiments, the model performs significantly better than other classifiers.
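
The four metrics named above have standard definitions in terms of the binary confusion matrix, sketched below. The function name and the convention of returning 0.0 for MCC when its denominator is zero are illustrative choices, not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy, and Matthews correlation coefficient."""
    sn = tp / (tp + fn)                    # sensitivity (recall on positives)
    sp = tn / (tn + fp)                    # specificity (recall on negatives)
    acc = (tp + tn) / (tp + fp + tn + fn)  # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SN": sn, "SP": sp, "ACC": acc, "MCC": mcc}
```

MCC is often preferred on imbalanced data, which is relevant when a method such as SMOTE is used to rebalance the training set.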

16.
J Mol Graph Model ; 117: 108283, 2022 12.
Article in English | MEDLINE | ID: mdl-35994925

ABSTRACT

Predicting molecular properties and compound-protein interactions (CPIs) are two important areas of drug design and discovery. They are also an essential way to discover lead compounds in virtual screening. Recently, in silico methods based on deep learning have demonstrated excellent performance in various challenges. It is imperative to develop efficient computational methods that accurately predict both molecular properties and CPIs in drug research using deep learning techniques. In this paper, we propose a deep learning method applicable to both molecular property prediction and CPI prediction, based on the idea that both are generally influenced by the chemical structure and sequence information of compounds and proteins. Molecular properties are inferred by integrating the molecular structure and sequence information of compounds, and CPIs are predicted by integrating protein sequence and compound structure. The method combines topological structure and sequence fingerprint information of molecules, adequately extracts raw data features, and generates highly representative features for prediction. Molecular property prediction experiments were conducted on the BACE, P53 and hERG datasets, and CPI prediction experiments were conducted on the Human, C. elegans and KIBA datasets. MG-S outperforms the suboptimal baseline model in molecular property prediction on P53, with differences in AUC, Precision and MCC of 0.030, 0.050 and 0.100, respectively, and provides consistently good results on BACE and hERG. The model also achieves impressive performance in CPI prediction: the differences in AUC, Precision and MCC on KIBA are 0.141, 0.138, 0.090 and 0.082, respectively, compared with the state-of-the-art models. The comprehensive results show that the MG-S model has higher performance, better classification ability, and faster convergence. MG-S will serve as a useful method to predict compound properties and CPIs in the early stages of drug design and discovery. Our code and datasets are available at: https://github.com/happay-ending/cpi_cpp.


Subjects
Deep Learning , Animals , Humans , Amino Acid Sequence , Caenorhabditis elegans , Tumor Suppressor Protein p53
17.
Sensors (Basel) ; 22(14), 2022 Jul 13.
Article in English | MEDLINE | ID: mdl-35890916

ABSTRACT

One of the hallmarks of diabetes is an increased modification of cellular proteins. The most prominent type of modification stems from the reaction of methylglyoxal with arginine and lysine residues, leading to structural and functional impairments of target proteins. For lysine glycation, several algorithms allow a prediction of occurrence, thus making it possible to pinpoint likely targets. However, to our knowledge, no approaches have been published for predicting the likelihood of arginine glycation. There are indications that arginine and not lysine is the most prominent target for the toxic dialdehyde. One of the reasons why there is no arginine glycation predictor is the limited availability of quantitative data. Here, we used a recently published high-quality dataset of arginine modification probabilities to employ an artificial neural network strategy. Despite the limited data availability, our results achieve an accuracy of about 75% in correctly predicting the exact value of the glycation probability of an arginine-containing peptide, without setting thresholds to decide whether a given arginine is modified or not. This contribution suggests a solution for predicting arginine glycation of short peptides.


Subjects
Arginine , Glycation End Products, Advanced , Glycation End Products, Advanced/chemistry , Lysine/chemistry , Neural Networks, Computer , Peptides/chemistry , Proteins , Pyruvaldehyde/chemistry , Pyruvaldehyde/metabolism
18.
Front Genet ; 13: 874397, 2022.
Article in English | MEDLINE | ID: mdl-35669192

ABSTRACT

Calcium-dependent protein kinases (CDPKs) are a class of serine/threonine protein kinases encoded by several gene families that play key roles in stress response and plant growth and development. In this study, the BLAST method was used to search for protein sequences of the potato calcium-dependent protein kinase gene family. The chromosome location, phylogeny, gene structures, gene duplication, cis-acting elements, protein-protein interaction, and expression profiles were analyzed. Twenty-five CDPK genes in the potato genome were identified based on RNA-seq data and were clustered into four groups (I-IV) based on their structural features and phylogenetic analysis. The results showed the composition of the promoter regions of the StCDPK genes, including light-responsive elements such as Box4, hormone-responsive elements such as ABRE, and stress-responsive elements such as MBS. Four pairs of segmental duplications were found in the StCDPK genes, and the Ka/Ks ratios were below 1, indicating purifying selection of the genes. The protein-protein interaction network revealed defense-related proteins, such as respiratory burst oxidase homologs (RBOHs), interacting with potato CDPKs. Transcript abundance was measured via RT-PCR in the two cultivars, and the relative expression of CDPK genes was analyzed after 15, 20, and 25 days of drought. Expression patterns of StCDPK3/13/21 and 23 varied between the two potato cultivars under mannitol-induced drought conditions. Correlation analysis showed that StCDPK21/22 and StCDPK3 may be the major differentially expressed genes involved in the regulation of malondialdehyde (MDA) and proline content in response to drought stress, opening a new research direction for genetic improvement of drought resistance in potato.

19.
BMC Plant Biol ; 22(1): 227, 2022 May 02.
Article in English | MEDLINE | ID: mdl-35501681

ABSTRACT

BACKGROUND: Creeping bentgrass (Agrostis stolonifera) is a perennial cool-season turfgrass of the Gramineae family with poor disease resistance. Up to now, little is known about the induced systemic resistance (ISR) mechanism, especially the relevant functional proteins, which is important to disease resistance of turfgrass. Obtaining more information on the proteins of infected creeping bentgrass helps in understanding the ISR mechanism. RESULTS: Creeping bentgrass seedlings were grown with BDO treatment, and the ISR response was induced by infection with Rhizoctonia solani. High-quality protein sequences of creeping bentgrass seedlings were obtained. Some of the protein sequences were functionally annotated according to the database alignment, while a large part of the obtained protein sequences was left non-annotated. To treat the non-annotated sequences, a prediction model based on a convolutional neural network was established with a dataset from the UniProt database in three domains, achieving good performance, in particular better control of the false positive rate. With the established model, the non-annotated protein sequences of creeping bentgrass were analyzed to annotate proteins relevant to disease-resistance response and signal transduction. CONCLUSIONS: The prediction model based on a convolutional neural network was successfully applied to select good candidate proteins with functions relevant to the ISR mechanism from the protein sequences that cannot be annotated by database alignment. Waste of sequence data can be avoided, and research time and labor will be saved in further research on creeping bentgrass proteins by molecular biology technology. It also provides a reference for other sequence analyses in turfgrass disease-resistance research.


Subjects
Agrostis , Agrostis/genetics , Amino Acid Sequence , Disease Resistance , Neural Networks, Computer , Poaceae/genetics , Seedlings
20.
Article in English | MEDLINE | ID: mdl-35627437

ABSTRACT

SARS-CoV-2 (COVID-19) has been one of the worst global health crises in the 21st century. The currently available rollout vaccines are not 100% effective for COVID-19 due to the evolving nature of the virus. There is a real need for a concerted effort to fight the virus, and research from diverse fields must contribute. Artificial intelligence-based approaches have proven to be significantly effective in every branch of our daily lives, including healthcare and medical domains. During the early days of this pandemic, artificial intelligence (AI) was utilized in the fight against this virus outbreak and it has played a major role in containing the spread of the virus. It provided innovative opportunities to speed up the development of disease interventions. Several methods, models, AI-based devices, robotics, and technologies have been proposed and utilized for diverse tasks such as surveillance, spread prediction, peak time prediction, classification, hospitalization, healthcare management, health system capacity, etc. This paper attempts to provide a quick, concise, and precise survey of the state-of-the-art AI-based techniques, technologies, and datasets used in fighting COVID-19. Several domains have been investigated, including forecasting, surveillance, dynamic time series forecasting, spread prediction, genomics, computer vision, peak time prediction, the classification of medical imaging (including CT and X-ray and how they can be processed), and biological data (genome and protein sequences). An overview of the open-access computational resources and platforms is given and their useful tools are pointed out. The paper presents the potential research areas in AI and will thus encourage researchers to contribute to fighting against the virus and aid global health by slowing down the spread of the virus. This will be a significant contribution to help minimize the high death rate across the globe.


Subjects
COVID-19 , Robotics , Artificial Intelligence , COVID-19/epidemiology , Delivery of Health Care , Humans , SARS-CoV-2