Search | Biblioteca Virtual em Saúde

Predicting novel mosquito-associated viruses from metatranscriptomic dark matter.

de Andrade, Amanda Araújo Serrão; Brustolini, Otávio; Grivet, Marco; Schrago, Carlos G; Vasconcelos, Ana Tereza Ribeiro.

NAR Genom Bioinform ; 6(3): lqae077, 2024 Sep.

Article in English | MEDLINE | ID: mdl-38962253

ABSTRACT

The exponential growth of metatranscriptomic studies dedicated to arboviral surveillance in mosquitoes has yielded an unprecedented volume of unclassified sequences referred to as the virome dark matter. Mosquito-associated viruses are classified based on their host range into Mosquito-specific viruses (MSV) or Arboviruses. While MSV replication is restricted to mosquito cells, Arboviruses infect both mosquito vectors and vertebrate hosts. We developed the MosViR pipeline designed to identify complex genomic discriminatory patterns for predicting novel MSV or Arboviruses from viral contigs as short as 500 bp. The pipeline combines the predicted probability score from multiple predictive models, ensuring a robust classification with Area Under ROC (AUC) values exceeding 0.99 for test datasets. To assess the practical utility of MosViR in actual cases, we conducted a comprehensive analysis of 24 published mosquito metatranscriptomic datasets. By mining this metatranscriptomic dark matter, we identified 605 novel mosquito-associated viruses, with eight putative novel Arboviruses exhibiting high probability scores. Our findings highlight the limitations of current homology-based identification methods and emphasize the potentially transformative impact of the MosViR pipeline in advancing the classification of mosquito-associated viruses. MosViR offers a powerful and highly accurate tool for arboviral surveillance and for elucidating the complexities of the mosquito RNA virome.
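The AUC figure quoted in the abstract can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The following is a minimal sketch of that calculation in Python; the function name `roc_auc` and the labels and scores in the usage note are hypothetical illustrations, not MosViR code or MosViR outputs.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of predicted probabilities, same length as labels.
    Hypothetical helper for illustration; not part of MosViR.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])` evaluates to 0.75, because three of the four positive/negative pairs are ranked correctly. The pairwise loop is quadratic in the number of examples, which is acceptable for a sketch but not for large test sets.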

(m, n)-mer: a simple statistical feature for sequence classification.

de Andrade, Amanda Araújo Serrão; Grivet, Marco; Brustolini, Otávio; Vasconcelos, Ana Tereza Ribeiro.

Bioinform Adv ; 3(1): vbad088, 2023.

Article in English | MEDLINE | ID: mdl-37448814

ABSTRACT

Summary: The (m, n)-mer is a simple alternative classification feature based on conditional probability distributions. In this application note, we compared k-mer and (m, n)-mer frequency features in 11 distinct datasets used for binary, multiclass and clustering classifications. Our findings show that the (m, n)-mer frequency features are related to the highest performance metrics and often statistically outperformed the k-mers. Here, the (m, n)-mer frequencies improved performance for classifying smaller sequence lengths (as short as 300 bp) and yielded higher metrics when using short values of k (ranging from 2 to 4). Therefore, we present the (m, n)-mer frequencies to the scientific community as a feature that seems to be quite effective in identifying complex discriminatory patterns and classifying polyphyletic sequence groups. Availability and implementation: The (m, n)-mer algorithm is released as an R package within the CRAN project (https://cran.r-project.org/web/packages/mnmer) and is also available at https://github.com/labinfo-lncc/mnmer. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
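The abstract describes the (m, n)-mer only as a feature based on conditional probability distributions. The sketch below illustrates one plausible reading, assuming the feature is the estimated probability of an n-mer given the m-mer that immediately precedes it, with k = m + n. The function name `mn_mer_frequencies`, the `head|tail` key format, and the handling of ambiguous bases are assumptions made for illustration; this is not the mnmer R package's implementation.

```python
from collections import Counter
from itertools import product


def mn_mer_frequencies(seq, m, n, alphabet="ACGT"):
    """Estimate P(n-mer | preceding m-mer) for every (m-mer, n-mer) pair.

    Assumed definition: count of the (m+n)-long window divided by the
    count of windows that start with the same m-mer.
    """
    seq = seq.upper()
    k = m + n
    valid = set(alphabet)
    joint = Counter()   # counts of each full (m+n)-long window
    prefix = Counter()  # counts of the m-mer that starts each window

    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if set(window) <= valid:  # skip windows with ambiguous bases
            joint[window] += 1
            prefix[window[:m]] += 1

    features = {}
    for head in map("".join, product(alphabet, repeat=m)):
        for tail in map("".join, product(alphabet, repeat=n)):
            total = prefix[head]
            # Unseen m-mers get 0.0 so the feature vector has fixed length.
            features[f"{head}|{tail}"] = joint[head + tail] / total if total else 0.0
    return features
```

With `m = 1` and `n = 1` the function returns 16 features for a DNA sequence, the same length as a 2-mer vector, but each group of values sharing an observed m-mer sums to 1. For instance, `mn_mer_frequencies("ACGTACGA", 1, 1)["G|T"]` is 0.5, since G is followed once by T and once by A.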

The generation and utilization of a cancer-oriented representation of the human transcriptome by using expressed sequence tags.

Brentani, Helena; Caballero, Otávia L; Camargo, Anamaria A; da Silva, Aline M; da Silva, Wilson Araújo; Dias Neto, Emmanuel; Grivet, Marco; Gruber, Arthur; Guimaraes, Pedro Edson Moreira; Hide, Winston; Iseli, Christian; Jongeneel, C Victor; Kelso, Janet; Nagai, Maria Aparecida; Ojopi, Elida Paula Benquique; Osorio, Elisson C; Reis, Eduardo M R; Riggins, Gregory J; Simpson, Andrew John George; de Souza, Sandro; Stevenson, Brian J; Strausberg, Robert L; Tajara, Eloiza H; Verjovski-Almeida, Sergio; Acencio, Marcio Luis; Bengtson, Mário Henrique; Bettoni, Fabiana; Bodmer, Walter F; Briones, Marcelo R S; Camargo, Luiz Paulo; Cavenee, Webster; Cerutti, Janete M; Coelho Andrade, Luis Eduardo; Costa dos Santos, Paulo César; Ramos Costa, Maria Cristina; da Silva, Israel Tojal; Estécio, Marcos Roberto H; Sa Ferreira, Karine; Furnari, Frank B; Faria, Milton; Galante, Pedro A F; Guimaraes, Gustavo S; Holanda, Adriano Jesus; Kimura, Edna Teruko; Leerkes, Maarten R; Lu, Xin; Maciel, Rui M B; Martins, Elizabeth A L; Massirer, Katlin Brauer; Melo, Analy S A.

Proc Natl Acad Sci U S A ; 100(23): 13418-23, 2003 Nov 11.

Article in English | MEDLINE | ID: mdl-14593198

ABSTRACT

Whereas genome sequencing defines the genetic potential of an organism, transcript sequencing defines the utilization of this potential and links the genome with most areas of biology. To exploit the information within the human genome in the fight against cancer, we have deposited some two million expressed sequence tags (ESTs) from human tumors and their corresponding normal tissues in the public databases. The data currently define approximately 23,500 genes, of which only approximately 1,250 are still represented only by ESTs. Examination of the EST coverage of known cancer-related (CR) genes reveals that <1% do not have corresponding ESTs, indicating that the representation of genes associated with commonly studied tumors is high. The careful recording of the origin of all ESTs we have produced has enabled detailed definition of where the genes they represent are expressed in the human body. More than 100,000 ESTs are available for seven tissues, indicating a surprising variability of gene usage that has led to the discovery of a significant number of genes with restricted expression, and that may thus be therapeutically useful. The ESTs also reveal novel nonsynonymous germline variants (although the one-pass nature of the data necessitates careful validation) and many alternatively spliced transcripts. Although widely exploited by the scientific community, vindicating our totally open source policy, the EST data generated still provide extensive information that remains to be systematically explored, and that may further facilitate progress toward both the understanding and treatment of human cancers.

Subjects

Expressed Sequence Tags; Gene Expression Regulation, Neoplastic; Neoplasms/genetics; Proteome; RNA, Messenger/metabolism; Chromosome Mapping; Databases, Genetic; Genetic Variation; Humans; Neoplasms/metabolism; Polymorphism, Single Nucleotide; Tissue Distribution
