ABSTRACT
Despite Brazil's tradition of successful mass immunization programs, the country has been experiencing alarming declines in vaccination coverage, especially among children. This decline is aggravated by the growth of anti-vaccine movements and the spread of health misinformation on social media over the last decade, both of which worsened during the COVID-19 outbreak. Several reports link populism and far-right politicians to anti-vaccination support worldwide, which was also the case in Brazil during President Jair Bolsonaro's administration. This project aimed to identify the pro- and anti-vaccine narratives circulating in Portuguese on Twitter during a crucial decision-making period regarding childhood vaccination in Brazil, from December 9, 2021, until February 9, 2022. From the over one million tweets and four million retweets collected, we identified two well-defined groups, one in favor of and another against vaccination. Within the sample, we selected 1500 influencer tweets with the highest impact (>500 retweets) and conducted content analysis. Although the pro-vaccine influencers were retweeted more than the anti-vaccine ones, we observed that anti-vaccine movements were more successful in framing discussions on Twitter. The subject of COVID-19 was the target of political polarization embedded in populist, anti-science and anti-traditional media discourses promoted by anti-vaxxers. In response, the pro-vaccine influencers reacted inarticulately, focusing on criticizing the anti-vaccination actors, attitudes, and policies instead of promoting vaccines. Based on the results, we claim that a well-coordinated network of health communicators from science centers and health institutions, in partnership with properly briefed social media influencers and fact-checking sources, would more effectively pre-empt vaccine misinformation among the public.
Subjects
COVID-19 , Social Media , Vaccines , Child , Humans , Brazil/epidemiology , COVID-19/prevention & control , Vaccines/adverse effects , Vaccination
ABSTRACT
DNA sequencers output a large set of very long biological data strings that we should persist in databases rather than in basic text file systems. Many different data models and database management systems (DBMS) may deal with both storage and efficiency issues regarding genomic datasets. Specifically, there is a need for handling strings of variable size while keeping their biological meaning. Relational database management systems (RDBMS) provide several data types that could be further explored in the genomics context. They also enforce integrity and consistency and enable good abstractions for more conventional data. We propose the relational text data type to represent and manipulate biological sequences and their derivatives. We present a logical schema for representing the core biological information, which may be inferred from a given biological conceptual data schema and the corresponding function manipulations. We implement these stored functions in an actual RDBMS and evaluate them for both efficacy and efficiency. We show that it is possible to enforce basic and complex requirements for the genomic domain. We claim that the well-established relational text data type in RDBMS may appropriately handle the representation and persistence of biological sequences. We base our approach on the idea of domain-specific abstract data types that can store data with semantically defined functions while hiding those details from non-technical end-users.
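The idea of keeping sequences in a text column and exposing domain functions over it can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the RDBMS, so SQLite (through Python's standard `sqlite3` module) stands in for it, and the table `dna_sequence` and the functions `reverse_complement` and `gc_content` are hypothetical names, not the authors' schema.

```python
import sqlite3

# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases (0.0 for an empty string)."""
    if not seq:
        return 0.0
    gc = sum(1 for base in seq.upper() if base in "GC")
    return gc / len(seq)


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a TEXT sequence column.

    Assumption: SQLite replaces the unnamed RDBMS of the abstract, and
    user-defined SQL functions play the role of its stored functions.
    """
    conn = sqlite3.connect(":memory:")
    conn.create_function("reverse_complement", 1, reverse_complement)
    conn.create_function("gc_content", 1, gc_content)
    # The CHECK constraint rejects any residue outside the DNA alphabet,
    # a simple example of enforcing a domain requirement in the schema.
    conn.execute(
        """
        CREATE TABLE dna_sequence (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            residues TEXT NOT NULL CHECK (residues NOT GLOB '*[^ACGT]*')
        )
        """
    )
    conn.executemany(
        "INSERT INTO dna_sequence (name, residues) VALUES (?, ?)",
        [("seq1", "ATGC"), ("seq2", "GGGCCA")],
    )
    return conn


if __name__ == "__main__":
    db = build_demo_db()
    query = (
        "SELECT name, reverse_complement(residues), gc_content(residues) "
        "FROM dna_sequence ORDER BY id"
    )
    for row in db.execute(query):
        print(row)
```

End users query the sequences with plain SQL while the function bodies stay hidden behind the function names, which is the kind of abstraction the abstract describes.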
ABSTRACT
BACKGROUND: In this interconnected world, both the volume of data and behavior in society change at a swift pace. Consequently, machine learning algorithms lose accuracy because they have not been exposed to these new patterns. This change in the data pattern is known as concept drift. There are many approaches for dealing with these drifts. Usually, these methods are costly to implement because they require (i) knowledge of drift detection algorithms, (ii) software engineering strategies, and (iii) continuous maintenance concerning new drifts. RESULTS: This article proposes Driftage, a new framework that uses multi-agent systems to considerably simplify the implementation of concept drift detectors and to divide concept drift detection responsibilities between agents, enhancing the explainability of each part of drift detection. As a case study, we illustrate our strategy using an electromyography muscle activity monitor. We show a reduction in the number of false-positive drifts detected, improved detection interpretability, and interactivity between concept drift detectors and other knowledge bases. CONCLUSION: We conclude that Driftage offers a new paradigm for implementing concept drift algorithms with a multi-agent architecture, one that helps to split drift detection responsibility, improves algorithm interpretability, and enables more dynamic algorithm adaptation.
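To make the notion of concept drift concrete, the sketch below flags a drift when the mean of a recent window moves too far from a reference window. It is a deliberately simple, generic detector written for illustration; it is not Driftage's API or its multi-agent architecture, and the class name, function name, window size, and threshold are all assumptions.

```python
from collections import deque
from statistics import mean, pstdev


class WindowDriftDetector:
    """Flag drift when the recent mean leaves the reference distribution.

    Hypothetical illustration only: the first `window` values form the
    reference; afterwards a sliding window of the same size is compared
    against it, measured in reference standard deviations.
    """

    def __init__(self, window=30, threshold=3.0):
        self.window = window
        self.threshold = threshold
        self.reference = []
        self.recent = deque(maxlen=window)

    def update(self, value):
        """Consume one observation and return True if drift is detected."""
        if len(self.reference) < self.window:
            self.reference.append(value)
            return False
        self.recent.append(value)
        if len(self.recent) < self.window:
            return False
        ref_mean = mean(self.reference)
        ref_std = pstdev(self.reference) or 1e-9  # avoid division by zero
        shift = abs(mean(self.recent) - ref_mean) / ref_std
        if shift > self.threshold:
            # Adopt the recent window as the new reference after a drift.
            self.reference = list(self.recent)
            self.recent.clear()
            return True
        return False


def detect_drifts(stream, window=30, threshold=3.0):
    """Return the indices of the stream at which drift was flagged."""
    detector = WindowDriftDetector(window, threshold)
    return [i for i, value in enumerate(stream) if detector.update(value)]


if __name__ == "__main__":
    stable = [1.0 + 0.1 * (-1) ** i for i in range(60)]
    shifted = [5.0] * 30
    print(detect_drifts(stable + shifted))
```

In a multi-agent setting such as the one the abstract describes, a detector like this would be only one responsibility among several, with other agents handling data collection and the interpretation of flagged drifts.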
Subjects
Algorithms , Machine Learning , Software
ABSTRACT
Scientific workflows can be understood as arrangements of managed activities executed by different processing entities. Applying workflows is a common Bioinformatics approach to solving problems in Molecular Biology, notably those related to sequence analyses. Due to the nature of the raw data and the in silico environment of Molecular Biology experiments, apart from the research subject, 2 practical and closely related problems have been studied: reproducibility and computational environment. When aiming to enhance the reproducibility of Bioinformatics experiments, various aspects should be considered. The reproducibility requirements comprise the data provenance, which enables the acquisition of knowledge about the trajectory of data over a defined workflow, the settings of the programs, and the entire computational environment. Cloud computing is a booming alternative that can provide this computational environment, hiding technical details and delivering a more affordable, accessible, and configurable on-demand environment for researchers. Considering this specific scenario, we proposed a solution to improve the reproducibility of Bioinformatics workflows in a cloud computing environment using both Infrastructure as a Service (IaaS) and Not only SQL (NoSQL) database systems. To meet this goal, we built 3 typical Bioinformatics workflows and ran them on 1 private and 2 public clouds, using different types of NoSQL database systems to persist the provenance data according to the Provenance Data Model (PROV-DM). We present here the results and a guide for deploying a cloud environment for Bioinformatics, exploring the characteristics of various NoSQL database systems for persisting provenance data.
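As a rough illustration of what a provenance record for one workflow step might look like before it is persisted in a document-oriented NoSQL store, the sketch below builds a dictionary loosely modelled on the PROV-JSON layout (entities, activities, and the `used` and `wasGeneratedBy` relations of PROV-DM). The function name, the `ex:` attributes, the identifiers, and the example program settings are hypothetical, and the document is simplified rather than validated against the PROV specifications.

```python
import json
from datetime import datetime, timezone


def provenance_record(activity_id, program, settings, input_ids,
                      output_ids, started, ended):
    """Build a simplified PROV-style document for one workflow activity.

    Assumption: attribute names under the "ex:" prefix are invented for
    this sketch; only the top-level keys follow the PROV-JSON layout.
    """
    doc = {
        "prefix": {"ex": "http://example.org/"},
        "entity": {},
        "activity": {
            activity_id: {
                "prov:startTime": started.isoformat(),
                "prov:endTime": ended.isoformat(),
                "ex:program": program,
                # Settings are stored as a JSON string to keep values flat.
                "ex:settings": json.dumps(settings, sort_keys=True),
            }
        },
        "used": {},
        "wasGeneratedBy": {},
    }
    for n, entity_id in enumerate(input_ids):
        doc["entity"][entity_id] = {}
        doc["used"][f"_:u{n}"] = {
            "prov:activity": activity_id,
            "prov:entity": entity_id,
        }
    for n, entity_id in enumerate(output_ids):
        doc["entity"][entity_id] = {}
        doc["wasGeneratedBy"][f"_:g{n}"] = {
            "prov:entity": entity_id,
            "prov:activity": activity_id,
        }
    return doc


if __name__ == "__main__":
    record = provenance_record(
        activity_id="ex:alignment-step",
        program="aligner",                      # hypothetical program name
        settings={"threads": 4},                # hypothetical settings
        input_ids=["ex:reads.fastq"],
        output_ids=["ex:aligned.bam"],
        started=datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc),
        ended=datetime(2020, 1, 1, 12, 30, tzinfo=timezone.utc),
    )
    print(json.dumps(record, indent=2))
```

Because the record is plain JSON, it could be written unchanged to a document store, whereas a column-family or graph database would need its own mapping of the same entities and relations.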
ABSTRACT
Non-coding RNAs (ncRNAs) constitute an important set of transcripts produced in the cells of organisms. Among them, there is a large number of a particular class of long ncRNAs that are difficult to predict, the so-called long intergenic ncRNAs (lincRNAs), which might play essential roles in gene regulation and other cellular processes. Despite the importance of these lincRNAs, there is still a lack of biological knowledge and, currently, the few computational methods available are so specific that they cannot be successfully applied to species other than those for which they were originally designed. Prediction of lncRNAs has been performed with machine learning techniques. In particular, supervised learning methods for lincRNA prediction have been explored in recent literature. As far as we know, there are no methods or workflows specially designed to predict lincRNAs in plants. In this context, this work proposes a workflow to predict lincRNAs in plants that combines known bioinformatics tools with machine learning techniques, here a support vector machine (SVM). We discuss two case studies that allowed us to identify novel lincRNAs in sugarcane (Saccharum spp.) and in maize (Zea mays). From the results, we could also identify differentially expressed lincRNAs in sugarcane and maize plants exposed to pathogenic and beneficial microorganisms.
ABSTRACT
Studies have highlighted the importance of non-coding RNA regulation in plant-microbe interaction. However, the roles of sugarcane microRNAs (miRNAs) in the regulation of disease responses have not been investigated. First, we screened the sRNA transcriptome of sugarcane infected with Acidovorax avenae. Conserved and novel miRNAs were identified. Additionally, small interfering RNAs (siRNAs) were aligned to differentially expressed sequences from the sugarcane transcriptome. Interestingly, many siRNAs aligned to a transcript encoding a copper-transporter gene whose expression was induced in the presence of A. avenae, while the siRNAs themselves were repressed in the presence of A. avenae. Moreover, a long intergenic non-coding RNA was identified as a potential target or decoy of miR408. To extend the bioinformatics analysis, we carried out independent inoculations, and the expression patterns of six miRNAs were validated by quantitative reverse transcription-PCR (qRT-PCR). Among these miRNAs, miR408, a copper-microRNA, was downregulated. The cleavage of a putative miR408 target, a laccase, was confirmed by a modified 5'RACE (rapid amplification of cDNA ends) assay. MiR408 was also downregulated in samples infected with other pathogens, but it was upregulated in the presence of a beneficial diazotrophic bacterium. Our results suggest that regulation by miR408 is important for sugarcane in sensing whether microorganisms are pathogenic or beneficial, triggering specific miRNA-mediated regulatory mechanisms accordingly.
ABSTRACT
Sugarcane is an important tropical crop mainly cultivated to produce ethanol and sugar. Crop productivity is negatively affected by Acidovorax avenae subsp. avenae (Aaa), which causes red stripe disease. Little is known about the molecular mechanisms triggered in response to the infection. We investigated the molecular mechanisms activated in sugarcane using an RNA-seq approach. We produced a de novo transcriptome assembly (TR7) from sugarcane RNA-seq libraries subjected to drought and to infection with Aaa. Together, these libraries comprise 247 million raw reads and resulted in 168,767 reference transcripts. Mapping reads from the infected libraries to TR7 revealed 798 differentially expressed transcripts, of which 723 were annotated, corresponding to 467 genes. GO and KEGG enrichment analysis showed that several metabolic pathways, such as those coding for stress-response proteins, carbohydrate metabolism, transcription and translation processes, amino acid metabolism and biosynthesis of secondary metabolites, were significantly regulated in sugarcane. Differential analysis revealed that genes in the biosynthetic pathways of ET and JA, PRRs, oxidative burst genes, NBS-LRR genes, cell wall fortification genes, SAR-induced genes and pathogenesis-related (PR) genes were upregulated. In addition, 20 genes were validated by RT-qPCR. Together, these data contribute to a better understanding of the molecular mechanisms triggered by Aaa in sugarcane and open the opportunity for the development of molecular markers associated with disease tolerance in breeding programs.
Subjects
Comamonadaceae/growth & development , Gene Expression Profiling/methods , Gene Expression Regulation, Plant , Saccharum/genetics , Transcriptome/genetics , Comamonadaceae/physiology , Gene Ontology , Host-Pathogen Interactions , Molecular Sequence Annotation , Plant Diseases/genetics , Plant Diseases/microbiology , Reverse Transcriptase Polymerase Chain Reaction , Saccharum/microbiology , Sequence Analysis, RNA/methods
ABSTRACT
Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them is the management of the massive amounts of data generated by automatic sequencers. We need to deal with the persistence of genomic data, particularly with storing and analyzing these large-scale processed data. Finding an alternative to the frequently considered relational database model has become a compelling task. Other data models may be more effective when dealing with very large amounts of nonconventional data, especially for write and retrieval operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistence and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with those of a classical relational database system and of another NoSQL database approach, MongoDB.
ABSTRACT
MOTIVATION: Many analyses in modern biological research are based on comparisons between biological sequences, resulting in functional, evolutionary and structural inferences. When large numbers of sequences are compared, heuristics are often used, resulting in a certain lack of accuracy. In order to improve and validate the results of such comparisons, we have performed radical all-against-all comparisons of 4 million protein sequences belonging to the RefSeq database, using an implementation of the Smith-Waterman algorithm. This extremely intensive computational approach was made possible with the help of World Community Grid, through the Genome Comparison Project. The resulting database, ProteinWorldDB, which contains coordinates of pairwise protein alignments and their respective scores, is now available. Users can download, compare and analyze the results, filtered by genomes, protein functions or clusters. ProteinWorldDB is integrated with annotations derived from Swiss-Prot, Pfam, KEGG, the NCBI Taxonomy database and Gene Ontology. The database is a unique and valuable asset, representing a major effort to create a reliable and consistent dataset of cross-comparisons of the whole protein content encoded in hundreds of completely sequenced genomes using a rigorous dynamic programming approach. AVAILABILITY: The database can be accessed through http://proteinworlddb.org
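For readers unfamiliar with the Smith-Waterman algorithm, a minimal sketch of its dynamic programming recurrence follows. It returns the best local alignment score together with the coordinates of the aligned region in each sequence, the two kinds of information the abstract says the database stores. The simple match/mismatch scores and the linear gap penalty are assumptions made for brevity; the abstract does not state the project's scoring scheme, and protein comparisons typically use a substitution matrix and affine gap penalties instead.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, (start_a, end_a), (start_b, end_b)) for the best
    local alignment of strings a and b; coordinates are half-open."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; every cell is floored at zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,               # gap in b
                score[i][j - 1] + gap,               # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    end_a, end_b = i, j
    while i > 0 and j > 0 and score[i][j] > 0:
        substitution = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + substitution:
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i, end_a), (j, end_b)


if __name__ == "__main__":
    print(smith_waterman("AAATAYIGGG", "CCTAYICC"))
```

The matrix has one cell per pair of residues, so each comparison costs time proportional to the product of the two sequence lengths, which is why an all-against-all run over millions of proteins needed grid-scale computing.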
Subjects
Databases, Protein , Genomics/methods , Proteins/chemistry , Sequence Alignment/methods , Software , Algorithms , Genome , Phylogeny , Proteins/genetics
ABSTRACT
Many bioinformatics tools deal with input/output (I/O) issues by relying on the file systems of the most common operating systems, such as Linux or MS Windows. However, as data volumes increase, there is a need for more efficient disk access, ad hoc memory management and specific page-replacement policies. We propose a device driver that can be used by multiple applications. It keeps the application code unchanged, providing a non-intrusive and flexible strategy for I/O calls that may be adopted in a straightforward manner. With our approach, database developers can define their own I/O management strategies. We used our device driver to manage Basic Local Alignment Search Tool (BLAST) I/O calls. Based on preliminary experimental results with National Center for Biotechnology Information (NCBI) BLAST, this approach can provide data management features similar to those of database management systems, which may be used for BLAST and many other computational biology applications.