1.
Comput Struct Biotechnol J ; 21: 86-98, 2023.
Article in English | MEDLINE | ID: mdl-36514333

ABSTRACT

Analysis of differential gene expression from RNA-seq data has become standard in several research areas. The computational analysis involves many data types and file formats, and a wide variety of computational tools that can be applied alone or combined into pipelines. This paper reviews the differential expression analysis pipeline, addressing its steps and their respective objectives, the principal methods available at each step, and their properties, thereby providing an organized overview of the field. The review mainly addresses the computational aspects of differentially expressed gene (DEG) analysis from RNA sequencing (RNA-seq) data. In addition, a timeline of the computational methods for DEG analysis is shown and discussed, and the relationships between the most important computational tools are presented as an interaction network. The challenges and gaps in DEG analysis are also discussed. This paper will serve as a tutorial for new entrants into the field and help established users update their analysis pipelines.

2.
PLoS One ; 17(1): e0261200, 2022.
Article in English | MEDLINE | ID: mdl-35041687

ABSTRACT

The growth and popularization of platforms for scientific production has been the subject of several studies, producing relevant analyses of co-authorship behavior among groups of researchers. Researchers and their scientific output can be analysed as co-authorship social networks, in which researchers are linked through common publications. In this context, co-authorship networks can be analysed to find patterns that describe or characterize them. This work presents the analysis and characterization of co-authorship networks of Brazilian academic graduate programs in computer science. Data from Brazilian researchers were collected and modeled as co-authorship networks among the graduate programs in which the researchers take part. Each network topology was analysed with complex network measurements and three proposed qualitative indices that evaluate publication quality. In addition, the co-authorship networks of the computer science graduate programs were characterized in relation to the assessment received from CAPES, which assigns a qualitative grade to graduate programs in Brazil. The results show the most relevant topological measurements for characterizing the programs and the evaluations they received at different qualitative grades, relating the main topological patterns of the co-authorship networks to the CAPES grades of the Brazilian graduate programs in computer science.


Subject(s)
Bibliometrics , Brazil
3.
Methods Mol Biol ; 2362: 147-172, 2021.
Article in English | MEDLINE | ID: mdl-34195962

ABSTRACT

This chapter provides two main contributions: (1) a description of computational tools and databases used to identify and analyze transposable elements (TEs) and circRNAs in plants; and (2) data analysis on public TE and circRNA data. Our goal is to highlight the primary information available in the literature on circular noncoding RNAs and transposable elements in plants. The exploratory analysis performed on publicly available circRNA and TE data helps discuss four sequence features. Finally, we investigate the association between circRNAs and TEs in plants, using the model organism Arabidopsis thaliana.


Subject(s)
Arabidopsis , DNA Transposable Elements , Arabidopsis/genetics , Computational Biology , DNA Transposable Elements/genetics , Plants/genetics , RNA, Circular
4.
FEMS Microbiol Lett ; 366(11), 2019 06 01.
Article in English | MEDLINE | ID: mdl-30860585

ABSTRACT

Bradyrhizobium diazoefficiens CPAC 7 and Bradyrhizobium japonicum CPAC 15 are broadly used in commercial inoculants in Brazil, supplying most of the nitrogen required by the soybean crop. These strains differ in their symbiotic properties: CPAC 7 is more efficient in fixing nitrogen, whereas CPAC 15 is more competitive. Comparative genomics revealed many transposases close to genes associated with symbiosis in the symbiotic island of these strains. Given the importance of insertion sequence (IS) elements to bacterial genomes, we focused on identifying the local impact of these elements in the genomes of these and other related Bradyrhizobium strains to further understand their phenotypic differences. Analyses were performed using bioinformatics approaches. We found IS elements disrupting genes involved in symbiosis or inserted in their regulatory regions. Further comparative analyses with 21 Bradyrhizobium genomes revealed insertional polymorphism with patterns that distinguish the B. diazoefficiens and B. japonicum lineages. Finally, 13 of these potentially impacted genes are differentially expressed under symbiotic conditions in B. diazoefficiens USDA 110. Thus, IS elements are associated with the diversity of Bradyrhizobium, possibly by providing mechanisms for natural variation of symbiotic effectiveness.


Subject(s)
Bradyrhizobium/genetics , Bradyrhizobium/metabolism , DNA Transposable Elements/genetics , Glycine max/microbiology , Computational Biology , Genomic Islands/genetics , Nitrogen Fixation/genetics , Nitrogen Fixation/physiology
5.
Nucleic Acids Res ; 46(16): e96, 2018 09 19.
Article in English | MEDLINE | ID: mdl-29873784

ABSTRACT

With the emergence of Next Generation Sequencing (NGS) technologies, a large volume of sequence data, in particular from de novo sequencing, has been rapidly produced at relatively low cost. In this context, computational tools are increasingly important to assist in the identification of relevant information to understand the functioning of organisms. This work introduces BASiNET, an alignment-free tool for classifying biological sequences based on feature extraction from complex network measurements. The method first transforms the sequences and represents them as complex networks. It then extracts topological measures and constructs a feature vector that is used to classify the sequences. The method was evaluated in the classification of coding and non-coding RNAs of 13 species and compared to the CNCI, PLEK and CPC2 methods. BASiNET outperformed all compared methods in all adopted organisms and datasets. BASiNET classified sequences in all organisms with high accuracy and low standard deviation, showing that the method is robust and not biased by the organism. The proposed methodology is implemented as open source in the R language and is freely available for download at https://cran.r-project.org/package=BASiNET.
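
The two steps named in the abstract (sequence to network, then topological measures to feature vector) can be illustrated with a minimal sketch. The abstract does not specify how BASiNET builds its networks or which measures it uses, so everything below is an assumption made for illustration: nodes are overlapping words of a fixed length, an undirected edge links words that occur next to each other, and the function names `sequence_to_network` and `feature_vector` are hypothetical. It is written in Python rather than R and is not the package's implementation.

```python
def sequence_to_network(seq, word=3):
    """Build a simple undirected word-adjacency network from a sequence.

    Assumption for illustration only: nodes are overlapping words of
    length `word`; an edge joins two distinct words that are adjacent
    in the sequence. This is not necessarily BASiNET's construction.
    """
    words = [seq[i:i + word] for i in range(len(seq) - word + 1)]
    nodes = set(words)
    edges = set()
    for a, b in zip(words, words[1:]):
        if a != b:  # ignore self-loops
            edges.add(frozenset((a, b)))
    return nodes, edges


def feature_vector(nodes, edges):
    """Return a few basic topological measures as a feature vector:
    [node count, edge count, average degree, density]."""
    n, m = len(nodes), len(edges)
    avg_degree = 2 * m / n if n else 0.0
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    return [n, m, avg_degree, density]


if __name__ == "__main__":
    nodes, edges = sequence_to_network("ACGTACGT")
    print(feature_vector(nodes, edges))
```

A vector of this kind, computed per sequence, could then be passed to any standard classifier; the choice of classifier is likewise not stated in the abstract.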


Subject(s)
Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , RNA, Long Noncoding/genetics , RNA, Messenger/genetics , Sequence Analysis, RNA/methods , Algorithms , Internet , Reproducibility of Results , Software
6.
PLoS One ; 12(12): e0190152, 2017.
Article in English | MEDLINE | ID: mdl-29267363

ABSTRACT

The correct identification of differentially expressed genes (DEGs) between specific conditions is key to understanding phenotypic variation. High-throughput transcriptome sequencing (RNA-Seq) has become the main option for these studies. Thus, the number of methods and software tools for differential expression analysis from RNA-Seq data has also increased rapidly. However, there is no consensus about the most appropriate pipeline or protocol for identifying differentially expressed genes from RNA-Seq data. This work presents an extended review on the topic that includes the evaluation of six read-mapping methods, including pseudo-alignment and quasi-mapping, and nine methods of differential expression analysis from RNA-Seq data. The adopted methods were evaluated on real RNA-Seq data, using qRT-PCR data as the reference (gold standard). As part of the results, we developed software that performs all the analyses presented in this work, which is freely available at https://github.com/costasilvati/consexpression. The results indicated that mapping methods have minimal impact on the final DEG analysis, given that the adopted data have an annotated reference genome. For the adopted experimental model, the DEG identification methods with the most consistent results were limma+voom, NOISeq and DESeq2. Additionally, the consensus among five DEG identification methods yields a highly accurate list of DEGs, indicating that the combination of different methods can produce more suitable results. The consensus option is also included in the available software.
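
The consensus idea described in this abstract can be sketched as a simple vote across methods. This is a minimal illustration, not the consexpression implementation: the function name `consensus_degs`, the `min_methods` threshold parameter, and the gene identifiers are hypothetical, and each method's output is assumed to have already been reduced to a list of gene IDs called as differentially expressed.

```python
from collections import Counter


def consensus_degs(calls, min_methods):
    """Return genes reported as DEGs by at least `min_methods` methods.

    `calls` maps a method name to an iterable of gene IDs that the
    method called differentially expressed (hypothetical input format).
    """
    votes = Counter()
    for genes in calls.values():
        votes.update(set(genes))  # one vote per method per gene
    return sorted(g for g, n in votes.items() if n >= min_methods)


if __name__ == "__main__":
    # Made-up gene IDs for illustration only.
    calls = {
        "method_a": ["g1", "g2", "g3"],
        "method_b": ["g2", "g3", "g4"],
        "method_c": ["g3", "g5"],
    }
    print(consensus_degs(calls, min_methods=2))
```

Raising `min_methods` trades sensitivity for specificity: fewer genes pass, but each is supported by more independent methods.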


Subject(s)
Sequence Analysis, RNA/methods , Software , Gene Expression , Humans
7.
J Comput Biol ; 24(11): 1125-1133, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28570142

ABSTRACT

Recently, there has been an increase in the number of whole bacterial genomes sequenced, mainly due to advances in next-generation sequencing technologies. In light of this, there is a need for new analytical alternatives that can keep pace with this advance. Given our current knowledge about the genomic plasticity of bacteria, and that plastic genomic regions can uncover important features of these microorganisms, our goal was to develop a fast methodology based on maximum entropy (ME) to guide the researcher to regions that could be prioritized during the analysis. This methodology was compared with other available methods. In addition, ME was applied to eight different bacterial genera. The methodology consists of two main steps: processing the nucleotide sequence and ME calculation. We applied ME to Xanthomonas axonopodis pv. citri 306 (XAC) and Xanthomonas campestris pv. campestris ATCC 33913 (XCC), both of which have well-documented anomalous regions. We then compared our results against those from Alien Hunter, HGT-DB, Islander, IslandPath, and SIGI-HMM. ME was shown to be superior in terms of efficiency and analysis duration. In addition, ME needs only the genome sequence in FASTA format as input. The proposed ME-based strategy can help in bacterial genome exploration. It is a simple and fast strategy for individual genomes in comparison with other available methods, and it does not rely on previous annotation or alignments. This methodology can also be a new option in the early stages of analysis of newly sequenced bacterial genomes.


Subject(s)
DNA, Bacterial/genetics , Entropy , Genome, Bacterial , Genomics/methods , Xanthomonas/genetics , Xanthomonas/classification
8.
BMC Syst Biol ; 5: 61, 2011 May 05.
Article in English | MEDLINE | ID: mdl-21545720

ABSTRACT

BACKGROUND: The inference of gene regulatory networks (GRNs) from large-scale expression profiles is one of the most challenging problems in Systems Biology today. Many techniques and models have been proposed for this task. However, it is not generally possible to recover the original topology with great accuracy, mainly because the time series data are short relative to the high complexity of the networks and the intrinsic noise of the expression measurements. In order to improve the accuracy of GRN inference methods based on entropy (mutual information), a new criterion function is proposed here. RESULTS: In this paper we introduce the use of the generalized entropy proposed by Tsallis for the inference of GRNs from time series expression profiles. The inference process is based on a feature selection approach, and the conditional entropy is applied as the criterion function. In order to assess the proposed methodology, the algorithm is applied to recover the network topology from temporal expressions generated by an artificial gene network (AGN) model as well as from the DREAM challenge. The adopted AGN is based on theoretical models of complex networks, and its gene transfer function is obtained by random drawing from the set of possible Boolean functions, thus creating its dynamics. The DREAM time series data, on the other hand, vary in network size, and their topologies are based on real networks. The dynamics are generated by continuous differential equations with noise and perturbation. By adopting both data sources, it is possible to estimate the average quality of the inference with respect to different network topologies, transfer functions and network sizes. CONCLUSIONS: The experimental results showed a remarkable improvement in accuracy: the non-Shannon entropy reduced the number of false connections in the inferred topology. The best value of the free parameter q of the Tsallis entropy was, on average, in the range 2.5 ≤ q ≤ 3.5 (hence, subextensive entropy), which opens new perspectives for GRN inference methods based on information theory and for investigation of the nonextensivity of such networks. The inference algorithm and criterion function proposed here were implemented and included in the DimReduction software, which is freely available at http://sourceforge.net/projects/dimreduction and http://code.google.com/p/dimreduction/.
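
For readers unfamiliar with the Tsallis entropy mentioned above, the standard definition is S_q = (1 - sum_i p_i^q) / (q - 1), which tends to the Shannon entropy as q approaches 1. The sketch below computes it, along with one simple conditional form in which the entropy of the target is averaged over predictor states weighted by their observed frequency. That weighting is an assumption for illustration; the abstract does not give the exact criterion function used in DimReduction, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tsallis_entropy(probs, q):
    """Tsallis entropy S_q = (1 - sum(p_i ** q)) / (q - 1).

    Falls back to the Shannon entropy (natural log) when q is 1.
    """
    if abs(q - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in probs if p > 0)
    return (1.0 - sum(p ** q for p in probs if p > 0)) / (q - 1.0)


def conditional_tsallis(predictor_states, target_states, q):
    """Entropy of the target given the predictor, averaged over the
    observed predictor states (assumed weighting: state frequency).
    Lower values mean the predictor explains the target better."""
    groups = defaultdict(Counter)
    for x, y in zip(predictor_states, target_states):
        groups[x][y] += 1
    n = len(target_states)
    total = 0.0
    for counts in groups.values():
        m = sum(counts.values())
        probs = [c / m for c in counts.values()]
        total += (m / n) * tsallis_entropy(probs, q)
    return total


if __name__ == "__main__":
    target = [0, 0, 1, 1]
    print(conditional_tsallis([0, 0, 1, 1], target, q=2.5))  # informative predictor
    print(conditional_tsallis([0, 0, 0, 0], target, q=2.5))  # uninformative predictor
```

In a feature selection setting, candidate predictor genes (or sets of genes) would be ranked by this value, with the lowest conditional entropy indicating the best candidate regulators of the target.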


Subject(s)
Computational Biology/methods , Entropy , Gene Regulatory Networks , Models, Genetic , Time Factors
9.
BMC Bioinformatics ; 9: 451, 2008 Oct 22.
Article in English | MEDLINE | ID: mdl-18945362

ABSTRACT

BACKGROUND: Feature selection is a pattern recognition approach for choosing important variables according to some criteria in order to distinguish or explain certain phenomena (i.e., for dimensionality reduction). Many genomic and proteomic applications rely on feature selection to answer questions such as selecting signature genes that are informative about some biological state, e.g., normal tissues and several types of cancer, or inferring a prediction network among elements such as genes, proteins and external stimuli. In these applications, a recurrent problem is the lack of samples to adequately estimate the joint probabilities between element states. A myriad of feature selection algorithms and criterion functions have been proposed, although it is difficult to point out the best solution for each application. RESULTS: The intent of this work is to provide an open-source multiplatform graphical environment for bioinformatics problems that supports many feature selection algorithms, criterion functions and graphic visualization tools such as scatterplots, parallel coordinates and graphs. A feature selection approach for growing genetic networks from seed genes (targets or predictors) is also implemented in the system. CONCLUSION: The proposed feature selection environment allows data analysis using several algorithms, criterion functions and graphic visualization tools. Our experiments have shown the software's effectiveness in two distinct types of biological problems. In addition, the environment can be used in different pattern recognition applications, although its main focus is bioinformatics tasks.


Subject(s)
Computational Biology/methods , Genomics/methods , Pattern Recognition, Automated/methods , Software , Algorithms , Bayes Theorem , Data Interpretation, Statistical , Internet , Markov Chains , Models, Genetic , Reproducibility of Results , User-Computer Interface