1.
Brief Bioinform ; 24(4), 2023 07 20.
Article in English | MEDLINE | ID: mdl-37328692

ABSTRACT

Protein complexes are key functional units in cellular processes. High-throughput techniques, such as co-fractionation coupled with mass spectrometry (CF-MS), have advanced protein complex studies by enabling global interactome inference. However, dealing with complex fractionation characteristics to define true interactions is not a simple task, since CF-MS is prone to false positives due to the co-elution of non-interacting proteins by chance. Several computational methods have been designed to analyze CF-MS data and construct probabilistic protein-protein interaction (PPI) networks. Current methods usually first infer PPIs based on handcrafted CF-MS features, and then use clustering algorithms to form potential protein complexes. While powerful, these methods have two weaknesses: handcrafted features based on domain knowledge might introduce bias, and the severely imbalanced PPI data make the methods prone to overfitting. To address these issues, we present a balanced end-to-end learning architecture, Software for Prediction of Interactome with Feature-extraction Free Elution Data (SPIFFED), which integrates feature representation from raw CF-MS data and interactome prediction in a convolutional neural network. SPIFFED outperforms the state-of-the-art methods in predicting PPIs under the conventional imbalanced training. When trained with balanced data, SPIFFED had greatly improved sensitivity for true PPIs. Moreover, the ensemble SPIFFED model provides different voting schemes to integrate predicted PPIs from multiple CF-MS datasets. Using the clustering software ClusterONE, SPIFFED allows users to infer high-confidence protein complexes depending on the CF-MS experimental designs. The source code of SPIFFED is freely available at: https://github.com/bio-it-station/SPIFFED.
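
The abstract mentions voting schemes for combining PPI predictions from several CF-MS datasets but does not describe them. The sketch below is only one plausible reading, an "at least k votes" rule; the function name `vote`, the score threshold of 0.5, and the input layout (one dictionary of pair scores per dataset) are assumptions made for illustration and are not taken from the SPIFFED code base.

```python
from collections import Counter

def vote(predictions, threshold=0.5, min_votes=1):
    """Combine per-dataset PPI scores with a hypothetical 'at least k votes' rule.

    predictions: list of dicts mapping (protein_a, protein_b) -> score in [0, 1].
    A pair gets one vote from every dataset that scores it at or above threshold.
    """
    counts = Counter()
    for scores in predictions:
        # Sort each pair so that (A, B) and (B, A) are treated as the same interaction,
        # and use a set so one dataset can never vote twice for the same pair.
        positives = {tuple(sorted(pair)) for pair, s in scores.items() if s >= threshold}
        counts.update(positives)
    return {pair for pair, n in counts.items() if n >= min_votes}

datasets = [
    {("A", "B"): 0.9, ("A", "C"): 0.2},
    {("B", "A"): 0.7, ("A", "C"): 0.8},
    {("A", "B"): 0.1},
]
union_like = vote(datasets, min_votes=1)     # any dataset supports the pair
majority = vote(datasets, min_votes=2)       # at least two datasets agree
unanimous = vote(datasets, min_votes=3)      # all three datasets agree
```

Raising `min_votes` trades sensitivity for confidence, which is the kind of choice the abstract ties to the experimental design.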


Subject(s)
Protein Interaction Mapping , Proteins , Protein Interaction Mapping/methods , Proteins/chemistry , Algorithms , Protein Interaction Maps , Software
2.
Brief Bioinform ; 24(3), 2023 05 19.
Article in English | MEDLINE | ID: mdl-37088981

ABSTRACT

BACKGROUND: The ubiquitous presence of short extrachromosomal circular DNAs (eccDNAs) in eukaryotic cells has perplexed generations of biologists. Their widespread origins in the genome, lacking apparent specificity, led some studies to conclude that their formation is random or near-random. Despite this, the search for specific formation of short eccDNA continues with a recent surge of interest in biomarker development. RESULTS: To shed new light on the conflicting views on short eccDNAs' randomness, here we present DeepCircle, a bioinformatics framework incorporating convolution- and attention-based neural networks to assess their predictability. Short human eccDNAs from different datasets indeed have low similarity in genomic locations, but DeepCircle successfully learned shared DNA sequence features to make accurate cross-dataset predictions (accuracy: convolution-based models: 79.65 ± 4.7%, attention-based models: 83.31 ± 4.18%). CONCLUSIONS: The excellent performance of our models shows that the intrinsic predictability of eccDNAs is encoded in the sequences across tissue origins. Our work demonstrates how the perceived lack of specificity in genomics data can be re-assessed by deep learning models to uncover unexpected similarity.


Subject(s)
Circular DNA , DNA , Humans , Genome , Eukaryotic Cells , Biomarkers
3.
Nucleic Acids Res ; 50(W1): W616-W622, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35536289

ABSTRACT

With the proliferation of genomic sequence data for biomedical research, the exploration of human genetic information by domain experts requires a comprehensive interrogation of large numbers of scientific publications in PubMed. However, a query in PubMed essentially provides search results sorted only by the date of publication. A search engine for retrieving and interpreting complex relations between biomedical concepts in scientific publications remains lacking. Here, we present pubmedKB, a web server designed to extract and visualize semantic relationships between four biomedical entity types: variants, genes, diseases, and chemicals. pubmedKB uses state-of-the-art natural language processing techniques to extract semantic relations from the large number of PubMed abstracts. Currently, over 2 million semantic relations between biomedical entity pairs are extracted from over 33 million PubMed abstracts in pubmedKB. pubmedKB has a user-friendly interface with an interactive semantic graph, enabling the user to easily query entities and explore entity relations. Supporting sentences with highlighted snippets allow users to navigate the publications easily. Combined with a new explorative approach to literature mining and an interactive interface for researchers, pubmedKB thus enables rapid, intelligent searching of the large biomedical literature to provide useful knowledge and insights. pubmedKB is available at https://www.pubmedkb.cc/.


Subject(s)
Computers , Search Engine , Humans , PubMed , Semantics , Data Mining/methods
4.
Nucleic Acids Res ; 49(13): 7318-7329, 2021 07 21.
Article in English | MEDLINE | ID: mdl-34197604

ABSTRACT

Integrating omics data with quantification of biological traits provides unparalleled opportunities for discovery of genetic regulators by in silico inference. However, current approaches to analyze genetic-perturbation screens are limited by their reliance on annotation libraries for prioritization of hits and subsequent targeted experimentation. Here, we present iTARGEX (identification of Trait-Associated Regulatory Genes via mixture regression using EXpectation maximization), an association framework with no requirement of a priori knowledge of gene function. After creating this tool, we used it to test associations between gene expression profiles and two biological traits in single-gene deletion budding yeast mutants, including transcription homeostasis during S phase and global protein turnover. For each trait, we discovered novel regulators without prior functional annotations. The functional effects of the novel candidates were then validated experimentally, providing solid evidence for their roles in the respective traits. Hence, we conclude that iTARGEX can reliably identify novel factors involved in given biological traits. As such, it is capable of converting genome-wide observations into causal gene function predictions. Further application of iTARGEX in other contexts is expected to facilitate the discovery of new regulators and provide observations for novel mechanistic hypotheses regarding different biological traits and phenotypes.


Subject(s)
Gene Expression Profiling , Regulator Genes , Proteolysis , S Phase/genetics , Software , Genetic Transcription , Carrier Proteins/genetics , Computational Biology/methods , DNA Replication , Gene Deletion , Homeostasis , Mutation , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/genetics
5.
BMC Genomics ; 22(Suppl 5): 919, 2022 May 09.
Article in English | MEDLINE | ID: mdl-35534820

ABSTRACT

BACKGROUND: Alternative splicing (AS) increases the diversity of the transcriptome and can fine-tune the function of genes, so understanding the regulation of AS is vital. AS can be regulated by many different cis-regulatory elements, such as enhancers. Enhancers have been experimentally shown to regulate AS in some genes. However, there is a lack of genome-wide studies on the association between enhancers and AS (enhancer-AS association). To bridge the gap, here we developed an integrative analysis on a genome-wide scale to identify enhancer-AS associations in human and mouse. RESULT: We collected enhancer datasets covering 28 human and 24 mouse tissues and cell lines, and RNA-seq datasets paired with the selected tissues. Combining data integration and statistical analysis, we identified 3,242 human and 7,716 mouse genes that have significant enhancer-AS associations in at least one tissue. On average, for each gene, about 6% of enhancers in human (5% in mouse) are associated with AS change, and for each enhancer, approximately one gene is identified to have an enhancer-AS association in both human and mouse. We found that 52% of the significant enhancer-AS associations in human (34% in mouse) involve the co-existence of homologous genes and homologous enhancers. We further constructed a user-friendly platform, named Visualization of Enhancer-associated Alternative Splicing (VEnAS, http://venas.iis.sinica.edu.tw/), to provide the genomic architecture, an intuitive association plot, and a contingency table for the significant enhancer-AS associations. CONCLUSION: This study provides the first genome-wide identification of enhancer-AS associations in human and mouse. The results suggest that a notable portion of enhancers play roles in AS regulation. The analyzed results and the proposed platform VEnAS should provide a further understanding of how enhancers regulate alternative splicing.


Subject(s)
Alternative Splicing , Genetic Enhancer Elements , Animals , Genome-Wide Association Study , Genomics/methods , Humans , Mice , RNA-Seq
6.
BMC Genomics ; 22(Suppl 5): 917, 2022 Apr 13.
Article in English | MEDLINE | ID: mdl-35418014

ABSTRACT

BACKGROUND: Many long non-coding RNAs (lncRNAs) have been identified in higher eukaryotic species. lncRNAs have been reported to play important roles in diverse biological processes, including developmental regulation and behavioral plasticity. However, there are no reports of systematic characterization of long non-coding RNAs in the fire ant Solenopsis invicta. RESULTS: In this study, we performed a genome-wide analysis of lncRNAs in the brains of S. invicta from RNA-seq. In total, 1,393 novel lncRNA transcripts were identified in the fire ant. In contrast to the annotated lncRNA transcripts, which have at least two exons, novel lncRNAs are monoexonic transcripts with a shorter length. In addition, the transcriptomes of virgin alate and dealate mated queens were analyzed and compared. The results showed 295 differentially expressed mRNA genes (DEGs) and 65 differentially expressed lncRNA genes (DELs) between virgin and mated queens, of which 17 lncRNAs were highly expressed in the virgin alates and 47 lncRNAs were highly expressed in the mated dealates. By identifying the DEL:DEG pairs with a high association in their expression (Spearman's |rho| > 0.8 and p-value < 0.01), we found that many DELs were co-regulated with DEGs after mating. Furthermore, several remarkable lncRNAs (MSTRG.6523, MSTRG.588, and nc909) that were found to associate with particular coding genes may play important roles in the regulation of brain gene expression during the reproductive transition in fire ants. CONCLUSION: This study provides the first genome-wide identification of S. invicta lncRNAs in the brains in different reproductive states. It will contribute to a fuller understanding of the transcriptional regulation underpinning reproductive changes.
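
The DEL:DEG pairing criterion quoted above (Spearman's |rho| > 0.8) can be illustrated with a small standard-library sketch. The function names and the input layout are hypothetical, and the p-value < 0.01 filter is deliberately left out because it needs a significance test; a statistics library such as SciPy (`scipy.stats.spearmanr`) would normally supply both values. The code assumes neither expression vector is constant.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def strongly_associated(pairs, rho_cutoff=0.8):
    """Keep the names of pairs whose |rho| exceeds the cutoff (p-value not checked)."""
    return [name for name, (x, y) in pairs.items()
            if abs(spearman_rho(x, y)) > rho_cutoff]

# Hypothetical expression values across five samples for two lncRNA:mRNA pairs.
example = {
    "pair1": ([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]),   # perfectly anti-correlated
    "pair2": ([1, 2, 3, 4, 5], [3, 1, 2, 5, 4]),   # weaker association
}
kept = strongly_associated(example)
```

Taking the absolute value keeps both positively and negatively correlated pairs, matching the |rho| notation in the abstract.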


Subject(s)
Ants , Long Noncoding RNA , Animals , Ants/genetics , Brain/metabolism , Female , Gene Expression Profiling , Long Noncoding RNA/genetics , Long Noncoding RNA/metabolism , Messenger RNA/genetics , Messenger RNA/metabolism , Transcriptome
7.
Nucleic Acids Res ; 47(10): 5181-5192, 2019 06 04.
Article in English | MEDLINE | ID: mdl-30918956

ABSTRACT

Eukaryotic cells pack their genomic DNA into euchromatin and heterochromatin. Boundaries between these domains have been shown to be set by boundary elements. In Tetrahymena, heterochromatin domains are targeted for deletion from the somatic nuclei through a sophisticated programmed DNA rearrangement mechanism, resulting in the elimination of 34% of the germline genome in ∼10,000 dispersed segments. Here we showed that most of these deletions occur consistently with very limited variations in their boundaries among inbred lines. We identified several potential flanking regulatory sequences, each associated with a subset of deletions, using a genome-wide motif finding approach. These flanking sequences are inverted repeats with the copies located at nearly identical distances from the opposite ends of the deleted regions, suggesting potential roles in boundary determination. By removing and testing two such inverted repeats in vivo, we found that the ability to maintain the boundaries of the associated deletions was lost. Furthermore, we analyzed the deletion boundaries in mutants of a known boundary-determining protein, Lia3p, and found that the subset of deletions affected by LIA3 knockout shared common features of flanking regulatory sequences. This study suggests a common mechanism for setting deletion boundaries by flanking inverted repeats in Tetrahymena thermophila.


Subject(s)
Protozoan DNA/genetics , Gene Deletion , Heterochromatin/chemistry , Protozoan Proteins/genetics , Tetrahymena thermophila/genetics , Amino Acid Motifs , Cell Nucleus/metabolism , Protozoan DNA/metabolism , Euchromatin/chemistry , Gene Expression Regulation , Gene Rearrangement , Protozoan Genome , Protein Domains
8.
Bioinformatics ; 35(8): 1414-1415, 2019 04 15.
Article in English | MEDLINE | ID: mdl-30202999

ABSTRACT

SUMMARY: In higher eukaryotes, the generation of transcript isoforms from a single gene through alternative splicing (AS) and alternative transcription (AT) mechanisms increases functional and regulatory diversity. Annotating these alternative transcript events is essential for genomic studies. However, no existing tool generates comprehensive annotations of all these alternative transcript events, including both AS and AT events. In the present study, we developed CATANA, which encodes exon usage patterns based on the flattened gene model, to identify ten types of AS and AT events. We demonstrate the power and versatility of CATANA by showing greater depth of annotation of alternative transcript events according to either genome annotation or RNA-seq data. AVAILABILITY AND IMPLEMENTATION: CATANA is available at https://github.com/shiauck/CATANA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Alternative Splicing , Software , Genetic Transcription , Exons , Genome , RNA Sequence Analysis
9.
BMC Genomics ; 18(1): 786, 2017 Oct 16.
Article in English | MEDLINE | ID: mdl-29037146

ABSTRACT

BACKGROUND: The regulatory roles of long intergenic noncoding RNAs (lincRNAs) in humans have been revealed through the use of advanced sequencing technology. Recently, three possible scenarios of lincRNA origins have been proposed: de novo origination from intergenic regions, duplication from other long noncoding RNAs, and pseudogenization from protein-coding genes. The first two scenarios are largely studied and supported, yet few studies have focused on the evolution from pseudogenized protein-coding sequence to lincRNA. Due to the non-mutually exclusive nature of these three scenarios and the need for systematic investigation of lincRNA origination, we conducted a comparative genomics study to investigate the evolution of human lincRNAs. RESULTS: Combining syntenic analysis with a stringent BLASTN e-value cutoff, we found that the majority of lincRNAs are aligned to intergenic regions of other species. Interestingly, 193 human lincRNAs could have protein-coding orthologs in at least two of nine vertebrates. Transposable elements in these conserved regions of the human genome are much less frequent than expected. Moreover, 19% of these lincRNAs overlap with or are close to pseudogenes in the human genome. CONCLUSIONS: We suggest that a notable portion of lincRNAs could be derived from pseudogenized protein-coding genes. Furthermore, based on our computational analysis, we hypothesize that a subset of these lincRNAs could have the potential to regulate their paralogs by functioning as competing endogenous RNAs. Our results provide evolutionary evidence of the relationship between human lincRNAs and protein-coding genes.


Subject(s)
Genomics , Pseudogenes/genetics , Long Noncoding RNA/genetics , Animals , DNA Transposable Elements/genetics , Molecular Evolution , Human Genome/genetics , Humans , Molecular Sequence Annotation , Nucleic Acid Sequence Homology
10.
Mol Microbiol ; 97(6): 1128-41, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26082024

ABSTRACT

Helicobacter pylori inhabits the gastric mucosa, where it senses and responds to various stresses via two-component systems (TCSs) that enable its persistent colonization. The aim of this study was to investigate whether any of the three paired TCSs (ArsRS, FleRS and CrdRS) in H. pylori respond to nitrosative stress. The results showed that the expression of crdS was significantly increased upon exposure to nitric oxide (NO). crdS-knockout (ΔcrdS) and crdR/crdS-knockout (ΔcrdRS) H. pylori, but not arsS-knockout (ΔarsS) or fleS-knockout (ΔfleS) H. pylori, showed a significant loss of viability upon exposure to NO compared with the wild-type strain. Knockin crdS (ΔcrdS-in) significantly restored viability in the presence of NO. Global transcriptional profiling analysis of wild-type and ΔcrdS H. pylori in the presence or absence of NO showed that 101 genes were differentially expressed, including copper resistance determinant A (crdA), transport, binding and envelope proteins. The CrdR binding motifs were investigated by competitive electrophoretic mobility shift assay, which revealed that the two AC-rich regions in the crdA promoter region are required for binding. These results demonstrate that CrdR-crdA interaction enables H. pylori to survive under nitrosative stress.


Subject(s)
Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Helicobacter pylori/metabolism , Nitric Oxide/metabolism , Physiological Stress , Base Sequence , Copper/metabolism , DNA-Binding Proteins/metabolism , Gene Expression Profiling , Gene Knockout Techniques , Helicobacter pylori/genetics , Molecular Sequence Data , Genetic Promoter Regions
11.
PLoS Comput Biol ; 11(8): e1004418, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26291518

ABSTRACT

Transcription factor (TF) binding is determined by the presence of specific sequence motifs (SM) and chromatin accessibility, where the latter is influenced by both chromatin state (CS) and DNA structure (DS) properties. Although SM, CS, and DS have been used to predict TF binding sites, a predictive model that jointly considers CS and DS has not been developed to predict either TF-specific binding or general binding properties of TFs. Using budding yeast as a model, we found that machine learning classifiers trained with either CS or DS features alone perform better in predicting TF-specific binding than SM-based classifiers. In addition, simultaneously considering CS and DS further improves the accuracy of the TF binding predictions, indicating the highly complementary nature of these two properties. The contributions of SM, CS, and DS features to binding site predictions differ greatly between TFs, allowing TF-specific predictions and potentially reflecting different TF binding mechanisms. In addition, a "TF-agnostic" predictive model based on three DNA "intrinsic properties" (in silico predicted nucleosome occupancy, major groove geometry, and dinucleotide free energy) that can be calculated from genomic sequences alone has performance that rivals the model incorporating experiment-derived data. This intrinsic property model allows prediction of binding regions not only across TFs, but also across DNA-binding domain families with distinct structural folds. Furthermore, these predicted binding regions can help identify TF binding sites that have a significant impact on target gene expression. Because the intrinsic property model allows prediction of binding regions across DNA-binding domain families, it is TF-agnostic and likely describes the general binding potential of TFs. Thus, our findings suggest that it is feasible to establish a TF-agnostic model for identifying functional regulatory regions in potentially any sequenced genome.


Subject(s)
Chromatin/chemistry , Fungal DNA/chemistry , Fungal DNA/metabolism , Fungal Proteins/metabolism , Nucleotide Motifs/genetics , Saccharomyces cerevisiae/genetics , Transcription Factors/metabolism , Chromatin/genetics , Chromatin/metabolism , Computational Biology , Fungal DNA/genetics , Fungal Proteins/chemistry , Fungal Proteins/genetics , Protein Binding , Transcription Factors/chemistry , Transcription Factors/genetics
12.
Nucleic Acids Res ; 42(2): 739-47, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24153112

ABSTRACT

Non-B DNA structures are abundant in the genome and are often associated with critical biological processes, including gene regulation, chromosome rearrangement and genome stabilization. In particular, G-quadruplex (G4) may affect alternative splicing based on its ability to impede the activity of RNA polymerase II. However, the specific role of non-B DNA structures in splicing regulation still awaits investigation. Here, we provide a genome-wide and cross-species investigation of the associations between five non-B DNA structures and exon skipping. Our results indicate a statistically significant correlation of each examined non-B DNA structure with exon skipping in both human and mouse. We further show that the contributions of non-B DNA structures to exon skipping are influenced by the region in which they occur. These correlations and contributions are also significantly different in human and mouse. Finally, we detailed the effects of G4 by showing that occurrence on the template strand and the length of the G-run, which is highly related to the stability of a G4 structure, are significantly correlated with exon skipping activity. We thus show that, in addition to the well-known effects of RNA and protein structure, the relative positional arrangement of intronic non-B DNA structures may also impact exon skipping.
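
The abstract does not say how G4 structures or G-run lengths were identified. As a rough illustration only, the sketch below scans one strand with the widely used consensus pattern of four runs of at least three guanines separated by loops of 1 to 7 nucleotides, and reports the longest G-run in each hit. The pattern choice, the function name `find_g4`, and the output layout are assumptions and are not taken from the paper.

```python
import re

# Commonly used putative-G4 consensus: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+.
# Assumed here for illustration; the study's own criteria are not stated in the abstract.
G4_PATTERN = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")

def find_g4(sequence):
    """Return (start, end, longest_g_run) for each non-overlapping putative G4 motif.

    Coordinates are 0-based and end-exclusive, and refer to the strand given.
    """
    hits = []
    for match in G4_PATTERN.finditer(sequence.upper()):
        longest_run = max(len(run) for run in re.findall(r"G+", match.group()))
        hits.append((match.start(), match.end(), longest_run))
    return hits

hits = find_g4("TTGGGAGGGTGGGAGGGTT")
```

To look at the opposite strand, the same scan could be run on the reverse complement of the sequence; which strand counts as the template depends on gene orientation, which this sketch does not model.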


Subject(s)
Alternative Splicing , DNA/chemistry , Exons , Introns , Animals , G-Quadruplexes , Humans , Mice , Species Specificity
14.
Nucleic Acids Res ; 41(13): 6371-80, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23658220

ABSTRACT

Transcription factors (TFs) and microRNAs (miRNAs) are two crucial trans-regulatory factors that coordinately control gene expression. Understanding the impacts of these two factors on the rate of protein sequence evolution is of great importance in evolutionary biology. While many biological factors associated with evolutionary rate variations have been studied, an evolutionary analysis that simultaneously accounts for TF and miRNA regulation across metazoans is still lacking. Here, we provide a series of statistical analyses to assess the influences of TF and miRNA regulation on evolutionary rates across metazoans (human, mouse and fruit fly). Our results reveal that the negative correlations between trans-regulation and evolutionary rates hold well across metazoans, but the strength of TF regulation as a rate indicator becomes weak when the other confounding factors that may affect evolutionary rates are controlled. We show that miRNA regulation tends to be a more essential indicator of evolutionary rates than TF regulation, and the combination of TF and miRNA regulation has a significant dependent effect on protein evolutionary rates. We also show that trans-regulation (especially miRNA regulation) is much more important in human/mouse than in fruit fly in determining protein evolutionary rates, suggesting a considerable variation in rate determinants between vertebrates and invertebrates.


Subject(s)
Molecular Evolution , Gene Expression Regulation , MicroRNAs/metabolism , Transcription Factors/metabolism , Animals , Binding Sites , Drosophila melanogaster/genetics , Humans , Mice , Proteins/genetics
15.
Bioinformatics ; 28(5): 701-8, 2012 Mar 01.
Article in English | MEDLINE | ID: mdl-22238267

ABSTRACT

MOTIVATION: Gene regulation involves complicated mechanisms such as cooperativity between a set of transcription factors (TFs). Previous studies have used target genes shared by two TFs as a clue to infer TF-TF interactions. However, this task remains challenging because target genes with low binding affinity are frequently missing from experimental data, especially when a single strict threshold is employed. This article aims at improving the accuracy of inferring TF-TF interactions by incorporating motif discovery as a fundamental step when detecting overlapping targets of TFs based on ChIP-chip data. RESULTS: The proposed method, simTFBS, outperforms three naïve methods that adopt fixed thresholds when inferring TF-TF interactions based on ChIP-chip data. In addition, simTFBS is compared with two advanced methods and demonstrates its advantages in predicting TF-TF interactions. By comparing simTFBS with predictions based on the set of available annotated yeast TF binding motifs, we demonstrate that the good performance of simTFBS indeed comes from the additional motifs found by the proposed procedures. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Regulatory Networks , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Transcription Factors/metabolism , Chromatin Immunoprecipitation , Fungal Gene Expression Regulation , Oligonucleotide Array Sequence Analysis , Protein Binding , Saccharomyces cerevisiae Proteins/genetics
16.
iScience ; 26(5): 106635, 2023 May 19.
Article in English | MEDLINE | ID: mdl-37138775

ABSTRACT

Enhanced phenotypic diversity increases a population's likelihood of surviving catastrophic conditions. Hsp90, an essential molecular chaperone and a central network hub in eukaryotes, has been observed to suppress or enhance the effects of genetic variation on phenotypic diversity in response to environmental cues. Because many Hsp90-interacting genes are involved in signal transduction pathways and transcriptional regulation, we tested how common Hsp90-dependent differential gene expression is in natural populations. Many genes exhibited Hsp90-dependent strain-specific differential expression in five diverse yeast strains. We further identified transcription factors (TFs) potentially contributing to variable expression. We found that upon Hsp90 inhibition or environmental stress, activities or abundances of Hsp90-dependent TFs varied among strains, resulting in differential strain-specific expression of their target genes, which consequently led to phenotypic diversity. We provide evidence that individual strains can readily display specific Hsp90-dependent gene expression, suggesting that the evolutionary impacts of Hsp90 are widespread in nature.

17.
NAR Genom Bioinform ; 5(2): lqad043, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37223317

ABSTRACT

Long non-coding RNAs (lncRNAs) are defined as RNA sequences that are >200 nt long with no coding capacity. These lncRNAs participate in various biological mechanisms and are abundant in a diversity of species. There is well-documented evidence that lncRNAs can interact with genomic DNA by forming triple helices (triplexes). Previously, several computational methods have been designed based on the Hoogsteen base-pair rule to find theoretical RNA-DNA:DNA triplexes. While powerful, these methods suffer from a high false-positive rate when the predicted triplexes are compared with biological experiments. To address this issue, we first collected the experimental data of genomic RNA-DNA triplexes from antisense oligonucleotide (ASO)-mediated capture assays and used Triplexator, the most widely used tool for lncRNA-DNA interaction, to reveal the intrinsic information on true triplex binding potential. Based on the analysis, we proposed six computational attributes as filters to improve the in-silico triplex prediction by removing most false positives. Further, we have built a new database, TRIPBASE, as the first comprehensive collection of genome-wide triplex predictions of human lncRNAs. In TRIPBASE, the user interface allows scientists to apply customized filtering criteria to access the potential triplexes of human lncRNAs in the cis-regulatory regions of the human genome. TRIPBASE can be accessed at https://tripbase.iis.sinica.edu.tw/.

18.
BMC Genomics ; 13: 717, 2012 Dec 21.
Article in English | MEDLINE | ID: mdl-23256513

ABSTRACT

BACKGROUND: New genes that originate from non-coding DNA rather than being duplicated from parent genes are called de novo genes. Their short evolution time and lack of parent genes provide a chance to study the evolution of cis-regulatory elements in the initial stage of gene emergence. Although a few reports have discussed cis-regulatory elements in new genes, knowledge of the characteristics of these elements in de novo genes is lacking. Here, we conducted a comprehensive investigation to depict the emergence and establishment of cis-regulatory elements in de novo yeast genes. RESULTS: In a genome-wide investigation, we found that the number of transcription factor binding sites (TFBSs) in de novo genes of S. cerevisiae increased rapidly and quickly became comparable to the number of TFBSs in established genes. This phenomenon might have resulted from certain characteristics of de novo genes; namely, a relatively frequent gain of TFBSs, an unexpectedly high number of preexisting TFBSs, or lower selection pressure in the promoter regions of the de novo genes. Furthermore, we identified differences in the promoter architecture between de novo genes and duplicated new genes, suggesting that distinct regulatory strategies might be employed by genes of different origin. Finally, our functional analyses of the yeast de novo genes revealed that they might be related to reproduction. CONCLUSIONS: Our observations showed that de novo genes and duplicated new genes possess mutually distinct regulatory characteristics, implying that these two types of genes might have different roles in evolution.


Subject(s)
Molecular Evolution , Gene Duplication , Fungal Genes/genetics , Nucleic Acid Regulatory Sequences/genetics , Saccharomyces cerevisiae/genetics , Binding Sites , Nucleosomes/genetics , Reproduction/genetics , Saccharomyces cerevisiae/cytology , Saccharomyces cerevisiae/physiology , Genetic Selection , TATA Box/genetics , Transcription Factors/metabolism
19.
Bioinformatics ; 27(24): 3341-7, 2011 Dec 15.
Article in English | MEDLINE | ID: mdl-22016405

ABSTRACT

MOTIVATION: Metagenomics involves sampling and studying the genetic materials in microbial communities. Several statistical methods have been proposed for comparative analysis of microbial community compositions. Most of the methods are based on the estimated abundances of taxonomic units or functional groups from metagenomic samples. However, such estimated abundances might deviate from the true abundances in habitats due to sampling biases and other systematic artifacts in metagenomic data processing. RESULTS: We developed the MetaRank scheme to convert abundances into ranks. MetaRank employs a series of statistical hypothesis tests to compare abundances within a microbial community and determine their ranks. We applied MetaRank to synthetic samples and real metagenomes. The results confirm that MetaRank can reduce the effects of sampling biases and clarify the characteristics of metagenomes in comparative studies of microbial communities. Therefore, MetaRank provides a useful rank-based approach to analyzing microbiomes. CONTACT: hktsai@iis.sinica.edu.tw SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
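
To show why ranks can be less sensitive to sampling depth than raw abundances, the sketch below applies a plain dense ranking to two hypothetical samples. It is a deliberately simplified stand-in: the MetaRank scheme described above determines ranks through a series of statistical hypothesis tests, which are not reproduced here, and the function name and data are invented for illustration.

```python
def abundance_ranks(counts):
    """Map each taxonomic unit to a dense rank (1 = most abundant).

    Units with identical counts share a rank. This is a naive ranking, not the
    hypothesis-test-based procedure that MetaRank itself uses.
    """
    distinct = sorted(set(counts.values()), reverse=True)
    rank_of = {value: position + 1 for position, value in enumerate(distinct)}
    return {unit: rank_of[value] for unit, value in counts.items()}

# Two hypothetical samples of the same community sequenced at different depths.
shallow = {"taxonA": 50, "taxonB": 20, "taxonC": 20, "taxonD": 5}
deep = {"taxonA": 500, "taxonB": 210, "taxonC": 210, "taxonD": 40}

shallow_ranks = abundance_ranks(shallow)
deep_ranks = abundance_ranks(deep)
```

The raw counts differ roughly tenfold between the two samples, yet the rank profiles are identical, which is the property a rank-based comparison relies on.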


Subject(s)
Bacteria/classification , Gastrointestinal Tract/microbiology , Metagenome , Metagenomics/methods , Obesity/microbiology , Adult , Bacteria/genetics , Bacteria/isolation & purification , Computational Biology/methods , Bacterial DNA/genetics , Humans , Infant , Phylogeny , 16S Ribosomal RNA/genetics , DNA Sequence Analysis/methods
20.
Bioinformatics ; 27(16): 2298-9, 2011 Aug 15.
Article in English | MEDLINE | ID: mdl-21697124

ABSTRACT

SUMMARY: MetaABC is a metagenomic platform that integrates several binning tools coupled with methods for removing artifacts, analyzing unassigned reads and controlling sampling biases. It allows users to arrive at a better interpretation via a series of distinct combinations of analysis tools. After execution, MetaABC provides outputs in various visual formats such as tables, pie and bar charts, and clustering result diagrams. AVAILABILITY: MetaABC source code and documentation are available at http://bits2.iis.sinica.edu.tw/MetaABC/ CONTACT: dywang@gate.sinica.edu.tw; hktsai@iis.sinica.edu.tw SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Metagenomics/methods , Software , Cluster Analysis , Systems Integration