Results 1 - 14 of 14
1.
Bioinformatics ; 37(1): 126-128, 2021 Apr 09.
Article in English | MEDLINE | ID: mdl-33367516

ABSTRACT

SUMMARY: Since its introduction, RNA-Seq technology has been used extensively in studies of pathogenic bacteria to identify and quantify differences in gene expression across multiple samples from bacteria exposed to different conditions. With some exceptions, tools for studying gene expression, determining differential gene expression, performing downstream pathway analysis and normalizing data collected in extreme biological conditions are still lacking. Here, we describe ProkSeq, a user-friendly, fully automated RNA-Seq data analysis pipeline designed for prokaryotes. ProkSeq provides a wide variety of options for analysing differential expression, normalizing expression data and visualizing data and results. AVAILABILITY AND IMPLEMENTATION: ProkSeq is implemented in Python and is published under the MIT source license. The pipeline is available as a Docker container at https://hub.docker.com/repository/docker/snandids/prokseq-v2.0, or can be used through Anaconda: https://anaconda.org/snandiDS/prokseq. The code is available on Github: https://github.com/snandiDS/prokseq and detailed user documentation, including a manual and tutorial, can be found at https://prokseqV20.readthedocs.io. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

2.
BMC Genomics ; 22(1): 336, 2021 May 10.
Article in English | MEDLINE | ID: mdl-33971818

ABSTRACT

BACKGROUND: Our understanding of genome regulation is ever-evolving with the continuous discovery of new modes of gene regulation, and transcriptomic studies of mammalian genomes have revealed the presence of a considerable population of non-coding RNA molecules among the transcripts expressed. One such non-coding RNA molecule is long non-coding RNA (lncRNA). However, the function of lncRNAs in gene regulation is not well understood; moreover, finding conserved lncRNAs across species is a challenging task. Therefore, we propose a novel approach to identify conserved lncRNAs and functionally annotate these molecules. RESULTS: In this study, we exploited existing myogenic transcriptome data and identified conserved lncRNAs in mice and humans. We identified the lncRNAs differentially expressed between the early and later stages of muscle development. Differential expression of these lncRNAs was confirmed experimentally in cultured mouse muscle C2C12 cells. We utilized the three-dimensional architecture of the genome and identified topologically associated domains for these lncRNAs. Additionally, we correlated the expression of genes in domains for functional annotation of these trans-lncRNAs in myogenesis. Using this approach, we identified conserved lncRNAs in myogenesis and functionally annotated them. CONCLUSIONS: With this novel approach, we identified the conserved lncRNAs in myogenesis in humans and mice and functionally annotated them. The method identified a large number of lncRNAs involved in myogenesis. Further studies are required to investigate the reason for the conservation of the lncRNAs in human and mouse even though their sequences are dissimilar. Our approach can be used to identify novel lncRNAs conserved in different species and functionally annotate them.


Subject(s)
Long Noncoding RNA , Animals , Computational Biology , Genome , Mice , Muscle Development/genetics , Long Noncoding RNA/genetics , Transcriptome
3.
J Biol Chem ; 293(37): 14342-14358, 2018 09 14.
Article in English | MEDLINE | ID: mdl-30068546

ABSTRACT

Polycomb group proteins are essential epigenetic repressors. They form multiple protein complexes of which two kinds, PRC1 and PRC2, are indispensable for repression. Although much is known about their biochemical properties, how mammalian PRC1 and PRC2 are targeted to specific genes is poorly understood. Here, we establish the cyclin D2 (CCND2) oncogene as a simple model to address this question. We provide evidence that the targeting of PRC1 to CCND2 involves a dedicated PRC1-targeting element (PTE). The PTE appears to act in concert with an adjacent cytosine-phosphate-guanine (CpG) island to arrange for the robust binding of PRC1 and PRC2 to repressed CCND2. Our findings pave the way to identify sequence-specific DNA-binding proteins implicated in the targeting of mammalian PRC1 complexes and provide a novel link between polycomb repression and cancer.


Subject(s)
Cyclin D2/genetics , Cyclin D2/metabolism , Oncogenes , Polycomb-Group Proteins/metabolism , Animals , Binding Sites , Gene Silencing , Humans , Mice , Protein Binding , Genetic Transcription
4.
Nucleic Acids Res ; 41(19): 8822-41, 2013 Oct.
Article in English | MEDLINE | ID: mdl-23913413

ABSTRACT

In higher organisms, gene regulation is controlled by the interplay of non-random combinations of multiple transcription factors (TFs). Although numerous attempts have been made to identify these combinations, important details, such as mutual positioning of the factors that have an important role in the TF interplay, are still missing. The goal of the present work is in silico mapping of some of such associating factors based on their mutual positioning, using computational screening. We have selected the process of myogenesis as a study case, and we focused on TF combinations involving master myogenic TF Myogenic differentiation (MyoD) with other factors situated at specific distances from it. The results of our work show that some muscle-specific factors occur together with MyoD within the range of ±100 bp in a large number of promoters. We confirm co-occurrence of the MyoD with muscle-specific factors as described in earlier studies. However, we have also found novel relationships of MyoD with other factors not specific for muscle. Additionally, we have observed that MyoD tends to associate with different factors in proximal and distal promoter areas. The major outcome of our study is establishing the genome-wide connection between biological interactions of TFs and close co-occurrence of their binding sites.
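
The co-occurrence screen described in this abstract can be illustrated with a minimal Python sketch. It assumes binding-site predictions are already available as (promoter, factor, position) tuples; the function name, the input format and the factor labels `FactorA` and `FactorB` are hypothetical and not taken from the paper. Only the anchor factor MyoD and the ±100 bp window come from the abstract.

```python
from collections import Counter

def cooccurring_factors(sites, anchor="MyoD", window=100):
    """Count promoters where a factor has a site within +/-window bp of an anchor site.

    sites: iterable of (promoter_id, factor, position) tuples (assumed input format).
    """
    by_promoter = {}
    for promoter, factor, pos in sites:
        by_promoter.setdefault(promoter, []).append((factor, pos))

    counts = Counter()
    for hits in by_promoter.values():
        anchor_positions = [p for f, p in hits if f == anchor]
        partners = {
            f for f, p in hits
            if f != anchor and any(abs(p - a) <= window for a in anchor_positions)
        }
        counts.update(partners)  # each promoter counts once per partner factor
    return counts

# Made-up example data, for illustration only.
example_sites = [
    ("promoter_1", "MyoD", 250), ("promoter_1", "FactorA", 310), ("promoter_1", "FactorB", 900),
    ("promoter_2", "MyoD", 120), ("promoter_2", "FactorA", 60),
    ("promoter_3", "FactorA", 400),
]
result = cooccurring_factors(example_sites)
```

In this toy input, `FactorA` lies within 100 bp of a MyoD site in two promoters, while `FactorB` is never close enough to be counted.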


Subject(s)
MyoD Protein/metabolism , Genetic Promoter Regions , Transcription Factors/metabolism , Animals , Binding Sites , Computer Simulation , Genetic Enhancer Elements , Humans , Mice , Muscle Development/genetics , Myoblasts/metabolism
5.
Nucleic Acids Res ; 40(17): 8227-39, 2012 Sep 01.
Article in English | MEDLINE | ID: mdl-22730291

ABSTRACT

The Six1 transcription factor is a homeodomain protein involved in controlling gene expression during embryonic development. Six1 establishes gene expression profiles that enable skeletal myogenesis and nephrogenesis, among others. While several homeodomain factors have been extensively characterized with regard to their DNA-binding properties, relatively little is known of the properties of Six1. We have used the genomic binding profile of Six1 during the myogenic differentiation of myoblasts to obtain a better understanding of its preferences for recognizing certain DNA sequences. DNA sequence analyses on our genomic binding dataset, combined with biochemical characterization using binding assays, reveal that Six1 has a much broader DNA-binding sequence spectrum than had been previously determined. Moreover, using a position weight matrix optimization algorithm, we generated a highly sensitive and specific matrix that can be used to predict novel Six1-binding sites with high accuracy. Furthermore, our results support the idea of a mode of DNA recognition by this factor where Six1 itself is sufficient for sequence discrimination, and where Six1 domains outside of its homeodomain contribute to binding site selection. Together, our results shed new light on the properties of this important transcription factor, and will enable more accurate modeling of Six1 function in bioinformatic studies.


Subject(s)
DNA/chemistry , Homeodomain Proteins/metabolism , Animals , Binding Sites , DNA/metabolism , Genomics/methods , Mice , Myoblasts/metabolism , Nucleotide Motifs , Position-Specific Scoring Matrices , Protein Binding , Tertiary Protein Structure , DNA Sequence Analysis
6.
BMC Genomics ; 13: 416, 2012 Aug 22.
Article in English | MEDLINE | ID: mdl-22913572

ABSTRACT

BACKGROUND: Identifying binding sites for transcription factors is a key component of gene regulatory network analysis. This is often done using position-weight matrices (PWMs). Because of the importance of in silico mapping of tentative binding sites, we previously developed an approach for PWM optimization that substantially improves the accuracy of such mapping. RESULTS: The present work applies the optimization algorithm to the existing PWM for the GATA-3 transcription factor and builds a new di-nucleotide PWM. The existing PWM is based on experimental data adopted from JASPAR. The optimized PWM substantially improves the sensitivity and specificity of the TF mapping compared to the conventional applications. The refined PWM also facilitates in silico identification of novel binding sites that are supported by experimental data. We also describe uncommon positioning of binding motifs for several T-cell lineage-specific factors in human promoters. CONCLUSION: Our proposed di-nucleotide PWM approach outperforms the conventional mono-nucleotide PWM approach with respect to GATA-3. Therefore, our new di-nucleotide PWM provides new insight into plausible transcriptional regulatory interactions in human promoters.
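
For readers unfamiliar with PWM scanning, the following Python sketch shows the conventional mono-nucleotide approach that the abstract uses as its baseline. It is not the authors' optimization algorithm or their di-nucleotide matrix. The count matrix is a made-up toy for the core GATA motif rather than the JASPAR matrix, and the pseudocount and uniform background are assumptions.

```python
import math

BASES = "ACGT"

def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into a log2-odds PWM."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append(
            {b: math.log2((column[b] + pseudocount) / total / background) for b in BASES}
        )
    return pwm

def scan(sequence, pwm):
    """Return (start, score) for every window matching the PWM's width."""
    width = len(pwm)
    return [
        (i, sum(pwm[j][sequence[i + j]] for j in range(width)))
        for i in range(len(sequence) - width + 1)
    ]

# Toy counts for a 4-bp GATA core (illustrative only, not experimental data).
toy_counts = [
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]
pwm = log_odds_pwm(toy_counts)
hits = scan("CCGATACC", pwm)
best_start, best_score = max(hits, key=lambda h: h[1])
```

A di-nucleotide PWM would follow the same pattern but index each column by adjacent base pairs (16 entries) instead of single bases, which lets it capture dependencies between neighbouring positions.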


Subject(s)
Binding Sites , Computational Biology/methods , GATA3 Transcription Factor/genetics , Position-Specific Scoring Matrices , Algorithms , Genetic Databases , Humans , Genetic Promoter Regions
7.
J Cancer Res Ther ; 18(1): 231-239, 2022.
Article in English | MEDLINE | ID: mdl-35381789

ABSTRACT

Aims: Non-small-cell lung carcinoma comprises 85% of lung malignancies and is usually associated with a poor prognosis due to diagnosis at advanced stages. Molecular diagnosis of computerized tomography (CT)-guided biopsy samples has the potential to identify subtypes of lung carcinoma like adenocarcinoma (AC) and squamous cell carcinoma (SCC) along with its molecular stratification. This approach will help predict the genetic signature of lung cancer in individual patients. Subjects and Methods: Histopathologically proven CT-guided biopsy samples from lung cancer cases were used to screen for the expression of microRNAs (miRNAs) previously quantitated in blood plasma. Primers against hsa-miR2114, hsa-miR2115, hsa-miR2116, hsa-miR2117, hsa-miR449c, and hsa-miR548q with control RNU6 were used to screen 30 AC, 30 SCC, 5 nonspecific granulomatous inflammation, and 8 control samples. Reverse transcription polymerase chain reaction (RT-PCR) data revealed expression of hsa-miR2114 and hsa-miR548q in AC as well as SCC. Results: RT-PCR data revealed that hsa-miR2116 and hsa-miR449c were upregulated in AC, while hsa-miR2117 was expressed in SCC cases. Bioinformatic analysis revealed that the genes in which these miRNAs are located were also upregulated, while the targets of these miRNAs were downregulated. Conclusions: The miRNA expression pattern in CT-guided biopsy samples can be used as a potential tool to differentially diagnose lung cancer subtypes. The expression pattern of miRNAs matches very well in blood plasma and tissue samples, albeit levels were much lower in the former than in the latter. This approach can also be used for screening mutations and other molecular markers in a personalized manner for the management of lung cancer patients.


Subject(s)
Lung Neoplasms , MicroRNAs , Biopsy , Gene Expression Profiling , Neoplastic Gene Expression Regulation , Humans , Lung/diagnostic imaging , Lung Neoplasms/diagnostic imaging , Lung Neoplasms/genetics , MicroRNAs/genetics , MicroRNAs/metabolism , X-Ray Computed Tomography
8.
J Comput Biol ; 27(8): 1313-1328, 2020 08.
Article in English | MEDLINE | ID: mdl-31855461

ABSTRACT

Multiple transcription factors (TFs) bind to specific sites in the genome and interact among themselves to form the cis-regulatory modules (CRMs). They are essential in modulating the expression of genes, and it is important to study this interplay to understand gene regulation. In the present study, we integrated experimentally identified TF binding sites collected from published studies with computationally predicted TF binding sites to identify Drosophila CRMs. Along with the detection of the previously known CRMs, this approach identified novel protein combinations. We determined high-occupancy target sites, where a large number of TFs bind. Investigating these sites revealed that Giant, Dichaete, and Knirps are highly enriched in these locations. A common TAG team motif was observed at these sites, which might play a role in recruiting other TFs. While comparing the binding sites at distal and proximal promoters, we found that certain regulatory TFs, such as Zelda, were highly enriched in enhancers. Our study has shown that, from the information available concerning the TF binding sites, the real CRMs could be predicted accurately and efficiently. Although we can claim only co-occurrence of these proteins in this study, it may point to their interaction (as proteins known to interact typically co-occur). Such an integrative approach can, therefore, help us to provide a better understanding of the interplay among the factors, even though further experimental verification is required.


Subject(s)
Drosophila Proteins/genetics , Nuclear Proteins/genetics , Repressor Proteins/genetics , SOX Transcription Factors/genetics , Transcription Factors/genetics , Animals , Binding Sites/genetics , Computational Biology , DNA-Binding Proteins/genetics , Gene Expression Regulation/genetics , Insect Genome/genetics , Transcriptional Regulatory Elements , Nucleic Acid Regulatory Sequences/genetics , Software
9.
Sci Rep ; 9(1): 2775, 2019 02 26.
Article in English | MEDLINE | ID: mdl-30808983

ABSTRACT

Sequence comparison is an essential part of modern molecular biology research. In this study, we estimated the parameters of a Markov chain by considering the frequencies of occurrence of all possible amino acid pairs from each unaligned protein sequence. These estimated Markov chain parameters were used to calculate similarity between two protein sequences based on a fuzzy integral algorithm. For validation, our results were compared with those of both alignment-based (ClustalW) and alignment-free methods on six benchmark datasets. The results indicate that our algorithm has better clustering performance for protein sequence comparison.


Subject(s)
Proteins/chemistry , Algorithms , Amino Acid Sequence , Electron Transport Complex I/chemistry , Electron Transport Complex I/classification , Humans , Markov Chains , Mitochondrial Proteins/chemistry , Mitochondrial Proteins/classification , NADH Dehydrogenase/chemistry , NADH Dehydrogenase/classification , Phylogeny , Proteins/classification , Sequence Alignment
10.
Sci Rep ; 9(1): 3753, 2019 03 06.
Article in English | MEDLINE | ID: mdl-30842590

ABSTRACT

The large amount of sequence data in private and public databases produced by next-generation sequencing poses new challenges because of the limitations of alignment-based methods for sequence comparison. There is therefore a strong need for faster sequence analysis algorithms. In this study, we developed an alignment-free algorithm for faster sequence analysis. The novelty of our approach is the inclusion of a fuzzy integral with a Markov chain for sequence analysis in the alignment-free model. The method estimates the parameters of a Markov chain by considering the frequencies of occurrence of all possible nucleotide pairs from each DNA sequence. These estimated Markov chain parameters were used to calculate similarity among all pairwise combinations of DNA sequences based on a fuzzy integral algorithm. The resulting similarity matrix is used as input for the neighbor program in the PHYLIP package for phylogenetic tree construction. Our method was tested on eight benchmark datasets and on in-house generated datasets (18S rDNA sequences from 11 arbuscular mycorrhizal fungi (AMF) and 16S rDNA sequences of 40 bacterial isolates from plant interiors). The results indicate that the fuzzy integral algorithm is an efficient and feasible alignment-free method for sequence analysis on the genomic scale.
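
The first step named in this abstract, estimating Markov chain parameters from nucleotide-pair frequencies, can be sketched in Python as follows. This assumes a first-order chain estimated from adjacent pairs, which is one reading of the abstract; the function name is hypothetical, and the fuzzy integral similarity step is not reproduced here.

```python
from collections import Counter

BASES = "ACGT"

def transition_matrix(sequence):
    """Estimate first-order Markov transition probabilities from adjacent nucleotide pairs."""
    pair_counts = Counter(zip(sequence, sequence[1:]))
    matrix = {}
    for a in BASES:
        row_total = sum(pair_counts[(a, b)] for b in BASES)
        for b in BASES:
            # Rows with no observed transitions are left as zeros in this sketch.
            matrix[(a, b)] = pair_counts[(a, b)] / row_total if row_total else 0.0
    return matrix

probs = transition_matrix("ACGTACGTAC")
```

Each sequence is thereby reduced to a fixed-size table of 16 probabilities, so two sequences of different lengths can be compared without aligning them.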


Subject(s)
Bacteria/genetics , Computational Biology/methods , Mycorrhizae/genetics , DNA Sequence Analysis/methods , Algorithms , Bacteria/isolation & purification , Cluster Analysis , Ribosomal DNA/genetics , Fuzzy Logic , Markov Chains , Mycorrhizae/isolation & purification , Phylogeny , Plants/microbiology
11.
BMC Bioinformatics ; 8: 104, 2007 Mar 27.
Article in English | MEDLINE | ID: mdl-17389042

ABSTRACT

BACKGROUND: Profile Hidden Markov Models (HMM) are statistical representations of protein families derived from patterns of sequence conservation in multiple alignments and have been used in identifying remote homologues with considerable success. These conservation patterns arise from fold-specific signals, shared across multiple families, and function-specific signals unique to the families. The availability of sequences pre-classified according to their function permits the use of negative training sequences to improve the specificity of the HMM, both by optimizing the threshold cutoff and by modifying emission probabilities to minimize the influence of fold-specific signals. A protocol to generate family-specific HMMs is described that first constructs a profile HMM from an alignment of the family's sequences and then uses this model to identify sequences belonging to other classes that score above the default threshold (false positives). Ten-fold cross validation is used to optimise the discrimination threshold score for the model. The advent of fast multiple alignment methods enables the use of the profile alignments to align the true and false positive sequences, and the resulting alignments are used to modify the emission probabilities in the original model. RESULTS: The protocol, called HMM-ModE, was validated on a set of sequences belonging to six sub-families of the AGC family of kinases. These sequences have an average sequence similarity of 63% among the group though each sub-group has a different substrate specificity. Optimisation of the discrimination threshold, using negative sequences scored against the model, improves specificity in test cases from an average of 21% to 98%. Further discrimination by the HMM after modifying model probabilities using negative training sequences is provided in a few cases, the average specificity rising to 99%.
Similar improvements were obtained with a sample of G-protein-coupled receptors sub-classified with respect to their substrate specificity, though the average sequence identity across the sub-families is just 20.6%. The protocol is applied in a high-throughput classification exercise on protein kinases. CONCLUSION: The protocol has the potential to maximise the contributions of discriminating residues to classify proteins based on their molecular function, using pre-classified positive and negative sequence training data. The high specificity of the method and the increasing availability of pre-classified sequence data hold the potential for its application in sequence annotation.
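
The idea of tuning a discrimination threshold with negative training sequences can be illustrated with a short Python sketch. The abstract does not state which objective HMM-ModE optimises, so this example assumes Youden's J statistic (sensitivity + specificity - 1) as a stand-in, omits the ten-fold cross validation, and uses made-up scores in place of real profile HMM output.

```python
def best_threshold(positive_scores, negative_scores):
    """Pick the score cutoff maximizing sensitivity + specificity - 1 (Youden's J).

    A sequence is called a family member when its score is >= cutoff.
    """
    best = (None, -1.0)
    for cutoff in sorted(set(positive_scores) | set(negative_scores)):
        sensitivity = sum(s >= cutoff for s in positive_scores) / len(positive_scores)
        specificity = sum(s < cutoff for s in negative_scores) / len(negative_scores)
        j = sensitivity + specificity - 1
        if j > best[1]:  # ties keep the lower (more permissive) cutoff
            best = (cutoff, j)
    return best

# Hypothetical scores: family members versus sequences from other classes.
positives = [52.1, 48.7, 45.0, 39.4]
negatives = [41.2, 30.5, 22.8, 18.0]
cutoff, j_stat = best_threshold(positives, negatives)
```

In a cross-validated setting, the cutoff would be chosen on the training folds and its specificity measured on the held-out fold, which is the role the abstract assigns to ten-fold cross validation.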


Subject(s)
Algorithms , Artificial Intelligence , Automated Pattern Recognition/methods , Proteins/chemistry , Sequence Alignment/methods , Protein Sequence Analysis/methods , Statistical Data Interpretation , Discriminant Analysis , Markov Chains , Chemical Models , Statistical Models
12.
BMC Genomics ; 6: 116, 2005 Sep 09.
Article in English | MEDLINE | ID: mdl-16150155

ABSTRACT

BACKGROUND: Theoretical proteome analysis, performed by plotting theoretical isoelectric points (pI) against molecular masses of all proteins encoded by the genome, shows a multimodal distribution for pI. This multimodal distribution is an effect of allowed combinations of the charged amino acids, and not due to evolutionary causes. The variation in this distribution can be correlated to the organism's ecological niche. Contributions to this variation may be mapped to individual proteins by studying the variation in pI of orthologs across microorganism genomes. RESULTS: The distribution of ortholog pI values showed trimodal distributions for all prokaryotic genomes analyzed, similar to whole proteome plots. Pairwise analysis of pI variation shows that a few COGs are conserved within, but most vary between, the acidic and basic regions of the distribution, while molecular mass is more highly conserved. At the level of functional grouping of orthologs, five groups vary significantly from the population of orthologs, which is attributed to either conservation at the level of sequences or a bias for either positively or negatively charged residues contributing to the function. Individual COGs conserved in both the acidic and basic regions of the trimodal distribution are identified, and orthologs that best represent the variation in levels of the acidic and basic regions are listed. CONCLUSION: The analysis of pI distribution by using orthologs provides a basis for resolution of theoretical proteome comparison at the level of individual proteins. Orthologs identified that significantly vary between the major acidic and basic regions may be used as representative of the variation of the entire proteome.


Subject(s)
Computational Biology/methods , Bacterial Genome , Proteome , Proteomics/methods , Bacterial Proteins , Cluster Analysis , Computer Simulation , Protein Databases , Two-Dimensional Gel Electrophoresis , Hydrogen-Ion Concentration , Isoelectric Point , Statistical Models , Open Reading Frames , Proteins/chemistry
13.
Adv Bioinformatics ; 2011: 743782, 2011.
Article in English | MEDLINE | ID: mdl-21541071

ABSTRACT

Various enzyme identification protocols involving homology transfer by sequence-sequence or profile-sequence comparisons have been devised which utilise Swiss-Prot sequences associated with EC numbers as the training set. A profile HMM constructed for a particular EC number might select sequences which perform a different enzymatic function due to the presence of certain fold-specific residues which are conserved in enzymes sharing a common fold. We describe a protocol, ModEnzA (HMM-ModE Enzyme Annotation), which generates profile HMMs highly specific at a functional level as defined by the EC numbers by incorporating information from negative training sequences. We enrich the training dataset by mining sequences from the NCBI Non-Redundant database for increased sensitivity. We compare our method with other enzyme identification methods, both for assigning EC numbers to a genome as well as identifying protein sequences associated with an enzymatic activity. We report a sensitivity of 88% and specificity of 95% in identifying EC numbers and annotating enzymatic sequences from the E. coli genome which is higher than any other method. With the next-generation sequencing methods producing a huge amount of sequence data, the development and use of fully automated yet accurate protocols such as ModEnzA is warranted for rapid annotation of newly sequenced genomes and metagenomic sequences.

14.
Mycopathologia ; 164(1): 1-17, 2007 Jul.
Article in English | MEDLINE | ID: mdl-17574539

ABSTRACT

In the absence of steroid receptors and any known mechanism of gene regulation by steroid hormones in Candida albicans, we did a genome-wide analysis of C. albicans cells treated with progesterone using Eurogentec cDNA microarrays to find the complete repertoire of steroid responsive genes. Northern blotting analysis was employed to validate the genes that were differentially regulated by progesterone in the microarray experiments. A total of 99 genes were found to be significantly regulated by progesterone, of which 60 were up-regulated and 39 were down-regulated. It was observed that progesterone considerably enhanced the expression of multi-drug resistance (MDR) genes belonging to the ATP Binding Cassette super-family of multidrug transporters (CDR1 and CDR2), suggesting a possible relationship between steroid stress and MDR genes. Several genes associated with hyphal induction and the establishment of pathogenesis were also found up-regulated. In silico search for various transcription factor (TF) binding sites in the promoters of the affected genes revealed that EFG1, CPH1, NRG1, TUP1, MIG1 and AP-1 regulated genes are responsive to progesterone. The stress responsive elements (STRE; AG(4) or C(4)T) were also found in the promoters of several responsive genes. Our data shed new light on the regulation of gene expression in C. albicans by human steroids, and its correlation with drug resistance, virulence, morphogenesis and general stress response. A comparison with the drug-induced stress response is also discussed.
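
The in silico search for stress responsive elements mentioned in this abstract can be sketched in Python. This reads the notation "AG(4) or C(4)T" as the pentamers AGGGG and CCCCT, which are reverse complements of each other; the function name and the example promoter string are hypothetical, and the authors' actual search tool is not specified in the abstract.

```python
import re

# Lookahead so that overlapping occurrences are also reported.
STRE = re.compile(r"(?=(AGGGG|CCCCT))")

def find_stre(promoter):
    """Return (0-based start, motif) for each STRE occurrence in a promoter sequence."""
    return [(m.start(), m.group(1)) for m in STRE.finditer(promoter.upper())]

# Made-up promoter fragment for illustration.
hits = find_stre("ttACCCCTgaaAGGGGca")
```

Because the two pentamers are reverse complements, scanning one strand for both patterns covers occurrences of the element on either strand.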


Subject(s)
Candida albicans/genetics , Fungal Gene Expression Regulation/drug effects , Fungal Genome , Progesterone/pharmacology , ATP-Binding Cassette Transporters/genetics , Northern Blotting , Candida albicans/drug effects , Candida albicans/growth & development , Candidiasis/microbiology , Fungal Drug Resistance/genetics , Fungal Proteins/genetics , Humans , Oligonucleotide Array Sequence Analysis