Results 1 - 20 of 59
1.
BMC Genomics ; 25(1): 428, 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38689225

ABSTRACT

BACKGROUND: Although many studies have been done to reveal artificial selection signatures in commercial and indigenous chickens, a limited number of genes have been linked to specific traits. To identify more trait-related artificial selection signatures and genes, we re-sequenced a total of 85 individuals of five indigenous chicken breeds with distinct traits from Yunnan Province, China. RESULTS: We found 30 million non-redundant single nucleotide variants and small indels (< 50 bp) in the indigenous chickens, of which 10 million were not seen in 60 broilers, 56 layers and 35 red jungle fowls (RJFs) that we compared with. The variants in each breed are enriched in non-coding regions, while those in coding regions are largely tolerant, suggesting that most variants might affect cis-regulatory sequences. Based on 27 million bi-allelic single nucleotide polymorphisms identified in the chickens, we found numerous selective sweeps and affected genes in each indigenous chicken breed and substantially larger numbers of selective sweeps and affected genes in the broilers and layers than previously reported using a rigorous statistical model. Consistent with the locations of the variants, the vast majority (~ 98.3%) of the identified selective sweeps overlap known quantitative trait loci (QTLs). Meanwhile, 74.2% known QTLs overlap our identified selective sweeps. We confirmed most of previously identified trait-related genes and identified many novel ones, some of which might be related to body size and high egg production traits. Using RT-qPCR, we validated differential expression of eight genes (GHR, GHRHR, IGF2BP1, OVALX, ELF2, MGARP, NOCT, SLC25A15) that might be related to body size and high egg production traits in relevant tissues of relevant breeds. CONCLUSION: We identify 30 million single nucleotide variants and small indels in the five indigenous chicken breeds, 10 million of which are novel. 
We predict substantially more selective sweeps and affected genes than previously reported in both indigenous and commercial breeds. These variants and affected genes are good candidates for further experimental investigations of genotype-phenotype relationships and practical applications in chicken breeding programs.


Subject(s)
Chickens , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Selection, Genetic , Animals , Chickens/genetics , Genome , INDEL Mutation , Breeding , Phenotype , Genomics/methods
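The two overlap percentages reported in this abstract (sweeps overlapping QTLs, and QTLs overlapping sweeps) are asymmetric by construction. The sketch below shows how such fractions can be computed from interval lists. The coordinates and function names are hypothetical, and the abstract does not state the authors' exact overlap criterion, so the one-base minimum used here is an assumption.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_overlapping(queries, targets):
    """Fraction of query intervals that overlap at least one target interval."""
    hits = sum(1 for q in queries if any(overlaps(q, t) for t in targets))
    return hits / len(queries)


# Hypothetical coordinates, for illustration only (not data from the study).
sweeps = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
qtls = [("chr1", 150, 400), ("chr2", 10, 60)]

sweeps_in_qtls = fraction_overlapping(sweeps, qtls)  # 2 of 3 sweeps hit a QTL
qtls_in_sweeps = fraction_overlapping(qtls, sweeps)  # 2 of 2 QTLs hit a sweep
```

Swapping the argument order changes the denominator, which is why the two percentages need not agree.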
2.
BMC Genomics ; 25(1): 430, 2024 May 01.
Article in English | MEDLINE | ID: mdl-38693501

ABSTRACT

BACKGROUND: Although multiple chicken genomes have been assembled and annotated, the numbers of protein-coding genes in chicken genomes and their variation among breeds are still uncertain due to the low quality of these genome assemblies and limited resources used in their gene annotations. To fill these gaps, we recently assembled genomes of four indigenous chicken breeds with distinct traits at chromosome-level. In this study, we annotated genes in each of these assembled genomes using a combination of RNA-seq- and homology-based approaches. RESULTS: We identified varying numbers (17,497-17,718) of protein-coding genes in the four indigenous chicken genomes, while recovering 51 of the 274 "missing" genes in birds in general, and 36 of the 174 "missing" genes in chickens in particular. Intriguingly, based on deeply sequenced RNA-seq data collected in multiple tissues in the four breeds, we found 571 ~ 627 protein-coding genes in each genome, which were missing in the annotations of the reference chicken genomes (GRCg6a and GRCg7b/w). After removing redundancy, we ended up with a total of 1,420 newly annotated genes (NAGs). The NAGs tend to be found in subtelomeric regions of macro-chromosomes (chr1 to chr5, plus chrZ) and middle chromosomes (chr6 to chr13, plus chrW), as well as in micro-chromosomes (chr14 to chr39) and unplaced contigs, where G/C contents are high. Moreover, the NAGs have elevated quadruplexes G frequencies, while both G/C contents and quadruplexes G frequencies in their surrounding regions are also high. The NAGs showed tissue-specific expression, and we were able to verify 39 (92.9%) of 42 randomly selected ones in various tissues of the four chicken breeds using RT-qPCR experiments. Most of the NAGs were also encoded in the reference chicken genomes, thus, these genomes might harbor more genes than previously thought. 
CONCLUSION: The NAGs are widely distributed in wild, indigenous and commercial chickens, and they might play critical roles in chicken physiology. Counting these new genes, chicken genomes harbor more genes than originally thought.


Subject(s)
Chickens , Genome , Molecular Sequence Annotation , Animals , Chickens/genetics , Base Composition , Telomere/genetics , Chromosomes/genetics , Genomics/methods
3.
BMC Genomics ; 24(1): 88, 2023 Feb 24.
Article in English | MEDLINE | ID: mdl-36829151

ABSTRACT

BACKGROUND: The stress response of Saccharomyces cerevisiae has been extensively studied in the past decade. However, with the advent of recent technology in single-cell transcriptome profiling, there is a new opportunity to expand and further our understanding of the yeast stress response with greater resolution at the system level. To understand transcriptomic changes in baker's yeast S. cerevisiae cells under stress conditions, we sequenced 117 yeast cells under three stress treatments (hypotonic condition, glucose starvation and amino acid starvation) using a full-length single-cell RNA-Seq method. RESULTS: We found that though single cells from the same treatment showed varying degrees of uniformity, technical noise and batch effects can confound results significantly. However, upon careful selection of samples to reduce technical artifacts and account for batch effects, we were able to capture distinct transcriptomic signatures for different stress conditions as well as putative regulatory relationships between transcription factors and target genes. CONCLUSION: Our results show that a full-length single-cell-based transcriptomic analysis of yeast may help paint a clearer picture of how the model organism responds to stress than do bulk cell population-based methods.


Subject(s)
Saccharomyces cerevisiae Proteins , Saccharomyces cerevisiae , Saccharomyces cerevisiae/genetics , Transcriptome , Gene Expression Profiling , Saccharomyces cerevisiae Proteins/genetics , Transcription Factors/metabolism
4.
BMC Biol ; 20(1): 221, 2022 Oct 05.
Article in English | MEDLINE | ID: mdl-36199141

ABSTRACT

BACKGROUND: Predicting cis-regulatory modules (CRMs) in a genome and their functional states in various cell/tissue types of the organism are two related challenging computational tasks. Most current methods attempt to simultaneously achieve both using data of multiple epigenetic marks in a cell/tissue type. Though conceptually attractive, they suffer high false discovery rates and limited applications. To fill the gaps, we proposed a two-step strategy to first predict a map of CRMs in the genome, and then predict functional states of all the CRMs in various cell/tissue types of the organism. We have recently developed an algorithm for the first step that was able to more accurately and completely predict CRMs in a genome than existing methods by integrating numerous transcription factor ChIP-seq datasets in the organism. Here, we presented machine-learning methods for the second step. RESULTS: We showed that functional states in a cell/tissue type of all the CRMs in the genome could be accurately predicted using data of only 1~4 epigenetic marks by a variety of machine-learning classifiers. Our predictions are substantially more accurate than the best achieved so far. Interestingly, a model trained on a cell/tissue type in humans can accurately predict functional states of CRMs in different cell/tissue types of humans as well as of mice, and vice versa. Therefore, epigenetic code that defines functional states of CRMs in various cell/tissue types is universal at least in humans and mice. Moreover, we found that from tens to hundreds of thousands of CRMs were active in a human and mouse cell/tissue type, and up to 99.98% of them were reutilized in different cell/tissue types, while as small as 0.02% of them were unique to a cell/tissue type that might define the cell/tissue type. CONCLUSIONS: Our two-step approach can accurately predict functional states in any cell/tissue type of all the CRMs in the genome using data of only 1~4 epigenetic marks. 
Our approach is also more cost-effective than existing methods that typically use data of more epigenetic marks. Our results suggest common epigenetic rules for defining functional states of CRMs in various cell/tissue types in humans and mice.


Subject(s)
Genome , Transcription Factors , Algorithms , Animals , Binding Sites , Epigenesis, Genetic , Gene Expression Regulation , Humans , Mice , Transcription Factors/metabolism
5.
BMC Genomics ; 23(1): 714, 2022 Oct 19.
Article in English | MEDLINE | ID: mdl-36261804

ABSTRACT

BACKGROUND: Mouse is probably the most important model organism to study mammal biology and human diseases. A better understanding of the mouse genome will help understand the human genome, biology and diseases. However, despite the recent progress, the characterization of the regulatory sequences in the mouse genome is still far from complete, limiting its use to understand the regulatory sequences in the human genome. RESULTS: Here, by integrating binding peaks in ~ 9,000 transcription factor (TF) ChIP-seq datasets that cover 79.9% of the mouse mappable genome using an efficient pipeline, we were able to partition these binding peak-covered genome regions into a cis-regulatory module (CRM) candidate (CRMC) set and a non-CRMC set. The CRMCs contain 912,197 putative CRMs and 38,554,729 TF binding sites (TFBSs) islands, covering 55.5% and 24.4% of the mappable genome, respectively. The CRMCs tend to be under strong evolutionary constraints, indicating that they are likely cis-regulatory; while the non-CRMCs are largely selectively neutral, indicating that they are unlikely cis-regulatory. Based on evolutionary profiles of the genome positions, we further estimated that 63.8% and 27.4% of the mouse genome might code for CRMs and TFBSs, respectively. CONCLUSIONS: Validation using experimental data suggests that at least most of the CRMCs are authentic. Thus, this unprecedentedly comprehensive map of CRMs and TFBSs can be a good resource to guide experimental studies of regulatory genomes in mice and humans.


Subject(s)
Genome, Human , Regulatory Elements, Transcriptional , Humans , Mice , Animals , Regulatory Elements, Transcriptional/genetics , Binding Sites/genetics , Protein Binding , Transcription Factors/genetics , Transcription Factors/metabolism , Mammals/genetics
6.
BMC Genomics ; 23(1): 173, 2022 Mar 02.
Article in English | MEDLINE | ID: mdl-35236293

ABSTRACT

BACKGROUND: Melanin is an important antioxidant in food and has been used in medicine and cosmetology. Chicken meat with high melanin content from black-boned chickens has been considered a highly nutritious food with potential medicinal properties. The molecular mechanism of melanogenesis in the skeletal muscle of black-boned chickens remains poorly understood. This study investigated the gene-metabolite associations regulating the muscle melanogenesis pathways in Wuliangshan black-boned chickens, with two normal-boned chicken breeds as controls. RESULTS: We identified 25 differentially expressed genes and 11 transcription factors in the melanogenesis pathways. High levels of the meat flavor compounds inosine monophosphate, hypoxanthine, lysophospholipid, hydroxyoctadecadienoic acid, and nicotinamide mononucleotide were found in Wuliangshan black-boned chickens. CONCLUSION: Integrative analysis of transcriptomics and metabolomics revealed the dual physiological functions of the PDZK1 gene, which is involved in pigmentation and/or melanogenesis and in regulating the phospholipid signaling processes in the muscle of black-boned chickens.


Subject(s)
Chickens , Transcriptome , Animals , Chickens/genetics , Meat , Metabolomics , Muscle, Skeletal
7.
Bioinformatics ; 37(19): 3235-3242, 2021 Oct 11.
Article in English | MEDLINE | ID: mdl-33961003

ABSTRACT

MOTIVATION: Recent breakthroughs of single-cell RNA sequencing (scRNA-seq) technologies offer an exciting opportunity to identify heterogeneous cell types in complex tissues. However, the unavoidable biological noise and technical artifacts in scRNA-seq data as well as the high dimensionality of expression vectors make the problem highly challenging. Consequently, although numerous tools have been developed, their accuracy remains to be improved. RESULTS: Here, we introduce a novel clustering algorithm and tool RCSL (Rank Constrained Similarity Learning) to accurately identify various cell types using scRNA-seq data from a complex tissue. RCSL considers both local similarity and global similarity among the cells to discern the subtle differences among cells of the same type as well as larger differences among cells of different types. RCSL uses Spearman's rank correlations of a cell's expression vector with those of other cells to measure its global similarity, and adaptively learns neighbor representation of a cell as its local similarity. The overall similarity of a cell to other cells is a linear combination of its global similarity and local similarity. RCSL automatically estimates the number of cell types defined in the similarity matrix, and identifies them by constructing a block-diagonal matrix, such that its distance to the similarity matrix is minimized. Each block-diagonal submatrix is a cell cluster/type, corresponding to a connected component in the cognate similarity graph. When tested on 16 benchmark scRNA-seq datasets in which the cell types are well-annotated, RCSL substantially outperformed six state-of-the-art methods in accuracy and robustness as measured by three metrics. AVAILABILITY AND IMPLEMENTATION: The RCSL algorithm is implemented in R and can be freely downloaded at https://cran.r-project.org/web/packages/RCSL/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
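The global-similarity idea in this abstract (Spearman rank correlation between cells, then clusters read off as connected components of a similarity graph) can be illustrated with a small standard-library sketch. This is not the RCSL implementation: the fixed correlation threshold stands in for RCSL's rank-constrained block-diagonal optimization, the local-similarity term is omitted, and the toy expression values and function names are hypothetical.

```python
from itertools import combinations


def average_ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def connected_components(n, edges):
    """Label nodes 0..n-1 by connected component using union-find."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a, b in edges:
        parent[find(a)] = find(b)
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def cluster_cells(cells, threshold=0.8):
    """Assumed simplification: link cells whose Spearman correlation >= threshold."""
    edges = [(i, j) for i, j in combinations(range(len(cells)), 2)
             if spearman(cells[i], cells[j]) >= threshold]
    return connected_components(len(cells), edges)


# Hypothetical expression vectors: rows are cells, columns are genes.
cells = [
    [9.0, 7.0, 1.0, 0.0],
    [8.0, 6.5, 0.5, 0.2],
    [0.1, 1.0, 6.0, 9.0],
    [0.0, 0.8, 7.0, 8.5],
]
labels = cluster_cells(cells)
```

Each label corresponds to one connected component, mirroring the abstract's statement that each block of the block-diagonal matrix is a cell cluster.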

8.
BMC Microbiol ; 22(1): 132, 2022 May 14.
Article in English | MEDLINE | ID: mdl-35568809

ABSTRACT

BACKGROUND: Microbiota play important roles in the gastrointestinal tract (GIT) of dairy cattle as the communities are responsible for host health, growth, and production performance. However, a systematic characterization and comparison of microbial communities in the GIT of cattle housed in different management units on a modern dairy farm are still lacking. We used 16S rRNA gene sequencing to evaluate the fecal bacterial communities of 90 dairy cattle housed in 12 distinctly defined management units on a modern dairy farm. RESULTS: We found that cattle from management units 5, 6, 8, and 9 had similar bacterial communities while the other units showed varying levels of differences. Hutch calves had a dramatically different bacterial community than adult cattle, with at least 10 genera exclusively detected in their samples but not in non-neonatal cattle. Moreover, we compared fecal bacteria of cattle from every pair of the management units and detailed the number and relative abundance of the significantly differential genera. Lastly, we identified 181 pairs of strongly correlated taxa in the community, showing possible synergistic or antagonistic relationships. CONCLUSIONS: This study assesses the fecal microbiota of cattle from 12 distinctly defined management units along the production line on a California dairy farm. The results highlight the similarities and differences of fecal microbiota between cattle from each pair of the management units. Especially, the data indicate that the newborn calves host very different gut bacterial communities than non-neonatal cattle, while non-neonatal cattle adopt one of the two distinct types of gut bacterial communities with subtle differences among the management units. The gut microbial communities of dairy cattle change dramatically in bacterial abundances at different taxonomic levels along the production line. The findings provide a reference for research and practice in modern dairy farm management.


Subject(s)
Microbiota , Animals , Bacteria/genetics , Cattle , Feces/microbiology , Gastrointestinal Tract/microbiology , RNA, Ribosomal, 16S/genetics
9.
J Appl Microbiol ; 133(5): 2915-2930, 2022 Nov.
Article in English | MEDLINE | ID: mdl-35882518

ABSTRACT

Intestinal microbiota is considered to play an integral role in maintaining health of host by modulating several physiological functions including nutrition, metabolism and immunity. Accumulated data from human and animal studies indicate that intestinal microbes can affect lipid metabolism in host through various direct and indirect biological mechanisms. These mechanisms include the production of various signalling molecules by the intestinal microbiome, which exert a strong effect on lipid metabolism, bile secretion in the liver, reverse transport of cholesterol and energy expenditure and insulin sensitivity in peripheral tissues. This review discusses the findings of recent studies suggesting an emerging role of intestinal microbiota and its metabolites in regulating lipid metabolism and the association of intestinal microbiota with obesity. Additionally, we discuss the controversies and challenges in this research area. However, intestinal micro-organisms are also affected by some external factors, which in turn influence the regulation of microbial lipid metabolism. Therefore, we also discuss the effects of probiotics, prebiotics, diet structure, exercise and other factors on intestinal microbiological changes and lipid metabolism regulation.


Subject(s)
Gastrointestinal Microbiome , Probiotics , Animals , Humans , Prebiotics , Lipid Metabolism , Obesity/microbiology
10.
Bioinformatics ; 36(20): 5054-5060, 2020 Dec 22.
Article in English | MEDLINE | ID: mdl-32653907

ABSTRACT

MOTIVATION: Biclustering has emerged as a powerful approach to identifying functional patterns in complex biological data. However, existing tools are limited in their accuracy and efficiency in recognizing various kinds of complex biclusters submerged in ever larger datasets. We introduce a novel, fast and highly accurate algorithm, RecBic, to identify various forms of complex biclusters in gene expression datasets. RESULTS: We designed RecBic to identify various trend-preserving biclusters, particularly those with narrow shapes, i.e. clusters where the number of genes is larger than the number of conditions/samples. Given a gene expression matrix, RecBic starts with a column seed and grows it into a full-sized bicluster by simply repetitively comparing real numbers. When tested on simulated datasets in which the elements of implanted trend-preserving biclusters and those of the background matrix have the same distribution, RecBic was able to identify the implanted biclusters in a nearly perfect manner, outperforming all the compared salient tools in terms of accuracy and robustness to noise and overlaps between the clusters. Moreover, RecBic also showed superiority in identifying functionally related genes in real gene expression datasets. AVAILABILITY AND IMPLEMENTATION: Code, sample input data and usage instructions are available at the following websites. Code: https://github.com/holyzews/RecBic/tree/master/RecBic/. Data: http://doi.org/10.5281/zenodo.3842717. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Gene Expression Profiling , Cluster Analysis , Gene Expression
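As a rough illustration of what "trend-preserving" means in this abstract, the sketch below treats rows (genes) as sharing a trend when their values rise and fall in the same order across a chosen set of columns (conditions), which can be checked by comparing real numbers only. It is not RecBic's seed-growing procedure; the matrix, function names, and tie handling (ties broken by column index) are assumptions made for the example.

```python
def trend(row, cols):
    """Columns ordered by the row's values; ties are broken by column index."""
    return tuple(sorted(cols, key=lambda c: row[c]))


def rows_sharing_trend(matrix, seed_row, cols):
    """Indices of rows whose ordering over `cols` matches the seed row's ordering."""
    target = trend(matrix[seed_row], cols)
    return [i for i, row in enumerate(matrix) if trend(row, cols) == target]


# Hypothetical expression matrix: rows are genes, columns are conditions.
matrix = [
    [1.0, 5.0, 3.0, 9.0],
    [10.0, 50.0, 30.0, 2.0],
    [0.2, 0.9, 0.5, 0.1],
    [4.0, 1.0, 2.0, 3.0],
]

# Over the first three conditions, genes 0, 1 and 2 share one trend
# despite very different magnitudes.
narrow = rows_sharing_trend(matrix, 0, [0, 1, 2])

# Adding the fourth condition breaks gene 0 away from genes 1 and 2.
wider_from_0 = rows_sharing_trend(matrix, 0, [0, 1, 2, 3])
wider_from_1 = rows_sharing_trend(matrix, 1, [0, 1, 2, 3])
```

The example shows the usual trade-off: adding columns to a bicluster can only keep or shrink the set of rows that preserve the trend.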
11.
BMC Genomics ; 21(1): 537, 2020 Aug 04.
Article in English | MEDLINE | ID: mdl-32753030

ABSTRACT

BACKGROUND: Protein phosphorylation by kinases plays crucial roles in various biological processes including signal transduction and tumorigenesis, thus a better understanding of protein phosphorylation events in cells is fundamental for studying protein functions and designing drugs to treat diseases caused by the malfunction of phosphorylation. Although a large number of phosphorylation sites in proteins have been identified using high-throughput phosphoproteomic technologies, their specific catalyzing kinases remain largely unknown. Therefore, computational methods are urgently needed to predict the kinases that catalyze the phosphorylation of these sites. RESULTS: We developed KSP, a new algorithm for predicting catalyzing kinases for experimentally identified phosphorylation sites in human proteins. KSP constructs a network based on known protein-protein interactions and kinase-substrate relationships. Based on the network, it computes an affinity score between a phosphorylation site and kinases, and returns the top-ranked kinases of the score as candidate catalyzing kinases. When tested on known kinase-substrate pairs, KSP outperforms existing methods including NetworKIN, iGPS, and PKIS. CONCLUSIONS: We developed a novel accurate tool for predicting catalyzing kinases of known phosphorylation sites. It can work as a complementary network approach for sequence-based phosphorylation site predictors.


Subject(s)
Protein Kinases , Signal Transduction , Algorithms , Humans , Phosphorylation , Protein Kinases/metabolism
12.
Bioinformatics ; 35(22): 4632-4639, 2019 Nov 01.
Article in English | MEDLINE | ID: mdl-31070745

ABSTRACT

MOTIVATION: The availability of numerous ChIP-seq datasets for transcription factors (TF) has provided an unprecedented opportunity to identify all TF binding sites in genomes. However, the progress has been hindered by the lack of a highly efficient and accurate tool to find not only the target motifs, but also cooperative motifs in very big datasets. RESULTS: We herein present an ultrafast and accurate motif-finding algorithm, ProSampler, based on a novel numeration method and Gibbs sampler. ProSampler runs orders of magnitude faster than the fastest existing tools while often more accurately identifying motifs of both the target TFs and cooperators. Thus, ProSampler can greatly facilitate the efforts to identify the entire cis-regulatory code in genomes. AVAILABILITY AND IMPLEMENTATION: Source code and binaries are freely available for download at https://github.com/zhengchangsulab/prosampler. It was implemented in C++ and supported on Linux, macOS and MS Windows platforms. SUPPLEMENTARY INFORMATION: Supplementary materials are available at Bioinformatics online.


Subject(s)
Chromatin Immunoprecipitation Sequencing , Software , Algorithms , Binding Sites , Chromatin Immunoprecipitation
13.
Nucleic Acids Res ; 46(11): 5395-5409, 2018 Jun 20.
Article in English | MEDLINE | ID: mdl-29733395

ABSTRACT

Accumulating evidence indicates that transcription factor (TF) binding sites, or cis-regulatory elements (CREs), and their clusters termed cis-regulatory modules (CRMs) play a more important role than do gene-coding sequences in specifying complex traits in humans, including the susceptibility to common complex diseases. To fully characterize their roles in deriving the complex traits/diseases, it is necessary to annotate all CREs and CRMs encoded in the human genome. However, the current annotations of CREs and CRMs in the human genome are still very limited and mostly coarse-grained, as they often lack detailed information on the CREs in CRMs. Here, we integrated 620 TF ChIP-seq datasets produced by the ENCODE project for 168 TFs in 79 different cell/tissue types and predicted an unprecedentedly complete map of CREs in CRMs in the human genome at single-nucleotide resolution. The map includes 305 912 CRMs containing a total of 1 178 913 CREs belonging to 736 unique TF binding motifs. The predicted CREs and CRMs tend to be subject to either purifying selection or positive selection, and thus are likely to be functional. Based on the results, we also examined the status of available ChIP-seq datasets for predicting the entire regulatory genome of humans.


Subject(s)
Base Sequence/genetics , Genome, Human/genetics , Regulatory Elements, Transcriptional/genetics , Regulatory Sequences, Nucleic Acid/genetics , Algorithms , Binding Sites , Cell Line, Tumor , Genetic Predisposition to Disease/genetics , HeLa Cells , Humans
14.
BMC Genomics ; 20(1): 709, 2019 Sep 12.
Article in English | MEDLINE | ID: mdl-31510916

ABSTRACT

BACKGROUND: Although DNA sequence plays a crucial role in establishing the unique epigenome of a cell type, little is known about the sequence determinants that lead to the unique epigenomes of different cell types produced during cell differentiation. To fill this gap, we employed two types of deep convolutional neural networks (CNNs) constructed for each of differentially related cell types and for each of histone marks measured in the cells, to learn the sequence determinants of various histone modification patterns in each cell type. RESULTS: We applied our models to four differentially related human CD4+ T cell types and six histone marks measured in each cell type. The cell models can accurately predict the histone marks in each cell type, while the mark models can also accurately predict the cell types based on a single mark. Sequence motifs learned by both the cell or mark models are highly similar to known binding motifs of transcription factors known to play important roles in CD4+ T cell differentiation. Both the unique histone mark patterns in each cell type and the different patterns of the same histone mark in different cell types are determined by a set of motifs with unique combinations. Interestingly, the level of sharing motifs learned in the different cell models reflects the lineage relationships of the cells, while the level of sharing motifs learned in the different histone mark models reflects their functional relationships. These models can also enable the prediction of the importance of learned motifs and their interactions in determining specific histone mark patterns in the cell types. CONCLUSION: Sequence determinants of various histone modification patterns in different cell types can be revealed by comparative analysis of motifs learned in the CNN models for multiple cell types and histone marks. 
The learned motifs are interpretable and may provide insights into the underlying molecular mechanisms of establishing the unique epigenomes in different cell types. Thus, our results support the hypothesis that DNA sequences ultimately determine the unique epigenomes of different cell types through their interactions with transcriptional factors, epigenome remodeling system and extracellular cues during cell differentiation.


Subject(s)
Cell Differentiation/genetics , Deep Learning , Epigenomics , CD4-Positive T-Lymphocytes/cytology , CD4-Positive T-Lymphocytes/metabolism , Cell Lineage , Conserved Sequence , Histone Code , Humans , Nucleotide Motifs/genetics
15.
Biochem Biophys Res Commun ; 519(4): 714-720, 2019 Nov 19.
Article in English | MEDLINE | ID: mdl-31543345

ABSTRACT

Proteases play critical roles in a wide variety of fundamental biological functions, and numerous protease inhibitors have been developed to treat various diseases including cancer. A wide range of experimental and computational methods have been developed to investigate the specificity and catalytic mechanisms of proteases. However, these methods have focused only on the preferences at a single position around a cleavage site in a substrate, and rarely on the compositionality of the subsites. We present new methods to quantify the specificity of proteases by considering the combinatorial patterns of amino acid residues at cleavage sites in substrates. By incorporating the preference at positions, we modeled three types of favorable combinations of residues in cleavage sites. Moreover, by constructing a relationship weight matrix of residues between two positions, we can easily identify unfavorable combinations of residues at the positions. Applying these methods to a set of known cleavage sites of proteases, we revealed numerous favorable and unfavorable residues in cooperative positions in the protease cleavage sites. The results can help understand the specificity and catalytic mechanisms of proteases. To our knowledge, this is the first study that quantifies unfavorable combinations of amino acids between two sites. Furthermore, this method is not limited to the study of proteases and cleavage sites, and can be generalized to uncover the relationships of residues at meaningful sites in other proteins.


Subject(s)
Algorithms , Amino Acids/metabolism , Models, Theoretical , Peptide Hydrolases/metabolism , Amino Acid Sequence , Amino Acids/genetics , Animals , Binding Sites/genetics , Biocatalysis , Humans , Peptide Hydrolases/genetics , Substrate Specificity
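One plausible way to score favorable and unfavorable residue combinations between two positions is a log-odds matrix comparing observed pair frequencies with what independent positions would give. The abstract does not define its relationship weight matrix, so the scoring scheme, the pseudocount, the toy cleavage-site strings, and the function name below are all assumptions for illustration.

```python
import math
from collections import Counter


def pair_log_odds(sites, pos_a, pos_b, pseudocount=0.5):
    """log2(observed pair frequency / frequency expected if positions were independent).

    Positive scores suggest a favored combination, negative scores a disfavored one.
    The pseudocount keeps unobserved pairs finite (an assumed smoothing choice).
    """
    pairs = Counter((s[pos_a], s[pos_b]) for s in sites)
    res_a = sorted({a for a, _ in pairs})
    res_b = sorted({b for _, b in pairs})
    total = len(sites) + pseudocount * len(res_a) * len(res_b)
    joint = {(a, b): (pairs[(a, b)] + pseudocount) / total
             for a in res_a for b in res_b}
    marg_a = {a: sum(joint[(a, b)] for b in res_b) for a in res_a}
    marg_b = {b: sum(joint[(a, b)] for a in res_a) for b in res_b}
    return {(a, b): math.log2(p / (marg_a[a] * marg_b[b]))
            for (a, b), p in joint.items()}


# Hypothetical aligned cleavage-site windows (three positions each).
sites = ["ADL", "ACL", "GDI", "GCI"]

# Compare the first and third positions: A pairs with L, G pairs with I.
scores = pair_log_odds(sites, 0, 2)
```

In this toy set, the pairs (A, L) and (G, I) score above zero and the never-observed pairs (A, I) and (G, L) score below zero.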
16.
Bioinformatics ; 31(12): 1974-80, 2015 Jun 15.
Article in English | MEDLINE | ID: mdl-25805722

ABSTRACT

MOTIVATION: The recent advance of single-cell technologies has brought new insights into complex biological phenomena. In particular, genome-wide single-cell measurements such as transcriptome sequencing enable the characterization of cellular composition as well as functional variation in homogenic cell populations. An important step in the single-cell transcriptome analysis is to group cells that belong to the same cell types based on gene expression patterns. The corresponding computational problem is to cluster a noisy high dimensional dataset with substantially fewer objects (cells) than the number of variables (genes). RESULTS: In this article, we describe a novel algorithm named shared nearest neighbor (SNN)-Cliq that clusters single-cell transcriptomes. SNN-Cliq utilizes the concept of shared nearest neighbor that shows advantages in handling high-dimensional data. When evaluated on a variety of synthetic and real experimental datasets, SNN-Cliq outperformed the state-of-the-art methods tested. More importantly, the clustering results of SNN-Cliq reflect the cell types or origins with high accuracy. AVAILABILITY AND IMPLEMENTATION: The algorithm is implemented in MATLAB and Python. The source code can be downloaded at http://bioinfo.uncc.edu/SNNCliq.


Subject(s)
Algorithms , Cell Lineage/genetics , Cluster Analysis , Embryo, Mammalian/metabolism , Neoplasms/genetics , Single-Cell Analysis/methods , Transcriptome , Animals , Embryo, Mammalian/cytology , Gene Expression Regulation , High-Throughput Nucleotide Sequencing/methods , Humans , Mice , Programming Languages
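The shared-nearest-neighbor concept named in this abstract can be sketched as follows: two cells are considered similar when their k-nearest-neighbor lists overlap. This minimal version simply counts shared neighbors under Euclidean distance; the similarity weighting and the clique-based clustering step used by SNN-Cliq are not reproduced, and the toy points and function names are hypothetical.

```python
import math


def knn(points, k):
    """For each point, the set of indices of its k nearest other points (Euclidean)."""
    neighbors = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        neighbors.append(set(others[:k]))
    return neighbors


def snn_counts(points, k):
    """Matrix whose (i, j) entry is the number of neighbors shared by points i and j."""
    nn = knn(points, k)
    n = len(points)
    return [[len(nn[i] & nn[j]) if i != j else 0 for j in range(n)]
            for i in range(n)]


# Hypothetical 2-D "expression profiles" forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
snn = snn_counts(points, k=2)
```

Cells within the same group share at least one neighbor, while cells from different groups share none, which is the property that makes the measure useful in high-dimensional, noisy data.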
17.
Dev Biol ; 393(2): 236-244, 2014 Sep 15.
Article in English | MEDLINE | ID: mdl-25050933

ABSTRACT

The nematode Caenorhabditis elegans (C. elegans) is an ideal model organism to study the cell fate specification mechanisms during embryogenesis. It is generally believed that cell fate specification in C. elegans is mainly mediated by lineage-based mechanisms, where the specification paths are driven forward by a succession of asymmetric cell divisions. However, little is known about how each binary decision is made by gene regulatory programs. In this study, we endeavor to obtain a global understanding of cell lineage/fate divergence processes during the early embryogenesis of C. elegans. We reanalyzed the EPIC data set, which traced the expression level of reporter genes at single-cell resolution on a nearly continuous time scale up to the 350-cell stage in C. elegans embryos. We examined the expression patterns for a total of 131 genes from 287 embryos with high quality image recordings, among which 86 genes have replicate embryos. Our results reveal that during early embryogenesis, divergence between sister lineages could be largely explained by a few genes. We predicted genes driving lineage divergence and explored their expression patterns in sister lineages. Moreover, we found that divisions leading to fate divergence are associated with a large number of genes being differentially expressed between sister lineages. Interestingly, we found that the developmental paths of lineages could be differentiated by a small set of genes. Therefore, our results support the notion that the cell fate patterns in C. elegans are achieved through stepwise binary decisions punctuated by cell divisions. Our predicted genes driving lineage divergence provide good starting points for future detailed characterization of their roles in the embryogenesis in this important model organism.


Subject(s)
Caenorhabditis elegans/genetics , Cell Lineage/genetics , Embryonic Development/genetics , Animals , Caenorhabditis elegans Proteins/genetics , Cell Differentiation/genetics , Cell Division , Gene Expression , Gene Expression Regulation, Developmental , Genes, Reporter/genetics
18.
BMC Genomics ; 15: 1047, 2014 Dec 02.
Article in English | MEDLINE | ID: mdl-25442502

ABSTRACT

BACKGROUND: In eukaryotes, transcriptional regulation is usually mediated by interactions of multiple transcription factors (TFs) with their respective specific cis-regulatory elements (CREs) in the so-called cis-regulatory modules (CRMs) in DNA. Although knowledge of the CREs and CRMs in a genome is crucial for elucidating gene regulatory networks and understanding many important biological phenomena, little is known about the CREs and CRMs in most eukaryotic genomes because they are difficult to characterize by either computational or traditional experimental methods. However, the exponentially increasing volume of TF binding location data produced by the recent wide adoption of chromatin immunoprecipitation coupled with microarray hybridization (ChIP-chip) or high-throughput sequencing (ChIP-seq) technologies has provided an unprecedented opportunity to identify CRMs and CREs in genomes. Nonetheless, effectively mining these large volumes of ChIP data to identify CREs and CRMs at nucleotide resolution is a highly challenging task. RESULTS: We have developed a novel graph-theoretic algorithm, DePCRM, for genome-wide de novo prediction of CREs and CRMs using a large number of ChIP datasets. DePCRM predicts CREs and CRMs by identifying overrepresented combinatorial CRE motif patterns in multiple ChIP datasets in an effective way. When applied to 168 ChIP datasets of 56 TFs from D. melanogaster, DePCRM identified 184 and 746 overrepresented CRE motifs and their combinatorial patterns, respectively, and predicted a total of 115,932 CRMs in the genome. The predictions recover 77.9% of known CRMs in the datasets and 89.3% of known CRMs containing at least one predicted CRE. We found that the putative CRMs as well as CREs as a whole in a CRM are more conserved than randomly selected sequences. CONCLUSION: Our results suggest that the CRMs predicted by DePCRM are highly likely to be functional. Our algorithm is the first of its kind for de novo genome-wide prediction of CREs and CRMs using a large number of transcription factor ChIP datasets. The algorithm and predictions will hopefully facilitate the elucidation of gene regulatory networks in eukaryotes. All the predicted CREs, CRMs, and their target genes are available at http://bioinfo.uncc.edu/mniu/pcrms/www/.


Subject(s)
Drosophila melanogaster/genetics , Regulatory Elements, Transcriptional/genetics , Regulatory Sequences, Nucleic Acid/genetics , Software , Algorithms , Animals , Base Sequence , Binding Sites/genetics , Chromatin Immunoprecipitation , Gene Expression Regulation , Genome , Humans , Microarray Analysis , Protein Binding/genetics , Transcription Factors/genetics
19.
Adv Genet (Hoboken) ; 5(2): 2300209, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38884049

ABSTRACT

The VISTA enhancer database is a valuable resource for evaluating predicted enhancers in humans and mice. In addition to thousands of validated positive regions (VPRs) in the human and mouse genomes, the database also contains similar numbers of validated negative regions (VNRs). It was previously shown that the VPRs are on average half as long as the predicted overlapping enhancers that are highly conserved, which led to the hypothesis that the VPRs may be truncated forms of long bona fide enhancers. Here, it is shown that, like the VPRs, the VNRs are also under strong evolutionary constraints and overlap predicted enhancers in the genomes. The VNRs are also on average half as long as the predicted overlapping enhancers that are highly conserved. Moreover, the VNRs and the VPRs display similar cell/tissue-specific modification patterns of key epigenetic marks of active enhancers. Furthermore, the VNRs and the VPRs show similar impact score spectra in in silico mutagenesis. These highly similar properties of the VPRs and the VNRs suggest that, like the VPRs, the VNRs may also be truncated forms of long bona fide enhancers.

20.
bioRxiv ; 2024 Jun 04.
Article in English | MEDLINE | ID: mdl-38895354

ABSTRACT

The oocyte germline of the C. elegans hermaphrodite presents a unique model for studying the formation of oocytes. However, the small size of the animal and the difficulty of retrieving specific stages of the germline have long hindered closer systematic studies of this process. Here, we present a transcriptome-level analysis of oogenesis in C. elegans hermaphrodites. We dissected a hermaphrodite gonad into seven sections corresponding to the mitotic distal region, the pachytene, the diplotene, the early diakinesis region and the 3 most proximal oocytes, and deeply sequenced the transcriptome of each of them along with that of the fertilized egg using a single-cell RNA-seq protocol. We identified specific gene expression events as well as gene splicing events in finer detail along the oocyte germline and provided novel insights into the underlying mechanisms of the oogenesis process. Furthermore, through careful review of the relevant research literature coupled with patterns observed in our analysis, we attempt to delineate transcripts that may serve functions in the interaction between the germline and cells of the somatic gonad. These results expand our knowledge of the transcriptomic space of the C. elegans germline and lay a foundation for future studies of the germline.
