Search | Biblioteca Virtual em Saúde (Virtual Health Library)

1.

GENCODE 2021.

Frankish, Adam; Diekhans, Mark; Jungreis, Irwin; Lagarde, Julien; Loveland, Jane E; Mudge, Jonathan M; Sisu, Cristina; Wright, James C; Armstrong, Joel; Barnes, If; Berry, Andrew; Bignell, Alexandra; Boix, Carles; Carbonell Sala, Silvia; Cunningham, Fiona; Di Domenico, Tomás; Donaldson, Sarah; Fiddes, Ian T; García Girón, Carlos; Gonzalez, Jose Manuel; Grego, Tiago; Hardy, Matthew; Hourlier, Thibaut; Howe, Kevin L; Hunt, Toby; Izuogu, Osagie G; Johnson, Rory; Martin, Fergal J; Martínez, Laura; Mohanan, Shamika; Muir, Paul; Navarro, Fabio C P; Parker, Anne; Pei, Baikang; Pozo, Fernando; Riera, Ferriol Calvet; Ruffier, Magali; Schmitt, Bianca M; Stapleton, Eloise; Suner, Marie-Marthe; Sycheva, Irina; Uszczynska-Ratajczak, Barbara; Wolf, Maxim Y; Xu, Jinuri; Yang, Yucheng T; Yates, Andrew; Zerbino, Daniel; Zhang, Yan; Choudhary, Jyoti S; Gerstein, Mark.

Nucleic Acids Res ; 49(D1): D916-D923, 2021 01 08.

Article in English | MEDLINE | ID: mdl-33270111

ABSTRACT

The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.

Subjects

COVID-19/prevention & control , Computational Biology/methods , Databases, Genetic , Genomics/methods , Molecular Sequence Annotation/methods , SARS-CoV-2/genetics , Animals , COVID-19/epidemiology , COVID-19/virology , Epidemics , Humans , Internet , Mice , Pseudogenes/genetics , RNA, Long Noncoding/genetics , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Transcription, Genetic/genetics

2.

Target and tissue selectivity of PROTAC degraders.

Guenette, Robert G; Yang, Seung Wook; Min, Jaeki; Pei, Baikang; Potts, Patrick Ryan.

Chem Soc Rev ; 51(14): 5740-5756, 2022 Jul 18.

Article in English | MEDLINE | ID: mdl-35587208

ABSTRACT

Targeted protein degradation (TPD) strategies have revolutionized how scientists tackle challenging protein targets deemed undruggable with traditional small molecule inhibitors. Many promising campaigns to inhibit proteins have failed due to factors surrounding inhibition selectivity and targeting of compounds to specific tissues and cell types. One of the major improvements that PROTAC (proteolysis targeting chimera) and molecular glue technology can exert is highly selective control of target inhibition. Multiple studies have shown that PROTACs can gain selectivity for their protein targets beyond that of their parent ligands via optimization of linker length and stabilization of ternary complexes. Due to the bifunctional nature of PROTACs, the tissue selective nature of E3 ligases can be exploited to uncover novel targeting mechanisms. In this review, we provide critical analysis of the recent progress towards making selective PROTAC molecules and new PROTAC technologies that will continue to push the boundaries of achieving selectivity. These efforts have wide implications in the future of treating disease as they will broaden the possible targets that can be addressed by small molecules, like undruggable proteins or broadly active targets that would benefit from degradation in specific tissue types.

Subjects

Proteolysis , Ubiquitin-Protein Ligases , Ligands , Ubiquitin-Protein Ligases/metabolism

3.

GENCODE reference annotation for the human and mouse genomes.

Frankish, Adam; Diekhans, Mark; Ferreira, Anne-Maud; Johnson, Rory; Jungreis, Irwin; Loveland, Jane; Mudge, Jonathan M; Sisu, Cristina; Wright, James; Armstrong, Joel; Barnes, If; Berry, Andrew; Bignell, Alexandra; Carbonell Sala, Silvia; Chrast, Jacqueline; Cunningham, Fiona; Di Domenico, Tomás; Donaldson, Sarah; Fiddes, Ian T; García Girón, Carlos; Gonzalez, Jose Manuel; Grego, Tiago; Hardy, Matthew; Hourlier, Thibaut; Hunt, Toby; Izuogu, Osagie G; Lagarde, Julien; Martin, Fergal J; Martínez, Laura; Mohanan, Shamika; Muir, Paul; Navarro, Fabio C P; Parker, Anne; Pei, Baikang; Pozo, Fernando; Ruffier, Magali; Schmitt, Bianca M; Stapleton, Eloise; Suner, Marie-Marthe; Sycheva, Irina; Uszczynska-Ratajczak, Barbara; Xu, Jinuri; Yates, Andrew; Zerbino, Daniel; Zhang, Yan; Aken, Bronwen; Choudhary, Jyoti S; Gerstein, Mark; Guigó, Roderic; Hubbard, Tim J P.

Nucleic Acids Res ; 47(D1): D766-D773, 2019 01 08.

Article in English | MEDLINE | ID: mdl-30357393

ABSTRACT

The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use any and all publicly available data and analysis, along with the research literature to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl Biomart, Ensembl Perl and REST APIs as well as https://www.gencodegenes.org.

Subjects

Databases, Genetic , Genome, Human/genetics , Genomics , Pseudogenes/genetics , Animals , Computational Biology , Humans , Internet , Mice , Molecular Sequence Annotation , Software

4.

IConMHC: a deep learning convolutional neural network model to predict peptide and MHC-I binding affinity.

Pei, Baikang; Hsu, Yi-Hsiang.

Immunogenetics ; 72(5): 295-304, 2020 07.

Article in English | MEDLINE | ID: mdl-32577798

ABSTRACT

Tumor-specific neoantigens are mutated self-peptides presented by tumor cell major histocompatibility complex (MHC) molecules and are necessary to elicit the host's anti-cancer cytotoxic T cell responses. They can be specifically recognized by neoantigen-specific T cell receptors (TCRs). However, current wet-lab assays for identifying peptide-MHC binding are too expensive and time-consuming to meet clinical needs. In this study, we developed an in silico method with a deep convolutional neural network (CNN) model, iConMHC, to predict peptide-MHC binding affinity. Unlike other in silico methods that learn only from properties of amino acids in neoantigen peptides alone and/or MHCs alone, iConMHC learns from physical and chemical interaction properties between pairwise amino acids from the two molecules. These properties, such as contact potentials and distances in folded proteins, directly affect neoantigen-MHC binding affinity. In addition, iConMHC is a pan-allele model that is capable of making predictions for all MHC alleles. Even for rare MHC alleles without training data, iConMHC can make predictions with reasonable accuracy. We benchmarked iConMHC against other commonly used MHC-I binding predictors and found that our model performs better than most of the pan-allele models.

Subjects

Deep Learning , Histocompatibility Antigens Class I/metabolism , Peptides/metabolism , Alleles , Amino Acid Sequence , Antigens, Neoplasm/chemistry , Antigens, Neoplasm/metabolism , Computer Simulation , Databases, Protein , Histocompatibility Antigens Class I/chemistry , Histocompatibility Antigens Class I/genetics , Humans , Neural Networks, Computer , Peptides/chemistry , Protein Binding , Reproducibility of Results

5.

Comparative analysis of the transcriptome across distant species.

Gerstein, Mark B; Rozowsky, Joel; Yan, Koon-Kiu; Wang, Daifeng; Cheng, Chao; Brown, James B; Davis, Carrie A; Hillier, LaDeana; Sisu, Cristina; Li, Jingyi Jessica; Pei, Baikang; Harmanci, Arif O; Duff, Michael O; Djebali, Sarah; Alexander, Roger P; Alver, Burak H; Auerbach, Raymond; Bell, Kimberly; Bickel, Peter J; Boeck, Max E; Boley, Nathan P; Booth, Benjamin W; Cherbas, Lucy; Cherbas, Peter; Di, Chao; Dobin, Alex; Drenkow, Jorg; Ewing, Brent; Fang, Gang; Fastuca, Megan; Feingold, Elise A; Frankish, Adam; Gao, Guanjun; Good, Peter J; Guigó, Roderic; Hammonds, Ann; Harrow, Jen; Hoskins, Roger A; Howald, Cédric; Hu, Long; Huang, Haiyan; Hubbard, Tim J P; Huynh, Chau; Jha, Sonali; Kasper, Dionna; Kato, Masaomi; Kaufman, Thomas C; Kitchen, Robert R; Ladewig, Erik; Lagarde, Julien.

Nature ; 512(7515): 445-8, 2014 Aug 28.

Article in English | MEDLINE | ID: mdl-25164755

ABSTRACT

The transcriptome is the readout of the genome. Identifying common features in it across distant species can reveal fundamental principles. To this end, the ENCODE and modENCODE consortia have generated large amounts of matched RNA-sequencing data for human, worm and fly. Uniform processing and comprehensive annotation of these data allow comparison across metazoan phyla, extending beyond earlier within-phylum transcriptome comparisons and revealing ancient, conserved features. Specifically, we discover co-expression modules shared across animals, many of which are enriched in developmental genes. Moreover, we use expression patterns to align the stages in worm and fly development and find a novel pairing between worm embryo and fly pupae, in addition to the embryo-to-embryo and larvae-to-larvae pairings. Furthermore, we find that the extent of non-canonical, non-coding transcription is similar in each organism, per base pair. Finally, we find in all three organisms that the gene-expression levels, both coding and non-coding, can be quantitatively predicted from chromatin features at the promoter using a 'universal model' based on a single set of organism-independent parameters.

Subjects

Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Gene Expression Profiling , Transcriptome/genetics , Animals , Caenorhabditis elegans/embryology , Caenorhabditis elegans/growth & development , Chromatin/genetics , Cluster Analysis , Drosophila melanogaster/growth & development , Gene Expression Regulation, Developmental/genetics , Histones/metabolism , Humans , Larva/genetics , Larva/growth & development , Models, Genetic , Molecular Sequence Annotation , Promoter Regions, Genetic/genetics , Pupa/genetics , Pupa/growth & development , RNA, Untranslated/genetics , Sequence Analysis, RNA

6.

Comparative analysis of pseudogenes across three phyla.

Sisu, Cristina; Pei, Baikang; Leng, Jing; Frankish, Adam; Zhang, Yan; Balasubramanian, Suganthi; Harte, Rachel; Wang, Daifeng; Rutenberg-Schoenberg, Michael; Clark, Wyatt; Diekhans, Mark; Rozowsky, Joel; Hubbard, Tim; Harrow, Jennifer; Gerstein, Mark B.

Proc Natl Acad Sci U S A ; 111(37): 13361-6, 2014 Sep 16.

Article in English | MEDLINE | ID: mdl-25157146

ABSTRACT

Pseudogenes are degraded fossil copies of genes. Here, we report a comparison of pseudogenes spanning three phyla, leveraging the completed annotations of the human, worm, and fly genomes, which we make available as an online resource. We find that pseudogenes are lineage specific, much more so than protein-coding genes, reflecting the different remodeling processes marking each organism's genome evolution. The majority of human pseudogenes are processed, resulting from a retrotranspositional burst at the dawn of the primate lineage. This burst can be seen in the largely uniform distribution of pseudogenes across the genome, their preservation in areas with low recombination rates, and their preponderance in highly expressed gene families. In contrast, worm and fly pseudogenes tell a story of numerous duplication events. In worm, these duplications have been preserved through selective sweeps, so we see a large number of pseudogenes associated with highly duplicated families such as chemoreceptors. However, in fly, the large effective population size and high deletion rate resulted in a depletion of the pseudogene complement. Despite large variations between these species, we also find notable similarities. Overall, we identify a broad spectrum of biochemical activity for pseudogenes, with the majority in each organism exhibiting varying degrees of partial activity. In particular, we identify a consistent amount of transcription (∼15%) across all species, suggesting a uniform degradation process. Also, we see a uniform decay of pseudogene promoter activity relative to their coding counterparts and identify a number of pseudogenes with conserved upstream sequences and activity, hinting at potential regulatory roles.

Subjects

Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Phylogeny , Pseudogenes/genetics , Animals , Evolution, Molecular , Genetic Association Studies , Humans , Molecular Sequence Annotation , Promoter Regions, Genetic/genetics , Sequence Homology, Nucleic Acid

7.

Analysis of variable retroduplications in human populations suggests coupling of retrotransposition to cell division.

Abyzov, Alexej; Iskow, Rebecca; Gokcumen, Omer; Radke, David W; Balasubramanian, Suganthi; Pei, Baikang; Habegger, Lukas; Lee, Charles; Gerstein, Mark.

Genome Res ; 23(12): 2042-52, 2013 Dec.

Article in English | MEDLINE | ID: mdl-24026178

ABSTRACT

In primates and other animals, reverse transcription of mRNA followed by genomic integration creates retroduplications. Expressed retroduplications are either "retrogenes" coding for functioning proteins, or expressed "processed pseudogenes," which can function as noncoding RNAs. To date, little is known about the variation in retroduplications in terms of their presence or absence across individuals in the human population. We have developed new methodologies that allow us to identify "novel" retroduplications (i.e., those not present in the reference genome), to find their insertion points, and to genotype them. Using these methods, we catalogued and analyzed 174 retroduplication variants in almost one thousand humans, which were sequenced as part of Phase 1 of The 1000 Genomes Project Consortium. The accuracy of our data set was corroborated by (1) multiple lines of sequencing evidence for retroduplication (e.g., depth of coverage in exons vs. introns), (2) experimental validation, and (3) the fact that we can reconstruct a correct phylogenetic tree of human subpopulations based solely on retroduplications. We also show that parent genes of retroduplication variants tend to be expressed at the M-to-G1 transition in the cell cycle and that M-to-G1 expressed genes have more copies of fixed retroduplications than genes expressed at other times. These findings suggest that cell division is coupled to retrotransposition and, perhaps, is even a requirement for it.

Subjects

Cell Division/genetics , Gene Duplication , Retroelements/genetics , Computational Biology/methods , Evolution, Molecular , Genome, Human , Genotype , Humans , Phylogeny , Pseudogenes , Reproducibility of Results , Sequence Analysis, DNA

8.

GENCODE: the reference human genome annotation for The ENCODE Project.

Harrow, Jennifer; Frankish, Adam; Gonzalez, Jose M; Tapanari, Electra; Diekhans, Mark; Kokocinski, Felix; Aken, Bronwen L; Barrell, Daniel; Zadissa, Amonida; Searle, Stephen; Barnes, If; Bignell, Alexandra; Boychenko, Veronika; Hunt, Toby; Kay, Mike; Mukherjee, Gaurab; Rajan, Jeena; Despacio-Reyes, Gloria; Saunders, Gary; Steward, Charles; Harte, Rachel; Lin, Michael; Howald, Cédric; Tanzer, Andrea; Derrien, Thomas; Chrast, Jacqueline; Walters, Nathalie; Balasubramanian, Suganthi; Pei, Baikang; Tress, Michael; Rodriguez, Jose Manuel; Ezkurdia, Iakes; van Baren, Jeltje; Brent, Michael; Haussler, David; Kellis, Manolis; Valencia, Alfonso; Reymond, Alexandre; Gerstein, Mark; Guigó, Roderic; Hubbard, Tim J.

Genome Res ; 22(9): 1760-74, 2012 Sep.

Article in English | MEDLINE | ID: mdl-22955987

ABSTRACT

The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.

Subjects

Databases, Genetic , Genome, Human , Genomics/methods , Molecular Sequence Annotation , Animals , Computational Biology/methods , DNA, Complementary/chemistry , DNA, Complementary/genetics , Evolution, Molecular , Exons , Genetic Loci , Humans , Internet , Models, Molecular , Open Reading Frames , Pseudogenes , Quality Control , RNA Splice Sites , RNA, Long Noncoding , Reproducibility of Results , Untranslated Regions

9.

Enhanced transcriptome maps from multiple mouse tissues reveal evolutionary constraint in gene expression.

Pervouchine, Dmitri D; Djebali, Sarah; Breschi, Alessandra; Davis, Carrie A; Barja, Pablo Prieto; Dobin, Alex; Tanzer, Andrea; Lagarde, Julien; Zaleski, Chris; See, Lei-Hoon; Fastuca, Meagan; Drenkow, Jorg; Wang, Huaien; Bussotti, Giovanni; Pei, Baikang; Balasubramanian, Suganthi; Monlong, Jean; Harmanci, Arif; Gerstein, Mark; Beer, Michael A; Notredame, Cedric; Guigó, Roderic; Gingeras, Thomas R.

Nat Commun ; 6: 5903, 2015 Jan 13.

Article in English | MEDLINE | ID: mdl-25582907

ABSTRACT

Mice have been a long-standing model for human biology and disease. Here we characterize, by RNA sequencing, the transcriptional profiles of a large and heterogeneous collection of mouse tissues, augmenting the mouse transcriptome with thousands of novel transcript candidates. Comparison with transcriptome profiles in human cell lines reveals substantial conservation of transcriptional programmes, and uncovers a distinct class of genes with levels of expression that have been constrained early in vertebrate evolution. This core set of genes captures a substantial fraction of the transcriptional output of mammalian cells, and participates in basic functional and structural housekeeping processes common to all cell types. Perturbation of these constrained genes is associated with significant phenotypes including embryonic lethality and cancer. Evolutionary constraint in gene expression levels is not reflected in the conservation of the genomic sequences, but is associated with conserved epigenetic marking, as well as with characteristic post-transcriptional regulatory programme, in which sub-cellular localization and alternative splicing play comparatively large roles.

Subjects

Evolution, Molecular , Gene Expression Regulation , Transcriptome , Alternative Splicing , Animals , Biological Evolution , Cell Line , Epigenesis, Genetic , Gene Expression Profiling , Gene Library , Genome , Histones/chemistry , Humans , Mice , Mice, Inbred C57BL , Models, Genetic , Oligonucleotides, Antisense , Phenotype , Sequence Analysis, RNA

10.

Reconstruction of biological networks by incorporating prior knowledge into Bayesian network models.

Pei, Baikang; Shin, Dong-Guk.

J Comput Biol ; 19(12): 1324-34, 2012 Dec.

Article in English | MEDLINE | ID: mdl-23210479

ABSTRACT

The Bayesian network model is widely used for reverse engineering of biological network structures. An advantage of this model is its capability to integrate prior knowledge into the model learning process, which can improve the quality of the network reconstruction outcome. Some previous works have explored this area with a focus on using prior knowledge of direct molecular links, while a few recent ones propose examining the effects of molecular orderings. In this study, we propose a Bayesian network model that can integrate both direct links and orderings into the model. Random weights are assigned to these two types of prior knowledge to alleviate bias toward certain types of information. We evaluate our model's performance using both synthetic data and biological data for the RAF signaling network, and illustrate the significant improvement in network structure reconstruction of the proposed models over existing methods. We also examine the correlation between the improvement and the abundance of ordering prior knowledge. To address the issue of generating prior knowledge, we propose an approach to automatically extract potential molecular orderings from knowledge resources such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) database and Gene Ontology (GO) annotation.

Subjects

Bayes Theorem , Computational Biology/methods , Gene Regulatory Networks , Genome , Models, Biological , Signal Transduction , Databases, Genetic , MAP Kinase Signaling System , raf Kinases/metabolism

11.

A Bayesian Approach to Pathway Analysis by Integrating Gene-Gene Functional Directions and Microarray Data.

Zhao, Yifang; Chen, Ming-Hui; Pei, Baikang; Rowe, David; Shin, Dong-Guk; Xie, Wangang; Yu, Fang; Kuo, Lynn.

Stat Biosci ; 4(1): 105-131, 2012 May 01.

Article in English | MEDLINE | ID: mdl-23482678

ABSTRACT

Many statistical methods have been developed to screen for differentially expressed genes associated with specific phenotypes in microarray data. However, it remains a major challenge to synthesize the observed expression patterns with abundant biological knowledge for a more complete understanding of the biological functions among genes. Various methods including clustering analysis on genes, neural networks, Bayesian networks and pathway analysis have been developed toward this goal. In most of these procedures, the activation and inhibition relationships among genes have hardly been utilized in the modeling steps. We propose two novel Bayesian models to integrate the microarray data with the putative pathway structures obtained from the KEGG database and the directional gene-gene interactions in the medical literature. We define the symmetric Kullback-Leibler divergence of a pathway, and use it to identify the pathway(s) most supported by the microarray data. A Markov chain Monte Carlo sampling algorithm is given for posterior computation in the hierarchical model. The proposed method is shown to select the most supported pathway in an illustrative example. Finally, we apply the methodology to a real microarray data set to understand the gene expression profile of osteoblast lineage at defined stages of differentiation. We observe that our method correctly identifies the pathways that are reported to play essential roles in modulating bone mass.
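
Note on the symmetric Kullback-Leibler divergence mentioned in this abstract: the abstract does not give the paper's pathway-specific definition, so the following is only a minimal sketch of the standard symmetric form for two discrete distributions, KL(p || q) + KL(q || p). The function names are hypothetical, and some texts use half of this sum instead.

```python
import math


def kl_divergence(p, q):
    """Directed divergence KL(p || q) for two discrete distributions.

    p and q are equal-length sequences of probabilities. Terms with
    p_i == 0 contribute nothing; if q_i == 0 where p_i > 0 the
    divergence is infinite.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total


def symmetric_kl(p, q):
    """Standard symmetric form: KL(p || q) + KL(q || p).

    This is an illustrative assumption, not the pathway-level
    definition used in the paper.
    """
    return kl_divergence(p, q) + kl_divergence(q, p)
```

Under this definition the value is zero for identical distributions and does not depend on the order of the arguments, which is what makes it usable as a score for comparing candidate pathways against data.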

12.

The GENCODE pseudogene resource.

Pei, Baikang; Sisu, Cristina; Frankish, Adam; Howald, Cédric; Habegger, Lukas; Mu, Xinmeng Jasmine; Harte, Rachel; Balasubramanian, Suganthi; Tanzer, Andrea; Diekhans, Mark; Reymond, Alexandre; Hubbard, Tim J; Harrow, Jennifer; Gerstein, Mark B.

Genome Biol ; 13(9): R51, 2012 Sep 26.

Article in English | MEDLINE | ID: mdl-22951037

ABSTRACT

BACKGROUND: Pseudogenes have long been considered as nonfunctional genomic sequences. However, recent evidence suggests that many of them might have some form of biological activity, and the possibility of functionality has increased interest in their accurate annotation and integration with functional genomics data. RESULTS: As part of the GENCODE annotation of the human genome, we present the first genome-wide pseudogene assignment for protein-coding genes, based on both large-scale manual annotation and in silico pipelines. A key aspect of this coupled approach is that it allows us to identify pseudogenes in an unbiased fashion as well as untangle complex events through manual evaluation. We integrate the pseudogene annotations with the extensive ENCODE functional genomics information. In particular, we determine the expression level, transcription-factor and RNA polymerase II binding, and chromatin marks associated with each pseudogene. Based on their distribution, we develop simple statistical models for each type of activity, which we validate with large-scale RT-PCR-Seq experiments. Finally, we compare our pseudogenes with conservation and variation data from primate alignments and the 1000 Genomes project, producing lists of pseudogenes potentially under selection. CONCLUSIONS: At one extreme, some pseudogenes possess conventional characteristics of functionality; these may represent genes that have recently died. On the other hand, we find interesting patterns of partial activity, which may suggest that dead genes are being resurrected as functioning non-coding RNAs. The activity data of each pseudogene are stored in an associated resource, psiDR, which will be useful for the initial identification of potentially functional pseudogenes.

Subjects

Genome, Human , Pseudogenes , Transcription, Genetic , Animals , Binding Sites , Chromatin/chemistry , Chromatin/genetics , Humans , Models, Genetic , Models, Statistical , Molecular Sequence Annotation , Phylogeny , Primates , RNA Polymerase II/metabolism , Regulatory Sequences, Nucleic Acid , Selection, Genetic , Sequence Analysis, DNA , Transcription Factors/metabolism

13.

Learning Bayesian networks with integration of indirect prior knowledge.

Pei, Baikang; Rowe, David W; Shin, Dong-Guk.

Int J Data Min Bioinform ; 4(5): 505-19, 2010.

Article in English | MEDLINE | ID: mdl-21133038

ABSTRACT

A Bayesian network model can be used to study the structures of gene regulatory networks. It has the ability to integrate information from both prior knowledge and experimental data. In this study, we propose an approach to efficiently integrate global ordering information into model learning, where the ordering information specifies the indirect relationships among genes. We demonstrate that, compared with a traditional Bayesian network model that uses only local prior knowledge, utilising additional global ordering knowledge can significantly improve the model's performance. The magnitude of this improvement depends on abundance of global ordering information and data quality.

Subjects

Computational Biology/methods , Gene Regulatory Networks/genetics , Algorithms , Bayes Theorem , Databases, Factual

14.

Computing consistency between microarray data and known gene regulation relationships.

Shin, Dong-Guk; Kazmi, Saira A; Pei, Baikang; Kim, Yoo-Ah; Maddox, Jeffrey; Nori, Ravi; Wong, Alan; Krueger, Winfried; Rowe, David.

IEEE Trans Inf Technol Biomed ; 13(6): 1075-82, 2009 Nov.

Article in English | MEDLINE | ID: mdl-19783507

ABSTRACT

Microarray experiments produce expression patterns for thousands of genes at once. On the other hand, biomedical literature contains large amounts of gene regulation relationship information accumulated over the years. One obvious requirement is an automated way of comparing microarray data with the collection of known gene regulation relationships. Such an automated comparison is imperative because it can help biologists rapidly understand the context of a given microarray experiment. In addition, the consistency measure can be used to either validate or refute the hypothesis being tested using the microarray experiment. In this paper we present a systematic way of examining the consistency between a given set of microarray data and known gene regulation relationships. We first introduce a simple gene regulation network model with two separate algorithms designed to isolate a maximally consistent network. Subsequently, we extend the model to take into account multiple regulating factors for a single gene while highlighting both consistencies and inconsistencies. We illustrate the effectiveness of our approach with two practical examples, one that picks the peroxisome proliferator-activated receptor (PPAR) pathway as highly consistent from multiple pathways of Kyoto encyclopedia of genes and genomes (KEGG), and another that isolates key regulatory relationships involving nfkb1 and others known for macrophage's counter response to inflammation.

Subjects

Computational Biology/methods , Gene Regulatory Networks , Oligonucleotide Array Sequence Analysis/methods , Algorithms , Reproducibility of Results , Signal Transduction , User-Computer Interface
