1.
Cell ; 186(7): 1493-1511.e40, 2023 03 30.
Article in English | MEDLINE | ID: mdl-37001506

ABSTRACT

Understanding how genetic variants impact molecular phenotypes is a key goal of functional genomics, currently hindered by reliance on a single haploid reference genome. Here, we present the EN-TEx resource of 1,635 open-access datasets from four donors (∼30 tissues × ∼15 assays). The datasets are mapped to matched, diploid genomes with long-read phasing and structural variants, instantiating a catalog of >1 million allele-specific loci. These loci exhibit coordinated activity along haplotypes and are less conserved than corresponding, non-allele-specific ones. Surprisingly, a deep-learning transformer model can predict the allele-specific activity based only on local nucleotide-sequence context, highlighting the importance of transcription-factor-binding motifs particularly sensitive to variants. Furthermore, combining EN-TEx with existing genome annotations reveals strong associations between allele-specific and GWAS loci. It also enables models for transferring known eQTLs to difficult-to-profile tissues (e.g., from skin to heart). Overall, EN-TEx provides rich data and generalizable models for more accurate personal functional genomics.


Subjects
Epigenome; Quantitative Trait Loci; Genome-Wide Association Study; Genomics; Phenotype; Polymorphism, Single Nucleotide
2.
Nat Methods ; 21(4): 723-734, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38504114

ABSTRACT

The ENCODE Consortium's efforts to annotate noncoding cis-regulatory elements (CREs) have advanced our understanding of gene regulatory landscapes. Pooled, noncoding CRISPR screens offer a systematic approach to investigate cis-regulatory mechanisms. The ENCODE4 Functional Characterization Centers conducted 108 screens in human cell lines, comprising >540,000 perturbations across 24.85 megabases of the genome. Using 332 functionally confirmed CRE-gene links in K562 cells, we established guidelines for screening endogenous noncoding elements with CRISPR interference (CRISPRi), including accurate detection of CREs that exhibit variable, often low, transcriptional effects. Benchmarking five screen analysis tools, we find that CASA produces the most conservative CRE calls and is robust to artifacts of low-specificity single guide RNAs. We uncover a subtle DNA strand bias for CRISPRi in transcribed regions with implications for screen design and analysis. Together, we provide an accessible data resource, predesigned single guide RNAs for targeting 3,275,697 ENCODE SCREEN candidate CREs with CRISPRi and screening guidelines to accelerate functional characterization of the noncoding genome.


Subjects
CRISPR-Cas Systems; Clustered Regularly Interspaced Short Palindromic Repeats; Humans; Clustered Regularly Interspaced Short Palindromic Repeats/genetics; CRISPR-Cas Systems/genetics; Genome; K562 Cells; RNA, Guide, CRISPR-Cas Systems
3.
Genome Res ; 29(6): 1009-1022, 2019 06.
Article in English | MEDLINE | ID: mdl-31123080

ABSTRACT

Caenorhabditis elegans was the first multicellular eukaryotic genome sequenced to apparent completion. Although this assembly employed a standard C. elegans strain (N2), it used sequence data from several laboratories, with DNA propagated in bacteria and yeast. Thus, the N2 assembly has many differences from any C. elegans available today. To provide a more accurate C. elegans genome, we performed long-read assembly of VC2010, a modern strain derived from N2. Our VC2010 assembly has 99.98% identity to N2 but with an additional 1.8 Mb including tandem repeat expansions and genome duplications. For 116 structural discrepancies between N2 and VC2010, 97 structures matching VC2010 (84%) were also found in two outgroup strains, implying deficiencies in N2. Over 98% of N2 genes encoded unchanged products in VC2010; moreover, we predicted ≥53 new genes in VC2010. The recompleted genome of C. elegans should be a valuable resource for genetics, genomics, and systems biology.


Subjects
Caenorhabditis elegans/genetics; Genome, Helminth; Genomics; Animals; Caenorhabditis elegans Proteins/genetics; Computational Biology/methods; Genomics/methods; High-Throughput Nucleotide Sequencing; Molecular Sequence Annotation; Reproducibility of Results
4.
Nucleic Acids Res ; 48(D1): D882-D889, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31713622

ABSTRACT

The Encyclopedia of DNA Elements (ENCODE) is an ongoing collaborative research project aimed at identifying all the functional elements in the human and mouse genomes. Data generated by the ENCODE consortium are freely accessible at the ENCODE portal (https://www.encodeproject.org/), which is developed and maintained by the ENCODE Data Coordinating Center (DCC). Since the initial portal release in 2013, the ENCODE DCC has updated the portal to make ENCODE data more findable, accessible, interoperable and reusable. Here, we report on recent updates, including new ENCODE data and assays, ENCODE uniform data processing pipelines, new visualization tools, a dataset cart feature, unrestricted public access to ENCODE data on the cloud (Amazon Web Services open data registry, https://registry.opendata.aws/encode-project/) and more comprehensive tutorials and documentation.


Subjects
DNA/genetics; Databases, Genetic; Genome, Human; Software; Animals; Genomics; Humans; Mice
5.
Nucleic Acids Res ; 46(D1): D794-D801, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29126249

ABSTRACT

The Encyclopedia of DNA Elements (ENCODE) Data Coordinating Center has developed the ENCODE Portal database and website as the source for the data and metadata generated by the ENCODE Consortium. Two principles have motivated the design. First, experimental protocols, analytical procedures and the data themselves should be made publicly accessible through a coherent, web-based search and download interface. Second, the same interface should serve carefully curated metadata that record the provenance of the data and justify its interpretation in biological terms. Since its initial release in 2013 and in response to recommendations from consortium members and the wider community of scientists who use the Portal to access ENCODE data, the Portal has been regularly updated to better reflect these design principles. Here we report on these updates, including results from new experiments, uniformly-processed data from other projects, new visualization tools and more comprehensive metadata to describe experiments and analyses. Additionally, the Portal is now home to meta(data) from related projects including Genomics of Gene Regulation, Roadmap Epigenome Project, Model organism ENCODE (modENCODE) and modERN. The Portal now makes available over 13000 datasets and their accompanying metadata and can be accessed at: https://www.encodeproject.org/.


Subjects
DNA/genetics; Databases, Genetic; Gene Components; Genomics; High-Throughput Nucleotide Sequencing; Metadata; Animals; Caenorhabditis elegans/genetics; Data Display; Datasets as Topic; Drosophila melanogaster/genetics; Forecasting; Genome, Human; Humans; Mice/genetics; User-Computer Interface
6.
Nucleic Acids Res ; 44(D1): D726-32, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26527727

ABSTRACT

The Encyclopedia of DNA Elements (ENCODE) Project is in its third phase of creating a comprehensive catalog of functional elements in the human genome. This phase of the project includes an expansion of assays that measure diverse RNA populations, identify proteins that interact with RNA and DNA, probe regions of DNA hypersensitivity, and measure levels of DNA methylation in a wide range of cell and tissue types to identify putative regulatory elements. To date, results for almost 5000 experiments have been released for use by the scientific community. These data are available for searching, visualization and download at the new ENCODE Portal (www.encodeproject.org). The revamped ENCODE Portal provides new ways to browse and search the ENCODE data based on the metadata that describe the assays as well as summaries of the assays that focus on data provenance. In addition, it is a flexible platform that allows integration of genomic data from multiple projects. The portal experience was designed to improve access to ENCODE data by relying on metadata that allow reusability and reproducibility of the experiments.


Subjects
Databases, Genetic; Genome, Human; Genomics; Animals; DNA/metabolism; Genes; Humans; Mice; Proteins/metabolism; RNA/metabolism
7.
BMC Genomics ; 17: 274, 2016 Apr 01.
Article in English | MEDLINE | ID: mdl-27036078

ABSTRACT

BACKGROUND: Identification of locus-locus contacts at the chromatin level provides a valuable foundation for understanding of nuclear architecture and function and a valuable tool for inferring long-range linkage relationships. As one approach to this, chromatin conformation capture-based techniques allow creation of genome spatial organization maps. While such approaches have been available for some time, methodological advances will be of considerable use in minimizing both time and input material required for successful application. RESULTS: Here we report a modified tethered conformation capture protocol that utilizes a series of rapid and efficient molecular manipulations. We applied the method to Caenorhabditis elegans, obtaining chromatin interaction maps that provide a sequence-anchored delineation of salient aspects of Caenorhabditis elegans chromosome structure, demonstrating a high level of consistency in overall chromosome organization between biological samples collected under different conditions. In addition to the application of the method to defining nuclear architecture, we found the resulting chromatin interaction maps to be of sufficient resolution and sensitivity to enable detection of large-scale structural variants such as inversions or translocations. CONCLUSION: Our streamlined protocol provides an accelerated, robust, and broadly applicable means of generating chromatin spatial organization maps and detecting genome rearrangements without a need for cellular or chromatin fractionation.


Subjects
Caenorhabditis elegans/genetics; Chromatin/genetics; Chromosome Mapping/methods; Chromosomes/genetics; Animals
8.
Nucleic Acids Res ; 39(Web Server issue): W92-9, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21478166

ABSTRACT

RNA mutational analysis at the secondary-structure level can be useful to a wide-range of biological applications. It can be used to predict an optimal site for performing a nucleotide mutation at the single molecular level, as well as to analyze basic phenomena at the systems level. For the former, as more sequence modification experiments are performed that include site-directed mutagenesis to find and explore functional motifs in RNAs, a pre-processing step that helps guide in planning the experiment becomes vital. For the latter, mutations are generally accepted as a central mechanism by which evolution occurs, and mutational analysis relating to structure should gain a better understanding of system functionality and evolution. In the past several years, the program RNAmute that is structure based and relies on RNA secondary-structure prediction has been developed for assisting in RNA mutational analysis. It has been extended from single-point mutations to treat multiple-point mutations efficiently by initially calculating all suboptimal solutions, after which only the mutations that stabilize the suboptimal solutions and destabilize the optimal one are considered as candidates for being deleterious. The RNAmute web server for mutational analysis is available at http://www.cs.bgu.ac.il/~xrnamute/XRNAmute.


Subjects
Mutation; RNA/chemistry; Software; Nucleic Acid Conformation; Sequence Analysis, RNA; User-Computer Interface
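
As a rough illustration of the single-point mutational scan described in entry 8: the sketch below does not use RNAmute or energy minimization. It swaps in Nussinov-style base-pair maximization, a much simpler stand-in, and ranks point mutations by how many base pairs they cost. The function names, the pairing rules (Watson-Crick plus G-U wobble), and the minimum loop length of 3 are all assumptions made for the example.

```python
# Toy single-point mutational scan. NOT RNAmute: base-pair maximization
# (Nussinov-style) stands in for real energy-minimization folding.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs, with at least min_loop unpaired
    bases enclosed by every pair (assumed hairpin-loop constraint)."""
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j], inclusive.
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


def point_mutation_scan(seq, alphabet="ACGU"):
    """Return (position, wild_type, mutant, delta_pairs) for every
    single-point mutation, most disruptive (most negative delta) first."""
    baseline = max_pairs(seq)
    results = []
    for pos, wild_type in enumerate(seq):
        for base in alphabet:
            if base == wild_type:
                continue
            mutant = seq[:pos] + base + seq[pos + 1:]
            results.append((pos, wild_type, base, max_pairs(mutant) - baseline))
    return sorted(results, key=lambda r: r[3])
```

For the hairpin `GGGAAACCC`, `max_pairs` returns 3, and `point_mutation_scan` lists stem mutations such as G→A at position 0 with a delta of -1. A real analysis would compare predicted structures and free energies, as the abstract describes, rather than pair counts.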
9.
Res Sq ; 2023 Jul 19.
Article in English | MEDLINE | ID: mdl-37503119

ABSTRACT

The Encyclopedia of DNA elements (ENCODE) project is a collaborative effort to create a comprehensive catalog of functional elements in the human genome. The current database comprises more than 19000 functional genomics experiments across more than 1000 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All experimental data, metadata, and associated computational analyses created by the ENCODE consortium are submitted to the Data Coordination Center (DCC) for validation, tracking, storage, and distribution to community resources and the scientific community. The ENCODE project has engineered and distributed uniform processing pipelines in order to promote data provenance and reproducibility as well as allow interoperability between genomic resources and other consortia. All data files, reference genome versions, software versions, and parameters used by the pipelines are captured and available via the ENCODE Portal. The pipeline code, developed using Docker and Workflow Description Language (WDL; https://openwdl.org/) is publicly available in GitHub, with images available on Dockerhub (https://hub.docker.com), enabling access to a diverse range of biomedical researchers. ENCODE pipelines maintained and used by the DCC can be installed to run on personal computers, local HPC clusters, or in cloud computing environments via Cromwell. Access to the pipelines and data via the cloud allows small labs the ability to use the data or software without access to institutional compute clusters. Standardization of the computational methodologies for analysis and quality control leads to comparable results from different ENCODE collections - a prerequisite for successful integrative analyses.

10.
bioRxiv ; 2023 Apr 06.
Article in English | MEDLINE | ID: mdl-37066421

ABSTRACT

The Encyclopedia of DNA elements (ENCODE) project is a collaborative effort to create a comprehensive catalog of functional elements in the human genome. The current database comprises more than 19000 functional genomics experiments across more than 1000 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All experimental data, metadata, and associated computational analyses created by the ENCODE consortium are submitted to the Data Coordination Center (DCC) for validation, tracking, storage, and distribution to community resources and the scientific community. The ENCODE project has engineered and distributed uniform processing pipelines in order to promote data provenance and reproducibility as well as allow interoperability between genomic resources and other consortia. All data files, reference genome versions, software versions, and parameters used by the pipelines are captured and available via the ENCODE Portal. The pipeline code, developed using Docker and Workflow Description Language (WDL; https://openwdl.org/) is publicly available in GitHub, with images available on Dockerhub (https://hub.docker.com), enabling access to a diverse range of biomedical researchers. ENCODE pipelines maintained and used by the DCC can be installed to run on personal computers, local HPC clusters, or in cloud computing environments via Cromwell. Access to the pipelines and data via the cloud allows small labs the ability to use the data or software without access to institutional compute clusters. Standardization of the computational methodologies for analysis and quality control leads to comparable results from different ENCODE collections - a prerequisite for successful integrative analyses.

11.
bioRxiv ; 2023 May 16.
Article in English | MEDLINE | ID: mdl-37292896

ABSTRACT

The majority of mammalian genes encode multiple transcript isoforms that result from differential promoter use, changes in exonic splicing, and alternative 3' end choice. Detecting and quantifying transcript isoforms across tissues, cell types, and species has been extremely challenging because transcripts are much longer than the short reads normally used for RNA-seq. By contrast, long-read RNA-seq (LR-RNA-seq) gives the complete structure of most transcripts. We sequenced 264 LR-RNA-seq PacBio libraries totaling over 1 billion circular consensus reads (CCS) for 81 unique human and mouse samples. We detect at least one full-length transcript from 87.7% of annotated human protein coding genes and a total of 200,000 full-length transcripts, 40% of which have novel exon junction chains. To capture and compute on the three sources of transcript structure diversity, we introduce a gene and transcript annotation framework that uses triplets representing the transcript start site, exon junction chain, and transcript end site of each transcript. Using triplets in a simplex representation demonstrates how promoter selection, splice pattern, and 3' processing are deployed across human tissues, with nearly half of multi-transcript protein coding genes showing a clear bias toward one of the three diversity mechanisms. Evaluated across samples, the predominantly expressed transcript changes for 74% of protein coding genes. In evolution, the human and mouse transcriptomes are globally similar in types of transcript structure diversity, yet among individual orthologous gene pairs, more than half (57.8%) show substantial differences in mechanism of diversification in matching tissues. This initial large-scale survey of human and mouse long-read transcriptomes provides a foundation for further analyses of alternative transcript usage, and is complemented by short-read and microRNA data on the same samples and by epigenome data elsewhere in the ENCODE4 collection.
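
The triplet idea in entry 11 can be sketched in a few lines. The abstract does not give the exact normalization used for the simplex, so this example simply counts the distinct transcript start sites, exon junction chains, and transcript end sites of one gene and scales the three counts to sum to 1. The `Transcript` type, the function name, and the identifiers in the usage note are hypothetical.

```python
from collections import namedtuple

# One transcript as a (start site, exon junction chain, end site) triplet.
# The junction chain is a tuple so that it can be stored in a set.
Transcript = namedtuple("Transcript", ["tss", "junction_chain", "tes"])


def simplex_coordinates(transcripts):
    """Return (tss_share, splicing_share, tes_share) for one gene.

    Assumed normalization: distinct-feature counts divided by their sum.
    The published method may weight the three counts differently.
    """
    if not transcripts:
        raise ValueError("need at least one transcript")
    n_tss = len({t.tss for t in transcripts})
    n_chain = len({t.junction_chain for t in transcripts})
    n_tes = len({t.tes for t in transcripts})
    total = n_tss + n_chain + n_tes
    return (n_tss / total, n_chain / total, n_tes / total)
```

A gene with transcripts `Transcript("tss1", (100, 200, 300, 400), "tes1")`, `Transcript("tss1", (100, 200), "tes1")` and `Transcript("tss2", (100, 200), "tes1")` has two start sites, two junction chains and one end site, so it lands at (0.4, 0.4, 0.2). A point near one corner of the simplex would indicate a gene whose isoform diversity comes mostly from that one mechanism.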

12.
RNA ; 16(2): 364-74, 2010 Feb.
Article in English | MEDLINE | ID: mdl-20040590

ABSTRACT

Heat shock proteins (HSPs) provide a useful system for studying developmental patterns in the digenetic Leishmania parasites, since their expression is induced in the mammalian life form. Translation regulation plays a key role in control of protein coding genes in trypanosomatids, and is directed exclusively by elements in the 3' untranslated region (UTR). Using sequential deletions of the Leishmania Hsp83 3' UTR (888 nucleotides [nt]), we mapped a region of 150 nt that was required, but not sufficient for preferential translation of a reporter gene at mammalian-like temperatures, suggesting that changes in RNA structure could be involved. An advanced bioinformatics package for prediction of RNA folding (UNAfold) marked the regulatory region on a highly probable structural arm that includes a polypyrimidine tract (PPT). Mutagenesis of this PPT abrogated completely preferential translation of the fused reporter gene. Furthermore, temperature elevation caused the regulatory region to melt more extensively than the same region that lacked the PPT. We propose that at elevated temperatures the regulatory element in the 3' UTR is more accessible to mediators that promote its interaction with the basal translation components at the 5' end during mRNA circularization. Translation initiation of Hsp83 at all temperatures appears to proceed via scanning of the 5' UTR, since a hairpin structure abolishes expression of a fused reporter gene.


Subjects
Heat-Shock Proteins/genetics; Heat-Shock Proteins/metabolism; Leishmania/genetics; Leishmania/metabolism; Protozoan Proteins/genetics; Protozoan Proteins/metabolism; RNA, Protozoan/genetics; RNA, Protozoan/metabolism; 3' Untranslated Regions; 5' Untranslated Regions; Animals; Animals, Genetically Modified; Base Sequence; DNA Primers/genetics; Genes, Reporter; Leishmania mexicana/genetics; Leishmania mexicana/metabolism; Models, Molecular; Nucleic Acid Conformation; Protein Biosynthesis; RNA, Messenger/chemistry; RNA, Messenger/genetics; RNA, Messenger/metabolism; RNA, Protozoan/chemistry; Temperature
13.
Bioinformatics ; 26(6): 845-6, 2010 Mar 15.
Article in English | MEDLINE | ID: mdl-20106816

ABSTRACT

SUMMARY: The DNA in eukaryotic cells is packed into the chromatin that is composed of nucleosomes. Positioning of the nucleosome core particles on the sequence is a problem of great interest because of the role nucleosomes play in different cellular processes including gene regulation. Using the sequence structure of 10.4 base DNA repeat presented in our previous works and nucleosome core DNA sequences database, we have derived the complete nucleosome DNA bendability matrix of Caenorhabditis elegans. We have developed a web server named FineStr that allows users to upload genomic sequences in FASTA format and to perform a single-base-resolution nucleosome mapping on them. AVAILABILITY: FineStr server is freely available for use on the web at http://www.cs.bgu.ac.il/~nucleom. The site contains a help file with explanation regarding the exact usage. CONTACT: gabdank@cs.bgu.ac.il.


Subjects
Nucleosomes/chemistry; Software; Animals; Caenorhabditis elegans/genetics; Chromatin/metabolism; Internet; Sequence Analysis, DNA
14.
RNA Biol ; 7(1): 90-7, 2010.
Article in English | MEDLINE | ID: mdl-20061789

ABSTRACT

Energy minimization methods for RNA secondary structure prediction have been used extensively for studying a variety of biological systems. Here, we demonstrate their applicability in riboswitch studies, exemplified in both the expression platform and aptamer domains. In the expression platform domain, energy minimization methods can be used to predict in silico a unique point mutation positioned in the non-conserved region of the TPP riboswitch that will transform it from a termination to an anti-termination state, thus backing the prediction experimentally. Furthermore, a successive prediction can be made for a compensatory mutation that is positioned over half the sequence length of the riboswitch from the original mutation and that completely overturns the anti-termination effect of the original mutation. This approach can be used to computationally predict rational modifications in riboswitches for both research and practical applications. In the aptamer domain, energy minimization methods can be used when attempting to detect a novel purine riboswitch in eukaryotes based on the consensus sequence and structure of the bacterial guanine binding aptamer. In the process, some interesting candidates are identified, and although they are attractive enough to be tested experimentally, they are not detectable by sequence based methods alone. These brief examples represent the important lessons to be learned as to the strengths and limitations of energy minimization methods. In light of our growing knowledge in the energy minimization field, future challenges can be advanced for the rational design of known riboswitches and the detection of novel riboswitches. Unlike analyses of specific cases, it is stressed that all the results described here are predictive in scope with direct applicability and an attempt to validate the predictions experimentally.


Subjects
Computational Biology/methods; Regulatory Sequences, Ribonucleic Acid/genetics; Base Sequence; Molecular Sequence Data; Nucleic Acid Conformation; Point Mutation/genetics; Thermodynamics; Thiamine Pyrophosphate/metabolism
15.
Curr Protoc Bioinformatics ; 68(1): e89, 2019 12.
Article in English | MEDLINE | ID: mdl-31751002

ABSTRACT

The Encyclopedia of DNA Elements (ENCODE) web portal hosts genomic data generated by the ENCODE Consortium, Genomics of Gene Regulation, The NIH Roadmap Epigenomics Consortium, and the modENCODE and modERN projects. The goal of the ENCODE project is to build a comprehensive map of the functional elements of the human and mouse genomes. Currently, the portal database stores over 500 TB of raw and processed data from over 15,000 experiments spanning assays that measure gene expression, DNA accessibility, DNA and RNA binding, DNA methylation, and 3D chromatin structure across numerous cell lines, tissue types, and differentiation states with selected genetic and molecular perturbations. The ENCODE portal provides unrestricted access to the aforementioned data and relevant metadata as a service to the scientific community. The metadata model captures the details of the experiments, raw and processed data files, and processing pipelines in human and machine-readable form and enables the user to search for specific data either using a web browser or programmatically via REST API. Furthermore, ENCODE data can be freely visualized or downloaded for additional analyses. © 2019 The Authors. Basic Protocol: Query the portal Support Protocol 1: Batch downloading Support Protocol 2: Using the cart to download files Support Protocol 3: Visualize data Alternate Protocol: Query building and programmatic access.


Subjects
Chromatin/metabolism; DNA/genetics; Databases, Genetic; Epigenomics/methods; Animals; DNA Methylation; Genome, Human; Humans; Internet; Metadata; Mice; Software
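
Entry 15 mentions programmatic access to the portal through a REST API. A minimal sketch of such a query follows, using only the Python standard library. The portal URL comes from the abstracts; the `/search/` path and the `type`, `limit`, and `format=json` parameters reflect the portal's REST interface as commonly documented and should be checked against the current portal documentation. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENCODE_BASE = "https://www.encodeproject.org"  # portal URL from the abstracts


def build_search_url(params, limit=5):
    """Build a portal search URL that requests a JSON response.

    The '/search/' path and the 'limit' and 'format' parameters are
    assumptions to verify against the portal documentation.
    """
    query = dict(params)
    query["limit"] = limit
    query["format"] = "json"
    return f"{ENCODE_BASE}/search/?{urlencode(query)}"


def search(params, limit=5, timeout=30):
    """Fetch one page of search results as a dict (needs network access)."""
    request = Request(
        build_search_url(params, limit),
        headers={"Accept": "application/json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the URL; call search(...) to actually contact the portal.
    print(build_search_url({"type": "Experiment"}, limit=2))
```

Keeping URL construction separate from the network call makes the query easy to inspect or paste into a browser before any request is sent.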
17.
IEEE Trans Nanobioscience ; 6(1): 4-11, 2007 Mar.
Article in English | MEDLINE | ID: mdl-17393844

ABSTRACT

The discovery of natural RNA sensors that respond to a change in the environment by a conformational switch can be utilized for various biotechnological and nanobiotechnological advances. One class of RNA sensors is the riboswitch: an RNA genetic control element that is capable of sensing small molecules, responding to a deviation in ligand concentration with a structural change. Riboswitches are modularly built from smaller components. Computational methods can potentially be utilized in assembling these building block components and offering improvements in the biochemical design process. We describe a computational procedure to design RNA switches from building blocks with favorable properties. To achieve maximal throughput for genetic control purposes, future designer RNA switches can be assembled based on a computerized preprocessing buildup of the constituent domains, namely the aptamer and the expression platform in the case of a synthetic riboswitch. Conformational switching is enabled by the RNA versatility to possess two highly stable states that are energetically close to each other but topologically distinct, separated by an energy barrier between them. Initially, computer simulations can produce a list of short sequences that switch between two conformers when triggered by point mutations or temperature. The short sequences should possess an additional desirable property; when these selected small RNA switch segments are attached to various aptamers, the ligand binding mechanism should replace the aforementioned event triggers, which will no longer be effective for crossing the energy barrier. In the assembled RNA sequence, energy minimization folding predictions should then show no difference between the folded structure of the entire sequence relative to the folded structure of each of its constituents. Moreover, energy minimization methods applied on the entire sequence could aid at this preprocessing stage by exhibiting high mutational robustness to capture the stability of the formed hairpin in the expression platform. The above computer-assisted assembly procedure together with application specific considerations may further be tailored for therapeutic gene regulation. Index Terms: design of RNA switches, energy minimization methods, RNA folding predictions.


Subjects
Biosensing Techniques/instrumentation; Computer-Aided Design; Computers, Molecular; RNA, Untranslated/chemistry; Signal Processing, Computer-Assisted/instrumentation; Thermography/instrumentation; Transducers; Biosensing Techniques/methods; Equipment Design; Equipment Failure Analysis; Thermography/methods
18.
G3 (Bethesda) ; 7(10): 3295-3303, 2017 10 05.
Article in English | MEDLINE | ID: mdl-28801508

ABSTRACT

Investigations aimed at defining the 3D configuration of eukaryotic chromosomes have consistently encountered an endogenous population of chromosome-derived circular genomic DNA, referred to as extrachromosomal circular DNA (eccDNA). While the production, distribution, and activities of eccDNAs remain understudied, eccDNA formation from specific regions of the linear genome has profound consequences on the regulatory and coding capabilities for these regions. Here, we define eccDNA distributions in Caenorhabditis elegans and in three human cell types, utilizing a set of DNA topology-dependent approaches for enrichment and characterization. The use of parallel biophysical, enzymatic, and informatic approaches provides a comprehensive profiling of eccDNA robust to isolation and analysis methodology. Results in human and nematode systems provide quantitative analysis of the eccDNA loci at both unique and repetitive regions. Our studies converge on and support a consistent picture, in which endogenous genomic DNA circles are present in normal physiological states, and in which the circles come from both coding and noncoding genomic regions. Prominent among the coding regions generating DNA circles are several genes known to produce a diversity of protein isoforms, with mucin proteins and titin as specific examples.


Subjects
Caenorhabditis elegans/genetics; DNA, Circular; Animals; Cell Line; Fibroblasts; Granulocytes; Humans; Male; Spermatocytes
19.
PLoS One ; 12(4): e0175310, 2017.
Article in English | MEDLINE | ID: mdl-28403240

ABSTRACT

The Encyclopedia of DNA elements (ENCODE) project is an ongoing collaborative effort to create a comprehensive catalog of functional elements initiated shortly after the completion of the Human Genome Project. The current database exceeds 6500 experiments across more than 450 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the H. sapiens and M. musculus genomes. All ENCODE experimental data, metadata, and associated computational analyses are submitted to the ENCODE Data Coordination Center (DCC) for validation, tracking, storage, unified processing, and distribution to community resources and the scientific community. As the volume of data increases, the identification and organization of experimental details becomes increasingly intricate and demands careful curation. The ENCODE DCC has created a general purpose software system, known as SnoVault, that supports metadata and file submission, a database used for metadata storage, web pages for displaying the metadata and a robust API for querying the metadata. The software is fully open-source, code and installation instructions can be found at: http://github.com/ENCODE-DCC/snovault/ (for the generic database) and http://github.com/ENCODE-DCC/encoded/ to store genomic data in the manner of ENCODE. The core database engine, SnoVault (which is completely independent of ENCODE, genomic data, or bioinformatic data) has been released as a separate Python package.


Subjects
Databases, Genetic; Genomics/methods; Metadata; Software; Animals; DNA/genetics; Genome; Humans; Mice
20.
J Biomol Struct Dyn ; 24(2): 163-9, 2006 Oct.
Article in English | MEDLINE | ID: mdl-16928139

ABSTRACT

From recent developments of the early evolution theory it follows that the earliest mRNAs were short (~20 nt) (G+C)-rich polynucleotides. These short sequences could form hairpins, which would be of high evolutionary advantage because of stability and uniqueness of their conformations. Due to mutations accumulated during billions of years of evolution, the speculated earliest hairpins would largely lose the initial complementarities. Some of the original complementary base-to-base contacts, however, may have survived. Computational analysis of modern prokaryotic mRNA sequences reveals excess population of the expected short range complementarities. The derived earliest mRNA hairpin size fully corresponds to the predicted size of ancient coding duplexes. The repertoire of the surviving hairpins traced in modern mRNA confirms duplex structure of the earliest mRNA, suggested by the early molecular evolution theory.


Subjects
Evolution, Molecular; Nucleic Acid Conformation; RNA, Messenger/chemistry; RNA, Messenger/genetics; Bacteria/chemistry; Bacteria/genetics