Results 1 - 15 of 15
1.
Nature ; 478(7370): 476-82, 2011 Oct 12.
Article in English | MEDLINE | ID: mdl-21993624

ABSTRACT

The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ∼4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ∼60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease.


Subject(s)
Molecular Evolution, Human Genome/genetics, Genome/genetics, Mammals/genetics, Animals, Disease, Exons/genetics, Genomics, Health, Humans, Molecular Sequence Annotation, Phylogeny, RNA/classification, RNA/genetics, Genetic Selection/genetics, Sequence Alignment, DNA Sequence Analysis
2.
Nature ; 447(7141): 167-77, 2007 May 10.
Article in English | MEDLINE | ID: mdl-17495919

ABSTRACT

We report a high-quality draft of the genome sequence of the grey, short-tailed opossum (Monodelphis domestica). As the first metatherian ('marsupial') species to be sequenced, the opossum provides a unique perspective on the organization and evolution of mammalian genomes. Distinctive features of the opossum chromosomes provide support for recent theories about genome evolution and function, including a strong influence of biased gene conversion on nucleotide sequence composition, and a relationship between chromosomal characteristics and X chromosome inactivation. Comparison of opossum and eutherian genomes also reveals a sharp difference in evolutionary innovation between protein-coding and non-coding functional elements. True innovation in protein-coding genes seems to be relatively rare, with lineage-specific differences being largely due to diversification and rapid turnover in gene families involved in environmental interactions. In contrast, about 20% of eutherian conserved non-coding elements (CNEs) are recent inventions that postdate the divergence of Eutheria and Metatheria. A substantial proportion of these eutherian-specific CNEs arose from sequence inserted by transposable elements, pointing to transposons as a major creative force in the evolution of mammalian gene regulation.


Subject(s)
Molecular Evolution, Genome/genetics, Genomics, Opossums/genetics, Animals, Base Composition, Conserved Sequence/genetics, DNA Transposable Elements/genetics, Humans, Single Nucleotide Polymorphism/genetics, Protein Biosynthesis, Synteny/genetics, X Chromosome Inactivation/genetics
3.
Nature ; 438(7069): 803-19, 2005 Dec 08.
Article in English | MEDLINE | ID: mdl-16341006

ABSTRACT

Here we report a high-quality draft genome sequence of the domestic dog (Canis familiaris), together with a dense map of single nucleotide polymorphisms (SNPs) across breeds. The dog is of particular interest because it provides important evolutionary information and because existing breeds show great phenotypic diversity for morphological, physiological and behavioural traits. We use sequence comparison with the primate and rodent lineages to shed light on the structure and evolution of genomes and genes. Notably, the majority of the most highly conserved non-coding sequences in mammalian genomes are clustered near a small subset of genes with important roles in development. Analysis of SNPs reveals long-range haplotypes across the entire dog genome, and defines the nature of genetic diversity within and across breeds. The current SNP map now makes it possible for genome-wide association studies to identify genes responsible for diseases and traits, with important consequences for human and companion animal health.


Subject(s)
Dogs/genetics, Molecular Evolution, Genome/genetics, Genomics, Haplotypes/genetics, Animals, Conserved Sequence/genetics, Dog Diseases/genetics, Dogs/classification, Female, Humans, Genetic Hybridization, Male, Mice, Mutagenesis/genetics, Single Nucleotide Polymorphism/genetics, Rats, Short Interspersed Nucleotide Elements/genetics, Synteny/genetics
4.
Proc Natl Acad Sci U S A ; 104(49): 19428-33, 2007 Dec 04.
Article in English | MEDLINE | ID: mdl-18040051

ABSTRACT

Although the Human Genome Project was completed 4 years ago, the catalog of human protein-coding genes remains a matter of controversy. Current catalogs list a total of approximately 24,500 putative protein-coding genes. It is broadly suspected that a large fraction of these entries are functionally meaningless ORFs present by chance in RNA transcripts, because they show no evidence of evolutionary conservation with mouse or dog. However, there is currently no scientific justification for excluding ORFs simply because they fail to show evolutionary conservation: the alternative hypothesis is that most of these ORFs are actually valid human genes that reflect gene innovation in the primate lineage or gene loss in the other lineages. Here, we reject this hypothesis by carefully analyzing the nonconserved ORFs, specifically their properties in other primates. We show that the vast majority of these ORFs are random occurrences. The analysis yields, as a by-product, a major revision of the current human catalogs, cutting the number of protein-coding genes to approximately 20,500. Specifically, it suggests that nonconserved ORFs should be added to the human gene catalog only if there is clear evidence of an encoded protein. It also provides a principled methodology for evaluating future proposed additions to the human gene catalog. Finally, the results indicate that there has been relatively little true innovation in mammalian protein-coding genes.


Subject(s)
Genetic Code, Human Genome/genetics, Genomics, Open Reading Frames/genetics, Proteins/genetics, Animals, Base Sequence, DNA Transposable Elements/genetics, Dogs, Genes/genetics, Humans, Mice, Molecular Sequence Data, Pseudogenes/genetics, DNA Sequence Analysis
5.
J Am Med Inform Assoc ; 25(3): 267-274, 2018 Mar 01.
Article in English | MEDLINE | ID: mdl-29040639

ABSTRACT

OBJECTIVE: We describe a detailed solution for maintaining high-capacity, data-intensive network flows (eg, 10, 40, 100 Gbps+) in a scientific, medical context while still adhering to security and privacy laws and regulations. MATERIALS AND METHODS: High-end networking, packet-filter firewalls, network intrusion-detection systems. RESULTS: We describe a "Medical Science DMZ" concept as an option for secure, high-volume transport of large, sensitive datasets between research institutions over national research networks, and give 3 detailed descriptions of implemented Medical Science DMZs. DISCUSSION: The exponentially increasing amounts of "omics" data, high-quality imaging, and other rapidly growing clinical datasets have resulted in the rise of biomedical research "Big Data." The storage, analysis, and network resources required to process these data and integrate them into patient diagnoses and treatments have grown to scales that strain the capabilities of academic health centers. Some data are not generated locally and cannot be sustained locally, and shared data repositories such as those provided by the National Library of Medicine, the National Cancer Institute, and international partners such as the European Bioinformatics Institute are rapidly growing. The ability to store and compute using these data must therefore be addressed by a combination of local, national, and industry resources that exchange large datasets. Maintaining data-intensive flows that comply with the Health Insurance Portability and Accountability Act (HIPAA) and other regulations presents a new challenge for biomedical research. We describe a strategy that marries performance and security by borrowing from and redefining the concept of a Science DMZ, a framework that is used in physical sciences and engineering research to manage high-capacity data flows. 
CONCLUSION: By implementing a Medical Science DMZ architecture, biomedical researchers can leverage the scale provided by high-performance computer and cloud storage facilities and national high-speed research networks while preserving privacy and meeting regulatory requirements.
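
To give a sense of scale for the line rates quoted in this abstract (10, 40, 100 Gbps), the minimal sketch below estimates ideal transfer times. The 100 TB dataset size, the function name `transfer_hours`, and the `efficiency` parameter are assumptions for illustration and do not come from the article; real transfers would also be limited by storage, protocol overhead, and security appliances.

```python
def transfer_hours(dataset_terabytes: float, link_gbps: float,
                   efficiency: float = 1.0) -> float:
    """Hours needed to move a dataset over a link at a sustained rate.

    efficiency is a hypothetical fraction (0-1] of the nominal line rate
    that the flow actually achieves end to end.
    """
    bits = dataset_terabytes * 1e12 * 8          # decimal terabytes -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Assumed example: a 100 TB dataset at the three rates named in the abstract.
    for gbps in (10, 40, 100):
        print(f"{gbps:>3} Gbps: {transfer_hours(100, gbps):.2f} h for 100 TB")
```

Lowering `efficiency` (for example to 0.5) shows how quickly a bottleneck in the path erodes the benefit of a faster link, which is the performance concern a Science DMZ design is meant to address.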

6.
J Med Chem ; 60(9): 3594-3605, 2017 May 11.
Article in English | MEDLINE | ID: mdl-28252959

ABSTRACT

Miniaturization and parallel processing play an important role in the evolution of many technologies. We demonstrate the application of miniaturized high-throughput experimentation methods to resolve synthetic chemistry challenges on the frontlines of a lead optimization effort to develop diacylglycerol acyltransferase (DGAT1) inhibitors. Reactions were performed on ∼1 mg scale using glass microvials providing a miniaturized high-throughput experimentation capability that was used to study a challenging SNAr reaction. The availability of robust synthetic chemistry conditions discovered in these miniaturized investigations enabled the development of structure-activity relationships that ultimately led to the discovery of soluble, selective, and potent inhibitors of DGAT1.


Subject(s)
Diacylglycerol O-Acyltransferase/antagonists & inhibitors, Enzyme Inhibitors/chemistry, Enzyme Inhibitors/pharmacology, Liquid Chromatography, Mass Spectrometry, Proton Magnetic Resonance Spectroscopy
7.
J Am Med Inform Assoc ; 23(6): 1199-1201, 2016 Nov.
Article in English | MEDLINE | ID: mdl-27136944

ABSTRACT

OBJECTIVE: We describe use cases and an institutional reference architecture for maintaining high-capacity, data-intensive network flows (e.g., 10, 40, 100 Gbps+) in a scientific, medical context while still adhering to security and privacy laws and regulations. MATERIALS AND METHODS: High-end networking, packet filter firewalls, network intrusion detection systems. RESULTS: We describe a "Medical Science DMZ" concept as an option for secure, high-volume transport of large, sensitive data sets between research institutions over national research networks. DISCUSSION: The exponentially increasing amounts of "omics" data, the rapid increase of high-quality imaging, and other rapidly growing clinical data sets have resulted in the rise of biomedical research "big data." The storage, analysis, and network resources required to process these data and integrate them into patient diagnoses and treatments have grown to scales that strain the capabilities of academic health centers. Some data are not generated locally and cannot be sustained locally, and shared data repositories such as those provided by the National Library of Medicine, the National Cancer Institute, and international partners such as the European Bioinformatics Institute are rapidly growing. The ability to store and compute using these data must therefore be addressed by a combination of local, national, and industry resources that exchange large data sets. Maintaining data-intensive flows that comply with HIPAA and other regulations presents a new challenge for biomedical research. Recognizing this, we describe a strategy that marries performance and security by borrowing from and redefining the concept of a "Science DMZ," a framework that is used in physical sciences and engineering research to manage high-capacity data flows. 
CONCLUSION: By implementing a Medical Science DMZ architecture, biomedical researchers can leverage the scale provided by high-performance computer and cloud storage facilities and national high-speed research networks while preserving privacy and meeting regulatory requirements.


Subject(s)
Computer Communication Networks, Computer Security, Computing Methodologies, Computer Security/legislation & jurisprudence, Confidentiality/legislation & jurisprudence, Government Regulation, Health Insurance Portability and Accountability Act, Computerized Medical Records Systems/legislation & jurisprudence, United States
8.
J Med Chem ; 57(2): 477-94, 2014 Jan 23.
Article in English | MEDLINE | ID: mdl-24383452

ABSTRACT

Systematic methods that speed up the assignment of absolute configuration using vibrational circular dichroism (VCD) and simplify its usage will advance this technique into a robust platform technology. Applying VCD to pharmaceutically relevant compounds has been handled in an ad hoc fashion, relying on fragment analysis and technical shortcuts to reduce the computational time required. We leverage a large computational infrastructure to provide adequate conformational exploration which enables an accurate assignment of absolute configuration. We describe a systematic approach for rapid calculation of VCD/IR spectra and comparison with corresponding measured spectra and apply this approach to assign the correct stereochemistry of nine test cases. We suggest moving away from the fragment approach when making VCD assignments. In addition to enabling faster and more reliable VCD assignments of absolute configuration, the ability to rapidly explore conformational space and sample conformations of complex molecules will have applicability in other areas of drug discovery.


Subject(s)
Circular Dichroism/methods, Molecular Conformation, Pharmaceutical Preparations/chemistry, Alkynes, Aprepitant, Azetidines/chemistry, Benzoxazines/chemistry, Camphor/chemistry, Computational Biology, Cyclohexane Monoterpenes, Cyclopropanes, Drug Discovery/methods, Ezetimibe, Ibuprofen/chemistry, Monoterpenes/chemistry, Morpholines/chemistry, Quantum Theory, Simvastatin/chemistry, Statistical Distributions, Stereoisomerism
9.
Org Lett ; 11(15): 3194-7, 2009 Aug 06.
Article in English | MEDLINE | ID: mdl-19572567

ABSTRACT

Treatment of omega-epoxynitriles with hydroxylamine affords cyclic aminonitrones in a single step and with high stereoselectivity. The scope of this novel transformation was explored in a series of examples. The aminonitrone products were shown to be useful substrates for further selective elaboration.


Subject(s)
HIV Integrase Inhibitors/chemistry, Pyrimidinones/chemistry, X-Ray Crystallography, Cyclization, Drug Design, HIV Integrase Inhibitors/chemical synthesis, Molecular Structure, Pyrimidinones/chemical synthesis, Pyrrolidinones/chemistry, Raltegravir Potassium
10.
Genome Res ; 17(6): 760-74, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17567995

ABSTRACT

A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation, alignment, and evolutionary constraint analyses of 23 mammalian species for all ENCODE targets. Alignments were generated using four different methods; comparisons of these methods reveal large-scale consistency but substantial differences in terms of small genomic rearrangements, sensitivity (sequence coverage), and specificity (alignment accuracy). We describe the quantitative and qualitative trade-offs concomitant with alignment method choice and the levels of technical error that need to be accounted for in applications that require multisequence alignments. Using the generated alignments, we identified constrained regions using three different methods. While the different constraint-detecting methods are in general agreement, there are important discrepancies relating to both the underlying alignments and the specific algorithms. However, by integrating the results across the alignments and constraint-detecting methods, we produced constraint annotations that were found to be robust based on multiple independent measures. Analyses of these annotations illustrate that most classes of experimentally annotated functional elements are enriched for constrained sequences; however, large portions of each class (with the exception of protein-coding sequences) do not overlap constrained regions. The latter elements might not be under primary sequence constraint, might not be constrained across all mammals, or might have expendable molecular functions. Conversely, 40% of the constrained sequences do not overlap any of the functional elements that have been experimentally identified. Together, these findings demonstrate and quantify how many genomic functional elements await basic molecular characterization.


Subject(s)
Molecular Evolution, Human Genome, Mammals/genetics, Open Reading Frames, Phylogeny, Sequence Alignment, Animals, Human Genome Project, Humans
11.
Cell ; 125(2): 315-26, 2006 Apr 21.
Article in English | MEDLINE | ID: mdl-16630819

ABSTRACT

The most highly conserved noncoding elements (HCNEs) in mammalian genomes cluster within regions enriched for genes encoding developmentally important transcription factors (TFs). This suggests that HCNE-rich regions may contain key regulatory controls involved in development. We explored this by examining histone methylation in mouse embryonic stem (ES) cells across 56 large HCNE-rich loci. We identified a specific modification pattern, termed "bivalent domains," consisting of large regions of H3 lysine 27 methylation harboring smaller regions of H3 lysine 4 methylation. Bivalent domains tend to coincide with TF genes expressed at low levels. We propose that bivalent domains silence developmental genes in ES cells while keeping them poised for activation. We also found striking correspondences between genome sequence and histone methylation in ES cells, which become notably weaker in differentiated cells. These results highlight the importance of DNA sequence in defining the initial epigenetic landscape and suggest a novel chromatin-based mechanism for maintaining pluripotency.


Subject(s)
Chromatin/chemistry, Developmental Gene Expression Regulation, Histones/metabolism, Nucleic Acid Conformation, Stem Cells/physiology, Animals, Cell Differentiation, Cultured Cells, Chromatin/metabolism, DNA-Binding Proteins/genetics, DNA-Binding Proteins/metabolism, Genetic Epigenesis, Gene Expression Profiling, Histones/chemistry, Homeodomain Proteins/genetics, Homeodomain Proteins/metabolism, Male, Methylation, Mice, Inbred C57BL Mice, Nanog Homeobox Protein, Octamer Transcription Factor-3/genetics, Octamer Transcription Factor-3/metabolism, Oligonucleotide Array Sequence Analysis, Stem Cells/cytology
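
As a rough illustration of the "bivalent domain" pattern described in this abstract, the sketch below flags H3 lysine 27 methylation regions that contain one or more smaller H3 lysine 4 methylation regions. The interval representation, the coordinates, and the function name `bivalent_candidates` are hypothetical; the article's actual analysis is not reproduced here.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def bivalent_candidates(
    k27: List[Interval], k4: List[Interval]
) -> List[Tuple[Interval, List[Interval]]]:
    """Pair each K27 region with the strictly smaller K4 regions inside it."""
    results = []
    for k27_start, k27_end in k27:
        inside = [
            (s, e)
            for s, e in k4
            # K4 region must lie within the K27 region and be shorter than it.
            if k27_start <= s and e <= k27_end
            and (e - s) < (k27_end - k27_start)
        ]
        if inside:
            results.append(((k27_start, k27_end), inside))
    return results


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    k27_regions = [(100, 5000), (8000, 9000)]
    k4_regions = [(1200, 1800), (6000, 6500), (8000, 9000)]
    for domain, marks in bivalent_candidates(k27_regions, k4_regions):
        print(domain, "contains", marks)
```

With these made-up inputs only the first K27 region qualifies: the K4 region at 6000-6500 lies outside any K27 region, and the one at 8000-9000 is not smaller than the K27 region it coincides with.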
12.
Bioinformatics ; 20(3): 426-7, 2004 Feb 12.
Article in English | MEDLINE | ID: mdl-14960472

ABSTRACT

Multiple sequence alignment remains a crucial method for understanding the function of groups of related nucleic acid and protein sequences. However, it is known that automatic multiple sequence alignments can often be improved by manual editing. Therefore, tools are needed to view and edit multiple sequence alignments. Due to growth in the sequence databases, multiple sequence alignments can often be large and difficult to view efficiently. The Jalview Java alignment editor is presented here, which enables fast viewing and editing of large multiple sequence alignments.


Subject(s)
Documentation, Hypermedia, Information Storage and Retrieval/methods, Sequence Alignment/methods, DNA Sequence Analysis/methods, Software, User-Computer Interface, Algorithms, Database Management Systems, Word Processing
13.
Genome Res ; 14(5): 971-5, 2004 May.
Article in English | MEDLINE | ID: mdl-15123594

ABSTRACT

Ensembl is a software project to automatically annotate large eukaryotic genomes and release them freely into the public domain. The project currently automatically annotates 10 complete genomes. This makes very large demands on compute resources, due to the vast number of sequence comparisons that need to be executed. To circumvent the financial outlay often associated with classical supercomputing environments, farms of multiple, lower-cost machines have now become the norm and have been deployed successfully with this project. The architecture and design of farms containing hundreds of compute nodes is complex and nontrivial to implement. This study will define and explain some of the essential elements to consider when designing such systems. Server architecture and network infrastructure are discussed with a particular emphasis on solutions that worked and those that did not (often with fairly spectacular consequences). The aim of the study is to give the reader, who may be implementing a large-scale biocompute project, an insight into some of the pitfalls that may be waiting ahead.


Subject(s)
Computational Biology/methods, Software, Computer Systems, Database Management Systems, Genetic Databases, Online Systems, Software Design
14.
Nature ; 420(6915): 520-62, 2002 Dec 05.
Article in English | MEDLINE | ID: mdl-12466850

ABSTRACT

The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.


Subject(s)
Mammalian Chromosomes/genetics, Molecular Evolution, Genome, Mice/genetics, Physical Chromosome Mapping, Animals, Base Composition, Conserved Sequence/genetics, CpG Islands/genetics, Gene Expression Regulation, Genes/genetics, Genetic Variation/genetics, Human Genome, Genomics, Humans, Mice/classification, Knockout Mice, Transgenic Mice, Animal Models, Multigene Family/genetics, Mutagenesis, Neoplasms/genetics, Proteome/genetics, Pseudogenes/genetics, Quantitative Trait Loci/genetics, Untranslated RNA/genetics, Nucleic Acid Repetitive Sequences/genetics, Genetic Selection, DNA Sequence Analysis, Sex Chromosomes/genetics, Species Specificity, Synteny
15.
Genome Res ; 14(5): 925-8, 2004 May.
Article in English | MEDLINE | ID: mdl-15078858

ABSTRACT

Ensembl (http://www.ensembl.org/) is a bioinformatics project to organize biological information around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of individual genomes, and of the synteny and orthology relationships between them. It is also a framework for integration of any biological data that can be mapped onto features derived from the genomic sequence. Ensembl is available as an interactive Web site, a set of flat files, and as a complete, portable open source software system for handling genomes. All data are provided without restriction, and code is freely available. Ensembl's aims are to continue to "widen" this biological integration to include other model organisms relevant to understanding human biology as they become available; to "deepen" this integration to provide an ever more seamless linkage between equivalent components in different species; and to provide further classification of functional elements in the genome that have been previously elusive.


Subject(s)
Computational Biology/trends