Results 1 - 13 of 13
1.
Nucleic Acids Res ; 48(D1): D835-D844, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31777943

ABSTRACT

ClinVar is a freely available, public archive of human genetic variants and interpretations of their relationships to diseases and other conditions, maintained at the National Institutes of Health (NIH). Submitted interpretations of variants are aggregated and made available on the ClinVar website (https://www.ncbi.nlm.nih.gov/clinvar/), and as downloadable files via FTP and through programmatic tools such as NCBI's E-utilities. The default view on the ClinVar website, the Variation page, was recently redesigned. The new layout includes several new sections that make it easier to find submitted data as well as summary data such as all diseases and citations reported for the variant. The new design also better represents more complex data such as haplotypes and genotypes, as well as variants that are in ClinVar as part of a haplotype or genotype but have no interpretation for the single variant. ClinVar's variant-centric XML had its production release in April 2019. The ClinVar website and E-utilities both have been updated to support the VCV (variation in ClinVar) accession numbers found in the variant-centric XML file. ClinVar's search engine has been fine-tuned for improved retrieval of search results.
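As a rough illustration of the programmatic access mentioned in this abstract, the sketch below only builds an E-utilities ESearch URL against the `clinvar` database; it does not send a request. The helper name `clinvar_esearch_url`, the example search term, and the default `retmax` are assumptions chosen for illustration and do not come from the abstract.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def clinvar_esearch_url(term: str, retmax: int = 20) -> str:
    """Build (but do not send) an ESearch request URL for ClinVar.

    `term` is an Entrez query string; the name and defaults of this
    helper are illustrative assumptions.
    """
    params = {
        "db": "clinvar",      # Entrez database to search
        "term": term,         # query text
        "retmax": retmax,     # maximum number of IDs to return
        "retmode": "json",    # ask for a JSON response
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    print(clinvar_esearch_url("BRCA1[gene]"))
```

Keeping URL construction separate from the network call makes the query easy to inspect and test before any request is sent.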


Subject(s)
Genetic Databases , Disease/genetics , Genetic Variation/genetics , Human Genome , Genomics , Haplotypes , Humans , Internet , National Library of Medicine (U.S.) , Search Engine , United States
2.
Bioinformatics ; 36(6): 1902-1907, 2020 03 01.
Article in English | MEDLINE | ID: mdl-31738401

ABSTRACT

MOTIVATION: Normalizing sequence variants on a reference, projecting them across congruent sequences and aggregating their diverse representations are critical to the elucidation of the genetic basis of disease and biological function. Inconsistent representation of variants among variant callers, local databases and tools results in discrepancies that complicate analysis. NCBI's genetic variation resources, dbSNP and ClinVar, require a robust, scalable set of principles to manage asserted sequence variants. RESULTS: The SPDI data model defines variants as a sequence of four attributes: sequence, position, deletion and insertion, and can be applied to nucleotide and protein variants. NCBI web services convert representations among HGVS, VCF and SPDI and provide two functions to aggregate variants. One, based on the NCBI Variant Overprecision Correction Algorithm, returns a unique, normalized representation termed the 'Contextual Allele'. The SPDI data model, with its four operations, defines exactly the reference subsequence affected by the variant, even in repeat regions, such as homopolymer and other sequence repeats. The second function projects variants across congruent sequences and depends on an alignment dataset of non-assembly NCBI RefSeq sequences (prefixed NM, NR and NG), as well as inter- and intra-assembly-associated genomic sequences (NCs, NTs and NWs), supporting robust projection of variants across congruent sequences and assembly versions. The variant is projected to all congruent Contextual Alleles. One of these Contextual Alleles, typically the allele based on the latest assembly version, represents the entire set, is designated the unique 'Canonical Allele' and is used directly to aggregate variants across congruent sequences. AVAILABILITY AND IMPLEMENTATION: The SPDI services are available for open access at: https://api.ncbi.nlm.nih.gov/variation/v0. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
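To make the four-attribute model concrete, here is a minimal sketch that parses a colon-separated SPDI string and widens a pure insertion or deletion to cover the whole repeat it sits in. It is a simplified illustration of the idea behind overprecision correction, not NCBI's implementation. The names `Spdi`, `parse_spdi` and `expand_in_repeat`, the toy reference sequence, and the 0-based position convention used here are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spdi:
    sequence: str   # sequence identifier (a toy label in this sketch)
    position: int   # 0-based start of the deleted span (assumed convention)
    deletion: str   # reference bases removed
    insertion: str  # bases put in their place


def parse_spdi(text: str) -> Spdi:
    """Split 'sequence:position:deletion:insertion' into its four parts."""
    seq, pos, deleted, inserted = text.split(":")
    return Spdi(seq, int(pos), deleted, inserted)


def expand_in_repeat(ref: str, v: Spdi) -> Spdi:
    """Grow a pure insertion or deletion over the repeat around it.

    Simplified sketch: substitutions and deletion-insertions are
    returned unchanged, and no prefix/suffix trimming is done.
    """
    if bool(v.deletion) == bool(v.insertion):
        return v  # both empty, or both present: nothing to expand here
    start = v.position
    end = start + len(v.deletion)
    if ref[start:end] != v.deletion:
        raise ValueError("deletion does not match the reference")
    unit = v.deletion or v.insertion
    n = len(unit)
    left, right = start, end
    # Roll right while the reference keeps repeating the unit.
    i = 0
    while right < len(ref) and ref[right] == unit[i % n]:
        right += 1
        i += 1
    # Roll left while the reference matches the unit read backwards.
    j = 0
    while left > 0 and ref[left - 1] == unit[n - 1 - (j % n)]:
        left -= 1
        j += 1
    deleted = ref[left:right]
    inserted = ref[left:start] + v.insertion + ref[end:right]
    return Spdi(v.sequence, left, deleted, inserted)


if __name__ == "__main__":
    reference = "GCAAAAT"  # toy reference with a run of four A bases
    print(expand_in_repeat(reference, parse_spdi("toy:3:A:")))
```

In the toy reference, deleting any single A from the run gives the same result, so the expanded form (delete `AAAA`, insert `AAA`, starting at position 2) is one representation shared by all four equivalent deletions.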


Subject(s)
Genetic Databases , Genomics , Algorithms , Genome , Controlled Vocabulary
3.
Nucleic Acids Res ; 47(D1): D23-D28, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30395293

ABSTRACT

The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts published in life science journals. The Entrez system provides search and retrieval operations for most of these data from 38 distinct databases. The E-utilities serve as the programming interface for the Entrez system. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. New resources released in the past year include PubMed Labs and a new sequence database search. Resources that were updated in the past year include PubMed, PMC, Bookshelf, genome data viewer, Assembly, prokaryotic genomes, Genome, BioProject, dbSNP, dbVar, BLAST databases, igBLAST, iCn3D and PubChem. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.


Subject(s)
Biotechnology/organization & administration , Genetic Databases , Animals , Biotechnology/methods , Chemical Databases , Humans , Software , United States/epidemiology , Web Browser
4.
Nucleic Acids Res ; 46(D1): D1062-D1067, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29165669

ABSTRACT

ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/) is a freely available, public archive of human genetic variants and interpretations of their significance to disease, maintained at the National Institutes of Health. Interpretations of the clinical significance of variants are submitted by clinical testing laboratories, research laboratories, expert panels and other groups. ClinVar aggregates data by variant-disease pairs, and by variant (or set of variants). Data aggregated by variant are accessible on the website, in an improved set of variant call format files and as a new comprehensive XML report. ClinVar recently started accepting submissions that are focused primarily on providing phenotypic information for individuals who have had genetic testing. Submissions may come from clinical providers providing their own interpretation of the variant ('provider interpretation') or from groups such as patient registries that primarily provide phenotypic information from patients ('phenotyping only'). ClinVar continues to make improvements to its search and retrieval functions. Several new fields are now indexed for more precise searching, and filters allow the user to narrow down a large set of search results.


Subject(s)
Nucleic Acid Databases , Disease/genetics , Genetic Variation , Humans , Phenotype
5.
Proc Natl Acad Sci U S A ; 113(30): E4276-85, 2016 07 26.
Article in English | MEDLINE | ID: mdl-27402764

ABSTRACT

The genetic information in mammalian mitochondrial DNA is densely packed; there are no introns and only one sizeable noncoding, or control, region containing key cis-elements for its replication and expression. Many molecules of mitochondrial DNA bear a third strand of DNA, known as "7S DNA," which forms a displacement (D-) loop in the control region. Here we show that many other molecules contain RNA as a third strand. The RNA of these R-loops maps to the control region of the mitochondrial DNA and is complementary to 7S DNA. Ribonuclease H1 is essential for mitochondrial DNA replication; it degrades RNA hybridized to DNA, so the R-loop is a potential substrate. In cells with a pathological variant of ribonuclease H1 associated with mitochondrial disease, R-loops are of low abundance, and there is mitochondrial DNA aggregation. These findings implicate ribonuclease H1 and RNA in the physical segregation of mitochondrial DNA, perturbation of which represents a previously unidentified disease mechanism.


Subject(s)
Mitochondrial DNA/genetics , Mitochondria/genetics , Mutation , Ribonuclease H/genetics , Animals , Tumor Cell Line , Cultured Cells , DNA Replication , Mitochondrial DNA/chemistry , Mitochondrial DNA/metabolism , Female , HEK293 Cells , Humans , Male , Mice , Mitochondria/metabolism , Mitochondrial Diseases/genetics , Mitochondrial Diseases/metabolism , Nucleic Acid Conformation , Ribonuclease H/metabolism
6.
Proc Natl Acad Sci U S A ; 112(30): 9334-9, 2015 Jul 28.
Article in English | MEDLINE | ID: mdl-26162680

ABSTRACT

Ribonuclease H1 (RNase H1) degrades RNA hybridized to DNA, and its function is essential for mitochondrial DNA maintenance in the developing mouse. Here we define the role of RNase H1 in mitochondrial DNA replication. Analysis of replicating mitochondrial DNA in embryonic fibroblasts lacking RNase H1 reveals retention of three primers in the major noncoding region (NCR) and one at the prominent lagging-strand initiation site termed Ori-L. Primer retention does not lead immediately to depletion, as the persistent RNA is fully incorporated in mitochondrial DNA. However, the retained primers present an obstacle to the mitochondrial DNA polymerase γ in subsequent rounds of replication and lead to the catastrophic generation of a double-strand break at the origin when the resulting gapped molecules are copied. Hence, the essential role of RNase H1 in mitochondrial DNA replication is the removal of primers at the origin of replication.


Subject(s)
DNA Primers/chemistry , DNA Replication , Mitochondrial DNA/chemistry , Ribonuclease H/chemistry , Animals , Cell Line , DNA/chemistry , Exons , Fibroblasts/metabolism , Genotype , Homozygote , Mice , Knockout Mice , Mitochondria/metabolism , Nucleotides/chemistry , RNA/chemistry , Mitochondrial RNA , Replication Origin
7.
G3 (Bethesda) ; 9(8): 2447-2461, 2019 08 08.
Article in English | MEDLINE | ID: mdl-31151998

ABSTRACT

Inferring subject ancestry using genetic data is an important step in genetic association studies, required for dealing with population stratification. It has become more challenging to infer subject ancestry quickly and accurately since large amounts of genotype data, collected from millions of subjects by thousands of studies using different methods, are accessible to researchers from repositories such as the database of Genotypes and Phenotypes (dbGaP) at the National Center for Biotechnology Information (NCBI). Study-reported populations submitted to dbGaP are often not harmonized across studies or may be missing. Widely-used methods for ancestry prediction assume that most markers are genotyped in all subjects, but this assumption is unrealistic if one wants to combine studies that used different genotyping platforms. To provide ancestry inference and visualization across studies, we developed a new method, GRAF-pop, of ancestry prediction that is robust to missing genotypes and allows researchers to visualize predicted population structure in color and in three dimensions. When genotypes are dense, GRAF-pop is comparable in quality and running time to existing ancestry inference methods EIGENSTRAT, FastPCA, and FlashPCA2, all of which rely on principal components analysis (PCA). When genotypes are not dense, GRAF-pop gives much better ancestry predictions than the PCA-based methods. GRAF-pop employs basic geometric and probabilistic methods; the visualized ancestry predictions have a natural geometric interpretation, which is lacking in PCA-based methods. Since February 2018, GRAF-pop has been successfully incorporated into the dbGaP quality control process to identify inconsistencies between study-reported and computationally predicted populations and to provide harmonized population values in all new dbGaP submissions amenable to population prediction, based on marker genotypes. 
Plots of summary population predictions produced by GRAF-pop are available on dbGaP study pages, and the software is available at https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/Software.cgi.


Subject(s)
Genetic Databases , Genetic Association Studies/methods , Software , Algorithms , Cluster Analysis , Population Genetics , Genome-Wide Association Study , Humans , Principal Component Analysis , Reproducibility of Results
8.
J Mol Biol ; 354(3): 706-21, 2005 Dec 02.
Article in English | MEDLINE | ID: mdl-16269154

ABSTRACT

To adequately deal with the inherent complexity of interactions between protein side-chains, we develop and describe here a novel method for characterizing protein packing within a fold family. Instead of approaching side-chain interactions absolutely from one residue to another, we instead consider the relative interactions of contacting residue pairs. The basic element, the pair-wise relative contact, is constructed from a sequence alignment and contact analysis of a set of structures and consists of a cluster of similarly oriented, interacting, side-chain pairs. To demonstrate this construct's usefulness in analyzing protein structure, we used the pair-wise relative contacts to analyze two sets of protein structures as defined by SCOP: the diverse globin-like superfamily (126 structures) and the more uniform heme binding globin family (a 94 structure subset of the globin-like superfamily). The superfamily structure set produced 1266 unique pair-wise relative contacts, whereas the family structure subset gave 1001 unique pair-wise relative contacts. For both sets, we show that these constructs can be used to accurately and automatically differentiate between fold classes. Furthermore, these pair-wise relative contacts correlate well with sequence identity and thus provide a direct relationship between changes in sequence and changes in structure. To capture the complexity of protein packing, these pair-wise relative contacts can be superimposed around a single residue to create a multi-body construct called a relative packing group. Construction of convex hulls around the individual packing groups provides a measure of the variation in packing around a residue and defines an approximate volume of space occupied by the groups interacting with a residue. We find that these relative packing groups are useful in understanding the structural quality of sequence or structure alignments. 
Moreover, they provide context to calculate a value for structural randomness, which is important in properly assessing the quality of a structural alignment. The results of this study provide the framework for future analysis for correlating sequence changes to specific structure changes.


Subject(s)
Globins/chemistry , Globins/metabolism , Algorithms , Amino Acid Sequence , Animals , Globins/classification , Globins/genetics , Humans , Molecular Models , Molecular Sequence Data , Phylogeny , Protein Folding , Protein Tertiary Structure , Sequence Alignment
9.
Protein Sci ; 13(6): 1636-50, 2004 Jun.
Article in English | MEDLINE | ID: mdl-15152094

ABSTRACT

We have investigated some of the basic principles that influence generation of protein structures using a fragment-based, random insertion method. We tested buildup methods and fragment library quality for accuracy in constructing a set of known structures. The parameters most influential in the construction procedure are bond and torsion angles, with minor inaccuracies in bond angles alone causing >6 Å Cα RMSD for a 150-residue protein. Idealization to a standard set of values corrects this problem, but changes the torsion angles and does not work for every structure. Alternatively, we found using Cartesian coordinates instead of torsion angles did not reduce performance and can potentially increase speed and accuracy. Under conditions simulating ab initio structure prediction, fragment library quality can be suboptimal and still produce near-native structures. Using various clustering criteria, we created a number of libraries and used them to predict a set of native structures based on nonnative fragments. Local Cα RMSD fit of fragments, library size, and takeoff/landing angle criteria weakly influence the accuracy of the models. Based on a fragment's minimal perturbation upon insertion into a known structure, a seminative fragment library was created that produced more accurate structures with fragments that were less similar to native fragments than the other sets. These results suggest that fragments need only contain native-like subsections, which when correctly overlapped, can recreate a native-like model. For fragment-based, random insertion methods used in protein structure prediction and design, our findings help to define the parameters this method needs to generate near-native structures.


Subject(s)
Computer Simulation , Peptide Fragments/chemistry , Peptide Library , Proteins/chemistry , Molecular Models , Protein Tertiary Structure
10.
Mol Cell Biol ; 30(21): 5123-34, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20823270

ABSTRACT

RNase H1 in mammalian cells is present in nuclei and mitochondria. Its absence in mitochondria results in embryonic lethality due to the failure to amplify mitochondrial DNA (mtDNA). Dual localization to mitochondria and nuclei results from differential translation initiation at two in-frame AUGs (M1 and M27) of a single mRNA. Here we show that expression levels of the two isoforms depend on the efficiency of translation initiation at each AUG codon and on the presence of a short upstream open reading frame (uORF) resulting in the mitochondrial isoform being about 10% as abundant as the nuclear form. Translation initiation at the M1 AUG is restricted by the uORF, while expression of the nuclear isoform requires reinitiation of ribosomes at the M27 AUG after termination of uORF translation or new initiation by ribosomes skipping the uORF and the M1 AUG. Such translational organization of RNase H1 allows tight control of expression of RNase H1 in mitochondria, where its excess or absence can lead to cell death, without affecting the expression of the nuclear RNase H1.


Subject(s)
Codon/genetics , Open Reading Frames/genetics , Ribonuclease H/genetics , Ribonuclease H/metabolism , Amino Acid Sequence , Animals , Base Sequence , Cell Line , Cell Nucleus/enzymology , Mitochondrial DNA/genetics , Humans , In Vitro Techniques , Isoenzymes/chemistry , Isoenzymes/genetics , Isoenzymes/metabolism , Liver/enzymology , Mice , Mitochondria/enzymology , Biological Models , Molecular Sequence Data , Translational Peptide Chain Initiation , Protein Tertiary Structure , Messenger RNA/genetics , Recombinant Proteins/chemistry , Recombinant Proteins/genetics , Recombinant Proteins/metabolism , Ribonuclease H/chemistry , Amino Acid Sequence Homology
11.
J Mol Biol ; 397(5): 1144-55, 2010 Apr 16.
Article in English | MEDLINE | ID: mdl-20184890

ABSTRACT

We demonstrate, using transmission electron microscopy and immunopurification with an antibody specific for RNA/DNA hybrid, that intact mitochondrial DNA replication intermediates are essentially duplex throughout their length but contain extensive RNA tracts on one strand. However, the extent of preservation of RNA in such molecules is highly dependent on the preparative method used. These findings strongly support the strand-coupled model of mitochondrial DNA replication involving RNA incorporation throughout the lagging strand.


Subject(s)
DNA Replication , Mitochondrial DNA/chemistry , Animals , DNA , Humans , Mammals , Nucleic Acid Conformation , Nucleic Acid Hybridization , RNA
12.
J Struct Biol ; 153(2): 103-12, 2006 Feb.
Article in English | MEDLINE | ID: mdl-16377205

ABSTRACT

An essential step in understanding the molecular basis of protein-protein interactions is the accurate identification of inter-protein contacts. We evaluate a number of common methods used in analyzing protein-protein interfaces: a Voronoi polyhedra-based approach, changes in solvent accessible surface area (ΔSASA) and various radial cutoffs (closest atom, Cβ, and centroid). First, we compared the Voronoi polyhedra-based analysis to the ΔSASA and show that using Voronoi polyhedra finds knob-in-hole contacts. To assess the accuracy of the Voronoi polyhedra-based approach and the various radial cutoff methods, two sets of data were used: a small set of 75 experimental mutants and a larger one of 592 structures of protein-protein interfaces. In an assessment using the small set, the Voronoi polyhedra-based methods, a solvent accessible surface area method, and the closest atom radial method identified 100% of the direct contacts defined by mutagenesis data, but only the Voronoi polyhedra-based method found no false positives. The other radial methods were not able to find all of the direct contacts even using a cutoff of 9 Å. With the larger set of structures, we compared the overall number of contacts using the Voronoi polyhedra-based method as a standard. All the radial methods using a 6-Å cutoff identified more interactions, but these putative contacts included many false positives and still missed many true contacts (false negatives). While radial cutoffs are quicker to calculate as well as to implement, this result highlights why radial cutoff methods do not have the proper resolution to detail the non-homogeneous packing within protein interfaces, and suggests an inappropriate bias in pair-wise contact potentials. Of the radial cutoff methods, the closest atom approach gives the best approximation to the more intensive Voronoi calculation. Our version of the Voronoi polyhedra-based method, QContacts, is available at .
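The closest-atom radial cutoff discussed in this abstract can be sketched in a few lines. This is only an illustration of the general idea, not the QContacts code: the function names, the representation of a residue as a list of (x, y, z) tuples, and the toy coordinates are assumptions. The 6 Å default mirrors the cutoff quoted in the abstract.

```python
import math
from itertools import product


def closest_atom_distance(residue_a, residue_b):
    """Smallest distance between any atom of one residue and any atom of the other.

    Each residue is a list of (x, y, z) coordinates in angstroms
    (an assumed, simplified representation).
    """
    return min(math.dist(p, q) for p, q in product(residue_a, residue_b))


def in_contact(residue_a, residue_b, cutoff=6.0):
    """Closest-atom radial test: contact if any atom pair lies within `cutoff`."""
    return closest_atom_distance(residue_a, residue_b) <= cutoff


if __name__ == "__main__":
    # Toy coordinates, invented for illustration only.
    res_1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
    res_2 = [(5.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
    print(closest_atom_distance(res_1, res_2), in_contact(res_1, res_2))
```

A purely distance-based test like this cannot tell whether a third atom sits between the pair, which is one way a radial cutoff can report contacts that a Voronoi-based analysis would reject.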


Subject(s)
Evaluation Studies as Topic , Molecular Models , Protein Databases , Chemical Models , Mutation , Protein Binding , Proteins/chemistry , Solvents/chemistry , Water/chemistry
13.
J Comput Chem ; 26(10): 1063-8, 2005 Jul 30.
Article in English | MEDLINE | ID: mdl-15898109

ABSTRACT

Many applications require a method for translating a large list of bond angles and bond lengths to precise atomic Cartesian coordinates. This simple but computationally consuming task occurs ubiquitously in modeling proteins, DNA, and other polymers as well as in many other fields such as robotics. To find an optimal method, algorithms can be compared by number of operations, speed, intrinsic numerical stability, and parallelization. We discuss five established methods for growing a protein backbone by serial chain extension from bond angles and bond lengths. We introduce the Natural Extension Reference Frame (NeRF) method developed for Rosetta's chain extension subroutine, as well as an improved implementation. In comparison to traditional two-step rotations, vector algebra, or Quaternion product algorithms, the NeRF algorithm is superior for this application: it requires 47% fewer floating point operations, demonstrates the best intrinsic numerical stability, and offers prospects for parallel processor acceleration. The NeRF formalism factors the mathematical operations of chain extension into two independent terms with orthogonal subsets of the dependent variables; the apparent irreducibility of these factors hints that the minimal operation set may have been identified. Benchmarks are made on Intel Pentium and Motorola PowerPC CPUs.
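As a minimal sketch of chain extension in the spirit of NeRF, the function below places a fourth atom D from three already placed atoms A, B, C plus a bond length, a bond angle (B-C-D) and a torsion (A-B-C-D). It follows one commonly used formulation: the new atom is built in a local frame at C, then expressed in the global frame. It is not Rosetta's subroutine or the paper's optimized implementation, and the function names and test geometry are assumptions for illustration.

```python
import math


def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _unit(u):
    norm = math.sqrt(sum(a * a for a in u))
    return tuple(a / norm for a in u)


def place_atom(a, b, c, bond_length, bond_angle, torsion):
    """Return coordinates of atom D bonded to C.

    bond_angle is the B-C-D angle and torsion the A-B-C-D dihedral,
    both in radians. A, B and C must not be collinear.
    """
    bc = _unit(_sub(c, b))              # local x axis: along the B->C bond
    n = _unit(_cross(_sub(b, a), bc))   # normal to the A-B-C plane
    m = _cross(n, bc)                   # completes the right-handed frame
    # Position of D in the local frame centred on C.
    dx = -bond_length * math.cos(bond_angle)
    dy = bond_length * math.sin(bond_angle) * math.cos(torsion)
    dz = bond_length * math.sin(bond_angle) * math.sin(torsion)
    # Express the local vector in global coordinates and translate to C.
    return tuple(c[i] + dx * bc[i] + dy * m[i] + dz * n[i] for i in range(3))


if __name__ == "__main__":
    A, B, C = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
    # Torsion 0 puts D on the same side as A (cis); pi puts it opposite (trans).
    print(place_atom(A, B, C, 1.0, math.pi / 2, 0.0))
    print(place_atom(A, B, C, 1.0, math.pi / 2, math.pi))
```

Growing a chain then amounts to calling `place_atom` repeatedly, each time using the last three placed atoms as A, B and C.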


Subject(s)
Algorithms , Molecular Models , Protein Biosynthesis , Protein Conformation