Results 1 - 14 of 14
1.
Nucleic Acids Res ; 48(D1): D835-D844, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31777943

ABSTRACT

ClinVar is a freely available, public archive of human genetic variants and interpretations of their relationships to diseases and other conditions, maintained at the National Institutes of Health (NIH). Submitted interpretations of variants are aggregated and made available on the ClinVar website (https://www.ncbi.nlm.nih.gov/clinvar/), and as downloadable files via FTP and through programmatic tools such as NCBI's E-utilities. The default view on the ClinVar website, the Variation page, was recently redesigned. The new layout includes several new sections that make it easier to find submitted data as well as summary data such as all diseases and citations reported for the variant. The new design also better represents more complex data such as haplotypes and genotypes, as well as variants that are in ClinVar as part of a haplotype or genotype but have no interpretation for the single variant. ClinVar's variant-centric XML had its production release in April 2019. The ClinVar website and E-utilities both have been updated to support the VCV (variation in ClinVar) accession numbers found in the variant-centric XML file. ClinVar's search engine has been fine-tuned for improved retrieval of search results.
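The abstract notes that ClinVar data are reachable through NCBI's E-utilities. As a minimal sketch, the helper below builds an ESearch URL against the `clinvar` database and pulls the UID list out of a JSON response. The example query term and the helper names are illustrative assumptions; only the ESearch endpoint and its `db`, `term`, `retmode` and `retmax` parameters come from E-utilities itself.

```python
import json
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(term: str, db: str = "clinvar", retmax: int = 20) -> str:
    """Build an ESearch URL that returns matching UIDs as JSON (hypothetical helper)."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": retmax}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def ids_from_esearch(payload: str) -> list:
    """Extract the UID list from an ESearch JSON response body."""
    return json.loads(payload)["esearchresult"]["idlist"]


# Usage (network call left commented out so the sketch runs offline):
# from urllib.request import urlopen
# with urlopen(esearch_url("BRCA1[gene]")) as resp:
#     uids = ids_from_esearch(resp.read().decode())
```

The returned UIDs can then be passed to ESummary or EFetch; NCBI asks clients to throttle requests, so batch work should add delays or an API key.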


Subject(s)
Databases, Genetic , Disease/genetics , Genetic Variation/genetics , Genome, Human , Genomics , Haplotypes , Humans , Internet , National Library of Medicine (U.S.) , Search Engine , United States
2.
Bioinformatics ; 36(6): 1902-1907, 2020 03 01.
Article in English | MEDLINE | ID: mdl-31738401

ABSTRACT

MOTIVATION: Normalizing sequence variants on a reference, projecting them across congruent sequences and aggregating their diverse representations are critical to the elucidation of the genetic basis of disease and biological function. Inconsistent representation of variants among variant callers, local databases and tools results in discrepancies that complicate analysis. NCBI's genetic variation resources, dbSNP and ClinVar, require a robust, scalable set of principles to manage asserted sequence variants. RESULTS: The SPDI data model defines variants as a sequence of four attributes: sequence, position, deletion and insertion, and can be applied to nucleotide and protein variants. NCBI web services convert representations among HGVS, VCF and SPDI and provide two functions to aggregate variants. One, based on the NCBI Variant Overprecision Correction Algorithm, returns a unique, normalized representation termed the 'Contextual Allele'. The SPDI data model, with its four operations, defines exactly the reference subsequence affected by the variant, even in repeat regions, such as homopolymer and other sequence repeats. The second function projects variants across congruent sequences and depends on an alignment dataset of non-assembly NCBI RefSeq sequences (prefixed NM, NR and NG), as well as inter- and intra-assembly-associated genomic sequences (NCs, NTs and NWs), supporting robust projection of variants across congruent sequences and assembly versions. The variant is projected to all congruent Contextual Alleles. One of these Contextual Alleles, typically the allele based on the latest assembly version, represents the entire set; it is designated the unique 'Canonical Allele' and is used directly to aggregate variants across congruent sequences. AVAILABILITY AND IMPLEMENTATION: The SPDI services are available for open access at: https://api.ncbi.nlm.nih.gov/variation/v0. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
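To make the SPDI model and the idea of a 'Contextual Allele' concrete, here is a simplified sketch. It assumes the colon-delimited `sequence:position:deletion:insertion` string form with a 0-based, interbase position, and it handles only substitutions and pure insertions or deletions. The `contextualize` function is a hypothetical illustration of expanding an indel over its whole repeat; it is not NCBI's Variant Overprecision Correction Algorithm, which handles more cases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spdi:
    seq_id: str
    position: int    # 0-based interbase: number of reference bases before the edit
    deletion: str    # reference bases removed (may be empty)
    insertion: str   # bases inserted in their place (may be empty)

    @classmethod
    def parse(cls, text: str) -> "Spdi":
        seq_id, pos, dele, ins = text.split(":")
        return cls(seq_id, int(pos), dele, ins)

    def __str__(self) -> str:
        return f"{self.seq_id}:{self.position}:{self.deletion}:{self.insertion}"


def apply(ref: str, v: Spdi) -> str:
    """Return the variant sequence obtained by applying v to ref."""
    end = v.position + len(v.deletion)
    if ref[v.position:end] != v.deletion:
        raise ValueError("deleted sequence does not match the reference")
    return ref[:v.position] + v.insertion + ref[end:]


def contextualize(ref: str, v: Spdi) -> Spdi:
    """Simplified sketch: trim shared bases, then expand a pure indel over the full repeat."""
    apply(ref, v)  # validate against the reference
    d, i, p = v.deletion, v.insertion, v.position
    while d and i and d[-1] == i[-1]:      # trim common suffix
        d, i = d[:-1], i[:-1]
    while d and i and d[0] == i[0]:        # trim common prefix
        d, i, p = d[1:], i[1:], p + 1
    if d and i:
        return Spdi(v.seq_id, p, d, i)     # substitution or delins: nothing to roll
    unit = d or i
    if not unit:
        return Spdi(v.seq_id, p, "", "")   # no-op after trimming
    k = len(unit)
    while p > 0 and ref[p - 1] == unit[-1]:  # roll the edit as far left as possible
        unit, p = unit[-1] + unit[:-1], p - 1
    q = p
    while q < len(ref) and ref[q] == unit[(q - p) % k]:  # extend right over the repeat
        q += 1
    region = ref[p:q]
    if d:  # deletion: the context loses one copy of the unit
        return Spdi(v.seq_id, p, region, ref[p + k:q])
    return Spdi(v.seq_id, p, region, unit + region)  # insertion: one extra copy


print(contextualize("GAAAT", Spdi.parse("seq1:2:A:")))  # seq1:1:AAA:AA
```

Because the expanded form spells out the whole affected repeat, two differently placed calls of the same homopolymer deletion collapse to one representation.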


Subject(s)
Databases, Genetic , Genomics , Algorithms , Genome , Vocabulary, Controlled
3.
Nucleic Acids Res ; 47(D1): D23-D28, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30395293

ABSTRACT

The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts published in life science journals. The Entrez system provides search and retrieval operations for most of these data from 38 distinct databases. The E-utilities serve as the programming interface for the Entrez system. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. New resources released in the past year include PubMed Labs and a new sequence database search. Resources that were updated in the past year include PubMed, PMC, Bookshelf, genome data viewer, Assembly, prokaryotic genomes, Genome, BioProject, dbSNP, dbVar, BLAST databases, igBLAST, iCn3D and PubChem. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.


Subject(s)
Biotechnology/organization & administration , Databases, Genetic , Animals , Biotechnology/methods , Databases, Chemical , Humans , Software , United States/epidemiology , Web Browser
4.
Nucleic Acids Res ; 46(D1): D1062-D1067, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29165669

ABSTRACT

ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/) is a freely available, public archive of human genetic variants and interpretations of their significance to disease, maintained at the National Institutes of Health. Interpretations of the clinical significance of variants are submitted by clinical testing laboratories, research laboratories, expert panels and other groups. ClinVar aggregates data by variant-disease pairs, and by variant (or set of variants). Data aggregated by variant are accessible on the website, in an improved set of variant call format files and as a new comprehensive XML report. ClinVar recently started accepting submissions that are focused primarily on providing phenotypic information for individuals who have had genetic testing. Submissions may come from clinical providers providing their own interpretation of the variant ('provider interpretation') or from groups such as patient registries that primarily provide phenotypic information from patients ('phenotyping only'). ClinVar continues to make improvements to its search and retrieval functions. Several new fields are now indexed for more precise searching, and filters allow the user to narrow down a large set of search results.
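The abstract mentions ClinVar's variant-level VCF files. The sketch below shows one way to read the semicolon-separated INFO column and keep records with a given clinical significance. It assumes the `CLNSIG` and `CLNDN` INFO keys used in ClinVar's VCF and an exact match on the whole `CLNSIG` value; the helper names and the records in the usage note are hypothetical, synthetic examples, not real ClinVar data.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO column into a dict; flag keys without '=' map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def records_with_significance(lines, wanted="Pathogenic"):
    """Yield (CHROM, POS, ID, REF, ALT) for data lines whose CLNSIG equals `wanted`."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        cols = line.rstrip("\n").split("\t")
        info = parse_info(cols[7])
        if info.get("CLNSIG") == wanted:
            yield cols[0], int(cols[1]), cols[2], cols[3], cols[4]
```

With a file opened via `gzip.open(path, "rt")`, the generator can stream the records without loading the whole file.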


Subject(s)
Databases, Nucleic Acid , Disease/genetics , Genetic Variation , Humans , Phenotype
5.
Proc Natl Acad Sci U S A ; 113(30): E4276-85, 2016 07 26.
Article in English | MEDLINE | ID: mdl-27402764

ABSTRACT

The genetic information in mammalian mitochondrial DNA is densely packed; there are no introns and only one sizeable noncoding, or control, region containing key cis-elements for its replication and expression. Many molecules of mitochondrial DNA bear a third strand of DNA, known as "7S DNA," which forms a displacement (D-) loop in the control region. Here we show that many other molecules contain RNA as a third strand. The RNA of these R-loops maps to the control region of the mitochondrial DNA and is complementary to 7S DNA. Ribonuclease H1 is essential for mitochondrial DNA replication; it degrades RNA hybridized to DNA, so the R-loop is a potential substrate. In cells with a pathological variant of ribonuclease H1 associated with mitochondrial disease, R-loops are of low abundance, and there is mitochondrial DNA aggregation. These findings implicate ribonuclease H1 and RNA in the physical segregation of mitochondrial DNA, perturbation of which represents a previously unidentified disease mechanism.


Subject(s)
DNA, Mitochondrial/genetics , Mitochondria/genetics , Mutation , Ribonuclease H/genetics , Animals , Cell Line, Tumor , Cells, Cultured , DNA Replication , DNA, Mitochondrial/chemistry , DNA, Mitochondrial/metabolism , Female , HEK293 Cells , Humans , Male , Mice , Mitochondria/metabolism , Mitochondrial Diseases/genetics , Mitochondrial Diseases/metabolism , Nucleic Acid Conformation , Ribonuclease H/metabolism
6.
Proc Natl Acad Sci U S A ; 112(30): 9334-9, 2015 Jul 28.
Article in English | MEDLINE | ID: mdl-26162680

ABSTRACT

Ribonuclease H1 (RNase H1) degrades RNA hybridized to DNA, and its function is essential for mitochondrial DNA maintenance in the developing mouse. Here we define the role of RNase H1 in mitochondrial DNA replication. Analysis of replicating mitochondrial DNA in embryonic fibroblasts lacking RNase H1 reveals retention of three primers in the major noncoding region (NCR) and one at the prominent lagging-strand initiation site termed Ori-L. Primer retention does not lead immediately to depletion, as the persistent RNA is fully incorporated in mitochondrial DNA. However, the retained primers present an obstacle to the mitochondrial DNA polymerase γ in subsequent rounds of replication and lead to the catastrophic generation of a double-strand break at the origin when the resulting gapped molecules are copied. Hence, the essential role of RNase H1 in mitochondrial DNA replication is the removal of primers at the origin of replication.


Subject(s)
DNA Primers/chemistry , DNA Replication , DNA, Mitochondrial/chemistry , Ribonuclease H/chemistry , Animals , Cell Line , DNA/chemistry , Exons , Fibroblasts/metabolism , Genotype , Homozygote , Mice , Mice, Knockout , Mitochondria/metabolism , Nucleotides/chemistry , RNA/chemistry , RNA, Mitochondrial , Replication Origin
7.
Sci Data ; 11(1): 732, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38969627

ABSTRACT

To explore complex biological questions, it is often necessary to access various data types from public data repositories. As the volume and complexity of biological sequence data grow, public repositories face significant challenges in ensuring that the data is easily discoverable and usable by the biological research community. To address these challenges, the National Center for Biotechnology Information (NCBI) has created NCBI Datasets. This resource provides straightforward, comprehensive, and scalable access to biological sequences, annotations, and metadata for a wide range of taxa. Following the FAIR (Findable, Accessible, Interoperable, and Reusable) data management principles, NCBI Datasets offers user-friendly web interfaces, command-line tools, and documented APIs, empowering researchers to access NCBI data seamlessly. The data is delivered as packages of sequences and metadata, thus facilitating improved data retrieval, sharing, and usability in research. Moreover, this data delivery method fosters effective data attribution and promotes its further reuse. This paper outlines the current scope of data accessible through NCBI Datasets and explains various options for exploring and downloading the data.


Subject(s)
Metadata , Databases, Genetic , United States , Information Storage and Retrieval
8.
G3 (Bethesda) ; 9(8): 2447-2461, 2019 08 08.
Article in English | MEDLINE | ID: mdl-31151998

ABSTRACT

Inferring subject ancestry using genetic data is an important step in genetic association studies, required for dealing with population stratification. It has become more challenging to infer subject ancestry quickly and accurately since large amounts of genotype data, collected from millions of subjects by thousands of studies using different methods, are accessible to researchers from repositories such as the database of Genotypes and Phenotypes (dbGaP) at the National Center for Biotechnology Information (NCBI). Study-reported populations submitted to dbGaP are often not harmonized across studies or may be missing. Widely used methods for ancestry prediction assume that most markers are genotyped in all subjects, but this assumption is unrealistic if one wants to combine studies that used different genotyping platforms. To provide ancestry inference and visualization across studies, we developed a new method, GRAF-pop, of ancestry prediction that is robust to missing genotypes and allows researchers to visualize predicted population structure in color and in three dimensions. When genotypes are dense, GRAF-pop is comparable in quality and running time to existing ancestry inference methods EIGENSTRAT, FastPCA, and FlashPCA2, all of which rely on principal components analysis (PCA). When genotypes are not dense, GRAF-pop gives much better ancestry predictions than the PCA-based methods. GRAF-pop employs basic geometric and probabilistic methods; the visualized ancestry predictions have a natural geometric interpretation, which is lacking in PCA-based methods. Since February 2018, GRAF-pop has been successfully incorporated into the dbGaP quality control process to identify inconsistencies between study-reported and computationally predicted populations and to provide harmonized population values in all new dbGaP submissions amenable to population prediction, based on marker genotypes. Plots, produced by GRAF-pop, of summary population predictions are available on dbGaP study pages, and the software is available at https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/Software.cgi.
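The key property claimed for GRAF-pop is robustness to missing genotypes. The sketch below is not GRAF-pop's geometric method; it is a deliberately simple, swapped-in likelihood assignment that illustrates the same idea. It assumes per-population alternate-allele frequencies, Hardy-Weinberg genotype probabilities, genotypes coded as 0/1/2 alternate-allele counts with `None` for missing calls, and hypothetical function names.

```python
import math


def mean_log_likelihood(genotypes, freqs):
    """Average per-marker log-likelihood of a subject's genotypes under one population.

    Missing genotypes (None) are skipped rather than imputed, and the score is
    averaged over the markers actually used, so subjects genotyped on different
    platforms remain comparable.
    """
    total, used = 0.0, 0
    for g, p in zip(genotypes, freqs):
        if g is None:
            continue
        p = min(max(p, 1e-6), 1 - 1e-6)                   # avoid log(0)
        probs = ((1 - p) ** 2, 2 * p * (1 - p), p ** 2)   # Hardy-Weinberg
        total += math.log(probs[g])
        used += 1
    return total / used if used else float("-inf")


def assign_population(genotypes, populations):
    """Return the population name whose allele frequencies best explain the genotypes."""
    return max(populations, key=lambda name: mean_log_likelihood(genotypes, populations[name]))
```

A single best label discards the continuous structure that GRAF-pop visualizes; the sketch only shows why skipping, rather than imputing, missing markers keeps sparse subjects usable.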


Subject(s)
Databases, Genetic , Genetic Association Studies/methods , Software , Algorithms , Cluster Analysis , Genetics, Population , Genome-Wide Association Study , Humans , Principal Component Analysis , Reproducibility of Results
9.
J Mol Biol ; 354(3): 706-21, 2005 Dec 02.
Article in English | MEDLINE | ID: mdl-16269154

ABSTRACT

To adequately deal with the inherent complexity of interactions between protein side-chains, we develop and describe here a novel method for characterizing protein packing within a fold family. Instead of approaching side-chain interactions absolutely from one residue to another, we consider the relative interactions of contacting residue pairs. The basic element, the pair-wise relative contact, is constructed from a sequence alignment and contact analysis of a set of structures and consists of a cluster of similarly oriented, interacting, side-chain pairs. To demonstrate this construct's usefulness in analyzing protein structure, we used the pair-wise relative contacts to analyze two sets of protein structures as defined by SCOP: the diverse globin-like superfamily (126 structures) and the more uniform heme binding globin family (a 94 structure subset of the globin-like superfamily). The superfamily structure set produced 1266 unique pair-wise relative contacts, whereas the family structure subset gave 1001 unique pair-wise relative contacts. For both sets, we show that these constructs can be used to accurately and automatically differentiate between fold classes. Furthermore, these pair-wise relative contacts correlate well with sequence identity and thus provide a direct relationship between changes in sequence and changes in structure. To capture the complexity of protein packing, these pair-wise relative contacts can be superimposed around a single residue to create a multi-body construct called a relative packing group. Construction of convex hulls around the individual packing groups provides a measure of the variation in packing around a residue and defines an approximate volume of space occupied by the groups interacting with a residue. We find that these relative packing groups are useful in understanding the structural quality of sequence or structure alignments. Moreover, they provide context to calculate a value for structural randomness, which is important in properly assessing the quality of a structural alignment. The results of this study provide the framework for future analysis for correlating sequence changes to specific structure changes.
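The abstract measures packing variation as the volume of a convex hull built around a relative packing group. A minimal sketch using SciPy's `ConvexHull` follows; it assumes the input points (for example, side-chain centroids of the residues contacting one central residue) have already been superimposed into that residue's frame across the aligned structures, and the function name is hypothetical.

```python
import numpy as np
from scipy.spatial import ConvexHull


def packing_group_volume(points) -> float:
    """Volume of the convex hull enclosing superimposed contact positions.

    Returns 0.0 for degenerate inputs (fewer than four points, or points that
    are coplanar or collinear), which cannot enclose a 3D volume.
    """
    pts = np.asarray(points, dtype=float)
    if pts.shape[0] < 4:
        return 0.0
    if np.linalg.matrix_rank(pts - pts.mean(axis=0)) < 3:
        return 0.0
    return float(ConvexHull(pts).volume)
```

A tightly conserved packing environment yields a small hull, while a variable one spreads the superimposed points and enlarges it.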


Subject(s)
Globins/chemistry , Globins/metabolism , Algorithms , Amino Acid Sequence , Animals , Globins/classification , Globins/genetics , Humans , Models, Molecular , Molecular Sequence Data , Phylogeny , Protein Folding , Protein Structure, Tertiary , Sequence Alignment
10.
Protein Sci ; 13(6): 1636-50, 2004 Jun.
Article in English | MEDLINE | ID: mdl-15152094

ABSTRACT

We have investigated some of the basic principles that influence generation of protein structures using a fragment-based, random insertion method. We tested buildup methods and fragment library quality for accuracy in constructing a set of known structures. The parameters most influential in the construction procedure are bond and torsion angles, with minor inaccuracies in bond angles alone causing >6 Å Cα RMSD for a 150-residue protein. Idealization to a standard set of values corrects this problem, but changes the torsion angles and does not work for every structure. Alternatively, we found that using Cartesian coordinates instead of torsion angles did not reduce performance and can potentially increase speed and accuracy. Under conditions simulating ab initio structure prediction, fragment library quality can be suboptimal and still produce near-native structures. Using various clustering criteria, we created a number of libraries and used them to predict a set of native structures based on nonnative fragments. Local Cα RMSD fit of fragments, library size, and takeoff/landing angle criteria weakly influence the accuracy of the models. Based on a fragment's minimal perturbation upon insertion into a known structure, a seminative fragment library was created that produced more accurate structures with fragments that were less similar to native fragments than the other sets. These results suggest that fragments need only contain native-like subsections, which when correctly overlapped, can recreate a native-like model. For fragment-based, random insertion methods used in protein structure prediction and design, our findings help to define the parameters this method needs to generate near-native structures.
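Model accuracy here is reported as Cα RMSD. The abstract does not say how structures were superimposed, so the sketch below assumes the common choice: optimal rigid superposition by the Kabsch (SVD) method, followed by RMSD over matched Cα coordinates given as N×3 arrays in the same residue order.

```python
import numpy as np


def kabsch_rmsd(P, Q) -> float:
    """RMSD between matched coordinate sets after optimal rigid superposition of P onto Q."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)                  # remove translation
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc                            # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # -1 would mean a reflection
    D = np.diag([1.0, 1.0, d])               # force a proper rotation
    R = Vt.T @ D @ U.T
    diff = Pc @ R.T - Qc                     # rotate every point of P, compare to Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

The determinant correction keeps the superposition a proper rotation, so a mirror-image model is not scored as a perfect fit.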


Subject(s)
Computer Simulation , Peptide Fragments/chemistry , Peptide Library , Proteins/chemistry , Models, Molecular , Protein Structure, Tertiary
11.
Mol Cell Biol ; 30(21): 5123-34, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20823270

ABSTRACT

RNase H1 in mammalian cells is present in nuclei and mitochondria. Its absence in mitochondria results in embryonic lethality due to the failure to amplify mitochondrial DNA (mtDNA). Dual localization to mitochondria and nuclei results from differential translation initiation at two in-frame AUGs (M1 and M27) of a single mRNA. Here we show that expression levels of the two isoforms depend on the efficiency of translation initiation at each AUG codon and on the presence of a short upstream open reading frame (uORF) resulting in the mitochondrial isoform being about 10% as abundant as the nuclear form. Translation initiation at the M1 AUG is restricted by the uORF, while expression of the nuclear isoform requires reinitiation of ribosomes at the M27 AUG after termination of uORF translation or new initiation by ribosomes skipping the uORF and the M1 AUG. Such translational organization of RNase H1 allows tight control of expression of RNase H1 in mitochondria, where its excess or absence can lead to cell death, without affecting the expression of the nuclear RNase H1.


Subject(s)
Codon/genetics , Open Reading Frames/genetics , Ribonuclease H/genetics , Ribonuclease H/metabolism , Amino Acid Sequence , Animals , Base Sequence , Cell Line , Cell Nucleus/enzymology , DNA, Mitochondrial/genetics , Humans , In Vitro Techniques , Isoenzymes/chemistry , Isoenzymes/genetics , Isoenzymes/metabolism , Liver/enzymology , Mice , Mitochondria/enzymology , Models, Biological , Molecular Sequence Data , Peptide Chain Initiation, Translational , Protein Structure, Tertiary , RNA, Messenger/genetics , Recombinant Proteins/chemistry , Recombinant Proteins/genetics , Recombinant Proteins/metabolism , Ribonuclease H/chemistry , Sequence Homology, Amino Acid
12.
J Mol Biol ; 397(5): 1144-55, 2010 Apr 16.
Article in English | MEDLINE | ID: mdl-20184890

ABSTRACT

We demonstrate, using transmission electron microscopy and immunopurification with an antibody specific for RNA/DNA hybrid, that intact mitochondrial DNA replication intermediates are essentially duplex throughout their length but contain extensive RNA tracts on one strand. However, the extent of preservation of RNA in such molecules is highly dependent on the preparative method used. These findings strongly support the strand-coupled model of mitochondrial DNA replication involving RNA incorporation throughout the lagging strand.


Subject(s)
DNA Replication , DNA, Mitochondrial/chemistry , Animals , DNA , Humans , Mammals , Nucleic Acid Conformation , Nucleic Acid Hybridization , RNA
13.
J Struct Biol ; 153(2): 103-12, 2006 Feb.
Article in English | MEDLINE | ID: mdl-16377205

ABSTRACT

An essential step in understanding the molecular basis of protein-protein interactions is the accurate identification of inter-protein contacts. We evaluate a number of common methods used in analyzing protein-protein interfaces: a Voronoi polyhedra-based approach, changes in solvent accessible surface area (ΔSASA) and various radial cutoffs (closest atom, Cβ, and centroid). First, we compared the Voronoi polyhedra-based analysis to ΔSASA and show that using Voronoi polyhedra finds knob-in-hole contacts. To assess the accuracy of the Voronoi polyhedra-based approach and the various radial cutoff methods, two sets of data were used: a small set of 75 experimental mutants and a larger one of 592 structures of protein-protein interfaces. In an assessment using the small set, the Voronoi polyhedra-based methods, a solvent accessible surface area method, and the closest atom radial method identified 100% of the direct contacts defined by mutagenesis data, but only the Voronoi polyhedra-based method found no false positives. The other radial methods were not able to find all of the direct contacts even using a cutoff of 9 Å. With the larger set of structures, we compared the overall number of contacts using the Voronoi polyhedra-based method as a standard. All the radial methods using a 6-Å cutoff identified more interactions, but these putative contacts included many false positives and also missed many true contacts (false negatives). While radial cutoffs are quicker to calculate as well as to implement, this result highlights why radial cutoff methods do not have the proper resolution to detail the non-homogeneous packing within protein interfaces, and suggests an inappropriate bias in pair-wise contact potentials. Of the radial cutoff methods, the closest atom approach gives the best approximation to the more intensive Voronoi calculation. Our version of the Voronoi polyhedra-based method QContacts is available at .
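Of the radial methods compared, the closest atom approach came nearest to the Voronoi result. A minimal sketch of that criterion follows: two residues on opposite sides of an interface are in contact if any pair of their atoms lies within the cutoff. The 6 Å default mirrors the cutoff discussed in the abstract; the inclusive comparison, the dict-of-arrays input layout and the function name are assumptions, and the Voronoi method itself is not implemented here.

```python
import numpy as np


def closest_atom_contacts(res_a: dict, res_b: dict, cutoff: float = 6.0) -> list:
    """List (id_a, id_b) residue pairs whose closest atoms are within `cutoff` Å.

    res_a, res_b: mapping of residue id -> array-like of shape (n_atoms, 3).
    """
    contacts = []
    for id_a, atoms_a in res_a.items():
        A = np.asarray(atoms_a, dtype=float)
        for id_b, atoms_b in res_b.items():
            B = np.asarray(atoms_b, dtype=float)
            # All pairwise atom-atom distances via broadcasting: shape (len(A), len(B))
            dist = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
            if dist.min() <= cutoff:
                contacts.append((id_a, id_b))
    return contacts
```

The abstract's point is visible in the design: a fixed sphere ignores whether an intervening atom shields the pair, which is exactly what a Voronoi tessellation accounts for.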


Subject(s)
Evaluation Studies as Topic , Models, Molecular , Databases, Protein , Models, Chemical , Mutation , Protein Binding , Proteins/chemistry , Solvents/chemistry , Water/chemistry
14.
J Comput Chem ; 26(10): 1063-8, 2005 Jul 30.
Article in English | MEDLINE | ID: mdl-15898109

ABSTRACT

Many applications require a method for translating a large list of bond angles and bond lengths to precise atomic Cartesian coordinates. This simple but computationally intensive task occurs ubiquitously in modeling proteins, DNA, and other polymers as well as in many other fields such as robotics. To find an optimal method, algorithms can be compared by number of operations, speed, intrinsic numerical stability, and parallelization. We discuss five established methods for growing a protein backbone by serial chain extension from bond angles and bond lengths. We introduce the Natural Extension Reference Frame (NeRF) method developed for Rosetta's chain extension subroutine, as well as an improved implementation. In comparison to traditional two-step rotations, vector algebra, or Quaternion product algorithms, the NeRF algorithm is superior for this application: it requires 47% fewer floating point operations, demonstrates the best intrinsic numerical stability, and offers prospects for parallel processor acceleration. The NeRF formalism factors the mathematical operations of chain extension into two independent terms with orthogonal subsets of the dependent variables; the apparent irreducibility of these factors hints that the minimal operation set may have been identified. Benchmarks are made on Intel Pentium and Motorola PowerPC CPUs.
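The NeRF factorization can be sketched in a few lines of NumPy: one term places the new atom in a local frame from the bond length, bond angle and torsion alone, and the other is a rotation built only from the three preceding atoms. This is an illustrative reimplementation of the general scheme, not Rosetta's subroutine; angles are assumed to be in radians and the function name is hypothetical.

```python
import numpy as np


def nerf_place(a, b, c, bond_length, bond_angle, torsion):
    """Place atom d so that |cd| = bond_length, angle(b, c, d) = bond_angle
    and dihedral(a, b, c, d) = torsion (radians)."""
    a, b, c = (np.asarray(x, dtype=float) for x in (a, b, c))
    # Term 1: d in the local reference frame; depends only on the internal coordinates.
    d2 = np.array([
        -bond_length * np.cos(bond_angle),
        bond_length * np.cos(torsion) * np.sin(bond_angle),
        bond_length * np.sin(torsion) * np.sin(bond_angle),
    ])
    # Term 2: rotation matrix built only from the three preceding atoms.
    bc = c - b
    bc /= np.linalg.norm(bc)
    n = np.cross(b - a, bc)
    n /= np.linalg.norm(n)
    m = np.column_stack([bc, np.cross(n, bc), n])
    return m @ d2 + c
```

Growing a chain is then a loop that calls `nerf_place` on the last three placed atoms for each new set of internal coordinates.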


Subject(s)
Algorithms , Models, Molecular , Protein Biosynthesis , Protein Conformation