Results 1 - 20 of 67
1.
PLoS Biol ; 22(5): e3002405, 2024 May.
Article in English | MEDLINE | ID: mdl-38713717

ABSTRACT

We report a new visualization tool for analysis of whole-genome assembly-assembly alignments, the Comparative Genome Viewer (CGV) (https://ncbi.nlm.nih.gov/genome/cgv/). CGV visualizes pairwise same-species and cross-species alignments provided by National Center for Biotechnology Information (NCBI) using assembly alignment algorithms developed by us and others. Researchers can examine large structural differences spanning chromosomes, such as inversions or translocations. Users can also navigate to regions of interest, where they can detect and analyze smaller-scale deletions and rearrangements within specific chromosome or gene regions. RefSeq or user-provided gene annotation is displayed where available. CGV currently provides approximately 800 alignments from over 350 animal, plant, and fungal species. CGV and related NCBI viewers are undergoing active development to further meet needs of the research community in comparative genome visualization.


Subject(s)
Genome , Software , Animals , Genome/genetics , Sequence Alignment/methods , Genomics/methods , Algorithms , United States , Humans , Eukaryota/genetics , Databases, Genetic , National Library of Medicine (U.S.) , Molecular Sequence Annotation/methods
2.
Cell Genom ; 4(4): 100527, 2024 Apr 10.
Article in English | MEDLINE | ID: mdl-38537634

ABSTRACT

The seventh iteration of the reference genome assembly for Rattus norvegicus, mRatBN7.2, corrects numerous misplaced segments, reduces base-level errors by approximately 9-fold, and increases contiguity by 290-fold compared with its predecessor. Gene annotations are now more complete, improving the mapping precision of genomic, transcriptomic, and proteomics datasets. We jointly analyzed 163 short-read whole-genome sequencing datasets representing 120 laboratory rat strains and substrains using mRatBN7.2. We defined ∼20.0 million sequence variations, of which 18,700 are predicted to potentially impact the function of 6,677 genes. We also generated a new rat genetic map from 1,893 heterogeneous stock rats and annotated transcription start sites and alternative polyadenylation sites. The mRatBN7.2 assembly, along with the extensive analysis of genomic variations among rat strains, enhances our understanding of the rat genome, providing researchers with an expanded resource for studies involving rats.


Subject(s)
Genome , Genomics , Rats , Animals , Genome/genetics , Molecular Sequence Annotation , Whole Genome Sequencing , Genetic Variation/genetics
3.
Genome Biol ; 25(1): 60, 2024 Feb 26.
Article in English | MEDLINE | ID: mdl-38409096

ABSTRACT

Assembled genome sequences are being generated at an exponential rate. Here we present FCS-GX, part of NCBI's Foreign Contamination Screen (FCS) tool suite, optimized to identify and remove contaminant sequences in new genomes. FCS-GX screens most genomes in 0.1-10 min. Testing FCS-GX on artificially fragmented genomes demonstrates high sensitivity and specificity for diverse contaminant species. We used FCS-GX to screen 1.6 million GenBank assemblies and identified 36.8 Gbp of contamination, comprising 0.16% of total bases, with half from 161 assemblies. We updated assemblies in NCBI RefSeq to reduce detected contamination to 0.01% of bases. FCS-GX is available at https://github.com/ncbi/fcs/ or https://doi.org/10.5281/zenodo.10651084 .


Subject(s)
Databases, Nucleic Acid , Genome , Software
4.
BMC Biol ; 22(1): 16, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38273363

ABSTRACT

BACKGROUND: Understanding genome organization and evolution is important for species involved in transmission of human diseases, such as mosquitoes. Anophelinae and Culicinae subfamilies of mosquitoes show striking differences in genome sizes, sex chromosome arrangements, behavior, and ability to transmit pathogens. However, the genomic basis of these differences is not fully understood. METHODS: In this study, we used a combination of advanced genome technologies such as Oxford Nanopore Technology sequencing, Hi-C scaffolding, Bionano, and cytogenetic mapping to develop an improved chromosome-scale genome assembly for the West Nile vector Culex quinquefasciatus. RESULTS: We then used this assembly to annotate odorant receptors, odorant binding proteins, and transposable elements. A genomic region containing male-specific sequences on chromosome 1 and a polymorphic inversion on chromosome 3 were identified in the Cx. quinquefasciatus genome. In addition, the genome of Cx. quinquefasciatus was compared with the genomes of other mosquitoes such as malaria vectors An. coluzzii and An. albimanus, and the vector of arboviruses Ae. aegypti. Our work confirms significant expansion of the two chemosensory gene families in Cx. quinquefasciatus, as well as a significant increase and relocation of the transposable elements in both Cx. quinquefasciatus and Ae. aegypti relative to the Anophelines. Phylogenetic analysis clarifies the divergence time between the mosquito species. Our study provides new insights into chromosomal evolution in mosquitoes and finds that the X chromosome of Anophelinae and the sex-determining chromosome 1 of Culicinae have a significantly higher rate of evolution than autosomes. CONCLUSION: The improved Cx. quinquefasciatus genome assembly uncovered new details of mosquito genome evolution and has the potential to speed up the development of novel vector control strategies.


Subject(s)
Aedes , Culex , Animals , Humans , Male , Phylogeny , DNA Transposable Elements/genetics , Mosquito Vectors/genetics , Culex/genetics , Aedes/genetics , Chromosomes , Evolution, Molecular
5.
Nucleic Acids Res ; 52(D1): D33-D43, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37994677

ABSTRACT

The National Center for Biotechnology Information (NCBI) provides online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases. Resources receiving significant updates in the past year include PubMed, PMC, Bookshelf, SciENcv, the NIH Comparative Genomics Resource (CGR), NCBI Virus, SRA, RefSeq, foreign contamination screening tools, Taxonomy, iCn3D, ClinVar, GTR, MedGen, dbSNP, ALFA, ClinicalTrials.gov, Pathogen Detection, antimicrobial resistance resources, and PubChem. These resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.


Subject(s)
Databases, Genetic , National Library of Medicine (U.S.) , Biotechnology/instrumentation , Databases, Nucleic Acid , Internet , United States
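The abstract above names the E-utilities as the programming interface to most NCBI databases. As a minimal sketch of what that looks like in practice, the Python below only builds request URLs for the ESearch and EFetch endpoints; it sends nothing over the network. The helper names `esearch_url` and `efetch_abstract_url` are hypothetical, and the example ID assumes that the numeric part of this listing's "mdl-37994677" identifier is the PubMed ID.

```python
from urllib.parse import urlencode

# Public base path of the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, db="pubmed", retmax=20):
    """Build an ESearch URL that asks for matching record IDs as JSON."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def efetch_abstract_url(ids, db="pubmed"):
    """Build an EFetch URL that asks for plain-text abstracts of the given IDs."""
    params = {
        "db": db,
        "id": ",".join(str(i) for i in ids),  # comma-separated ID list
        "rettype": "abstract",
        "retmode": "text",
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical usage: search by phrase, then fetch one record by ID.
    print(esearch_url("Comparative Genome Viewer"))
    print(efetch_abstract_url([37994677]))  # assumed PubMed ID, see lead-in
```

The URLs can be passed to any HTTP client. Separating URL construction from the request keeps the sketch testable offline.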
6.
bioRxiv ; 2023 Nov 29.
Article in English | MEDLINE | ID: mdl-38077029

ABSTRACT

We report a new visualization tool for analysis of whole genome assembly-assembly alignments, the Comparative Genome Viewer (CGV) (https://ncbi.nlm.nih.gov/genome/cgv/). CGV visualizes pairwise same-species and cross-species alignments provided by NCBI using assembly alignment algorithms developed by us and others. Researchers can examine the alignments between the two assemblies using two alternate views: a chromosome ideogram-based view or a 2D genome dotplot. Whole genome alignment views expose large structural differences spanning chromosomes, such as inversions or translocations. Users can also navigate to regions of interest, where they can detect and analyze smaller-scale deletions and rearrangements within specific chromosome or gene regions. RefSeq or user-provided gene annotation is displayed in the ideogram view where available. CGV currently provides approximately 700 alignments from over 300 animal, plant, and fungal species. CGV and related NCBI viewers are undergoing active development to further meet needs of the research community in comparative genome visualization.

7.
Nature ; 622(7981): 41-47, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37794265

ABSTRACT

Scientists have been trying to identify every gene in the human genome since the initial draft was published in 2001. In the years since, much progress has been made in identifying protein-coding genes, currently estimated to number fewer than 20,000, with an ever-expanding number of distinct protein-coding isoforms. Here we review the status of the human gene catalogue and the efforts to complete it in recent years. Beside the ongoing annotation of protein-coding genes, their isoforms and pseudogenes, the invention of high-throughput RNA sequencing and other technological breakthroughs have led to a rapid growth in the number of reported non-coding RNA genes. For most of these non-coding RNAs, the functional relevance is currently unclear; we look at recent advances that offer paths forward to identifying their functions and towards eventually completing the human gene catalogue. Finally, we examine the need for a universal annotation standard that includes all medically significant genes and maintains their relationships with different reference genomes for the use of the human gene catalogue in clinical settings.


Subject(s)
Genes , Genome, Human , Molecular Sequence Annotation , Protein Isoforms , Humans , Genome, Human/genetics , Molecular Sequence Annotation/standards , Molecular Sequence Annotation/trends , Protein Isoforms/genetics , Human Genome Project , Pseudogenes , RNA/genetics
8.
bioRxiv ; 2023 Jun 06.
Article in English | MEDLINE | ID: mdl-37292984

ABSTRACT

Assembled genome sequences are being generated at an exponential rate. Here we present FCS-GX, part of NCBI's Foreign Contamination Screen (FCS) tool suite, optimized to identify and remove contaminant sequences in new genomes. FCS-GX screens most genomes in 0.1-10 minutes. Testing FCS-GX on artificially fragmented genomes demonstrates sensitivity >95% for diverse contaminant species and specificity >99.93%. We used FCS-GX to screen 1.6 million GenBank assemblies and identified 36.8 Gbp of contamination (0.16% of total bases), with half from 161 assemblies. We updated assemblies in NCBI RefSeq to reduce detected contamination to 0.01% of bases. FCS-GX is available at https://github.com/ncbi/fcs/.

9.
Surgeon ; 21(6): e323-e327, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37244775

ABSTRACT

TITLE: Losing your head? An evaluation of the readability and reliability of patient information available online for Avascular Necrosis of the Head of Femur. BACKGROUND: Avascular necrosis of the head of femur commonly affects patients with an average age of 58.3 years, and is generally managed in the elective setting, allowing patients a timeframe to research their diagnosis and management options. The aim of this study is to evaluate the readability and reliability of the information available online to patients about this condition. METHODS: Google, Bing and Yahoo internet search engines were utilised, using the search terms "Avascular necrosis head of femur" and "hip avascular necrosis", with the first 30 URLs selected for analysis. Readability was assessed using an online readability calculator to produce 3 scores (Gunning FOG, Flesch Kincaid Grade and Flesch Reading Ease). Information quality was assessed using a HONcode detection web-extension and the JAMA benchmark criteria. RESULTS: 86 webpages were identified for inclusion for assessment. CONCLUSION: The majority of the information available online about avascular necrosis of the head of the femur is not at an appropriate reading level for the general population, and less than 20% of the most accessible information available online is accredited to be of sufficient quality to be providing advice to patients. Medical professionals must work together to improve health literacy among the patients encountered, and ensure recommendation of only reliable and accessible sources of information should patients ask for guidance on finding these resources.


Subject(s)
Comprehension , Health Literacy , Humans , Middle Aged , Reproducibility of Results , Femur , Necrosis , Internet
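The three readability scores named in the abstract above have widely published formulas. The sketch below computes them from pre-counted text statistics. The abstract does not say how the online calculator counted words, sentences, syllables, or "complex" words (usually words of three or more syllables), so the function names and inputs here are illustrative assumptions, not the study's method.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(words, sentences, complex_words):
    """Gunning FOG index: approximate years of schooling needed."""
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


if __name__ == "__main__":
    # Hypothetical counts for a short patient-information passage.
    words, sentences, syllables, complex_words = 100, 5, 150, 10
    print(round(flesch_reading_ease(words, sentences, syllables), 3))
    print(round(flesch_kincaid_grade(words, sentences, syllables), 2))
    print(round(gunning_fog(words, sentences, complex_words), 1))
```

All three scores depend only on average sentence length and a measure of word complexity, which is why different calculators can disagree when they count syllables differently.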
10.
ACS Synth Biol ; 12(5): 1546-1561, 2023 May 19.
Article in English | MEDLINE | ID: mdl-37134273

ABSTRACT

Cotranscriptionally encoded RNA strand displacement (ctRSD) circuits are an emerging tool for programmable molecular computation, with potential applications spanning in vitro diagnostics to continuous computation inside living cells. In ctRSD circuits, RNA strand displacement components are continuously produced together via transcription. These RNA components can be rationally programmed through base pairing interactions to execute logic and signaling cascades. However, the small number of ctRSD components characterized to date limits circuit size and capabilities. Here, we characterize over 200 ctRSD gate sequences, exploring different input, output, and toehold sequences and changes to other design parameters, including domain lengths, ribozyme sequences, and the order in which gate strands are transcribed. This characterization provides a library of sequence domains for engineering ctRSD components, i.e., a toolkit, enabling circuits with up to 4-fold more inputs than previously possible. We also identify specific failure modes and systematically develop design approaches that reduce the likelihood of failure across different gate sequences. Lastly, we show the ctRSD gate design is robust to changes in transcriptional encoding, opening a broad design space for applications in more complex environments. Together, these results deliver an expanded toolkit and design approaches for building ctRSD circuits that will dramatically extend capabilities and potential applications.


Subject(s)
DNA , RNA , RNA/genetics , Base Pairing , Signal Transduction
11.
bioRxiv ; 2023 Sep 28.
Article in English | MEDLINE | ID: mdl-37214860

ABSTRACT

The seventh iteration of the reference genome assembly for Rattus norvegicus, mRatBN7.2, corrects numerous misplaced segments, reduces base-level errors by approximately 9-fold, and increases contiguity by 290-fold compared to its predecessor. Gene annotations are now more complete, significantly improving the mapping precision of genomic, transcriptomic, and proteomics data sets. We jointly analyzed 163 short-read whole genome sequencing datasets representing 120 laboratory rat strains and substrains using mRatBN7.2. We defined ~20.0 million sequence variations, of which 18.7 thousand are predicted to potentially impact the function of 6,677 genes. We also generated a new rat genetic map from 1,893 heterogeneous stock rats and annotated transcription start sites and alternative polyadenylation sites. The mRatBN7.2 assembly, along with the extensive analysis of genomic variations among rat strains, enhances our understanding of the rat genome, providing researchers with an expanded resource for studies involving rats.

12.
ArXiv ; 2023 Mar 24.
Article in English | MEDLINE | ID: mdl-36994150

ABSTRACT

Scientists have been trying to identify all of the genes in the human genome since the initial draft of the genome was published in 2001. Over the intervening years, much progress has been made in identifying protein-coding genes, and the estimated number has shrunk to fewer than 20,000, although the number of distinct protein-coding isoforms has expanded dramatically. The invention of high-throughput RNA sequencing and other technological breakthroughs have led to an explosion in the number of reported non-coding RNA genes, although most of them do not yet have any known function. A combination of recent advances offers a path forward to identifying these functions and towards eventually completing the human gene catalogue. However, much work remains to be done before we have a universal annotation standard that includes all medically significant genes, maintains their relationships with different reference genomes, and describes clinically relevant genetic variants.

13.
Nucleic Acids Res ; 51(D1): D29-D38, 2023 Jan 06.
Article in English | MEDLINE | ID: mdl-36370100

ABSTRACT

The National Center for Biotechnology Information (NCBI) provides online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases. New resources include the Comparative Genome Resource (CGR) and the BLAST ClusteredNR database. Resources receiving significant updates in the past year include PubMed, PMC, Bookshelf, IgBLAST, GDV, RefSeq, NCBI Virus, GenBank type assemblies, iCn3D, ClinVar, GTR, dbGaP, ALFA, ClinicalTrials.gov, Pathogen Detection, antimicrobial resistance resources, and PubChem. These resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.


Subject(s)
Databases, Genetic , Databases, Nucleic Acid , United States , National Library of Medicine (U.S.) , Sequence Alignment , Biotechnology , Internet
14.
Hypertension ; 80(1): 138-146, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36330812

ABSTRACT

BACKGROUND: We report the creation and evaluation of a de novo assembly of the genome of the spontaneously hypertensive rat, the most widely used model of human cardiovascular disease. METHODS: The genome is assembled from long read sequencing (PacBio HiFi and continuous long read data [CLR]) and scaffolded with long-range structural information obtained from Bionano optical maps and proximity ligation sequencing proximity analysis of the genome. The genome assembly was polished with Illumina short reads. Completeness of the assembly was investigated using Benchmarking Universal Single Copy Orthologs analysis. The genome assembly was also evaluated with the rat reference gene set, using NCBI automated protocols. We also generated orthogonal single molecule transcript sequence reads (Iso-Seq) from 8 tissues and used them to validate the coding assembly, to annotate the assembly with RNA transcripts representing unique full length transcript isoforms for each gene and to determine whether divergences between RefSeq sequences and the assembly were attributable to assembly errors or polymorphisms. RESULTS: The assembly analysis indicates that this assembly is comparable in contiguity and completeness to the current rat reference assembly, while the use of HiFi sequencing yields an assembly that is more correct at the single base level. Synteny analysis was performed to uncover the extent of synteny and the presence and distribution of chromosomal rearrangements between the reference and this assembly. CONCLUSION: The resulting genome assembly is reference quality and captures significant structural variation.


Subject(s)
Stroke , Humans , Rats , Animals , Rats, Inbred SHR , Stroke/genetics
15.
J Clin Orthop Trauma ; 34: 102021, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36147379

ABSTRACT

Background: Tapered, fluted, titanium (TFT) stems have shown good clinical outcomes in revision total hip arthroplasty (rTHA); however, concerns exist regarding early subsidence. This study compares subsidence between a modern monoblock 3-degree and a modular 2-degree TFT stem in rTHA. Methods: A retrospective, international multicentre comparative study was conducted including 64 rTHA in 63 patients. A monoblock TFT stem was used in 37 cases and a modular TFT stem was used in 27 cases. Patient demographics, Paprosky femoral bone loss classification, bicortical contact and stem subsidence were recorded at a minimum of four weeks' follow-up. Results: There was no statistically significant difference in overall subsidence (p = 0.318) or the rate of subsidence >10 mm between stems. Mean subsidence was 2.13 mm in the monoblock group and 3.15 mm in the modular group. Two stems subsided >10 mm: one in each group. There was no difference in bicortical contact between groups (p = 0.98). No re-revisions were performed. Conclusions: We found no difference in subsidence between the two stems. Surgeons may consider the use of monoblock stems in rTHA as they have comparably low rates of subsidence and eliminate the small but potentially catastrophic risk of implant fracture at modular junctions associated with modular stems.

16.
Nature ; 604(7905): 310-315, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35388217

ABSTRACT

Comprehensive genome annotation is essential to understand the impact of clinically relevant variants. However, the absence of a standard for clinical reporting and browser display complicates the process of consistent interpretation and reporting. To address these challenges, Ensembl/GENCODE (ref. 1) and RefSeq (ref. 2) launched a joint initiative, the Matched Annotation from NCBI and EMBL-EBI (MANE) collaboration, to converge on human gene and transcript annotation and to jointly define a high-value set of transcripts and corresponding proteins. Here, we describe the MANE transcript sets for use as universal standards for variant reporting and browser display. The MANE Select set identifies a representative transcript for each human protein-coding gene, whereas the MANE Plus Clinical set provides additional transcripts at loci where the Select transcripts alone are not sufficient to report all currently known clinical variants. Each MANE transcript represents an exact match between the exonic sequences of an Ensembl/GENCODE transcript and its counterpart in RefSeq such that the identifiers can be used synonymously. We have now released MANE Select transcripts for 97% of human protein-coding genes, including all American College of Medical Genetics and Genomics Secondary Findings list v3.0 (ref. 3) genes. MANE transcripts are accessible from major genome browsers and key resources. Widespread adoption of these transcript sets will increase the consistency of reporting, facilitate the exchange of data regardless of the annotation source and help to streamline clinical interpretation.


Subject(s)
Computational Biology , Databases, Genetic , Genomics , Genome , Humans , Information Dissemination , Molecular Sequence Annotation , National Library of Medicine (U.S.) , United States
17.
Oral Maxillofac Surg ; 26(3): 393-400, 2022 Sep.
Article in English | MEDLINE | ID: mdl-34505955

ABSTRACT

PURPOSE: The purpose of this study was to provide a novel report on the head and neck injuries from the sport of wrestling and their characteristics in the USA. MATERIALS AND METHODS: This is a 20-year retrospective cross-sectional study conducted using the National Electronic Injury Surveillance System (NEISS). Reports were included in the analysis if the injury stemmed from combat with another person. The predictor variables were obtained from both patient and injury characteristics. The principal outcome variable was admission rate, which was used to proxy the severity of the injury at hand. Bivariate analysis (i.e., chi-square and independent sample tests) was used to determine if an association existed between two variables of interest. RESULTS: The final sample in our study consisted of 4485 cases of craniomaxillofacial injuries secondary to wrestling. The increase in injuries from the year 2000 to 2019 was significant (P < 0.05). The average age of patients was 15.73 (range: 3 to 59 years old). Virtually all of the injuries occurred in males (95.6%). The majority of patients were under the age of 18 (82.3%). With regard to race, white wrestlers (57.1%) comprised the majority of patients. Insight into race was not available for 1245 patients (27.8%). Most wrestling-related injuries took place during the winter season (60.6%). Concussion was the most common primary diagnosis (29.0%). The head (57.1%) was the most commonly injured craniomaxillofacial region. The most common setting in which the injury took place was a place of recreation/sports (49.9%). Among the mechanisms of injuries, the take-down (26.5%) was the most common. Patients who were thrown/taken down (5.04%) were significantly more likely to get admitted (P < 0.01) relative to patients who were injured otherwise (2.6%). Similarly, patients who fell/tripped (6.6%) were significantly more likely to get admitted (P < 0.05) relative to patients who were injured otherwise (3.1%). While cases of concussion (6.0%) were significantly more likely to get admitted (P < 0.01) relative to other cases, cases of contusions/abrasions (0.6%) were significantly less likely to get admitted (P < 0.01) relative to other cases. Similar to contusions/abrasions, lacerations (0.2%) were significantly less likely to get admitted (P < 0.01) relative to other cases. Patients aged 12-18 (P < 0.01) were most likely to suffer concussions, whereas patients aged 19-34 (P < 0.01) were least likely to suffer concussions. In contrast to concussions, patients aged 12-18 (P < 0.01) were least likely to suffer lacerations, whereas patients aged 19-34 (P < 0.01) were most likely to suffer lacerations. Patients aged 6-11 (P < 0.01) were most likely to be thrown/taken-down whereas patients aged 19-34 (P < 0.01) were least likely to be thrown. Patients aged 19-34 (P < 0.01) were most likely to be collided against intentionally, while patients aged 6-11 (P < 0.01) were least likely to be collided against intentionally. Patients aged 34 years or older were most likely to fall/trip, while patients aged 12-18 (P < 0.01) were least likely to fall/trip. CONCLUSIONS: Certain types of injuries that occur during wrestling are more or less common depending on the age groups involved in the sport. Concussions were the most common injury incurred overall, and the head is the most commonly affected craniomaxillofacial area. Take-downs were the most likely mechanism of injury to lead to hospital admissions. The average number of wrestling injuries increased over the 20 years analyzed in this study. Future studies should investigate methods to lessen concussions in wrestling, decrease the number of illegal moves performed, and look into ways to mitigate harm from take-downs, given the increasing number of injuries acquired from this sport.


Subject(s)
Athletic Injuries , Brain Concussion , Contusions , Lacerations , Wrestling , Adolescent , Adult , Athletic Injuries/diagnosis , Athletic Injuries/epidemiology , Brain Concussion/epidemiology , Brain Concussion/etiology , Child , Child, Preschool , Contusions/epidemiology , Cross-Sectional Studies , Electronics , Humans , Lacerations/epidemiology , Male , Middle Aged , Retrospective Studies , Wrestling/injuries , Young Adult
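The abstract above relies on chi-square tests to compare admission rates between groups. As a rough illustration of that kind of bivariate test, the sketch below computes the Pearson chi-square statistic for a 2x2 table without continuity correction. The counts are hypothetical and are not taken from the study, which reports only percentages; the function name `chi_square_2x2` is likewise illustrative.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the table
    [[a, b], [c, d]], e.g. rows = mechanism, columns = admitted / not admitted."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column must have a nonzero total")
    return n * (a * d - b * c) ** 2 / denom


# Critical value of the chi-square distribution with 1 degree of freedom
# at alpha = 0.05; a larger statistic corresponds to P < 0.05.
CRITICAL_05_1DF = 3.841

if __name__ == "__main__":
    # Hypothetical counts: 60 of 1190 admitted in one group, 85 of 3295 in the other.
    stat = chi_square_2x2(60, 1130, 85, 3210)
    print(stat, stat > CRITICAL_05_1DF)
```

With small expected cell counts, an exact test would usually be preferred over this approximation.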
18.
Nucleic Acids Res ; 50(D1): D20-D26, 2022 Jan 07.
Article in English | MEDLINE | ID: mdl-34850941

ABSTRACT

The National Center for Biotechnology Information (NCBI) produces a variety of online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases. Resources receiving significant updates in the past year include PubMed, PMC, Bookshelf, RefSeq, SRA, Virus, dbSNP, dbVar, ClinicalTrials.gov, MMDB, iCn3D and PubChem. These resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.


Subject(s)
Biotechnology/trends , Databases, Genetic/trends , Databases, Chemical , Databases, Nucleic Acid , Databases, Protein , Humans , Internet , National Library of Medicine (U.S.) , PubMed , United States
19.
Genome Res ; 32(1): 175-188, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34876495

ABSTRACT

Eukaryotic genomes contain many nongenic elements that function in gene regulation, chromosome organization, recombination, repair, or replication, and mutation of those elements can affect genome function and cause disease. Although numerous epigenomic studies provide high coverage of gene regulatory regions, those data are not usually exposed in traditional genome annotation and can be difficult to access and interpret without field-specific expertise. The National Center for Biotechnology Information (NCBI) therefore provides RefSeq Functional Elements (RefSeqFEs), which represent experimentally validated human and mouse nongenic elements derived from the literature. The curated data set is comprised of richly annotated sequence records, descriptive records in the NCBI Gene database, reference genome feature annotation, and activity-based interactions between nongenic regions, target genes, and each other. The data set provides succinct functional details and transparent experimental evidence, leverages data from multiple experimental sources, is readily accessible and adaptable, and uses a flexible data model. The data have multiple uses for basic functional discovery, bioinformatics studies, genetic variant interpretation; as known positive controls for epigenomic data evaluation; and as reference standards for functional interactions. Comparisons to other gene regulatory data sets show that the RefSeqFE data set includes a wider range of feature types representing more areas of biology, but it is comparatively smaller and subject to data selection biases. RefSeqFEs thus provide an alternative and complementary resource for experimentally assayed functional elements, with future data set growth expected.


Subject(s)
Computational Biology , Genome , Animals , Databases, Genetic , Eukaryota/genetics , Humans , Mice , Reference Standards