Results 1 - 20 of 61
1.
Cell ; 183(3): 702-716.e14, 2020 10 29.
Article in English | MEDLINE | ID: mdl-33125890

ABSTRACT

The cellular complexity and scale of the early liver have constrained analyses examining its emergence during organogenesis. To circumvent these issues, we analyzed 45,334 single-cell transcriptomes from embryonic day (E)7.5, when endoderm progenitors are specified, to E10.5 liver, when liver parenchymal and non-parenchymal cell lineages emerge. Our data detail divergence of vascular and sinusoidal endothelia, including a distinct transcriptional profile for sinusoidal endothelial specification by E8.75. We characterize two distinct mesothelial cell types as well as early hepatic stellate cells and reveal distinct spatiotemporal distributions for these populations. We capture transcriptional profiles for hepatoblast specification and migration, including the emergence of a hepatomesenchymal cell type and evidence for hepatoblast collective cell migration. Further, we identify cell-cell interactions during the organization of the primitive sinusoid. This study provides a comprehensive atlas of liver lineage establishment from the endoderm and mesoderm through to the organization of the primitive sinusoid at single-cell resolution.


Subjects
Cell Lineage/genetics , Liver/cytology , Liver/metabolism , Single-Cell Analysis , Transcriptome/genetics , Animals , Cell Movement , Embryo, Mammalian/cytology , Endothelium/cytology , Mesoderm/cytology , Mice , Signal Transduction , Stem Cells/cytology
2.
Cell ; 172(5): 897-909.e21, 2018 02 22.
Article in English | MEDLINE | ID: mdl-29474918

ABSTRACT

X-linked Dystonia-Parkinsonism (XDP) is a Mendelian neurodegenerative disease that is endemic to the Philippines and is associated with a founder haplotype. We integrated multiple genome and transcriptome assembly technologies to narrow the causal mutation to the TAF1 locus, which included a SINE-VNTR-Alu (SVA) retrotransposition into intron 32 of the gene. Transcriptome analyses identified decreased expression of the canonical cTAF1 transcript among XDP probands, and de novo assembly across multiple pluripotent stem-cell-derived neuronal lineages discovered aberrant TAF1 transcription that involved alternative splicing and intron retention (IR) in proximity to the SVA that was anti-correlated with overall TAF1 expression. CRISPR/Cas9 excision of the SVA rescued this XDP-specific transcriptional signature and normalized TAF1 expression in probands. These data suggest an SVA-mediated aberrant transcriptional mechanism associated with XDP and may provide a roadmap for layered technologies and integrated assembly-based analyses for other unsolved Mendelian disorders.


Subjects
Dystonic Disorders/genetics , Genetic Diseases, X-Linked/genetics , Genome, Human , Transcriptome/genetics , Alternative Splicing/genetics , Alu Elements/genetics , Base Sequence , CRISPR-Cas Systems/genetics , Cohort Studies , Family , Female , Genetic Loci , Haplotypes/genetics , High-Throughput Nucleotide Sequencing , Histone Acetyltransferases/genetics , Histone Acetyltransferases/metabolism , Humans , Induced Pluripotent Stem Cells/metabolism , Introns/genetics , Male , Minisatellite Repeats/genetics , Models, Genetic , Nerve Degeneration/genetics , Nerve Degeneration/pathology , Neural Stem Cells/metabolism , Neurons/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , Short Interspersed Nucleotide Elements , TATA-Binding Protein Associated Factors/genetics , TATA-Binding Protein Associated Factors/metabolism , Transcription Factor TFIID/genetics , Transcription Factor TFIID/metabolism
3.
Nature ; 569(7756): 361-367, 2019 05.
Article in English | MEDLINE | ID: mdl-30959515

ABSTRACT

Here we delineate the ontogeny of the mammalian endoderm by generating 112,217 single-cell transcriptomes, which represent all endoderm populations within the mouse embryo until midgestation. We use graph-based approaches to model differentiating cells, which provides a spatio-temporal characterization of developmental trajectories and defines the transcriptional architecture that accompanies the emergence of the first (primitive or extra-embryonic) endodermal population and its sister pluripotent (embryonic) epiblast lineage. We uncover a relationship between descendants of these two lineages, in which epiblast cells differentiate into endoderm at two distinct time points: before and during gastrulation. Trajectories of endoderm cells were mapped as they acquired embryonic versus extra-embryonic fates and as they spatially converged within the nascent gut endoderm, which revealed these cells to be globally similar but retain aspects of their lineage history. We observed the regionalized identity of cells along the anterior-posterior axis of the emergent gut tube, which reflects their embryonic or extra-embryonic origin, and the coordinated patterning of these cells into organ-specific territories.


Subjects
Endoderm/cytology , Endoderm/embryology , Intestines/cytology , Intestines/embryology , Single-Cell Analysis , Animals , Blastocyst/cytology , Body Patterning , Cell Differentiation , Cell Lineage , Female , Gastrulation , Male , Mice
4.
Genome Res ; 29(4): 635-645, 2019 04.
Article in English | MEDLINE | ID: mdl-30894395

ABSTRACT

Large-scale population analyses coupled with advances in technology have demonstrated that the human genome is more diverse than originally thought. To date, this diversity has largely been uncovered using short-read whole-genome sequencing. However, these short-read approaches fail to give a complete picture of a genome. They struggle to identify structural events, cannot access repetitive regions, and fail to resolve the human genome into haplotypes. Here, we describe an approach that retains long range information while maintaining the advantages of short reads. Starting from ∼1 ng of high molecular weight DNA, we produce barcoded short-read libraries. Novel informatic approaches allow for the barcoded short reads to be associated with their original long molecules producing a novel data type known as "Linked-Reads". This approach allows for simultaneous detection of small and large variants from a single library. In this manuscript, we show the advantages of Linked-Reads over standard short-read approaches for reference-based analysis. Linked-Reads allow mapping to 38 Mb of sequence not accessible to short reads, adding sequence in 423 difficult-to-sequence genes including disease-relevant genes STRC, SMN1, and SMN2. Both Linked-Read whole-genome and whole-exome sequencing identify complex structural variations, including balanced events and single exon deletions and duplications. Further, Linked-Reads extend the region of high-confidence calls by 68.9 Mb. The data presented here show that Linked-Reads provide a scalable approach for comprehensive genome analysis that is not possible using short reads alone.
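
As a rough illustration of the idea in the abstract above, reads that share a barcode can be grouped back into candidate long molecules by clustering their mapped positions. The sketch below is a toy and not the authors' pipeline: the read layout `(barcode, chrom, pos)`, the function name `infer_molecules`, and the 50 kb `max_gap` cutoff are all assumptions made for this example.

```python
from collections import defaultdict


def infer_molecules(reads, max_gap=50_000):
    """Group barcoded reads into candidate source molecules.

    reads   : iterable of (barcode, chrom, pos) tuples (hypothetical layout).
    max_gap : assumed cutoff; a larger gap between consecutive reads with
              the same barcode starts a new candidate molecule.
    Returns a list of (barcode, chrom, start, end, n_reads) tuples.
    """
    # Collect mapped positions per (barcode, chromosome) pair.
    by_key = defaultdict(list)
    for barcode, chrom, pos in reads:
        by_key[(barcode, chrom)].append(pos)

    molecules = []
    for (barcode, chrom), positions in sorted(by_key.items()):
        positions.sort()
        start = prev = positions[0]
        n_reads = 1
        for pos in positions[1:]:
            if pos - prev > max_gap:
                # Gap too large: close the current molecule, open a new one.
                molecules.append((barcode, chrom, start, prev, n_reads))
                start = pos
                n_reads = 0
            prev = pos
            n_reads += 1
        molecules.append((barcode, chrom, start, prev, n_reads))
    return molecules


# Made-up reads: barcode AAAC covers two well-separated regions of chr1.
example_reads = [
    ("AAAC", "chr1", 1_000),
    ("AAAC", "chr1", 21_000),
    ("AAAC", "chr1", 900_000),
    ("GGTT", "chr1", 5_000),
]
for molecule in infer_molecules(example_reads):
    print(molecule)
```

Splitting on a fixed gap is the simplest possible rule; a production tool would presumably also weigh read density and barcode collisions, which this sketch ignores.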


Subjects
Genome-Wide Association Study/methods , Polymorphism, Genetic , Whole Genome Sequencing/methods , Cell Line , Genome, Human , Humans , Intercellular Signaling Peptides and Proteins , Membrane Proteins/genetics , Survival of Motor Neuron 1 Protein/genetics , Survival of Motor Neuron 2 Protein/genetics
5.
Nature ; 581(7809): 385-386, 2020 05.
Article in English | MEDLINE | ID: mdl-32461645
6.
BMC Genomics ; 21(1): 259, 2020 Mar 30.
Article in English | MEDLINE | ID: mdl-32228451

ABSTRACT

BACKGROUND: The olive fruit fly, Bactrocera oleae, is the most important pest in the olive fruit agribusiness industry. This is because female flies lay their eggs in the unripe fruits and upon hatching the larvae feed on the fruits thus destroying them. The lack of a high-quality genome and other genomic and transcriptomic data has hindered progress in understanding the fly's biology and proposing alternative control methods to pesticide use. RESULTS: Genomic DNA was sequenced from male and female Demokritos strain flies, maintained in the laboratory for over 45 years. We used short-, mate-pair-, and long-read sequencing technologies to generate a combined male-female genome assembly (GenBank accession GCA_001188975.2). Genomic DNA sequencing from male insects using 10x Genomics linked-reads technology followed by mate-pair and long-read scaffolding and gap-closing generated a highly contiguous 489 Mb genome with a scaffold N50 of 4.69 Mb and L50 of 30 scaffolds (GenBank accession GCA_001188975.4). RNA-seq data generated from 12 tissues and/or developmental stages allowed for genome annotation. Short reads from both males and females and the chromosome quotient method enabled identification of Y-chromosome scaffolds which were extensively validated by PCR. CONCLUSIONS: The high-quality genome generated represents a critical tool in olive fruit fly research. We provide an extensive RNA-seq data set, and genome annotation, critical towards gaining an insight into the biology of the olive fruit fly. In addition, elucidation of Y-chromosome sequences will advance our understanding of the Y-chromosome's organization, function and evolution and is poised to provide avenues for sterile insect technique approaches.
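
The scaffold N50 and L50 figures quoted in the abstract above are standard contiguity metrics. As a minimal sketch: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length, and L50 is the number of sequences in that set. The function name `n50_l50` and the toy lengths are assumptions for illustration, not data from the paper.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Toy scaffold lengths (arbitrary units, made up for this example).
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(toy_scaffolds)
print(f"N50 = {n50}, L50 = {l50}")
```

For the toy list, the total is 300, so the two longest scaffolds (80 + 70 = 150) already reach half of it, which is why a higher N50 paired with a lower L50 indicates a more contiguous assembly.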


Subjects
Tephritidae/genetics , Y Chromosome/genetics , Y Chromosome/metabolism , Animals , Female , Genome, Insect/genetics , Male , Polymerase Chain Reaction
7.
Genome Res ; 27(5): 757-767, 2017 05.
Article in English | MEDLINE | ID: mdl-28381613

ABSTRACT

Determining the genome sequence of an organism is challenging, yet fundamental to understanding its biology. Over the past decade, thousands of human genomes have been sequenced, contributing deeply to biomedical research. In the vast majority of cases, these have been analyzed by aligning sequence reads to a single reference genome, biasing the resulting analyses, and in general, failing to capture sequences novel to a given genome. Some de novo assemblies have been constructed free of reference bias, but nearly all were constructed by merging homologous loci into single "consensus" sequences, generally absent from nature. These assemblies do not correctly represent the diploid biology of an individual. In exactly two cases, true diploid de novo assemblies have been made, at great expense. One was generated using Sanger sequencing, and one using thousands of clone pools. Here, we demonstrate a straightforward and low-cost method for creating true diploid de novo assemblies. We make a single library from ∼1 ng of high molecular weight DNA, using the 10x Genomics microfluidic platform to partition the genome. We applied this technique to seven human samples, generating low-cost HiSeq X data, then assembled these using a new "pushbutton" algorithm, Supernova. Each computation took 2 d on a single server. Each yielded contigs longer than 100 kb, phase blocks longer than 2.5 Mb, and scaffolds longer than 15 Mb. Our method provides a scalable capability for determining the actual diploid genome sequence in a sample, opening the door to new approaches in genomic biology and medicine.


Subjects
Contig Mapping/methods , Diploidy , Sequence Analysis, DNA/methods , Genome, Human , Genomic Library , Humans , Microfluidics/methods , Software
8.
Genome Res ; 27(5): 849-864, 2017 05.
Article in English | MEDLINE | ID: mdl-28396521

ABSTRACT

The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health.


Subjects
Contig Mapping/methods , Genome, Human , Genomics/methods , Sequence Analysis, DNA/methods , Software , Contig Mapping/standards , Genomics/standards , Haploidy , Haplotypes , Humans , Polymorphism, Genetic , Reference Standards , Sequence Analysis, DNA/standards
9.
Nucleic Acids Res ; 44(D1): D73-80, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26578580

ABSTRACT

The NCBI Assembly database (www.ncbi.nlm.nih.gov/assembly/) provides stable accessioning and data tracking for genome assembly data. The model underlying the database can accommodate a range of assembly structures, including sets of unordered contig or scaffold sequences, bacterial genomes consisting of a single complete chromosome, or complex structures such as a human genome with modeled allelic variation. The database provides an assembly accession and version to unambiguously identify the set of sequences that make up a particular version of an assembly, and tracks changes to updated genome assemblies. The Assembly database reports metadata such as assembly names, simple statistical reports of the assembly (number of contigs and scaffolds, contiguity metrics such as contig N50, total sequence length and total gap length) as well as the assembly update history. The Assembly database also tracks the relationship between an assembly submitted to the International Nucleotide Sequence Database Consortium (INSDC) and the assembly represented in the NCBI RefSeq project. Users can find assemblies of interest by querying the Assembly Resource directly or by browsing available assemblies for a particular organism. Links in the Assembly Resource allow users to easily download sequence and annotations for current versions of genome assemblies from the NCBI genomes FTP site.
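
The accession-and-version scheme described in the abstract above can be handled with a small parser. This is a hedged sketch rather than an official NCBI tool: it assumes the `GCA_`/`GCF_` prefix (GenBank and RefSeq assemblies, respectively), a nine-digit identifier, and a numeric version after the dot, as in accessions such as GCA_001188975.4 that appear elsewhere in this listing. The names `AssemblyAccession`, `parse_assembly_accession`, and `is_update_of` are invented for this example.

```python
import re
from typing import NamedTuple


class AssemblyAccession(NamedTuple):
    prefix: str   # "GCA" (GenBank) or "GCF" (RefSeq)
    number: str   # identifier shared by all versions of one assembly
    version: int  # incremented when the assembly is updated


# Assumed pattern: prefix, underscore, nine digits, dot, version.
ACCESSION_RE = re.compile(r"^(GC[AF])_(\d{9})\.(\d+)$")


def parse_assembly_accession(text: str) -> AssemblyAccession:
    """Split an accession.version string into its parts."""
    match = ACCESSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not an assembly accession.version: {text!r}")
    prefix, number, version = match.groups()
    return AssemblyAccession(prefix, number, int(version))


def is_update_of(newer: str, older: str) -> bool:
    """True if both strings name the same assembly and `newer` has a higher version."""
    a = parse_assembly_accession(newer)
    b = parse_assembly_accession(older)
    return (a.prefix, a.number) == (b.prefix, b.number) and a.version > b.version


print(parse_assembly_accession("GCA_001188975.4"))
print(is_update_of("GCA_001188975.4", "GCA_001188975.2"))
```

Keeping the identifier as a string preserves its leading zeros, while the version is converted to an integer so that versions compare numerically.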


Subjects
Databases, Nucleic Acid , Genomics , Animals , Genome , Humans , Internet , Mice
10.
Genome Res ; 24(12): 2066-76, 2014 12.
Article in English | MEDLINE | ID: mdl-25373144

ABSTRACT

A complete reference assembly is essential for accurately interpreting individual genomes and associating variation with phenotypes. While the current human reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can improve assembly, even the longest available reads do not resolve all regions. In order to overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA including a set of end-sequenced and indexed BAC clones and 100× Illumina whole-genome shotgun (WGS) sequence coverage. We used the WGS sequence and the GRCh37 reference assembly to create an assembly of the CHM1 genome. We subsequently incorporated 382 finished BAC clone sequences to generate a draft assembly, CHM1_1.1 (NCBI AssemblyDB GCA_000306695.2). Analysis of gene, repetitive element, and segmental duplication content shows this assembly to be of excellent quality and contiguity. However, comparison to assembly-independent resources, such as BAC clone end sequences and PacBio long reads, indicates misassembled regions. Most of these regions are enriched for structural variation and segmental duplication, and can be resolved in the future. This publicly available assembly will be integrated into the Genome Reference Consortium curation framework for further improvement, with the ultimate goal being a completely finished gap-free assembly.


Subjects
Genome, Human , Haplotypes , Hydatidiform Mole/genetics , Alleles , Chromosome Mapping , Chromosomes, Artificial, Bacterial , Computational Biology/methods , Female , Genomics/methods , Heterozygote , High-Throughput Nucleotide Sequencing , Humans , Polymorphism, Single Nucleotide , Pregnancy , Repetitive Sequences, Nucleic Acid , Segmental Duplications, Genomic , Sequence Analysis, DNA
12.
Nucleic Acids Res ; 42(Database issue): D980-5, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24234437

ABSTRACT

ClinVar (http://www.ncbi.nlm.nih.gov/clinvar/) provides a freely available archive of reports of relationships among medically important variants and phenotypes. ClinVar accessions submissions reporting human variation, interpretations of the relationship of that variation to human health and the evidence supporting each interpretation. The database is tightly coupled with dbSNP and dbVar, which maintain information about the location of variation on human assemblies. ClinVar is also based on the phenotypic descriptions maintained in MedGen (http://www.ncbi.nlm.nih.gov/medgen). Each ClinVar record represents the submitter, the variation and the phenotype, i.e. the unit that is assigned an accession of the format SCV000000000.0. The submitter can update the submission at any time, in which case a new version is assigned. To facilitate evaluation of the medical importance of each variant, ClinVar aggregates submissions with the same variation/phenotype combination, adds value from other NCBI databases, assigns a distinct accession of the format RCV000000000.0 and reports if there are conflicting clinical interpretations. Data in ClinVar are available in multiple formats, including html, download as XML, VCF or tab-delimited subsets. Data from ClinVar are provided as annotation tracks on genomic RefSeqs and are used in tools such as Variation Reporter (http://www.ncbi.nlm.nih.gov/variation/tools/reporter), which reports what is known about variation based on user-supplied locations.
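
The two accession formats named in the abstract above (SCV000000000.0 for a submitted record and RCV000000000.0 for an aggregated variation/phenotype record) can be told apart with a short parser. This is a minimal sketch that assumes exactly the layout given in the abstract: a three-letter prefix, nine digits, a dot, and a numeric version. The function name `parse_clinvar_accession` and the returned dictionary keys are invented for this example.

```python
import re

# Assumed layout, taken from the formats quoted in the abstract.
CLINVAR_RE = re.compile(r"^(SCV|RCV)(\d{9})\.(\d+)$")

# Meaning of each prefix as described in the abstract.
RECORD_KIND = {
    "SCV": "submitted record",
    "RCV": "aggregate record",
}


def parse_clinvar_accession(text):
    """Return the record kind, bare accession, and version of a ClinVar ID."""
    match = CLINVAR_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a ClinVar accession.version: {text!r}")
    prefix, digits, version = match.groups()
    return {
        "kind": RECORD_KIND[prefix],
        "accession": prefix + digits,
        "version": int(version),
    }


# The placeholder accessions below are the format examples from the abstract.
print(parse_clinvar_accession("SCV000000000.0"))
print(parse_clinvar_accession("RCV000000000.0"))
```

Separating the version from the accession makes it straightforward to detect that a submitter has updated a record, since the abstract notes that each update is assigned a new version.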


Subjects
Databases, Genetic , Genetic Variation , Phenotype , Genome, Human , Genomics , Humans , Internet
13.
Nucleic Acids Res ; 41(Database issue): D1070-8, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23193260

ABSTRACT

The National Center for Biotechnology Information (NCBI) Clone DB (http://www.ncbi.nlm.nih.gov/clone/) is an integrated resource providing information about and facilitating access to clones, which serve as valuable research reagents in many fields, including genome sequencing and variation analysis. Clone DB represents an expansion and replacement of the former NCBI Clone Registry and has records for genomic and cell-based libraries and clones representing more than 100 different eukaryotic taxa. Records provide details of library construction, associated sequences, map positions and information about resource distribution. Clone DB is indexed in the NCBI Entrez system and can be queried by fields that include organism, clone name, gene name and sequence identifier. Whenever possible, genomic clones are mapped to reference assemblies and their map positions provided in clone records. Clones mapping to specific genomic regions can also be searched for using the NCBI Clone Finder tool, which accepts queries based on sequence coordinates or features such as gene or transcript names. Clone DB makes reports of library, clone and placement data on its FTP site available for download. With Clone DB, users now have available to them a centralized resource that provides them with the tools they will need to make use of these important research reagents.


Subjects
Cloning, Molecular , Databases, Nucleic Acid , Gene Library , Animals , Chromosome Mapping , Humans , Internet , Mice , Sequence Analysis, DNA , Systems Integration
14.
Nucleic Acids Res ; 41(Database issue): D936-41, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23193291

ABSTRACT

Much has changed in the last two years at DGVa (http://www.ebi.ac.uk/dgva) and dbVar (http://www.ncbi.nlm.nih.gov/dbvar). We are now processing direct submissions rather than only curating data from the literature and our joint study catalog includes data from over 100 studies in 11 organisms. Studies from human dominate with data from control and case populations, tumor samples as well as three large curated studies derived from multiple sources. During the processing of these data, we have made improvements to our data model, submission process and data representation. Additionally, we have made significant improvements in providing access to these data via web and FTP interfaces.


Subjects
Databases, Nucleic Acid , Genomic Structural Variation , Genotype , Humans , Internet , Phenotype
15.
Nucleic Acids Res ; 40(Database issue): D13-25, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22140104

ABSTRACT

In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Website. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Genome and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Probe, Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.


Subjects
Databases as Topic , Databases, Genetic , Databases, Protein , Gene Expression , Genomics , Internet , Models, Molecular , National Library of Medicine (U.S.) , Periodicals as Topic , PubMed , Sequence Alignment , Sequence Analysis, DNA , Sequence Analysis, Protein , Sequence Analysis, RNA , Small Molecule Libraries , United States
16.
Am J Hum Genet ; 86(5): 749-64, 2010 May 14.
Article in English | MEDLINE | ID: mdl-20466091

ABSTRACT

Chromosomal microarray (CMA) is increasingly utilized for genetic testing of individuals with unexplained developmental delay/intellectual disability (DD/ID), autism spectrum disorders (ASD), or multiple congenital anomalies (MCA). Performing CMA and G-banded karyotyping on every patient substantially increases the total cost of genetic testing. The International Standard Cytogenomic Array (ISCA) Consortium held two international workshops and conducted a literature review of 33 studies, including 21,698 patients tested by CMA. We provide an evidence-based summary of clinical cytogenetic testing comparing CMA to G-banded karyotyping with respect to technical advantages and limitations, diagnostic yield for various types of chromosomal aberrations, and issues that affect test interpretation. CMA offers a much higher diagnostic yield (15%-20%) for genetic testing of individuals with unexplained DD/ID, ASD, or MCA than a G-banded karyotype (approximately 3%, excluding Down syndrome and other recognizable chromosomal syndromes), primarily because of its higher sensitivity for submicroscopic deletions and duplications. Truly balanced rearrangements and low-level mosaicism are generally not detectable by arrays, but these are relatively infrequent causes of abnormal phenotypes in this population (<1%). Available evidence strongly supports the use of CMA in place of G-banded karyotyping as the first-tier cytogenetic diagnostic test for patients with DD/ID, ASD, or MCA. G-banded karyotype analysis should be reserved for patients with obvious chromosomal syndromes (e.g., Down syndrome), a family history of chromosomal rearrangement, or a history of multiple miscarriages.


Subjects
Chromosome Disorders/genetics , Congenital Abnormalities/genetics , Developmental Disabilities/genetics , Child , Chromosome Banding , Humans , Karyotyping
17.
Nucleic Acids Res ; 39(Database issue): D38-51, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21097890

ABSTRACT

In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Electronic PCR, OrfFinder, Splign, ProSplign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), IBIS, Biosystems, Peptidome, OMSSA, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.


Subjects
Databases, Genetic , Databases, Protein , Gene Expression , Genomics , National Library of Medicine (U.S.) , Protein Structure, Tertiary , PubMed , Sequence Alignment , Sequence Analysis, DNA , Sequence Analysis, RNA , Software , Systems Integration , United States
18.
PLoS Biol ; 7(5): e1000112, 2009 May 05.
Article in English | MEDLINE | ID: mdl-19468303

ABSTRACT

The mouse (Mus musculus) is the premier animal model for understanding human disease and development. Here we show that a comprehensive understanding of mouse biology is only possible with the availability of a finished, high-quality genome assembly. The finished clone-based assembly of the mouse strain C57BL/6J reported here has over 175,000 fewer gaps and over 139 Mb more of novel sequence, compared with the earlier MGSCv3 draft genome assembly. In a comprehensive analysis of this revised genome sequence, we are now able to define 20,210 protein-coding genes, over a thousand more than predicted in the human genome (19,042 genes). In addition, we identified 439 long, non-protein-coding RNAs with evidence for transcribed orthologs in human. We analyzed the complex and repetitive landscape of 267 Mb of sequence that was missing or misassembled in the previously published assembly, and we provide insights into the reasons for its resistance to sequencing and assembly by whole-genome shotgun approaches. Duplicated regions within newly assembled sequence tend to be of more recent ancestry than duplicates in the published draft, correcting our initial understanding of recent evolution on the mouse lineage. These duplicates appear to be largely composed of sequence regions containing transposable elements and duplicated protein-coding genes; of these, some may be fixed in the mouse population, but at least 40% of segmentally duplicated sequences are copy number variable even among laboratory mouse strains. Mouse lineage-specific regions contain 3,767 genes drawn mainly from rapidly-changing gene families associated with reproductive functions. The finished mouse genome assembly, therefore, greatly improves our understanding of rodent-specific biology and allows the delineation of ancestral biological functions that are shared with human from derived functions that are not.


Subjects
Computational Biology/methods , Genome/genetics , Animals , Databases, Genetic , Gene Duplication , Genome/physiology , Humans , Mice
19.
Nucleic Acids Res ; 38(Database issue): D5-16, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19910364

ABSTRACT

In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, Reference Sequence, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Entrez Probe, GENSAT, Online Mendelian Inheritance in Man, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool, Biosystems, Peptidome, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.


Subjects
Computational Biology/methods , Databases, Genetic , Databases, Nucleic Acid , Algorithms , Animals , Computational Biology/trends , Databases, Protein , Genome, Bacterial , Genome, Viral , Humans , Information Storage and Retrieval/methods , Internet , National Institutes of Health (U.S.) , National Library of Medicine (U.S.) , Software , United States
20.
Science ; 376(6588): 34-35, 2022 04.
Article in English | MEDLINE | ID: mdl-35357937

ABSTRACT

A near-complete sequence outlines a path for a more inclusive reference.


Subjects
Genome, Human , Base Sequence , Humans , Sequence Analysis, DNA