Results 1 - 20 of 81
1.
Proc Natl Acad Sci U S A ; 114(44): E9197-E9205, 2017 10 31.
Article in English | MEDLINE | ID: mdl-29078285

ABSTRACT

Nucleosomes represent the basic building block of chromatin and provide an important mechanism by which cellular processes are controlled. The locations of nucleosomes across the genome are not random but instead depend on both the underlying DNA sequence and the dynamic action of other proteins within the nucleus. These processes are central to cellular function, and the molecular details of the interplay between DNA sequence and nucleosome dynamics remain poorly understood. In this work, we investigate this interplay in detail by relying on a molecular model, which permits development of a comprehensive picture of the underlying free energy surfaces and the corresponding dynamics of nucleosome repositioning. The mechanism of nucleosome repositioning is shown to be strongly linked to DNA sequence and directly related to the binding energy of a given DNA sequence to the histone core. It is also demonstrated that chromatin remodelers can override DNA-sequence preferences by exerting torque, and the histone H4 tail is then identified as a key component by which DNA-sequence, histone modifications, and chromatin remodelers could in fact be coupled.


Subject(s)
Gene Silencing/physiology, Nucleosomes/genetics, Chromatin/genetics, Chromatin Assembly and Disassembly/genetics, Computer Simulation, DNA/genetics, Genome/genetics, Histones/genetics, Models, Molecular
2.
Proc Natl Acad Sci U S A ; 114(51): 13400-13405, 2017 12 19.
Article in English | MEDLINE | ID: mdl-29203667

ABSTRACT

Very large DNA molecules enable comprehensive analysis of complex genomes, such as human, cancer, and plant genomes, because they span across sequence repeats and complex somatic events. When physically manipulated, or analyzed as single molecules, long polyelectrolytes are problematic because of mechanical considerations that include shear-mediated breakage, dealing with the massive size of these coils, or the length of stretched DNAs using common experimental techniques and fluidic devices. Accordingly, we harness analyte "issues" as exploitable advantages by our invention and characterization of the "molecular gate," which controls and synchronizes formation of stretched DNA molecules as DNA dumbbells within nanoslit geometries. Molecular gate geometries comprise micro- and nanoscale features designed to synergize very low ionic strength conditions in ways we show effectively create an "electrostatic bottle." This effect greatly enhances molecular confinement within large slit geometries and supports facile, synchronized electrokinetic loading of nanoslits, even without dumbbell formation. Device geometries were considered at the molecular and continuum scales through computer simulations, which also guided our efforts to optimize design and functionalities. In addition, we show that the molecular gate may govern DNA separations because DNA molecules can be electrokinetically triggered, by varying applied voltage, to enter slits in a size-dependent manner. Lastly, mapping the Mesoplasma florum genome, via synchronized dumbbell formation, validates our nascent approach as a viable starting point for advanced development that will build an integrated system capable of large-scale genome analysis.


Subject(s)
DNA/chemistry, Genomics/methods, Microfluidics/methods, Single Molecule Imaging/methods, Entomoplasmataceae/genetics, Genomics/instrumentation, Microfluidics/instrumentation, Single Molecule Imaging/instrumentation, Static Electricity
3.
Biophys J ; 117(11): 2047-2053, 2019 12 03.
Article in English | MEDLINE | ID: mdl-31409480

ABSTRACT

It is now rare to find biological, or genetic investigations that do not rely on the tools, data, and thinking drawn from the genomic sciences. Much of this revolution is powered by contemporary sequencing approaches that readily deliver large, genome-wide data sets that not only provide genetic insights but also uniquely report molecular outcomes from experiments that biophysicists are increasingly using for potentiating structural and mechanistic investigations. In this perspective, I describe a path of how biophysical thinking greatly contributed to this revolution in ways that parallel advancements in computer science through discussion of several key inventions, described as "foundational devices." These discussions also point at the future of how biophysics and the genomic sciences may become more finely integrated for empowering new measurement paradigms for biological investigations.


Subject(s)
Biophysics, Genomics, Lab-On-A-Chip Devices
4.
Nucleic Acids Res ; 44(1): e6, 2016 Jan 08.
Article in English | MEDLINE | ID: mdl-26264666

ABSTRACT

Fluorescent proteins that also bind DNA molecules are useful reagents for a broad range of biological applications because they can be optically localized and tracked within cells, or provide versatile labels for in vitro experiments. We report a novel design for a fluorescent, DNA-binding protein (FP-DBP) that completely 'paints' entire DNA molecules, whereby sequence-independent DNA binding is accomplished by linking a fluorescent protein to two small peptides (KWKWKKA) using lysine for binding to the DNA phosphates, and tryptophan for intercalating between DNA bases. Importantly, this ubiquitous binding motif enables fluorescent proteins (Kd = 14.7 µM) to confluently stain DNA molecules and such binding is reversible via pH shifts. These proteins offer useful robust advantages for single DNA molecule studies: lack of fluorophore mediated photocleavage and staining that does not perturb polymer contour lengths. Accordingly, we demonstrate confluent staining of naked DNA molecules presented within microfluidic devices, or localized within live bacterial cells.
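The reported dissociation constant can be turned into an expected occupancy. The sketch below is a minimal illustration, assuming an idealized single-site binding isotherm (fraction bound = [P]/(Kd + [P]), where [P] is the free FP-DBP concentration) and taking Kd = 14.7 µM from the abstract. The function name and the concentrations are hypothetical, and a construct carrying two KWKWKKA peptides may not follow this simple model exactly.

```python
def fraction_bound(free_protein_uM: float, kd_uM: float = 14.7) -> float:
    """Fractional occupancy for an idealized single-site binding isotherm."""
    if free_protein_uM < 0 or kd_uM <= 0:
        raise ValueError("concentrations must be non-negative and Kd positive")
    return free_protein_uM / (kd_uM + free_protein_uM)


# Hypothetical free FP-DBP concentrations in micromolar
for conc in (1.0, 14.7, 50.0, 150.0):
    print(f"{conc:6.1f} uM -> {fraction_bound(conc):.2f} of sites occupied")
```

Under this model, half of the sites are occupied when the free protein concentration equals Kd, so near-confluent staining would need concentrations several-fold above Kd.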


Subject(s)
DNA-Binding Proteins/metabolism, DNA/metabolism, Luminescent Proteins/metabolism, Molecular Imaging, Recombinant Fusion Proteins, DNA-Binding Proteins/genetics, Escherichia coli/genetics, Escherichia coli/metabolism, Humans, Luminescent Proteins/genetics, Microscopy, Fluorescence, Molecular Imaging/methods
5.
PLoS Genet ; 11(12): e1005686, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26641089

ABSTRACT

Many loci in the human genome harbor complex genomic structures that can result in susceptibility to genomic rearrangements leading to various genomic disorders. Nephronophthisis 1 (NPHP1, MIM# 256100) is an autosomal recessive disorder that can be caused by defects of NPHP1; the gene maps within the human 2q13 region where low copy repeats (LCRs) are abundant. Loss of function of NPHP1 is responsible for approximately 85% of the NPHP1 cases, and about 80% of such individuals carry a large recurrent homozygous NPHP1 deletion that occurs via nonallelic homologous recombination (NAHR) between two flanking directly oriented ~45 kb LCRs. Published data revealed a non-pathogenic inversion polymorphism involving the NPHP1 gene flanked by two inverted ~358 kb LCRs. Using optical mapping and array-comparative genomic hybridization, we identified three potential novel structural variant (SV) haplotypes at the NPHP1 locus that may protect a haploid genome from the NPHP1 deletion. Inter-species comparative genomic analyses among primate genomes revealed massive genomic changes during evolution. The aggregated data suggest that dynamic genomic rearrangements occurred historically within the NPHP1 locus and generated SV haplotypes observed in the human population today, which may confer differential susceptibility to genomic instability and the NPHP1 deletion within a personal genome. Our study documents diverse SV haplotypes at a complex LCR-laden human genomic region. Comparative analyses provide a model for how this complex region arose during primate evolution, and studies among humans suggest that intra-species polymorphism may potentially modulate an individual's susceptibility to acquiring disease-associated alleles.


Subject(s)
Adaptor Proteins, Signal Transducing/genetics, Evolution, Molecular, Genome, Human, Kidney Diseases, Cystic/congenital, Membrane Proteins/genetics, Alleles, Animals, Comparative Genomic Hybridization, Cytoskeletal Proteins, Gene Dosage, Gene Rearrangement, Genomic Structural Variation, Haplotypes, Humans, Kidney Diseases, Cystic/genetics, Kidney Diseases, Cystic/pathology, Primates
6.
Proc Natl Acad Sci U S A ; 112(25): 7689-94, 2015 Jun 23.
Article in English | MEDLINE | ID: mdl-26056298

ABSTRACT

Multiple myeloma (MM), a malignancy of plasma cells, is characterized by widespread genomic heterogeneity and, consequently, differences in disease progression and drug response. Although recent large-scale sequencing studies have greatly improved our understanding of MM genomes, our knowledge about genomic structural variation in MM is attenuated due to the limitations of commonly used sequencing approaches. In this study, we present the application of optical mapping, a single-molecule, whole-genome analysis system, to discover new structural variants in a primary MM genome. Through our analysis, we have identified and characterized widespread structural variation in this tumor genome. Additionally, we describe our efforts toward comprehensive characterization of genome structure and variation by integrating our findings from optical mapping with those from DNA sequencing-based genomic analysis. Finally, by studying this MM genome at two time points during tumor progression, we have demonstrated an increase in mutational burden with tumor progression at all length scales of variation.


Subject(s)
DNA Copy Number Variations, Multiple Myeloma/genetics, DNA/genetics, Humans, Loss of Heterozygosity, Multiple Myeloma/pathology, Polymorphism, Single Nucleotide
7.
BMC Genomics ; 18(1): 667, 2017 Aug 29.
Article in English | MEDLINE | ID: mdl-28851275

ABSTRACT

BACKGROUND: The ascomycete fungus Colletotrichum higginsianum causes anthracnose disease of brassica crops and the model plant Arabidopsis thaliana. Previous versions of the genome sequence were highly fragmented, causing errors in the prediction of protein-coding genes and preventing the analysis of repetitive sequences and genome architecture. RESULTS: Here, we re-sequenced the genome using single-molecule real-time (SMRT) sequencing technology and, in combination with optical map data, this provided a gapless assembly of all twelve chromosomes except for the ribosomal DNA repeat cluster on chromosome 7. The more accurate gene annotation made possible by this new assembly revealed a large repertoire of secondary metabolism (SM) key genes (89) and putative biosynthetic pathways (77 SM gene clusters). The two mini-chromosomes differed from the ten core chromosomes in being repeat- and AT-rich and gene-poor but were significantly enriched with genes encoding putative secreted effector proteins. Transposable elements (TEs) were found to occupy 7% of the genome by length. Certain TE families showed a statistically significant association with effector genes and SM cluster genes and were transcriptionally active at particular stages of fungal development. All 24 subtelomeres were found to contain one of three highly-conserved repeat elements which, by providing sites for homologous recombination, were probably instrumental in four segmental duplications. CONCLUSION: The gapless genome of C. higginsianum provides access to repeat-rich regions that were previously poorly assembled, notably the mini-chromosomes and subtelomeres, and allowed prediction of the complete SM gene repertoire. It also provides insights into the potential role of TEs in gene and genome evolution and host adaptation in this asexual pathogen.


Subject(s)
Chromosomes, Fungal/genetics, Colletotrichum/genetics, Colletotrichum/metabolism, DNA Transposable Elements/genetics, Genomics, Multigene Family/genetics, Homologous Recombination/genetics, Molecular Sequence Annotation, Phylogeny, Point Mutation/genetics
8.
Bioinformatics ; 32(7): 1016-22, 2016 04 01.
Article in English | MEDLINE | ID: mdl-26637292

ABSTRACT

MOTIVATION: The Optical Mapping System discovers structural variants and potentiates sequence assembly of genomes via scaffolding and comparisons that globally validate or correct sequence assemblies. Despite its utility, there are few publicly available tools for aligning optical mapping datasets. RESULTS: Here we present software, named 'Maligner', for the alignment of both single molecule restriction maps (Rmaps) and in silico restriction maps of sequence contigs to a reference. Maligner provides two modes of alignment: an efficient, sensitive dynamic programming implementation that scales to large eukaryotic genomes, and a faster index-based implementation for finding alignments with unmatched sites in the reference but not the query. We compare our software to other publicly available tools on Rmap datasets and show that Maligner finds more correct alignments in comparable runtime. Lastly, we introduce the M-Score statistic for normalizing alignment scores across restriction maps and demonstrate its utility for selecting high quality alignments. AVAILABILITY AND IMPLEMENTATION: The Maligner software is written in C++ and is available at https://github.com/LeeMendelowitz/maligner under the GNU General Public License. CONTACT: mpop@umiacs.umd.edu.
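Dynamic-programming alignment of an Rmap to a reference restriction map can be illustrated with a simplified scoring scheme. The sketch below is not Maligner's implementation or scoring function; it assumes a sizing penalty based on a 10% relative standard deviation, a flat penalty for each reference site that is absent from the query (the "unmatched sites in the reference but not the query" case), and hypothetical names throughout. Fragment sizes are in kb.

```python
import math
from typing import List, Tuple


def align_rmap(query: List[float], ref: List[float], max_merge: int = 3,
               sd_frac: float = 0.1, miss_penalty: float = 3.0) -> Tuple[float, int, int]:
    """Align every query fragment to a contiguous block of reference fragments.

    Each query fragment may match 1..max_merge consecutive reference fragments,
    i.e. reference sites missing from the query. Returns
    (best_cost, ref_start, ref_end) with ref_end exclusive.
    """
    INF = math.inf
    m, n = len(query), len(ref)
    # cost[i][j]: best cost of aligning query[:i] so that it ends at reference boundary j
    cost = [[INF] * (n + 1) for _ in range(m + 1)]
    start = [[-1] * (n + 1) for _ in range(m + 1)]
    for j in range(n + 1):
        cost[0][j] = 0.0  # an alignment may begin at any reference site
        start[0][j] = j
    for i in range(1, m + 1):
        q = query[i - 1]
        for j in range(1, n + 1):
            for k in range(1, min(max_merge, j) + 1):
                prev = cost[i - 1][j - k]
                if prev == INF:
                    continue
                r = sum(ref[j - k:j])  # merged reference fragment size
                sizing = ((q - r) / (sd_frac * r)) ** 2
                c = prev + sizing + miss_penalty * (k - 1)
                if c < cost[i][j]:
                    cost[i][j] = c
                    start[i][j] = start[i - 1][j - k]
    best_j = min(range(n + 1), key=lambda j: cost[m][j])
    return cost[m][best_j], start[m][best_j], best_j


ref = [10.0, 5.0, 20.0, 8.0, 12.0, 30.0]
query = [25.5, 8.1, 11.7]
cost, ref_start, ref_end = align_rmap(query, ref)
print(f"cost={cost:.3f}, reference fragments {ref_start}..{ref_end - 1}")
```

Here the first query fragment (25.5 kb) is matched to two reference fragments (5 + 20 kb), so one reference site is treated as missing from the query and penalized once.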


Subject(s)
Algorithms, Computer Simulation, Genome, Restriction Mapping, Sequence Alignment, Sequence Analysis, DNA, Software
9.
BMC Genomics ; 16: 644, 2015 Aug 28.
Article in English | MEDLINE | ID: mdl-26314885

ABSTRACT

BACKGROUND: The cattle (Bos taurus) genome was originally selected for sequencing due to its economic importance and unique biology as a model organism for understanding other ruminants, or mammals. Currently, there are two cattle genome sequence assemblies (UMD3.1 and Btau4.6) from groups using dissimilar assembly algorithms, which were complemented by genetic and physical map resources. However, past comparisons between these assemblies revealed substantial differences. Consequently, such discordances have engendered ambiguities when using reference sequence data, impacting genomic studies in cattle and motivating construction of a new optical map resource--BtOM1.0--to guide comparisons and improvements to the current sequence builds. Accordingly, our comprehensive comparisons of BtOM1.0 against the UMD3.1 and Btau4.6 sequence builds tabulate large-to-immediate scale discordances requiring mediation. RESULTS: The optical map, BtOM1.0, spanning the B. taurus genome (Hereford breed, L1 Dominette 01449) was assembled from an optical map dataset consisting of 2,973,315 (439 X; raw dataset size before assembly) single molecule optical maps (Rmaps; 1 Rmap = 1 restriction mapped DNA molecule) generated by the Optical Mapping System. The BamHI map spans 2,575.30 Mb and comprises 78 optical contigs assembled by a combination of iterative (using the reference sequence: UMD3.1) and de novo assembly techniques. BtOM1.0 is a high-resolution physical map featuring an average restriction fragment size of 8.91 Kb. Comparisons of BtOM1.0 vs. UMD3.1, or Btau4.6, revealed that Btau4.6 presented far more discordances (7,463) vs. UMD3.1 (4,754). Overall, we found that Btau4.6 presented almost double the number of discordances than UMD3.1 across most of the 6 categories of sequence vs. 
map discrepancies, which are: COMPLEX (misassembly), DELs (extraneous sequences), INSs (missing sequences), ITs (Inverted/Translocated sequences), ECs (extra restriction cuts) and MCs (missing restriction cuts). CONCLUSION: Alignments of UMD3.1 and Btau4.6 to BtOM1.0 reveal discordances commensurate with previous reports, and affirm the NCBI's current designation of UMD3.1 sequence assembly as the "reference assembly" and the Btau4.6 as the "alternate assembly." The cattle genome optical map, BtOM1.0, when used as a comprehensive and largely independent guide, will greatly assist improvements to existing sequence builds, and later serve as an accurate physical scaffold for studies concerning the comparative genomics of cattle breeds.


Subject(s)
Chromosome Mapping, Genome, Genomics, Animals, Cattle, Chromosome Mapping/methods, Computational Biology/methods, Datasets as Topic, Gene Order, Genomics/methods
10.
BMC Genomics ; 15: 312, 2014 Apr 27.
Article in English | MEDLINE | ID: mdl-24767513

ABSTRACT

BACKGROUND: Medicago truncatula, a close relative of alfalfa, is a preeminent model for studying nitrogen fixation, symbiosis, and legume genomics. The Medicago sequencing project began in 2003 with the goal to decipher sequences originated from the euchromatic portion of the genome. The initial sequencing approach was based on a BAC tiling path, culminating in a BAC-based assembly (Mt3.5) as well as an in-depth analysis of the genome published in 2011. RESULTS: Here we describe a further improved and refined version of the M. truncatula genome (Mt4.0) based on de novo whole genome shotgun assembly of a majority of Illumina and 454 reads using ALLPATHS-LG. The ALLPATHS-LG scaffolds were anchored onto the pseudomolecules on the basis of alignments to both the optical map and the genotyping-by-sequencing (GBS) map. The Mt4.0 pseudomolecules encompass ~360 Mb of actual sequences spanning 390 Mb of which ~330 Mb align perfectly with the optical map, presenting a drastic improvement over the BAC-based Mt3.5 which only contained 70% sequences (~250 Mb) of the current version. Most of the sequences and genes that previously resided on the unanchored portion of Mt3.5 have now been incorporated into the Mt4.0 pseudomolecules, with the exception of ~28 Mb of unplaced sequences. With regard to gene annotation, the genome has been re-annotated through our gene prediction pipeline, which integrates EST, RNA-seq, protein and gene prediction evidences. A total of 50,894 genes (31,661 high confidence and 19,233 low confidence) are included in Mt4.0 which overlapped with ~82% of the gene loci annotated in Mt3.5. Of the remaining genes, 14% of the Mt3.5 genes have been deprecated to an "unsupported" status and 4% are absent from the Mt4.0 predictions. CONCLUSIONS: Mt4.0 and its associated resources, such as genome browsers, BLAST-able datasets and gene information pages, can be found on the JCVI Medicago web site (http://www.jcvi.org/medicago). 
The assembly and annotation have been deposited in GenBank (BioProject: PRJNA10791). The heavily curated chromosomal sequences and associated gene models of Medicago will serve as a better reference for legume biology and comparative genomics.


Subject(s)
Genome, Plant, Medicago truncatula/genetics, Chromosomes, Artificial, Bacterial
11.
Biotechniques ; 2024 Apr 24.
Article in English | MEDLINE | ID: mdl-38655877

ABSTRACT

Large DNA molecules (>20 kb) are difficult analytes that are prone to breakage during serial manipulations and cannot be 'rescued' as full-length amplicons. Accordingly, to present, modify and analyze arrays of large, single DNA molecules, we created an easily realizable approach, termed 'Gel-Stacks,' that offers gentle confinement conditions, or immobilization via spermidine condensation, for controlled delivery of reagents while supporting live imaging by epifluorescence microscopy. Molecules are locally confined between two hydrogel surfaces without covalent tethering to support time-lapse imaging and multistep workflows that accommodate large DNA molecules. With a thin polyacrylamide gel layer covalently bound to a glass surface as the base and swappable, reagent-infused agarose slabs on top, DNA molecules are stably presented for imaging during reagent delivery by passive diffusion.

12.
BMC Genomics ; 14: 505, 2013 Jul 26.
Article in English | MEDLINE | ID: mdl-23885787

ABSTRACT

BACKGROUND: Solid tumors present a panoply of genomic alterations, from single base changes to the gain or loss of entire chromosomes. Although aberrations at the two extremes of this spectrum are readily defined, comprehensive discernment of the complex and disperse mutational spectrum of cancer genomes remains a significant challenge for current genome analysis platforms. In this context, high throughput, single molecule platforms like Optical Mapping offer a unique perspective. RESULTS: Using measurements from large ensembles of individual DNA molecules, we have discovered genomic structural alterations in the solid tumor oligodendroglioma. Over a thousand structural variants were identified in each tumor sample, without any prior hypotheses, and often in genomic regions deemed intractable by other technologies. These findings were then validated by comprehensive comparisons to variants reported in external and internal databases, and by selected experimental corroborations. Alterations range in size from under 5 kb to hundreds of kilobases, and comprise insertions, deletions, inversions and compound events. Candidate mutations were scored at sub-genic resolution and unambiguously reveal structural details at aberrant loci. CONCLUSIONS: The Optical Mapping system provides a rich description of the complex genomes of solid tumors, including sequence level aberrations, structural alterations and copy number variants that power generation of functional hypotheses for oligodendroglioma genetics.


Subject(s)
Genomics/methods, Oligodendroglioma/genetics, Physical Chromosome Mapping/methods, Adult, Aged, Base Sequence, Chromosomes, Human, Pair 1/genetics, Chromosomes, Human, Pair 19/genetics, DNA Copy Number Variations/genetics, Female, Humans, Mutation, Polymerase Chain Reaction, Polymorphism, Single Nucleotide/genetics, Reproducibility of Results
13.
BMC Genomics ; 14: 257, 2013 Apr 16.
Article in English | MEDLINE | ID: mdl-23590730

ABSTRACT

BACKGROUND: Paired-tag sequencing approaches are commonly used for the analysis of genome structure. However, mammalian genomes have a complex organization with a variety of repetitive elements that complicate comprehensive genome-wide analyses. RESULTS: Here, we systematically assessed the utility of paired-end and mate-pair (MP) next-generation sequencing libraries with insert sizes ranging from 170 bp to 25 kb, for genome coverage and for improving scaffolding of a mammalian genome (Rattus norvegicus). Despite a lower library complexity, large insert MP libraries (20 or 25 kb) provided very high physical genome coverage and were found to efficiently span repeat elements in the genome. Medium-sized (5, 8 or 15 kb) MP libraries were much more efficient for genome structure analysis than the more commonly used shorter insert paired-end and 3 kb MP libraries. Furthermore, the combination of medium- and large insert libraries resulted in a 3-fold increase in N50 in scaffolding processes. Finally, we show that our data can be used to evaluate and improve contig order and orientation in the current rat reference genome assembly. CONCLUSIONS: We conclude that applying combinations of mate-pair libraries with insert sizes that match the distributions of repetitive elements improves contig scaffolding and can contribute to the finishing of draft genomes.
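N50 is the statistic used above to summarize scaffolding improvement. The sketch below shows its usual definition, assuming the input is a list of contig or scaffold lengths; the function name is hypothetical, and some tools differ in tie handling or compute the statistic against an estimated genome size instead of the total assembly length.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Largest length L such that sequences of length >= L hold at least half the assembled bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequences given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable: the running total always reaches half")


contigs = [100, 80, 60, 40, 20]
scaffolds = [180, 60, 40, 20]  # the same bases after joining the two longest pieces
print(n50(contigs), n50(scaffolds))
```

For the toy contigs, the two longest sequences (180 of 300 bases) already cover half the assembly, so N50 is 80; joining them into a single 180-unit scaffold raises N50 to 180 without changing the total length.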


Subject(s)
Gene Library, Genome, Rats/genetics, Sequence Analysis, DNA/methods, Animals, Base Sequence, Contig Mapping/methods, Interspersed Repetitive Sequences/genetics
14.
Proc Natl Acad Sci U S A ; 107(24): 10848-53, 2010 Jun 15.
Article in English | MEDLINE | ID: mdl-20534489

ABSTRACT

Variation in genome structure is an important source of human genetic polymorphism: It affects a large proportion of the genome and has a variety of phenotypic consequences relevant to health and disease. In spite of this, human genome structure variation is incompletely characterized due to a lack of approaches for discovering a broad range of structural variants in a global, comprehensive fashion. We addressed this gap with Optical Mapping, a high-throughput, high-resolution single-molecule system for studying genome structure. We used Optical Mapping to create genome-wide restriction maps of a complete hydatidiform mole and three lymphoblast-derived cell lines, and we validated the approach by demonstrating a strong concordance with existing methods. We also describe thousands of new variants with sizes ranging from kb to Mb.


Subject(s)
Genome, Human, Optical Restriction Mapping/methods, Algorithms, Cell Line, Cell Line, Tumor, Female, Genetic Variation, Genome-Wide Association Study, Humans, Hydatidiform Mole/genetics, Lymphocytes/metabolism, Optical Restriction Mapping/statistics & numerical data, Pregnancy, Uterine Neoplasms/genetics
15.
BMC Med Inform Decis Mak ; 13: 60, 2013 May 25.
Article in English | MEDLINE | ID: mdl-23705639

ABSTRACT

BACKGROUND: Falls among the elderly are a major public health concern. Therefore, a modeling technique that could better estimate fall probability is both timely and needed. Using biomedical, pharmacological and demographic variables as predictors, latent class analysis (LCA) is demonstrated as a tool for the prediction of falls among community-dwelling elderly. METHODS: Using a retrospective data set, a two-step LCA modeling approach was employed. First, we looked for the optimal number of latent classes for the seven medical indicators, along with the patients' prescription medication and three covariates (age, gender, and number of medications). Second, the appropriate latent class structure, with the covariates, was modeled on the distal outcome (fall/no fall). The default estimator was maximum likelihood with robust standard errors. The Pearson chi-square, likelihood ratio chi-square, BIC, Lo-Mendell-Rubin Adjusted Likelihood Ratio test and the bootstrap likelihood ratio test were used for model comparisons. RESULTS: A review of the model fit indices with covariates shows that a six-class solution was preferred. The predictive probability for latent classes ranged from 84% to 97%. Entropy, a measure of classification accuracy, was good at 90%. Specific prescription medications were found to strongly influence group membership. CONCLUSIONS: The LCA method was effective at finding relevant subgroups within a heterogeneous population at risk of falling. This study demonstrated that LCA offers researchers a valuable tool to model medical data.
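The entropy value reported above is commonly computed as a normalized (relative) entropy of the posterior class-membership probabilities, E = 1 - Σ_i Σ_k (-p_ik ln p_ik) / (N ln K), where N is the number of individuals and K the number of latent classes. The abstract does not state the exact formula or software, so the sketch below assumes this definition; the function name and toy posteriors are hypothetical.

```python
import math
from typing import Sequence


def relative_entropy(posteriors: Sequence[Sequence[float]]) -> float:
    """Normalized entropy of posterior class probabilities (1 = crisp, 0 = uniform).

    posteriors[i][k] is the probability that individual i belongs to class k.
    """
    n = len(posteriors)
    if n == 0:
        raise ValueError("need at least one individual")
    k = len(posteriors[0])
    if k < 2:
        raise ValueError("need at least two latent classes")
    total = 0.0
    for row in posteriors:
        for p in row:
            if p > 0:  # the term 0 * log(0) is taken as 0
                total -= p * math.log(p)
    return 1.0 - total / (n * math.log(k))


# Hypothetical posteriors for four individuals and three classes
toy = [
    [0.95, 0.03, 0.02],
    [0.10, 0.85, 0.05],
    [0.02, 0.08, 0.90],
    [0.60, 0.30, 0.10],
]
print(f"relative entropy = {relative_entropy(toy):.2f}")
```

Values near 1 indicate that individuals are assigned to classes with little ambiguity, while values near 0 indicate nearly uniform posteriors.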


Subject(s)
Accidental Falls, Models, Theoretical, Prescription Drugs/adverse effects, Aged, Aged, 80 and over, Drug Prescriptions/statistics & numerical data, Female, Humans, Male, Residence Characteristics, Retrospective Studies, Risk Assessment
16.
GigaByte ; 2023: gigabyte93, 2023.
Article in English | MEDLINE | ID: mdl-37753479

ABSTRACT

While Bacterial Artificial Chromosome (BAC) libraries were once a key resource for the genomic community, for sequencing purposes they have been superseded by long-read technologies. Such libraries may now serve as a valuable resource for manipulating and assembling large genomic constructs. To enhance accessibility and comparison, we have developed a BAC restriction map database. Using information from the National Center for Biotechnology Information's cloneDB FTP site, we constructed a database containing the restriction maps for both uniquely placed and insert-sequenced BACs from 11 libraries, covering the recognition sequences of the available restriction enzymes. Along with the database, we generated a set of Python functions to reconstruct the database and to access its contents more easily. These data are valuable for researchers who simply use BACs, as well as for those working with larger sections of the genome for synthetic genes, large-scale editing, and mapping.
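A restriction map of a sequenced insert can be derived by locating every recognition site and converting the cut positions into ordered fragment lengths. The sketch below is a minimal illustration, not the authors' released Python functions; the function names are hypothetical, it scans only the top strand (sufficient for palindromic sites such as BamHI, G^GATCC, cut offset 1), and it does not handle degenerate recognition sequences.

```python
import re
from typing import List


def cut_positions(seq: str, site: str, cut_offset: int) -> List[int]:
    """0-based top-strand cut positions for every occurrence of `site`."""
    # Zero-width lookahead so that overlapping occurrences are also reported
    pattern = re.compile(f"(?={re.escape(site.upper())})")
    return [m.start() + cut_offset for m in pattern.finditer(seq.upper())]


def fragment_lengths(seq: str, site: str, cut_offset: int) -> List[int]:
    """In silico digest of a linear sequence into ordered fragment lengths."""
    bounds = [0] + cut_positions(seq, site, cut_offset) + [len(seq)]
    # Skip zero-length pieces (e.g., a cut exactly at either end)
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


# Toy insert with two BamHI sites (G^GATCC, so the cut offset is 1)
toy_insert = "AAAAGGATCCTTTTTTGGATCCAA"
print(fragment_lengths(toy_insert, "GGATCC", 1))
```

In the toy sequence, BamHI sites start at positions 4 and 16, giving cuts after bases 5 and 17 and fragments of 5, 12 and 7 bp.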

17.
BMC Bioinformatics ; 13: 189, 2012 Aug 02.
Article in English | MEDLINE | ID: mdl-22856673

ABSTRACT

BACKGROUND: Genome assembly is difficult due to repeated sequences within the genome, which create ambiguities and cause the final assembly to be broken up into many separate sequences (contigs). Long-range linking information, such as mate-pairs or mapping data, is necessary to help assembly software resolve repeats, thereby leading to a more complete reconstruction of genomes. Prior work has used optical maps for validating assemblies and scaffolding contigs, after an initial assembly has been produced. However, optical maps have not previously been used within the genome assembly process. Here, we use optical map information within the popular de Bruijn graph assembly paradigm to eliminate paths in the de Bruijn graph which are not consistent with the optical map and help determine the correct reconstruction of the genome. RESULTS: We developed a new algorithm called AGORA: Assembly Guided by Optical Restriction Alignment. AGORA is the first algorithm to use optical map information directly within the de Bruijn graph framework to help produce an accurate assembly of a genome that is consistent with the optical map information provided. Our simulations on bacterial genomes show that AGORA is effective at producing assemblies closely matching the reference sequences. Additionally, we show that noise in the optical map can have a strong impact on the final assembly quality for some complex genomes, and we also measure how various characteristics of the starting de Bruijn graph may impact the quality of the final assembly. Lastly, we show that a proper choice of restriction enzyme for the optical map may substantially improve the quality of the final assembly. CONCLUSIONS: Our work shows that optical maps can be used effectively to assemble genomes within the de Bruijn graph assembly framework.
Our experiments also provide insights into the characteristics of the mapping data that most affect the performance of our algorithm, indicating the potential benefit of more accurate optical mapping technologies, such as nano-coding.
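The idea of pruning de Bruijn graph walks that disagree with an optical map can be sketched in a few functions. This is not the AGORA algorithm; it is a simplified illustration, assuming a graph whose nodes are (k-1)-mers and whose edges are distinct k-mers, an in silico digest of each candidate sequence, and a hypothetical 10% relative sizing tolerance when comparing fragments to the optical map in either orientation. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def de_bruijn(reads: Sequence[str], k: int) -> Dict[str, List[str]]:
    """Nodes are (k-1)-mers; each distinct k-mer adds one edge prefix -> suffix."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph: Dict[str, List[str]] = defaultdict(list)
    for kmer in sorted(kmers):  # sorted for deterministic edge order
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def spell(path: Sequence[str]) -> str:
    """Sequence spelled by a walk of overlapping (k-1)-mer nodes."""
    return path[0] + "".join(node[-1] for node in path[1:])


def fragments(seq: str, site: str, cut_offset: int) -> List[int]:
    """Ordered in silico restriction fragment lengths (top strand only)."""
    cuts, pos = [], seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def matches_map(candidate: Sequence[int], optical: Sequence[float],
                rel_tol: float = 0.1) -> bool:
    """True if candidate fragments match the optical map in either orientation."""
    def same(xs: List[int]) -> bool:
        return len(xs) == len(optical) and all(
            abs(x - o) <= rel_tol * o for x, o in zip(xs, optical))
    return same(list(candidate)) or same(list(candidate)[::-1])


graph = de_bruijn(["ACGT", "CGTA"], k=3)
print(dict(graph))
print(spell(["AC", "CG", "GT", "TA"]))
print(matches_map(fragments("AAAAGGATCCTTTTTTGGATCCAA", "GGATCC", 1), [7.2, 11.5, 5.1]))
```

In a full assembler, candidate walks through repeat-induced branches would be spelled out and discarded when `matches_map` returns False; AGORA's actual search and scoring differ from this sketch.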


Subject(s)
Algorithms, Chromosome Mapping/methods, Genomics/methods, Computational Biology/methods, Computer Simulation, Genome, Bacterial, Yersinia pestis/genetics
18.
BMC Genomics ; 13: 89, 2012 Mar 12.
Article in English | MEDLINE | ID: mdl-22409516

ABSTRACT

BACKGROUND: The genome of Mycobacterium avium subspecies paratuberculosis (MAP) is remarkably homogeneous among the genomes of bovine, human and wildlife isolates. However, previous work in our laboratories with the bovine K-10 strain has revealed substantial differences compared to sheep isolates. To systematically characterize all genomic differences that may be associated with the specific hosts, we sequenced the genomes of three U.S. sheep isolates and also obtained an optical map. RESULTS: Our analysis of one of the isolates, MAP S397, revealed a genome 4.8 Mb in size with 4,700 open reading frames (ORFs). Comparative analysis of the MAP S397 isolate showed it acquired approximately 10 large sequence regions that are shared with the human M. avium subsp. hominissuis strain 104 and lost 2 large regions that are present in the bovine strain. In addition, optical mapping defined the presence of 7 large inversions between the bovine and ovine genomes (~ 2.36 Mb). Whole-genome sequencing of 2 additional sheep strains of MAP (JTC1074 and JTC7565) further confirmed genomic homogeneity of the sheep isolates despite the presence of polymorphisms on the nucleotide level. CONCLUSIONS: Comparative sequence analysis employed here provided a better understanding of the host association, evolution of members of the M. avium complex and could help in deciphering the phenotypic differences observed among sheep and cattle strains of MAP. A similar approach based on whole-genome sequencing combined with optical mapping could be employed to examine closely related pathogens. We propose an evolutionary scenario for M. avium complex strains based on these genome sequences.


Subject(s)
Genome, Bacterial , Mycobacterium avium subsp. paratuberculosis/genetics , Sequence Analysis, DNA , Animals , Cattle , Chromosome Mapping , Evolution, Molecular , Gene Deletion , Gene Order , Host-Pathogen Interactions , Humans , Molecular Sequence Annotation , Mutagenesis, Insertional , Mycobacterium avium subsp. paratuberculosis/isolation & purification , Open Reading Frames , Polymorphism, Genetic , Sequence Alignment , Sheep/microbiology
19.
PLoS Biol ; 7(5): e1000112, 2009 May 05.
Article in English | MEDLINE | ID: mdl-19468303

ABSTRACT

The mouse (Mus musculus) is the premier animal model for understanding human disease and development. Here we show that a comprehensive understanding of mouse biology is only possible with the availability of a finished, high-quality genome assembly. The finished clone-based assembly of the mouse strain C57BL/6J reported here has over 175,000 fewer gaps and over 139 Mb more of novel sequence compared with the earlier MGSCv3 draft genome assembly. In a comprehensive analysis of this revised genome sequence, we are now able to define 20,210 protein-coding genes, over a thousand more than predicted in the human genome (19,042 genes). In addition, we identified 439 long, non-protein-coding RNAs with evidence for transcribed orthologs in human. We analyzed the complex and repetitive landscape of 267 Mb of sequence that was missing or misassembled in the previously published assembly, and we provide insights into why this sequence resisted sequencing and assembly by whole-genome shotgun approaches. Duplicated regions within newly assembled sequence tend to be of more recent ancestry than duplicates in the published draft, correcting our initial understanding of recent evolution on the mouse lineage. These duplicates appear to be largely composed of sequence regions containing transposable elements and duplicated protein-coding genes; of these, some may be fixed in the mouse population, but at least 40% of segmentally duplicated sequences are copy number variable even among laboratory mouse strains. Mouse lineage-specific regions contain 3,767 genes drawn mainly from rapidly changing gene families associated with reproductive functions. The finished mouse genome assembly, therefore, greatly improves our understanding of rodent-specific biology and allows the delineation of ancestral biological functions that are shared with human from derived functions that are not.


Subject(s)
Computational Biology/methods , Genome/genetics , Animals , Databases, Genetic , Gene Duplication , Genome/physiology , Humans , Mice
20.
PLoS Genet ; 5(11): e1000711, 2009 Nov.
Article in English | MEDLINE | ID: mdl-19936062

ABSTRACT

About 85% of the maize genome consists of highly repetitive sequences interspersed with low-copy, gene-coding sequences. The maize community has dealt with this genomic complexity by constructing an integrated genetic and physical map (iMap), but this resource alone was not sufficient to ensure the quality of the current sequence build. For this purpose, we constructed a genome-wide, high-resolution optical map of the maize inbred line B73 genome containing >91,000 restriction sites (averaging 1 site per ~23 kb) accrued from mapping genomic DNA molecules. Our optical map comprises 66 contigs, averaging 31.88 Mb in size and spanning 91.5% (2,103.93 Mb of ~2,300 Mb) of the maize genome. A new algorithm was created that considered both optical map and unfinished BAC sequence data for placing 60 of the 66 optical map contigs (2,032.42 Mb) onto the maize iMap. The alignment of optical maps against numerous data sources yielded comprehensive results that proved revealing and productive. For example, gaps were uncovered and characterized within the iMap, the FPC (fingerprinted contigs) map, and the chromosome-wide pseudomolecules. Such alignments also suggested amended placements of FPC contigs on the maize genetic map and proactively guided the assembly of chromosome-wide pseudomolecules, especially within complex genomic regions. Lastly, we think that full integration of the B73 optical maps with the maize iMap would greatly facilitate maize sequence-finishing efforts, making the finished maize sequence a valuable reference for comparative studies among cereals and other maize inbred lines and cultivars.


Subject(s)
Genome, Plant/genetics , Zea mays/genetics , Algorithms , Base Sequence , Chromosomes, Artificial, Bacterial/genetics , Contig Mapping , Molecular Sequence Data , Optical Phenomena , Physical Chromosome Mapping , Sequence Alignment
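The summary figures in the maize optical-map abstract (record 20) can be cross-checked with a few lines of arithmetic. The Python sketch below uses only numbers stated in that abstract; treating the ">91,000" restriction sites as exactly 91,000 is an assumption, so the computed site spacing is an upper bound. It checks that 66 contigs at a mean of 31.88 Mb reproduce the stated 2,103.93 Mb span (the small difference comes from rounding the mean), that this span is about 91.5% of a ~2,300 Mb genome, and that it corresponds to roughly one site per 23 kb. The function name `optical_map_summary` is hypothetical and not part of the authors' software.

```python
# Figures taken from the B73 optical-map abstract; all lengths in megabases (Mb).
N_CONTIGS = 66
MEAN_CONTIG_MB = 31.88
SPAN_MB = 2103.93        # total length covered by the 66 contigs
GENOME_MB = 2300.0       # approximate maize genome size quoted in the abstract
PLACED_MB = 2032.42      # length of the 60 contigs placed onto the iMap
N_SITES = 91_000         # assumption: the abstract says ">91,000"; used as a lower bound


def optical_map_summary():
    """Return derived statistics (hypothetical helper) to sanity-check the reported figures."""
    total_from_mean = N_CONTIGS * MEAN_CONTIG_MB   # should be close to SPAN_MB
    coverage_pct = 100.0 * SPAN_MB / GENOME_MB     # percentage of the genome spanned
    kb_per_site = SPAN_MB * 1000.0 / N_SITES       # mean site spacing in kb (upper bound)
    placed_pct = 100.0 * PLACED_MB / SPAN_MB       # share of mapped span placed on the iMap
    return total_from_mean, coverage_pct, kb_per_site, placed_pct


if __name__ == "__main__":
    total, cov, spacing, placed = optical_map_summary()
    print(f"{total:.2f} Mb from mean, {cov:.1f}% coverage, "
          f"{spacing:.1f} kb/site, {placed:.1f}% placed")
```

Running the script prints `2104.08 Mb from mean, 91.5% coverage, 23.1 kb/site, 96.6% placed`, consistent with the abstract. The last value (the share of the mapped span placed on the iMap) is derived here and is not stated in the abstract itself.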