1 - 19 of 19
1.
J Am Heart Assoc ; 13(5): e031154, 2024 Mar 05.
Article En | MEDLINE | ID: mdl-38420755

BACKGROUND: Identifying novel molecular drivers of disease progression in heart failure (HF) is a high-priority goal that may provide new therapeutic targets to improve patient outcomes. The authors investigated the relationship between plasma proteins and adverse outcomes in HF and the proteins' putative causal role using Mendelian randomization. METHODS AND RESULTS: The authors measured 4776 plasma proteins among 1964 participants with HF with reduced left ventricular ejection fraction enrolled in PHFS (Penn Heart Failure Study). They assessed the observational relationship between plasma proteins and (1) all-cause death or (2) death or HF-related hospital admission (DHFA). The authors replicated nominally significant associations in the Washington University HF registry (N=1080). Proteins significantly associated with outcomes were the subject of 2-sample Mendelian randomization and colocalization analyses. After correction for multiple testing, 243 and 126 proteins were significantly associated with death and DHFA, respectively. These included small ubiquitin-like modifier 2 (standardized hazard ratio [sHR], 1.56; P<0.0001) and growth differentiation factor-15 (sHR, 1.68; P<0.0001) for death, and A disintegrin and metalloproteinase with thrombospondin motifs-like protein (sHR, 1.40; P<0.0001) and pulmonary-associated surfactant protein C (sHR, 1.24; P<0.0001) for DHFA. In pathway analyses, the top canonical pathways associated with death and DHFA included fibrotic, inflammatory, and coagulation pathways. Genomic analyses provided evidence of nominally significant associations between genetically predicted levels of 6 proteins and DHFA, and of 11 proteins and death. CONCLUSIONS: This study implicates multiple novel proteins in HF and provides preliminary evidence of associations between genetically predicted plasma levels of 17 candidate proteins and the risk for adverse outcomes in human HF.


Heart Failure , Proteomics , Humans , Blood Proteins , Stroke Volume , Ventricular Function, Left , Mendelian Randomization Analysis
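The 2-sample Mendelian randomization step in the record above can be illustrated with the standard fixed-effect inverse-variance weighted (IVW) estimator, which pools per-variant Wald ratios (variant-outcome effect divided by variant-protein effect). This is a generic textbook sketch, not the authors' pipeline: the abstract does not say which MR estimator was used, and the summary statistics below are hypothetical. It assumes independent variants and no horizontal pleiotropy.

```python
import math

def wald_ratio(beta_exposure, beta_outcome):
    """Causal estimate from a single variant: outcome effect per unit of exposure effect."""
    return beta_outcome / beta_exposure

def ivw_estimate(beta_exp, beta_out, se_out):
    """Fixed-effect IVW estimate from per-variant summary statistics.

    beta_exp: variant effects on the protein (exposure)
    beta_out: variant effects on the outcome (e.g. log hazard ratios)
    se_out:   standard errors of beta_out
    Uses first-order weights w = beta_exp**2 / se_out**2.
    """
    if not (len(beta_exp) == len(beta_out) == len(se_out)):
        raise ValueError("inputs must have equal length")
    weights = [bx ** 2 / s ** 2 for bx, s in zip(beta_exp, se_out)]
    ratios = [wald_ratio(bx, by) for bx, by in zip(beta_exp, beta_out)]
    estimate = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    se = 1.0 / math.sqrt(sum(weights))
    return estimate, se

# Hypothetical summary statistics for three independent variants.
est, se = ivw_estimate([0.5, 0.3, 0.4], [0.1, 0.06, 0.08], [0.02, 0.03, 0.025])
```

All three hypothetical variants share a Wald ratio of 0.2, so the pooled estimate is 0.2 (up to floating-point rounding); if the outcome effects are log hazard ratios, `math.exp(est)` would give a hazard ratio per unit of genetically predicted protein level.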
2.
Genome Med ; 15(1): 94, 2023 11 09.
Article En | MEDLINE | ID: mdl-37946251

BACKGROUND: Whole genome sequencing is increasingly being used for the diagnosis of patients with rare diseases. However, the diagnostic yields of many studies, particularly those conducted in a healthcare setting, are often disappointingly low, at 25-30%. This is in part because although entire genomes are sequenced, analysis is often confined to in silico gene panels or coding regions of the genome. METHODS: We undertook WGS on a cohort of 122 unrelated rare disease patients and their relatives (300 genomes) who had been pre-screened by gene panels or arrays. Patients were recruited from a broad spectrum of clinical specialties. We applied a bioinformatics pipeline that would allow comprehensive analysis of all variant types. We combined established bioinformatics tools for phenotypic and genomic analysis with our novel algorithms (SVRare, ALTSPLICE and GREEN-DB) to detect and annotate structural, splice site and non-coding variants. RESULTS: Our diagnostic yield was 43/122 cases (35%), rising to 47/122 cases (39%) when novel candidate genes with supporting functional data were taken into account. Structural, splice site and deep intronic variants contributed to 20/47 (43%) of our solved cases. Five genes that are novel, or were novel at the time of discovery, were identified, whilst a further three genes are putative novel disease genes with evidence of causality. We identified variants of uncertain significance in a further fourteen candidate genes. The phenotypic spectrum associated with RMND1 was expanded to include polymicrogyria. Two patients with secondary findings in FBN1 and KCNQ1 were confirmed to have previously unidentified Marfan and long QT syndromes, respectively, and were referred for further clinical interventions. Clinical diagnoses were changed in six patients and treatment adjustments made for eight individuals, which for five patients was considered life-saving.
CONCLUSIONS: Genome sequencing is increasingly being considered as a first-line genetic test in routine clinical settings and can make a substantial contribution to rapidly identifying a causal aetiology for many patients, shortening their diagnostic odyssey. We have demonstrated that structural, splice site and intronic variants make a significant contribution to diagnostic yield and that comprehensive analysis of the entire genome is essential to maximise the value of clinical genome sequencing.


Genetic Variation , Rare Diseases , Humans , Rare Diseases/diagnosis , Rare Diseases/genetics , Whole Genome Sequencing , Genetic Testing , Mutation , Cell Cycle Proteins
3.
Nature ; 622(7982): 329-338, 2023 Oct.
Article En | MEDLINE | ID: mdl-37794186

The Pharma Proteomics Project is a precompetitive biopharmaceutical consortium characterizing the plasma proteomic profiles of 54,219 UK Biobank participants. Here we provide a detailed summary of this initiative, including technical and biological validations, insights into proteomic disease signatures, and prediction modelling for various demographic and health indicators. We present comprehensive protein quantitative trait locus (pQTL) mapping of 2,923 proteins that identifies 14,287 primary genetic associations, of which 81% are previously undescribed, alongside ancestry-specific pQTL mapping in non-European individuals. The study provides an updated characterization of the genetic architecture of the plasma proteome, contextualized with projected pQTL discovery rates as sample sizes and proteomic assay coverages increase over time. We offer extensive insights into trans pQTLs across multiple biological domains, highlight genetic influences on ligand-receptor interactions and pathway perturbations across a diverse collection of cytokines and complement networks, and illustrate long-range epistatic effects of ABO blood group and FUT2 secretor status on proteins with gastrointestinal tissue-enriched expression. We demonstrate the utility of these data for drug discovery by extending the genetically proxied effects of protein targets, such as PCSK9, to additional endpoints, and disentangle specific genes and proteins perturbed at loci associated with COVID-19 susceptibility. This public-private partnership provides the scientific community with an open-access proteomics resource of considerable breadth and depth to help to elucidate the biological mechanisms underlying proteo-genomic discoveries and accelerate the development of biomarkers, predictive models and therapeutics.


Biological Specimen Banks , Blood Proteins , Databases, Factual , Genomics , Health , Proteome , Proteomics , Humans , ABO Blood-Group System/genetics , Blood Proteins/analysis , Blood Proteins/genetics , COVID-19/genetics , Drug Discovery , Epistasis, Genetic , Fucosyltransferases/metabolism , Genetic Predisposition to Disease , Plasma/chemistry , Proprotein Convertase 9/metabolism , Proteome/analysis , Proteome/genetics , Public-Private Sector Partnerships , Quantitative Trait Loci , United Kingdom , Galactoside 2-alpha-L-fucosyltransferase
4.
Circ Heart Fail ; 15(9): e009693, 2022 09.
Article En | MEDLINE | ID: mdl-36126144

BACKGROUND: The TOPCAT trial (Treatment of Preserved Cardiac Function Heart Failure With an Aldosterone Antagonist Trial) suggested clinical benefits of spironolactone treatment among patients with heart failure with preserved ejection fraction enrolled in the Americas. However, a comprehensive assessment of biologic pathways impacted by spironolactone therapy in heart failure with preserved ejection fraction has not been performed. METHODS: We conducted aptamer-based proteomic analysis utilizing 5284 modified aptamers to 4928 unique proteins on plasma samples from TOPCAT participants from the Americas (n=164 subjects with paired samples at baseline and 1 year) to identify proteins and pathways impacted by spironolactone therapy in heart failure with preserved ejection fraction. Mean percentage change from baseline was calculated for each protein. Additionally, we conducted pathway analysis of proteins altered by spironolactone. RESULTS: Spironolactone therapy was associated with proteome-wide significant changes in 7 proteins. Among these, CARD18 (caspase recruitment domain-containing protein 18), PKD2 (polycystin 2), and PSG2 (pregnancy-specific glycoprotein 2) were upregulated, whereas HGF (hepatocyte growth factor), PLTP (phospholipid transfer protein), IGF2R (insulin-like growth factor 2 receptor), and SWP70 (switch-associated protein 70) were downregulated. CARD18, a caspase-1 inhibitor, was the protein most upregulated by spironolactone (-0.5% with placebo versus +66.5% with spironolactone, P<0.0001). The top canonical pathways significantly associated with spironolactone were apelin signaling, stellate cell activation, glycoprotein 6 signaling, atherosclerosis signaling, liver X receptor activation, and farnesoid X receptor activation. Among the top pathways, collagens were a consistent theme: they increased in patients receiving placebo but decreased in patients randomized to spironolactone.
CONCLUSIONS: Proteomic analysis in the TOPCAT trial revealed proteins and pathways altered by spironolactone, including the caspase inhibitor CARD18 and multiple pathways that involved collagens. In addition to effects on fibrosis, our studies suggest potential antiapoptotic effects of spironolactone in heart failure with preserved ejection fraction, a hypothesis that merits further exploration.


Biological Products , Heart Failure , Insulins , Apelin/pharmacology , Apelin/therapeutic use , Biological Products/pharmacology , Biological Products/therapeutic use , Caspases/pharmacology , Caspases/therapeutic use , Humans , Insulins/therapeutic use , Liver X Receptors , Mineralocorticoid Receptor Antagonists/therapeutic use , Phospholipid Transfer Proteins/pharmacology , Phospholipid Transfer Proteins/therapeutic use , Proteome , Proteomics , Spironolactone/adverse effects , Stroke Volume/physiology , Treatment Outcome
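The per-protein summary used in the TOPCAT analysis above, mean percentage change from baseline, can be computed from paired samples as sketched below. This is a minimal sketch, assuming one baseline and one 1-year value per participant on the original (non-log) scale; the abstract does not describe how the authors handled transformation or missing data, and the values are hypothetical.

```python
from statistics import mean

def mean_percent_change(baseline, follow_up):
    """Mean of per-participant percentage changes: 100 * (follow_up - baseline) / baseline."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must be paired")
    if any(b == 0 for b in baseline):
        raise ValueError("baseline values must be non-zero")
    changes = [100.0 * (f - b) / b for b, f in zip(baseline, follow_up)]
    return mean(changes)

# Hypothetical aptamer readouts (arbitrary units) for two participants.
change = mean_percent_change([100.0, 200.0], [150.0, 300.0])
```

Note that averaging per-participant changes, as here, is not the same as computing the percentage change of the group means; with skewed protein levels the two can differ noticeably.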
5.
Nat Genet ; 53(7): 942-948, 2021 07.
Article En | MEDLINE | ID: mdl-34183854

The UK Biobank Exome Sequencing Consortium (UKB-ESC) is a private-public partnership between the UK Biobank (UKB) and eight biopharmaceutical companies that will complete the sequencing of exomes for all ~500,000 UKB participants. Here, we describe the early results from ~200,000 UKB participants and the features of this project that enabled its success. The biopharmaceutical industry has increasingly used human genetics to improve success in drug discovery. Recognizing the need for large-scale human genetics data, as well as the unique value of the data access and contribution terms of the UKB, the UKB-ESC was formed. As a result, exome data from 200,643 UKB enrollees are now available. These data include ~10 million exonic variants, a rich resource of rare coding variation that is particularly valuable for drug discovery. The UKB-ESC precompetitive collaboration has further strengthened academic and industry ties and has provided teams with an opportunity to interact with and learn from the wider research community.


Biological Specimen Banks , Drug Discovery , Exome Sequencing , Human Genetics , Research , Drug Discovery/methods , Genomics/methods , Humans , United Kingdom
6.
Am J Hum Genet ; 108(7): 1350-1355, 2021 07 01.
Article En | MEDLINE | ID: mdl-34115965

Severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) causes coronavirus disease 2019 (COVID-19), a respiratory illness that can result in hospitalization or death. We used exome sequence data to investigate associations between rare genetic variants and seven COVID-19 outcomes in 586,157 individuals, including 20,952 with COVID-19. After accounting for multiple testing, we did not identify any clear associations with rare variants either exome wide or when specifically focusing on (1) 13 interferon pathway genes in which rare deleterious variants have been reported in individuals with severe COVID-19, (2) 281 genes located in susceptibility loci identified by the COVID-19 Host Genetics Initiative, or (3) 32 additional genes of immunologic relevance and/or therapeutic potential. Our analyses indicate there are no significant associations with rare protein-coding variants with detectable effect sizes at our current sample sizes. Analyses will be updated as additional data become available, and results are publicly available through the Regeneron Genetics Center COVID-19 Results Browser.


COVID-19/diagnosis , COVID-19/genetics , Exome Sequencing , Exome/genetics , Genetic Predisposition to Disease , Hospitalization/statistics & numerical data , COVID-19/immunology , COVID-19/therapy , Female , Humans , Interferons/genetics , Male , Prognosis , SARS-CoV-2 , Sample Size
7.
Genet Med ; 22(1): 85-94, 2020 01.
Article En | MEDLINE | ID: mdl-31358947

PURPOSE: The translation of genome sequencing into routine health care has been slow, partly because of concerns about affordability. The aspirational cost of sequencing a genome is $1000, but there is little evidence to support this estimate. We estimate the cost of using genome sequencing in routine clinical care in patients with cancer or rare diseases. METHODS: We performed a microcosting study of Illumina-based genome sequencing in a UK National Health Service laboratory processing 399 samples/year. Cost data were collected for all steps in the sequencing pathway, including bioinformatics analysis and reporting of results. Sensitivity analysis identified key cost drivers. RESULTS: Genome sequencing costs £6841 per cancer case (comprising matched tumor and germline samples) and £7050 per rare disease case (three samples). The consumables used during sequencing are the most expensive component of testing (68-72% of the total cost). Equipment costs are higher for rare disease cases, whereas consumable and staff costs are slightly higher for cancer cases. CONCLUSION: The cost of genome sequencing is underestimated if only sequencing costs are considered, and likely surpasses $1000/genome in a single laboratory. This aspirational sequencing cost will likely only be achieved if consumable costs are considerably reduced and sequencing is performed at scale.


Neoplasms/genetics , Rare Diseases/genetics , Whole Genome Sequencing/economics , High-Throughput Nucleotide Sequencing/economics , High-Throughput Nucleotide Sequencing/instrumentation , Humans , Neoplasms/economics , Rare Diseases/economics , State Medicine , Translational Research, Biomedical , United Kingdom , Whole Genome Sequencing/instrumentation
8.
Blood ; 132(5): 469-483, 2018 08 02.
Article En | MEDLINE | ID: mdl-29891534

Chuvash polycythemia is an autosomal recessive form of erythrocytosis associated with a homozygous p.Arg200Trp mutation in the von Hippel-Lindau (VHL) gene. Since this discovery, additional VHL mutations have been identified in patients with congenital erythrocytosis, in a homozygous or compound-heterozygous state. VHL is a major tumor suppressor gene, mutations in which were first described in patients presenting with VHL disease, which is characterized by the development of highly vascularized tumors. Here, we identify a new VHL cryptic exon (termed E1') deep in intron 1 that is naturally expressed in many tissues. More importantly, we identify mutations in E1' in 7 families with erythrocytosis (1 homozygous case and 6 compound-heterozygous cases with a mutation in E1' in addition to a mutation in VHL coding sequences) and in 1 large family with typical VHL disease but without any alteration in the other VHL exons. In this study, we show that the mutations induced a dysregulation of VHL splicing with excessive retention of E1' and were associated with a downregulation of VHL protein expression. In addition, we demonstrate a pathogenic role for synonymous mutations in VHL exon 2 that altered splicing through E2-skipping in 5 families with erythrocytosis or VHL disease. In all the studied cases, the mutations differentially affected splicing, correlating with phenotype severity. This study demonstrates that cryptic exon retention and exon skipping are new VHL alterations and reveals a novel complex splicing regulation of the VHL gene. These findings open new avenues for diagnosis and research regarding the VHL-related hypoxia-signaling pathway.


Exons , Genetic Predisposition to Disease , Mutation , Polycythemia/genetics , RNA Splicing , Von Hippel-Lindau Tumor Suppressor Protein/genetics , von Hippel-Lindau Disease/genetics , Adolescent , Adult , Child , Female , Heterozygote , Humans , Male , Middle Aged , Pedigree , Polycythemia/classification , Polycythemia/pathology , Young Adult , von Hippel-Lindau Disease/pathology
9.
Article En | MEDLINE | ID: mdl-29610388

Next-generation sequencing (NGS) efforts have established catalogs of mutations relevant to cancer development. However, the clinical utility of this information remains largely unexplored. Here, we present the results of the first eight patients recruited into a clinical whole-genome sequencing (WGS) program in the United Kingdom. We performed PCR-free WGS of fresh frozen tumors and germline DNA at 75× and 30×, respectively, using the HiSeq2500 HTv4. Subtracted tumor VCFs and paired germlines were subjected to comprehensive analysis of coding and noncoding regions, integration of germline with somatically acquired variants, and global mutation signatures and pathway analyses. Results were classified into tiers and presented to a multidisciplinary tumor board. WGS results helped to clarify an uncertain histopathological diagnosis in one case, informed or supported the prognosis in two cases (leading to de-escalation of therapy in one), and indicated potential treatments in all eight. Overall, 26 different tier 1 potentially clinically actionable findings were identified using WGS, compared with six SNVs/indels using routine targeted NGS. These initial results demonstrate the potential of WGS to inform future diagnosis, prognosis, and treatment choice in cancer and justify the systematic evaluation of the clinical utility of WGS in larger cohorts of patients with cancer.


Biomarkers, Tumor , Mutation , Neoplasms/diagnosis , Neoplasms/genetics , Whole Genome Sequencing , Adolescent , Adult , Aged , Biopsy , Child , DNA Mutational Analysis , Female , Humans , Immunohistochemistry , Male , Middle Aged , United Kingdom , Young Adult
10.
BMC Genomics ; 19(1): 115, 2018 02 01.
Article En | MEDLINE | ID: mdl-29390960

BACKGROUND: Transposable elements (TEs) are mobile genetic sequences that randomly propagate within their host's genome. This mobility has the potential to affect gene transcription and cause disease. However, TEs are technically challenging to identify, which complicates efforts to assess the impact of TE insertions on disease. Here we present a targeted sequencing protocol and computational pipeline to identify polymorphic and novel TE insertions using next-generation sequencing: TE-NGS. The method simultaneously targets the three subfamilies that are responsible for the majority of recent TE activity (L1HS, AluYa5/8, and AluYb8/9), thereby obviating the need for multiple experiments and reducing the amount of input material required. RESULTS: Here we describe the laboratory protocol and detection algorithm, and a benchmark experiment for the reference sample NA12878. We demonstrate a substantial enrichment for on-target fragments, and high sensitivity and precision for both reference and NA12878-specific insertions. We report 17 previously unreported loci for this individual which are supported by orthogonal long-read evidence, and we identify 1470 polymorphic and novel TEs in 12 additional samples that were previously undocumented in databases of insertion polymorphisms. CONCLUSIONS: We anticipate that future applications of TE-NGS alongside exome sequencing of patients with sporadic disease will reduce the number of unresolved cases, and improve estimates of the contribution of TEs to human genetic disease.


Algorithms , DNA Transposable Elements , High-Throughput Nucleotide Sequencing/methods , Polymorphism, Single Nucleotide , Gene Library , Humans
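Benchmarking insertion calls against a truth set, as in the NA12878 experiment above, comes down to matching called positions to known insertion sites and reporting sensitivity and precision. A minimal sketch, assuming calls and truth are unique (chromosome, position) pairs and that a call within a hypothetical 100 bp window of a truth site counts as a match; the abstract does not state the matching rule TE-NGS uses.

```python
def benchmark_calls(called, truth, window=100):
    """Greedy one-to-one matching of called insertion sites to truth sites.

    called, truth: iterables of (chromosome, position) tuples, assumed unique.
    window: hypothetical maximum distance (bp) for a call to match a truth site.
    Returns (sensitivity, precision).
    """
    called = list(called)
    unmatched = set(truth)
    n_truth = len(unmatched)
    true_positives = 0
    for chrom, pos in sorted(called):
        candidates = [t for t in unmatched if t[0] == chrom and abs(t[1] - pos) <= window]
        if candidates:
            # Consume the nearest truth site so it cannot be matched twice.
            best = min(candidates, key=lambda t: abs(t[1] - pos))
            unmatched.remove(best)
            true_positives += 1
    sensitivity = true_positives / n_truth if n_truth else 0.0
    precision = true_positives / len(called) if called else 0.0
    return sensitivity, precision

# Hypothetical truth set and calls.
truth = [("chr1", 1000), ("chr1", 5000), ("chr2", 300)]
called = [("chr1", 1040), ("chr2", 310), ("chr3", 50)]
sens, prec = benchmark_calls(called, truth)
```

Here two of three truth sites are recovered and two of three calls are correct, so both metrics are 2/3. Greedy matching is the simplest choice; an optimal bipartite matching would only differ when calls cluster near several truth sites.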
11.
Mol Biol Evol ; 31(1): 23-36, 2014 Jan.
Article En | MEDLINE | ID: mdl-24113537

Elucidating the mechanisms of mutation accumulation and fixation is critical to understand the nature of genetic variation and its contribution to genome evolution. Of particular interest is the effect of insertions and deletions (indels) on the evolution of genome landscapes. Recent population-scaled sequencing efforts provide unprecedented data for analyzing the relative impact of selection versus nonadaptive forces operating on indels. Here, we combined McDonald-Kreitman tests with the analysis of derived allele frequency spectra to investigate the dynamics of allele fixation of short (1-50 bp) indels in the human genome. Our analyses revealed apparently higher fixation probabilities for insertions than deletions. However, this fixation bias is not consistent with either selection or biased gene conversion and varies with local mutation rate, being particularly pronounced at indel hotspots. Furthermore, we identified an unprecedented number of loci with evidence for multiple indel events in the primate phylogeny. Even in nonrepetitive sequence contexts (a priori not prone to indel mutations), such loci are 60-fold more frequent than expected according to a model of uniform indel mutation rate. This provides evidence of as yet unidentified cryptic indel hotspots. We propose that indel homoplasy, at known and cryptic hotspots, produces systematic errors in determination of ancestral alleles via parsimony and advise caution interpreting classic selection tests given the strong heterogeneity in indel rates across the genome. These results will have great impact on studies seeking to infer evolutionary forces operating on indels observed in closely related species, because such mutations are traditionally presumed homoplasy-free.


Genome, Human , INDEL Mutation , Mutation Rate , Selection, Genetic , Evolution, Molecular , Gene Conversion , Genetic Loci , Humans , Models, Genetic , Phylogeny , Polymorphism, Single Nucleotide
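The McDonald-Kreitman framework referred to above contrasts polymorphism (within-species) and divergence (between-species) counts for a test class of sites against a reference class assumed to evolve neutrally. A minimal sketch of its two usual summary statistics, the neutrality index (NI) and α, using hypothetical counts; the authors additionally analyzed derived allele frequency spectra, which this sketch does not attempt.

```python
def mk_summary(pn, ps, dn, ds):
    """Return (neutrality_index, alpha) for a McDonald-Kreitman 2x2 table.

    pn, dn: polymorphic / divergent counts in the test class
    ps, ds: polymorphic / divergent counts in the neutral reference class
    NI = (pn / ps) / (dn / ds); alpha = 1 - NI.
    """
    if min(pn, ps, dn, ds) <= 0:
        raise ValueError("all counts must be positive")
    ni = (pn / ps) / (dn / ds)
    return ni, 1.0 - ni

# Hypothetical counts: the test class shows relatively more divergence than polymorphism.
ni, alpha = mk_summary(pn=20, ps=40, dn=30, ds=30)
```

Under strict neutrality NI is expected to be close to 1 (α close to 0). The record's caution applies directly here: if homoplasy at indel hotspots mis-assigns ancestral alleles, the polymorphism and divergence counts themselves are biased, and NI inherits that bias.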
12.
Genome Res ; 23(5): 749-61, 2013 May.
Article En | MEDLINE | ID: mdl-23478400

Short insertions and deletions (indels) are the second most abundant form of human genetic variation, but our understanding of their origins and functional effects lags behind that of other types of variants. Using population-scale sequencing, we have identified a high-quality set of 1.6 million indels from 179 individuals representing three diverse human populations. We show that rates of indel mutagenesis are highly heterogeneous, with 43%-48% of indels occurring in 4.03% of the genome, whereas in the remaining 96% their prevalence is 16 times lower than that of SNPs. Polymerase slippage can explain upwards of three-fourths of all indels, with the remainder being mostly simple deletions in complex sequence. However, insertions do occur and are significantly associated with pseudo-palindromic sequence features compatible with the fork stalling and template switching (FoSTeS) mechanism more commonly associated with large structural variations. We introduce a quantitative model of polymerase slippage, which enables us to identify indel-hypermutagenic protein-coding genes, some of which are associated with recurrent mutations leading to disease. Accounting for mutational rate heterogeneity due to sequence context, we find that indels across functional sequence are generally subject to stronger purifying selection than SNPs. We find that indel length modulates selection strength, and that indels affecting multiple functionally constrained nucleotides undergo stronger purifying selection. We further find that indels are enriched in associations with gene expression and find evidence for a contribution of nonsense-mediated decay. Finally, we show that indels can be integrated in existing genome-wide association studies (GWAS); although we do not find direct evidence that potentially causal protein-coding indels are enriched with associations to known disease-associated SNPs, our findings suggest that the causal variant underlying some of these associations may be indels.


Evolution, Molecular , Genome, Human , INDEL Mutation/genetics , Genetics, Population , Genome-Wide Association Study , High-Throughput Nucleotide Sequencing , Humans , Mutagenesis, Insertional , Mutation Rate , Polymorphism, Single Nucleotide
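The heterogeneity figures in the record above can be turned into a per-base fold enrichment of indels in the hotspot fraction relative to the rest of the genome. A worked calculation using only the quoted percentages, where f is the fraction of indels and g the fraction of the genome in the hotspot regions; this derived quantity is not itself reported in the abstract.

```latex
\text{fold enrichment} = \frac{f / g}{(1 - f)/(1 - g)}, \qquad
\frac{0.43 / 0.0403}{0.57 / 0.9597} \approx \frac{10.67}{0.594} \approx 18, \qquad
\frac{0.48 / 0.0403}{0.52 / 0.9597} \approx \frac{11.91}{0.542} \approx 22
```

So, per base, indels are roughly 18- to 22-fold denser in the 4.03% hotspot fraction than in the remainder of the genome.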
13.
Genome Res ; 20(5): 600-13, 2010 May.
Article En | MEDLINE | ID: mdl-20219940

The densities of transposable elements (TEs) in the human genome display substantial variation both within individual chromosomes and among chromosome types (autosomes and the two sex chromosomes). Finding an explanation for this variability has been challenging, especially in light of genome landscapes unique to the sex chromosomes. Here, using a multiple regression framework, we investigate primate Alu and L1 densities shaped by regional genome features and location on a particular chromosome type. As a result of our analysis, first, we build statistical models explaining up to 79% and 44% of variation in Alu and L1 element density, respectively. Second, we analyze sex chromosome versus autosome TE densities corrected for regional genomic effects. We discover that sex-chromosome bias in Alu and L1 distributions not only persists after accounting for these effects, but even presents differences in patterns, confirming preferential Alu integration in the male germline, yet likely integration of L1s in both male and female germlines or in early embryogenesis. Additionally, our models reveal that local base composition (measured by GC content and density of L1 target sites) and natural selection (inferred via density of most conserved elements) are significant to predicting densities of L1s. Interestingly, measurements of local double-stranded breaks (a 13-mer associated with genome instability) strongly correlate with densities of Alu elements; little evidence was found for the role of recombination-driven deletion in driving TE distributions over evolutionary time. Thus, Alu and L1 densities have been influenced by the combination of distinct local genome landscapes and the unique evolutionary dynamics of sex chromosomes.


Alu Elements/genetics , Genome/genetics , Long Interspersed Nucleotide Elements/genetics , Primates/genetics , Sex Chromosomes/genetics , Short Interspersed Nucleotide Elements/genetics , Animals , Chromosome Mapping , Chromosomes, Human, X/genetics , Chromosomes, Human, Y/genetics , Evolution, Molecular , Female , Humans , Male , Models, Genetic
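The multiple regression framework in the record above, and the share of variance it explains (the R² behind "up to 79% and 44%"), can be sketched with ordinary least squares. The predictors and data here are synthetic stand-ins for regional genome features, not the authors' variables, window sizes, or model specification; the sketch assumes NumPy is available.

```python
import numpy as np

def fit_ols_r2(X, y):
    """Fit y ~ intercept + X by ordinary least squares; return (coefficients, R^2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])   # prepend an intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return coef, 1.0 - ss_res / ss_tot

# Synthetic genomic windows: TE density driven by GC content and conserved-element density.
rng = np.random.default_rng(0)
n = 500
gc = rng.uniform(0.35, 0.60, n)        # hypothetical GC fraction per window
conserved = rng.uniform(0.0, 0.1, n)   # hypothetical density of conserved elements
density = 2.0 + 10.0 * gc - 5.0 * conserved + rng.normal(0.0, 0.5, n)

coef, r2 = fit_ols_r2(np.column_stack([gc, conserved]), density)
```

The fitted coefficients approximately recover the simulated effects, and R² reports the fraction of variance in TE density the predictors explain. Correlated genomic features (GC content, gene density, recombination) can make individual coefficients unstable even when R² is high.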
14.
Genome Res ; 19(7): 1153-64, 2009 Jul.
Article En | MEDLINE | ID: mdl-19502380

Recent studies have revealed that insertions and deletions (indels) are more different in their formation than previously assumed. What remains enigmatic is how the local DNA sequence context contributes to these differences. To investigate the relative impact of various molecular mechanisms to indel formation, we analyzed sequence contexts of indels in the non protein- or RNA-coding, nonrepetitive (NCNR) portion of the human genome. We considered small (

Genome, Human , INDEL Mutation/genetics , Gene Deletion , Humans , Mutagenesis, Insertional , Repetitive Sequences, Nucleic Acid
15.
PLoS Comput Biol ; 3(9): 1772-82, 2007 Sep.
Article En | MEDLINE | ID: mdl-17941704

Insertions and deletions (indels) cause numerous genetic diseases and lead to pronounced evolutionary differences among genomes. The macaque sequences provide an opportunity to gain insights into the mechanisms generating these mutations on a genome-wide scale by establishing the polarity of indels occurring in the human lineage since its divergence from the chimpanzee. Here we apply novel regression techniques and multiscale analyses to demonstrate an extensive regional indel rate variation stemming from local fluctuations in divergence, GC content, male and female recombination rates, proximity to telomeres, and other genomic factors. We find that both replication and, surprisingly, recombination are significantly associated with the occurrence of small indels. Intriguingly, the relative inputs of replication versus recombination differ between insertions and deletions, thus the two types of mutations are likely guided in part by distinct mechanisms. Namely, insertions are more strongly associated with factors linked to recombination, while deletions are mostly associated with replication-related features. Indel as a term misleadingly groups the two types of mutations together by their effect on a sequence alignment. However, here we establish that the correct identification of a small gap as an insertion or a deletion (by use of an outgroup) is crucial to determining its mechanism of origin. In addition to providing novel insights into insertion and deletion mutagenesis, these results will assist in gap penalty modeling and eventually lead to more reliable genomic alignments.


Biological Evolution , Chromosome Mapping/methods , DNA Mutational Analysis/methods , DNA Transposable Elements/genetics , Gene Deletion , Genome, Human/genetics , Macaca/genetics , Animals , Computer Simulation , Evolution, Molecular , Humans , Models, Genetic
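The polarization step described in the record above, calling a human-chimpanzee gap an insertion or a deletion by consulting the macaque outgroup, can be sketched with simple parsimony. A minimal sketch, assuming the gap has already been located in a three-way alignment and reduced to a presence/absence call per species; it ignores alignment uncertainty and recurrent (homoplastic) events.

```python
def polarize_human_indel(human_has_seq: bool, chimp_has_seq: bool, macaque_has_seq: bool) -> str:
    """Classify a human-chimpanzee gap by parsimony, using macaque as the outgroup.

    Each flag records whether that species carries the gapped segment.
    """
    if human_has_seq == chimp_has_seq:
        return "no human-specific indel"
    if chimp_has_seq == macaque_has_seq:
        # Chimpanzee shares the outgroup (ancestral) state, so the change is on the human lineage.
        return "insertion" if human_has_seq else "deletion"
    # Human shares the outgroup state, so the change happened on the chimpanzee lineage.
    return "chimpanzee-lineage event"

# Segment present in human only: inferred as a human-lineage insertion.
call = polarize_human_indel(True, False, False)
```

Without the outgroup, the first two cases are indistinguishable, which is exactly why grouping both as "indels" obscures their different mechanisms of origin.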
16.
J Bacteriol ; 186(22): 7773-82, 2004 Nov.
Article En | MEDLINE | ID: mdl-15516592

Modern comparative genomics has been established, in part, by the sequencing and annotation of a broad range of microbial species. To gain further insights, new sequencing efforts are now dealing with the variety of strains or isolates that give a species its definition and range; however, the number of such strains vastly outstrips our ability to sequence them. Given the availability of a large number of microbial species, new whole-genome approaches that maximize discovery must be developed to fully leverage this information at the level of strain diversity. Here, we describe how optical mapping, a single-molecule system, was used to identify and annotate chromosomal alterations between bacterial strains represented by several species. Since whole-genome optical maps are ordered restriction maps, sequenced strains of Shigella flexneri serotype 2a (2457T and 301), Yersinia pestis (CO 92 and KIM), and Escherichia coli were aligned as maps to identify regions of homology and to further characterize them as possible insertions, deletions, inversions, or translocations. Importantly, an unsequenced Shigella flexneri strain (serotype Y strain AMC[328Y]) was optically mapped and aligned with two sequenced ones to reveal one novel locus implicated in serotype conversion and several other loci containing insertion sequence elements or phage-related gene insertions. Our results suggest that genomic rearrangements and chromosomal breakpoints are readily identified and annotated against a prototypic sequenced strain by using the tools of optical mapping.


Escherichia coli K12/genetics , Genome, Bacterial , Genomics , Restriction Mapping/methods , Shigella flexneri/genetics , Yersinia pestis/genetics , Image Processing, Computer-Assisted
17.
Mol Biochem Parasitol ; 138(1): 97-106, 2004 Nov.
Article En | MEDLINE | ID: mdl-15500921

Leishmania is a group of protozoan parasites that cause a broad spectrum of diseases resulting in widespread human suffering and death, as well as economic loss from the infection of some domestic animals and wildlife. To further understand the fundamental genomic architecture of this parasite, and to accelerate the ongoing sequencing project, a whole-genome XbaI restriction map was constructed using the optical mapping system. This map supplemented traditional physical maps that were generated by fingerprinting and hybridization of cosmid and P1 clone libraries. Thirty-six optical map contigs were constructed for the corresponding 36 known chromosomes of the Leishmania major Friedlin genome. The chromosome sizes ranged from 326.9 to 2821.3 kb, with a total genome size of 34.7 Mb; the average XbaI restriction fragment was 25.3 kb, and the per-chromosome average ranged from 15.7 to 77.8 kb. Comparison between the optical maps and in silico maps derived from completed, nearly finished, or large sequence contigs showed that optical maps served several useful functions on the path to finished sequence: guiding aspects of the sequence assembly, identifying misassemblies, detecting cosmid or PAC clones misplaced on chromosomes, and validating sequence at varying degrees of finishing. Our results also showed the potential use of optical maps as a means to detect and characterize segmental duplications within genomes.


Genome, Protozoan , Leishmania major/genetics , Restriction Mapping/methods , Animals , Deoxyribonucleases, Type II Site-Specific/metabolism , Electrophoresis, Gel, Pulsed-Field , Image Processing, Computer-Assisted
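The in silico maps that the optical maps above were compared against can be approximated by locating enzyme recognition sites in assembled sequence and converting the cut positions into ordered fragment lengths. A minimal sketch for a linear sequence, assuming XbaI's palindromic recognition site TCTAGA with the cut after the first base (T^CTAGA), so scanning one strand suffices; it ignores circular chromosomes and methylation, and a real comparison would also model optical sizing error.

```python
def restriction_fragments(seq: str, site: str = "TCTAGA", cut_offset: int = 1) -> list[int]:
    """Return ordered fragment lengths from cutting a linear sequence at every site occurrence.

    site: recognition sequence (assumed palindromic, so one strand is scanned).
    cut_offset: cut position within the site, e.g. 1 for T^CTAGA.
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)   # continue scanning after this occurrence
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]

# Toy 20 bp sequence with two XbaI sites.
fragments = restriction_fragments("AAATCTAGAGGGTCTAGACC")
```

The same function covers the enzymes in the neighbouring optical-mapping records by changing `site` and `cut_offset`, for example XhoI (`"CTCGAG"`, 1), PvuII (`"CAGCTG"`, 3), EcoRI (`"GAATTC"`, 1) or HindIII (`"AAGCTT"`, 1).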
18.
Genome Res ; 13(9): 2142-51, 2003 Sep.
Article En | MEDLINE | ID: mdl-12952882

Rhodobacter sphaeroides 2.4.1 is a facultative photoheterotrophic bacterium with tremendous metabolic diversity, which has significantly contributed to our understanding of the molecular genetics of photosynthesis, photoheterotrophy, nitrogen fixation, hydrogen metabolism, carbon dioxide fixation, taxis, and tetrapyrrole biosynthesis. To further understand this remarkable bacterium, and to accelerate an ongoing sequencing project, two whole-genome restriction maps (EcoRI and HindIII) of R. sphaeroides strain 2.4.1 were constructed using shotgun optical mapping. The approach directly mapped genomic DNA by the random mapping of single molecules. The two maps were used to facilitate sequence assembly by providing an optical scaffold for high-resolution alignment and verification of sequence contigs. Our results show that such maps facilitated the closure of sequence gaps by the early detection of nascent sequence contigs during the course of the whole-genome shotgun sequencing process.


Genome, Bacterial , Optics and Photonics , Restriction Mapping/methods , Rhodobacter sphaeroides/genetics , Sequence Analysis, DNA/methods , Chromosomes, Bacterial/genetics , Contig Mapping/methods , Genetic Markers/genetics , Microscopy, Fluorescence , Sequence Alignment/methods , Species Specificity
19.
Appl Environ Microbiol ; 68(12): 6321-31, 2002 Dec.
Article En | MEDLINE | ID: mdl-12450857

Yersinia pestis is the causative agent of the bubonic, septicemic, and pneumonic plagues (also known as black death) and has been responsible for recurrent devastating pandemics throughout history. To further understand this virulent bacterium and to accelerate an ongoing sequencing project, two whole-genome restriction maps (XhoI and PvuII) of Y. pestis strain KIM were constructed using shotgun optical mapping. This approach constructs ordered restriction maps from randomly sheared individual DNA molecules directly extracted from cells. The two maps served different purposes; the XhoI map facilitated sequence assembly by providing a scaffold for high-resolution alignment, while the PvuII map verified genome sequence assembly. Our results show that such maps facilitated the closure of sequence gaps and, most importantly, provided a purely independent means for sequence validation. Given the recent advancements to the optical mapping system, increased resolution and throughput are enabling such maps to guide sequence assembly at a very early stage of a microbial sequencing project.


Genome, Bacterial , Restriction Mapping , Yersinia pestis/genetics
...