Results 1 - 20 of 52
2.
Article in English | MEDLINE | ID: mdl-38905513

ABSTRACT

Long-range sequencing provides access to genetic information beyond what can be obtained from either short reads or modern long-read technology. Several new sequencing technologies, such as Hi-C and Linked Reads, generate long-range datasets for high-throughput, high-resolution genome analysis and are rapidly advancing genome assembly, genome scaffolding, and more comprehensive variant identification. In this article, we focus on five major long-range sequencing technologies: high-throughput chromosome conformation capture (Hi-C), 10x Genomics Linked Reads, haplotagging, transposase enzyme linked long-read sequencing (TELL-seq), and single tube long fragment read (stLFR). We detail the mechanisms and data products of the five platforms, introduce several of the most important applications, evaluate the quality of sequencing data from different platforms, and discuss the currently available bioinformatics tools. We hope this work will help researchers select the appropriate long-range technology for specific biological studies.
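Linked-read platforms (10x Genomics Linked Reads, haplotagging, TELL-seq, stLFR) attach a shared barcode to short reads derived from the same long DNA fragment. A common first step is to group aligned reads by barcode and split each group into putative source molecules wherever consecutive reads are too far apart. The sketch below is a simplified illustration under stated assumptions, not any platform's official pipeline: the input format of `(barcode, chrom, pos)` tuples and the `max_gap` threshold of 50 kb are hypothetical choices.

```python
from collections import defaultdict


def infer_molecules(alignments, max_gap=50_000):
    """Group barcoded read alignments into putative source molecules.

    alignments: iterable of (barcode, chrom, pos) tuples (hypothetical format).
    max_gap: hypothetical maximum distance between consecutive reads that
             are still assigned to the same molecule.
    Returns a list of (barcode, chrom, start, end, n_reads) tuples.
    """
    # Reads sharing a barcode on the same chromosome may come from one fragment.
    by_key = defaultdict(list)
    for barcode, chrom, pos in alignments:
        by_key[(barcode, chrom)].append(pos)

    molecules = []
    for (barcode, chrom), positions in sorted(by_key.items()):
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev > max_gap:
                # Gap too large: close the current molecule and open a new one.
                molecules.append((barcode, chrom, start, prev, count))
                start, count = pos, 0
            prev = pos
            count += 1
        molecules.append((barcode, chrom, start, prev, count))
    return molecules
```

Splitting on a fixed gap is the simplest heuristic; a barcode can be reused across several fragments, so a single barcode may legitimately yield several molecules on one chromosome.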

3.
Cell Genom ; 4(2): 100484, 2024 Feb 14.
Article in English | MEDLINE | ID: mdl-38232733

ABSTRACT

The epigenetic landscape of cancer is regulated by many factors, but primarily it derives from the underlying genome sequence. Chromothripsis is a catastrophic localized genome shattering event that drives, and often initiates, cancer evolution. We characterized five esophageal adenocarcinoma organoids with chromothripsis using long-read sequencing and transcriptome and epigenome profiling. Complex structural variation and subclonal variants meant that haplotype-aware de novo methods were required to generate contiguous cancer genome assemblies. Chromosomes were assembled separately and scaffolded using haplotype-resolved Hi-C reads, producing accurate assemblies even with up to 900 structural rearrangements. There were widespread differences between the chromothriptic and wild-type copies of chromosomes in topologically associated domains, chromatin accessibility, histone modifications, and gene expression. Differential epigenome peaks were most enriched within 10 kb of chromothriptic structural variants. Alterations in transcriptome and higher-order chromosome organization frequently occurred near differential epigenetic marks. Overall, chromothripsis reshapes gene regulation, causing coordinated changes in epigenetic landscape, transcription, and chromosome conformation.


Subject(s)
Adenocarcinoma , Chromothripsis , Esophageal Neoplasms , Humans , Haplotypes , Chromatin , Genome , Adenocarcinoma/genetics
4.
Nat Cancer ; 4(11): 1575-1591, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37783803

ABSTRACT

Transmissible cancers are malignant cell lineages that spread clonally between individuals. Several such cancers, termed bivalve transmissible neoplasia (BTN), induce leukemia-like disease in marine bivalves. This is the case of BTN lineages affecting the common cockle, Cerastoderma edule, which inhabits the Atlantic coasts of Europe and northwest Africa. To investigate the evolution of cockle BTN, we collected 6,854 cockles, diagnosed 390 BTN tumors, generated a reference genome and assessed genomic variation across 61 tumors. Our analyses confirmed the existence of two BTN lineages with hemocytic origins. Mitochondrial variation revealed mitochondrial capture and host co-infection events. Mutational analyses identified lineage-specific signatures, one of which likely reflects DNA alkylation. Cytogenetic and copy number analyses uncovered pervasive genomic instability, with whole-genome duplication, oncogene amplification and alkylation-repair suppression as likely drivers. Satellite DNA distributions suggested ancient clonal origins. Our study illuminates long-term cancer evolution under the sea and reveals tolerance of extreme instability in neoplastic genomes.


Subject(s)
Bivalvia , Cardiidae , Leukemia , Neoplasms , Animals , Humans , Cardiidae/genetics , Clonal Evolution
6.
Nat Commun ; 14(1): 3412, 2023 06 09.
Article in English | MEDLINE | ID: mdl-37296119

ABSTRACT

Numerous novel adaptations characterise the radiation of notothenioids, the dominant fish group in the freezing seas of the Southern Ocean. To improve understanding of the evolution of this iconic fish group, here we generate and analyse new genome assemblies for 24 species covering all major subgroups of the radiation, including five long-read assemblies. We present a new estimate for the onset of the radiation at 10.7 million years ago, based on a time-calibrated phylogeny derived from genome-wide sequence data. We identify a two-fold variation in genome size, driven by expansion of multiple transposable element families, and use the long-read data to reconstruct two evolutionarily important, highly repetitive gene family loci. First, we present the most complete reconstruction to date of the antifreeze glycoprotein gene family, whose emergence enabled survival in sub-zero temperatures, showing the expansion of the antifreeze gene locus from the ancestral to the derived state. Second, we trace the loss of haemoglobin genes in icefishes, the only vertebrates lacking functional haemoglobins, through complete reconstruction of the two haemoglobin gene clusters across notothenioid families. Both the haemoglobin and antifreeze genomic loci are characterised by multiple transposon expansions that may have driven the evolutionary history of these genes.


Subject(s)
Fishes , Perciformes , Animals , Fishes/genetics , Genomics , Vertebrates , Phylogeny , Hemoglobins/genetics , Antarctic Regions
7.
Science ; 380(6642): 283-293, 2023 04 21.
Article in English | MEDLINE | ID: mdl-37079675

ABSTRACT

Tasmanian devils have spawned two transmissible cancer lineages, named devil facial tumor 1 (DFT1) and devil facial tumor 2 (DFT2). We investigated the genetic diversity and evolution of these clones by analyzing 78 DFT1 and 41 DFT2 genomes relative to a newly assembled, chromosome-level reference. Time-resolved phylogenetic trees reveal that DFT1 first emerged in 1986 (1982 to 1989) and DFT2 in 2011 (2009 to 2012). Subclone analysis documents transmission of heterogeneous cell populations. DFT2 has faster mutation rates than DFT1 across all variant classes, including substitutions, indels, rearrangements, transposable element insertions, and copy number alterations, and we identify a hypermutated DFT1 lineage with defective DNA mismatch repair. Several loci show plausible evidence of positive selection in DFT1 or DFT2, including loss of chromosome Y and inactivation of MGA, but none are common to both cancers. This study reveals the parallel long-term evolution of two transmissible cancers inhabiting a common niche in Tasmanian devils.


Subject(s)
Evolution, Molecular , Facial Neoplasms , Marsupialia , Selection, Genetic , Animals , Facial Neoplasms/classification , Facial Neoplasms/genetics , Facial Neoplasms/veterinary , Genome , Marsupialia/genetics , Phylogeny
9.
Nat Commun ; 13(1): 3150, 2022 06 07.
Article in English | MEDLINE | ID: mdl-35672295

ABSTRACT

The STORR gene fusion event is considered essential for the evolution of the promorphinan/morphinan subclass of benzylisoquinoline alkaloids (BIAs) in opium poppy, as the resulting bi-modular protein performs the isomerization of (S)- to (R)-reticuline essential for their biosynthesis. Here, we show that, of the 12 Papaver species analysed, those containing the STORR gene fusion also contain promorphinans/morphinans, with one important exception: P. californicum encodes a functionally conserved STORR but does not produce promorphinans/morphinans. We also show that the gene fusion event occurred only once, between 16.8 and 24.1 million years ago, before the separation of P. californicum from other Clade 2 Papaver species. The most abundant BIA in P. californicum is (R)-glaucine, a member of the aporphine subclass of BIAs, raising the possibility that STORR, once evolved, contributes to the biosynthesis of more than just the promorphinan/morphinan subclass of BIAs in the Papaveraceae.


Subject(s)
Alkaloids , Benzylisoquinolines , Morphinans , Papaver , Alkaloids/metabolism , Benzylisoquinolines/metabolism , Gene Fusion , Morphinans/metabolism , Papaver/genetics , Papaver/metabolism , Plant Proteins/genetics , Plant Proteins/metabolism
10.
BMC Bioinformatics ; 22(1): 569, 2021 Nov 27.
Article in English | MEDLINE | ID: mdl-34837944

ABSTRACT

BACKGROUND: Efficient and effective genome scaffolding tools are still in high demand for generating reference-quality assemblies. While long-read data alone is unlikely to produce a chromosome-scale assembly for most eukaryotic species, the inexpensive Hi-C sequencing technology, capable of capturing the chromosomal profile of a genome, is now widely used to complete the task. However, existing Hi-C based scaffolding tools either require the chromosome number a priori as input or lack the ability to build highly continuous scaffolds. RESULTS: We design and develop a novel Hi-C based scaffolding tool, pin_hic, which uses contact information from Hi-C reads to construct a scaffolding graph iteratively based on the N-best neighbors of contigs. After scaffolding, it identifies potential misjoins and breaks them to maintain scaffolding accuracy. Through tests on three long-read-based de novo assemblies from three different species, we demonstrate that pin_hic is more efficient than current state-of-the-art tools and can generate much more continuous scaffolds while achieving higher or comparable accuracy. CONCLUSIONS: Pin_hic is an efficient Hi-C based scaffolding tool that can be useful for building chromosome-scale assemblies. As many sequencing projects have been launched in recent years, we believe pin_hic has the potential to be applied in these projects and to make a meaningful contribution.
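To make the "N-best neighbors" idea concrete, the sketch below links each contig to the N partners with which it shares the most Hi-C contacts and keeps only reciprocal links as candidate scaffolding edges. This is a simplified illustration under stated assumptions, not pin_hic's implementation: it assumes a precomputed dictionary of contact scores keyed by contig pair (each unordered pair listed once), and it ignores contig orientation, length normalization, iteration over successive rounds, and the misjoin-breaking step described above. The function names are hypothetical.

```python
from collections import defaultdict


def n_best_neighbors(contacts, n):
    """Map each contig to its n highest-scoring Hi-C partners.

    contacts: dict mapping an unordered contig pair (a, b), listed once,
              to a Hi-C contact score (hypothetical input format).
    """
    partners = defaultdict(list)
    for (a, b), score in contacts.items():
        if a == b:
            continue  # intra-contig contacts carry no joining information
        partners[a].append((score, b))
        partners[b].append((score, a))
    best = {}
    for contig, scored in partners.items():
        # Highest score first; ties broken by contig name for determinism.
        scored.sort(key=lambda item: (-item[0], item[1]))
        best[contig] = {name for _, name in scored[:n]}
    return best


def scaffold_edges(contacts, n=1):
    """Keep pairs in which each contig is among the other's n best neighbors."""
    best = n_best_neighbors(contacts, n)
    edges = set()
    for a, neighbors in best.items():
        for b in neighbors:
            if a in best.get(b, set()):
                edges.add(tuple(sorted((a, b))))
    return edges


contacts = {("c1", "c2"): 50, ("c2", "c3"): 40, ("c1", "c3"): 5, ("c3", "c4"): 30}
print(sorted(scaffold_edges(contacts, n=2)))
```

Requiring reciprocity is a conservative choice: raising `n` admits more candidate joins (here, `n=1` keeps only c1-c2, while `n=2` also links c2-c3 and c3-c4), at the cost of more potential misjoins.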


Subject(s)
Genome , Genomics , Chromosomes/genetics , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA
11.
Nat Commun ; 12(1): 5508, 2021 09 17.
Article in English | MEDLINE | ID: mdl-34535649

ABSTRACT

Perilla is a young allotetraploid Lamiaceae species widely used in East Asia as a herb and oil plant. Here, we report high-quality, chromosome-scale genomes of the tetraploid (Perilla frutescens) and its AA diploid progenitor (Perilla citriodora). Comparative analyses suggest post-Neolithic allotetraploidization within the last 10,000 years, and nucleotide mutation in the tetraploid is 10% higher than in the diploid, both of which are dominated by G:C → A:T transitions. Incipient diploidization is characterized by balanced swaps of homeologous segments, and subsequent homeologous exchanges are enriched towards telomeres, with an excess of replacements of AA genes by fractionated BB homeologs. Population analyses suggest that the crispa lines are close to the nascent tetraploid, and GWAS reveals the involvement of an acyl-CoA: lysophosphatidylcholine acyltransferase gene in the high α-linolenic acid content of seed oil. These resources and findings provide insights into incipient diploidization and a basis for breeding improvement of this medicinal plant.


Subject(s)
Diploidy , Perilla/genetics , Plants, Medicinal/genetics , Base Sequence , Biological Evolution , Genes, Plant , Genetics, Population , Genome, Plant , Genome-Wide Association Study , Nucleotides/genetics , Pigmentation/genetics , Plant Leaves/genetics , Polyploidy
12.
PLoS Pathog ; 17(8): e1009772, 2021 08.
Article in English | MEDLINE | ID: mdl-34352039

ABSTRACT

Understanding SARS-CoV-2 evolution and host immunity is critical to controlling the COVID-19 pandemic. At its core is an arms race between recognition of the viral spike protein by antibodies and its binding to angiotensin-converting enzyme 2 (ACE2). Mutations in spike that affect antibody and/or ACE2 binding are appearing worldwide, making it necessary to monitor SARS-CoV-2 evolution and dynamics in the population. Determining signatures in SARS-CoV-2 that render the virus resistant to neutralizing antibodies is critical. We engineered 25 spike-pseudotyped lentiviruses containing individual and combined mutations in the spike protein, including all defining mutations in the variants of concern, to identify the effect of single and synergic amino acid substitutions in promoting immune escape. We confirmed that E484K evades antibody neutralization elicited by infection or vaccination, a capacity augmented when complemented by K417N and N501Y mutations. In silico analysis provided an explanation for E484K immune evasion: E484 frequently engages in interactions with antibodies but not with ACE2. Importantly, we identified a novel amino acid of concern, S494, which shares a similar pattern. Using the already circulating mutation S494P, we found that it reduces antibody neutralization by convalescent and post-immunization sera, particularly when combined with E484K and with mutations able to increase binding to ACE2, such as N501Y. Our analysis of synergic mutations provides a signature of hotspots for immune evasion and of targets for therapies, vaccines and diagnostics.


Subject(s)
Antibodies, Neutralizing/immunology , COVID-19/virology , SARS-CoV-2/immunology , Spike Glycoprotein, Coronavirus/immunology , Amino Acid Substitution/genetics , Angiotensin-Converting Enzyme 2/genetics , Angiotensin-Converting Enzyme 2/immunology , Antibodies, Monoclonal/immunology , Antibodies, Viral/immunology , COVID-19/immunology , Cell Line , Humans , Immune Evasion , Mutation/genetics , Protein Binding , SARS-CoV-2/genetics , Spike Glycoprotein, Coronavirus/genetics , Spike Glycoprotein, Coronavirus/metabolism
13.
Plant J ; 107(2): 613-628, 2021 07.
Article in English | MEDLINE | ID: mdl-33960539

ABSTRACT

Traditional crops have historically provided accessible and affordable nutrition to millions of rural dwellers but have been neglected, with most modern agricultural systems over-reliant on a small number of internationally traded crops. Traditional crops are typically well-adapted to local agro-ecological conditions and many are nutrient-dense. They can play a vital role in local food systems through enhanced nutrition (particularly where diets are dominated by starch crops), food security and livelihoods for smallholder farmers, and a climate-resilient and biodiverse agriculture. Using short-read, long-read and phased sequencing technologies, we generated a high-quality chromosome-level genome assembly for Amaranthus cruentus, an under-researched crop with micronutrient- and protein-rich leaves and gluten-free seed, but lacking improved varieties, with respect to productivity and quality traits. The 370.9 Mb genome demonstrates a shared whole genome duplication with a related species, Amaranthus hypochondriacus. Comparative genome analysis indicates chromosomal loss and fusion events following genome duplication that are common to both species, as well as fission of chromosome 2 in A. cruentus alone, giving rise to a haploid chromosome number of 17 (versus 16 in A. hypochondriacus). Genomic features potentially underlying the nutritional value of this crop include two A. cruentus-specific genes with a likely role in phytic acid synthesis (an anti-nutrient), expansion of ion transporter gene families, and identification of biosynthetic gene clusters conserved within the amaranth lineage. The A. cruentus genome assembly will underpin much-needed research and global breeding efforts to develop improved varieties for economically viable cultivation and realization of the benefits to global nutrition security and agrobiodiversity.


Subject(s)
Amaranthus/genetics , Chromosomes, Plant/genetics , Crops, Agricultural/genetics , Evolution, Molecular , Genome, Plant/genetics , Multigene Family/genetics , Nutritive Value/genetics , Amaranthus/metabolism , Chromosome Mapping , Genes, Plant/genetics , Phylogeny
15.
Wellcome Open Res ; 5: 148, 2020.
Article in English | MEDLINE | ID: mdl-33195818

ABSTRACT

We present a genome assembly for Cottoperca gobio (channel bull blenny, (Günther, 1861)); Chordata; Actinopterygii (ray-finned fishes), a temperate water outgroup for Antarctic Notothenioids. The size of the genome assembly is 609 megabases, with the majority of the assembly scaffolded into 24 chromosomal pseudomolecules. Gene annotation on Ensembl of this assembly has identified 21,662 coding genes.

16.
Science ; 362(6412): 343-347, 2018 10 19.
Article in English | MEDLINE | ID: mdl-30166436

ABSTRACT

Morphinan-based painkillers are derived from opium poppy (Papaver somniferum L.). We report a draft of the opium poppy genome, with 2.72 gigabases assembled into 11 chromosomes with contig N50 and scaffold N50 of 1.77 and 204 megabases, respectively. Synteny analysis suggests a whole-genome duplication at ~7.8 million years ago and ancient segmental or whole-genome duplication(s) that occurred before the Papaveraceae-Ranunculaceae divergence 110 million years ago. Syntenic blocks representative of phthalideisoquinoline and morphinan components of a benzylisoquinoline alkaloid cluster of 15 genes provide insight into how this cluster evolved. Paralog analysis identified P450 and oxidoreductase genes that combined to form the STORR gene fusion essential for morphinan biosynthesis in opium poppy. Thus, gene duplication, rearrangement, and fusion events have led to evolution of specialized metabolic products in opium poppy.
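Contig N50 and scaffold N50, quoted above, are standard assembly-continuity metrics: N50 is the largest length L such that sequences of length at least L together contain at least half of all assembled bases. A minimal Python sketch of the calculation follows; the input lengths in the usage line are toy values, not data from the opium poppy assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the largest length L such that sequences of length >= L
    together contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases until
    # at least half of the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


print(n50([2, 3, 4, 5, 6, 8, 10]))  # 6: 10 + 8 + 6 = 24 >= 38 / 2
```

The same function yields contig N50 or scaffold N50 depending on whether contig or scaffold lengths are passed in; comparing `2 * running` with `total` avoids floating-point division.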


Subject(s)
Benzylisoquinolines/metabolism , Evolution, Molecular , Gene Duplication , Genome, Plant , Morphinans/metabolism , Papaver/genetics , Papaver/metabolism , Gene Fusion , Gene Order , Multigene Family , NADPH-Ferrihemoprotein Reductase/genetics , Plant Proteins/genetics , Synteny
17.
Methods Mol Biol ; 1833: 95-105, 2018.
Article in English | MEDLINE | ID: mdl-30039366

ABSTRACT

Genetic variations are important evolutionary forces in all forms of life in nature. Accurate and efficient detection of various forms of genetic variants is crucial for understanding cell function, evolution and diseases in living organisms. In this chapter, we describe a detailed protocol that uses Pindel, a split-read algorithm, to discover indels and structural variants in a given genome, from Illumina short-read sequencing data produced from biological samples.


Subject(s)
Algorithms , Genetic Variation , Genome , Sequence Analysis, DNA/methods
18.
Cancer Cell ; 33(4): 607-619.e15, 2018 04 09.
Article in English | MEDLINE | ID: mdl-29634948

ABSTRACT

Transmissible cancers are clonal lineages that spread through populations via contagious cancer cells. Although rare in nature, two facial tumor clones affect Tasmanian devils. Here we perform comparative genetic and functional characterization of these lineages. The two cancers have similar patterns of mutation and show no evidence of exposure to exogenous mutagens or viruses. Genes encoding PDGF receptors have copy number gains and are present on extrachromosomal double minutes. Drug screening indicates causative roles for receptor tyrosine kinases and sensitivity to inhibitors of DNA repair. Y chromosome loss from a male clone infecting a female host suggests immunoediting. These results imply that Tasmanian devils may have inherent susceptibility to transmissible cancers and present a suite of therapeutic compounds for use in conservation.


Subject(s)
Facial Neoplasms/veterinary , Marsupialia/genetics , Mutation , Receptors, Platelet-Derived Growth Factor/genetics , Animals , Cell Line, Tumor , Chromosomes, Mammalian/genetics , Clone Cells/immunology , Clone Cells/pathology , Facial Neoplasms/genetics , Facial Neoplasms/immunology , Female , Gene Dosage , Gene Editing , Immunity , Male
19.
Bioinformatics ; 34(17): 3022-3024, 2018 09 01.
Article in English | MEDLINE | ID: mdl-29608694

ABSTRACT

Motivation: The recent technological advances in genome sequencing techniques have resulted in an exponential increase in the number of sequenced human and non-human genomes. The ever increasing number of assemblies generated by novel de novo pipelines and strategies demands the development of new software to evaluate assembly quality and completeness. One way to determine the completeness of an assembly is by detecting its Presence-Absence variations (PAV) with respect to a reference, where PAVs between two assemblies are defined as the sequences present in one assembly but entirely missing in the other one. Beyond assembly error or technology bias, PAVs can also reveal real genome polymorphism, consequence of species or individual evolution, or horizontal transfer from viruses and bacteria. Results: We present scanPAV, a pipeline for pairwise assembly comparison to identify and extract sequences present in one assembly but not the other. In this note, we use the GRCh38 reference assembly to assess the completeness of six human genome assemblies from various assembly strategies and sequencing technologies including Illumina short reads, 10× genomics linked-reads, PacBio and Oxford Nanopore long reads, and Bionano optical maps. We also discuss the PAV polymorphism of seven Tasmanian devil whole genome assemblies of normal animal tissues and devil facial tumour 1 (DFT1) and 2 (DFT2) samples, and the identification of bacterial sequences as contamination in some of the tumorous assemblies. Availability and implementation: The pipeline is available under the MIT License at https://github.com/wtsi-hpag/scanPAV. Supplementary information: Supplementary data are available at Bioinformatics online.
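The PAV definition above (sequence present in one assembly but entirely missing from the other) can be illustrated with a simple k-mer test. Note that this swaps in a different technique: the sketch is not scanPAV's alignment-based pipeline. It assumes uppercase DNA strings, splits the query assembly into fixed windows, and flags a window as absent when the fraction of its k-mers found in the target (on either strand) does not exceed `max_shared`. The function names and the `window`, `k` and `max_shared` defaults are hypothetical choices; a trailing window shorter than `window` is ignored.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def absent_windows(query, target, window=50, k=11, max_shared=0.0):
    """Return (start, end) windows of query whose k-mers are (almost) absent from target."""
    if window < k:
        raise ValueError("window must be at least k")
    # Index the target on both strands so inverted sequence is not called absent.
    target_kmers = kmers(target, k) | kmers(revcomp(target), k)
    hits = []
    for start in range(0, len(query) - window + 1, window):
        w = kmers(query[start:start + window], k)
        shared = len(w & target_kmers) / len(w)
        if shared <= max_shared:
            hits.append((start, start + window))
    return hits


target = "AACACCAC" * 25
query = target[:100] + "AG" * 25  # last 50 bases do not occur in target
print(absent_windows(query, target))  # [(100, 150)]
```

With `max_shared=0.0` only windows sharing no k-mer with the target are reported, matching the "entirely missing" definition; a small positive threshold would tolerate sequencing errors at the cost of precision.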


Subject(s)
Genome , Animals , Chromosome Mapping , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Humans , Sequence Analysis, DNA/methods , Software
20.
Sci Rep ; 7(1): 3935, 2017 06 21.
Article in English | MEDLINE | ID: mdl-28638050

ABSTRACT

Long-read sequencing technologies such as Pacific Biosciences and Oxford Nanopore MinION are capable of producing long sequencing reads with average fragment lengths of over 10,000 base-pairs and maximum lengths reaching 100,000 base-pairs. Compared with short reads, the assemblies obtained from long-read sequencing platforms have much higher contig continuity and genome completeness as long fragments are able to extend paths into problematic or repetitive regions. Many successful assembly applications of the Pacific Biosciences technology have been reported ranging from small bacterial genomes to large plant and animal genomes. Recently, genome assemblies using Oxford Nanopore MinION data have attracted much attention due to the portability and low cost of this novel sequencing instrument. In this paper, we re-sequenced a well characterized genome, the Saccharomyces cerevisiae S288C strain using three different platforms: MinION, PacBio and MiSeq. We present a comprehensive metric comparison of assemblies generated by various pipelines and discuss how the platform associated data characteristics affect the assembly quality. With a given read depth of 31X, the assemblies from both Pacific Biosciences and Oxford Nanopore MinION show excellent continuity and completeness for the 16 nuclear chromosomes, but not for the mitochondrial genome, whose reconstruction still represents a significant challenge.


Subject(s)
Genome, Fungal , Genomics , Saccharomyces cerevisiae/genetics , Sequence Analysis, DNA , Genome, Mitochondrial , Genomics/instrumentation , Genomics/methods , High-Throughput Nucleotide Sequencing/instrumentation , High-Throughput Nucleotide Sequencing/methods , Reproducibility of Results , Sequence Analysis, DNA/instrumentation , Sequence Analysis, DNA/methods