ABSTRACT
The switch/sucrose non-fermentable (SWI/SNF) complex has a crucial role in chromatin remodelling [1] and is altered in over 20% of cancers [2,3]. Here we developed a proteolysis-targeting chimera (PROTAC) degrader of the SWI/SNF ATPase subunits, SMARCA2 and SMARCA4, called AU-15330. Androgen receptor (AR)-positive, forkhead box A1 (FOXA1)-positive prostate cancer cells are exquisitely sensitive to dual SMARCA2 and SMARCA4 degradation relative to normal and other cancer cell lines. SWI/SNF ATPase degradation rapidly compacts cis-regulatory elements bound by transcription factors that drive prostate cancer cell proliferation, namely AR, FOXA1, ERG and MYC, which dislodges them from chromatin, disables their core enhancer circuitry, and abolishes the downstream oncogenic gene programs. SWI/SNF ATPase degradation also disrupts super-enhancer and promoter looping interactions that wire supra-physiologic expression of the AR, FOXA1 and MYC oncogenes themselves. AU-15330 induces potent inhibition of tumour growth in xenograft models of prostate cancer and synergizes with the AR antagonist enzalutamide, even inducing disease remission in castration-resistant prostate cancer (CRPC) models without toxicity. Thus, impeding SWI/SNF-mediated enhancer accessibility represents a promising therapeutic approach for enhancer-addicted cancers.
Subjects
Adenosine Triphosphatases, DNA Helicases, Nuclear Proteins, Prostatic Neoplasms, Transcription Factors, Adenosine Triphosphatases/metabolism, Animals, Benzamides, DNA Helicases/genetics, Genetic Enhancer Elements, myc Genes, Hepatocyte Nuclear Factor 3-alpha, Humans, Male, Nitriles, Nuclear Proteins/genetics, Oncogenes, Phenylthiohydantoin, Prostatic Neoplasms/drug therapy, Prostatic Neoplasms/genetics, Androgen Receptors, Transcription Factors/genetics, Transcriptional Regulator ERG, Xenograft Model Antitumor Assays
ABSTRACT
The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society [1,2]. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals [3,4]. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome [5]. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity [6]. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yields the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.
Subjects
Chromosome Mapping, Diploidy, Human Genome, Genomics, Humans, Chromosome Mapping/standards, Human Genome/genetics, Haplotypes/genetics, High-Throughput Nucleotide Sequencing/methods, High-Throughput Nucleotide Sequencing/standards, DNA Sequence Analysis/methods, DNA Sequence Analysis/standards, Reference Standards, Genomics/methods, Genomics/standards, Human Chromosomes/genetics, Genetic Variation/genetics
ABSTRACT
MOTIVATION: Microbial gene catalogs are data structures that organize genes found in microbial communities, providing a reference for standardized analysis of the microbes across samples and studies. Although gene catalogs are commonly used, they have not been critically evaluated for their effectiveness as a basis for metagenomic analyses. RESULTS: As a case study, we investigate one such catalog, the Integrated Gene Catalog (IGC); however, our observations apply broadly to most gene catalogs constructed to date. We focus on both the approach used to construct this catalog and its effectiveness when used as a reference for microbiome studies. Our results highlight important limitations of the approach used to construct the IGC and call into question the broad usefulness of gene catalogs more generally. We also recommend best practices for the construction and use of gene catalogs in microbiome studies and highlight opportunities for future research. AVAILABILITY AND IMPLEMENTATION: All supporting scripts for our analyses can be found on GitHub: https://github.com/SethCommichaux/IGC.git. The supporting data can be downloaded from: https://obj.umiacs.umd.edu/igc-analysis/IGC_analysis_data.tar.gz. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Metagenome, Microbiota, Microbiota/genetics, Metagenomics
ABSTRACT
Metagenomic samples are snapshots of complex ecosystems at work. They comprise hundreds of known and unknown species, contain multiple strain variants and vary greatly within and across environments. Many microbes found in microbial communities are not easily grown in culture, making their DNA sequence our only clue into their evolutionary history and biological function. Metagenomic assembly is a computational process aimed at reconstructing genes and genomes from metagenomic mixtures. Current methods have made significant strides in reconstructing DNA segments comprising operons, tandem gene arrays and syntenic blocks. Shorter, higher-throughput sequencing technologies have become the de facto standard in the field. Sequencers are now able to generate billions of short reads in only a few days. Multiple metagenomic assembly strategies, pipelines and assemblers have appeared in recent years. Owing to the inherent complexity of metagenome assembly, regardless of the assembly algorithm and sequencing method, metagenome assemblies contain errors. Recent developments in assembly validation tools have played a pivotal role in improving metagenomic assemblers. Here, we survey recent progress in the field of metagenomic assembly, provide an overview of key approaches for genomic and metagenomic assembly validation and demonstrate the insights that can be derived from assemblies through the use of assembly validation strategies. We also discuss the potential impact of long-read technologies in metagenomics. We conclude with a discussion of future challenges and opportunities in the field of metagenomic assembly and validation.
Subjects
Metagenome, Metagenomics/methods, Microbiota/genetics, Algorithms, Computational Biology, Genetic Databases/statistics & numerical data, High-Throughput Nucleotide Sequencing/statistics & numerical data, Metagenomics/statistics & numerical data, Metagenomics/trends, Software
ABSTRACT
The computational reconstruction of genome sequences from shotgun sequencing data has been greatly simplified by the advent of sequencing technologies that generate long reads. In the case of relatively small genomes (e.g., bacterial or viral), complete genome sequences can frequently be reconstructed computationally without the need for further experiments. However, large and complex genomes, such as those of most animals and plants, continue to pose significant challenges. In such genomes, assembly software produces incomplete and fragmented reconstructions that require additional experimentally derived information and manual intervention in order to reconstruct individual chromosome arms. Recent technologies originally designed to capture chromatin structure have been shown to effectively complement sequencing data, leading to much more contiguous reconstructions of genomes than previously possible. Here, we survey these technologies and the algorithms used to assemble and analyze large eukaryotic genomes, placed within the historical context of genome scaffolding technologies that have been in existence since the dawn of the genomic era.
Subjects
Algorithms, Chromosome Mapping/methods, Human Genome/genetics, Genomics/methods, Sequence Alignment/methods, Humans, DNA Sequence Analysis
ABSTRACT
Long-read sequencing and novel long-range assays have revolutionized de novo genome assembly by automating the reconstruction of reference-quality genomes. In particular, Hi-C sequencing is becoming an economical method for generating chromosome-scale scaffolds. Despite its increasing popularity, few open-source tools are available. Errors, particularly inversions and fusions across chromosomes, remain more frequent than with alternative scaffolding technologies. We present a novel open-source Hi-C scaffolder that does not require an a priori estimate of chromosome number and minimizes errors by scaffolding with the assistance of an assembly graph. We demonstrate higher accuracy than the state-of-the-art methods across a variety of Hi-C library preparations and input assembly sizes. The Python and C++ code for our method is openly available at https://github.com/machinegun/SALSA.
Subjects
Human Chromosomes/genetics, Human Genome, Genomics/methods, Algorithms, Animals, Computational Biology, Computer Simulation, Nucleic Acid Databases/statistics & numerical data, Genomic Library, Genomics/statistics & numerical data, High-Throughput Nucleotide Sequencing/methods, High-Throughput Nucleotide Sequencing/statistics & numerical data, Humans, DNA Sequence Analysis/methods, DNA Sequence Analysis/statistics & numerical data, Software
ABSTRACT
BACKGROUND: Mammalian X chromosomes are mainly euchromatic with a similar size and structure among species, whereas Y chromosomes are smaller, have undergone substantial evolutionary changes and have accumulated male-specific genes and genes involved in sex determination. The pseudoautosomal region (PAR) is conserved on the X and Y and pairs during meiosis. The structure, evolution and function of mammalian sex chromosomes, particularly the Y chromosome, are still poorly understood because few species have high-quality sex chromosome assemblies. RESULTS: Here we report the first bovine sex chromosome assemblies that include the complete PAR spanning 6.84 Mb and three Y chromosome X-degenerate (X-d) regions. The PAR comprises 31 genes, including genes that are missing from the X chromosome in current cattle, sheep and goat reference genomes. Twenty-nine PAR genes are single-copy genes and two are multi-copy gene families: OBP, which has 3 copies, and BDA20, which has 4 copies. The Y chromosome X-d1, 2a and 2b regions contain 11, 2 and 2 gametologs, respectively. CONCLUSIONS: The ruminant PAR comprises 31 genes and is similar to the PAR of pig and dog but extends further than those of human and horse. Differences in the pseudoautosomal boundaries are consistent with evolutionary divergence times. A Bovidae-specific expansion of members of the lipocalin gene family in the PAR, reported here, may affect immune-modulation and anti-inflammatory responses in ruminants. Comparison of the X-d regions of Y chromosomes across species revealed that five of the X-Y gametologs, which are known to be global regulators of gene activity and candidate sexual dimorphism genes, are conserved.
Subjects
Cattle/genetics, X Chromosome, Y Chromosome, Animals, Mammalian Chromosomes, Dogs, Molecular Evolution, Gene Order, Humans, Male, Whole Genome Sequencing
ABSTRACT
BACKGROUND: Long-read technologies have revolutionized de novo genome assembly by generating contigs orders of magnitude longer than those of short-read assemblies. Although assembly contiguity has increased, these assemblies usually do not reconstruct a full chromosome or chromosome arm, resulting in an unfinished chromosome-level assembly. To increase the contiguity of the assembly to the chromosome level, different strategies are used which exploit long-range contact information between chromosomes in the genome. METHODS: We develop a scalable and computationally efficient scaffolding method that can boost the assembly contiguity to a large extent using genome-wide chromatin interaction data such as Hi-C. RESULTS: We demonstrate an algorithm that uses Hi-C data for longer-range scaffolding of de novo long-read genome assemblies. We tested our methods on the human and goat genome assemblies. We compare our scaffolds with the scaffolds generated by LACHESIS based on various metrics. CONCLUSION: Our new algorithm, SALSA, produces more accurate scaffolds than the existing state-of-the-art method, LACHESIS.
Subjects
Contig Mapping/methods, Algorithms, Animals, Genomics, Goats/genetics, Humans
ABSTRACT
Advances in sequencing technologies have led to the increased use of high throughput sequencing in characterizing the microbial communities associated with our bodies and our environment. Critical to the analysis of the resulting data are sequence assembly algorithms able to reconstruct genes and organisms from complex mixtures. Metagenomic assembly involves new computational challenges due to the specific characteristics of the metagenomic data. In this survey, we focus on major algorithmic approaches for genome and metagenome assembly, and discuss the new challenges and opportunities afforded by this new field. We also review several applications of metagenome assembly in addressing interesting biological problems.
Subjects
Metagenomics/methods, Algorithms, Animals, Humans, Metagenome/genetics, DNA Sequence Analysis/methods
ABSTRACT
Haplotype-resolved or phased genome assembly provides a complete picture of genomes and their complex genetic variations. However, current algorithms for phased assembly either do not generate chromosome-scale phasing or require pedigree information, which limits their application. We present a method named diploid assembly (DipAsm) that uses long, accurate reads and long-range conformation data for single individuals to generate a chromosome-scale phased assembly within 1 day. Applied to four public human genomes, PGP1, HG002, NA12878 and HG00733, DipAsm produced haplotype-resolved assemblies with minimum contig length needed to cover 50% of the known genome (NG50) up to 25 Mb and phased ~99.5% of heterozygous sites at 98-99% accuracy, outperforming other approaches in terms of both contiguity and phasing completeness. We demonstrate the importance of chromosome-scale phased assemblies for the discovery of structural variants (SVs), including thousands of new transposon insertions, and of highly polymorphic and medically important regions such as the human leukocyte antigen (HLA) and killer cell immunoglobulin-like receptor (KIR) regions. DipAsm will facilitate high-quality precision medicine and studies of individual haplotype variation and population diversity.
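The NG50 statistic defined in this abstract (the minimum contig length needed to cover 50% of the known genome) can be illustrated with a minimal Python sketch. The function name `ng50` and the toy contig lengths are hypothetical and are not taken from DipAsm or its evaluation; the sketch only assumes a list of contig lengths and a known genome size in the same units.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly.

    NG50 is the length L such that contigs of length >= L together
    cover at least half of the known genome size. Returns None when
    the whole assembly covers less than half of the genome.
    """
    target = genome_size / 2
    covered = 0
    # Walk contigs from longest to shortest, accumulating coverage.
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return None


# Toy example (illustrative numbers only): genome size 120, half is 60.
# The two longest contigs (40 + 30 = 70) reach the target, so NG50 is 30.
toy_contigs = [10, 40, 20, 30]
print(ng50(toy_contigs, 120))
```

Because NG50 is normalized by the known genome size rather than by the assembly size, it allows assemblies of different total length to be compared on the same footing.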
Subjects
Human Chromosomes, Human Genome, Haplotypes, Algorithms, Heterozygote, Humans, Single Nucleotide Polymorphism
ABSTRACT
BACKGROUND: The Asian tiger mosquito Aedes albopictus is globally expanding and has become the main vector for human arboviruses in Europe. With limited antiviral drugs and vaccines available, vector control is the primary approach to prevent mosquito-borne diseases. A reliable and accurate DNA sequence of the Ae. albopictus genome is essential to develop new approaches that involve genetic manipulation of mosquitoes. RESULTS: We use long-read sequencing methods and modern scaffolding techniques (PacBio, 10X, and Hi-C) to produce AalbF2, a dramatically improved assembly of the Ae. albopictus genome. AalbF2 reveals widespread viral insertions, novel microRNAs and piRNA clusters, the sex-determining locus, and new immunity genes, and enables genome-wide studies of geographically diverse Ae. albopictus populations and analyses of the developmental and stage-dependent network of expression data. Additionally, we build the first physical map for this species with 75% of the assembled genome anchored to the chromosomes. CONCLUSION: The AalbF2 genome assembly represents the most up-to-date collective knowledge of the Ae. albopictus genome. These resources represent a foundation to improve understanding of the adaptation potential and the epidemiological relevance of this species and foster the development of innovative control measures.
Subjects
Aedes/genetics, Arboviruses/genetics, Genome, Mosquito Vectors/genetics, Aedes/immunology, Aedes/virology, Animals, Chromosome Mapping, Chromosomes, Genome Size, Immunity, Insect Vectors, Mosquito Vectors/immunology, Mosquito Vectors/virology, Small Interfering RNA/genetics, Transcriptome
ABSTRACT
Inbred animals were historically chosen for genome analysis to circumvent assembly issues caused by haplotype variation, but this resulted in a composite of the two genomes. Here we report a haplotype-aware scaffolding and polishing pipeline which was used to create haplotype-resolved, chromosome-level genome assemblies of Angus (taurine) and Brahman (indicine) cattle subspecies from contigs generated by the trio binning method. These assemblies reveal structural and copy number variants that differentiate the subspecies, and show that variant detection is sensitive to the specific reference genome chosen. Six genes with immune-related functions have additional copies in the indicine compared with the taurine lineage, and an indicus-specific extra copy of fatty acid desaturase is under positive selection. The haplotyped genomes also enable transcripts to be phased to detect allele-specific expression. This work exemplifies the value of haplotype-resolved genomes to better explore evolutionary and functional variations.
Subjects
Cattle/genetics, Genetic Variation, Genome, Haplotypes/genetics, Alleles, Allelic Imbalance, Animals, Base Sequence, Mammalian Chromosomes/genetics, Female, Genetic Loci, INDEL Mutation/genetics, Male, Molecular Sequence Annotation, Single Nucleotide Polymorphism/genetics, Messenger RNA/genetics, Messenger RNA/metabolism, Nucleic Acid Repetitive Sequences/genetics
ABSTRACT
BACKGROUND: Major advances in selection progress for cattle have been made following the introduction of genomic tools over the past 10-12 years. These tools depend upon the Bos taurus reference genome (UMD3.1.1), which was created using now-outdated technologies and is hindered by a variety of deficiencies and inaccuracies. RESULTS: We present the new reference genome for cattle, ARS-UCD1.2, based on the same animal as the original to facilitate transfer and interpretation of results obtained from the earlier version, but applying a combination of modern technologies in a de novo assembly to increase continuity, accuracy, and completeness. The assembly includes 2.7 Gb and is >250× more continuous than the original assembly, with contig N50 >25 Mb and L50 of 32. We also greatly expanded supporting RNA-based data for annotation that identifies 30,396 total genes (21,039 protein coding). The new reference assembly is accessible in annotated form for public use. CONCLUSIONS: We demonstrate that improved continuity of assembled sequence warrants the adoption of ARS-UCD1.2 as the new cattle reference genome and that increased assembly accuracy will benefit future research on this species.
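The contiguity figures quoted above (contig N50 and L50) follow the commonly used definitions, which the following Python sketch illustrates. The function name `n50_l50` and the toy lengths are hypothetical; they are not the ARS-UCD1.2 data or the authors' code.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths.

    N50 is the length of the contig at which the running total,
    taken from longest to shortest, first reaches half of the total
    assembly size. L50 is the number of contigs needed to get there.
    """
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("contig_lengths must not be empty")


# Toy example (illustrative numbers only): total 100, half is 50.
# 40 alone is not enough; 40 + 30 = 70 reaches it, so N50 = 30, L50 = 2.
n50, l50 = n50_l50([40, 30, 20, 10])
print(n50, l50)
```

A higher N50 together with a lower L50 indicates that fewer, longer contigs account for half of the assembly, which is the sense in which one assembly is described as more continuous than another.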
Subjects
Breeding/standards, Cattle/genetics, Genome, Genomics/standards, Genetic Polymorphism, Animals, Breeding/methods, Genomics/methods, RNA-Seq/methods, RNA-Seq/standards, Reference Standards, DNA Sequence Analysis/methods, DNA Sequence Analysis/standards
ABSTRACT
Reconstructing genomic segments from metagenomics data is a highly complex task. In addition to general challenges, such as repeats and sequencing errors, metagenomic assembly needs to tolerate the uneven depth of coverage among organisms in a community and differences between nearly identical strains. Previous methods have addressed these issues by smoothing genomic variants. We present a variant-aware metagenomic scaffolder called MetaCarvel, which combines new strategies for repeat detection with graph analytics for the discovery of variants. We show that MetaCarvel can accurately reconstruct genomic segments from complex microbial mixtures and correctly identify and characterize several classes of common genomic variants.
Subjects
Algorithms, Genetic Variation, Metagenomics/methods, Acinetobacter/genetics, Base Sequence, Genetic Databases, Feces/microbiology, Humans, Microbiota/genetics, Mutation/genetics, Nucleic Acid Repetitive Sequences/genetics, DNA Sequence Analysis
ABSTRACT
BACKGROUND: Anopheles funestus is one of the 3 most consequential and widespread vectors of human malaria in tropical Africa. However, the lack of a high-quality reference genome has hindered the association of phenotypic traits with their genetic basis in this important mosquito. FINDINGS: Here we present a new high-quality A. funestus reference genome (AfunF3) assembled using 240× coverage of long-read single-molecule sequencing for contigging, combined with 100× coverage of short-read Hi-C data for chromosome scaffolding. The assembled contigs total 446 Mbp of sequence and contain substantial duplication due to alternative alleles present in the sequenced pool of mosquitos from the FUMOZ colony. Using alignment and depth-of-coverage information, these contigs were deduplicated to a 211 Mbp primary assembly, which is closer to the expected haploid genome size of 250 Mbp. This primary assembly consists of 1,053 contigs organized into 3 chromosome-scale scaffolds with an N50 contig size of 632 kbp and an N50 scaffold size of 93.811 Mbp, representing a 100-fold improvement in continuity versus the current reference assembly, AfunF1. CONCLUSION: This highly contiguous and complete A. funestus reference genome assembly will serve as an improved basis for future studies of genomic variation and organization in this important disease vector.
Subjects
Anopheles/genetics, Insect Chromosomes, Whole Genome Sequencing, Animals, Female, Genomics
ABSTRACT
We describe a method that adds long-read sequencing to a mix of technologies used to assemble a highly complex cattle rumen microbial community, and provide a comparison to short read-based methods. Long-read alignments and Hi-C linkage between contigs support the identification of 188 novel virus-host associations and the determination of phage life cycle states in the rumen microbial community. The long-read assembly also identifies 94 antimicrobial resistance genes, compared to only seven alleles in the short-read assembly. We demonstrate novel techniques that work synergistically to improve characterization of biological features in a highly complex rumen microbial community.
Subjects
Microbial Drug Resistance/genetics, Metagenomics/methods, Microbiota/genetics, DNA Sequence Analysis/methods, Viruses/genetics, Animals, Cattle, Clustered Regularly Interspaced Short Palindromic Repeats, Horizontal Gene Transfer, Microbial Genes, Open Reading Frames, Prophages/genetics, Rumen/microbiology, Rumen/virology, Viruses/isolation & purification
ABSTRACT
The Mid-Atlantic Microbiome Meet-up (M3) organization brings together academic, government, and industry groups to share ideas and develop best practices for microbiome research. In January of 2018, M3 held its fourth meeting, which focused on recent advances in biodefense, specifically those relating to infectious disease, and the use of metagenomic methods for pathogen detection. Presentations highlighted the utility of next-generation sequencing technologies for identifying and tracking microbial community members across space and time. However, they also stressed the current limitations of genomic approaches for biodefense, including insufficient sensitivity to detect low-abundance pathogens and the inability to quantify viable organisms. Participants discussed ways in which the community can improve software usability and shared new computational tools for metagenomic processing, assembly, annotation, and visualization. Looking to the future, they identified the need for better bioinformatics toolkits for longitudinal analyses, improved sample processing approaches for characterizing viruses and fungi, and more consistent maintenance of database resources. Finally, they addressed the necessity of improving data standards to incentivize data sharing. Here, we summarize the presentations and discussions from the meeting, identifying the areas where microbiome analyses have improved our ability to detect and manage biological threats and infectious disease, as well as gaps of knowledge in the field that require future funding and focus.