Results 1 - 20 of 47
1.
Nat Commun ; 15(1): 5323, 2024 Jun 22.
Article in English | MEDLINE | ID: mdl-38909053

ABSTRACT

Bioethanol is a sustainable energy alternative and can contribute to global greenhouse-gas emission reductions by over 60%. Its industrial production faces various bottlenecks, including sub-optimal efficiency resulting from bacterial contamination. Broad-spectrum removal of these contaminants results in negligible gains, suggesting that the process is shaped by ecological interactions within the microbial community. Here, we survey the microbiome across all process steps at two biorefineries, over three timepoints in a production season. Leveraging shotgun metagenomics and cultivation-based approaches, we identify beneficial bacteria and find improved outcomes when yeast-to-bacteria ratios increase during fermentation. We provide a microbial gene catalogue that reveals bacteria-specific pathways associated with performance. We also show that Limosilactobacillus fermentum overgrowth lowers production, with one strain reducing yield by ~5% in laboratory fermentations, potentially due to its metabolite profile. Temperature is found to be a major driver of strain-level dynamics. Improved microbial management strategies could unlock environmental and economic gains in this US$60 billion industry, enabling its wider adoption.


Subject(s)
Bacteria , Ethanol , Fermentation , Ethanol/metabolism , Bacteria/metabolism , Bacteria/genetics , Bacteria/classification , Microbiota/physiology , Biofuels , Metagenomics , Industrial Microbiology/methods , Temperature
2.
Nature ; 617(7960): 312-324, 2023 05.
Article in English | MEDLINE | ID: mdl-37165242

ABSTRACT

Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals [1]. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.


Subject(s)
Genome, Human , Genomics , Humans , Diploidy , Genome, Human/genetics , Haplotypes/genetics , Sequence Analysis, DNA , Genomics/standards , Reference Standards , Cohort Studies , Alleles , Genetic Variation
3.
Nat Commun ; 14(1): 1358, 2023 03 13.
Article in English | MEDLINE | ID: mdl-36914638

ABSTRACT

Cancer genomes are highly complex and heterogeneous. Standard short-read sequencing and analytical methods cannot provide the complete and precise base-level structural variant landscape of cancer genomes. In this work, we apply high-resolution, long, accurate HiFi sequencing and long-range Hi-C sequencing to the melanoma cancer cell line COLO829. We also develop an efficient graph-based approach that processes these data types for chromosome-scale, haplotype-resolved reconstruction to characterise the precise structural variant landscape of the cancer genome. Our method produces high-quality, chromosome-level phased scaffolds for three healthy samples and the COLO829 cancer cell line in less than half a day, even in the absence of trio information, outperforming existing state-of-the-art methods. In the COLO829 cancer cell line, we show that our method identifies and characterises precise somatic structural variant calls in important repeat elements that were missed in short-read-based call sets. Our method also finds the precise chromosome-level structural variant landscape (germline and somatic), with 19,956 insertions, 14,846 deletions, 421 duplications, 52 inversions and 498 translocations at base resolution. Our simple pstools approach should facilitate better personalised diagnosis and disease management, including predicting therapeutic responses.


Subject(s)
High-Throughput Nucleotide Sequencing , Melanoma , Humans , Haplotypes/genetics , Genomics/methods , Sequence Analysis, DNA/methods , Chromosomes , Genome, Human
4.
Nature ; 611(7936): 519-531, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36261518

ABSTRACT

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society [1,2]. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals [3,4]. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome [5]. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity [6]. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yields the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.


Subject(s)
Chromosome Mapping , Diploidy , Genome, Human , Genomics , Humans , Chromosome Mapping/standards , Genome, Human/genetics , Haplotypes/genetics , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/standards , Reference Standards , Genomics/methods , Genomics/standards , Chromosomes, Human/genetics , Genetic Variation/genetics
5.
Trends Genet ; 38(11): 1103-1107, 2022 11.
Article in English | MEDLINE | ID: mdl-35817620

ABSTRACT

Complete pangenomics is crucial for understanding genetic diversity and evolution across the tree of life. Chromosome-scale, haplotype-resolved pangenomics allows complex structural variations, long-range interactions, and associated functions to be discerned in species populations. We explore the need for high-resolution pangenomes, discuss computational strategies for their development, and describe applications in biodiversity and human health.


Subject(s)
Chromosomes , Chromosomes/genetics , Haplotypes/genetics , Humans
6.
South Asian J Cancer ; 11(1): 3-8, 2022 Jan.
Article in English | MEDLINE | ID: mdl-35833049

ABSTRACT

Background: Nuclear size, shape, chromatin pattern, and nucleolar size and number have all been reported to change in breast cancer. Aim: The aim of the study was to quantify nuclear changes on malignant breast aspirates using morphometry and to correlate the morphometric parameters with clinicopathologic features such as cytologic grade, tumor size, lymph node status, mitotic index, and histopathologic grade. Materials and Methods: Forty-five cases of breast carcinoma diagnosed on cytology were included in this study. Cytologic grading was performed according to Robinson's cytologic grading system. Nuclear morphometry was done on Papanicolaou-stained smears. One hundred nonoverlapping cells per case were evaluated. Both geometrical and textural parameters were evaluated. Results: Comparison of cytologic grades with most morphometric features (nuclear area, perimeter, shape, long axis, short axis, intensity, total run length, and TI homogeneity) was highly significant on statistical analysis. Correlation with tumor size yielded significant results for nuclear area, perimeter, long and short axes, and intensity (p < 0.05). The study of lymph node status and morphometry showed a highly significant statistical association with all the parameters. Mitotic count was significantly associated with all the geometric parameters and one textural parameter (total run length). On correlation of ductal carcinoma in situ and histopathological Grades 1 to 3 with morphometry, all parameters except long-run emphasis were found to be highly significant (p < 0.001). Conclusion: Morphometry as a technique holds immense promise for prognostication in breast carcinoma.
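The geometric parameters named above (nuclear area, perimeter, shape, long axis) can be computed from a traced nuclear outline. The sketch below is illustrative only: it assumes each nucleus is available as an ordered list of (x, y) boundary points, uses the shoelace formula for area, and assumes "shape" is the common circularity factor 4πA/P², which the abstract does not define. The function names are hypothetical and are not the software used in the study.

```python
import math
from statistics import mean


def polygon_area(outline):
    """Shoelace formula: area enclosed by an ordered list of (x, y) vertices."""
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(outline, outline[1:] + outline[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def polygon_perimeter(outline):
    """Sum of edge lengths around the closed outline."""
    return sum(math.dist(p, q) for p, q in zip(outline, outline[1:] + outline[:1]))


def circularity(outline):
    """Assumed shape factor 4*pi*A/P**2: 1.0 for a circle, lower for irregular nuclei."""
    p = polygon_perimeter(outline)
    return 4 * math.pi * polygon_area(outline) / (p * p)


def long_axis(outline):
    """Longest vertex-to-vertex distance (maximum Feret diameter) of the outline."""
    return max(math.dist(p, q) for i, p in enumerate(outline) for q in outline[i + 1:])


def case_means(nuclei):
    """Average each geometric feature over all traced nuclei of one case."""
    return {
        "area": mean(polygon_area(n) for n in nuclei),
        "perimeter": mean(polygon_perimeter(n) for n in nuclei),
        "circularity": mean(circularity(n) for n in nuclei),
        "long_axis": mean(long_axis(n) for n in nuclei),
    }
```

In a study design like the one described, `case_means` would be applied to the 100 nonoverlapping nuclei traced per case, and the per-case means then compared across grades. Textural parameters such as run-length features require the pixel intensities and are not covered here.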

7.
Genome Res ; 32(4): 643-655, 2022 04.
Article in English | MEDLINE | ID: mdl-35177558

ABSTRACT

The occurrence and formation of genomic structural variants (SVs) are known to be influenced by the 3D chromatin architecture, but the extent and magnitude of this influence have been challenging to study. Here, we apply Hi-C to study chromatin organization before and after induction of chromothripsis in human cells. We use Hi-C to manually assemble the derivative chromosomes following the occurrence of massive complex rearrangements, which allows us to study the sources of SV formation and their consequences on gene regulation. We observe an action-reaction interplay whereby the 3D chromatin architecture directly impacts the location and formation of SVs. In turn, the SVs reshape the chromatin organization to alter the local topologies, replication timing, and gene regulation in cis. We show that SVs have a strong tendency to occur between similar chromatin compartments and replication timing regions. Moreover, we find that SVs frequently occur at 3D loop anchors, that SVs can cause a switch in chromatin compartments and replication timing, and that this is a major source of SV-mediated effects on nearby gene expression. Finally, we provide evidence for a general mechanistic bias of the 3D chromatin on SV occurrence using data from more than 2,700 patient-derived cancer genomes.


Subject(s)
Chromothripsis , Genome , Chromatin/genetics , Chromosomes , Genome, Human , Genomic Structural Variation , Humans
9.
South Asian J Cancer ; 10(2): 64-68, 2021 Apr.
Article in English | MEDLINE | ID: mdl-34568216

ABSTRACT

Objectives: The primary objective of this study was to correlate nuclear morphometric parameters with clinicopathologic features such as cytologic grade, tumor size, lymph node status, mitotic index, and histopathologic grade. The secondary objective was to quantify nuclear changes on malignant breast aspirates using morphometry. Material and Methods: Forty-five cases of breast carcinoma diagnosed on cytology were included in this study. These were graded into cytologic grades 1, 2, and 3 according to Robinson's cytologic grading system. Nuclear morphometry was done in all cases on Papanicolaou-stained smears. Clinicopathologic parameters including cytological grade, tumor size, lymph node status, mitotic count, and histological grade were correlated with nuclear morphometric parameters, namely area, perimeter, shape, long axis, short axis, intensity, long-run emphasis, total run length, and T1 homogeneity. Results: There were 9 cases in cytologic grade 1, 26 in grade 2, and 10 in grade 3. Histopathology showed 42 cases of infiltrating duct carcinoma, not otherwise specified (IDC, NOS) and 3 cases (6.7%) of ductal carcinoma in situ (DCIS). IDC (NOS) included 6, 27, and 9 cases in grades 1, 2, and 3, respectively. The majority of our cases had a tumor size less than 5 cm (n = 38, 84.4%) and had positive nodes (n = 30, 66.7%). Correlation of cytologic and histopathologic grades (including DCIS) with all morphometric features except long-run emphasis was statistically significant. Correlation of morphometry with tumor size yielded significant results for nuclear area, perimeter, long and short axes, and intensity (p < 0.05). Study of lymph node status (positive/negative) versus morphometry showed a highly significant statistical association with all the geometric as well as textural parameters. Mitotic count was significantly associated with all the geometric parameters and one textural parameter (total run length).
Statistics: Continuous variables were presented as mean ± standard deviation and compared using the two-tailed independent-sample t-test and one-way analysis of variance. Tests were performed at a significance level of 0.05. Conclusion: Morphometry is an objective technique which holds immense promise for prognostication in breast carcinoma.
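The two tests named in the Statistics section reduce to simple formulas. The sketch below computes the test statistics only, assuming the pooled-variance (Student's) form of the independent-sample t-test, since the abstract does not say whether equal variances were assumed. Converting the statistics to two-tailed p-values needs the t and F distributions (for example `scipy.stats.ttest_ind` and `scipy.stats.f_oneway`); the helper names here are hypothetical.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Pooled-variance independent two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance from the two sample variances.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


def one_way_anova_f(groups):
    """One-way ANOVA F statistic with (between, within) degrees of freedom."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n_total = len(groups), len(all_values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    f = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    return f, (k - 1, n_total - k)
```

In a design like this one, `student_t` would compare a morphometric feature between two groups (for example node-positive and node-negative cases), while `one_way_anova_f` would compare it across cytologic grades 1, 2 and 3. The statistic is then checked against the 0.05 significance level.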

10.
Genome Biol ; 22(1): 101, 2021 04 12.
Article in English | MEDLINE | ID: mdl-33845884

ABSTRACT

High-quality chromosome-scale haplotype sequences of diploid genomes, polyploid genomes, and metagenomes provide important insights into genetic variation associated with disease and biodiversity. However, whole-genome short-read sequencing does not directly yield haplotype information spanning whole chromosomes. Computational assembly of shorter haplotype fragments is required for haplotype reconstruction, which can be challenging owing to limited fragment lengths and high haplotype and repeat variability across genomes. Recent advancements in long-read and chromosome-scale sequencing technologies, alongside computational innovations, are improving the reconstruction of haplotypes at the level of whole chromosomes. Here, we review recent methodological progress and discuss perspectives in these areas.


Subject(s)
Chromosomes , Computational Biology/methods , Genomics/methods , Haplotypes , Diploidy , High-Throughput Nucleotide Sequencing/methods , Humans , Metagenome , Metagenomics/methods , Polyploidy , Sequence Analysis, DNA/methods
11.
Nat Biotechnol ; 39(3): 309-312, 2021 03.
Article in English | MEDLINE | ID: mdl-33288905

ABSTRACT

Haplotype-resolved or phased genome assembly provides a complete picture of genomes and their complex genetic variations. However, current algorithms for phased assembly either do not generate chromosome-scale phasing or require pedigree information, which limits their application. We present a method named diploid assembly (DipAsm) that uses long, accurate reads and long-range conformation data for single individuals to generate a chromosome-scale phased assembly within 1 day. Applied to four public human genomes, PGP1, HG002, NA12878 and HG00733, DipAsm produced haplotype-resolved assemblies with minimum contig length needed to cover 50% of the known genome (NG50) up to 25 Mb and phased ~99.5% of heterozygous sites at 98-99% accuracy, outperforming other approaches in terms of both contiguity and phasing completeness. We demonstrate the importance of chromosome-scale phased assemblies for the discovery of structural variants (SVs), including thousands of new transposon insertions, and of highly polymorphic and medically important regions such as the human leukocyte antigen (HLA) and killer cell immunoglobulin-like receptor (KIR) regions. DipAsm will facilitate high-quality precision medicine and studies of individual haplotype variation and population diversity.
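The contiguity metric quoted above, NG50, is defined as the minimum contig length needed to cover 50% of the known genome. A minimal sketch of that definition: sort contigs from longest to shortest, accumulate their lengths, and report the length of the contig at which the running total first reaches the threshold. The function name `ngx` and its parameters are illustrative, not part of DipAsm.

```python
def ngx(contig_lengths, genome_size, x=50):
    """NGx: length of the contig at which sorted contigs first cover x% of genome_size.

    Returns None if the assembly never reaches the threshold (NGx undefined).
    """
    threshold = genome_size * x / 100
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= threshold:
            return length
    return None
```

Using the known genome size rather than the assembly's own total length is what distinguishes NG50 from N50; passing `sum(contig_lengths)` as `genome_size` gives N50 instead.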


Subject(s)
Chromosomes, Human , Genome, Human , Haplotypes , Algorithms , Heterozygote , Humans , Polymorphism, Single Nucleotide
12.
Nat Commun ; 11(1): 4794, 2020 09 22.
Article in English | MEDLINE | ID: mdl-32963235

ABSTRACT

Most human genomes are characterized by aligning individual reads to the reference genome, but accurate long reads and linked reads now enable us to construct accurate, phased de novo assemblies. We focus on a medically important, highly variable, 5 million base-pair (bp) region where diploid assembly is particularly useful - the Major Histocompatibility Complex (MHC). Here, we develop a human genome benchmark derived from a diploid assembly for the openly-consented Genome in a Bottle sample HG002. We assemble a single contig for each haplotype, align them to the reference, call phased small and structural variants, and define a small variant benchmark for the MHC, covering 94% of the MHC and 22,368 variants smaller than 50 bp, 49% more variants than a mapping-based benchmark. This benchmark reliably identifies errors in mapping-based callsets, and enables performance assessment in regions with much denser, complex variation than regions covered by previous benchmarks.


Subject(s)
Diploidy , Major Histocompatibility Complex/genetics , Benchmarking , Cell Line , Genetic Variation , Genome, Human , Haplotypes , Humans
14.
Nat Biotechnol ; 38(11): 1347-1355, 2020 11.
Article in English | MEDLINE | ID: mdl-32541955

ABSTRACT

New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution and comprehensiveness. To help translate these methods to routine research and clinical practice, we developed a sequence-resolved benchmark set for identification of both false-negative and false-positive germline large insertions and deletions. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle Consortium integrated 19 sequence-resolved variant calling methods from diverse technologies. The final benchmark set contains 12,745 isolated, sequence-resolved insertion (7,281) and deletion (5,464) calls ≥50 base pairs (bp). The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.51 Gbp and 5,262 insertions and 4,095 deletions supported by ≥1 diploid assembly. We demonstrate that the benchmark set reliably identifies false negatives and false positives in high-quality SV callsets from short-, linked- and long-read sequencing and optical mapping.
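The benchmark logic described above can be sketched as follows: calls that match a benchmark variant are true positives, unmatched benchmark variants are false negatives, and unmatched calls count as false positives only if they fall inside the benchmark regions. This is a simplified illustration under stated assumptions: the benchmark list is taken to be already restricted to the benchmark regions, matching uses a hypothetical position window and size-similarity threshold, and dedicated comparison tools (for example Truvari) implement considerably more careful matching.

```python
from typing import NamedTuple


class SV(NamedTuple):
    chrom: str
    pos: int
    svtype: str  # "INS" or "DEL"
    length: int


def in_regions(sv, regions):
    """regions: hypothetical dict chrom -> list of half-open (start, end) intervals."""
    return any(start <= sv.pos < end for start, end in regions.get(sv.chrom, []))


def compare(benchmark, calls, regions, max_dist=500, min_size_ratio=0.7, min_len=50):
    """Greedy one-to-one matching of calls against benchmark SVs (illustrative thresholds)."""
    calls = [c for c in calls if c.length >= min_len]  # benchmark covers SVs >= 50 bp
    used = set()
    tp = 0
    for truth in benchmark:
        for i, call in enumerate(calls):
            if i in used:
                continue
            same_kind = call.chrom == truth.chrom and call.svtype == truth.svtype
            close = abs(call.pos - truth.pos) <= max_dist
            similar = min(call.length, truth.length) / max(call.length, truth.length) >= min_size_ratio
            if same_kind and close and similar:
                used.add(i)
                tp += 1
                break
    fn = len(benchmark) - tp
    # Extra calls are only putative false positives inside the benchmark regions.
    fp = sum(1 for i, c in enumerate(calls) if i not in used and in_regions(c, regions))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / len(benchmark) if benchmark else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}
```

Restricting false positives to the benchmark regions mirrors the statement that extra calls are putative false positives only within those regions; calls elsewhere cannot be judged against the benchmark.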


Subject(s)
Germ-Line Mutation/genetics , INDEL Mutation/genetics , Diploidy , Genomic Structural Variation , Humans , Molecular Sequence Annotation , Sequence Analysis, DNA
15.
Annu Rev Genomics Hum Genet ; 21: 139-162, 2020 08 31.
Article in English | MEDLINE | ID: mdl-32453966

ABSTRACT

Low-cost whole-genome assembly has enabled the collection of haplotype-resolved pangenomes for numerous organisms. In turn, this technological change is encouraging the development of methods that can precisely address the sequence and variation described in large collections of related genomes. These approaches often use graphical models of the pangenome to support algorithms for sequence alignment, visualization, functional genomics, and association studies. The additional information provided to these methods by the pangenome allows them to achieve superior performance on a variety of bioinformatic tasks, including read alignment, variant calling, and genotyping. Pangenome graphs stand to become a ubiquitous tool in genomics. Although it is unclear whether they will replace linear reference genomes, their ability to harmoniously relate multiple sequence and coordinate systems will make them useful irrespective of which pangenomic models become most common in the future.


Subject(s)
Algorithms , Computational Biology/methods , Computer Graphics , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, DNA
16.
Nucleic Acids Res ; 48(9): 5183-5195, 2020 05 21.
Article in English | MEDLINE | ID: mdl-32315033

ABSTRACT

To extend the frontier of genome editing and enable editing of repetitive elements of mammalian genomes, we made use of a set of dead-Cas9 base editor (dBE) variants that allow editing at tens of thousands of loci per cell by overcoming the cell death associated with DNA double-strand breaks and single-strand breaks. We used a set of gRNAs targeting repetitive elements, ranging in target copy number from about 32 to 161 000 per cell. dBEs enabled survival after large-scale base editing, allowing targeted mutations at up to ∼13 200 and ∼12 200 loci in 293T cells and human induced pluripotent stem cells (hiPSCs), respectively, three orders of magnitude greater than previously recorded. These dBEs can overcome current on-target mutation and toxicity barriers that prevent cell survival after large-scale genome engineering.


Subject(s)
Gene Editing/methods , Retroelements , CRISPR-Associated Proteins , CRISPR-Cas Systems , Cell Survival , Endodeoxyribonucleases , HEK293 Cells , Humans , Induced Pluripotent Stem Cells , Mutation , RNA
17.
Bioinformatics ; 36(8): 2385-2392, 2020 04 15.
Article in English | MEDLINE | ID: mdl-31860070

ABSTRACT

MOTIVATION: Reconstructing high-quality haplotype-resolved assemblies for related individuals has important applications in Mendelian diseases and population genomics. Through major genomics sequencing efforts such as the Personal Genome Project, the Vertebrate Genome Project (VGP) and the Genome in a Bottle project (GIAB), a variety of sequencing datasets from trios of diploid genomes are becoming available. Current trio assembly approaches are not designed to incorporate long- and short-read data from mother-father-child trios, and therefore require relatively high coverages of costly long-read data to produce high-quality assemblies. Thus, building a trio-aware assembler capable of producing accurate and chromosomal-scale diploid genomes of all individuals in a pedigree, while remaining cost-effective to sequence, is a pressing need of the genomics community. RESULTS: We present a novel pedigree sequence graph-based approach to diploid assembly using accurate Illumina data and long-read Pacific Biosciences (PacBio) data from all related individuals, thereby generalizing our previous work on single individuals. We demonstrate the effectiveness of our pedigree approach on a simulated trio of pseudo-diploid yeast genomes with different heterozygosity rates, and on real data from a human chromosome. We show that we require as little as 30× coverage Illumina data and 15× PacBio data from each individual in a trio to generate chromosomal-scale phased assemblies. Additionally, we show that we can detect and phase variants from the generated phased assemblies. AVAILABILITY AND IMPLEMENTATION: https://github.com/shilpagarg/WHdenovo.
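The reason trio data help phasing can be shown with the basic Mendelian transmission rule at single biallelic sites: if one of the child's alleles is absent from a parent, it must have come from the other parent. The sketch below is a swapped-in, much simpler technique than the pedigree sequence graph approach described above, and is not part of WHdenovo. It assumes genotypes are given as pairs of allele indices (0 = reference, 1 = alternative) and ignores genotyping errors and de novo mutations.

```python
from typing import List, Optional, Tuple

Genotype = Tuple[int, int]


def phase_child(child: Genotype, mother: Genotype, father: Genotype) -> Optional[Tuple[int, int]]:
    """Return (maternal, paternal) alleles of the child, or None if undetermined."""
    a, b = child
    # Keep every ordering of the child's alleles that is consistent with transmission.
    consistent = {(m, p) for m, p in ((a, b), (b, a)) if m in mother and p in father}
    if len(consistent) == 1:
        return consistent.pop()
    return None  # ambiguous (e.g. all three heterozygous) or a Mendelian inconsistency


def phase_sites(sites: List[Tuple[Genotype, Genotype, Genotype]]):
    """Build maternal and paternal haplotypes site by site; None marks unphased sites."""
    maternal, paternal = [], []
    for child, mother, father in sites:
        phased = phase_child(child, mother, father)
        maternal.append(phased[0] if phased else None)
        paternal.append(phased[1] if phased else None)
    return maternal, paternal
```

Sites where all three individuals are heterozygous remain unphased under this rule, which is one reason read-based and graph-based methods combine pedigree constraints with sequencing data rather than relying on genotypes alone.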


Subject(s)
Genome , Genomics , Haplotypes , High-Throughput Nucleotide Sequencing , Humans , Pedigree , Sequence Analysis, DNA
18.
F1000Res ; 8: 1751, 2019.
Article in English | MEDLINE | ID: mdl-34386196

ABSTRACT

In March 2019, 45 scientists and software engineers from around the world converged at the University of California, Santa Cruz for the first pangenomics codeathon. The purpose of the meeting was to propose technical specifications and standards for a usable human pangenome as well as to build relevant tools for genome graph infrastructures. During the meeting, the group held several intense and productive discussions covering a diverse set of topics, including advantages of graph genomes over a linear reference representation, design of new methods that can leverage graph-based data structures, and novel visualization and annotation approaches for pangenomes. Additionally, the participants self-organized into teams that worked intensely over a three-day period to build a set of pipelines and tools for specific pangenomic applications. A summary of the questions raised and the tools developed is reported in this manuscript.

20.
Nat Biotechnol ; 36(9): 875-879, 2018 10.
Article in English | MEDLINE | ID: mdl-30125266

ABSTRACT

Reference genomes guide our interpretation of DNA sequence data. However, conventional linear references represent only one version of each locus, ignoring variation in the population. Poor representation of an individual's genome sequence impacts read mapping and introduces bias. Variation graphs are bidirected DNA sequence graphs that compactly represent genetic variation across a population, including large-scale structural variation such as inversions and duplications. Previous graph genome software implementations have been limited by scalability or topological constraints. Here we present vg, a toolkit of computational methods for creating, manipulating, and using these structures as references at the scale of the human genome. vg provides an efficient approach to mapping reads onto arbitrary variation graphs using generalized compressed suffix arrays, with improved accuracy over alignment to a linear reference, and effectively removing reference bias. These capabilities make using variation graphs as references for DNA sequencing practical at a gigabase scale, or at the topological complexity of de novo assemblies.
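A bidirected sequence graph, as described above, stores each node's sequence once and lets a path visit a node in either orientation, which is how inversions are represented without duplicating sequence. The sketch below is a toy model under stated assumptions (orientation flags in the spirit of GFA links); it is not vg's API or internal data model, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def flip(handle):
    """A handle is (node_id, is_reverse); flipping switches the traversal strand."""
    node_id, is_reverse = handle
    return (node_id, not is_reverse)


@dataclass
class BidirectedGraph:
    nodes: dict = field(default_factory=dict)  # node_id -> forward-strand sequence
    edges: set = field(default_factory=set)    # (handle, handle) oriented adjacencies

    def add_node(self, node_id, seq):
        self.nodes[node_id] = seq

    def add_edge(self, left, right):
        # Walking left -> right on one strand equals walking flip(right) -> flip(left).
        self.edges.add((left, right))
        self.edges.add((flip(right), flip(left)))

    def node_seq(self, handle):
        node_id, is_reverse = handle
        seq = self.nodes[node_id]
        return reverse_complement(seq) if is_reverse else seq

    def spell(self, path):
        """Concatenate oriented node sequences along a path, checking each edge exists."""
        for left, right in zip(path, path[1:]):
            if (left, right) not in self.edges:
                raise ValueError(f"no edge {left} -> {right}")
        return "".join(self.node_seq(h) for h in path)


# Toy graph: a SNP bubble (nodes 2/3) plus an edge that enters node 4 reversed (inversion).
F, R = False, True
g = BidirectedGraph()
for node_id, seq in [(1, "AAC"), (2, "G"), (3, "T"), (4, "CTGA")]:
    g.add_node(node_id, seq)
g.add_edge((1, F), (2, F))
g.add_edge((1, F), (3, F))
g.add_edge((2, F), (4, F))
g.add_edge((3, F), (4, F))
g.add_edge((2, F), (4, R))
```

Here the path through nodes 1, 2 and 4 spells the reference allele, the path through node 3 spells the SNP allele, and the path that traverses node 4 reversed spells the inverted haplotype, all without storing any sequence twice.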


Subject(s)
Genetic Variation , Computer Simulation , DNA/genetics , Humans