Results 1 - 20 of 38
1.
Nat Immunol ; 23(8): 1208-1221, 2022 08.
Article in English | MEDLINE | ID: mdl-35879451

ABSTRACT

T cell antigen-receptor (TCR) signaling controls the development, activation and survival of T cells through several layers and numerous mechanisms of gene regulation. N6-methyladenosine (m6A) is the most prevalent messenger RNA modification affecting splicing, translation and stability of transcripts. In the present study, we describe the Wtap protein as essential for m6A methyltransferase complex function and reveal its crucial role in TCR signaling in mouse T cells. Wtap and m6A methyltransferase functions were required for the differentiation of thymocytes, control of activation-induced death of peripheral T cells and prevention of colitis by enabling gut RORγt+ regulatory T cell function. Transcriptome and epitranscriptomic analyses reveal that m6A modification destabilizes Orai1 and Ripk1 mRNAs. Lack of post-transcriptional repression of the encoded proteins correlated with increased store-operated calcium entry activity and diminished survival of T cells with conditional genetic inactivation of Wtap. These findings uncover how m6A modification affects TCR signal transduction and determines activation and survival of T cells.


Subject(s)
Cell Cycle Proteins , Methyltransferases , Adenosine/analogs & derivatives , Animals , Cell Cycle Proteins/metabolism , Methylation , Methyltransferases/genetics , Mice , RNA Splicing Factors/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , Signal Transduction
2.
Cell ; 177(3): 654-668.e15, 2019 04 18.
Article in English | MEDLINE | ID: mdl-30929900

ABSTRACT

New neurons arise from quiescent adult neural progenitors throughout life in specific regions of the mammalian brain. Little is known about the embryonic origin and establishment of adult neural progenitors. Here, we show that Hopx+ precursors in the mouse dentate neuroepithelium at embryonic day 11.5 give rise to proliferative Hopx+ neural progenitors in the primitive dentate region, and they, in turn, generate granule neurons, but not other neurons, throughout development and then transition into Hopx+ quiescent radial glial-like neural progenitors during an early postnatal period. RNA-seq and ATAC-seq analyses of Hopx+ embryonic, early postnatal, and adult dentate neural progenitors further reveal common molecular and epigenetic signatures and developmental dynamics. Together, our findings support a "continuous" model wherein a common neural progenitor population exclusively contributes to dentate neurogenesis throughout development and adulthood. Adult dentate neurogenesis may therefore represent a lifelong extension of development that maintains heightened plasticity in the mammalian hippocampus.


Subject(s)
Embryonic Stem Cells/metabolism , Neurogenesis , Animals , Cell Differentiation , Dentate Gyrus/metabolism , Embryo, Mammalian/metabolism , Embryonic Stem Cells/cytology , Female , Gene Expression Regulation, Developmental , Hippocampus/metabolism , Homeodomain Proteins/genetics , Homeodomain Proteins/metabolism , Male , Mice , Mice, Inbred C57BL , Mice, Transgenic , Neural Stem Cells/cytology , Neural Stem Cells/metabolism
3.
Cell ; 171(4): 877-889.e17, 2017 Nov 02.
Article in English | MEDLINE | ID: mdl-28965759

ABSTRACT

N6-methyladenosine (m6A), installed by the Mettl3/Mettl14 methyltransferase complex, is the most prevalent internal mRNA modification. Whether m6A regulates mammalian brain development is unknown. Here, we show that m6A depletion by Mettl14 knockout in embryonic mouse brains prolongs the cell cycle of radial glia cells and extends cortical neurogenesis into postnatal stages. m6A depletion by Mettl3 knockdown also leads to a prolonged cell cycle and maintenance of radial glia cells. m6A sequencing of embryonic mouse cortex reveals enrichment of mRNAs related to transcription factors, neurogenesis, the cell cycle, and neuronal differentiation, and m6A tagging promotes their decay. Further analysis uncovers previously unappreciated transcriptional prepatterning in cortical neural stem cells. m6A signaling also regulates human cortical neurogenesis in forebrain organoids. Comparison of m6A-mRNA landscapes between mouse and human cortical neurogenesis reveals enrichment of human-specific m6A tagging of transcripts related to brain-disorder risk genes. Our study identifies an epitranscriptomic mechanism in heightened transcriptional coordination during mammalian cortical neurogenesis.


Subject(s)
Neurogenesis , Prosencephalon/embryology , RNA Processing, Post-Transcriptional , RNA, Messenger/metabolism , Animals , Cell Cycle , Gene Expression Regulation , Gene Expression Regulation, Developmental , Gene Knockdown Techniques , Humans , Methylation , Methyltransferases/genetics , Methyltransferases/metabolism , Mice , Mice, Knockout , Neural Stem Cells/metabolism , Organoids/metabolism , Prosencephalon/cytology , Prosencephalon/metabolism , RNA Stability
4.
Genome Res ; 34(4): 572-589, 2024 May 15.
Article in English | MEDLINE | ID: mdl-38719471

ABSTRACT

Dormancy is a key feature of stem cell function in adult tissues as well as in embryonic cells in the context of diapause. The establishment of dormancy is an active process that involves extensive transcriptional, epigenetic, and metabolic rewiring. How these processes are coordinated to successfully transition cells to the resting dormant state remains unclear. Here we show that microRNA activity, which is otherwise dispensable for preimplantation development, is essential for the adaptation of early mouse embryos to the dormant state of diapause. In particular, the pluripotent epiblast depends on miRNA activity, the absence of which results in the loss of pluripotent cells. Through the integration of high-sensitivity small RNA expression profiling of individual embryos and protein expression of miRNA targets with public data of protein-protein interactions, we constructed the miRNA-mediated regulatory network of mouse early embryos specific to diapause. We find that individual miRNAs contribute to the combinatorial regulation by the network, and the perturbation of the network compromises embryo survival in diapause. We further identified the nutrient-sensitive transcription factor TFE3 as an upstream regulator of diapause-specific miRNAs, linking cytoplasmic MTOR activity to nuclear miRNA biogenesis. Our results place miRNAs as a critical regulatory layer for the molecular rewiring of early embryos to establish dormancy.


Subject(s)
Cell Proliferation , MicroRNAs , Pluripotent Stem Cells , Animals , MicroRNAs/genetics , MicroRNAs/metabolism , Mice , Pluripotent Stem Cells/metabolism , Pluripotent Stem Cells/cytology , Gene Expression Regulation, Developmental , Gene Regulatory Networks , Embryonic Development/genetics , Germ Layers/metabolism , Germ Layers/cytology , Blastocyst/metabolism , Blastocyst/cytology , Female
5.
Nat Methods ; 21(3): 401-405, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38317008

ABSTRACT

Unique molecular identifiers are random oligonucleotide sequences that remove PCR amplification biases. However, the impact that PCR-associated sequencing errors have on the accuracy of generating absolute counts of RNA molecules is underappreciated. We show that PCR errors are a source of inaccuracy in both bulk and single-cell sequencing data, and that synthesizing unique molecular identifiers using homotrimeric nucleotide blocks provides an error-correcting solution that allows absolute counting of sequenced molecules.
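One way to see how homotrimeric blocks allow error correction: each UMI base is written three times, so a single PCR or sequencing error inside a block can be outvoted by the two intact copies. The sketch below decodes a homotrimer UMI by per-block majority vote. The function name `decode_homotrimer_umi` and the choice to emit `N` for a block with no majority are illustrative assumptions, not necessarily the decoding rule used in the paper.

```python
from collections import Counter


def decode_homotrimer_umi(seq: str) -> tuple[str, int]:
    """Collapse a homotrimer-encoded UMI (e.g. 'AAACCCGGG') to its core sequence.

    Illustrative sketch: each 3-base block is decoded by majority vote, and a
    block whose three bases all differ is reported as 'N'. Returns the decoded
    UMI and the number of blocks that were not a perfect homotrimer.
    """
    if len(seq) % 3 != 0:
        raise ValueError("homotrimer UMI length must be a multiple of 3")
    decoded = []
    corrected = 0
    for i in range(0, len(seq), 3):
        block = seq[i:i + 3]
        base, count = Counter(block).most_common(1)[0]
        if count == 3:
            decoded.append(base)      # intact block
        elif count == 2:
            decoded.append(base)      # one error, outvoted by the other two copies
            corrected += 1
        else:
            decoded.append("N")       # no majority: block cannot be resolved
            corrected += 1
    return "".join(decoded), corrected
```

For example, `decode_homotrimer_umi("AAACCCGTG")` returns `("ACG", 1)`: the third block `GTG` carries one error, which the majority vote resolves to `G`.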


Subject(s)
High-Throughput Nucleotide Sequencing , Nucleotides , Sequence Analysis, RNA , Oligonucleotides/genetics , Polymerase Chain Reaction
6.
Genome Res ; 31(4): 677-688, 2021 04.
Article in English | MEDLINE | ID: mdl-33627473

ABSTRACT

A fundamental task in single-cell RNA-seq (scRNA-seq) analysis is the identification of transcriptionally distinct groups of cells. Numerous methods have been proposed for this problem, with a recent focus on methods for the cluster analysis of ultralarge scRNA-seq data sets produced by droplet-based sequencing technologies. Most existing methods rely on a sampling step to bridge the gap between algorithm scalability and volume of the data. Ignoring large parts of the data, however, often yields inaccurate groupings of cells and risks overlooking rare cell types. We propose a method, Specter, that adopts and extends recent algorithmic advances in (fast) spectral clustering. In contrast to methods that cluster a (random) subsample of the data, we adopt the idea of landmarks that are used to create a sparse representation of the full data from which a spectral embedding can then be computed in linear time. We exploit Specter's speed in a cluster ensemble scheme that achieves a substantial improvement in accuracy over existing methods and identifies rare cell types with high sensitivity. Its linear-time complexity allows Specter to scale to millions of cells and leads to fast computation times in practice. Furthermore, on CITE-seq data that simultaneously measures gene and protein marker expression, we show that Specter is able to use multimodal omics measurements to resolve subtle transcriptomic differences between subpopulations of cells.
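The landmark idea can be sketched as follows, assuming numpy: each cell is connected only to its few nearest landmarks, giving an n × m affinity matrix with m ≪ n, and the spectral embedding is read off the left singular vectors of that normalized matrix, so the cost grows linearly in the number of cells. This follows the generic landmark-based spectral clustering recipe; the random landmark choice, the Gaussian kernel with a median bandwidth, and the name `landmark_spectral_embedding` are assumptions for illustration, and Specter's own landmark selection and ensemble step are not reproduced.

```python
import numpy as np


def landmark_spectral_embedding(X, n_landmarks, n_components, n_neighbors=3, seed=0):
    """Approximate spectral embedding of the rows of X using m random landmarks.

    Cost is O(n * m) for the affinities plus O(n * m**2) for the thin SVD,
    i.e. linear in the number of cells n for a fixed number of landmarks m.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Assumption: landmarks are drawn uniformly at random from the data.
    landmarks = X[rng.choice(n, size=n_landmarks, replace=False)]

    # Squared Euclidean distance from every cell to every landmark: shape (n, m).
    d2 = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=2)

    # Keep only each cell's r nearest landmarks, giving a sparse affinity matrix.
    rows = np.arange(n)[:, None]
    nearest = np.argsort(d2, axis=1)[:, :n_neighbors]
    bandwidth = np.median(d2[rows, nearest]) + 1e-12
    Z = np.zeros_like(d2)
    Z[rows, nearest] = np.exp(-d2[rows, nearest] / (2.0 * bandwidth))
    Z /= Z.sum(axis=1, keepdims=True)  # each cell's weights sum to 1

    # Scale each landmark column by the inverse square root of its total weight.
    Z_hat = Z / np.sqrt(Z.sum(axis=0) + 1e-12)

    # The left singular vectors of Z_hat are the eigenvectors of Z_hat @ Z_hat.T,
    # an n x n cell-cell affinity that is never formed explicitly.
    U, _, _ = np.linalg.svd(Z_hat, full_matrices=False)
    return U[:, :n_components]
```

The rows of the returned matrix would then be clustered, for example with k-means, to obtain cell groups.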


Subject(s)
Cluster Analysis , Gene Expression Profiling , RNA-Seq , Single-Cell Analysis , Algorithms
7.
Bioinformatics ; 39(7)2023 07 01.
Article in English | MEDLINE | ID: mdl-37432342

ABSTRACT

MOTIVATION: Alternative splicing (AS) of introns from pre-mRNA produces diverse sets of transcripts across cell types and tissues, but is also dysregulated in many diseases. Alignment-free computational methods have greatly accelerated the quantification of mRNA transcripts from short RNA-seq reads, but they inherently rely on a catalog of known transcripts and might miss novel, disease-specific splicing events. By contrast, alignment of reads to the genome can effectively identify novel exonic segments and introns. Event-based methods then count how many reads align to predefined features. However, an alignment is more expensive to compute and constitutes a bottleneck in many AS analysis methods. RESULTS: Here, we propose fortuna, a method that guesses novel combinations of annotated splice sites to create transcript fragments. It then pseudoaligns reads to fragments using kallisto and efficiently derives counts of the most elementary splicing units from kallisto's equivalence classes. These counts can be directly used for AS analysis or summarized to larger units as used by other widely applied methods. In experiments on synthetic and real data, fortuna was around 7× faster than traditional align and count approaches, and was able to analyze almost 300 million reads in just 15 min when using four threads. It mapped reads containing mismatches more accurately across novel junctions and found more reads supporting aberrant splicing events in patients with autism spectrum disorder than existing methods. We further used fortuna to identify novel, tissue-specific splicing events in Drosophila. AVAILABILITY AND IMPLEMENTATION: fortuna source code is available at https://github.com/canzarlab/fortuna.
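Pseudoalignment, which fortuna delegates to kallisto, can be illustrated with a toy k-mer index: a read is assigned the set of fragments compatible with all of its k-mers, and reads are then tallied per distinct set (equivalence class). The sketch below is a simplification written for illustration only (kallisto itself works on a colored de Bruijn graph), and the fragment names and helper functions are hypothetical.

```python
from collections import Counter, defaultdict


def build_kmer_index(fragments: dict[str, str], k: int) -> dict[str, frozenset[str]]:
    """Map every k-mer to the set of fragment names that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, seq in fragments.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return {kmer: frozenset(names) for kmer, names in index.items()}


def pseudoalign(read: str, index: dict[str, frozenset[str]], k: int) -> frozenset[str]:
    """Intersect the fragment sets of the read's k-mers (toy pseudoalignment)."""
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if hits is None:
            continue  # simplification: ignore k-mers not present in any fragment
        compatible = hits if compatible is None else compatible & hits
    return compatible or frozenset()


def equivalence_class_counts(reads: list[str], fragments: dict[str, str], k: int = 5) -> Counter:
    """Count reads per equivalence class (set of compatible fragments)."""
    index = build_kmer_index(fragments, k)
    counts: Counter = Counter()
    for read in reads:
        ec = pseudoalign(read, index, k)
        if ec:  # reads compatible with no fragment are dropped
            counts[ec] += 1
    return counts
```

With two hypothetical fragments that share a first exon, a read from the shared exon lands in the class containing both fragments, while reads spanning a junction land in a single-fragment class.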


Subject(s)
Autism Spectrum Disorder , Humans , Sequence Analysis, RNA/methods , RNA Splicing , Alternative Splicing , Software
8.
Nucleic Acids Res ; 50(10): 5565-5576, 2022 06 10.
Article in English | MEDLINE | ID: mdl-35640578

ABSTRACT

Heterochromatic silencing is thought to occur through a combination of transcriptional silencing and RNA degradation, but the relative contribution of each pathway is not known. In this study, we analyzed RNA Polymerase II (RNA Pol II) occupancy and levels of nascent and steady-state RNA in different mutants of Schizosaccharomyces pombe, in order to quantify the contribution of each pathway to heterochromatic silencing. We found that transcriptional silencing consists of two components, reduced RNA Pol II accessibility and, unexpectedly, reduced transcriptional efficiency. Heterochromatic loci showed lower transcriptional output compared to euchromatic loci, even when comparable amounts of RNA Pol II were present in both types of regions. We determined that the Ccr4-Not complex and H3K9 methylation are required for reduced transcriptional efficiency in heterochromatin and that a subset of heterochromatic RNA is degraded more rapidly than euchromatic RNA. Finally, we quantified the contribution of different chromatin modifiers, RNAi and RNA degradation to each silencing pathway. Our data show that several pathways contribute to heterochromatic silencing in a locus-specific manner and reveal transcriptional efficiency as a new mechanism of silencing.


Subject(s)
Schizosaccharomyces pombe Proteins , Schizosaccharomyces , Gene Silencing , Heterochromatin/genetics , Heterochromatin/metabolism , RNA/metabolism , RNA Interference , RNA Polymerase II/genetics , RNA Polymerase II/metabolism , RNA-Binding Proteins/metabolism , Schizosaccharomyces/genetics , Schizosaccharomyces/metabolism , Schizosaccharomyces pombe Proteins/genetics , Schizosaccharomyces pombe Proteins/metabolism
9.
RNA ; 26(10): 1489-1506, 2020 10.
Article in English | MEDLINE | ID: mdl-32636310

ABSTRACT

Chemical modifications are found on almost all RNAs and affect their coding and noncoding functions. The identification of m6A on mRNA and its important role in gene regulation stimulated the field to investigate whether additional modifications are present on mRNAs. Indeed, modifications including m1A, m5C, m7G, 2'-OMe, and Ψ were detected. However, since their abundances are low and tools used for their corroboration are often not well characterized, their physiological relevance remains largely elusive. Antibodies targeting modified nucleotides are often used but have limitations such as low affinity or specificity. Moreover, they are not always well characterized and, due to the low abundance of the modification, particularly on mRNAs, generated data sets might resemble noise rather than specific modification patterns. Therefore, it is critical that the affinity and specificity are rigorously tested using complementary approaches. Here, we provide an experimental toolbox that allows for testing antibody performance prior to their use.


Subject(s)
Antibodies/genetics , Ribonucleotides/genetics , Nucleotides/genetics , RNA/genetics , RNA, Messenger/genetics
10.
Bioinformatics ; 2021 Jan 30.
Article in English | MEDLINE | ID: mdl-33515239

ABSTRACT

MOTIVATION: Alternative splicing removes intronic sequences from pre-mRNAs in alternative ways to produce different forms (isoforms) of mature mRNA. The composition of expressed transcripts gives specific functionalities to cells in a particular condition or developmental stage. In addition, a large fraction of human disease mutations affect splicing and lead to aberrant mRNA and protein products. Current methods that interrogate the transcriptome based on RNA-seq either suffer from short read length when trying to infer full-length transcripts, or are restricted to predefined units of alternative splicing that they quantify from local read evidence. RESULTS: Instead of attempting to quantify individual outcomes of the splicing process such as local splicing events or full-length transcripts, we propose to quantify alternative splicing using a simplified probabilistic model of the underlying splicing process. Our model is based on the usage of individual splice sites and can generate arbitrarily complex types of splicing patterns. In our implementation, McSplicer, we estimate the parameters of our model using all read data at once and we demonstrate in our experiments that this yields more accurate estimates compared to competing methods. Our model is able to describe multiple effects of splicing mutations using few, easy to interpret parameters, as we illustrate in an experiment on RNA-seq data from autism spectrum disorder patients. AVAILABILITY: McSplicer source code is available at https://github.com/canzarlab/McSplicer and has been deposited in archived format at https://doi.org/10.5281/zenodo.4449881. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

11.
Bioinformatics ; 37(16): 2398-2404, 2021 Aug 25.
Article in English | MEDLINE | ID: mdl-33367514

ABSTRACT

MOTIVATION: Unsupervised learning approaches are frequently used to stratify patients into clinically relevant subgroups and to identify biomarkers such as disease-associated genes. However, clustering and biclustering techniques are oblivious to the functional relationship of genes and are thus not ideally suited to pinpoint molecular mechanisms along with patient subgroups. RESULTS: We developed the network-constrained biclustering approach Biclustering Constrained by Networks (BiCoN) which (i) restricts biclusters to functionally related genes connected in molecular interaction networks and (ii) maximizes the difference in gene expression between two subgroups of patients. This allows BiCoN to simultaneously pinpoint molecular mechanisms responsible for the patient grouping. Network-constrained clustering of genes makes BiCoN more robust to noise and batch effects than typical clustering and biclustering methods. BiCoN can faithfully reproduce known disease subtypes as well as novel, clinically relevant patient subgroups, as we demonstrate using breast and lung cancer datasets. In summary, BiCoN is a novel systems medicine tool that combines several heuristic optimization strategies for robust disease mechanism extraction. BiCoN is well-documented and freely available as a Python package or a web interface. AVAILABILITY AND IMPLEMENTATION: PyPI package: https://pypi.org/project/bicon. WEB INTERFACE: https://exbio.wzw.tum.de/bicon. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

12.
Brief Bioinform ; 20(5): 1754-1768, 2019 09 27.
Article in English | MEDLINE | ID: mdl-29931155

ABSTRACT

In recent years, the emphasis of scientific inquiry has shifted from whole-genome analyses to an understanding of cellular responses specific to tissue, developmental stage or environmental conditions. One of the central mechanisms underlying the diversity and adaptability of the contextual responses is alternative splicing (AS). It enables a single gene to encode multiple isoforms with distinct biological functions. However, to date, the functions of the vast majority of differentially spliced protein isoforms are not known. Integration of genomic, proteomic, functional, phenotypic and contextual information is essential for supporting isoform-based modeling and analysis. Such integrative proteogenomics approaches promise to provide insights into the functions of the alternatively spliced protein isoforms and provide high-confidence hypotheses to be validated experimentally. This manuscript provides a survey of the public databases supporting isoform-based biology. It also presents an overview of the potential global impact of AS on the human canonical gene functions, molecular interactions and cellular pathways.


Subject(s)
Alternative Splicing , Protein Isoforms/metabolism , Computational Biology , Databases, Protein , Humans
13.
Mol Cell Proteomics ; 18(4): 760-772, 2019 04.
Article in English | MEDLINE | ID: mdl-30630937

ABSTRACT

Neutrophil granulocytes are critical mediators of innate immunity and tissue regeneration. Rare diseases of neutrophil granulocytes may affect their differentiation and/or functions. However, there are very few validated diagnostic tests assessing the functions of neutrophil granulocytes in these diseases. Here, we set out to probe omics analysis as a novel diagnostic platform for patients with defective differentiation and function of neutrophil granulocytes. We analyzed highly purified neutrophil granulocytes from 68 healthy individuals and 16 patients with rare monogenic diseases. Cells were isolated from fresh venous blood (purity >99%) and used to create a spectral library covering almost 8000 proteins using strong cation exchange fractionation. Patient neutrophil samples were then analyzed by data-independent acquisition proteomics, quantifying 4154 proteins in each sample. Neutrophils with mutations in the neutrophil elastase gene ELANE showed large proteome changes that suggest these mutations may affect maturation of neutrophil granulocytes and initiate misfolded protein response and cellular stress mechanisms. In contrast, only a few proteins changed in patients with leukocyte adhesion deficiency (LAD) and chronic granulomatous disease (CGD). Strikingly, neutrophil transcriptome analysis showed no correlation with its proteome. In the case of two patients with undetermined genetic causes, proteome analysis guided the targeted genetic diagnostics and uncovered the underlying genomic mutations. Data-independent acquisition proteomics may help to define novel pathomechanisms in neutrophil diseases and provide a clinically useful diagnostic dimension.


Subject(s)
Disease , Neutrophils/metabolism , Proteome/metabolism , Proteomics , Base Sequence , Disease/genetics , Humans , RNA, Messenger/genetics , RNA, Messenger/metabolism
14.
Genome Res ; 27(1): 145-156, 2017 01.
Article in English | MEDLINE | ID: mdl-27856494

ABSTRACT

Alternative splicing increases the diversity of transcriptomes and proteomes in metazoans. The extent to which alternative splicing is active and functional in unicellular organisms is less understood. Here, we exploit a single-molecule long-read sequencing technique and develop an open-source software program called SpliceHunter to characterize the transcriptome in the meiosis of fission yeast. We reveal 14,353 alternative splicing events in 17,669 novel isoforms at different stages of meiosis, including antisense and read-through transcripts. Intron retention is the major type of alternative splicing, followed by alternate "intron in exon." Seven hundred seventy novel transcription units are detected; 53 of the predicted proteins show homology in other species and form theoretical stable structures. We report the complexity of alternative splicing along isoforms, including 683 intra-molecularly co-associated intron pairs. We compare the dynamics of novel isoforms based on the number of supporting full-length reads with those of annotated isoforms and explore the translational capacity and quality of novel isoforms. The evaluation of these factors indicates that the majority of novel isoforms are unlikely to be both condition-specific and translatable but consistent with the possibility of biologically functional novel isoforms. Moreover, the co-option of these unusual transcripts into newly born genes seems likely. Together, the results of this study highlight the diversity and dynamics at the isoform level in the sexual development of fission yeast.


Subject(s)
Alternative Splicing/genetics , Meiosis/genetics , Schizosaccharomyces/genetics , Transcriptome/genetics , Exons/genetics , Humans , Introns/genetics , Molecular Sequence Annotation , Proteome/genetics , Software
15.
Bioinformatics ; 33(3): 425-427, 2017 02 01.
Article in English | MEDLINE | ID: mdl-28172415

ABSTRACT

Motivation: The B-cell receptor enables individual B cells to identify diverse antigens, including bacterial and viral proteins. While advances in RNA-sequencing (RNA-seq) have enabled high throughput profiling of transcript expression in single cells, the unique task of assembling the full-length heavy and light chain sequences from single cell RNA-seq (scRNA-seq) in B cells has been largely unstudied. Results: We developed a new software tool, BASIC, which allows investigators to use scRNA-seq for assembling BCR sequences at single-cell resolution. To demonstrate the utility of our software, we subjected nearly 200 single human B cells to scRNA-seq, assembled the full-length heavy and the light chains, and experimentally confirmed these results by using single-cell primer-based nested PCRs and Sanger sequencing. Availability and Implementation: http://ttic.uchicago.edu/~aakhan/BASIC Contact: aakhan@ttic.edu Supplementary Information: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Profiling/methods , Receptors, Antigen, B-Cell/genetics , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Software , Gene Expression Regulation , Humans
16.
Bioinformatics ; 32(17): i658-i664, 2016 09 01.
Article in English | MEDLINE | ID: mdl-27587686

ABSTRACT

MOTIVATION: As an increasing amount of protein-protein interaction (PPI) data becomes available, their computational interpretation has become an important problem in bioinformatics. The alignment of PPI networks from different species provides valuable information about conserved subnetworks, evolutionary pathways and functional orthologs. Although several methods have been proposed for global network alignment, there is a pressing need for methods that produce more accurate alignments in terms of both topological and functional consistency. RESULTS: In this work, we present a novel global network alignment algorithm, named ModuleAlign, which makes use of local topology information to define a module-based homology score. Based on a hierarchical clustering of functionally coherent proteins involved in the same module, ModuleAlign employs a novel iterative scheme to find the alignment between two networks. Evaluated on a diverse set of benchmarks, ModuleAlign outperforms state-of-the-art methods in producing functionally consistent alignments. By aligning Pathogen-Human PPI networks, ModuleAlign also detects a novel set of conserved human genes that pathogens preferentially target to cause pathogenesis. AVAILABILITY: http://ttic.uchicago.edu/~hashemifar/ModuleAlign.html CONTACT: canzar@ttic.edu or j3xu.ttic.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Protein Interaction Mapping , Protein Interaction Maps , Humans , Proteins , Software
17.
Proc IEEE Inst Electr Electron Eng ; 105(3): 436-458, 2017 Mar.
Article in English | MEDLINE | ID: mdl-28502990

ABSTRACT

Ultra-high-throughput next-generation sequencing (NGS) technology allows us to determine the sequence of nucleotides of many millions of DNA molecules in parallel. Accompanied by a dramatic reduction in cost since its introduction in 2004, NGS technology has provided a new way of addressing a wide range of biological and biomedical questions, from the study of human genetic disease to the analysis of gene expression, protein-DNA interactions, and patterns of DNA methylation. The data generated by NGS instruments comprise huge numbers of very short DNA sequences, or 'reads', that carry little information by themselves. These reads therefore have to be pieced together by well-engineered algorithms to reconstruct biologically meaningful measurements, such as the level of expression of a gene. To solve this complex, high-dimensional puzzle, reads must be mapped back to a reference genome to determine their origin. Due to sequencing errors and to genuine differences between the reference genome and the individual being sequenced, this mapping process must be tolerant of mismatches, insertions, and deletions. Although optimal alignment algorithms to solve this problem have long been available, the practical requirements of aligning hundreds of millions of short reads to the 3 billion base pair long human genome have stimulated the development of new, more efficient methods, which today are used routinely throughout the world for the analysis of NGS data.
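The mismatch-, insertion- and deletion-tolerant mapping described above can be made concrete with the classical dynamic-programming formulation: find the substring of the reference with the smallest edit distance to the read. The sketch below uses unit costs and lets the alignment start and end anywhere in the reference; the function name is hypothetical. Its O(mn) running time per read is exactly why scanning a 3-billion-base genome this way is impractical, and production aligners rely on genome indexes instead.

```python
def best_semiglobal_hit(read: str, ref: str) -> tuple[int, int]:
    """Smallest edit distance between `read` and any substring of `ref`.

    Returns (distance, end), where the best-matching substring ends just before
    ref[end]. Mismatches, insertions and deletions each cost 1. On ties, the
    earliest end position is reported.
    """
    m = len(read)
    # col[i] = cost of aligning read[:i] to a substring of ref ending at column j.
    col = list(range(m + 1))          # column j = 0: read[:i] vs empty reference
    best, best_end = col[m], 0
    for j in range(1, len(ref) + 1):
        new = [0] * (m + 1)           # row 0 is free: the read may start anywhere
        for i in range(1, m + 1):
            match = col[i - 1] + (read[i - 1] != ref[j - 1])  # match / mismatch
            skip_ref = col[i] + 1                             # deletion from the read
            skip_read = new[i - 1] + 1                        # insertion in the read
            new[i] = min(match, skip_ref, skip_read)
        col = new
        if col[m] < best:
            best, best_end = col[m], j
    return best, best_end
```

For `ref = "ACGTTGCAATGCCGTA"`, the exact read `"TGCAAT"` is found with distance 0 ending at position 10, and the read `"TGCAT"`, which lacks one `A`, is still placed with distance 1.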

18.
Bioinformatics ; 29(14): 1718-25, 2013 Jul 15.
Article in English | MEDLINE | ID: mdl-23665771

ABSTRACT

MOTIVATION: A large and rapidly growing number of bacterial organisms have been sequenced by the newest sequencing technologies. Cheaper and faster sequencing technologies make it easy to generate very high coverage of bacterial genomes, but these advances mean that DNA preparation costs can exceed the cost of sequencing for small genomes. The need to contain costs often results in the creation of only a single sequencing library, which in turn introduces new challenges for genome assembly methods. RESULTS: We evaluated the ability of multiple genome assembly programs to assemble bacterial genomes from a single, deep-coverage library. For our comparison, we chose bacterial species spanning a wide range of GC content and measured the contiguity and accuracy of the resulting assemblies. We compared the assemblies produced by this very high-coverage, one-library strategy to the best assemblies created by two-library sequencing, and we found that remarkably good bacterial assemblies are possible with just one library. We also measured the effect of read length and depth of coverage on assembly quality and determined the values that provide the best results with current algorithms. CONTACT: salzberg@jhu.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
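Depth of coverage is usually estimated with the Lander-Waterman model: N reads of length L sampled uniformly from a genome of size G give expected coverage C = NL/G, and under the accompanying Poisson assumption a given base is missed with probability e^(-C). The helpers below, whose names are illustrative, apply this textbook approximation. It ignores the GC-content and library biases that the study above evaluates, so it is an optimistic estimate of the coverage needed in practice.

```python
import math


def expected_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Lander-Waterman expected depth of coverage, C = N * L / G."""
    return n_reads * read_length / genome_size


def expected_uncovered_fraction(coverage: float) -> float:
    """Poisson approximation: probability that a base receives zero reads, e^(-C)."""
    return math.exp(-coverage)


def expected_uncovered_bases(genome_size: int, coverage: float) -> float:
    """Expected number of bases with no read at all, G * e^(-C)."""
    return genome_size * expected_uncovered_fraction(coverage)
```

For instance, 2 million 100 bp reads on a 5 Mbp genome give `expected_coverage(2_000_000, 100, 5_000_000) == 40.0`, at which point the model predicts essentially no uncovered bases.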


Subject(s)
Genome, Bacterial , Genomics/methods , Software , Algorithms , Gene Library , Sequence Analysis, DNA
19.
Algorithms Mol Biol ; 19(1): 21, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38863064

ABSTRACT

Metric multidimensional scaling is one of the classical methods for embedding data into low-dimensional Euclidean space. It creates the low-dimensional embedding by approximately preserving the pairwise distances between the input points. However, current state-of-the-art approaches only scale to a few thousand data points. For larger data sets such as those occurring in single-cell RNA sequencing experiments, the running time becomes prohibitively large and thus alternative methods such as PCA are widely used instead. Here, we propose a simple neural network-based approach for solving the metric multidimensional scaling problem that is orders of magnitude faster than previous state-of-the-art approaches, and hence scales to data sets with up to a few million cells. At the same time, it provides a non-linear mapping between high- and low-dimensional space that can place previously unseen cells in the same embedding.
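Metric MDS is commonly posed as minimizing the raw stress, the sum over point pairs of the squared difference between embedded and input distances. The sketch below, assuming numpy, minimizes that objective by plain gradient descent on the coordinates. It is not the neural-network approach proposed in the abstract, and the function names, initialization and step size are assumptions. Note that it materializes all n × n pairwise distances, which is the quadratic cost that limits classical approaches to a few thousand points.

```python
import numpy as np


def pairwise_distances(Y: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix between the rows of Y."""
    diff = Y[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def stress(Y: np.ndarray, D: np.ndarray) -> float:
    """Raw stress: sum over pairs i < j of (||y_i - y_j|| - D_ij)^2."""
    iu = np.triu_indices(len(Y), k=1)
    return float(((pairwise_distances(Y)[iu] - D[iu]) ** 2).sum())


def metric_mds_gd(D: np.ndarray, n_components: int = 2, n_iter: int = 500,
                  lr: float = 0.01, seed: int = 0) -> np.ndarray:
    """Minimize raw stress by gradient descent on the embedding coordinates."""
    rng = np.random.default_rng(seed)
    Y = rng.normal(scale=1e-2, size=(D.shape[0], n_components))
    for _ in range(n_iter):
        diff = Y[:, None, :] - Y[None, :, :]      # y_i - y_j, shape (n, n, k)
        E = np.sqrt((diff ** 2).sum(axis=2))      # current embedded distances
        np.fill_diagonal(E, 1.0)                  # avoid 0/0 on the diagonal
        coef = (E - D) / E
        np.fill_diagonal(coef, 0.0)
        # d stress / d y_i = 2 * sum_j (E_ij - D_ij) / E_ij * (y_i - y_j)
        grad = 2.0 * (coef[:, :, None] * diff).sum(axis=1)
        Y -= lr * grad
    return Y
```

Because stress depends only on distances, the result is determined only up to rotation, reflection and translation; compare embeddings through `stress` or `pairwise_distances`, not raw coordinates.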

20.
bioRxiv ; 2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38617276

ABSTRACT

Y chromosomes of great apes harbor Ampliconic Genes (YAGs), multi-copy gene families (BPY2, CDY, DAZ, HSFY, PRY, RBMY, TSPY, VCY, and XKRY) that encode proteins important for spermatogenesis. Previous work assembled YAG transcripts based on their targeted sequencing but not using reference genome assemblies, potentially resulting in an incomplete transcript repertoire. Here we used the recently produced gapless telomere-to-telomere (T2T) Y chromosome assemblies of great ape species (bonobo, chimpanzee, human, gorilla, Bornean orangutan, and Sumatran orangutan) and analyzed RNA data from whole-testis samples for the same species. We generated hybrid transcriptome assemblies by combining targeted long reads (Pacific Biosciences), untargeted long reads (Pacific Biosciences) and untargeted short reads (Illumina) and mapping them to the T2T reference genomes. Compared to the results from the reference-free approach, average transcript length was more than twice as long and the total number of transcripts decreased threefold, improving the quality of the assembled transcriptome. The reference-based transcriptome assemblies allowed us to differentiate transcripts originating from different Y chromosome gene copies and from their non-Y chromosome homologs. We identified two sources of transcriptome diversity: alternative splicing, and gene duplication with subsequent diversification of gene copies. For each gene family, we detected transcribed pseudogenes along with protein-coding gene copies. We revealed previously unannotated gene copies of YAGs as compared to currently available NCBI annotations, as well as novel isoforms for annotated gene copies. This analysis paves the way for better understanding Y chromosome gene functions, which is important given their role in spermatogenesis.
