Results 1 - 20 of 38
1.
Nat Immunol ; 23(8): 1208-1221, 2022 08.
Article in English | MEDLINE | ID: mdl-35879451

ABSTRACT

T cell antigen-receptor (TCR) signaling controls the development, activation and survival of T cells by involving several layers and numerous mechanisms of gene regulation. N6-methyladenosine (m6A) is the most prevalent messenger RNA modification affecting splicing, translation and stability of transcripts. In the present study, we describe the Wtap protein as essential for m6A methyltransferase complex function and reveal its crucial role in TCR signaling in mouse T cells. Wtap and m6A methyltransferase functions were required for the differentiation of thymocytes, control of activation-induced death of peripheral T cells and prevention of colitis by enabling gut RORγt+ regulatory T cell function. Transcriptome and epitranscriptomic analyses reveal that m6A modification destabilizes Orai1 and Ripk1 mRNAs. Lack of post-transcriptional repression of the encoded proteins correlated with increased store-operated calcium entry activity and diminished survival of T cells with conditional genetic inactivation of Wtap. These findings uncover how m6A modification impacts on TCR signal transduction and determines activation and survival of T cells.


Subject(s)
Cell Cycle Proteins; Methyltransferases; Adenosine/analogs & derivatives; Animals; Cell Cycle Proteins/metabolism; Methylation; Methyltransferases/genetics; Mice; RNA Splicing Factors/metabolism; RNA, Messenger/genetics; RNA, Messenger/metabolism; Signal Transduction
2.
Cell ; 177(3): 654-668.e15, 2019 04 18.
Article in English | MEDLINE | ID: mdl-30929900

ABSTRACT

New neurons arise from quiescent adult neural progenitors throughout life in specific regions of the mammalian brain. Little is known about the embryonic origin and establishment of adult neural progenitors. Here, we show that Hopx+ precursors in the mouse dentate neuroepithelium at embryonic day 11.5 give rise to proliferative Hopx+ neural progenitors in the primitive dentate region, and they, in turn, generate granule neurons, but not other neurons, throughout development and then transition into Hopx+ quiescent radial glial-like neural progenitors during an early postnatal period. RNA-seq and ATAC-seq analyses of Hopx+ embryonic, early postnatal, and adult dentate neural progenitors further reveal common molecular and epigenetic signatures and developmental dynamics. Together, our findings support a "continuous" model wherein a common neural progenitor population exclusively contributes to dentate neurogenesis throughout development and adulthood. Adult dentate neurogenesis may therefore represent a lifelong extension of development that maintains heightened plasticity in the mammalian hippocampus.


Subject(s)
Embryonic Stem Cells/metabolism; Neurogenesis; Animals; Cell Differentiation; Dentate Gyrus/metabolism; Embryo, Mammalian/metabolism; Embryonic Stem Cells/cytology; Female; Gene Expression Regulation, Developmental; Hippocampus/metabolism; Homeodomain Proteins/genetics; Homeodomain Proteins/metabolism; Male; Mice; Mice, Inbred C57BL; Mice, Transgenic; Neural Stem Cells/cytology; Neural Stem Cells/metabolism
3.
Cell ; 171(4): 877-889.e17, 2017 Nov 02.
Article in English | MEDLINE | ID: mdl-28965759

ABSTRACT

N6-methyladenosine (m6A), installed by the Mettl3/Mettl14 methyltransferase complex, is the most prevalent internal mRNA modification. Whether m6A regulates mammalian brain development is unknown. Here, we show that m6A depletion by Mettl14 knockout in embryonic mouse brains prolongs the cell cycle of radial glia cells and extends cortical neurogenesis into postnatal stages. m6A depletion by Mettl3 knockdown also leads to a prolonged cell cycle and maintenance of radial glia cells. m6A sequencing of embryonic mouse cortex reveals enrichment of mRNAs related to transcription factors, neurogenesis, the cell cycle, and neuronal differentiation, and m6A tagging promotes their decay. Further analysis uncovers previously unappreciated transcriptional prepatterning in cortical neural stem cells. m6A signaling also regulates human cortical neurogenesis in forebrain organoids. Comparison of m6A-mRNA landscapes between mouse and human cortical neurogenesis reveals enrichment of human-specific m6A tagging of transcripts related to brain-disorder risk genes. Our study identifies an epitranscriptomic mechanism in heightened transcriptional coordination during mammalian cortical neurogenesis.


Subject(s)
Neurogenesis; Prosencephalon/embryology; RNA Processing, Post-Transcriptional; RNA, Messenger/metabolism; Animals; Cell Cycle; Gene Expression Regulation; Gene Expression Regulation, Developmental; Gene Knockdown Techniques; Humans; Methylation; Methyltransferases/genetics; Methyltransferases/metabolism; Mice; Mice, Knockout; Neural Stem Cells/metabolism; Organoids/metabolism; Prosencephalon/cytology; Prosencephalon/metabolism; RNA Stability
4.
Genome Res ; 34(4): 572-589, 2024 05 15.
Article in English | MEDLINE | ID: mdl-38719471

ABSTRACT

Dormancy is a key feature of stem cell function in adult tissues as well as in embryonic cells in the context of diapause. The establishment of dormancy is an active process that involves extensive transcriptional, epigenetic, and metabolic rewiring. How these processes are coordinated to successfully transition cells to the resting dormant state remains unclear. Here we show that microRNA activity, which is otherwise dispensable for preimplantation development, is essential for the adaptation of early mouse embryos to the dormant state of diapause. In particular, the pluripotent epiblast depends on miRNA activity, the absence of which results in the loss of pluripotent cells. Through the integration of high-sensitivity small RNA expression profiling of individual embryos and protein expression of miRNA targets with public data of protein-protein interactions, we constructed the miRNA-mediated regulatory network of mouse early embryos specific to diapause. We find that individual miRNAs contribute to the combinatorial regulation by the network, and the perturbation of the network compromises embryo survival in diapause. We further identified the nutrient-sensitive transcription factor TFE3 as an upstream regulator of diapause-specific miRNAs, linking cytoplasmic MTOR activity to nuclear miRNA biogenesis. Our results place miRNAs as a critical regulatory layer for the molecular rewiring of early embryos to establish dormancy.


Subject(s)
Cell Proliferation; MicroRNAs; Pluripotent Stem Cells; Animals; MicroRNAs/genetics; MicroRNAs/metabolism; Mice; Pluripotent Stem Cells/metabolism; Pluripotent Stem Cells/cytology; Gene Expression Regulation, Developmental; Gene Regulatory Networks; Embryonic Development/genetics; Germ Layers/metabolism; Germ Layers/cytology; Blastocyst/metabolism; Blastocyst/cytology; Female
5.
Nat Methods ; 21(3): 401-405, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38317008

ABSTRACT

Unique molecular identifiers are random oligonucleotide sequences that remove PCR amplification biases. However, the impact that PCR-associated sequencing errors have on the accuracy of generating absolute counts of RNA molecules is underappreciated. We show that PCR errors are a source of inaccuracy in both bulk and single-cell sequencing data, and that synthesizing unique molecular identifiers using homotrimeric nucleotide blocks provides an error-correcting solution that allows absolute counting of sequenced molecules.
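The error-correcting idea in this abstract can be illustrated with a small sketch. The abstract states only that identifiers are built from homotrimeric nucleotide blocks; the majority-vote decoding below, the function names, and the use of "N" for undecidable blocks are assumptions made for illustration and are not taken from the paper. Insertions and deletions are not handled.

```python
from collections import Counter


def encode_homotrimer(umi: str) -> str:
    """Expand each base of a UMI into a block of three identical bases."""
    return "".join(base * 3 for base in umi)


def decode_homotrimer(read: str) -> str:
    """Collapse each 3-base block to its majority base.

    Assumption: a block with no base seen at least twice is reported as 'N'.
    """
    if len(read) % 3 != 0:
        raise ValueError("read length must be a multiple of 3")
    decoded = []
    for start in range(0, len(read), 3):
        block = read[start:start + 3]
        base, count = Counter(block).most_common(1)[0]
        decoded.append(base if count >= 2 else "N")
    return "".join(decoded)
```

Under these assumptions a single substitution inside a block is outvoted by the two unchanged copies, so `decode_homotrimer("AATCCCGGG")` recovers the same identifier as the error-free `"AAACCCGGG"`.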


Asunto(s)
Secuenciación de Nucleótidos de Alto Rendimiento , Nucleótidos , Análisis de Secuencia de ARN , Oligonucleótidos/genética , Reacción en Cadena de la Polimerasa
6.
Genome Res ; 31(4): 677-688, 2021 04.
Article in English | MEDLINE | ID: mdl-33627473

ABSTRACT

A fundamental task in single-cell RNA-seq (scRNA-seq) analysis is the identification of transcriptionally distinct groups of cells. Numerous methods have been proposed for this problem, with a recent focus on methods for the cluster analysis of ultralarge scRNA-seq data sets produced by droplet-based sequencing technologies. Most existing methods rely on a sampling step to bridge the gap between algorithm scalability and volume of the data. Ignoring large parts of the data, however, often yields inaccurate groupings of cells and risks overlooking rare cell types. We propose Specter, a method that adopts and extends recent algorithmic advances in (fast) spectral clustering. In contrast to methods that cluster a (random) subsample of the data, we adopt the idea of landmarks that are used to create a sparse representation of the full data from which a spectral embedding can then be computed in linear time. We exploit Specter's speed in a cluster ensemble scheme that achieves a substantial improvement in accuracy over existing methods and identifies rare cell types with high sensitivity. Its linear-time complexity allows Specter to scale to millions of cells and leads to fast computation times in practice. Furthermore, on CITE-seq data that simultaneously measure gene and protein marker expression, we show that Specter is able to use multimodal omics measurements to resolve subtle transcriptomic differences between subpopulations of cells.


Subject(s)
Cluster Analysis; Gene Expression Profiling; RNA-Seq; Single-Cell Analysis; Algorithms
7.
Bioinformatics ; 39(7)2023 07 01.
Article in English | MEDLINE | ID: mdl-37432342

ABSTRACT

MOTIVATION: Alternative splicing (AS) of introns from pre-mRNA produces diverse sets of transcripts across cell types and tissues, but is also dysregulated in many diseases. Alignment-free computational methods have greatly accelerated the quantification of mRNA transcripts from short RNA-seq reads, but they inherently rely on a catalog of known transcripts and might miss novel, disease-specific splicing events. By contrast, alignment of reads to the genome can effectively identify novel exonic segments and introns. Event-based methods then count how many reads align to predefined features. However, an alignment is more expensive to compute and constitutes a bottleneck in many AS analysis methods. RESULTS: Here, we propose fortuna, a method that guesses novel combinations of annotated splice sites to create transcript fragments. It then pseudoaligns reads to fragments using kallisto and efficiently derives counts of the most elementary splicing units from kallisto's equivalence classes. These counts can be directly used for AS analysis or summarized to larger units as used by other widely applied methods. In experiments on synthetic and real data, fortuna was around 7× faster than traditional align and count approaches, and was able to analyze almost 300 million reads in just 15 min when using four threads. It mapped reads containing mismatches more accurately across novel junctions and found more reads supporting aberrant splicing events in patients with autism spectrum disorder than existing methods. We further used fortuna to identify novel, tissue-specific splicing events in Drosophila. AVAILABILITY AND IMPLEMENTATION: fortuna source code is available at https://github.com/canzarlab/fortuna.


Subject(s)
Autism Spectrum Disorder; Humans; Sequence Analysis, RNA/methods; RNA Splicing; Alternative Splicing; Software
8.
Nucleic Acids Res ; 50(10): 5565-5576, 2022 06 10.
Article in English | MEDLINE | ID: mdl-35640578

ABSTRACT

Heterochromatic silencing is thought to occur through a combination of transcriptional silencing and RNA degradation, but the relative contribution of each pathway is not known. In this study, we analyzed RNA Polymerase II (RNA Pol II) occupancy and levels of nascent and steady-state RNA in different mutants of Schizosaccharomyces pombe, in order to quantify the contribution of each pathway to heterochromatic silencing. We found that transcriptional silencing consists of two components, reduced RNA Pol II accessibility and, unexpectedly, reduced transcriptional efficiency. Heterochromatic loci showed lower transcriptional output compared to euchromatic loci, even when comparable amounts of RNA Pol II were present in both types of regions. We determined that the Ccr4-Not complex and H3K9 methylation are required for reduced transcriptional efficiency in heterochromatin and that a subset of heterochromatic RNA is degraded more rapidly than euchromatic RNA. Finally, we quantified the contribution of different chromatin modifiers, RNAi and RNA degradation to each silencing pathway. Our data show that several pathways contribute to heterochromatic silencing in a locus-specific manner and reveal transcriptional efficiency as a new mechanism of silencing.


Subject(s)
Schizosaccharomyces pombe Proteins; Schizosaccharomyces; Gene Silencing; Heterochromatin/genetics; Heterochromatin/metabolism; RNA/metabolism; RNA Interference; RNA Polymerase II/genetics; RNA Polymerase II/metabolism; RNA-Binding Proteins/metabolism; Schizosaccharomyces/genetics; Schizosaccharomyces/metabolism; Schizosaccharomyces pombe Proteins/genetics; Schizosaccharomyces pombe Proteins/metabolism
9.
RNA ; 26(10): 1489-1506, 2020 10.
Article in English | MEDLINE | ID: mdl-32636310

ABSTRACT

Chemical modifications are found on almost all RNAs and affect their coding and noncoding functions. The identification of m6A on mRNA and its important role in gene regulation stimulated the field to investigate whether additional modifications are present on mRNAs. Indeed, modifications including m1A, m5C, m7G, 2'-OMe, and Ψ were detected. However, since their abundances are low and tools used for their corroboration are often not well characterized, their physiological relevance remains largely elusive. Antibodies targeting modified nucleotides are often used but have limitations such as low affinity or specificity. Moreover, they are not always well characterized and, due to the low abundance of the modification, particularly on mRNAs, generated data sets might resemble noise rather than specific modification patterns. Therefore, it is critical that affinity and specificity are rigorously tested using complementary approaches. Here, we provide an experimental toolbox that allows for testing antibody performance prior to their use.


Subject(s)
Antibodies/genetics; Ribonucleotides/genetics; Nucleotides/genetics; RNA/genetics; RNA, Messenger/genetics
10.
Bioinformatics ; 2021 Jan 30.
Article in English | MEDLINE | ID: mdl-33515239

ABSTRACT

MOTIVATION: Alternative splicing removes intronic sequences from pre-mRNAs in alternative ways to produce different forms (isoforms) of mature mRNA. The composition of expressed transcripts gives specific functionalities to cells in a particular condition or developmental stage. In addition, a large fraction of human disease mutations affect splicing and lead to aberrant mRNA and protein products. Current methods that interrogate the transcriptome based on RNA-seq either suffer from short read length when trying to infer full-length transcripts, or are restricted to predefined units of alternative splicing that they quantify from local read evidence. RESULTS: Instead of attempting to quantify individual outcomes of the splicing process such as local splicing events or full-length transcripts, we propose to quantify alternative splicing using a simplified probabilistic model of the underlying splicing process. Our model is based on the usage of individual splice sites and can generate arbitrarily complex types of splicing patterns. In our implementation, McSplicer, we estimate the parameters of our model using all read data at once and we demonstrate in our experiments that this yields more accurate estimates compared to competing methods. Our model is able to describe multiple effects of splicing mutations using few, easy to interpret parameters, as we illustrate in an experiment on RNA-seq data from autism spectrum disorder patients. AVAILABILITY: McSplicer source code is available at https://github.com/canzarlab/McSplicer and has been deposited in archived format at https://doi.org/10.5281/zenodo.4449881. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

11.
Bioinformatics ; 37(16): 2398-2404, 2021 Aug 25.
Article in English | MEDLINE | ID: mdl-33367514

ABSTRACT

MOTIVATION: Unsupervised learning approaches are frequently used to stratify patients into clinically relevant subgroups and to identify biomarkers such as disease-associated genes. However, clustering and biclustering techniques are oblivious to the functional relationship of genes and are thus not ideally suited to pinpoint molecular mechanisms along with patient subgroups. RESULTS: We developed the network-constrained biclustering approach Biclustering Constrained by Networks (BiCoN) which (i) restricts biclusters to functionally related genes connected in molecular interaction networks and (ii) maximizes the difference in gene expression between two subgroups of patients. This allows BiCoN to simultaneously pinpoint molecular mechanisms responsible for the patient grouping. Network-constrained clustering of genes makes BiCoN more robust to noise and batch effects than typical clustering and biclustering methods. BiCoN can faithfully reproduce known disease subtypes as well as novel, clinically relevant patient subgroups, as we could demonstrate using breast and lung cancer datasets. In summary, BiCoN is a novel systems medicine tool that combines several heuristic optimization strategies for robust disease mechanism extraction. BiCoN is well-documented and freely available as a python package or a web interface. AVAILABILITY AND IMPLEMENTATION: PyPI package: https://pypi.org/project/bicon. WEB INTERFACE: https://exbio.wzw.tum.de/bicon. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

12.
Brief Bioinform ; 20(5): 1754-1768, 2019 09 27.
Article in English | MEDLINE | ID: mdl-29931155

ABSTRACT

In recent years, the emphasis of scientific inquiry has shifted from whole-genome analyses to an understanding of cellular responses specific to tissue, developmental stage or environmental conditions. One of the central mechanisms underlying the diversity and adaptability of the contextual responses is alternative splicing (AS). It enables a single gene to encode multiple isoforms with distinct biological functions. However, to date, the functions of the vast majority of differentially spliced protein isoforms are not known. Integration of genomic, proteomic, functional, phenotypic and contextual information is essential for supporting isoform-based modeling and analysis. Such integrative proteogenomics approaches promise to provide insights into the functions of the alternatively spliced protein isoforms and provide high-confidence hypotheses to be validated experimentally. This manuscript provides a survey of the public databases supporting isoform-based biology. It also presents an overview of the potential global impact of AS on the human canonical gene functions, molecular interactions and cellular pathways.


Subject(s)
Alternative Splicing; Protein Isoforms/metabolism; Computational Biology; Databases, Protein; Humans
13.
Mol Cell Proteomics ; 18(4): 760-772, 2019 04.
Article in English | MEDLINE | ID: mdl-30630937

ABSTRACT

Neutrophil granulocytes are critical mediators of innate immunity and tissue regeneration. Rare diseases of neutrophil granulocytes may affect their differentiation and/or functions. However, there are very few validated diagnostic tests assessing the functions of neutrophil granulocytes in these diseases. Here, we set out to probe omics analysis as a novel diagnostic platform for patients with defective differentiation and function of neutrophil granulocytes. We analyzed highly purified neutrophil granulocytes from 68 healthy individuals and 16 patients with rare monogenic diseases. Cells were isolated from fresh venous blood (purity >99%) and used to create a spectral library covering almost 8000 proteins using strong cation exchange fractionation. Patient neutrophil samples were then analyzed by data-independent acquisition proteomics, quantifying 4154 proteins in each sample. Neutrophils with mutations in the neutrophil elastase gene ELANE showed large proteome changes that suggest these mutations may affect maturation of neutrophil granulocytes and initiate misfolded protein response and cellular stress mechanisms. In contrast, only few proteins changed in patients with leukocyte adhesion deficiency (LAD) and chronic granulomatous disease (CGD). Strikingly, neutrophil transcriptome analysis showed no correlation with its proteome. In case of two patients with undetermined genetic causes, proteome analysis guided the targeted genetic diagnostics and uncovered the underlying genomic mutations. Data-independent acquisition proteomics may help to define novel pathomechanisms in neutrophil diseases and provide a clinically useful diagnostic dimension.


Subject(s)
Disease; Neutrophils/metabolism; Proteome/metabolism; Proteomics; Base Sequence; Disease/genetics; Humans; RNA, Messenger/genetics; RNA, Messenger/metabolism
14.
Genome Res ; 27(1): 145-156, 2017 01.
Article in English | MEDLINE | ID: mdl-27856494

ABSTRACT

Alternative splicing increases the diversity of transcriptomes and proteomes in metazoans. The extent to which alternative splicing is active and functional in unicellular organisms is less understood. Here, we exploit a single-molecule long-read sequencing technique and develop an open-source software program called SpliceHunter to characterize the transcriptome in the meiosis of fission yeast. We reveal 14,353 alternative splicing events in 17,669 novel isoforms at different stages of meiosis, including antisense and read-through transcripts. Intron retention is the major type of alternative splicing, followed by alternate "intron in exon." Seven hundred seventy novel transcription units are detected; 53 of the predicted proteins show homology in other species and form theoretical stable structures. We report the complexity of alternative splicing along isoforms, including 683 intra-molecularly co-associated intron pairs. We compare the dynamics of novel isoforms based on the number of supporting full-length reads with those of annotated isoforms and explore the translational capacity and quality of novel isoforms. The evaluation of these factors indicates that the majority of novel isoforms are unlikely to be both condition-specific and translatable but consistent with the possibility of biologically functional novel isoforms. Moreover, the co-option of these unusual transcripts into newly born genes seems likely. Together, the results of this study highlight the diversity and dynamics at the isoform level in the sexual development of fission yeast.


Subject(s)
Alternative Splicing/genetics; Meiosis/genetics; Schizosaccharomyces/genetics; Transcriptome/genetics; Exons/genetics; Humans; Introns/genetics; Molecular Sequence Annotation; Proteome/genetics; Software
15.
Bioinformatics ; 33(3): 425-427, 2017 02 01.
Article in English | MEDLINE | ID: mdl-28172415

ABSTRACT

Motivation: The B-cell receptor enables individual B cells to identify diverse antigens, including bacterial and viral proteins. While advances in RNA-sequencing (RNA-seq) have enabled high throughput profiling of transcript expression in single cells, the unique task of assembling the full-length heavy and light chain sequences from single cell RNA-seq (scRNA-seq) in B cells has been largely unstudied. Results: We developed a new software tool, BASIC, which allows investigators to use scRNA-seq for assembling BCR sequences at single-cell resolution. To demonstrate the utility of our software, we subjected nearly 200 single human B cells to scRNA-seq, assembled the full-length heavy and the light chains, and experimentally confirmed these results by using single-cell primer-based nested PCRs and Sanger sequencing. Availability and Implementation: http://ttic.uchicago.edu/~aakhan/BASIC Contact: aakhan@ttic.edu Supplementary Information: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Profiling/methods; Receptors, Antigen, B-Cell/genetics; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Software; Gene Expression Regulation; Humans
16.
Bioinformatics ; 32(17): i658-i664, 2016 09 01.
Article in English | MEDLINE | ID: mdl-27587686

ABSTRACT

MOTIVATION: As an increasing amount of protein-protein interaction (PPI) data becomes available, their computational interpretation has become an important problem in bioinformatics. The alignment of PPI networks from different species provides valuable information about conserved subnetworks, evolutionary pathways and functional orthologs. Although several methods have been proposed for global network alignment, there is a pressing need for methods that produce more accurate alignments in terms of both topological and functional consistency. RESULTS: In this work, we present a novel global network alignment algorithm, named ModuleAlign, which makes use of local topology information to define a module-based homology score. Based on a hierarchical clustering of functionally coherent proteins involved in the same module, ModuleAlign employs a novel iterative scheme to find the alignment between two networks. Evaluated on a diverse set of benchmarks, ModuleAlign outperforms state-of-the-art methods in producing functionally consistent alignments. By aligning Pathogen-Human PPI networks, ModuleAlign also detects a novel set of conserved human genes that pathogens preferentially target to cause pathogenesis. AVAILABILITY: http://ttic.uchicago.edu/~hashemifar/ModuleAlign.html CONTACT: canzar@ttic.edu or j3xu.ttic.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms; Protein Interaction Mapping; Protein Interaction Maps; Humans; Proteins; Software
17.
Proc IEEE Inst Electr Electron Eng ; 105(3): 436-458, 2017 Mar.
Article in English | MEDLINE | ID: mdl-28502990

ABSTRACT

Ultra-high-throughput next-generation sequencing (NGS) technology allows us to determine the sequence of nucleotides of many millions of DNA molecules in parallel. Accompanied by a dramatic reduction in cost since its introduction in 2004, NGS technology has provided a new way of addressing a wide range of biological and biomedical questions, from the study of human genetic disease to the analysis of gene expression, protein-DNA interactions, and patterns of DNA methylation. The data generated by NGS instruments comprise huge numbers of very short DNA sequences, or 'reads', that carry little information by themselves. These reads therefore have to be pieced together by well-engineered algorithms to reconstruct biologically meaningful measurements, such as the level of expression of a gene. To solve this complex, high-dimensional puzzle, reads must be mapped back to a reference genome to determine their origin. Due to sequencing errors and to genuine differences between the reference genome and the individual being sequenced, this mapping process must be tolerant of mismatches, insertions, and deletions. Although optimal alignment algorithms to solve this problem have long been available, the practical requirements of aligning hundreds of millions of short reads to the 3 billion base pair long human genome have stimulated the development of new, more efficient methods, which today are used routinely throughout the world for the analysis of NGS data.
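To make the notion of mismatch- and indel-tolerant mapping concrete, here is a minimal sketch of a textbook dynamic-programming alignment in which the read must be aligned in full while the reference may be entered and left at any position. It is not one of the efficient methods the abstract refers to. It runs in time proportional to read length times reference length, which is exactly why it does not scale to whole genomes. The function name and unit edit costs are assumptions chosen for illustration.

```python
def map_read(read: str, reference: str) -> tuple:
    """Return (edit_distance, end_position) of the best placement of read.

    Each mismatch, insertion, or deletion costs 1 (illustrative assumption).
    end_position is the index in reference just past the aligned region.
    """
    # prev[j]: fewest edits to align the read prefix so far to some
    # reference substring ending at position j. The empty prefix can
    # start anywhere at zero cost.
    prev = [0] * (len(reference) + 1)
    for i in range(1, len(read) + 1):
        # Aligning read[:i] before any reference base costs i insertions.
        curr = [i] + [0] * len(reference)
        for j in range(1, len(reference) + 1):
            mismatch = 0 if read[i - 1] == reference[j - 1] else 1
            curr[j] = min(
                prev[j - 1] + mismatch,  # match or mismatch
                prev[j] + 1,             # extra base in the read
                curr[j - 1] + 1,         # extra base in the reference
            )
        prev = curr
    best_end = min(range(len(prev)), key=prev.__getitem__)
    return prev[best_end], best_end
```

For example, `map_read("ACGT", "TTACGTTT")` places the read exactly, while a read that differs from the reference by one substituted or missing base is still placed, at a cost of one edit.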

18.
Bioinformatics ; 29(14): 1718-25, 2013 Jul 15.
Article in English | MEDLINE | ID: mdl-23665771

ABSTRACT

MOTIVATION: A large and rapidly growing number of bacterial organisms have been sequenced by the newest sequencing technologies. Cheaper and faster sequencing technologies make it easy to generate very high coverage of bacterial genomes, but these advances mean that DNA preparation costs can exceed the cost of sequencing for small genomes. The need to contain costs often results in the creation of only a single sequencing library, which in turn introduces new challenges for genome assembly methods. RESULTS: We evaluated the ability of multiple genome assembly programs to assemble bacterial genomes from a single, deep-coverage library. For our comparison, we chose bacterial species spanning a wide range of GC content and measured the contiguity and accuracy of the resulting assemblies. We compared the assemblies produced by this very high-coverage, one-library strategy to the best assemblies created by two-library sequencing, and we found that remarkably good bacterial assemblies are possible with just one library. We also measured the effect of read length and depth of coverage on assembly quality and determined the values that provide the best results with current algorithms. CONTACT: salzberg@jhu.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome, Bacterial; Genomics/methods; Software; Algorithms; Gene Library; Sequence Analysis, DNA
19.
Algorithms Mol Biol ; 19(1): 21, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38863064

ABSTRACT

Metric multidimensional scaling is one of the classical methods for embedding data into low-dimensional Euclidean space. It creates the low-dimensional embedding by approximately preserving the pairwise distances between the input points. However, current state-of-the-art approaches only scale to a few thousand data points. For larger data sets such as those occurring in single-cell RNA sequencing experiments, the running time becomes prohibitively large and thus alternative methods such as PCA are widely used instead. Here, we propose a simple neural network-based approach for solving the metric multidimensional scaling problem that is orders of magnitude faster than previous state-of-the-art approaches, and hence scales to data sets with up to a few million cells. At the same time, it provides a non-linear mapping between high- and low-dimensional space that can place previously unseen cells in the same embedding.
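The phrase "approximately preserving the pairwise distances" can be made concrete with the raw stress, a standard objective for metric multidimensional scaling: the sum over all point pairs of the squared difference between the original and the embedded distance. The sketch below only evaluates that quantity for a given embedding. Whether the paper's neural network minimizes exactly this loss is not stated in the abstract, so treat the function and its name as illustrative assumptions.

```python
import math
from itertools import combinations


def raw_stress(high, low):
    """Sum of squared differences between pairwise Euclidean distances.

    high and low are equally long sequences of points (tuples of floats);
    low[i] is the embedding of high[i]. A value of 0 means every pairwise
    distance is preserved exactly.
    """
    if len(high) != len(low):
        raise ValueError("both point sets must have the same length")
    total = 0.0
    for i, j in combinations(range(len(high)), 2):
        d_high = math.dist(high[i], high[j])
        d_low = math.dist(low[i], low[j])
        total += (d_high - d_low) ** 2
    return total
```

Three collinear points in 3D, such as (0,0,0), (3,4,0) and (6,8,0), can be embedded on a line at 0, 5 and 10 with zero stress. Moving the last embedded point to 9 distorts two of the three pairwise distances by one unit each.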

20.
bioRxiv ; 2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38617276

ABSTRACT

Y chromosomes of great apes harbor Ampliconic Genes (YAGs), multi-copy gene families (BPY2, CDY, DAZ, HSFY, PRY, RBMY, TSPY, VCY, and XKRY) that encode proteins important for spermatogenesis. Previous work assembled YAG transcripts based on their targeted sequencing but not using reference genome assemblies, potentially resulting in an incomplete transcript repertoire. Here we used the recently produced gapless telomere-to-telomere (T2T) Y chromosome assemblies of great ape species (bonobo, chimpanzee, human, gorilla, Bornean orangutan, and Sumatran orangutan) and analyzed RNA data from whole-testis samples for the same species. We generated hybrid transcriptome assemblies by combining targeted long reads (Pacific Biosciences), untargeted long reads (Pacific Biosciences) and untargeted short reads (Illumina) and mapping them to the T2T reference genomes. Compared to the results from the reference-free approach, average transcript length was more than twice as high, and the total number of transcripts decreased threefold, improving the quality of the assembled transcriptome. The reference-based transcriptome assemblies allowed us to differentiate transcripts originating from different Y chromosome gene copies and from their non-Y chromosome homologs. We identified two sources of transcriptome diversity: alternative splicing and gene duplication with subsequent diversification of gene copies. For each gene family, we detected transcribed pseudogenes along with protein-coding gene copies. We revealed previously unannotated gene copies of YAGs as compared to currently available NCBI annotations, as well as novel isoforms for annotated gene copies. This analysis paves the way for better understanding Y chromosome gene functions, which is important given their role in spermatogenesis.
