Results 1 - 10 of 10
1.
Nature ; 626(8001): 1084-1093, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38355799

ABSTRACT

The house mouse (Mus musculus) is an exceptional model system, combining genetic tractability with close evolutionary affinity to humans [1,2]. Mouse gestation lasts only 3 weeks, during which the genome orchestrates the astonishing transformation of a single-cell zygote into a free-living pup composed of more than 500 million cells. Here, to establish a global framework for exploring mammalian development, we applied optimized single-cell combinatorial indexing [3] to profile the transcriptional states of 12.4 million nuclei from 83 embryos, precisely staged at 2- to 6-hour intervals spanning late gastrulation (embryonic day 8) to birth (postnatal day 0). From these data, we annotate hundreds of cell types and explore the ontogenesis of the posterior embryo during somitogenesis and of kidney, mesenchyme, retina and early neurons. We leverage the temporal resolution and sampling depth of these whole-embryo snapshots, together with published data [4-8] from earlier timepoints, to construct a rooted tree of cell-type relationships that spans the entirety of prenatal development, from zygote to birth. Throughout this tree, we systematically nominate genes encoding transcription factors and other proteins as candidate drivers of the in vivo differentiation of hundreds of cell types. Remarkably, the most marked temporal shifts in cell states are observed within one hour of birth and presumably underlie the massive physiological adaptations that must accompany the successful transition of a mammalian fetus to life outside the womb.


Subjects
Animals, Newborn , Embryo, Mammalian , Embryonic Development , Gastrula , Single-Cell Analysis , Time-Lapse Imaging , Animals , Female , Mice , Pregnancy , Animals, Newborn/embryology , Animals, Newborn/genetics , Cell Differentiation/genetics , Embryo, Mammalian/cytology , Embryo, Mammalian/embryology , Embryonic Development/genetics , Gastrula/cytology , Gastrula/embryology , Gastrulation/genetics , Kidney/cytology , Kidney/embryology , Mesoderm/cytology , Mesoderm/enzymology , Neurons/cytology , Neurons/metabolism , Retina/cytology , Retina/embryology , Somites/cytology , Somites/embryology , Time Factors , Transcription Factors/genetics , Transcription, Genetic , Organ Specificity/genetics
2.
Genome Res ; 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38849157

ABSTRACT

Long-read DNA sequencing has recently emerged as a powerful tool for studying both genetic and epigenetic architectures at single-molecule and single-nucleotide resolution. Long-read epigenetic studies encompass both the direct identification of native cytosine methylation as well as the identification of exogenously placed DNA N6-methyladenine (DNA-m6A). However, detecting DNA-m6A modifications using single-molecule sequencing, as well as coprocessing single-molecule genetic and epigenetic architectures, is limited by computational demands and a lack of supporting tools. Here, we introduce fibertools, a state-of-the-art toolkit that features a semisupervised convolutional neural network for fast and accurate identification of m6A-marked bases using PacBio single-molecule long-read sequencing, as well as the coprocessing of long-read genetic and epigenetic data produced using either PacBio or Oxford Nanopore sequencing platforms. We demonstrate accurate DNA-m6A identification (>90% precision and recall) along >20 kilobase long DNA molecules with a ~1,000-fold improvement in speed. In addition, we demonstrate that fibertools can readily integrate genetic and epigenetic data at single-molecule resolution, including the seamless conversion between molecular and reference coordinate systems, allowing for accurate genetic and epigenetic analyses of long-read data within structurally and somatically variable genomic regions.

3.
Bioinformatics ; 40(Supplement_1): i410-i417, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940129

ABSTRACT

MOTIVATION: One of the core problems in the analysis of protein tandem mass spectrometry data is the peptide assignment problem: determining, for each observed spectrum, the peptide sequence that was responsible for generating the spectrum. Two primary classes of methods are used to solve this problem: database search and de novo peptide sequencing. State-of-the-art methods for de novo sequencing use machine learning methods, whereas most database search engines use hand-designed score functions to evaluate the quality of a match between an observed spectrum and a candidate peptide from the database. We hypothesized that machine learning models for de novo sequencing implicitly learn a score function that captures the relationship between peptides and spectra, and thus may be re-purposed as a score function for database search. Because this score function is trained from massive amounts of mass spectrometry data, it could potentially outperform existing, hand-designed database search tools. RESULTS: To test this hypothesis, we re-engineered Casanovo, which has been shown to provide state-of-the-art de novo sequencing capabilities, to assign scores to given peptide-spectrum pairs. We then evaluated the statistical power of this Casanovo score function, Casanovo-DB, to detect peptides on a benchmark of three mass spectrometry runs from three different species. In addition, we show that re-scoring with the Percolator post-processor benefits Casanovo-DB more than other score functions, further increasing the number of detected peptides.


Subjects
Databases, Protein , Peptides , Peptides/chemistry , Machine Learning , Mass Spectrometry/methods , Algorithms , Sequence Analysis, Protein/methods , Tandem Mass Spectrometry/methods
4.
Proteomics ; 24(8): e2300084, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38380501

ABSTRACT

Assigning statistical confidence estimates to discoveries produced by a tandem mass spectrometry proteomics experiment is critical to enabling principled interpretation of the results and assessing the cost/benefit ratio of experimental follow-up. The most common technique for computing such estimates is to use target-decoy competition (TDC), in which observed spectra are searched against a database of real (target) peptides and a database of shuffled or reversed (decoy) peptides. TDC procedures for estimating the false discovery rate (FDR) at a given score threshold have been developed for application at the level of spectra, peptides, or proteins. Although these techniques are relatively straightforward to implement, it is common in the literature to skip over the implementation details or even to make mistakes in how the TDC procedures are applied in practice. Here we present Crema, an open-source Python tool that implements several TDC methods of spectrum-, peptide- and protein-level FDR estimation. Crema is compatible with a variety of existing database search tools and provides a straightforward way to obtain robust FDR estimates.


Subjects
Algorithms , Peptides , Databases, Protein , Peptides/chemistry , Proteins/analysis , Proteomics/methods
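
The target-decoy competition (TDC) procedure summarized in record 4 can be illustrated with a minimal Python sketch. This is not Crema's API; the function name `tdc_qvalues` and the input layout are hypothetical. It assumes each spectrum has already been reduced to its single best match (target or decoy), that higher scores are better, and it uses the commonly applied conservative estimate (decoys + 1) / targets.

```python
from typing import List, Tuple


def tdc_qvalues(psms: List[Tuple[float, bool]]) -> List[Tuple[float, bool, float]]:
    """Estimate q-values from target-decoy competition winners.

    psms: (score, is_decoy) pairs, one per spectrum, higher score = better.
    Returns (score, is_decoy, q_value) sorted by decreasing score.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # Walk down the ranked list and estimate the FDR at each score threshold.
    fdrs = []
    targets = 0
    decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Assumption: the "+1" correction is one common conservative variant.
        fdrs.append(min(1.0, (decoys + 1) / max(targets, 1)))

    # The q-value is the smallest FDR at this threshold or any looser one.
    qvalues = [0.0] * len(ranked)
    running_min = 1.0
    for i in range(len(ranked) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min

    return [(s, d, q) for (s, d), q in zip(ranked, qvalues)]


# Toy input, invented for illustration: four targets and one decoy.
example = [(10.0, False), (9.0, False), (8.0, False), (7.0, True), (6.0, False)]
result = tdc_qvalues(example)
```

Accepting every target whose q-value falls below a chosen cutoff (for example 0.01) then gives the list of discoveries at that estimated FDR.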
5.
J Proteome Res ; 23(6): 1907-1914, 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38687997

ABSTRACT

Traditional database search methods for the analysis of bottom-up proteomics tandem mass spectrometry (MS/MS) data are limited in their ability to detect peptides with post-translational modifications (PTMs). Recently, "open modification" database search strategies, in which the requirement that the mass of the database peptide closely matches the observed precursor mass is relaxed, have become popular as ways to find a wider variety of types of PTMs. Indeed, in one study, Kong et al. reported that the open modification search tool MSFragger can achieve higher statistical power to detect peptides than a traditional "narrow window" database search. We investigated this claim empirically and, in the process, uncovered a potential general problem with false discovery rate (FDR) control in the machine learning postprocessors Percolator and PeptideProphet. This problem might have contributed to Kong et al.'s report that their empirical results suggest that FDR control in the narrow window setting might generally be compromised. Indeed, reanalyzing the same data while using a more standard form of target-decoy competition-based FDR control, we found that, after accounting for chimeric spectra as well as for the inherent difference in the number of candidates in open and narrow searches, the data do not provide sufficient evidence that FDR control in proteomics MS/MS database search is inherently problematic.


Subjects
Databases, Protein , Protein Processing, Post-Translational , Proteomics , Tandem Mass Spectrometry , Tandem Mass Spectrometry/methods , Proteomics/methods , Peptides/analysis , Peptides/chemistry , Machine Learning , Humans , Algorithms , Software
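
The "inherent difference in the number of candidates in open and narrow searches" mentioned in record 5 can be made concrete with a small Python sketch. It is not taken from MSFragger or any other tool; the peptide names, masses, tolerances, and the function `candidate_peptides` are all invented for illustration. It only shows how relaxing the precursor mass tolerance enlarges the set of database peptides scored against one spectrum.

```python
from typing import Dict, List

# Hypothetical peptide names and masses (Da), invented for illustration only.
TOY_DATABASE: Dict[str, float] = {
    "PEPTIDE_A": 1000.50,
    "PEPTIDE_B": 1000.52,
    "PEPTIDE_C": 1080.47,
    "PEPTIDE_D": 1500.00,
}


def candidate_peptides(
    precursor_mass: float, database: Dict[str, float], tolerance_da: float
) -> List[str]:
    """Return database peptides whose mass lies within the precursor window."""
    return sorted(
        name
        for name, mass in database.items()
        if abs(mass - precursor_mass) <= tolerance_da
    )


# Narrow-window search: only peptides very close to the observed mass.
narrow = candidate_peptides(1000.51, TOY_DATABASE, tolerance_da=0.05)

# Open search: a wide window admits peptides whose mass differs,
# for example because of an unspecified modification.
wide = candidate_peptides(1000.51, TOY_DATABASE, tolerance_da=500.0)
```

Because the open search scores each spectrum against more candidates, its best-match score distribution differs from that of the narrow search, which is one reason the two settings cannot be compared without accounting for candidate counts.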
6.
bioRxiv ; 2024 Apr 14.
Article in English | MEDLINE | ID: mdl-38645064

ABSTRACT

Over the past 15 years, a variety of next-generation sequencing assays have been developed for measuring the 3D conformation of DNA in the nucleus. Each of these assays gives, for a particular cell or tissue type, a distinct picture of 3D chromatin architecture. Accordingly, making sense of the relationship between genome structure and function requires teasing apart two closely related questions: how does chromatin 3D structure change from one cell type to the next, and how do different measurements of that structure differ from one another, even when the two assays are carried out in the same cell type? In this work, we assemble a collection of chromatin 3D datasets, each represented as a 2D contact map, spanning multiple assay types and cell types. We then build a machine learning model that predicts missing contact maps in this collection. We use the model to systematically explore how genome 3D architecture changes, at the level of compartments, domains, and loops, between cell types and between assay types.

7.
bioRxiv ; 2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38496477

ABSTRACT

The emergence of single-cell time-series datasets enables modeling of changes in various types of cellular profiles over time. However, due to the disruptive nature of single-cell measurements, it is impossible to capture the full temporal trajectory of a particular cell. Furthermore, single-cell profiles can be collected at mismatched time points across different conditions (e.g., sex, batch, disease) and data modalities (e.g., scRNA-seq, scATAC-seq), which makes modeling challenging. Here we propose a joint modeling framework, Sunbear, for integrating multi-condition and multi-modal single-cell profiles across time. Sunbear can be used to impute single-cell temporal profile changes, align multi-dataset and multi-modal profiles across time, and extrapolate single-cell profiles in a missing modality. We applied Sunbear to reveal sex-biased transcription during mouse embryonic development and predict dynamic relationships between epigenetic priming and transcription for cells in which multi-modal profiles are unavailable. Sunbear thus enables the projection of single-cell time-series snapshots to multi-modal and multi-condition views of cellular trajectories.

8.
Nat Commun ; 15(1): 6427, 2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39080256

ABSTRACT

A fundamental challenge in mass spectrometry-based proteomics is the identification of the peptide that generated each acquired tandem mass spectrum. Approaches that leverage known peptide sequence databases cannot detect unexpected peptides and can be impractical or impossible to apply in some settings. Thus, the ability to assign peptide sequences to tandem mass spectra without prior information (de novo peptide sequencing) is valuable for tasks including antibody sequencing, immunopeptidomics, and metaproteomics. Although many methods have been developed to address this problem, it remains an outstanding challenge in part due to the difficulty of modeling the irregular data structure of tandem mass spectra. Here, we describe Casanovo, a machine learning model that uses a transformer neural network architecture to translate the sequence of peaks in a tandem mass spectrum into the sequence of amino acids that comprise the generating peptide. We train a Casanovo model from 30 million labeled spectra and demonstrate that the model outperforms several state-of-the-art methods on a cross-species benchmark dataset. We also develop a version of Casanovo that is fine-tuned for non-enzymatic peptides. Finally, we demonstrate that Casanovo's superior performance improves the analysis of immunopeptidomics and metaproteomics experiments and allows us to delve deeper into the dark proteome.


Subjects
Peptides , Proteomics , Tandem Mass Spectrometry , Peptides/chemistry , Peptides/metabolism , Tandem Mass Spectrometry/methods , Proteomics/methods , Neural Networks, Computer , Machine Learning , Humans , Amino Acid Sequence , Sequence Analysis, Protein/methods , Databases, Protein , Algorithms
9.
iScience ; 27(5): 109570, 2024 May 17.
Article in English | MEDLINE | ID: mdl-38646172

ABSTRACT

The three-dimensional organization of genomes plays a crucial role in essential biological processes. The segregation of chromatin into A and B compartments highlights regions of activity and inactivity, providing a window into the genomic activities specific to each cell type. Yet, the steep costs associated with acquiring Hi-C data, necessary for studying this compartmentalization across various cell types, pose a significant barrier to studying cell-type-specific genome organization. To address this, we present a prediction tool called compartment prediction using recurrent neural networks (CoRNN), which predicts compartmentalization of the 3D genome using histone modification enrichment. CoRNN demonstrates robust cross-cell-type prediction of A/B compartments with an average AuROC of 90.9%. Cell-type-specific predictions align well with known functional elements, with H3K27ac and H3K36me3 identified as highly predictive histone marks. We further investigated our mispredictions and found that they are located in regions with ambiguous compartmental status. Furthermore, our model's generalizability is validated by predicting compartments in independent tissue samples, which underscores its broad applicability.
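
For readers unfamiliar with the AuROC metric reported in record 9, the following Python sketch shows one standard way to compute it: as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The function name `auroc` and the toy labels and scores are hypothetical; here a label of 1 is assumed to mean compartment A and 0 compartment B. This is not code from CoRNN.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("Need at least one positive and one negative example.")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy data, invented for illustration: two bins in A, two bins in B.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auroc = auroc(toy_labels, toy_scores)
```

The pairwise form is quadratic in the number of examples, so it suits small illustrations; rank-based implementations are the usual choice for genome-wide data.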
