Results 1 - 20 of 72
1.
PLoS Biol ; 22(9): e3002813, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39348416

ABSTRACT

Mycobacterium tuberculosis (Mtb) releases the unusual terpene nucleoside 1-tuberculosinyladenosine (1-TbAd) to block lysosomal function and promote survival in human macrophages. Using conventional approaches, we found that genes Rv3377c and Rv3378c, but not Rv3376, were necessary for 1-TbAd biosynthesis. Here, we introduce linear models for mass spectrometry (limms) software as a next-generation lipidomics tool to study the essential functions of lipid biosynthetic enzymes on a whole-cell basis. Using limms, whole-cell lipid profiles deepened the phenotypic landscape of comparative mass spectrometry experiments and identified a large family of approximately 100 terpene nucleoside metabolites downstream of Rv3378c. We validated the identity of previously unknown adenine-, adenosine-, and lipid-modified tuberculosinol-containing molecules using synthetic chemistry and collisional mass spectrometry, including comprehensive profiling of bacterial lipids that fragment to adenine. We tracked terpene nucleoside genotypes and lipid phenotypes among Mycobacterium tuberculosis complex (MTC) species that did or did not evolve to productively infect either human or nonhuman mammals. Although 1-TbAd biosynthesis genes were thought to be restricted to the MTC, we identified the locus in unexpected species outside the MTC. Sequence analysis of the locus showed nucleotide usage characteristic of plasmids from plant-associated bacteria, clarifying the origin and timing of horizontal gene transfer to a pre-MTC progenitor. The data demonstrated correlation between high level terpene nucleoside biosynthesis and mycobacterial competence for human infection, and 2 mechanisms of 1-TbAd biosynthesis loss. Overall, the selective gain and evolutionary retention of tuberculosinyl metabolites in modern species that cause human TB suggest a role in human TB disease, and the newly discovered molecules represent candidate disease-specific biomarkers.


Subjects
Mycobacterium tuberculosis, Nucleosides, Terpenes, Tuberculosis, Mycobacterium tuberculosis/metabolism, Mycobacterium tuberculosis/genetics, Tuberculosis/microbiology, Terpenes/metabolism, Humans, Nucleosides/metabolism, Adenosine/metabolism, Adenosine/analogs & derivatives, Lipidomics/methods, Mass Spectrometry, Bacterial Proteins/metabolism, Bacterial Proteins/genetics, Bacterial Genes, Lipids
2.
Nature ; 598(7879): 103-110, 2021 10.
Article in English | MEDLINE | ID: mdl-34616066

ABSTRACT

Single-cell transcriptomics can provide quantitative molecular signatures for large, unbiased samples of the diverse cell types in the brain [1-3]. With the proliferation of multi-omics datasets, a major challenge is to validate and integrate results into a biological understanding of cell-type organization. Here we generated transcriptomes and epigenomes from more than 500,000 individual cells in the mouse primary motor cortex, a structure that has an evolutionarily conserved role in locomotion. We developed computational and statistical methods to integrate multimodal data and quantitatively validate cell-type reproducibility. The resulting reference atlas, containing over 56 neuronal cell types that are highly replicable across analysis methods, sequencing technologies and modalities, is a comprehensive molecular and genomic account of the diverse neuronal and non-neuronal cell types in the mouse primary motor cortex. The atlas includes a population of excitatory neurons that resemble pyramidal cells in layer 4 in other cortical regions [4]. We further discovered thousands of concordant marker genes and gene regulatory elements for these cell types. Our results highlight the complex molecular regulation of cell types in the brain and will directly enable the design of reagents to target specific cell types in the mouse primary motor cortex for functional analysis.


Subjects
Epigenomics, Gene Expression Profiling, Motor Cortex/cytology, Neurons/classification, Single-Cell Analysis, Transcriptome, Animals, Atlases as Topic, Datasets as Topic, Genetic Epigenesis, Female, Male, Mice, Motor Cortex/anatomy & histology, Neurons/cytology, Neurons/metabolism, Organ Specificity, Reproducibility of Results
3.
BMC Bioinformatics ; 25(1): 198, 2024 May 24.
Article in English | MEDLINE | ID: mdl-38789920

ABSTRACT

BACKGROUND: Single-cell transcriptome sequencing (scRNA-Seq) has allowed new types of investigations at unprecedented levels of resolution. Among the primary goals of scRNA-Seq is the classification of cells into distinct types. Many approaches build on existing clustering literature to develop tools specific to single-cell. However, almost all of these methods rely on heuristics or user-supplied parameters to control the number of clusters. This affects both the resolution of the clusters within the original dataset as well as their replicability across datasets. While many recommendations exist, in general, there is little assurance that any given set of parameters will represent an optimal choice in the trade-off between cluster resolution and replicability. For instance, another set of parameters may result in more clusters that are also more replicable. RESULTS: Here, we propose Dune, a new method for optimizing the trade-off between the resolution of the clusters and their replicability. Our method takes as input a set of clustering results (or partitions) on a single dataset and iteratively merges clusters within each partition in order to maximize their concordance between partitions. As demonstrated on multiple datasets from different platforms, Dune outperforms existing techniques that rely on hierarchical merging for reducing the number of clusters, in terms of replicability of the resultant merged clusters as well as concordance with ground truth. Dune is available as an R package on Bioconductor (https://www.bioconductor.org/packages/release/bioc/html/Dune.html). CONCLUSIONS: Cluster refinement by Dune helps improve the robustness of any clustering analysis and reduces the reliance on tuning parameters. This method provides an objective approach for borrowing information across multiple clusterings to generate replicable clusters most likely to represent common biological features across multiple datasets.


Subjects
RNA-Seq, Single-Cell Analysis, Software, Single-Cell Analysis/methods, RNA-Seq/methods, Cluster Analysis, Algorithms, RNA Sequence Analysis/methods, Humans, Transcriptome/genetics, Reproducibility of Results, Gene Expression Profiling/methods, Single-Cell Gene Expression Analysis
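
The merging idea in the Dune abstract (record 3) can be illustrated with a small sketch. It is not the Dune R package: it assumes the adjusted Rand index (ARI) as the concordance measure, uses a plain greedy search, and the function names `adjusted_rand_index`, `mean_ari` and `merge_to_concordance` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(a, b):
    """ARI between two label vectors of equal length (at least 2 cells)."""
    n = len(a)
    index = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (index - expected) / (max_index - expected)


def mean_ari(partitions):
    """Average ARI over all pairs of partitions (needs at least two)."""
    pairs = list(combinations(partitions, 2))
    return sum(adjusted_rand_index(a, b) for a, b in pairs) / len(pairs)


def merge_to_concordance(partitions):
    """Greedily merge the cluster pair that most improves the mean ARI."""
    partitions = [list(p) for p in partitions]
    best = mean_ari(partitions)
    while True:
        candidate = None
        for i, p in enumerate(partitions):
            for keep, drop in combinations(sorted(set(p)), 2):
                merged = [keep if x == drop else x for x in p]
                trial = partitions[:i] + [merged] + partitions[i + 1:]
                score = mean_ari(trial)
                if score > best + 1e-12:
                    best, candidate = score, trial
        if candidate is None:  # no merge improves concordance: stop
            return partitions, best
        partitions = candidate


# Toy input: the first partition over-splits a group that the second keeps whole.
parts, score = merge_to_concordance([[0, 0, 1, 1, 2, 2], [0, 0, 0, 0, 1, 1]])
print(parts[0], score)
```

In this toy input, merging clusters 0 and 1 of the first partition makes the two partitions identical up to relabeling, so the search stops there.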
4.
Biostatistics ; 24(4): 1085-1105, 2023 10 18.
Article in English | MEDLINE | ID: mdl-35861622

ABSTRACT

An endeavor central to precision medicine is the discovery of predictive biomarkers, which define patient subpopulations that stand to benefit most, or least, from a given treatment. The identification of these biomarkers is often the byproduct of the related but fundamentally different task of treatment rule estimation. Using treatment rule estimation methods to identify predictive biomarkers in clinical trials where the number of covariates exceeds the number of participants often results in high false discovery rates. The higher than expected number of false positives translates to wasted resources when conducting follow-up experiments for drug target identification and diagnostic assay development. Patient outcomes are in turn negatively affected. We propose a variable importance parameter for directly assessing the importance of potentially predictive biomarkers and develop a flexible nonparametric inference procedure for this estimand. We prove that our estimator is doubly robust and asymptotically linear under loose conditions in the data-generating process, permitting valid inference about the importance metric. The statistical guarantees of the method are verified in a thorough simulation study representative of randomized controlled trials with moderate and high-dimensional covariate vectors. Our procedure is then used to discover predictive biomarkers from among the tumor gene expression data of metastatic renal cell carcinoma patients enrolled in recently completed clinical trials. We find that our approach more readily discerns predictive from nonpredictive biomarkers than procedures whose primary purpose is treatment rule estimation. An open-source software implementation of the methodology, the uniCATE R package, is briefly introduced.


Subjects
Biomedical Research, Renal Cell Carcinoma, Kidney Neoplasms, Humans, Renal Cell Carcinoma/diagnosis, Renal Cell Carcinoma/genetics, Kidney Neoplasms/diagnosis, Kidney Neoplasms/genetics, Biomarkers, Computer Simulation
5.
Bioinformatics ; 38(Suppl 1): i36-i44, 2022 06 24.
Article in English | MEDLINE | ID: mdl-35758804

ABSTRACT

MOTIVATION: Genome-wide association studies (GWAS), aiming to find genetic variants associated with a trait, have widely been used on bacteria to identify genetic determinants of drug resistance or hypervirulence. Recent bacterial GWAS methods usually rely on k-mers, whose presence in a genome can denote variants ranging from single-nucleotide polymorphisms to mobile genetic elements. This approach does not require a reference genome, making it easier to account for accessory genes. However, the same gene can exist in slightly different versions across strains, leading to diluted effects. RESULTS: Here, we overcome this issue by testing covariates built from closed connected subgraphs (CCSs) of the de Bruijn graph defined over genomic k-mers. These covariates capture polymorphic genes as a single entity, improving k-mer-based GWAS both in terms of power and interpretability. However, a method naively testing all possible subgraphs would be powerless due to multiple testing corrections, and the mere exploration of these subgraphs would quickly become computationally intractable. The concept of testable hypotheses has successfully been used to address both problems in similar contexts. We leverage this concept to test all CCSs by proposing a novel enumeration scheme for these objects which fully exploits the pruning opportunity offered by testability, resulting in drastic improvements in computational efficiency. Our method integrates with existing visual tools to facilitate interpretation. AVAILABILITY AND IMPLEMENTATION: We provide an implementation of our method, as well as code to reproduce all results at https://github.com/HectorRDB/Caldera_ISMB. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genome-Wide Association Study, Software, Algorithms, Bacteria/genetics, DNA Sequence Analysis/methods
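
As a rough illustration of the objects named in record 5, the sketch below builds a de Bruijn graph over the k-mers of a few toy genomes and records which genomes contain each k-mer. It is not the authors' implementation: reverse complements, graph compaction, subgraph enumeration and the testability-based pruning are all left out, and the function names `kmers`, `de_bruijn_graph` and `presence_patterns` are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """All overlapping substrings of length k, in order of appearance."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def de_bruijn_graph(genomes, k):
    """Map each k-mer to the set of k-mers that follow it in some genome.

    Consecutive k-mers of a sequence overlap by k-1 characters, which is
    the edge rule of a de Bruijn graph.
    """
    edges = defaultdict(set)
    for genome in genomes:
        ks = kmers(genome, k)
        for km in ks:
            edges[km]  # register the node even if it has no successor
        for a, b in zip(ks, ks[1:]):
            edges[a].add(b)
    return dict(edges)


def presence_patterns(genomes, k):
    """For each k-mer, a tuple saying which genomes contain it.

    Each tuple is a per-k-mer covariate of the kind a k-mer GWAS would test.
    """
    sets = [set(kmers(g, k)) for g in genomes]
    all_kmers = set().union(*sets)
    return {km: tuple(km in s for s in sets) for km in all_kmers}


genomes = ["ACGTA", "ACGGA"]  # toy strains that differ at one position
graph = de_bruijn_graph(genomes, 3)
patterns = presence_patterns(genomes, 3)
print(sorted(graph["ACG"]), patterns["CGT"])
```

In the toy input the shared k-mer `ACG` branches into two successors, one per strain. This is the local structure that grouping k-mers into connected subgraphs is meant to capture as a single polymorphic unit.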
6.
Bioinformatics ; 36(11): 3422-3430, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32176249

ABSTRACT

MOTIVATION: Statistical analyses of high-throughput sequencing data have re-shaped the biological sciences. In spite of myriad advances, recovering interpretable biological signal from data corrupted by technical noise remains a prevalent open problem. Several classes of procedures, among them classical dimensionality reduction techniques and others incorporating subject-matter knowledge, have provided effective advances. However, no procedure currently satisfies the dual objectives of recovering stable and relevant features simultaneously. RESULTS: Inspired by recent proposals for making use of control data in the removal of unwanted variation, we propose a variant of principal component analysis (PCA), sparse contrastive PCA (scPCA), which extracts sparse, stable, interpretable and relevant biological signal. The new methodology is compared to competing dimensionality reduction approaches through a simulation study and via analyses of several publicly available protein expression, microarray gene expression and single-cell transcriptome sequencing datasets. AVAILABILITY AND IMPLEMENTATION: A free and open-source software implementation of the methodology, the scPCA R package, is made available via the Bioconductor Project. Code for all analyses presented in this article is also available via GitHub. CONTACT: philippe_boileau@berkeley.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
High-Throughput Nucleotide Sequencing, Software, Principal Component Analysis
7.
Chem Res Toxicol ; 34(12): 2549-2557, 2021 12 20.
Article in English | MEDLINE | ID: mdl-34788011

ABSTRACT

We previously developed an adductomics pipeline that employed nanoflow liquid chromatography and high-resolution tandem mass spectrometry (nLC-HR-MS/MS) plus informatics to perform an untargeted detection of modifications to Cys34 in the tryptic T3 peptide of human serum albumin (HSA) (21ALVLIAFAQYLQQC34PFEDHVK41). In order to detect these peptide modifications without targeting specific masses, the pipeline interrogates MS2 ions that are signatures of the T3 peptide. The pipeline had been pilot-tested with archived plasma from healthy human subjects, and several of the 43 Cys34 adducts were highly associated with the smoking status. In the current investigation, we adapted the pipeline to include modifications to the ε-amino group of Lys525 (a major glycation site in HSA) and thereby extend the coverage to products of Schiff bases that cannot be produced at Cys34. Because trypsin is generally unable to digest proteins at modified lysines, our pipeline detects miscleaved tryptic peptides with the sequence 525KQTALVELVK534. Adducts of both Lys525 and Cys34 are measured in a single nLC-HR-MS/MS run by increasing the mass range of precursor ions in MS1 scans and including both triply and doubly charged precursor ions for collision-induced dissociation fragmentation. For proof of principle, we applied the Cys34/Lys525 pipeline to archived plasma specimens from a subset of the same volunteer subjects used in the original investigation. Twelve modified Lys525 peptides were detected, including products of glycation (fructosyl-lysine plus advanced-glycated-end products), acetylation, and elimination of ammonia and water. Surprisingly, the carbamylated and glycated adducts were present at significantly lower levels in smoking subjects. By including a larger class of in vivo nucleophilic substitution reactions, the Cys34/Lys525 adductomics pipeline expands exposomic investigations of unknown human exposure to reactive electrophiles derived from both exogenous and endogenous sources.


Subjects
Cysteine/chemistry, Lysine/chemistry, Human Serum Albumin/chemistry, Cysteine/blood, Healthy Volunteers, Humans, Lysine/blood, Male, Molecular Models, Peptides/blood, Peptides/chemistry
8.
Nat Methods ; 14(6): 565-571, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28504683

ABSTRACT

Single-cell transcriptomics is becoming an important component of the molecular biologist's toolkit. A critical step when analyzing data generated using this technology is normalization. However, normalization is typically performed using methods developed for bulk RNA sequencing or even microarray data, and the suitability of these methods for single-cell transcriptomics has not been assessed. We here discuss commonly used normalization approaches and illustrate how these can produce misleading results. Finally, we present alternative approaches and provide recommendations for single-cell RNA sequencing users.


Subjects
Algorithms, High-Throughput Nucleotide Sequencing/standards, RNA/genetics, RNA Sequence Analysis/standards, Single-Cell Analysis/standards, Transcriptome/genetics, Statistical Data Interpretation, High-Throughput Nucleotide Sequencing/methods, Reference Values
9.
BMC Bioinformatics ; 20(1): 334, 2019 Jun 14.
Article in English | MEDLINE | ID: mdl-31200644

ABSTRACT

BACKGROUND: Untargeted metabolomics datasets contain large proportions of uninformative features that can impede subsequent statistical analysis such as biomarker discovery and metabolic pathway analysis. Thus, there is a need for versatile and data-adaptive methods for filtering data prior to investigating the underlying biological phenomena. Here, we propose a data-adaptive pipeline for filtering metabolomics data that are generated by liquid chromatography-mass spectrometry (LC-MS) platforms. Our data-adaptive pipeline includes novel methods for filtering features based on blank samples, proportions of missing values, and estimated intra-class correlation coefficients. RESULTS: Using metabolomics datasets that were generated in our laboratory from samples of human blood, as well as two public LC-MS datasets, we compared our data-adaptive filtering method with traditional methods that rely on non-method-specific thresholds. The data-adaptive approach outperformed traditional approaches in terms of removing noisy features and retaining high-quality, biologically informative ones. The R code for running the data-adaptive filtering method is provided at https://github.com/courtneyschiffman/Metabolomics-Filtering. CONCLUSIONS: Our proposed data-adaptive filtering pipeline is intuitive and effectively removes uninformative features from untargeted metabolomics datasets. It is particularly relevant for interrogation of biological phenomena in data derived from complex matrices associated with biospecimens.


Subjects
Metabolomics/methods, Tandem Mass Spectrometry/methods, Liquid Chromatography, Colorectal Neoplasms/metabolism, Databases as Topic, Humans, Metabolic Networks and Pathways
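
One of the three filtering criteria named in record 9 is the intra-class correlation coefficient (ICC). The sketch below computes a textbook one-way ICC from replicate measurements and keeps features above a cutoff. It is an assumption-laden illustration, not the authors' pipeline: the abstract does not state which ICC estimator is used, the fixed `threshold` argument stands in for their data-adaptive cutoff, and the names `icc_oneway` and `filter_by_icc` are hypothetical.

```python
def icc_oneway(measurements):
    """One-way ICC for one feature.

    `measurements` is a list with one entry per subject; each entry is a
    list of k replicate intensities (same k for every subject, k >= 2,
    at least two subjects).
    """
    n = len(measurements)
    k = len(measurements[0])
    subject_means = [sum(row) / k for row in measurements]
    grand_mean = sum(subject_means) / n
    # Between-subject and within-subject mean squares from one-way ANOVA.
    msb = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    msw = sum(
        (x - m) ** 2 for row, m in zip(measurements, subject_means) for x in row
    ) / (n * (k - 1))
    denom = msb + (k - 1) * msw
    return 0.0 if denom == 0 else (msb - msw) / denom


def filter_by_icc(features, threshold):
    """Keep features whose ICC reaches `threshold` (a fixed, assumed cutoff)."""
    return [name for name, rows in features.items() if icc_oneway(rows) >= threshold]


features = {
    "stable": [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],  # replicates agree
    "noisy": [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]],   # all variation is within subject
}
print(filter_by_icc(features, threshold=0.5))
```

A feature whose replicates agree within each subject but differ between subjects scores near 1. A feature dominated by replicate noise scores near or below 0 and is dropped.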
10.
PLoS Comput Biol ; 14(9): e1006378, 2018 09.
Article in English | MEDLINE | ID: mdl-30180157

ABSTRACT

Clustering of genes and/or samples is a common task in gene expression analysis. The goals in clustering can vary, but an important scenario is that of finding biologically meaningful subtypes within the samples. This is an application that is particularly appropriate when there are large numbers of samples, as in many human disease studies. With the increasing popularity of single-cell transcriptome sequencing (RNA-Seq), many more controlled experiments on model organisms are similarly creating large gene expression datasets with the goal of detecting previously unknown heterogeneity within cells. It is common in the detection of novel subtypes to run many clustering algorithms, as well as rely on subsampling and ensemble methods to improve robustness. We introduce a Bioconductor R package, clusterExperiment, that implements a general and flexible strategy we call Resampling-based Sequential Ensemble Clustering (RSEC). RSEC enables the user to easily create multiple, competing clusterings of the data based on different techniques and associated tuning parameters, including easy integration of resampling and sequential clustering, and then provides methods for consolidating the multiple clusterings into a final consensus clustering. The package is modular and allows the user to separately apply the individual components of the RSEC procedure, i.e., apply multiple clustering algorithms, create a consensus clustering or choose tuning parameters, and merge clusters. Additionally, clusterExperiment provides a variety of visualization tools for the clustering process, as well as methods for the identification of possible cluster signatures or biomarkers. The R package clusterExperiment is publicly available through the Bioconductor Project, with a detailed manual (vignette) and well-documented help pages for each function.


Subjects
Computational Biology/methods, Gene Expression Profiling, Gene Expression Regulation, Hypothalamus/physiology, Olfactory Mucosa/physiology, Algorithms, Animals, Astrocytes/physiology, Biomarkers, Cluster Analysis, Factual Databases, Humans, Microglia/physiology, Multigene Family, Neurons/physiology, Oligodendroglia/physiology, Programming Languages, RNA Sequence Analysis, Software
11.
Anal Bioanal Chem ; 411(11): 2351-2362, 2019 Apr.
Article in English | MEDLINE | ID: mdl-30783713

ABSTRACT

Metabolism of chemicals from the diet, exposures to xenobiotics, the microbiome, and lifestyle factors (e.g., smoking, alcohol intake) produce electrophiles that react with nucleophilic sites in circulating proteins, notably Cys34 of human serum albumin (HSA). To discover potential risk factors resulting from in utero exposures, we are investigating HSA-Cys34 adducts in archived newborn dried blood spots (DBS) that reflect systemic exposures during the last month of gestation. The workflow includes extraction of proteins from DBS, measurement of hemoglobin (Hb) to normalize for blood volume, addition of methanol to enrich HSA by precipitation of Hb and other interfering proteins, digestion with trypsin, and detection of HSA-Cys34 adducts via nanoflow liquid chromatography-high-resolution mass spectrometry. As proof-of-principle, we applied the method to 49 archived DBS collected from newborns whose mothers either actively smoked during pregnancy or were nonsmokers. Twenty-six HSA-Cys34 adducts were detected, including Cys34 oxidation products, mixed disulfides with low molecular weight thiols (e.g., cysteine, homocysteine, glutathione, cysteinylglycine), and other modifications. Data were normalized with a novel method ("scone") to remove unwanted technical variation arising from HSA digestion, blood volume, DBS age, mass spectrometry analysis, and batch effects. Using an ensemble of linear and nonlinear models, the Cys34 adduct of cyanide was found to consistently discriminate between newborns of smoking and nonsmoking mothers with a mean fold change (smoking/nonsmoking) of 1.31. These results indicate that DBS adductomics is suitable for investigating in utero exposures to reactive chemicals and metabolites that may influence disease risks later in life.


Subjects
Cysteine/analysis, Dried Blood Spot Testing/methods, Human Serum Albumin/chemistry, Tandem Mass Spectrometry/methods, High-Pressure Liquid Chromatography/methods, Female, Humans, Newborn Infant, Maternal Exposure/adverse effects, Oxidation-Reduction, Pregnancy, Prenatal Exposure Delayed Effects/blood, Smoking/adverse effects, Smoking/blood
12.
BMC Genomics ; 19(1): 477, 2018 Jun 19.
Article in English | MEDLINE | ID: mdl-29914354

ABSTRACT

BACKGROUND: Single-cell transcriptomics allows researchers to investigate complex communities of heterogeneous cells. It can be applied to stem cells and their descendants in order to chart the progression from multipotent progenitors to fully differentiated cells. While a variety of statistical and computational methods have been proposed for inferring cell lineages, the problem of accurately characterizing multiple branching lineages remains difficult to solve. RESULTS: We introduce Slingshot, a novel method for inferring cell lineages and pseudotimes from single-cell gene expression data. In previously published datasets, Slingshot correctly identifies the biological signal for one to three branching trajectories. Additionally, our simulation study shows that Slingshot infers more accurate pseudotimes than other leading methods. CONCLUSIONS: Slingshot is a uniquely robust and flexible tool which combines the highly stable techniques necessary for noisy single-cell data with the ability to identify multiple trajectories. Accurate lineage inference is a critical step in the identification of dynamic temporal gene expression.


Subjects
Cell Lineage, Gene Expression Profiling/methods, Cluster Analysis, Humans, Skeletal Myoblasts/metabolism, Single-Cell Analysis, Software
13.
BMC Cancer ; 18(1): 996, 2018 Oct 19.
Article in English | MEDLINE | ID: mdl-30340609

ABSTRACT

BACKGROUND: Epidemiologists are beginning to employ metabolomics and lipidomics with archived blood from incident cases and controls to discover causes of cancer. Although several such studies have focused on colorectal cancer (CRC), they all followed targeted or semi-targeted designs that limited their ability to find discriminating molecules and pathways related to the causes of CRC. METHODS: Using an untargeted design, we measured lipophilic metabolites in prediagnostic serum from 66 CRC patients and 66 matched controls from the European Prospective Investigation into Cancer and Nutrition (Turin, Italy). Samples were analyzed by liquid chromatography-high-resolution mass spectrometry (LC-MS), resulting in 8690 features for statistical analysis. RESULTS: Rather than the usual multiple-hypothesis-testing approach, we based variable selection on an ensemble of regression methods, which found nine features to be associated with case-control status. We then regressed each selected feature on time-to-diagnosis to determine whether the feature was likely to be either a potentially causal biomarker or a reactive product of disease progression (reverse causality). CONCLUSIONS: Of the nine selected LC-MS features, four appear to be involved in CRC etiology and merit further investigation in prospective studies of CRC. Four other features appear to be related to progression of the disease (reverse causality), and may represent biomarkers of value for early detection of CRC.


Subjects
Tumor Biomarkers/blood, Colorectal Neoplasms/blood, Colorectal Neoplasms/diagnosis, Metabolomics/methods, Adult, Aged, Case-Control Studies, Cohort Studies, Colorectal Neoplasms/epidemiology, Europe/epidemiology, Female, Humans, Male, Middle Aged, Prospective Studies
14.
Nature ; 471(7339): 473-9, 2011 Mar 24.
Article in English | MEDLINE | ID: mdl-21179090

ABSTRACT

Drosophila melanogaster is one of the most well studied genetic model organisms; nonetheless, its genome still contains unannotated coding and non-coding genes, transcripts, exons and RNA editing sites. Full discovery and annotation are pre-requisites for understanding how the regulation of transcription, splicing and RNA editing directs the development of this complex organism. Here we used RNA-Seq, tiling microarrays and cDNA sequencing to explore the transcriptome in 30 distinct developmental stages. We identified 111,195 new elements, including thousands of genes, coding and non-coding transcripts, exons, splicing and editing events, and inferred protein isoforms that previously eluded discovery using established experimental, prediction and conservation-based approaches. These data substantially expand the number of known transcribed elements in the Drosophila genome and provide a high-resolution view of transcriptome dynamics throughout development.


Subjects
Drosophila melanogaster/growth & development, Drosophila melanogaster/genetics, Gene Expression Profiling, Developmental Gene Expression Regulation/genetics, Genetic Transcription/genetics, Alternative Splicing/genetics, Animals, Base Sequence, Drosophila Proteins/genetics, Drosophila melanogaster/embryology, Exons/genetics, Female, Insect Genes/genetics, Insect Genome/genetics, Male, MicroRNAs/genetics, Oligonucleotide Array Sequence Analysis, Protein Isoforms/genetics, RNA Editing/genetics, Messenger RNA/analysis, Messenger RNA/genetics, Small Untranslated RNA/analysis, Small Untranslated RNA/genetics, Sequence Analysis, Sex Characteristics
15.
Genome Res ; 21(2): 193-202, 2011 Feb.
Article in English | MEDLINE | ID: mdl-20921232

ABSTRACT

Alternative splicing is generally controlled by proteins that bind directly to regulatory sequence elements and either activate or repress splicing of adjacent splice sites in a target pre-mRNA. Here, we have combined RNAi and mRNA-seq to identify exons that are regulated by Pasilla (PS), the Drosophila melanogaster ortholog of mammalian NOVA1 and NOVA2. We identified 405 splicing events in 323 genes that are significantly affected upon depletion of ps, many of which were annotated as being constitutively spliced. The sequence regions upstream and within PS-repressed exons and downstream from PS-activated exons are enriched for YCAY repeats, and these are consistent with the location of these motifs near NOVA-regulated exons in mammals. Thus, the RNA regulatory map of PS and NOVA1/2 is highly conserved between insects and mammals despite the fact that the target gene orthologs regulated by PS and NOVA1/2 are almost entirely nonoverlapping. This observation suggests that the regulatory codes of individual RNA binding proteins may be nearly immutable, yet the regulatory modules controlled by these proteins are highly evolvable.


Subjects
Drosophila/genetics, Mammals/genetics, Messenger RNA/metabolism, Alternative Splicing, Animals, Neoplasm Antigens/genetics, Cultured Cells, Computational Biology, Conserved Sequence/genetics, Drosophila Proteins/genetics, Exons, Gene Expression Profiling, Introns, Molecular Sequence Data, Nerve Tissue Proteins/genetics, Neuro-Oncological Ventral Antigen, RNA Interference, RNA Precursors/genetics, RNA Precursors/metabolism, Messenger RNA/genetics, RNA-Binding Proteins/genetics, Ribonucleoproteins/genetics
16.
Nat Commun ; 15(1): 833, 2024 Jan 27.
Article in English | MEDLINE | ID: mdl-38280860

ABSTRACT

In single-cell RNA sequencing (scRNA-Seq), gene expression is assessed individually for each cell, allowing the investigation of developmental processes, such as embryogenesis and cellular differentiation and regeneration, at unprecedented resolution. In such dynamic biological systems, cellular states form a continuum, e.g., for the differentiation of stem cells into mature cell types. This process is often represented via a trajectory in a reduced-dimensional representation of the scRNA-Seq dataset. While many methods have been suggested for trajectory inference, it is often unclear how to handle multiple biological groups or conditions, e.g., inferring and comparing the differentiation trajectories of wild-type and knock-out stem cell populations. In this manuscript, we present condiments, a method for the inference and downstream interpretation of cell trajectories across multiple conditions. Our framework allows the interpretation of differences between conditions at the trajectory, cell population, and gene expression levels. We start by integrating datasets from multiple conditions into a single trajectory. By comparing the cell's conditions along the trajectory's path, we can detect large-scale changes, indicative of differential progression or fate selection. We also demonstrate how to detect subtler changes by finding genes that exhibit different behaviors between these conditions along a differentiation path.


Subjects
Single-Cell Analysis, Stem Cells, Single-Cell Analysis/methods, Cell Differentiation/genetics, Embryonic Development, RNA Sequence Analysis/methods, Condiments, Gene Expression Profiling/methods
17.
Stat Appl Genet Mol Biol ; 11(2)2012 Jan 06.
Article in English | MEDLINE | ID: mdl-22499689

ABSTRACT

We provide a brief editorial introduction to a special issue of Statistical Applications in Genetics and Molecular Biology dedicated to the workshop on "Computational Statistical Methods for Genomics and Systems Biology", held at the Centre de recherches mathématiques in Montreal in April 2011.


Subjects
Genomics/methods, Systems Biology/methods, Humans
18.
Proc Natl Acad Sci U S A ; 107(11): 5058-63, 2010 Mar 16.
Article in English | MEDLINE | ID: mdl-20194736

ABSTRACT

The search to understand how genomes innovate in response to selection dominates the field of evolutionary biology. Powerful molecular evolution approaches have been developed to test individual loci for signatures of selection. In many cases, however, an organism's response to changes in selective pressure may be mediated by multiple genes, whose products function together in a cellular process or pathway. Here we assess the prevalence of polygenic evolution in pathways in the yeasts Saccharomyces cerevisiae and S. bayanus. We first established short-read sequencing methods to detect cis-regulatory variation in a diploid hybrid between the species. We then tested for the scenario in which selective pressure in one species to increase or decrease the activity of a pathway has driven the accumulation of cis-regulatory variants that act in the same direction on gene expression. Application of this test revealed a variety of yeast pathways with evidence for directional regulatory evolution. In parallel, we also used population genomic sequencing data to compare protein and cis-regulatory variation within and between species. We identified pathways with evidence for divergence within S. cerevisiae, and we detected signatures of positive selection between S. cerevisiae and S. bayanus. Our results point to polygenic, pathway-level change as a common evolutionary mechanism among yeasts. We suggest that pathway analyses, including our test for directional regulatory evolution, will prove to be a relevant and powerful strategy in many evolutionary genomic applications.


Subjects
Biological Evolution , Metabolic Networks and Pathways/genetics , Multifactorial Inheritance/genetics , Saccharomyces/genetics , Alleles , Base Sequence , Exosomes/metabolism , Fungal Gene Expression Regulation , Genetic Variation , Genetic Hybridization , Fungal RNA/genetics , Nucleic Acid Regulatory Sequences/genetics , Genetic Selection , Species Specificity
19.
J Comput Graph Stat ; 32(2): 601-612, 2023.
Article in English | MEDLINE | ID: mdl-37273839

ABSTRACT

The covariance matrix plays a fundamental role in many modern exploratory and inferential statistical procedures, including dimensionality reduction, hypothesis testing, and regression. In low-dimensional regimes, where the number of observations far exceeds the number of variables, the optimality of the sample covariance matrix as an estimator of this parameter is well-established. High-dimensional regimes do not admit such a convenience. Thus, a variety of estimators have been derived to overcome the shortcomings of the canonical estimator in such settings. Yet, selecting an optimal estimator from among the plethora available remains an open challenge. Using the framework of cross-validated loss-based estimation, we develop the theoretical underpinnings of just such an estimator selection procedure. We propose a general class of loss functions for covariance matrix estimation and establish accompanying finite-sample risk bounds and conditions for the asymptotic optimality of the cross-validation selector. In numerical experiments, we demonstrate the optimality of our proposed selector in moderate sample sizes and across diverse data-generating processes. The practical benefits of our procedure are highlighted in a dimension reduction application to single-cell transcriptome sequencing data.
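
The selection idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure or software: the candidate estimators (the sample covariance and linear shrinkage toward a scaled identity), the shrinkage weights, the squared Frobenius loss against the validation-fold sample covariance, and every function name below are assumptions chosen for illustration.

```python
import numpy as np

def sample_cov(X):
    """Sample covariance of X, with observations in rows and variables in columns."""
    return np.cov(X, rowvar=False)

def make_shrinkage(alpha):
    """Hypothetical candidate: shrink the sample covariance toward a scaled identity."""
    def estimator(X):
        S = sample_cov(X)
        p = S.shape[0]
        target = (np.trace(S) / p) * np.eye(p)
        return (1.0 - alpha) * S + alpha * target
    return estimator

def cv_select(X, estimators, n_folds=5, seed=0):
    """Return the name of the candidate with the lowest cross-validated risk.

    The risk is the average squared Frobenius distance between the estimate
    fitted on the training folds and the sample covariance of the held-out
    fold (an assumed loss, used here only as an example).
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(X.shape[0]), n_folds)
    risks = {name: 0.0 for name in estimators}
    for k in range(n_folds):
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        S_valid = sample_cov(X[folds[k]])
        for name, est in estimators.items():
            diff = est(X[train]) - S_valid
            risks[name] += np.sum(diff ** 2) / n_folds
    return min(risks, key=risks.get), risks

# Simulated data with a sample size close to the number of variables.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 30))
candidates = {
    "sample": sample_cov,
    "shrink_0.3": make_shrinkage(0.3),
    "shrink_0.7": make_shrinkage(0.7),
}
best, risks = cv_select(X, candidates)
print(best, risks)
```

Which candidate is selected depends on the data and the loss, so no particular winner should be read into this example.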

20.
Cancers (Basel) ; 15(4)2023 Feb 05.
Article in English | MEDLINE | ID: mdl-36831356

ABSTRACT

Leukemia is the most common cancer in children in industrialized countries, and its initiation often occurs prenatally. Folic acid is a key vitamin in the production and modification of DNA, and prenatal folic acid intake is known to reduce the risk of childhood leukemia. We characterized the one-carbon (folate) metabolism nutrients that may influence risk of childhood acute lymphoblastic leukemia (ALL) among 122 cases diagnosed at age 0-14 years during 1988-2011 and 122 controls matched on sex, age, and race/ethnicity. Using hydrophilic interaction chromatography (HILIC) applied to neonatal dried blood spots, we evaluated 11 folate pathway metabolites, overall and by sex, race/ethnicity, and age at diagnosis. To conduct the prediction analyses, the 244 samples were separated into learning (75%) and test (25%) sets, maintaining the matched pairings. The learning set was used to train classification methods, which were evaluated on the test set. High classification error rates indicate that the folate pathway metabolites measured have little predictive capacity for pediatric ALL. In conclusion, the one-carbon metabolism nutrients measured at birth were unable to predict subsequent leukemia in children. These negative findings reflect the last weeks of pregnancy; our study does not address the impact of these nutrients at the time of conception or during the first trimester of pregnancy, periods that are critical for the embryo's DNA methylation programming.
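
The learning/test separation that keeps matched pairs together can be sketched as follows. The abstract gives only the proportions and states that pairings were maintained, so the function name, the random shuffling, and the rounding rule below are assumptions rather than the study's actual procedure.

```python
import random

def split_matched_pairs(pair_ids, test_fraction=0.25, seed=0):
    """Split sample indices into learning and test sets without breaking pairs.

    pair_ids[i] is the (hypothetical) identifier of the matched case-control
    pair that sample i belongs to. Whole pairs are assigned to one set.
    """
    unique_pairs = sorted(set(pair_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_pairs)
    n_test = round(len(unique_pairs) * test_fraction)
    test_pairs = set(unique_pairs[:n_test])
    learn_idx = [i for i, p in enumerate(pair_ids) if p not in test_pairs]
    test_idx = [i for i, p in enumerate(pair_ids) if p in test_pairs]
    return learn_idx, test_idx

# 244 samples arranged as 122 matched pairs (simulated identifiers only).
pair_ids = [i // 2 for i in range(244)]
learn_idx, test_idx = split_matched_pairs(pair_ids)
print(len(learn_idx), len(test_idx))
```

Sampling at the level of pairs rather than individual samples means a case and its matched control never end up on opposite sides of the split.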
