Results 1 - 20 of 232
1.
mSystems ; : e0088824, 2024 Oct 01.
Article in English | MEDLINE | ID: mdl-39352141

ABSTRACT

While numerous computational frameworks and workflows are available for recovering prokaryote and eukaryote genomes from metagenome data, only a limited number of pipelines are designed specifically for viromics analysis. With many viromics tools developed in the last few years alone, it can be challenging for scientists with limited bioinformatics experience to easily recover, evaluate quality, annotate genes, dereplicate, assign taxonomy, and calculate relative abundance and coverage of viral genomes using state-of-the-art methods and standards. Here, we describe Modular Viromics Pipeline (MVP) v.1.0, a user-friendly pipeline written in Python and providing a simple framework to perform standard viromics analyses. MVP combines multiple tools to enable viral genome identification, characterization of genome quality, filtering, clustering, taxonomic and functional annotation, genome binning, and comprehensive summaries of results that can be used for downstream ecological analyses. Overall, MVP provides a standardized and reproducible pipeline for both extensive and robust characterization of viruses from large-scale sequencing data including metagenomes, metatranscriptomes, viromes, and isolate genomes. As a typical use case, we show how the entire MVP pipeline can be applied to a set of 20 metagenomes from wetland sediments using only 10 modules executed via command lines, leading to the identification of 11,656 viral contigs and 8,145 viral operational taxonomic units (vOTUs) displaying a clear beta-diversity pattern. Further, acting as a dynamic wrapper, MVP is designed to continuously incorporate updates and integrate new tools, ensuring its ongoing relevance in the rapidly evolving field of viromics. 
MVP is available at https://gitlab.com/ccoclet/mvp and as versioned packages on PyPI and Conda.
IMPORTANCE: The significance of our work lies in the development of Modular Viromics Pipeline (MVP), an integrated and user-friendly pipeline tailored exclusively for viromics analyses. MVP stands out for its modular design, which ensures easy installation, execution, and integration of new tools and databases. By combining state-of-the-art tools such as geNomad and CheckV, MVP provides high-quality viral genome recovery, taxonomy and host assignment, and functional annotation, addressing the limitations of existing pipelines. MVP's ability to handle diverse sample types, including environmental, human microbiome, and plant-associated samples, makes it a versatile tool for the broader microbiome research community. By standardizing the analysis process and providing easily interpretable results, MVP enables researchers to perform comprehensive studies of viral communities, significantly advancing our understanding of viral ecology and its impact on various ecosystems.
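As a rough illustration of the relative abundance and coverage step mentioned above, the sketch below computes a length-normalized relative abundance and an approximate mean depth per vOTU from mapped read counts. The function names, the inputs and the 150 bp read length are hypothetical, and MVP's actual normalization may differ.

```python
def relative_abundance(read_counts: dict[str, int], lengths: dict[str, int]) -> dict[str, float]:
    """Length-normalized relative abundance per vOTU (values sum to 1).

    Hypothetical helper: reads per base corrects for longer genomes
    recruiting more reads, then values are rescaled to proportions.
    """
    per_base = {v: read_counts[v] / lengths[v] for v in read_counts}
    total = sum(per_base.values())
    if total == 0:
        return {v: 0.0 for v in per_base}
    return {v: x / total for v, x in per_base.items()}


def mean_coverage(read_count: int, read_length: int, contig_length: int) -> float:
    """Approximate mean depth: total aligned bases divided by contig length."""
    return read_count * read_length / contig_length


abundance = relative_abundance(
    {"vOTU_1": 1000, "vOTU_2": 500},
    {"vOTU_1": 40000, "vOTU_2": 10000},
)
print(abundance)                          # vOTU_2 is twice as abundant per base
print(mean_coverage(1000, 150, 40000))    # 3.75
```

Normalizing by genome length before rescaling keeps a long, lightly covered genome from appearing more abundant than a short, densely covered one.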

2.
Cancers (Basel) ; 16(18)2024 Sep 20.
Article in English | MEDLINE | ID: mdl-39335178

ABSTRACT

Background: The development of tumors is a highly complex process that entails numerous interactions and intricate relationships between the host immune system and cancer cells. Studies have demonstrated that patients' treatment response can be correlated with the tumor microenvironment (TME). Consequently, examining diverse immune profiles within the TME can help elucidate tumor development and support the construction of useful diagnostic and prognostic models. Methods: In this study, we used a single-cell decomposition method to analyze the relationships between cell proportions and immune signatures in lung adenocarcinoma (LUAD) patients. Results: Our findings indicate that specific immune cell populations and immune signatures are significantly associated with patient prognosis. By identifying poor prognosis signatures (PPS), we reveal the critical role of immune profiles and cellular composition in disease outcomes, emphasizing their diagnostic potential for predicting patient prognosis. Conclusions: This study highlights the importance of immune signatures and cellular composition, which may serve as valuable biomarkers for disease prognosis in LUAD patients.

3.
BMC Bioinformatics ; 25(1): 315, 2024 Sep 28.
Article in English | MEDLINE | ID: mdl-39342151

ABSTRACT

BACKGROUND: Structural variations play a significant role in genetic diseases and evolutionary mechanisms. Extensive research has been conducted over the past decade to detect simple structural variations, leading to the development of well-established detection methods. However, recent studies have highlighted the potentially greater impact of complex structural variations on individuals compared to simple structural variations. Despite this, the field still lacks precise detection methods specifically designed for complex structural variations. Therefore, the development of a highly efficient and accurate detection method is of utmost importance. RESULT: In response to this need, we propose a novel method called FindCSV, which leverages deep learning techniques and consensus sequences to enhance the detection of structural variations (SVs) using long-read sequencing data. Compared to current methods, FindCSV performs better in detecting complex and simple structural variations. CONCLUSIONS: FindCSV is a new method to detect complex and simple structural variations with reasonable accuracy in real and simulated data. The source code for the program is available at https://github.com/nwpuzhengyan/FindCSV .


Subject(s)
Software , Humans , Deep Learning , Genomic Structural Variation , Sequence Analysis, DNA/methods , Algorithms , High-Throughput Nucleotide Sequencing/methods

4.
Front Genet ; 15: 1451461, 2024.
Article in English | MEDLINE | ID: mdl-39346775

ABSTRACT

Gene transcription is a stochastic process that occurs in all organisms. Transcriptional bursting, a critical molecular dynamics mechanism, creates significant heterogeneity in mRNA and protein levels, and this heterogeneity drives cellular phenotypic diversity. Currently, the lack of a comprehensive quantitative model limits research on transcriptional bursting. This review examines various gene expression models and compares their strengths and weaknesses to guide researchers in selecting the most suitable model for their research context. We also provide a detailed summary of the key metrics related to transcriptional bursting. We compared the temporal dynamics of transcriptional bursting across species and the molecular mechanisms influencing these bursts, and highlighted the spatiotemporal patterns of gene expression differences using metrics such as burst size and burst frequency. We summarized strategies for modeling gene expression from both biostatistical and biochemical reaction network perspectives. Single-cell sequencing data and integrated multiomics approaches drive our exploration of cutting-edge trends in transcriptional bursting mechanisms. Moreover, we examined classical methods for parameter estimation that help capture dynamic parameters in gene expression data, assessing their merits and limitations to facilitate optimal parameter estimation. Our summary and review of current theories of transcriptional burst dynamics provide deeper insights for research on the nature of cell processes, cell fate determination, and cancer diagnosis.
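To make the burst size and burst frequency metrics concrete, here is a minimal method-of-moments sketch. It assumes the bursty limit of the two-state (telegraph) model, where steady-state mRNA counts per cell follow a negative binomial with mean f·b and variance f·b·(1 + b), so the Fano factor equals 1 + b. The function name is hypothetical, and this is only one classical estimator, not the specific procedure of any method covered in the review.

```python
import statistics


def burst_moments(counts: list[int]) -> tuple[float, float]:
    """Method-of-moments estimate of (burst frequency, burst size).

    Assumes the bursty limit of the telegraph model: counts are negative
    binomial with mean = f * b and variance = f * b * (1 + b). f is in
    units of the mRNA degradation rate; b is mean transcripts per burst.
    """
    mean = statistics.fmean(counts)
    var = statistics.variance(counts)  # sample variance; needs >= 2 cells
    if mean == 0:
        raise ValueError("no transcripts detected")
    fano = var / mean
    if fano <= 1:
        # Poisson-like or under-dispersed counts carry no bursting signal here
        raise ValueError("counts are not over-dispersed")
    size = fano - 1        # Fano factor = 1 + b
    freq = mean / size     # mean = f * b
    return freq, size


freq, size = burst_moments([0, 0, 2, 4, 4, 8])
print(f"burst frequency ~ {freq:.3f}, burst size ~ {size:.3f}")
```

Moment estimates ignore technical noise such as incomplete mRNA capture in single-cell data, so they are best treated as a first approximation.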

5.
Am J Hum Genet ; 111(9): 2059-2069, 2024 Sep 05.
Article in English | MEDLINE | ID: mdl-39096911

ABSTRACT

Co-observation of a gene variant with a pathogenic variant in another gene that explains the disease presentation has been designated as evidence against pathogenicity by commonly used variant classification guidelines. Multiple variant curation expert panels have specified, from consensus opinion, that this evidence type is not applicable for the classification of breast cancer predisposition gene variants. Statistical analysis of sequence data for 55,815 individuals diagnosed with breast cancer from the BRIDGES sequencing project was undertaken to formally assess the utility of co-observation data for germline variant classification. Our analysis included expected loss-of-function variants in 11 breast cancer predisposition genes and pathogenic missense variants in BRCA1, BRCA2, and TP53. We assessed whether co-observation of pathogenic variants in two different genes occurred more or less often than expected under the assumption of independence. Co-observation of pathogenic variants in each of BRCA1, BRCA2, and PALB2 with the remaining genes was less frequent than expected. This evidence for depletion remained after adjustment for age at diagnosis, study design (familial versus population-based), and country. Co-observation of a variant of uncertain significance in BRCA1, BRCA2, or PALB2 with a pathogenic variant in another breast cancer gene equated to supporting evidence against pathogenicity following criterion strength assignment based on the likelihood ratio, and showed utility in the reclassification of missense BRCA1 and BRCA2 variants identified in BRIDGES. Our approach has applicability for assessing the value of co-observation as a predictor of variant pathogenicity in other clinical contexts, including for gene-specific guidelines developed by ClinGen Variant Curation Expert Panels.
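The independence comparison described above can be illustrated with a simplified calculation. If carrying a pathogenic variant in gene A and carrying one in gene B were independent, the expected number of co-carriers among N cases would be n_A · n_B / N. The sketch below compares the observed count with that expectation and reports a Poisson lower-tail probability as a rough depletion measure. The function and inputs are hypothetical, and it omits the adjustment for age, study design and country and the likelihood-ratio strength assignment used in the study.

```python
import math


def co_observation(n_total: int, n_a: int, n_b: int, n_both: int) -> tuple[float, float, float]:
    """Return (expected co-carriers, observed/expected ratio, P(X <= n_both)).

    Hypothetical helper: under independence, expected = n_a * n_b / n_total,
    and the co-carrier count is approximated as Poisson(expected).
    """
    expected = n_a * n_b / n_total
    if expected == 0:
        raise ValueError("expected count is zero; no test possible")
    ratio = n_both / expected
    log_e = math.log(expected)
    # Poisson lower tail computed in log space to avoid overflow for large k
    p_low = sum(
        math.exp(-expected + k * log_e - math.lgamma(k + 1))
        for k in range(n_both + 1)
    )
    return expected, ratio, p_low


# Made-up counts: 200 carriers in gene A, 100 in gene B, no co-carriers
print(co_observation(10_000, 200, 100, 0))
```

A small lower-tail probability indicates fewer co-carriers than independence predicts, which is the depletion pattern the study reports for BRCA1, BRCA2 and PALB2.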


Subject(s)
Breast Neoplasms , Genetic Predisposition to Disease , Germ-Line Mutation , Humans , Breast Neoplasms/genetics , Germ-Line Mutation/genetics , Female , BRCA2 Protein/genetics , BRCA1 Protein/genetics , Fanconi Anemia Complementation Group N Protein/genetics , Middle Aged , Mutation, Missense/genetics , Adult , Tumor Suppressor Protein p53/genetics

6.
Virus Evol ; 10(1): veae061, 2024.
Article in English | MEDLINE | ID: mdl-39175839

ABSTRACT

The enigmatic origins and transmission events of the gibbon ape leukemia virus (GALV) and its close relative the koala retrovirus (KoRV) have been a source of enduring debate. Bats and rodents have each been proposed as major reservoirs of interspecies transmission, with ongoing efforts to identify additional animal hosts of GALV-KoRV-related retroviruses. In this study, we identified nine rodent species as novel hosts of GALV-KoRV-related retroviruses. Among these hosts are two African rodents, revealing the first appearance of this clade outside the Australian and Southeast Asian region. One of these African rodents, Mastomys natalensis, carries an endogenous GALV-KoRV-related retrovirus that is fully intact and potentially still infectious. Our findings support the hypothesis that rodents are the major carriers of GALV-KoRV-related retroviruses.

7.
Methods Mol Biol ; 2846: 215-241, 2024.
Article in English | MEDLINE | ID: mdl-39141239

ABSTRACT

Histone post-translational modifications (PTMs) influence overall chromatin structure and gene expression. Over the course of cell differentiation, the distribution of histone modifications is remodeled, resulting in cell type-specific patterns. In the past, their study was limited to abundant cell types that could be purified in sufficient numbers. However, studying these cell type-specific dynamic changes in heterogeneous in vivo settings requires sensitive single-cell methods. Recent advances in single-cell sequencing methods remove these limitations, allowing the study of nonpurifiable cell types. One complicating factor is that some of the most biologically interesting cell types, including stem and progenitor cells that undergo differentiation, make up only a small fraction of cells in a tissue, which makes whole-tissue analysis rather inefficient. In this chapter, we present a sort-assisted single-cell Chromatin ImmunoCleavage sequencing technique (sortChIC) to map histone PTMs in single cells. This technique combines the mapping of histone PTM locations with surface staining-based enrichment, allowing the integration of established strategies for rare cell type enrichment. In general terms, this will enable researchers to quantify local and global chromatin changes in dynamic, complex biological systems and can provide additional information on their contribution to lineage and cell-type specification in physiological conditions and disease.


Subject(s)
Chromatin , Histone Code , Histones , Protein Processing, Post-Translational , Single-Cell Analysis , Single-Cell Analysis/methods , Histones/metabolism , Humans , Chromatin/metabolism , Chromatin/genetics , Animals , Flow Cytometry/methods

8.
Funct Integr Genomics ; 24(5): 139, 2024 Aug 19.
Article in English | MEDLINE | ID: mdl-39158621

ABSTRACT

Recent advancements in biomedical technologies and the proliferation of high-dimensional Next Generation Sequencing (NGS) datasets have led to significant growth in the volume and density of data. High-dimensional NGS data are characterized by a large number of genomics, transcriptomics, proteomics, and metagenomics features relative to the number of biological samples. This high dimensionality poses significant challenges for data analysis, including increased computational burden, potential overfitting, and difficulty in interpreting results. Feature selection and feature extraction are two pivotal techniques employed to address these challenges by reducing the dimensionality of the data, thereby enhancing model performance, interpretability, and computational efficiency. Feature selection and feature extraction methods can be categorized as statistical or machine learning methods. The present study conducts a comprehensive and comparative review of various statistical, machine learning, and deep learning-based feature selection and extraction techniques specifically tailored for interpreting human NGS and microarray data. A thorough literature search was performed to gather information on these techniques, focusing on array-based and NGS data analysis. The techniques surveyed here, including deep learning architectures, machine learning algorithms, and statistical methods, have been explored for microarray, bulk RNA-Seq, and single-cell RNA-Seq (scRNA-Seq) datasets. The study provides an overview of these techniques, highlighting their applications, advantages, and limitations in the context of high-dimensional NGS data.
This review provides better insights for readers to apply feature selection and feature extraction techniques to enhance the performance of predictive models, uncover underlying biological patterns, and gain deeper insights into massive and complex NGS and microarray data.
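As a minimal example of the statistical, filter-style feature selection discussed in this review, the sketch below keeps the k features (for example genes) with the highest sample variance across samples. It is an unsupervised baseline chosen for illustration, with a hypothetical function name and toy data; it is not a method singled out by the authors.

```python
import statistics


def top_variance_features(matrix: dict[str, list[float]], k: int) -> list[str]:
    """Filter-style feature selection: keep the k most variable features.

    matrix maps a feature name (e.g. a gene) to its values across samples;
    each feature needs at least two samples for the sample variance.
    """
    ranked = sorted(matrix, key=lambda f: statistics.variance(matrix[f]), reverse=True)
    return ranked[:k]


expression = {
    "gene_1": [1.0, 1.0, 1.0],   # constant: carries no information
    "gene_2": [0.0, 5.0, 10.0],
    "gene_3": [1.0, 2.0, 3.0],
}
print(top_variance_features(expression, 2))  # ['gene_2', 'gene_3']
```

Feature extraction methods such as principal component analysis differ in that they build new combined features rather than keeping a subset of the original ones.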


Subject(s)
High-Throughput Nucleotide Sequencing , Machine Learning , Humans , High-Throughput Nucleotide Sequencing/methods , Deep Learning

9.
Hum Genomics ; 18(1): 86, 2024 Aug 07.
Article in English | MEDLINE | ID: mdl-39113147

ABSTRACT

BACKGROUND: The international disclosure of Chinese human genetic data continues to be a contentious issue in China, generating public debates in both traditional and social media channels. Concerns have intensified since Chinese scientists' research on pangenome data was published in the prestigious journal Nature. METHODS: This study scrutinized microblogs posted on Weibo, a popular Chinese social media site, in the two months immediately following the publication (June 14, 2023-August 21, 2023). Content analysis was conducted to assess the nature of public responses, justifications for positive or negative attitudes, and the users' overall knowledge of how Chinese human genetic information is regulated and managed in China. RESULTS: Weibo users displayed contrasting attitudes towards the article's public disclosure of pangenome research data, with 18% positive, 64% negative, and 18% neutral. Positive attitudes came primarily from verified government and media accounts, which praised the publication. In contrast, negative attitudes originated from individual users who were concerned about national security and health risks and often believed that the researchers had betrayed China. The benefits of data sharing highlighted in the commentaries included advancements in disease research and scientific progress. Approximately 16% of the microblogs indicated that Weibo users had misunderstood existing regulations and laws governing data sharing and stewardship. CONCLUSIONS: Based on the predominantly negative public attitudes toward scientific data sharing established by our study, we recommend enhanced outreach by scientists and scientific institutions to increase public understanding of developments in genetic research, international data sharing, and associated regulations.
Additionally, governmental agencies can alleviate public fears and concerns by being more transparent about their security reviews of international collaborative research involving Chinese human genetic data and its cross-border transfer.


Subject(s)
Biomedical Research , Information Dissemination , Public Opinion , Social Media , Humans , China , Genome, Human/genetics , Asian People/genetics

10.
AIMS Neurosci ; 11(2): 103-117, 2024.
Article in English | MEDLINE | ID: mdl-38988883

ABSTRACT

The central nervous system (CNS) and the immune system jointly coordinate cellular functions and share common developmental mechanisms. Immunity-related molecules influence brain development, challenging the conventional view of the brain as immune-privileged. Chronic inflammation emerges as a key player in the pathophysiology of Alzheimer's disease (AD), with increased stress contributing to disease progression and potentially exacerbating existing symptoms. In this study, the most significant gene signatures were obtained from selected RNA-sequencing (RNA-seq) data from AD patients and healthy individuals, and a functional analysis and biological interpretation were conducted, including network and pathway enrichment analysis. Important evidence was reported, such as enrichment in immune system responses and antigen processes, as well as positive regulation of T-cell mediated cytotoxicity and endogenous and exogenous peptide antigen, indicating the participation of neuroinflammation and the immune response in disease progression. These findings suggest a disturbance in the immune infiltration of the peripheral immune environment and open new avenues to explore, from a molecular perspective, key biological processes that strongly participate in AD development.

11.
Transl Cancer Res ; 13(6): 2704-2720, 2024 Jun 30.
Article in English | MEDLINE | ID: mdl-38988915

ABSTRACT

Background: Colorectal cancer (CRC) is one of the leading causes of cancer-related deaths, and improving the prognosis of CRC patients is an urgent concern. The aim of this study was to explore new immunotherapy targets to improve survival in CRC patients. Methods: We analyzed the CRC-related single-cell dataset GSE201348 from the Gene Expression Omnibus (GEO) database and identified differentially expressed genes (DEGs). Subsequently, we performed differential analysis on rectum adenocarcinoma (READ) and colon adenocarcinoma (COAD) transcriptome sequencing data [The Cancer Genome Atlas (TCGA)-CRC cohort] and clinical data downloaded from the TCGA database. Subgroup analysis was performed using CIBERSORTx and cluster analysis. Finally, biomarkers were identified by univariate Cox regression and least absolute shrinkage and selection operator (LASSO) analysis. Results: In this study, we analyzed the CRC-related single-cell dataset GSE201348 and identified 5,210 DEGs. Subsequently, differential analysis of the TCGA-CRC cohort yielded 4,408 DEGs. We then categorized the cancer samples in the sequencing data into three groups (k1, k2, and k3), and survival analysis showed significant differences between the k1 and k2 groups. Further differential analysis of the samples in the k1 and k2 groups identified 1,899 DEGs. A total of 77 DEGs were selected from the DEGs obtained in the three differential analyses. Through subsequent univariate Cox and LASSO analyses, seven biomarkers (RETNLB, CLCA4, UGT2A3, SULT1B1, CCL24, BMP5, and ATOH1) were identified and used to establish a risk score (RS). Conclusions: This study demonstrates the potential of the seven-gene prognostic risk model for predicting the prognosis of CRC.
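Risk scores from Cox/LASSO models of this kind are commonly computed as a linear predictor, RS = Σ βᵢ · xᵢ, summed over the selected genes (here the seven listed above), and patients are then split into high- and low-risk groups. The sketch below assumes that form and a median cutoff. The abstract does not report the coefficients, so the values in the example are hypothetical placeholders, and only two genes are used to keep it short.

```python
import statistics


def risk_scores(expr: dict[str, dict[str, float]], coefs: dict[str, float]) -> dict[str, float]:
    """Linear-predictor risk score per patient: sum of coefficient * expression."""
    return {patient: sum(coefs[g] * values[g] for g in coefs) for patient, values in expr.items()}


def split_by_median(scores: dict[str, float]) -> dict[str, str]:
    """Assign 'high' risk above the median score and 'low' otherwise (assumed cutoff)."""
    cutoff = statistics.median(scores.values())
    return {p: "high" if s > cutoff else "low" for p, s in scores.items()}


# Hypothetical coefficients and normalized expression values
coefs = {"CLCA4": -0.5, "CCL24": 0.8}
expr = {
    "P1": {"CLCA4": 2.0, "CCL24": 1.0},
    "P2": {"CLCA4": 0.0, "CCL24": 2.0},
    "P3": {"CLCA4": 1.0, "CCL24": 1.0},
}
scores = risk_scores(expr, coefs)
print(split_by_median(scores))  # {'P1': 'low', 'P2': 'high', 'P3': 'low'}
```

Negative coefficients mark protective genes (higher expression lowers the score); other studies choose the cutoff by optimizing survival separation instead of using the median.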

12.
Aging Cell ; : e14275, 2024 Jul 17.
Article in English | MEDLINE | ID: mdl-39016438

ABSTRACT

Renal aging, marked by the accumulation of senescent cells and chronic low-grade inflammation, leads to renal interstitial fibrosis and impaired function. In this study, we investigate the role of macrophages, key regulators of inflammation, in renal aging by analyzing kidney single-cell RNA sequencing data of C57BL/6J mice from 8 weeks to 24 months of age. Our findings elucidate the dynamic changes in the proportion of kidney cell types during renal aging and reveal that increased macrophage infiltration contributes to chronic low-grade inflammation, with these macrophages exhibiting senescence and activation of ferroptosis signaling. CellChat analysis indicates enhanced communication between macrophages and tubular cells during aging. Suppressing ferroptosis alleviates macrophage-mediated tubular partial epithelial-mesenchymal transition in vitro, thereby mitigating the expression of fibrosis-related genes. Using SCENIC analysis, we infer Stat1 as a key age-related transcription factor promoting iron dyshomeostasis and ferroptosis in macrophages by regulating the expression of Pcbp1, an iron chaperone protein that inhibits ferroptosis. Furthermore, through virtual screening and molecular docking from a library of anti-aging compounds, we construct a docking model targeting Pcbp1, which indicates that the natural small molecule compound Rutin can suppress macrophage senescence and ferroptosis by preserving Pcbp1. In summary, our study underscores the crucial role of macrophage iron dyshomeostasis and ferroptosis in renal aging. Our results also suggest Pcbp1 as an intervention target in aging-related renal fibrosis and highlight Rutin as a potential therapeutic agent in mitigating age-related renal chronic low-grade inflammation and fibrosis.

13.
medRxiv ; 2024 Jul 06.
Article in English | MEDLINE | ID: mdl-39006429

ABSTRACT

PGAP3 is a glycosylphosphatidylinositol (GPI) phospholipase gene located within chromosome 17q12-21, a region highly linked to asthma. Although much is known about the function of other chromosome 17q12-21 genes expressed at increased levels in bronchial epithelium, such as ORMDL3 and GSDMB, little is known about the function of increased PGAP3 expression in bronchial epithelium in the context of asthma. The aim of this study was therefore to determine, using RNA-sequencing and bioinformatic analysis, whether increased PGAP3 expression in human bronchial epithelial cells regulates the expression of mRNA pathways important to the pathogenesis of asthma. We performed RNA-sequencing on normal human bronchial epithelial cells transfected with PGAP3 for 24 and 48 hours. PGAP3-regulated genes were compared with asthma and respiratory virus (influenza A, rhinovirus, respiratory syncytial virus) reference data sets to identify PGAP3 target genes and pathways. Approximately 9% of the upregulated PGAP3-induced genes were found in an asthma reference data set, 41% in a rhinovirus reference data set, 33% in an influenza A reference data set, and 3% in a respiratory syncytial virus reference data set. PGAP3 significantly upregulated the expression of several genes associated with the innate immune response and with viral signatures of respiratory viruses associated with asthma exacerbations. Among the most highly expressed genes induced by PGAP3 are RSAD2, OASL, and IFN-λ, which are antiviral genes associated with asthma. PGAP3 also upregulated the antiviral gene BST2, which, like PGAP3, is a GPI-anchored protein. We conclude that PGAP3 expression in human bronchial epithelial cells regulates the expression of genes known to be linked to asthma, as well as the bronchial epithelial expression of genes pertinent to the pathogenesis of respiratory virus-triggered asthma exacerbations.

14.
Biology (Basel) ; 13(7)2024 Jun 21.
Article in English | MEDLINE | ID: mdl-39056656

ABSTRACT

Fibroblast heterogeneity remains undefined in eosinophilic esophagitis (EoE), an allergic inflammatory disorder complicated by fibrosis. We used publicly available single-cell RNA sequencing data (GSE201153) from EoE esophageal biopsies to identify fibroblast sub-populations, related transcriptomes, disease status-specific pathways and cell-cell interactions. IL-13-treated fibroblast cultures were used to model active disease. At least 2 fibroblast populations were identified, F_A and F_B. Several genes, including ACTA2, were more enriched in F_A. The proportion of F_B was greater than that of F_A, and epithelial-mesenchymal transition was upregulated in F_B vs. F_A in both active and remission EoE. Epithelial-mesenchymal transition was also upregulated in F_B in active vs. remission EoE, and TNF-α signaling via NFKB was downregulated in F_A. IL-13 treatment upregulated ECM-related genes more profoundly in ACTA2- fibroblasts than in ACTA2+ myofibroblasts. After proliferating epithelial cells, F_B and F_A contributed most to cell-cell communication networks. ECM-receptor interaction strength was stronger than secreted or cell-cell contact signaling in active vs. remission EoE, and significant ligand-receptor pairs were driven mostly by F_B. This unbiased analysis identifies at least 2 fibroblast sub-populations in EoE in vivo, distinguished in part by ACTA2. Fibroblasts play a critical role in cell-cell interactions in EoE, most profoundly through ECM-receptor signaling by the F_B sub-group.

15.
Heliyon ; 10(13): e33682, 2024 Jul 15.
Article in English | MEDLINE | ID: mdl-39040257

ABSTRACT

Aims: This study explored the molecular and biologic mechanisms underlying the association between circadian rhythm disorders (CRD) and increased risk for hepatocellular carcinoma (HCC). Background: CRD are linked to increased risk for HCC, but knowledge of the molecular and biologic mechanisms underlying this association is limited. Objective: The study constructed and validated a CRD-related gene model as an independent prognostic factor for HCC, providing insight into the molecular mechanisms linking CRD to increased HCC risk and identifying potential indicators of the efficacy of immunotherapy and anticancer drugs. This provides important clues for personalized treatment strategies for HCC patients. Methods: Gene sets correlated with circadian rhythm were obtained from the Molecular Signatures Database (MSigDB) and intersected with differentially expressed genes (DEGs) between tumor and control samples in The Cancer Genome Atlas (TCGA) and in HCCDB18 from the Hepatocellular Carcinoma Cell DataBase (HCCDB). The CRD-related gene model was developed by univariate Cox and stepwise multivariate analysis. Response to immune checkpoint blockade (ICB) therapy and sensitivity to anticancer drugs were assessed using Tumor Immune Dysfunction and Exclusion (TIDE) and pRRophetic, respectively. Seurat was used to determine HCC cell types from single-cell data, and malignant cells were identified using Copykat. Quantitative real-time polymerase chain reaction (qRT-PCR) was carried out to detect the mRNA levels of the genes in the CRD-related gene model. Results: Circadian rhythm activity in HCC tissue was significantly lower than in control tissue. EZH2, IMPDH2, TYMS and SERPINE1 were then selected to construct the CRD-related gene model, which was an independent prognostic factor for HCC.
Notably, low-risk patients with HCC had lower levels of immune cell infiltration and lower TIDE scores than high-risk patients, indicating that low-risk patients may derive more benefit from immunotherapy. IMPDH2, TYMS and SERPINE1 were expressed at significantly higher levels in malignant cells than in benign epithelial cells. Conclusions: This study presents a CRD-related gene model that sheds light on the molecular mechanisms underlying the association between CRD and cancer and provides a potential indicator of the preclinical efficacy of ICB and anticancer drugs.

16.
Biostatistics ; 2024 Jun 17.
Article in English | MEDLINE | ID: mdl-38887902

ABSTRACT

Although transcriptomics data are typically used to analyze mature spliced mRNA, recent attention has focused on jointly investigating spliced and unspliced (or precursor) mRNA, which can be used to study gene regulation and changes in gene expression. Nonetheless, most methods for spliced/unspliced inference (such as RNA velocity tools) focus on individual samples and rarely allow comparisons between groups of samples (e.g. healthy vs. diseased). Furthermore, this kind of inference is challenging because spliced and unspliced mRNA abundance is characterized by a high degree of quantification uncertainty, due to the prevalence of multi-mapping reads, i.e. reads compatible with multiple transcripts (or genes) and/or with both their spliced and unspliced versions. Here, we present DifferentialRegulation, a Bayesian hierarchical method to discover changes between experimental conditions with respect to the relative abundance of unspliced mRNA (over the total mRNA). We model the quantification uncertainty via a latent variable approach, where reads are allocated to their gene/transcript of origin and to the respective splice version. We designed several benchmarks in which our approach shows good performance, in terms of sensitivity and error control, compared with state-of-the-art competitors. Importantly, our tool is flexible and works with both bulk and single-cell RNA-sequencing data. DifferentialRegulation is distributed as a Bioconductor R package.
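The latent-allocation idea can be illustrated for a single gene in a single sample with a much simpler, non-Bayesian analogue: an expectation-maximization (EM) loop that splits ambiguous reads between the spliced and unspliced versions and re-estimates the unspliced share. This is not the DifferentialRegulation model, which is Bayesian, hierarchical and compares conditions; the function name and the per-read likelihoods (for example 1 / effective length) are hypothetical inputs.

```python
def unspliced_fraction(n_unspliced: int, n_spliced: int,
                       ambiguous: list[tuple[float, float]],
                       iters: int = 500, tol: float = 1e-10) -> float:
    """EM estimate of the unspliced share of one gene's reads.

    ambiguous holds, for each read compatible with both versions, its
    likelihood under the unspliced and the spliced version (hypothetical).
    """
    n = n_unspliced + n_spliced + len(ambiguous)
    pi = 0.5  # initial guess for the unspliced proportion
    for _ in range(iters):
        # E-step: expected number of ambiguous reads that are unspliced
        soft = sum(pi * lu / (pi * lu + (1 - pi) * ls) for lu, ls in ambiguous)
        # M-step: unambiguous unspliced reads plus the soft counts, over all reads
        new_pi = (n_unspliced + soft) / n
        if abs(new_pi - pi) < tol:
            return new_pi
        pi = new_pi
    return pi


# 30 unspliced-only, 70 spliced-only and 100 equally compatible reads
print(round(unspliced_fraction(30, 70, [(1.0, 1.0)] * 100), 6))  # 0.3
```

With equal likelihoods the ambiguous reads simply follow the proportion seen in the unambiguous ones; unequal likelihoods shift them toward the better-supported version.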

17.
Evol Appl ; 17(6): e13697, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38911262

ABSTRACT

As an invaluable Chinese sheep germplasm resource, Hu sheep are renowned for their high fertility and beautiful wavy lambskins. Their distinctive characteristics have evolved over time through a combination of artificial and natural selection. Identifying selection signatures in Hu sheep can provide direct insight into the mechanism of selection and further uncover candidate genes associated with breed-specific traits subject to selection. Here, we conducted whole-genome resequencing of 206 Hu sheep individuals, each at approximately 6-fold coverage. We then employed three complementary approaches, the composite likelihood ratio, the integrated haplotype homozygosity score and the detection of runs of homozygosity, to detect selection signatures. In total, 10 candidate genomic regions displaying selection signatures were identified simultaneously by multiple methods, spanning 88.54 Mb. After annotation, these genomic regions collectively harbored 92 unique genes. Interestingly, 32 candidate genes associated with reproduction were distributed across nine of the detected genomic regions. Of these, two stood out as star candidates, BMPR1B and GNRH2, both of which have documented associations with fertility, and a HOXA gene cluster (HOXA1-5, HOXA9, HOXA10, HOXA11 and HOXA13) has also been linked to fertility. Additionally, we identified other genes related to hair follicle development (LAMTOR3, EEF1A2), ear size (HOXA1, KCNQ2), fat tail formation (HOXA10, HOXA11), growth and development (FAF1, CCNDBP1, GJB2, GJA3), fat deposition (ACOXL, JAZF1, HOXA3, HOXA4, HOXA5, EBF4), immunity (UBR1, FASTKD5) and feed intake (DAPP1, RNF17, NPBWR2). Our results offer novel insights into the genetic mechanisms underlying the selection of breed-specific traits in Hu sheep and provide a reference for sheep genetic improvement programs.
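Of the three approaches named above, runs of homozygosity (ROH) is the simplest to illustrate. The sketch scans allele-dosage genotypes (0/1/2) along one chromosome and reports stretches of consecutive homozygous SNPs that pass hypothetical minimum SNP-count and length thresholds. Production ROH callers usually tolerate a few heterozygous or missing calls and apply SNP-density criteria, which this simplified version does not.

```python
def runs_of_homozygosity(positions: list[int], genotypes: list[int],
                         min_snps: int = 3, min_length_bp: int = 0) -> list[tuple[int, int]]:
    """Return (start_bp, end_bp) for runs of consecutive homozygous SNPs.

    genotypes are allele dosages along one chromosome, sorted by position;
    0 and 2 are homozygous, 1 is heterozygous. Thresholds are hypothetical.
    """
    runs = []
    start = None
    # A sentinel heterozygous call at the end closes any trailing run
    for i, g in enumerate(genotypes + [1]):
        if g in (0, 2):
            if start is None:
                start = i
        elif start is not None:
            end = i - 1
            long_enough = end - start + 1 >= min_snps
            wide_enough = positions[end] - positions[start] >= min_length_bp
            if long_enough and wide_enough:
                runs.append((positions[start], positions[end]))
            start = None
    return runs


pos = [100, 200, 300, 400, 500, 600, 700]
gts = [0, 2, 2, 1, 0, 0, 1]
print(runs_of_homozygosity(pos, gts))  # [(100, 300)]
```

Regions where ROH recur across many individuals of a breed are the ones typically interpreted as candidate selection signatures.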

18.
BMC Bioinformatics ; 25(1): 184, 2024 May 09.
Article in English | MEDLINE | ID: mdl-38724907

ABSTRACT

BACKGROUND: Major advances in sequencing technologies and the sharing of data and metadata in science have resulted in a wealth of publicly available datasets. However, working with and especially curating public omics datasets remains challenging despite these efforts. While a growing number of initiatives aim to re-use previous results, these initiatives have limitations that often lead to the need for further in-house curation and processing. RESULTS: Here, we present the Omics Dataset Curation Toolkit (OMD Curation Toolkit), a Python 3 package designed to accompany and guide the researcher during the curation of metadata and fastq files from public omics datasets. This workflow provides a standardized framework with multiple capabilities (collection, control check, treatment and integration) to facilitate the arduous task of curating public sequencing data projects. While the toolkit is centered on the European Nucleotide Archive (ENA), most of the provided tools are generic and can be used to curate datasets from other sources. CONCLUSIONS: The toolkit thus offers valuable tools for the in-house curation that was previously needed to re-use public omics data. Owing to its workflow structure and capabilities, it is easy to use and can benefit investigators developing novel omics meta-analyses based on sequencing data.


Subject(s)
Data Curation , Software , Workflow , Data Curation/methods , Metadata , Databases, Genetic , Genomics/methods , Computational Biology/methods
19.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38701418

ABSTRACT

Coverage quantification is required for many sequencing datasets in genomics research. However, most existing tools fail to provide comprehensive statistical results and gain little performance from multithreading. Here, we present PanDepth, an ultrafast and efficient tool for calculating coverage and depth from sequencing alignments. PanDepth outperforms other tools in computation time and memory efficiency for both BAM- and CRAM-format alignment files, regardless of read length. It employs chromosome-level parallel computation and optimized data structures, which yield its high speed and memory efficiency. It accepts sorted or unsorted BAM and CRAM alignment files, together with GTF-, GFF- or BED-formatted interval files or a specified window size. When a reference genome sequence is provided and GC content calculation is enabled, PanDepth also reports GC content statistics, enhancing the accuracy and reliability of copy number variation analysis. Overall, PanDepth is a powerful tool that accelerates scientific discovery in genomics research.


Subject(s)
Genomics , Software , Genomics/methods , Humans , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Base Composition , DNA Copy Number Variations , Computational Biology/methods , Algorithms , Sequence Alignment/methods
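The coverage, depth and GC statistics described in the PanDepth abstract can be computed from alignment intervals with a difference array, as in the sketch below. This is not PanDepth's implementation. It assumes alignments have already been reduced to 0-based, half-open (start, end) reference intervals on a single contig, so it does no BAM/CRAM parsing, ignores deletions and splicing within a read, and has no chromosome-level parallelism. The function name is hypothetical. When a reference sequence is supplied, the sketch also reports GC content as G+C over unambiguous A/C/G/T bases.

```python
def depth_stats(contig_length, alignments, reference=None):
    """Per-contig coverage, mean depth and optional GC content.

    Hypothetical helper. `alignments` is an iterable of 0-based,
    half-open (start, end) intervals on one contig; `reference` is the
    contig sequence, if GC content is wanted.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (contig_length + 1)
    for start, end in alignments:
        s, e = max(0, start), min(contig_length, end)
        if s < e:
            diff[s] += 1
            diff[e] -= 1
    # A prefix sum over the difference array gives the per-base depth.
    depth = []
    running = 0
    for i in range(contig_length):
        running += diff[i]
        depth.append(running)
    covered = sum(1 for d in depth if d > 0)
    stats = {
        "length": contig_length,
        "covered_bases": covered,
        "coverage": covered / contig_length,
        "mean_depth": sum(depth) / contig_length,
    }
    if reference is not None:
        seq = reference.upper()
        gc = sum(1 for b in seq if b in "GC")
        acgt = sum(1 for b in seq if b in "ACGT")
        # N and other ambiguity codes are excluded from the denominator.
        stats["gc_content"] = gc / acgt if acgt else 0.0
    return stats
```

Each alignment updates only two array entries, and depth is recovered in one linear pass, so the cost grows with the number of alignments plus the contig length. Contigs are independent, so in principle they could be processed in parallel.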
20.
Hum Mol Genet ; 33(16): 1429-1441, 2024 Aug 06.
Article in English | MEDLINE | ID: mdl-38747556

ABSTRACT

Inflammation biomarkers can provide valuable insight into the role of inflammatory processes in many diseases and conditions. Sequencing-based analyses of such biomarkers can also serve as an exemplar of the genetic architecture of quantitative traits. To evaluate the biological insight that a multi-ancestry, whole-genome-based association study can provide, we performed a comprehensive analysis of 21 inflammation biomarkers in up to 38 465 individuals with whole-genome sequencing from the Trans-Omics for Precision Medicine (TOPMed) program (sample size varied by trait; the minimum was n = 737 for MMP-1). We identified 22 distinct single-variant associations across 6 traits (E-selectin, intercellular adhesion molecule 1, interleukin-6, lipoprotein-associated phospholipase A2 activity and mass, and P-selectin) that remained significant after conditioning on previously identified associations for these inflammatory biomarkers. We further expanded upon known biomarker associations by pairing the single-variant analysis with a rare-variant set-based analysis, which identified 19 significant set-based associations with 5 traits. These signals were distinct both from the significant single-variant signals within TOPMed and from genetic signals observed in prior studies. This demonstrates the complementary value of performing both single-variant and rare-variant analyses of quantitative traits. We also confirmed several previously reported signals from semi-quantitative proteomics platforms. Many of these signals demonstrate the extensive allelic heterogeneity and ancestry-differentiated variant-trait associations common for inflammation biomarkers, a characteristic we hypothesize will be increasingly observed in well-powered, large-scale analyses of complex traits.


Subject(s)
Biomarkers , Genome-Wide Association Study , Inflammation , Precision Medicine , Whole Genome Sequencing , Humans , Precision Medicine/methods , Inflammation/genetics , Genome-Wide Association Study/methods , Whole Genome Sequencing/methods , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Genetic Predisposition to Disease , Female , Interleukin-6/genetics
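The rare-variant set-based analysis in the TOPMed abstract can be illustrated with one of the simplest set-based tests, a burden test. Rare variants in a set (for example, a gene) are collapsed into one score per individual, and the trait is then regressed on that score. This is a hedged sketch only. The abstract does not state which set-based test was used, and the 1% minor-allele-frequency cutoff and function names are hypothetical. Covariates and relatedness, which real analyses usually adjust for, are ignored. The p-value uses a large-sample normal approximation.

```python
import math


def burden_scores(genotypes, maf_threshold=0.01):
    """Collapse rare variants into one burden score per individual.

    Hypothetical helper. `genotypes[i][j]` is the alternate-allele count
    (0, 1 or 2) of individual i at variant j; the alternate allele is
    assumed to be the minor allele.
    """
    n = len(genotypes)
    n_variants = len(genotypes[0])
    rare = []
    for j in range(n_variants):
        alt_freq = sum(ind[j] for ind in genotypes) / (2 * n)
        maf = min(alt_freq, 1.0 - alt_freq)
        if 0.0 < maf < maf_threshold:
            rare.append(j)
    return [sum(ind[j] for j in rare) for ind in genotypes]


def burden_test(scores, trait):
    """Simple linear regression of trait on burden score.

    Returns (slope, two_sided_p); the p-value uses a normal approximation
    to the t statistic, reasonable only for large samples.
    """
    n = len(scores)
    if n < 3:
        raise ValueError("need at least 3 individuals")
    mean_x = sum(scores) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in scores)
    if sxx == 0:
        raise ValueError("burden score has no variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, trait))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(scores, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        # Perfect fit: the association is as strong as it can be.
        return slope, 0.0
    z = slope / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return slope, p
```

A production analysis would typically replace the normal approximation with an exact t or score test and include covariates. Collapsing variants into a single score is what distinguishes a burden test from variance-component tests such as SKAT.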