Results 1 - 20 of 88
1.
BMC Bioinformatics ; 25(1): 222, 2024 Jun 24.
Article in English | MEDLINE | ID: mdl-38914932

ABSTRACT

BACKGROUND: Pan-virus detection, and virome investigation in general, can be challenging, mainly due to the lack of universally conserved genetic elements in viruses. Metagenomic next-generation sequencing can offer a promising solution to this problem by providing an unbiased overview of the microbial community, enabling detection of any virus without prior target selection. However, a major challenge in utilising metagenomic next-generation sequencing for virome investigation is that data analysis can be highly complex, involving numerous data processing steps. RESULTS: Here, we present Entourage to address this challenge. Entourage enables short-read sequence assembly, viral sequence search with or without reference virus targets using contig-based approaches, and intrasample sequence variation quantification. Several workflows are implemented in Entourage to facilitate end-to-end virus sequence detection analysis through a single command line, from read cleaning and sequence assembly to virus sequence searching. The results generated are comprehensive, allowing for thorough quality control, reliability assessment, and interpretation. We illustrate Entourage's utility as a streamlined workflow for virus detection by employing it to search comprehensively for target and non-target virus sequences in raw sequence read data generated from HeLa cell culture samples spiked with viruses. Furthermore, we showcase its flexibility and performance on a real-world dataset by analysing a preassembled Tara Oceans dataset. Overall, our results show that Entourage performs well even at single-digit virus sequencing depth, and that it can be used to discover novel viruses effectively. Additionally, by using sequence data generated from a patient with chronic SARS-CoV-2 infection, we demonstrate Entourage's capability to quantify intrasample viral genetic variation and to generate publication-quality figures illustrating the results.
CONCLUSIONS: Entourage is an all-in-one, versatile, and streamlined bioinformatics software for virome investigation, developed with a focus on ease of use. Entourage is available at https://codeberg.org/CENMIG/Entourage under the MIT license.


Subjects
Genome, Viral , High-Throughput Nucleotide Sequencing , SARS-CoV-2 , Software , Genome, Viral/genetics , Humans , High-Throughput Nucleotide Sequencing/methods , SARS-CoV-2/genetics , Metagenomics/methods , Viruses/genetics , COVID-19/virology , Virome/genetics , HeLa Cells
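The abstract does not specify how Entourage quantifies intrasample variation. As a minimal sketch, assuming per-position base counts (a pileup) have already been obtained from reads mapped to a consensus sequence, minor-variant frequencies could be computed as below. The function name and the `min_depth`/`min_freq` thresholds are hypothetical and are not Entourage's defaults.

```python
from typing import Dict, List, Tuple


# Hypothetical helper for illustration; not part of Entourage.
def minor_variant_frequencies(
    pileup: Dict[int, Dict[str, int]],
    min_depth: int = 10,
    min_freq: float = 0.05,
) -> List[Tuple[int, str, float]]:
    """Return (position, minor_base, frequency) for sites carrying a minor variant."""
    calls = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to estimate a frequency reliably
        # Most frequent base = consensus; second most frequent = minor variant.
        ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
        if len(ranked) < 2 or ranked[1][1] == 0:
            continue  # no alternative base observed at this position
        minor_base, minor_count = ranked[1]
        freq = minor_count / depth
        if freq >= min_freq:
            calls.append((pos, minor_base, round(freq, 3)))
    return calls


pileup = {
    100: {"A": 90, "G": 10},  # 10% G
    101: {"C": 50},           # no variation
    102: {"T": 5, "A": 1},    # depth below threshold
    103: {"G": 97, "T": 3},   # frequency below threshold
}
print(minor_variant_frequencies(pileup))  # [(100, 'G', 0.1)]
```

The depth threshold keeps frequencies from being reported on a handful of reads; a production tool would additionally model sequencing error and strand bias.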
2.
Front Plant Sci ; 15: 1383986, 2024.
Article in English | MEDLINE | ID: mdl-38784062

ABSTRACT

Introduction: Plant-pathogen interaction is an inexhaustible source of information on how to sustainably control diseases that negatively affect agricultural production. Meloidogyne incognita is a root-knot nematode (RKN) that is a pest of many crops, including tomato (Solanum lycopersicum). RKNs are a global threat to agriculture, especially under climate change, and RNA technologies offer a potential alternative to chemical nematicides. While endogenous microRNAs have been identified in both S. lycopersicum and M. incognita, and their roles have been related to the regulation of developmental changes, no study has investigated the cross-kingdom transfer of miRNAs during this interaction. Methods: Here, we propose a bioinformatics pipeline to highlight potential miRNA-dependent cross-kingdom interactions between tomato and M. incognita. Results: The obtained data show that nematode miRNAs putatively targeting tomato genes are mostly related to detrimental effects on plant development and defense. Similarly, tomato miRNAs putatively targeting M. incognita biological processes are predicted to have negative effects on digestion, mobility, and reproduction. To test this hypothesis experimentally, an in vitro feeding assay was carried out using sly-miRNAs selected by the bioinformatics approach. The results show that soaking juvenile larvae (J2s) in two tomato miRNAs (sly-miRNA156a, sly-miR169f) affected their ability to infect plant roots and form galls. This was coupled with a significant downregulation of the predicted target genes (Minc11367, Minc00111), as revealed by qRT-PCR analysis. Discussion: The current study therefore expands knowledge of the involvement of cross-kingdom miRNAs in host-parasite interactions and could pave the way for the application of exogenous plant miRNAs as tools to control nematode infection.
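The abstract does not detail how miRNA targets were predicted. Seed complementarity (miRNA nucleotides 2-8 pairing with the target) is a common first filter in miRNA target prediction, so the sketch below finds perfect 7-mer seed sites in a target sequence. This is an illustrative assumption, not the authors' pipeline; plant miRNAs typically require near-complete complementarity to their targets, so a seed-only search is a simplification. All function names are hypothetical.

```python
from typing import List

# Hypothetical helpers for illustration; not the authors' pipeline.
_RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq: str) -> str:
    """Upper-case a sequence and convert DNA T to RNA U."""
    return seq.upper().replace("T", "U")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (A/C/G/U only)."""
    return "".join(_RNA_COMPLEMENT[b] for b in reversed(to_rna(rna)))


def seed_sites(mirna: str, target: str) -> List[int]:
    """0-based start positions in `target` of perfect 7-mer sites for miRNA nt 2-8."""
    seed = to_rna(mirna)[1:8]          # nucleotides 2-8 (1-based) of the miRNA
    site = reverse_complement(seed)    # sequence the target must carry to pair with the seed
    tgt = to_rna(target)
    k = len(site)
    return [i for i in range(len(tgt) - k + 1) if tgt[i:i + k] == site]
```

Positions are reported in the target's 5'-to-3' orientation; DNA or RNA input is accepted for both sequences.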

3.
Brief Funct Genomics ; 2024 Mar 30.
Article in English | MEDLINE | ID: mdl-38555493

ABSTRACT

Genomic data analysis has grown sharply in complexity and volume, driven primarily by the advent of high-throughput technologies. In particular, studying chromatin loops and structures has become central to understanding gene regulation and genome organization. This systematic investigation examines bioinformatics pipelines designed specifically for the analysis of chromatin loops and structures. Our investigation incorporates loop-interaction datasets specific to two protein factors (CTCF and cohesin), generated by six distinct pipelines, for a collection of 36 datasets. Through a review of the existing literature, we provide an overview of the methodologies, tools and algorithms underpinning the analysis of this genomic feature, covering aspects such as the data preparation pipeline, preprocessing, statistical features and modelling techniques. We also assess the strengths and limitations of these bioinformatics pipelines, with particular attention to the interplay between data quality and the performance of deep learning models.

4.
Genes (Basel) ; 15(3)2024 03 11.
Article in English | MEDLINE | ID: mdl-38540411

ABSTRACT

BACKGROUND: The advancement of next-generation sequencing (NGS) technologies provides opportunities for large-scale pharmacogenetic (PGx) studies and pre-emptive PGx testing that cover a wide range of genotypes present in diverse populations. However, NGS-based PGx testing is limited by the lack of comprehensive computational tools to support genetic data analysis and clinical decisions. METHODS: Bioinformatics utilities specialized for human genomics and the latest cloud-based technologies were used to develop a bioinformatics pipeline for analyzing genomic sequence data and reporting PGx genotypes. A database was created and integrated into the pipeline for filtering actionable PGx variants and clinical interpretations. Strict quality verification procedures were conducted on variant calls using the whole genome sequencing (WGS) dataset of the 1000 Genomes Project (G1K). The accuracy of PGx allele identification was validated using the WGS dataset of the Pharmacogenetics Reference Materials from the Centers for Disease Control and Prevention (CDC). RESULTS: The newly created bioinformatics pipeline, Pgxtools, can analyze genomic sequence data, identify actionable variants in 13 PGx-relevant genes, and generate reports annotated with specific interpretations and recommendations based on clinical practice guidelines. Verification with two independent methods showed that Pgxtools consistently identifies variants more accurately than the calls reported in the G1K dataset on GRCh37 and GRCh38. CONCLUSIONS: Pgxtools provides an integrated workflow for large-scale genomic data analysis and PGx clinical decision support. Implemented with cloud-native technologies, it is highly portable across a wide variety of environments, from a single laptop to High-Performance Computing (HPC) clusters and cloud platforms, for different production scales and requirements.


Subjects
Pharmacogenetics , Pharmacogenomic Testing , Humans , Pharmacogenetics/methods , High-Throughput Nucleotide Sequencing/methods , Genomics/methods , Computational Biology
5.
Sci Rep ; 14(1): 4068, 2024 02 19.
Article in English | MEDLINE | ID: mdl-38374282

ABSTRACT

The gut microbiome is a diverse ecosystem, dominated by bacteria; however, fungi, phages/viruses, archaea, and protozoa are also important members of the gut microbiota. Exploration of taxonomic compositions beyond bacteria, as well as an understanding of the interaction between the bacteriome and the other members, is limited when using 16S rDNA sequencing. Here, we developed a pipeline enabling the simultaneous interrogation of the gut microbiome (bacteriome, mycobiome, archaeome, eukaryome, DNA virome) and of antibiotic resistance genes, based on optimized long-read shotgun metagenomics protocols and custom bioinformatics. Using our pipeline, we investigated the longitudinal composition of the gut microbiome in an exploratory clinical study in patients undergoing allogeneic hematopoietic stem cell transplantation (alloHSCT; n = 31). Pre-transplantation microbiomes exhibited a 3-cluster structure, characterized respectively by Bacteroides spp./Phocaeicola spp. abundance, a mixed composition, and Enterococcus abundance. We revealed substantial inter-individual and temporal variability in microbial domain composition, human DNA, and antibiotic resistance genes during the course of alloHSCT. Interestingly, viruses and fungi accounted for substantial proportions of microbiome content in individual samples. In the course of HSCT, bacterial strains were either stable or newly acquired. Our results demonstrate the disruptive potential of alloHSCT on the gut microbiome and pave the way for future comprehensive microbiome studies based on long-read metagenomics.


Subjects
Gastrointestinal Microbiome , Hematopoietic Stem Cell Transplantation , Microbiota , Humans , Gastrointestinal Microbiome/genetics , Microbiota/genetics , Bacteria/genetics , Anti-Bacterial Agents , Fungi/genetics , DNA, Ribosomal , Metagenomics/methods
6.
FEMS Microbiol Lett ; 371, 2024 Jan 09.
Article in English | MEDLINE | ID: mdl-38305133

ABSTRACT

Comprehensive profiling of microbial diversity is essential to understanding ecosystem functions. Universal primer sets such as 515Y/926R can amplify part of the 16S and 18S rRNA genes and thus capture the diversity of both prokaryotes and eukaryotes. However, analysing the resulting mixed sequencing data poses a bioinformatics challenge: because the amplicon lengths differ, the 16S and 18S rRNA sequences must first be separated and analysed independently. This study describes an alternative strategy, a merging and concatenation workflow, to analyse mixed amplicon data without separating the 16S and 18S rRNA sequences. The workflow was tested with 24 mock community (MC) samples, and the analyses resolved the composition of prokaryotes and eukaryotes adequately. In addition, there was a strong correlation (cor = 0.950; P-value = 4.754e-10) between the observed and expected abundances in the MC samples, which suggests that the computational approach can infer microbial proportions accurately. Further, 18 samples collected from the Sundarbans mangrove region were analysed as a case study. The analyses identified Proteobacteria, Bacteroidota, Actinobacteriota, Cyanobacteria, and Crenarchaeota as dominant prokaryotic phyla, and eukaryotic divisions such as Metazoa, Gyrista, Cryptophyta, Chlorophyta, and Dinoflagellata as dominant in the samples. Thus, the results support the applicability of the method in environmental microbiome research. The merging and concatenation workflow presented here requires considerably fewer computational resources and uses widely used bioinformatics packages, saving researchers analysis time (compared with the conventional approach, for equivalent sample numbers) when inferring the diversity of major microbial domains from mixed amplicon data at comparable accuracy.


Subjects
Microbiota , RNA, Ribosomal, 18S/genetics , Workflow , Microbiota/genetics , Bacteria/genetics , High-Throughput Nucleotide Sequencing/methods , Computational Biology , RNA, Ribosomal, 16S/genetics
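The abstract reports cor = 0.950 between observed and expected mock-community abundances without naming the coefficient. A minimal sketch, assuming Pearson's r computed on paired relative abundances (the p-value is not computed here), looks like this; the helper name is illustrative.

```python
import math
from typing import Sequence


# Illustrative helper; the study's exact correlation method is not stated in the abstract.
def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

In practice `x` would hold the expected and `y` the observed relative abundance of each taxon across the mock samples; `scipy.stats.pearsonr` returns the same coefficient together with a p-value, and a rank-based (Spearman) coefficient is a common alternative for skewed abundance data.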
7.
HLA ; 103(1): e15273, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37899688

ABSTRACT

The complement component 4 gene locus, composed of the C4A and C4B genes and located on chromosome 6, encodes complement component 4 (C4) protein, a key intermediate in the classical and lectin pathways of the complement system. The complement system is an important modulator of immune system activity and is also involved in the clearance of immune complexes and cellular debris. The C4A and C4B gene loci exhibit copy number variation, with each composite gene varying between 0 and 5 copies per haplotype. C4A and C4B genes also vary in size depending on the presence of a human endogenous retrovirus (HERV) in intron 9, denoted C4(L) for the long form and C4(S) for the short form, which affects expression and is found in both C4A and C4B. Additionally, the human blood group antigens Rodgers and Chido are located on the C4 protein, with the Rodger epitope generally found on C4A protein and the Chido epitope generally found on C4B protein. C4A and C4B copy number variation has been implicated in numerous autoimmune and pathogenic diseases. Despite the central role of C4 in immune function and regulation, high-throughput genomic sequence analysis of C4A and C4B variants has been impeded by the high degree of sequence similarity and complex genetic variation exhibited by these genes. To investigate C4 variation using genomic sequencing data, we have developed a novel bioinformatic pipeline for comprehensive, high-throughput characterization of human C4A and C4B sequences from short-read sequencing data, named C4Investigator. Using paired-end targeted or whole genome sequence data as input, C4Investigator determines the overall C4 gene copy number, as well as the copy numbers of C4A, C4B, C4(Rodger), C4(Ch), C4(L), and C4(S). Additionally, C4Investigator reports the full C4A and C4B aligned sequences, enabling nucleotide-level analysis.
To demonstrate the utility of this workflow we have analyzed C4A and C4B variation in the 1000 Genomes Project Data set, showing that these genes are highly poly-allelic with many variants that have the potential to impact C4 protein function.


Subjects
Complement C4b , DNA Copy Number Variations , Humans , Complement C4b/genetics , Alleles , Complement C4/genetics , Genomics , Sequence Analysis , Epitopes
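The abstract does not describe how C4Investigator derives copy numbers. A common approach for short-read data is to compare read depth over the gene with depth over a control region assumed to be present in two copies per diploid genome, then apportion the total between isotypes using reads that carry isotype-specific sequence. The sketch below illustrates that idea under those assumptions; the function names and the diploid control region are hypothetical, not C4Investigator's implementation.

```python
from typing import Tuple


# Hypothetical helpers for illustration; not C4Investigator's implementation.
def estimate_copy_number(locus_depth: float, control_depth: float,
                         control_copies: int = 2) -> int:
    """Copy number from mean read depth relative to a control region of known copy number."""
    if control_depth <= 0:
        raise ValueError("control depth must be positive")
    # Note: Python's round() uses round-half-to-even for exact .5 ratios.
    return round(control_copies * locus_depth / control_depth)


def split_isotypes(total_copies: int, c4a_reads: int, c4b_reads: int) -> Tuple[int, int]:
    """Apportion total C4 copies between C4A and C4B using isotype-specific read counts."""
    informative = c4a_reads + c4b_reads
    if informative == 0:
        raise ValueError("no isotype-informative reads")
    c4a = round(total_copies * c4a_reads / informative)
    return c4a, total_copies - c4a
```

For example, `estimate_copy_number(60.0, 30.0)` gives 4 total copies, and `split_isotypes(4, 50, 50)` assigns them as `(2, 2)`. The same ratio logic could in principle be applied to reads distinguishing C4(L) from C4(S).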
8.
DNA Res ; 31(1)2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38102723

ABSTRACT

The epigenome, which reflects the modifications on chromatin or DNA sequences, provides crucial insight into gene expression regulation and cellular activity. With the continuous accumulation of epigenomic datasets such as chromatin immunoprecipitation followed by sequencing (ChIP-seq) data, there is a great demand for a streamlined pipeline to consistently process them, especially for large-dataset comparisons involving hundreds of samples. Here, we present Churros, an end-to-end epigenomic analysis pipeline that is environmentally independent and optimized for handling large-scale data. We successfully demonstrated the effectiveness of Churros by analyzing large-scale ChIP-seq datasets with the hg38 or Telomere-to-Telomere (T2T) human reference genome. We found that applying T2T to the typical analysis workflow has important impacts on read mapping, quality checks, and peak calling. We also introduced a useful feature to study context-specific epigenomic landscapes. Churros will contribute a comprehensive and unified resource for analyzing large-scale epigenomic data.


Subjects
Chromatin Immunoprecipitation Sequencing , Epigenomics , Humans , Chromatin/genetics , Gene Expression Regulation , Chromatin Immunoprecipitation , Sequence Analysis, DNA , High-Throughput Nucleotide Sequencing
9.
Biology (Basel) ; 12(12)2023 Nov 26.
Article in English | MEDLINE | ID: mdl-38132293

ABSTRACT

Transposons are mobile DNA sequences that make up large fractions of many plant genomes. They provide unique resources for tracking gene and genome evolution and for developing molecular tools for basic and applied research. Despite extensive efforts, it is still challenging to annotate transposons accurately, especially for beginners, as transposon prediction requires expertise in both transposon biology and bioinformatics. Moreover, the complexity of plant genomes and the dynamic evolution of transposons complicate genome-wide transposon discovery. This review summarizes the three major strategies for transposon detection (repeat-based, structure-based, and homology-based annotation), introduces the transposon superfamilies identified in plants thus far, and presents related bioinformatics resources for detecting plant transposons. Furthermore, it describes transposon classification and explains why the terms 'autonomous' and 'non-autonomous' cannot be used to classify the superfamilies of transposons. Lastly, it discusses how to identify misannotated transposons and improve the quality of the transposon database. This review provides helpful information about plant transposons and a beginner's guide to annotating these repetitive sequences.

10.
Environ Sci Pollut Res Int ; 30(56): 118976-118988, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37922087

ABSTRACT

The COVID-19 pandemic has emphasized the urgent need for rapid public health surveillance methods to detect and monitor the transmission of infectious diseases. Wastewater-based epidemiology (WBE) has emerged as a promising tool for proactive analysis and quantification of infectious pathogens within a population before clinical cases emerge. In the present study, we aimed to assess the trends and dynamics of SARS-CoV-2 variants using a longitudinal approach. Our objectives included early detection and monitoring of these variants to enhance our understanding of their prevalence and potential impact. To achieve these goals, we conducted real-time quantitative polymerase chain reaction (RT-qPCR) and Illumina sequencing on 442 wastewater (WW) samples collected from 10 sewage treatment plants (STPs) in Pune city, India, between November 2021 and April 2022. Our comprehensive analysis identified 426 distinct lineages representing 17 highly transmissible variants of SARS-CoV-2. Notably, fragments of the Omicron variant were detected in WW samples prior to its first clinical detection in Botswana. Furthermore, we observed highly contagious sub-lineages of the Omicron variant, including BA.1 (~28%), BA.1.X (1.0-72%), BA.2 (1.0-18%), BA.2.X (1.0-97.4%), BA.2.12 (0.8-0.25%), BA.2.38 (0.8-1.0%), BA.2.75 (0.01-0.02%), BA.3 (0.09-6.3%), BA.4 (0.24-0.29%), and XBB (0.01-21.83%), with varying prevalence rates. Overall, the present study demonstrated the practicality of WBE for the early detection of SARS-CoV-2 variants, which could help track future outbreaks of SARS-CoV-2. Such approaches could be applied to monitoring infectious agents before they appear in clinical cases.


Subjects
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , Pandemics , COVID-19/epidemiology , India , Genomics , Wastewater
11.
Genome Med ; 15(1): 94, 2023 11 09.
Article in English | MEDLINE | ID: mdl-37946251

ABSTRACT

BACKGROUND: Whole genome sequencing (WGS) is increasingly being used for the diagnosis of patients with rare diseases. However, the diagnostic yields of many studies, particularly those conducted in a healthcare setting, are often disappointingly low, at 25-30%. This is in part because, although entire genomes are sequenced, analysis is often confined to in silico gene panels or coding regions of the genome. METHODS: We undertook WGS on a cohort of 122 unrelated rare disease patients and their relatives (300 genomes) who had been pre-screened by gene panels or arrays. Patients were recruited from a broad spectrum of clinical specialties. We applied a bioinformatics pipeline that allows comprehensive analysis of all variant types. We combined established bioinformatics tools for phenotypic and genomic analysis with our novel algorithms (SVRare, ALTSPLICE and GREEN-DB) to detect and annotate structural, splice site and non-coding variants. RESULTS: Our diagnostic yield was 43/122 cases (35%), although 47/122 cases (39%) were considered solved when novel candidate genes with supporting functional data were taken into account. Structural, splice site and deep intronic variants contributed to 20/47 (43%) of our solved cases. Five genes that are novel, or were novel at the time of discovery, were identified, whilst a further three genes are putative novel disease genes with evidence of causality. We identified variants of uncertain significance in a further fourteen candidate genes. The phenotypic spectrum associated with RMND1 was expanded to include polymicrogyria. Two patients with secondary findings in FBN1 and KCNQ1 were confirmed to have previously unidentified Marfan and long QT syndromes, respectively, and were referred for further clinical interventions. Clinical diagnoses were changed in six patients and treatment was adjusted for eight individuals, which was considered life-saving for five of them.
CONCLUSIONS: Genome sequencing is increasingly being considered as a first-line genetic test in routine clinical settings and can make a substantial contribution to rapidly identifying a causal aetiology for many patients, shortening their diagnostic odyssey. We have demonstrated that structural, splice site and intronic variants make a significant contribution to diagnostic yield and that comprehensive analysis of the entire genome is essential to maximise the value of clinical genome sequencing.


Subjects
Genetic Variation , Rare Diseases , Humans , Rare Diseases/diagnosis , Rare Diseases/genetics , Whole Genome Sequencing , Genetic Testing , Mutation , Cell Cycle Proteins
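The percentages reported in the abstract follow directly from its case counts. This short check reproduces them; the helper name is illustrative.

```python
# Illustrative arithmetic check of the yields quoted in the abstract.
def yield_percent(solved: int, total: int) -> int:
    """Diagnostic yield as a whole-number percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * solved / total)


for label, solved, total in [
    ("diagnostic yield", 43, 122),
    ("including novel candidate genes", 47, 122),
    ("structural/splice/deep intronic among solved", 20, 47),
]:
    print(f"{label}: {solved}/{total} = {yield_percent(solved, total)}%")
```

This prints 35%, 39% and 43%, matching the figures in the abstract.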
12.
BMC Res Notes ; 16(1): 248, 2023 Oct 02.
Article in English | MEDLINE | ID: mdl-37784104

ABSTRACT

OBJECTIVE: Black poplar (Populus nigra L.) is a species native to Eurasia with a wide distribution area. It is an ecologically important riparian species that is used as a parent of interspecific (P. deltoides x P. nigra) cultivated poplar hybrids. Variant detection from transcriptome sequences of 241 P. nigra individuals, sampled in natural populations from 11 river catchments in four European countries, is described here. These data provide valuable new resources for population structure analysis, population genomics and genome-wide association studies. DATA DESCRIPTION: Using RNAseq technology, we generated transcriptomics data from a mixture of young differentiating xylem and cambium tissues of 480 Populus nigra trees sampled in a common garden experiment located at Orléans (France), corresponding to 241 genotypes (at most 2 clonal replicates per genotype). We ran an in silico pipeline on the resulting sequences, which yielded 878,957 biallelic polymorphisms without missing data. More than 99% of these positions are annotated and 98.8% are located on the 19 chromosomes of the P. trichocarpa reference genome. The raw RNAseq sequences are available at the NCBI Sequence Read Archive SPR188754 and the variant dataset at the Recherche Data Gouv repository under https://doi.org/10.15454/8DQXK5 .


Subjects
Populus , Humans , Populus/genetics , Ecosystem , Genome-Wide Association Study , Genotype , France
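The abstract reports 878,957 biallelic polymorphisms without missing data but does not detail the filtering. Below is a minimal sketch of such a filter. It assumes genotypes have already been called and coded as alternative-allele counts. The record layout and function name are hypothetical, and real pipelines would typically operate on VCF files with dedicated tools.

```python
from typing import Dict, List, Optional


# Hypothetical record layout for illustration (not a VCF parser):
#   {"id": str, "ref": str, "alt": [str, ...], "genotypes": [0/1/2 or None, ...]}
# where each genotype is the alt-allele count and None marks a missing call.
def keep_complete_biallelic(records: List[Dict]) -> List[Dict]:
    """Keep polymorphic biallelic sites called in every sample."""
    kept = []
    for rec in records:
        if len(rec["alt"]) != 1:
            continue  # monomorphic (no alt) or multiallelic site
        dosages: List[Optional[int]] = rec["genotypes"]
        if any(d is None for d in dosages):
            continue  # at least one sample has no call
        if not 0 < sum(dosages) < 2 * len(dosages):
            continue  # only one allele actually observed across samples
        kept.append(rec)
    return kept
```

The final check drops sites that are nominally biallelic but show only one allele in the sampled individuals, which would carry no information for population-structure or association analyses.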
13.
BMC Bioinformatics ; 24(1): 317, 2023 Aug 22.
Article in English | MEDLINE | ID: mdl-37608271

ABSTRACT

BACKGROUND: Transposable elements (TEs) are short, mobile DNA elements that are known to play important roles in the genomes of many eukaryotic species. The identification and categorization of these elements is a critical task for many genomic studies, and the continued increase in the number of de novo assembled genomes demands new tools to improve the efficiency of this process. For this reason, we developed RepBox, a suite of Python scripts that combine several pre-existing family-specific TE detection methods into a single user-friendly pipeline. RESULTS: Based on comparisons of RepBox with the standard TE detection software RepeatModeler, we find that RepBox consistently classifies more elements and is also able to identify a more diverse array of TE families than the existing methods in plant genomes. CONCLUSIONS: The performance of RepBox on two different plant genomes indicates that our toolbox represents a significant improvement over existing TE detection methods, and should facilitate future TE annotation efforts in additional species.


Subjects
DNA Transposable Elements , Eukaryota , Humans , DNA Transposable Elements/genetics , Eukaryotic Cells , Genome, Plant , Genomics
14.
G3 (Bethesda) ; 13(11)2023 11 01.
Article in English | MEDLINE | ID: mdl-37585487

ABSTRACT

Genetic modifiers are variants that modulate the phenotypic outcomes of a primary detrimental variant. They contribute to phenotypic variability in rare diseases, but their identification is challenging. Genetic screening with model organisms is a widely used method for identifying genetic modifiers. Forward genetic screening followed by whole genome sequencing allows the detection of variants throughout the genome but typically produces thousands of candidate variants, making the interpretation and prioritization process very time-consuming and tedious. Although whole genome sequencing is more time- and cost-efficient, using computational pipelines specific to modifier identification remains a challenge for laboratories focused on biological experiments with model organisms. To facilitate a broader implementation of whole genome sequencing in genetic screens, we have developed Model Organism Modifier (MOM), a pipeline implemented as a user-friendly Galaxy workflow. Model Organism Modifier analyses raw short-read whole genome sequencing data and implements tailored filtering to provide a Candidate Variant List short enough to be curated manually. We provide a detailed tutorial for running the Model Organism Modifier Galaxy workflow and guidelines for manually curating the Candidate Variant Lists. We have tested Model Organism Modifier on published and validated Caenorhabditis elegans modifier screening datasets. As whole genome sequencing facilitates high-throughput identification of genetic modifiers in model organisms, Model Organism Modifier provides a user-friendly solution for implementing the bioinformatics analysis of short-read datasets in laboratories without expertise or support in bioinformatics.


Subjects
Caenorhabditis elegans , Genome , Animals , Caenorhabditis elegans/genetics , Workflow , Chromosome Mapping , Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Software
15.
bioRxiv ; 2023 Jul 19.
Article in English | MEDLINE | ID: mdl-37503256

ABSTRACT

The complement component 4 gene locus, composed of the C4A and C4B genes and located on chromosome 6, encodes the C4 protein, a key intermediate in the classical and lectin pathways of the complement system. The complement system is an important modulator of immune system activity and is also involved in the clearance of immune complexes and cellular debris. The C4 gene locus exhibits copy number variation, with each composite gene varying between 0 and 5 copies per haplotype. C4 genes also vary in size depending on the presence of a human endogenous retrovirus (HERV) in intron 9, denoted C4(L) for the long form and C4(S) for the short form, which modulates expression and is found in both C4A and C4B. Additionally, the human blood group antigens Rodgers and Chido are located on the C4 protein, with the Rodger epitope generally found on C4A protein and the Chido epitope generally found on C4B protein. C4 copy number variation has been implicated in numerous autoimmune and pathogenic diseases. Despite the central role of C4 in immune function and regulation, high-throughput genomic sequence analysis of C4 variants has been impeded by the high degree of sequence similarity and complex genetic variation exhibited by these genes. To investigate C4 variation using genomic sequencing data, we have developed a novel bioinformatic pipeline for comprehensive, high-throughput characterization of human C4 sequence from short-read sequencing data, named C4Investigator. Using paired-end targeted or whole genome sequence data as input, C4Investigator determines gene copy number for overall C4, C4A, C4B, C4(Rodger), C4(Ch), C4(L), and C4(S). Additionally, C4Investigator reports the full overall C4 aligned sequence, enabling nucleotide-level analysis of C4. To demonstrate the utility of this workflow, we have analyzed C4 variation in the 1000 Genomes Project Dataset, showing that the C4 genes are highly poly-allelic, with many variants that have the potential to impact C4 protein function.

16.
BMC Bioinformatics ; 24(1): 218, 2023 May 30.
Article in English | MEDLINE | ID: mdl-37254048

ABSTRACT

BACKGROUND: Viral genomics and epidemiology have become increasingly important tools for analysing the spread of key pathogens affecting the daily lives of individuals worldwide. With the rapidly expanding scale of pathogen genome sequencing efforts for epidemics and outbreaks, efficient workflows for extracting genomic information are becoming increasingly important for answering key research questions. RESULTS: Here we present Genofunc, a toolkit offering a range of command-line-oriented functions for processing raw virus genome sequences into aligned and annotated data ready for analysis. The tool contains functions such as genome annotation and feature extraction for processing large genomic datasets, either manually or as part of a pipeline such as Snakemake or Nextflow, ready for downstream phylogenetic analysis. Originally designed for a large-scale HIV sequencing project, Genofunc was validated by benchmarking against annotated gene coordinates from the Los Alamos HIV database, and a downstream phylogenetic case study produced results comparable to past literature. CONCLUSION: Genofunc is implemented fully in Python and licensed under the MIT license. Source code and documentation are available at: https://github.com/xiaoyu518/genofunc .


Subjects
Genomics , HIV Infections , Humans , Phylogeny , Genome, Viral , Chromosome Mapping , Software
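Feature extraction from annotated coordinates reduces to slicing the genome with the right coordinate convention. The sketch assumes 1-based, inclusive coordinates, a common convention in annotation tables; check the convention of your source (for example the Los Alamos database) before relying on it. The function name is hypothetical and does not reflect Genofunc's interface.

```python
# Hypothetical helper for illustration; not Genofunc's API.
_DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extract_feature(genome: str, start: int, end: int, strand: str = "+") -> str:
    """Extract a feature given 1-based, inclusive coordinates."""
    if not 1 <= start <= end <= len(genome):
        raise ValueError("coordinates fall outside the sequence")
    sub = genome[start - 1:end].upper()  # convert to Python's 0-based, end-exclusive slice
    if strand == "-":
        # Reverse complement; any non-ACGT symbol is emitted as N.
        sub = "".join(_DNA_COMPLEMENT.get(b, "N") for b in reversed(sub))
    return sub
```

For example, `extract_feature("AAATGGCCTAA", 3, 5)` returns `"ATG"`, and the same call with `strand="-"` returns `"CAT"`. Off-by-one errors between 0-based and 1-based conventions are a frequent source of shifted reading frames, so the conversion is kept in one place.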
17.
BMC Bioinformatics ; 24(1): 135, 2023 Apr 05.
Article in English | MEDLINE | ID: mdl-37020193

ABSTRACT

BACKGROUND: Population structure and cryptic relatedness between individuals (samples) are two major factors affecting false positives in genome-wide association studies (GWAS). In addition, population stratification and genetic relatedness in genomic selection in animal and plant breeding can affect prediction accuracy. The methods commonly used for solving these problems are principal component analysis (to adjust for population stratification) and marker-based kinship estimates (to correct for the confounding effects of genetic relatedness). Currently, many software tools are available that analyze genetic variation among individuals to determine population structure and genetic relationships. However, none of these tools or pipelines performs such analyses in a single workflow and visualizes all the various results in a single interactive web application. RESULTS: We developed PSReliP, a standalone, freely available pipeline for the analysis and visualization of population structure and relatedness between individuals in a user-specified genetic variant dataset. The analysis stage of PSReliP executes all steps of data filtering and analysis and contains an ordered sequence of commands from PLINK, a whole-genome association analysis toolset, along with in-house shell scripts and Perl programs that support data pipelining. The visualization stage is provided by Shiny apps, R-based interactive web applications. In this study, we describe the characteristics and features of PSReliP and demonstrate how it can be applied to real genome-wide genetic variant data. CONCLUSIONS: The PSReliP pipeline allows users to quickly analyze genetic variants such as single nucleotide polymorphisms and small insertions or deletions at the genome level to estimate population structure and cryptic relatedness using PLINK software, and to visualize the analysis results in interactive tables, plots, and charts using Shiny technology.
The analysis and assessment of population stratification and genetic relatedness can aid in choosing an appropriate approach for the statistical analysis of GWAS data and predictions in genomic selection. The various outputs from PLINK can be used for further downstream analysis. The code and manual for PSReliP are available at https://github.com/solelena/PSReliP .
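The two corrections named in the background, principal components for stratification and marker-based kinship for relatedness, can both be derived from a genomic relationship matrix (GRM). The sketch below is a minimal numpy illustration of the widely used variance-standardized GRM and a PCA of it; it is not PSReliP's or PLINK's code, and the function names and toy genotypes are hypothetical.

```python
import numpy as np


def standardized_grm(genotypes: np.ndarray) -> np.ndarray:
    """Variance-standardized genomic relationship matrix (GRM).

    `genotypes` is an (individuals x variants) array of 0/1/2 alt-allele counts.
    Assumes no missing calls and no monomorphic variants (filter those first,
    as a QC step would), otherwise the scaling below divides by zero.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                            # alt-allele frequency per variant
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))    # centre and scale each variant
    return z @ z.T / g.shape[1]                         # average over all variants


def top_pcs(grm: np.ndarray, k: int = 2) -> np.ndarray:
    """Coordinates of each individual on the top-k principal components of the GRM."""
    eigvals, eigvecs = np.linalg.eigh(grm)              # eigh: symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]               # indices of the k largest eigenvalues
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))


# Toy data: individuals 0-1 and 2-3 carry opposite alleles at most variants,
# mimicking two subpopulations.
genotypes = np.array([
    [0, 0, 0, 2, 2, 2],
    [0, 0, 1, 2, 2, 1],
    [2, 2, 2, 0, 0, 0],
    [2, 2, 1, 0, 0, 1],
])
grm = standardized_grm(genotypes)   # off-diagonal entries act as relatedness estimates
pcs = top_pcs(grm, k=2)             # PC1 separates the two groups
```

In this toy case the first principal component places individuals 0 and 1 on one side and 2 and 3 on the other, which is the stratification signal a GWAS would include as covariates.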


Subjects
Genome-Wide Association Study , Software , Animals , Genome-Wide Association Study/methods , Genomics/methods , Genome , Workflow
18.
Methods Mol Biol ; 2621: 27-37, 2023.
Article in English | MEDLINE | ID: mdl-37041438

ABSTRACT

Clinically relevant sequencing methodologies continue to expand in number, diversity, complexity, and scale. This evolving and varied landscape requires unique implementations in all aspects of the assay, including the wet bench, bioinformatics, and reporting. After implementation, the informatics of many of these tests continues to change over time, driven by software and annotation source updates, changes to guidelines and knowledgebases, and changes to the underlying information technology (IT) infrastructure. Applying key principles when implementing the informatics of a new clinical test can greatly improve the lab's ability to handle these updates rapidly and reliably. In this chapter, we discuss a variety of informatics issues that span all next-generation sequencing (NGS) applications. In particular, we address the need for a reliable, repeatable, redundant, and version-controlled bioinformatics pipeline and architecture, and discuss common methodologies for meeting these needs.
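One common way to make results traceable across software, annotation, and infrastructure updates is to record every versioned input of a run in a manifest and fingerprint it, so that any change relative to the validated configuration is detectable. The sketch below assumes that approach; `PipelineManifest`, its fields, and all values are hypothetical illustrations, not taken from the chapter.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class PipelineManifest:
    """Hypothetical record of every versioned input that can change a reported result."""
    pipeline_commit: str                                     # commit of the pipeline code under version control
    reference_genome: str                                    # reference assembly name
    tool_versions: dict = field(default_factory=dict)        # tool name -> version string
    annotation_sources: dict = field(default_factory=dict)   # database name -> release

    def fingerprint(self) -> str:
        """SHA-256 of a canonical JSON encoding; sort_keys makes key order irrelevant."""
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Illustrative values only.
validated = PipelineManifest(
    pipeline_commit="3f2a9c1",
    reference_genome="GRCh38",
    tool_versions={"aligner": "1.0", "variant_caller": "2.1"},
    annotation_sources={"variant_db": "2023-01"},
)
reordered = PipelineManifest(
    pipeline_commit="3f2a9c1",
    reference_genome="GRCh38",
    tool_versions={"variant_caller": "2.1", "aligner": "1.0"},
    annotation_sources={"variant_db": "2023-01"},
)
updated = PipelineManifest(
    pipeline_commit="3f2a9c1",
    reference_genome="GRCh38",
    tool_versions={"aligner": "1.0", "variant_caller": "2.1"},
    annotation_sources={"variant_db": "2023-06"},   # annotation update only
)
```

If a run's fingerprint differs from the one recorded at validation, something in the pipeline has changed, and the lab can decide whether revalidation is required before reporting.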


Subjects
Computational Biology , Informatics , Computational Biology/methods , Software , High-Throughput Nucleotide Sequencing/methods
19.
J Inherit Metab Dis ; 46(2): 206-219, 2023 03.
Article in English | MEDLINE | ID: mdl-36752951

ABSTRACT

Oligosaccharidoses, sphingolipidoses and mucolipidoses are lysosomal storage disorders (LSDs) in which defective breakdown of the glycan side chains of glycosylated proteins and glycolipids leads to the accumulation of incompletely degraded oligosaccharides within lysosomes. In metabolic laboratories, these disorders are commonly diagnosed by thin-layer chromatography (TLC), although mass spectrometry-based approaches have also been published more recently. To expand the possibilities for screening for these diseases, we developed a screening platform that couples ultra-high-performance liquid chromatography (UHPLC) with high-resolution accurate mass (HRAM) mass spectrometry (MS), together with an open-source iterative bioinformatics pipeline. This pipeline generates comprehensive biomarker profiles and allows for extensive quality control (QC) monitoring. Using this platform, we were able to identify α-mannosidosis, β-mannosidosis, α-N-acetylgalactosaminidase deficiency, sialidosis, galactosialidosis, fucosidosis, aspartylglucosaminuria, GM1 gangliosidosis, GM2 gangliosidosis (Sandhoff disease) and mucolipidosis II/III in patient samples. Aberrant urinary oligosaccharide excretion was also detected for other disorders, including NGLY1 congenital disorder of deglycosylation, sialic acid storage disease, MPS type IV B and GSD II (Pompe disease). For the latter disorder, we identified heptahexose (Hex7) as a potential urinary biomarker, in addition to glucose tetrasaccharide (Glc4), for the diagnosis and monitoring of young-onset cases of Pompe disease. Occasionally, so-called "neonate" biomarker profiles, probably attributable to nutrition, were observed in young patients. Our UHPLC/HRAM-MS screening platform can easily be adopted in biochemical laboratories and allows for simple, robust screening and straightforward interpretation of the results to detect disorders in which aberrant oligosaccharides accumulate.
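HRAM screening rests on matching observed masses to candidate oligosaccharides within a tight parts-per-million (ppm) tolerance. The sketch below shows that matching step for hexose oligomers such as Glc4 and Hex7, assuming neutral, underivatized monoisotopic masses and a hypothetical 5 ppm window; the platform's actual ion forms (adducts, charge states, any derivatization) and tolerances are not described in the abstract. Mass alone cannot tell Glc4 from other tetrahexose isomers; that separation comes from the UHPLC step.

```python
# Monoisotopic masses (Da) of the elements used below.
C, H, O = 12.0, 1.00782503207, 15.99491461956
HEXOSE_RESIDUE = 6 * C + 10 * H + 5 * O   # C6H10O5, one anhydro-hexose unit
WATER = 2 * H + O                          # added once per free oligosaccharide


def hexose_oligomer_mass(n: int) -> float:
    """Neutral monoisotopic mass of an underivatized Hex_n oligosaccharide."""
    return n * HEXOSE_RESIDUE + WATER


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def annotate(observed: float, targets: dict, tol_ppm: float = 5.0) -> list:
    """Return (name, ppm error) for every target within tol_ppm of the observed mass."""
    hits = []
    for name, mass in targets.items():
        err = ppm_error(observed, mass)
        if abs(err) <= tol_ppm:
            hits.append((name, round(err, 2)))
    return hits


# Hypothetical target list built from the hexose formula, not from the paper's data.
TARGETS = {"Hex4 (e.g. Glc4)": hexose_oligomer_mass(4),
           "Hex7": hexose_oligomer_mass(7)}
hits = annotate(666.2225, TARGETS)   # an observed neutral mass to annotate
```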


Subjects
Glycogen Storage Disease Type II , Lysosomal Storage Diseases , Mucolipidoses , Mucopolysaccharidosis IV , Humans , Chromatography, High Pressure Liquid/methods , Glycogen Storage Disease Type II/diagnosis , Lysosomal Storage Diseases/diagnosis , Mucolipidoses/diagnosis , Tandem Mass Spectrometry/methods , Oligosaccharides/chemistry
20.
Trop Med Int Health ; 28(3): 186-193, 2023 03.
Article in English | MEDLINE | ID: mdl-36599816

ABSTRACT

OBJECTIVES: Low-capital-outlay sequencing options from Oxford Nanopore Technologies (ONT) could help expand HIV drug resistance testing to resource-limited settings. HIV drug resistance mutations often occur as mixtures, but current ONT pipelines provide only a consensus sequence. Moreover, no integrated pipeline provides a drug resistance report from an ONT sequence file without intervention from skilled bioinformaticians. We therefore investigated Nano-RECall, which provides seamless drug resistance interpretation, requires only low-read-coverage ONT sequence data from affordable Flongle or MinION flow cells, and reports mutation mixtures similar to Sanger sequencing. METHODS: We compared Sanger sequencing with ONT sequencing of the same HIV-1 subtype C polymerase chain reaction (PCR) amplicons, analysed with the RECall and the novel Nano-RECall bioinformatics pipelines, respectively. Amplicons came from two separate assays: (a) the Applied Biosystems HIV-1 Genotyping Kit (ThermoFisher), spanning protease (PR) to reverse transcriptase (RT) (PR-RT) (n = 46), and (b) a homebrew integrase (IN) assay (n = 21). Agreement between Sanger and ONT sequences was assessed at the nucleotide level, and at the codon level for Stanford HIV drug resistance database mutations, at an optimal ONT read depth of 400 reads only. RESULTS: The average sequence similarity between ONT and Sanger sequences was 99.3% (95% CI: 99.1%-99.4%) for PR-RT and 99.6% (95% CI: 99.4%-99.7%) for IN. Drug resistance mutations did not differ for the 21 IN specimens; 8 mutations were detected by both ONT and Sanger sequencing. For the 46 PR-RT specimens, 245 mutations were detected by either ONT or Sanger sequencing; of these, 238 (97.1%) were detected by both. CONCLUSIONS: The Nano-RECall pipeline, freely available as a downloadable application for Windows computers, provides Sanger-equivalent HIV drug resistance interpretation.
Combined with a simple workflow and sample multiplexing on ONT flow cells, this novel pipeline would contribute to making HIV drug resistance sequencing feasible in resource-limited settings.
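The abstract contrasts consensus-only ONT output with reporting mixtures the way Sanger base calling does. The sketch below shows one simple way to keep minority bases from a per-position pileup of read bases by emitting IUPAC ambiguity codes; the 20% minor-base threshold and 10-read minimum depth are hypothetical parameters, not Nano-RECall's actual settings, which the abstract does not give.

```python
from collections import Counter

# IUPAC nucleotide codes for single bases and two-, three- and four-base mixtures.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("GC"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def call_position(bases: str, min_fraction: float = 0.2, min_depth: int = 10) -> str:
    """Call one position from the read bases covering it.

    Every base seen in at least `min_fraction` of reads is kept, so a minority
    variant becomes an ambiguity code instead of being dropped from the consensus.
    """
    counts = Counter(b for b in bases if b in "ACGT")   # ignore gaps and other symbols
    depth = sum(counts.values())
    if depth < min_depth:
        return "N"                                      # too few reads to call anything
    kept = frozenset(b for b, c in counts.items() if c / depth >= min_fraction)
    if not kept:
        return "N"                                      # possible only if min_fraction > 0.25
    return IUPAC[kept]


def consensus(pileup, **kwargs) -> str:
    """Join per-position calls; `pileup` is a sequence of per-position base strings."""
    return "".join(call_position(column, **kwargs) for column in pileup)
```

For example, a position covered by 70 A reads and 30 G reads is reported as R (A/G mixture), whereas 5% G among 100 reads falls below the threshold and is reported as A.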


Subjects
Drug Resistance, Viral , HIV Infections , HIV-1 , Nanopore Sequencing , Humans , HIV Infections/drug therapy , HIV Seropositivity/diagnosis , HIV Seropositivity/therapy , HIV-1/genetics , Mutation , Drug Resistance, Viral/genetics , Nanopore Sequencing/methods