Results 1-11 of 11
1.
BMC Genomics ; 23(Suppl 3): 445, 2022 Dec 29.
Article in English | MEDLINE | ID: mdl-36581824

ABSTRACT

BACKGROUND: Bacterial genotyping is a crucial process in outbreak investigation and epidemiological studies. Several typing methods, such as pulsed-field gel electrophoresis, multilocus sequence typing (MLST) and whole genome sequencing, are currently used in routine clinical practice. However, these methods are costly and time-consuming, and they have high computational demands. An alternative to these methods is mini-MLST, a quick, cost-effective and robust method based on high-resolution melting analysis. Nevertheless, no standardized approach exists for identifying markers suitable for mini-MLST. Here, we present a pipeline for detecting variable fragments in unmapped reads, based on a modified hybrid assembly approach that uses data from a single sequencing platform. RESULTS: In routine assembly against the reference sequence, highly variable reads are not aligned and remain unmapped. If these reads are assembled de novo, variable genomic regions can be located in the resulting scaffolds. By calculating variability rates, it is possible to find a highly variable region with the same discriminatory power as the seven housekeeping gene fragments used in MLST. In this work, we show that the approach can identify one variable fragment in de novo assembled scaffolds of 21 Escherichia coli genomes and three variable regions in scaffolds of 31 Klebsiella pneumoniae genomes. For each identified fragment, melting temperatures are calculated with the nearest-neighbor method to verify the discriminatory power of mini-MLST. CONCLUSIONS: We present a pipeline for a modified hybrid assembly approach that combines reference-based mapping with de novo assembly of unmapped reads. This approach can be used to identify highly variable genomic fragments in unmapped reads.
The identified variable regions can then be used in efficient laboratory methods for bacterial typing, such as mini-MLST, with high discriminatory power, fully replacing expensive methods such as MLST. Results can be delivered in a shorter time, which allows immediate and rapid infection monitoring in clinical practice.


Subject(s)
Bacteria, Genome, Multilocus Sequence Typing/methods, Genotype, Bacteria/genetics, Bacterial Typing Techniques/methods, Escherichia coli/genetics
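The abstract says melting temperatures are computed with the nearest-neighbor method to check the discriminatory power of mini-MLST. It does not name the parameter set or the diversity measure. The sketch below assumes both. For melting temperature, it uses the SantaLucia (1998) unified DNA nearest-neighbor parameters with a simple Na+ entropy correction. For discriminatory power, it uses the Hunter-Gaston discriminatory index, which is commonly used for typing methods. Neither choice is confirmed as the authors' implementation, and the concentrations are illustrative defaults.

```python
import math
from collections import Counter

# Unified nearest-neighbor parameters for DNA/DNA duplexes in 1 M NaCl
# (SantaLucia 1998), as (delta H in kcal/mol, delta S in cal/(K*mol)),
# keyed by the 5'->3' top-strand dinucleotide. Check against the source table.
NN_PARAMS = {
    "AA": (-7.9, -22.2), "AT": (-7.2, -20.4), "TA": (-7.2, -21.3),
    "CA": (-8.5, -22.7), "GT": (-8.4, -22.4), "CT": (-7.8, -21.0),
    "GA": (-8.2, -22.2), "CG": (-10.6, -27.2), "GC": (-9.8, -24.4),
    "GG": (-8.0, -19.9),
}
INIT_GC = (0.1, -2.8)  # initiation term per terminal G/C pair
INIT_AT = (2.3, 4.1)   # initiation term per terminal A/T pair
R = 1.987              # gas constant, cal/(K*mol)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def nn_tm(seq: str, strand_conc: float = 0.5e-6, na: float = 0.05) -> float:
    """Melting temperature (deg C) of a perfectly matched DNA duplex."""
    seq = seq.upper()
    dh = ds = 0.0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        # Every stack is in the table either directly or as its reverse complement.
        h, s = NN_PARAMS[pair] if pair in NN_PARAMS else NN_PARAMS[revcomp(pair)]
        dh += h
        ds += s
    for end in (seq[0], seq[-1]):  # one initiation term per duplex terminus
        h, s = INIT_GC if end in "GC" else INIT_AT
        dh += h
        ds += s
    ds += 0.368 * (len(seq) - 1) * math.log(na)  # Na+ correction of the entropy
    if seq == revcomp(seq):  # self-complementary duplex: symmetry penalty, x = 1
        ds -= 1.4
        x = 1
    else:
        x = 4
    return dh * 1000 / (ds + R * math.log(strand_conc / x)) - 273.15


def hunter_gaston(types: list) -> float:
    """Discriminatory index: chance that two random isolates get different types."""
    n = len(types)
    if n < 2:
        raise ValueError("need at least two isolates")
    same = sum(c * (c - 1) for c in Counter(types).values())
    return 1 - same / (n * (n - 1))
```

To type each genome, you could use `round(nn_tm(variant), 1)` on its fragment variant and pass the resulting list to `hunter_gaston`. The 0.1 °C binning is a hypothetical stand-in for instrument resolution, not a value from the study.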
2.
J Pers Med ; 12(6), 2022 Jun 07.
Article in English | MEDLINE | ID: mdl-35743724

ABSTRACT

Based on several reports indicating the presence of blood microbiota in patients with diseases, we set out to identify bacteria in the blood of healthy individuals. Using 37 samples from 5 families, we extracted sequences that did not map to the human reference genome and mapped them to the bacterial reference genome for characterization. Proteobacteria accounted for more than 95% of the blood microbiota. Clustering by principal component analysis showed similar patterns for each age group. The class Gammaproteobacteria was significantly more abundant in the elderly group (over 60 years old), whereas the arcsine square root-transformed relative abundances of the classes Alphaproteobacteria, Deltaproteobacteria, and Clostridia were significantly lower (p < 0.05). In addition, diversity differed significantly (p < 0.05) in the elderly group. This result provides meaningful evidence for the consistent observation that chronic diseases associated with aging are accompanied by metabolic endotoxemia and chronic inflammation.
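The abstract compares class-level abundances after an arcsine square-root transformation, a standard variance-stabilizing transform for proportions. The sketch below shows only that step. The read counts are hypothetical, and the statistical test the authors applied afterwards is not specified here.

```python
import math


def relative_abundance(counts: dict) -> dict:
    """Convert raw read counts per taxon into proportions that sum to 1."""
    total = sum(counts.values())
    return {taxon: c / total for taxon, c in counts.items()}


def arcsine_sqrt(p: float) -> float:
    """Arcsine square-root transform of a proportion p in [0, 1] (result in radians)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion")
    return math.asin(math.sqrt(p))


# Hypothetical class-level read counts for one sample (illustrative only).
sample = {"Gammaproteobacteria": 820, "Alphaproteobacteria": 120, "Clostridia": 60}
transformed = {t: arcsine_sqrt(p) for t, p in relative_abundance(sample).items()}
```

The transformed values, not the raw proportions, would then be compared between age groups.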

3.
BMC Genomics ; 22(1): 857, 2021 Nov 27.
Article in English | MEDLINE | ID: mdl-34837950

ABSTRACT

BACKGROUND: RNA-Seq is a powerful tool that has been widely used in various studies. However, unmapped RNA-seq reads have usually been considered useless and discarded or ignored. RESULTS: We develop a strategy to recover full-length sequences from unmapped reads by combining the design of specific reverse transcription primers with high-throughput sequencing. In this study, we salvage 36 unmapped reads from standard RNA-Seq data and randomly select one 149 bp read as a model. Specific reverse transcription primers are designed to amplify both of its ends, followed by next-generation sequencing. We then design a statistical model based on a power-law distribution to estimate its integrality and significance, and we further validate it by Sanger sequencing. The full-length sequence is 1556 bp, with insertion mutations in a microsatellite structure. CONCLUSION: We believe this method is a useful strategy for extracting sequence information from unmapped RNA-seq data. It also offers an alternative way to obtain the full-length sequence of an unknown cDNA.


Subject(s)
High-Throughput Nucleotide Sequencing, Complementary DNA, RNA-Seq, RNA Sequence Analysis
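The abstract does not give the form of its power-law model or the data it is fitted to. As a generic illustration only, the sketch below is the standard maximum-likelihood estimator for the exponent of a continuous power law. Applying it to read-count data, and the choice of `x_min`, are our assumptions, not the authors' procedure.

```python
import math


def power_law_alpha(values: list, x_min: float) -> float:
    """MLE of alpha for a continuous power law p(x) ~ x**(-alpha), x >= x_min."""
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if not tail or log_sum == 0:
        raise ValueError("need values strictly above x_min")
    # alpha_hat = 1 + n / sum(ln(x_i / x_min)) over the tail x_i >= x_min
    return 1 + len(tail) / log_sum
```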
4.
Genomics ; 113(1 Pt 2): 1189-1198, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33301893

ABSTRACT

Numerous viral sequences have been reported in the whole-genome sequencing (WGS) data of human blood. However, it is not clear to what degree the virus-mappable reads represent true viral sequences rather than random-mapping or noise originating from sample preparation, sequencing processes, or other sources. Identification of patterns of virus-mappable reads may generate novel indicators for evaluating the origins of these viral sequences. We characterized paired-end unmapped reads and reads aligned to viral references in human WGS datasets, then compared patterns of the virus-mappable reads among DNA sources and sequencing facilities which produced these datasets. We then examined potential origins of the source- and facility-associated viral reads. The proportions of clean unmapped reads among the seven sequencing facilities were significantly different (P < 2 × 10⁻¹⁶). We identified 260,339 reads that were mappable to a total of 99 viral references in 2535 samples. The majority (86.7%) of these virus-mappable reads (corresponding to 47 viral references), which can be classified into four groups based on their distinct patterns, were strongly associated with sequencing facility or DNA source (adjusted P value <0.01). Possible origins of these reads include artificial sequences in library preparation, recombinant vectors in cell culture, and phages co-contaminated with their host bacteria. The sequencing facility-associated virus-mappable reads and patterns were repeatedly observed in other datasets produced in the same facilities. We have constructed an analytic framework and profiled the unmapped reads mappable to viral references. The results provide a new understanding of sequencing facility- and DNA source-associated batch effects in deep sequencing data and may facilitate improved bioinformatics filtering of reads.


Subject(s)
Clinical Laboratory Services/standards, Viral Genes, Human Genome, Whole Genome Sequencing/standards, Blood/virology, DNA Contamination, Humans, Metagenome, Signal-To-Noise Ratio, Virome, Whole Genome Sequencing/methods
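The SAM FLAG field defines which records count as paired-end unmapped reads. The sketch below tabulates, per sequencing facility, the fraction of paired-end records in which both the read and its mate are unmapped. It assumes alignments are available as (facility, FLAG) pairs, and the facility labels are hypothetical. Secondary and supplementary records are skipped. The quality filtering that makes reads "clean" in the study is not modeled.

```python
from collections import defaultdict

# FLAG bits defined in the SAM format specification.
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def classify_pair(flag: int) -> str:
    """Describe the mapping state of a read and its mate from its SAM FLAG."""
    if not flag & PAIRED:
        return "single-end"
    if flag & UNMAPPED and flag & MATE_UNMAPPED:
        return "both-unmapped"
    if flag & UNMAPPED:
        return "read-unmapped-mate-mapped"
    if flag & MATE_UNMAPPED:
        return "read-mapped-mate-unmapped"
    return "both-mapped"


def fraction_by_facility(records: list) -> dict:
    """Per facility, fraction of paired-end records whose read and mate are both unmapped."""
    flags_by_facility = defaultdict(list)
    for facility, flag in records:
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, using its primary record only
        if flag & PAIRED:
            flags_by_facility[facility].append(flag)
    return {
        facility: sum(classify_pair(f) == "both-unmapped" for f in flags) / len(flags)
        for facility, flags in flags_by_facility.items()
    }
```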
5.
Brief Bioinform ; 21(2): 676-686, 2020 Mar 23.
Article in English | MEDLINE | ID: mdl-30815667

ABSTRACT

A widely used approach in transcriptome analysis is the alignment of short reads to a reference genome. However, owing to the lack of specially designed analytical systems, short reads that do not map to the genome sequence are usually ignored, resulting in the loss of significant biological information and insights. To fill this gap, we present Comprehensive Assembly and Functional annotation of Unmapped RNA-Seq data (CAFU), a Galaxy-based framework that can facilitate the large-scale analysis of unmapped RNA sequencing (RNA-Seq) reads from single- and mixed-species samples. By taking advantage of machine learning techniques, CAFU addresses the issue of accurately identifying the species origin of transcripts assembled using unmapped reads from mixed-species samples. CAFU is also innovative in providing a comprehensive collection of functions required for transcript confidence evaluation, coding potential calculation, sequence and expression characterization and function annotation. These functions and their dependencies have been integrated into a Galaxy framework that provides access to CAFU via a user-friendly interface, dramatically simplifying complex exploration tasks involving unmapped RNA-Seq reads. CAFU has been validated with RNA-Seq data sets from wheat and Zea mays (maize) samples. CAFU is freely available via GitHub: https://github.com/cma2015/CAFU.


Subject(s)
Computational Biology/methods, RNA Sequence Analysis/methods, Plant Genes, Humans, Messenger RNA/genetics, Triticum/genetics, User-Computer Interface, Zea mays/genetics
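The abstract does not say which features CAFU's machine-learning step uses to assign species origin. Normalized k-mer frequency vectors are one common input for this kind of classification. The sketch below shows that feature extraction under our own assumption; it is not necessarily what CAFU does. The default k is illustrative.

```python
from itertools import product


def kmer_profile(seq: str, k: int = 3) -> list:
    """Normalized counts of all 4**k DNA k-mers, in lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers containing N or other ambiguity codes
            counts[j] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]
```

Each assembled transcript then becomes a fixed-length vector that any standard classifier can consume.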
6.
BMC Bioinformatics ; 20(Suppl 4): 168, 2019 Apr 18.
Article in English | MEDLINE | ID: mdl-30999839

ABSTRACT

BACKGROUND: Next Generation Sequencing (NGS) experiments produce millions of short sequences that, once mapped to a reference genome, provide biological insights at the genomic, transcriptomic and epigenomic levels. Typically, the proportion of reads that correctly map to the reference genome ranges between 70% and 90%, leaving in some cases a substantial fraction of unmapped sequences. This 'misalignment' can be ascribed to low-quality bases or to sequence differences between the sample reads and the reference genome. Investigating the source of the unmapped reads is important to better assess the quality of the whole experiment and to check for possible downstream or upstream 'contamination' from exogenous nucleic acids. RESULTS: Here we propose DecontaMiner, a tool to unravel the presence of contaminating sequences among the unmapped reads. It uses a subtraction approach to identify contamination from bacterial, fungal and viral genomes. DecontaMiner generates several output files to track all the processed reads and to provide a complete report of their characteristics. Good-quality matches on microorganism genomes are counted and compared among samples. DecontaMiner builds an offline HTML page containing summary statistics and plots; the plots are generated with the state-of-the-art D3 JavaScript library. DecontaMiner has mainly been used to detect contamination in human RNA-Seq data. The software is freely available at http://www-labgtp.na.icar.cnr.it/decontaminer . CONCLUSIONS: DecontaMiner is a tool designed and developed to investigate the presence of contaminating sequences in unmapped NGS data. It can suggest the presence of contaminating organisms in sequenced samples, which might derive either from laboratory contamination or from their biological source; in both cases, they are worthy of further investigation and experimental validation.
The novelty of DecontaMiner is mainly represented by its easy integration with the standard procedures of NGS data analysis, while providing a complete, reliable, and automatic pipeline.


Subject(s)
DNA Contamination, High-Throughput Nucleotide Sequencing/methods, Bacteria/genetics, Fungi/genetics, Humans, Software, Viruses/genetics
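The abstract says DecontaMiner counts good-quality matches on microorganism genomes, but it does not give the filtering criteria. The sketch below shows that counting step. It assumes each alignment hit carries percent identity and aligned length. The `Hit` record, the thresholds, and the rule of discarding reads that hit more than one organism are all hypothetical choices, not DecontaMiner's actual parameters.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Hit:
    read_id: str
    organism: str
    identity: float  # percent identity of the alignment
    aln_len: int     # aligned length in bases
    read_len: int


def count_good_matches(hits, min_identity=97.0, min_coverage=0.9) -> Counter:
    """Reads per organism, counting only reads whose passing hits name one organism."""
    passing = {}
    for h in hits:
        if h.identity >= min_identity and h.aln_len / h.read_len >= min_coverage:
            passing.setdefault(h.read_id, set()).add(h.organism)
    # Reads that pass for several organisms are treated as ambiguous and dropped.
    return Counter(next(iter(orgs)) for orgs in passing.values() if len(orgs) == 1)
```

Per-sample counters built this way can then be compared across samples.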
7.
Front Genet ; 10: 213, 2019.
Article in English | MEDLINE | ID: mdl-30930939

ABSTRACT

Colorectal cancer is the third most common cancer worldwide and has poor survival, thus requiring novel therapeutic strategies. Numerous studies have observed infiltrating bacteria within primary tumor tissues derived from patients and have implicated the relative abundance of these bacteria as a contributing factor in tumor progression. Infiltrating bacteria are believed to be among the major drivers of tumorigenesis, progression, and metastasis and are hence promising targets for new treatments. However, measuring their abundance directly remains challenging. One potential approach is to use the unmapped reads of host whole genome sequencing (hWGS) data, which previous studies have considered contaminants and discarded. Here, we developed rigorous bioinformatics and statistical procedures to identify tumor-infiltrating bacteria associated with colorectal cancer from such whole genome sequencing data. Our approach used the reads from whole genome sequencing of colon adenocarcinoma tissues that did not map to the human reference genome, including unmapped paired-end read pairs and single-end reads whose mates were mapped. We assembled the unmapped read pairs, remapped all of these reads to a collection of human microbiome references, and then computed the relative abundance of microbes by maximum likelihood (ML) estimation. We analyzed and compared the relative abundance and diversity of infiltrating bacteria between primary tumor tissues and associated normal blood samples. Our results showed that primary tumor tissues contained a far more diverse population of infiltrating bacteria than normal blood samples. The relative abundance of Bacteroides fragilis, Bacteroides dorei, and Fusobacterium nucleatum was significantly higher in primary colorectal tumors. These three bacteria were among the top ten microbes in the primary tumor tissues, yet were rarely found in normal blood samples.
As validation, most of these bacteria had also been closely associated with colorectal cancer in previous studies that used alternative approaches. In summary, our approach provides a new analytic technique for investigating the infiltrating bacterial community within tumor tissues. Our cloud-based bioinformatics and statistical pipelines for analyzing infiltrating bacteria in colorectal tumors from the unmapped reads of whole genome sequences are freely available on GitHub at https://github.com/gutmicrobes/UMIB.git.
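The abstract reports relative microbial abundance computed by maximum-likelihood estimation, but it does not give the model. A common way to obtain such estimates when reads map to several reference genomes is the expectation-maximization (EM) algorithm. The sketch below applies EM under simplifying assumptions: a read's hits are treated as equally good, and alignment scores and genome lengths are ignored. It is not the UMIB pipeline's code.

```python
def em_abundance(read_hits: list, n_iter: int = 200) -> dict:
    """Maximum-likelihood genome proportions from reads that may hit several genomes.

    read_hits: one set of genome names per read (the genomes it aligns to).
    """
    reads = [h for h in read_hits if h]
    if not reads:
        raise ValueError("no mapped reads")
    genomes = sorted(set().union(*reads))
    pi = {g: 1.0 / len(genomes) for g in genomes}  # start from uniform proportions
    for _ in range(n_iter):
        expected = dict.fromkeys(genomes, 0.0)
        for hits in reads:
            norm = sum(pi[g] for g in hits)
            for g in hits:
                # E-step: share of this read attributed to genome g.
                expected[g] += pi[g] / norm
        # M-step: new proportions are the expected read shares.
        pi = {g: expected[g] / len(reads) for g in genomes}
    return pi
```

With 3 reads unique to A, 1 unique to B, and 2 shared, the likelihood is proportional to A³·B·(A+B)², and the estimates converge to A = 0.75, B = 0.25.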

8.
BMC Genomics ; 20(1): 19, 2019 Jan 08.
Article in English | MEDLINE | ID: mdl-30621573

ABSTRACT

BACKGROUND: A widely used approach in next-generation sequencing projects is the alignment of reads to a reference genome. Despite methodological and hardware improvements which have enhanced the efficiency and accuracy of alignments, a significant percentage of reads frequently remain unmapped. Usually, unmapped reads are discarded from the analysis process, but significant biological information and insights can be uncovered from these data. We explored the unmapped DNA (normal and bisulfite treated) and RNA sequence reads of the great tit (Parus major) reference genome individual. From the unmapped reads we generated de novo assemblies, after which the generated sequence contigs were aligned to the NCBI non-redundant nucleotide database using BLAST, identifying the closest known matching sequence. RESULTS: Many of the aligned contigs showed sequence similarity to different bird species and genes that were absent in the great tit reference assembly. Furthermore, there were also contigs that represented known P. major pathogenic species. Most interesting were several species of blood parasites such as Plasmodium and Trypanosoma. CONCLUSIONS: Our analyses revealed that meaningful biological information can be found when further exploring unmapped reads. For instance, it is possible to discover sequences that are either absent or misassembled in the reference genome, and sequences that indicate infection or sample contamination. In this study we also propose strategies to aid the capture and interpretation of this information from unmapped reads.


Subject(s)
DNA/genetics, Genome/genetics, RNA/genetics, Songbirds/genetics, Animals, Genomics, High-Throughput Nucleotide Sequencing, Sequence Alignment
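The abstract describes aligning de novo contigs to the NCBI non-redundant nucleotide database with BLAST and recording the closest known match. The sketch below covers that last step. It assumes BLAST was run with the standard 12-column tabular output (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). It also assumes "closest" means the hit with the highest bit score, which is our interpretation.

```python
import csv
import io


def best_hits(outfmt6_text: str) -> dict:
    """Map each contig to (subject id, bit score) of its highest-scoring hit."""
    best = {}
    for row in csv.reader(io.StringIO(outfmt6_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and any comment lines
        qseqid, sseqid, bitscore = row[0], row[1], float(row[11])
        if qseqid not in best or bitscore > best[qseqid][1]:
            best[qseqid] = (sseqid, bitscore)
    return best
```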
9.
Front Microbiol ; 9: 3266, 2018.
Article in English | MEDLINE | ID: mdl-30705670

ABSTRACT

The term microbiome describes the genetic material encoding the various microbial populations that inhabit our body. Whilst colonization of various body niches (e.g., the gut) by dynamic communities of microorganisms is now universally accepted, the existence of microbial populations in other "classically sterile" locations, including the blood, is a relatively new concept. The presence of bacteria-specific DNA in the blood has been reported in the literature for some time, yet its true origin is still the subject of much debate. The aim of this study was to investigate the phenomenon of a "blood microbiome" by providing a comprehensive description of bacterially derived nucleic acids using a range of complementary molecular and classical microbiological techniques. For this purpose, we used a set of plasma samples from healthy subjects (n = 5) and asthmatic subjects (n = 5). DNA-level analyses involved the amplification and sequencing of the 16S rRNA gene. RNA-level analyses were based upon the de novo assembly of unmapped mRNA reads and subsequent taxonomic identification. Molecular studies were complemented by viability data from classical aerobic and anaerobic microbial culture experiments. At the phylum level, the blood microbiome was dominated by Proteobacteria, Actinobacteria, Firmicutes, and Bacteroidetes. The key phyla detected were consistent irrespective of molecular method (DNA vs. RNA), and consistent with the results of other published studies. In silico comparison of our data with that of the Human Microbiome Project revealed that members of the blood microbiome were most likely to have originated from the oral or skin communities. To our surprise, aerobic and anaerobic cultures were positive in eight out of the ten donor samples investigated, and we reflect upon their source. Our data provide further evidence of a core blood microbiome, and provide insight into the potential source of the bacterial DNA/RNA detected in the blood.
Further, data reveal the importance of robust experimental procedures, and identify areas for future consideration.

10.
Genomics ; 109(1): 36-42, 2017 Jan.
Article in English | MEDLINE | ID: mdl-27913251

ABSTRACT

Reads from transcriptome sequencing data that do not map to the target species' reference genome are usually disregarded. A recent RNAseq project on the new fatal disease Bovine Neonatal Pancytopenia had indicated an unexplained immune response signature to a double-stranded RNA virus. To investigate its background, contigs were de novo assembled from unmapped RNAseq reads and aligned against the bovine genome assemblies and multispecies NCBI databases. The lack of genuine viral sequence contigs rejected the hypothesis that a live virus caused the unexplained immune response. Alignment data also demonstrated that the bovine reference genome assemblies are incomplete. In addition, we found that several parasite and virus genome reference assemblies in NCBI were contaminated with bovine DNA, and we confirmed recombination of bovine DNA into BVD virus strains. Exploring unmapped reads can extract useful biological information regarding the presence of microorganisms and can highlight issues with the reference genome assemblies of host and pathogen species.


Subject(s)
Cattle/genetics, Genome, High-Throughput Nucleotide Sequencing/standards, RNA Sequence Analysis/standards, Animals, Cattle/microbiology, Cattle/parasitology, Cattle/virology, Computational Biology, Female
11.
Genomics ; 104(6 Pt B): 453-8, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25173571

ABSTRACT

Several studies have demonstrated that unmapped reads in next generation sequencing data can be used to identify infectious agents or structural variants, but there has been no intensive effort to analyze and classify all non-human sequences found in individual large data sets. To identify commonalities among non-human sequences arising from infectious agents and putative contamination events, we analyzed non-human sequences in 150 genomic sequencing data files from the 1000 Genomes Project and observed that, on average, 0.13% of reads showed similarity to non-human genomes. We compared results among sample groups defined by ethnicity, sequencing center and enrichment method (whole genome sequencing vs. exome sequencing) and found that sequencing centers had specific signatures of contaminating genomes that serve as 'time stamps'. We also observed many unmapped reads that falsely indicated contamination because of the high similarity of human sequences to sequences in non-human genome assemblies such as mouse and Nicotiana.


Subject(s)
DNA Contamination, Human Genome, Bacterial DNA/chemistry, Plant DNA/chemistry, Viral DNA/chemistry, Humans
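The observation about false contamination signals suggests a safeguard. A read should not be counted as non-human merely because it matches a non-human assembly if it matches the human genome about as well. The sketch below implements such a rule. It assumes each read has a best alignment score against human and against non-human references. The score `margin` is a hypothetical parameter, and this is not the procedure used in the study.

```python
from typing import Optional


def classify_read(human_score: Optional[float], nonhuman_score: Optional[float],
                  margin: float = 10.0) -> str:
    """Call a read non-human only if its best non-human hit clearly beats its best human hit."""
    if nonhuman_score is None:
        return "unassigned"
    if human_score is not None and nonhuman_score < human_score + margin:
        # Similar scores may reflect human-like sequence inside a non-human assembly.
        return "possibly-human"
    return "non-human"


def nonhuman_fraction(calls: list) -> float:
    """Fraction of reads confidently called non-human (e.g. per sample or per center)."""
    return calls.count("non-human") / len(calls) if calls else 0.0
```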