Results 1 - 20 of 55
1.
Sci Adv ; 10(16): eadk4825, 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38630812

ABSTRACT

The ability of epithelial monolayers to self-organize into a dynamic polarized state, where cells migrate in a uniform direction, is essential for tissue regeneration, development, and tumor progression. However, the mechanisms governing long-range polar ordering of motility direction in biological tissues remain unclear. Here, we investigate the self-organizing behavior of quiescent epithelial monolayers that transit to a dynamic state with long-range polar order upon growth factor exposure. We demonstrate that the heightened self-propelled activity of monolayer cells leads to formation of vortex-antivortex pairs that undergo sequential annihilation, ultimately driving the spread of long-range polar order throughout the system. A computational model, which treats the monolayer as an active elastic solid, accurately replicates this behavior, and weakening of cell-to-cell interactions impedes vortex-antivortex annihilation and polar ordering. Our findings uncover a mechanism in epithelia, where elastic solid material characteristics, activated self-propulsion, and topology-mediated guidance converge to fuel a highly efficient polar self-ordering activity.


Subject(s)
Cell Communication , Cell Movement , Epithelium
2.
Nat Commun ; 15(1): 1791, 2024 Feb 29.
Article in English | MEDLINE | ID: mdl-38424056

ABSTRACT

Stool samples for fecal immunochemical tests (FIT) are collected in large numbers worldwide as part of colorectal cancer screening programs. Employing FIT samples from 1034 CRCbiome participants, recruited from a Norwegian colorectal cancer screening study, we identify, annotate and characterize more than 18000 DNA viruses, using shotgun metagenome sequencing. Only six percent of them are assigned to a known taxonomic family, with Microviridae being the most prevalent viral family. Linking individual profiles to comprehensive lifestyle and demographic data shows 17/25 of the variables to be associated with the gut virome. Physical activity, smoking, and dietary fiber consumption exhibit strong and consistent associations with both diversity and relative abundance of individual viruses, as well as with enrichment for auxiliary metabolic genes. We demonstrate the suitability of FIT samples for virome analysis, opening an opportunity for large-scale studies of this enigmatic part of the gut microbiome. The diverse viral populations and their connections to the individual lifestyle uncovered herein pave the way for further exploration of the role of the gut virome in health and disease.


Subject(s)
Colorectal Neoplasms , Viruses , Humans , Virome , DNA Viruses/genetics , Viruses/genetics , DNA , Colorectal Neoplasms/diagnosis , Colorectal Neoplasms/genetics
3.
BMC Bioinformatics ; 24(1): 371, 2023 Oct 02.
Article in English | MEDLINE | ID: mdl-37784008

ABSTRACT

BACKGROUND: Shotgun metagenome sequencing data obtained from a host environment will usually be contaminated with sequences from the host organism. Host sequences should be removed before further analysis to avoid biases, reduce downstream computational load, or ensure privacy in the case of a human host. The tools that we identified, as designed specifically to perform host contamination sequence removal, were either outdated, not maintained, or complicated to use. Consequently, we have developed HoCoRT, a fast and user-friendly tool that implements several methods for optimised host sequence removal. We have evaluated the speed and accuracy of these methods. RESULTS: HoCoRT is an open-source command-line tool for host contamination removal. It is designed to be easy to install and use, offering a one-step option for genome indexing. HoCoRT employs a variety of well-known mapping, classification, and alignment methods to classify reads. The user can select the underlying classification method and its parameters, allowing adaptation to different scenarios. Based on our investigation of various methods and parameters using synthetic human gut and oral microbiomes, and on assessment of publicly available data, we provide recommendations for typical datasets with short and long reads. CONCLUSIONS: To decontaminate a human gut microbiome with short reads using HoCoRT, we found the optimal combination of speed and accuracy with BioBloom, Bowtie2 in end-to-end mode, and HISAT2. Kraken2 consistently demonstrated the highest speed, albeit with a trade-off in accuracy. The same applies to an oral microbiome, but here Bowtie2 was notably slower than the other tools. For long reads, the detection of human host reads is more difficult. In this case, a combination of Kraken2 and Minimap2 achieved the highest accuracy and detected 59% of human reads. 
In comparison to the dedicated DeconSeq tool, HoCoRT using Bowtie2 in end-to-end mode proved considerably faster and slightly more accurate. HoCoRT is available as a Bioconda package, and the source code can be accessed at https://github.com/ignasrum/hocort along with the documentation. It is released under the MIT licence and is compatible with Linux and macOS (except for the BioBloom module).


Subject(s)
Microbiota , Software , Humans , Metagenome , DNA Sequence Analysis/methods , Microbiota/genetics , High-Throughput Nucleotide Sequencing/methods
4.
PLoS One ; 18(7): e0286330, 2023.
Article in English | MEDLINE | ID: mdl-37467208

ABSTRACT

Many high-throughput sequencing datasets can be represented as objects with coordinates along a reference genome. Currently, biological investigations often involve a large number of such datasets, for example representing different cell types or epigenetic factors. Drawing overall conclusions from a large collection of results for individual datasets may be challenging and time-consuming. Meaningful interpretation often requires the results to be aggregated according to metadata that represents biological characteristics of interest. In this light, we here propose the hierarchical Genomic Suite HyperBrowser (hGSuite), an open-source extension to the GSuite HyperBrowser platform, which aims to provide a means for extracting key results from an aggregated collection of high-throughput DNA sequencing data. The hGSuite utilizes a metadata-informed data cube to calculate various statistics across the multiple dimensions of the datasets. With this work, we show that the hGSuite and its associated data cube methodology offer a quick and accessible way for exploratory analysis of large genomic datasets. The web-based toolkit named hGSuite HyperBrowser is available at https://hyperbrowser.uio.no/hgsuite under a GPLv3 license.


Subject(s)
Metadata , Software , Genomics/methods , Genome , Internet
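
The idea of aggregating per-dataset results along metadata dimensions, as described in the abstract above, can be illustrated with a small sketch. This is not the hGSuite implementation; the metadata fields (`cell_type`, `mark`), the `overlap` statistic, and all values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-dataset results, each annotated with metadata.
datasets = [
    {"cell_type": "B cell", "mark": "H3K4me3", "overlap": 0.42},
    {"cell_type": "B cell", "mark": "H3K27ac", "overlap": 0.30},
    {"cell_type": "T cell", "mark": "H3K4me3", "overlap": 0.38},
    {"cell_type": "T cell", "mark": "H3K27ac", "overlap": 0.26},
]


def aggregate(records, dims, value="overlap", func=mean):
    """Group records by the chosen metadata dimensions and summarise `value`."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[d] for d in dims)
        groups[key].append(record[value])
    return {key: func(values) for key, values in groups.items()}


# Collapse the cube along one dimension at a time, or keep both.
by_cell_type = aggregate(datasets, ["cell_type"])
by_mark = aggregate(datasets, ["mark"])
by_both = aggregate(datasets, ["cell_type", "mark"])
```

Choosing which dimensions to keep in `dims` corresponds to slicing or collapsing the cube; any summary function that accepts a list of numbers can be passed as `func`.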
5.
Genome Biol ; 23(1): 247, 2022 Nov 30.
Article in English | MEDLINE | ID: mdl-36451166

ABSTRACT

DNA loop extrusion emerges as a key process establishing genome structure and function. We introduce MoDLE, a computational tool for fast, stochastic modeling of molecular contacts from DNA loop extrusion capable of simulating realistic contact patterns genome-wide in a few minutes. MoDLE accurately simulates contact maps in concordance with existing molecular dynamics approaches and with Micro-C data and does so orders of magnitude faster than existing approaches. MoDLE runs efficiently on machines ranging from laptops to high performance computing clusters and enables exploratory and predictive modeling of 3D genome structure in a wide range of settings.


Subject(s)
DNA
6.
Bioinformatics ; 38(17): 4230-4232, 2022 Sep 02.
Article in English | MEDLINE | ID: mdl-35852318

ABSTRACT

MOTIVATION: Adaptive immune receptor (AIR) repertoires (AIRRs) record past immune encounters with exquisite specificity. Therefore, identifying identical or similar AIR sequences across individuals is a key step in AIRR analysis for revealing convergent immune response patterns that may be exploited for diagnostics and therapy. Existing methods for quantifying AIRR overlap scale poorly with increasing dataset numbers and sizes. To address this limitation, we developed CompAIRR, which enables ultra-fast computation of AIRR overlap, based on either exact or approximate sequence matching. RESULTS: CompAIRR improves computational speed 1000-fold relative to the state of the art and uses only one-third of the memory: on the same machine, the exact pairwise AIRR overlap of 10^4 AIRRs with 10^5 sequences is found in ∼17 min, while the fastest alternative tool requires 10 days. CompAIRR has been integrated with the machine learning ecosystem immuneML to speed up commonly used AIRR-based machine learning applications. AVAILABILITY AND IMPLEMENTATION: CompAIRR code and documentation are available at https://github.com/uio-bmi/compairr. Docker images are available at https://hub.docker.com/r/torognes/compairr. The code to replicate the synthetic datasets, scripts for benchmarking and creating figures, and all raw data underlying the figures are available at https://github.com/uio-bmi/compairr-benchmarking. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Ecosystem , Software , Humans , Machine Learning , Benchmarking
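
To make the notion of exact pairwise repertoire overlap in the abstract above concrete, here is a minimal sketch using Python sets. It is not how CompAIRR is implemented and does not reproduce its performance; the repertoire names and sequences are invented for illustration, and approximate matching is not covered.

```python
from itertools import combinations


def pairwise_exact_overlap(repertoires):
    """Return {(name_a, name_b): number of distinct sequences shared exactly}."""
    # Convert each repertoire to a set once, so each pair costs one intersection.
    as_sets = {name: set(seqs) for name, seqs in repertoires.items()}
    return {
        (a, b): len(as_sets[a] & as_sets[b])
        for a, b in combinations(sorted(as_sets), 2)
    }


# Hypothetical toy repertoires (amino acid sequences are made up).
repertoires = {
    "donor1": ["CASSLGTDTQYF", "CASSPGQGNEQFF", "CASSIRSSYEQYF"],
    "donor2": ["CASSLGTDTQYF", "CASSIRSSYEQYF", "CASSQDRGYGYTF"],
    "donor3": ["CASSQDRGYGYTF"],
}

overlaps = pairwise_exact_overlap(repertoires)
```

The number of pairs grows quadratically with the number of repertoires, which is why the scaling behaviour reported in the abstract matters for large collections.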
7.
Scand J Immunol ; 94(1): e13050, 2021 Jul.
Article in English | MEDLINE | ID: mdl-34643957

ABSTRACT

C-type lectin-like domain family 16 member A (CLEC16A) is associated with autoimmune disorders, including multiple sclerosis (MS), but its functional relevance is not completely understood. CLEC16A is expressed in several immune cells, where it affects autophagic processes and receptor expression. Recently, we reported that the risk genotype of an MS-associated single nucleotide polymorphism in CLEC16A intron 19 is associated with higher expression of CLEC16A in CD4+ T cells. Here, we show that CLEC16A expression is induced in CD4+ T cells upon T cell activation. By the use of imaging flow cytometry and confocal microscopy, we demonstrate that CLEC16A is located in Rab4a-positive recycling endosomes in Jurkat TAg T cells. CLEC16A knock-down in Jurkat cells resulted in lower cell surface expression of the T cell receptor; however, this did not have a major impact on the T cell activation response in vitro in either Jurkat cells or primary human CD4+ T cells.


Subject(s)
CD4-Positive T-Lymphocytes/immunology , Genetic Predisposition to Disease/genetics , C-Type Lectins/genetics , Monosaccharide Transport Proteins/genetics , Multiple Sclerosis/genetics , T-Cell Antigen Receptors/biosynthesis , rab4 GTP-Binding Proteins/metabolism , Tumor Cell Line , Endosomes/metabolism , Flow Cytometry , Humans , Jurkat Cells , Lymphocyte Activation/immunology , Confocal Microscopy , Multiple Sclerosis/immunology , Single Nucleotide Polymorphism/genetics
8.
BMC Cancer ; 21(1): 930, 2021 Aug 18.
Article in English | MEDLINE | ID: mdl-34407780

ABSTRACT

BACKGROUND: Colorectal cancer (CRC) screening reduces CRC incidence and mortality. However, current screening methods are either hampered by invasiveness or suboptimal performance, limiting their effectiveness as primary screening methods. To aid in the development of a non-invasive screening test with improved sensitivity and specificity, we have initiated a prospective biomarker study (CRCbiome), nested within a large randomized CRC screening trial in Norway. We aim to develop a microbiome-based classification algorithm to identify advanced colorectal lesions in screening participants testing positive for an immunochemical fecal occult blood test (FIT). We will also examine interactions with host factors, diet, lifestyle and prescription drugs. The prospective nature of the study also enables the analysis of changes in the gut microbiome following the removal of precancerous lesions. METHODS: The CRCbiome study recruits participants enrolled in the Bowel Cancer Screening in Norway (BCSN) study, a randomized trial initiated in 2012 comparing once-only sigmoidoscopy to repeated biennial FIT, where women and men aged 50-74 years at study entry are invited to participate. Since 2017, participants randomized to FIT screening with a positive test result have been invited to join the CRCbiome study. Self-reported diet, lifestyle and demographic data are collected prior to colonoscopy after the positive FIT-test (baseline). Screening data, including colonoscopy findings are obtained from the BCSN database. Fecal samples for gut microbiome analyses are collected both before and 2 and 12 months after colonoscopy. Samples are analyzed using metagenome sequencing, with taxonomy profiles, and gene and pathway content as primary measures. CRCbiome data will also be linked to national registries to obtain information on prescription histories and cancer relevant outcomes occurring during the 10 year follow-up period. 
DISCUSSION: The CRCbiome study will increase our understanding of how the gut microbiome, in combination with lifestyle and environmental factors, influences the early stages of colorectal carcinogenesis. This knowledge will be crucial to develop microbiome-based screening tools for CRC. By evaluating biomarker performance in a screening setting, using samples from the target population, the generalizability of the findings to future screening cohorts is likely to be high. TRIAL REGISTRATION: ClinicalTrials.gov Identifier: NCT01538550 .


Subject(s)
Colorectal Neoplasms/diagnosis , Early Detection of Cancer/methods , Gastrointestinal Microbiome , Life Style , Aged , Case-Control Studies , Colonoscopy , Colorectal Neoplasms/epidemiology , Colorectal Neoplasms/microbiology , Female , Follow-Up Studies , Humans , Incidence , Male , Middle Aged , Norway/epidemiology , Occult Blood , Prognosis , Prospective Studies , ROC Curve
9.
Bioinformatics ; 38(1): 267-269, 2021 Dec 22.
Article in English | MEDLINE | ID: mdl-34244702

ABSTRACT

MOTIVATION: Previously we presented swarm, an open-source amplicon clustering programme that produces fine-scale molecular operational taxonomic units (OTUs) that are free of arbitrary global clustering thresholds. Here, we present swarm v3 to address issues of contemporary datasets that are growing towards terabyte sizes. RESULTS: When compared with previous swarm versions, swarm v3 has modernized C++ source code, reduced memory footprint by up to 50%, optimized CPU-usage and multithreading (more than 7 times faster with default parameters), and it has been extensively tested for its robustness and logic. AVAILABILITY AND IMPLEMENTATION: Source code and binaries are available at https://github.com/torognes/swarm. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Software , Cluster Analysis
11.
Microbiome ; 9(1): 79, 2021 Mar 29.
Article in English | MEDLINE | ID: mdl-33781324

ABSTRACT

BACKGROUND: Studies of shifts in microbial community composition have many applications. For studies at species or subspecies levels, 16S amplicon sequencing lacks resolution and is often replaced by full shotgun sequencing. Due to higher costs, this restricts the number of samples sequenced. As an alternative to full shotgun sequencing, we have investigated the use of Reduced Metagenome Sequencing (RMS) to estimate the composition of a microbial community. This involves the use of double-digested restriction-associated DNA sequencing, which means only a smaller fraction of the genomes is sequenced. The read sets obtained by this approach have properties different from both amplicon and shotgun data, and analysis pipelines for both either cannot be used at all or do not exploit the full potential of RMS data. RESULTS: We suggest a procedure for analyzing such data, based on fragment clustering and the use of a constrained ordinary least squares de-convolution for estimating the relative abundance of all community members. Mock community datasets show the potential to clearly separate strains even when the 16S is 100% identical and genome-wide differences are < 0.02, indicating RMS has a very high resolution. From a simulation study, we compare RMS to shotgun sequencing and show that we get improved abundance estimates when the community has many very closely related genomes. From a real dataset of infant guts, we show that RMS is capable of detecting a strain diversity gradient for Escherichia coli across time. CONCLUSION: We find that RMS is a good alternative to either metabarcoding or shotgun sequencing when it comes to resolving microbial communities at the strain level. Like shotgun metagenomics, it requires a good database of reference genomes and is well suited for studies of the human gut or other communities where many reference genomes exist. A data analysis pipeline is offered as an R package at https://github.com/larssnip/microRMS .


Subject(s)
Metagenome , Microbiota , Humans , Metagenomics , Microbiota/genetics , 16S Ribosomal RNA/genetics , DNA Sequence Analysis
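
The constrained least squares de-convolution mentioned in the abstract above can be sketched as follows. The abstract does not state which constraints are used, so this example assumes a non-negativity constraint and solves it with a simple projected gradient loop in pure Python; it is not the microRMS implementation. The matrix `A` (which fragment clusters occur in which genome) and the read counts `y` are hypothetical.

```python
def nnls_projected_gradient(A, y, iters=2000):
    """Minimise 0.5 * ||A x - y||^2 subject to x >= 0 by projected gradient descent."""
    n_rows, n_cols = len(A), len(A[0])
    # 1 / ||A||_F^2 never exceeds 1 / L, where L is the gradient's Lipschitz constant.
    step = 1.0 / sum(v * v for row in A for v in row)
    x = [0.0] * n_cols
    for _ in range(iters):
        residual = [sum(A[i][j] * x[j] for j in range(n_cols)) - y[i]
                    for i in range(n_rows)]
        grad = [sum(A[i][j] * residual[i] for i in range(n_rows))
                for j in range(n_cols)]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, x[j] - step * grad[j]) for j in range(n_cols)]
    return x


def relative_abundance(A, y):
    """Scale the non-negative solution so that the abundances sum to 1."""
    x = nnls_projected_gradient(A, y)
    total = sum(x)
    return [v / total for v in x]


# Hypothetical design: rows are fragment clusters, columns are genomes;
# an entry of 1 means the genome contains that fragment cluster.
A = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
    [1, 1, 1],
]
# Hypothetical read counts per fragment cluster, generated from
# genome-level signals of 500, 300 and 200 reads.
y = [500, 800, 500, 200, 1000]

abundance = relative_abundance(A, y)
```

Fragment clusters shared between closely related genomes (rows with several ones) are what make the problem a de-convolution rather than a simple count; clusters unique to one genome anchor the solution.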
12.
Comput Struct Biotechnol J ; 18: 2877-2889, 2020.
Article in English | MEDLINE | ID: mdl-33163148

ABSTRACT

DNA methylation (5mC) and hydroxymethylation (5hmC) are chemical modifications of cytosine bases which play a crucial role in epigenetic gene regulation. However, cost, data complexity and the unavailability of comprehensive analytical tools are among the major challenges in exploring these epigenetic marks. Hydroxymethylation- and Methylation-Sensitive Tag sequencing (HMST-seq) is one of the most cost-effective techniques that enable simultaneous detection of 5mC and 5hmC at single base pair resolution. We present HMST-Seq-Analyzer as a comprehensive and robust method for performing simultaneous differential methylation analysis on 5mC and 5hmC data sets. HMST-Seq-Analyzer can detect Differentially Methylated Regions (DMRs), annotate them, give a visual overview of methylation status and also perform a preliminary quality check on the data. In addition to HMST-Seq, our tool can be used on whole-genome bisulfite sequencing (WGBS) and reduced representation bisulfite sequencing (RRBS) data sets as well. The tool is written in Python with capacity to process data in parallel and is available at https://hmst-seq.github.io/hmst/.

13.
BMC Bioinformatics ; 21(1): 66, 2020 Feb 21.
Article in English | MEDLINE | ID: mdl-32085722

ABSTRACT

BACKGROUND: Advances in whole genome sequencing strategies have provided the opportunity for genomic and comparative genomic analysis of a vast variety of organisms. The analysis results are highly dependent on the quality of the genome assemblies used. Assessment of the assembly accuracy may significantly increase the reliability of the analysis results and is therefore of great importance. RESULTS: Here, we present a new tool called NucBreak aimed at localizing structural errors in assemblies, including insertions, deletions, duplications, inversions, and different inter- and intra-chromosomal rearrangements. The approach taken by existing alternative tools is based on analysing reads that do not map properly to the assembly, for instance discordantly mapped reads, soft-clipped reads and singletons. NucBreak uses an entirely different and unique method to localise the errors. It is based on analysing the alignments of reads that are properly mapped to an assembly and exploits information about the alternative read alignments. It does not annotate detected errors. We have compared NucBreak with other existing assembly accuracy assessment tools, namely Pilon, REAPR, and FRCbam as well as with several structural variant detection tools, including BreakDancer, Lumpy, and Wham, by using both simulated and real datasets. CONCLUSIONS: The benchmarking results have shown that NucBreak in general predicts assembly errors of different types and sizes with relatively high sensitivity and with lower false discovery rate than the other tools. Such a balance between sensitivity and false discovery rate makes NucBreak a good alternative to the existing assembly accuracy assessment tools and SV detection tools. NucBreak is freely available at https://github.com/uio-bmi/NucBreak under the MPL license.


Subject(s)
Genomics/methods , High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods , Genome , Reproducibility of Results , Software
14.
NAR Cancer ; 2(3): zcaa019, 2020 Sep.
Article in English | MEDLINE | ID: mdl-33554121

ABSTRACT

In B lymphocytes, the uracil N-glycosylase (UNG) excises genomic uracils made by activation-induced deaminase (AID), thus underpinning antibody gene diversification and oncogenic chromosomal translocations, but also initiating faithful DNA repair. Ung-/- mice develop B-cell lymphoma (BCL). However, since UNG has anti- and pro-oncogenic activities, its tumor suppressor relevance is unclear. Moreover, how the constant DNA damage and repair caused by the AID and UNG interplay affects B-cell fitness and thereby the dynamics of cell populations in vivo is unknown. Here, we show that UNG specifically protects the fitness of germinal center B cells, which express AID, and not of any other B-cell subset, coincident with AID-induced telomere damage activating p53-dependent checkpoints. Consistent with AID expression being detrimental in UNG-deficient B cells, Ung-/- mice develop BCL originating from activated B cells but lose AID expression in the established tumor. Accordingly, we find that UNG is rarely lost in human BCL. The fitness preservation activity of UNG contingent to AID expression was confirmed in a B-cell leukemia model. Hence, UNG, typically considered a tumor suppressor, acquires tumor-enabling activity in cancer cell populations that express AID by protecting cell fitness.

15.
Sci Rep ; 7(1): 7199, 2017 Aug 03.
Article in English | MEDLINE | ID: mdl-28775312

ABSTRACT

Both a DNA lesion and an intermediate for antibody maturation, uracil is primarily processed by base excision repair (BER), either initiated by uracil-DNA glycosylase (UNG) or by single-strand selective monofunctional uracil DNA glycosylase (SMUG1). The relative in vivo contributions of each glycosylase remain elusive. To assess the impact of SMUG1 deficiency, we measured uracil and 5-hydroxymethyluracil, another SMUG1 substrate, in Smug1 -/- mice. We found that 5-hydroxymethyluracil accumulated in Smug1 -/- tissues and correlated with 5-hydroxymethylcytosine levels. The highest increase was found in brain, which contained about 26-fold higher genomic 5-hydroxymethyluracil levels than the wild type. Smug1 -/- mice did not accumulate uracil in their genome and Ung -/- mice showed slightly elevated uracil levels. Contrastingly, Ung -/- Smug1 -/- mice showed a synergistic increase in uracil levels with up to 25-fold higher uracil levels than wild type. Whole genome sequencing of UNG/SMUG1-deficient tumours revealed that combined UNG and SMUG1 deficiency leads to the accumulation of mutations, primarily C to T transitions within CpG sequences. This unexpected sequence bias suggests that CpG dinucleotides are intrinsically more mutation prone. In conclusion, we showed that SMUG1 efficiently prevents genomic uracil accumulation, even in the presence of UNG, and identified mutational signatures associated with combined UNG and SMUG1 deficiency.


Subject(s)
Cytosine/metabolism , Dinucleoside Phosphates/metabolism , Uracil-DNA Glycosidase/deficiency , Uracil/metabolism , Animals , CpG Islands , Deamination , Genome , Genomics/methods , Mice , Knockout Mice , Mutation
16.
BMC Bioinformatics ; 18(1): 338, 2017 Jul 12.
Article in English | MEDLINE | ID: mdl-28701187

ABSTRACT

BACKGROUND: Comparing sets of sequences is a situation frequently encountered in bioinformatics, examples being comparing an assembly to a reference genome, or two genomes to each other. The purpose of the comparison is usually to find where the two sets differ, e.g. to find where a subsequence is repeated or deleted, or where insertions have been introduced. Such comparisons can be done using whole-genome alignments. Several tools for making such alignments exist, but none of them 1) provides detailed information about the types and locations of all differences between the two sets of sequences, 2) enables visualisation of alignment results at different levels of detail, and 3) carefully takes genomic repeats into consideration. RESULTS: We here present NucDiff, a tool aimed at locating and categorizing differences between two sets of closely related DNA sequences. NucDiff is able to deal with very fragmented genomes, repeated sequences, and various local differences and structural rearrangements. NucDiff determines differences by a rigorous analysis of alignment results obtained by the NUCmer, delta-filter and show-snps programs in the MUMmer sequence alignment package. All differences found are categorized according to a carefully defined classification scheme covering all possible differences between two sequences. Information about the differences is made available as GFF3 files, thus enabling visualisation using genome browsers as well as usage of the results as a component in an analysis pipeline. NucDiff was tested with varying parameters for the alignment step and compared with existing alternatives, called QUAST and dnadiff. CONCLUSIONS: We have developed a whole genome alignment difference classification scheme together with the program NucDiff for finding such differences. The proposed classification scheme is comprehensive and can be used by other tools. 
NucDiff performs comparably to QUAST and dnadiff but gives much more detailed results that can easily be visualized. NucDiff is freely available on https://github.com/uio-cels/NucDiff under the MPL license.


Subject(s)
DNA/chemistry , User-Computer Interface , Base Sequence , Genomics , Internet , Sequence Alignment
17.
mSystems ; 1(1)2016.
Article in English | MEDLINE | ID: mdl-27822515

ABSTRACT

Sequence clustering is a common early step in amplicon-based microbial community analysis, when raw sequencing reads are clustered into operational taxonomic units (OTUs) to reduce the run time of subsequent analysis steps. Here, we evaluated the performance of recently released state-of-the-art open-source clustering software products, namely, OTUCLUST, Swarm, SUMACLUST, and SortMeRNA, against current principal options (UCLUST and USEARCH) in QIIME, hierarchical clustering methods in mothur, and USEARCH's most recent clustering algorithm, UPARSE. All the latest open-source tools showed promising results, reporting up to 60% fewer spurious OTUs than UCLUST, indicating that the underlying clustering algorithm can vastly reduce the number of these derived OTUs. Furthermore, we observed that stringent quality filtering, such as is done in UPARSE, can cause a significant underestimation of species abundance and diversity, leading to incorrect biological results. Swarm, SUMACLUST, and SortMeRNA have been included in the QIIME 1.9.0 release. IMPORTANCE: Massive collections of next-generation sequencing data call for fast, accurate, and easily accessible bioinformatics algorithms to perform sequence clustering. A comprehensive benchmark is presented, including open-source tools and the popular USEARCH suite. Simulated, mock, and environmental communities were used to analyze sensitivity, selectivity, species diversity (alpha and beta), and taxonomic composition. The results demonstrate that recent clustering algorithms can significantly improve accuracy and preserve estimated diversity without the application of aggressive filtering. Moreover, these tools are all open source, apply multiple levels of multithreading, and scale to the demands of modern next-generation sequencing data, which is essential for the analysis of massive multidisciplinary studies such as the Earth Microbiome Project (EMP) (J. A. Gilbert, J. K. Jansson, and R. Knight, BMC Biol 12:69, 2014, http://dx.doi.org/10.1186/s12915-014-0069-1).

18.
BMC Genomics ; 17(1): 791, 2016 Oct 10.
Article in English | MEDLINE | ID: mdl-27724857

ABSTRACT

BACKGROUND: As an intracellular human pathogen, Mycobacterium tuberculosis (Mtb) is facing multiple stressful stimuli inside the macrophage and the granuloma. Understanding Mtb responses to stress is essential to identify new virulence factors and pathways that play a role in the survival of the tubercle bacillus. The main goal of this study was to map the regulatory networks of differentially expressed (DE) transcripts in Mtb upon various forms of genotoxic stress. We exposed Mtb cells to oxidative (H2O2 or paraquat), nitrosative (DETA/NO), or alkylation (MNNG) stress or mitomycin C, inducing double-strand breaks in the DNA. Total RNA was isolated from treated and untreated cells and subjected to high-throughput deep sequencing. The data generated was analysed to identify DE genes encoding mRNAs, non-coding RNAs (ncRNAs), and the genes potentially targeted by ncRNAs. RESULTS: The most significant transcriptomic alteration with more than 700 DE genes was seen under nitrosative stress. In addition to genes that belong to the replication, recombination and repair (3R) group, mainly found under mitomycin C stress, we identified DE genes important for bacterial virulence and survival, such as genes of the type VII secretion system (T7SS) and the proline-glutamic acid/proline-proline-glutamic acid (PE/PPE) family. By predicting the structures of hypothetical proteins (HPs) encoded by DE genes, we found that some of these HPs might be involved in mycobacterial genome maintenance. We also applied a state-of-the-art method to predict potential target genes of the identified ncRNAs and found that some of these could regulate several genes that might be directly involved in the response to genotoxic stress. CONCLUSIONS: Our study reflects the complexity of the response of Mtb in handling genotoxic stress. In addition to genes involved in genome maintenance, other potential key players, such as the members of the T7SS and PE/PPE gene family, were identified. 
This plethora of responses is detected not only at the level of DE genes encoding mRNAs but also at the level of ncRNAs and their potential targets.


Subject(s)
DNA Damage , Bacterial Gene Expression Regulation/drug effects , Mycobacterium tuberculosis/genetics , Transcriptome , Cluster Analysis , DNA Damage/drug effects , Gene Expression Profiling , Humans , Hydrogen Peroxide/toxicity , Methylnitronitrosoguanidine/toxicity , Mycobacterium tuberculosis/drug effects , Type VII Secretion Systems/genetics
19.
PeerJ ; 4: e2584, 2016.
Article in English | MEDLINE | ID: mdl-27781170

ABSTRACT

BACKGROUND: VSEARCH is an open source and free of charge multithreaded 64-bit tool for processing and preparing metagenomics, genomics and population genomics nucleotide sequence data. It is designed as an alternative to the widely used USEARCH tool (Edgar, 2010) for which the source code is not publicly available, algorithm details are only rudimentarily described, and only a memory-confined 32-bit version is freely available for academic use. METHODS: When searching nucleotide sequences, VSEARCH uses a fast heuristic based on words shared by the query and target sequences in order to quickly identify similar sequences; a similar strategy is probably used in USEARCH. VSEARCH then performs optimal global sequence alignment of the query against potential target sequences, using full dynamic programming instead of the seed-and-extend heuristic used by USEARCH. Pairwise alignments are computed in parallel using vectorisation and multiple threads. RESULTS: VSEARCH includes most commands for analysing nucleotide sequences available in USEARCH version 7 and several of those available in USEARCH version 8, including searching (exact or based on global alignment), clustering by similarity (using length pre-sorting, abundance pre-sorting or a user-defined order), chimera detection (reference-based or de novo), dereplication (full length or prefix), pairwise alignment, reverse complementation, sorting, and subsampling. VSEARCH also includes commands for FASTQ file processing, i.e., format detection, filtering, read quality statistics, and merging of paired reads. Furthermore, VSEARCH extends functionality with several new commands and improvements, including shuffling, rereplication, masking of low-complexity sequences with the well-known DUST algorithm, a choice among different similarity definitions, and FASTQ file format conversion. 
VSEARCH is here shown to be more accurate than USEARCH when performing searching, clustering, chimera detection and subsampling, while on a par with USEARCH for paired-end read merging. VSEARCH is slower than USEARCH when performing clustering and chimera detection, but significantly faster when performing paired-end read merging and dereplication. VSEARCH is available at https://github.com/torognes/vsearch under either the BSD 2-clause license or the GNU General Public License version 3.0. DISCUSSION: VSEARCH has been shown to be a fast, accurate and full-fledged alternative to USEARCH. A free and open-source versatile tool for sequence analysis is now available to the metagenomics community.
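
The two-stage search strategy described in the METHODS part of the abstract above (a shared-word filter followed by global alignment with full dynamic programming) can be sketched in a few lines. This is a toy illustration, not VSEARCH code: it is neither vectorised nor multithreaded, the word length, linear gap model and scoring values are assumptions made for the example, and the sequences are invented.

```python
def kmers(seq, k=4):
    """Return the set of distinct words of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_words(query, target, k=4):
    """Count distinct k-mers present in both sequences (the fast filter)."""
    return len(kmers(query, k) & kmers(target, k))


def global_alignment_score(a, b, match=2, mismatch=-4, gap=-2):
    """Needleman-Wunsch score with a linear gap penalty, keeping two rows in memory."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def search(query, targets, k=4, max_candidates=2):
    """Rank targets by shared words, then align only the top candidates."""
    ranked = sorted(targets,
                    key=lambda name: shared_words(query, targets[name], k),
                    reverse=True)
    candidates = ranked[:max_candidates]
    return max(candidates,
               key=lambda name: global_alignment_score(query, targets[name]))


# Hypothetical query and database sequences.
query = "ACGTACGTGA"
targets = {
    "t1": "ACGTACGTCA",    # one substitution relative to the query
    "t2": "TTTTGGGGCCAA",  # unrelated
    "t3": "ACGTTTTTTT",    # shares only a short prefix
}

best = search(query, targets)
```

The filter keeps the expensive dynamic programming step away from targets that share few or no words with the query, which is the point of combining the two stages.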

20.
BMC Genomics ; 17: 51, 2016 Jan 14.
Article in English | MEDLINE | ID: mdl-26764020

ABSTRACT

BACKGROUND: With advances in next generation sequencing technology and analysis methods, single nucleotide variants (SNVs) and indels can be detected with high sensitivity and specificity in exome sequencing data. Recent studies have demonstrated the ability to detect disease-causing copy number variants (CNVs) in exome sequencing data. However, exonic CNV prediction programs have shown high false positive CNV counts, which is the major limiting factor for the applicability of these programs in clinical studies. RESULTS: We have developed a tool (cnvScan) to improve the clinical utility of computational CNV prediction in exome data. cnvScan can accept input from any CNV prediction program. cnvScan consists of two steps: CNV screening and CNV annotation. CNV screening evaluates CNV prediction using quality scores and refines this using an in-house CNV database, which greatly reduces the false positive rate. The annotation step provides functionally and clinically relevant information using multiple source datasets. We assessed the performance of cnvScan on CNV predictions from five different prediction programs using 64 exomes from Primary Immunodeficiency (PIDD) patients, and identified PIDD-causing CNVs in three individuals from two different families. CONCLUSIONS: In summary, cnvScan reduces the time and effort required to detect disease-causing CNVs by reducing the false positive count and providing annotation. This improves the clinical utility of CNV detection in exome data.


Subject(s)
DNA Copy Number Variations/genetics , Exome/genetics , High-Throughput Nucleotide Sequencing , Algorithms , Exons/genetics , Female , Humans , Male , Molecular Sequence Annotation , Mutation