Results 1 - 20 of 42
1.
Anal Chem ; 96(5): 1825-1833, 2024 02 06.
Article in English | MEDLINE | ID: mdl-38275837

ABSTRACT

Cancer onset and progression are known to be regulated by genetic and epigenetic events, including RNA modifications (a.k.a. epitranscriptomics). So far, more than 150 chemical modifications have been described in all RNA subtypes, including messenger, ribosomal, and transfer RNAs. RNA modifications and their regulators are known to be implicated in all steps of post-transcriptional regulation. The dysregulation of this complex yet delicate balance can contribute to disease evolution, particularly in the context of carcinogenesis, where cells are subjected to various stresses. We sought to discover RNA modifications involved in cancer cell adaptation to inhospitable environments, a peculiar feature of cancer stem cells (CSCs). We were particularly interested in the RNA marks that help the adaptation of cancer cells to suspension culture, which is often used as a surrogate to evaluate the tumorigenic potential. For this purpose, we designed an experimental pipeline consisting of four steps: (1) cell culture in different growth conditions to favor CSC survival; (2) simultaneous RNA subtype (mRNA, rRNA, tRNA) enrichment and RNA hydrolysis; (3) the multiplex analysis of nucleosides by LC-MS/MS followed by statistical/bioinformatic analysis; and (4) the functional validation of identified RNA marks. This study demonstrates that the RNA modification landscape evolves along with the cancer cell phenotype under growth constraints. Remarkably, we discovered a short epitranscriptomic signature, conserved across colorectal cancer cell lines and associated with enrichment in CSCs. Functional tests confirmed the importance of selected marks in the process of adaptation to suspension culture, confirming the validity of our approach and opening up interesting prospects in the field.


Subject(s)
Neoplasms; RNA Processing, Post-Transcriptional; Chromatography, Liquid; Tandem Mass Spectrometry; RNA/genetics; RNA/metabolism; RNA, Transfer/genetics; RNA, Transfer/metabolism; Neoplasms/genetics
2.
Bioinformatics ; 39(12)2023 12 01.
Article in English | MEDLINE | ID: mdl-37975872

ABSTRACT

MOTIVATION: Phylogenetic placement enables phylogenetic analysis of massive collections of newly sequenced DNA, when de novo tree inference is too unreliable or inefficient. Assuming that a high-quality reference tree is available, the idea is to seek the correct placement of the new sequences in that tree. Recently, alignment-free approaches to phylogenetic placement have emerged, both to circumvent the need to align the new sequences and to avoid the calculations that typically follow the alignment step. A promising approach is based on the inference of k-mers that can be potentially related to the reference sequences, also called phylo-k-mers. However, its usage is limited by the time and memory-consuming stage of reference data preprocessing and the large numbers of k-mers to consider. RESULTS: We suggest a filtering method for selecting informative phylo-k-mers based on mutual information, which can significantly improve the efficiency of placement, at the cost of a small loss in placement accuracy. This method is implemented in IPK, a new tool for computing phylo-k-mers that significantly outperforms the software previously available. We also present EPIK, a new software for phylogenetic placement, supporting filtered phylo-k-mer databases. Our experiments on real-world data show that EPIK is the fastest phylogenetic placement tool available, when placing hundreds of thousands and millions of queries while still providing accurate placements. AVAILABILITY AND IMPLEMENTATION: IPK and EPIK are freely available at https://github.com/phylo42/IPK and https://github.com/phylo42/EPIK. Both are implemented in C++ and Python and supported on Linux and MacOS.


Subject(s)
Algorithms; Software; Phylogeny; Sequence Analysis, DNA; Base Sequence
3.
PLoS Comput Biol ; 19(10): e1011522, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37862386

ABSTRACT

Gene expression is the synthesis of proteins from the information encoded on DNA. One of the two main steps of gene expression is the translation of messenger RNA (mRNA) into polypeptide sequences of amino acids. Here, by taking into account mRNA degradation, we model the motion of ribosomes along mRNA with a ballistic model where particles advance along a filament without excluded volume interactions. Unidirectional models of transport have previously been used to fit the average density of ribosomes obtained by the experimental ribo-sequencing (Ribo-seq) technique in order to obtain the kinetic rates. The degradation rate is not, however, accounted for and experimental data from different experiments are needed to have enough parameters for the fit. Here, we propose an entirely novel experimental setup and theoretical framework consisting in splitting the mRNAs into categories depending on the number of ribosomes from one to four. We solve analytically the ballistic model for a fixed number of ribosomes per mRNA, study the different regimes of degradation, and propose a criterion for the quality of the inverse fit. The proposed method provides a high sensitivity to the mRNA degradation rate. The additional equations coming from using the monosome (single ribosome) and polysome (arbitrary number) ribo-seq profiles enable us to determine all the kinetic rates in terms of the experimentally accessible mRNA degradation rate.


Subject(s)
Protein Biosynthesis; Ribosome Profiling; RNA, Messenger/metabolism; Protein Biosynthesis/genetics; Ribosomes/genetics; Ribosomes/metabolism; Proteins/metabolism
4.
IEEE/ACM Trans Comput Biol Bioinform ; 20(5): 2889-2897, 2023.
Article in English | MEDLINE | ID: mdl-37204943

ABSTRACT

Finding the correct position of new sequences within an established phylogenetic tree is an increasingly relevant problem in evolutionary bioinformatics and metagenomics. Recently, alignment-free approaches for this task have been proposed. One such approach is based on the concept of phylogenetically informative k-mers, or phylo-k-mers for short. In practice, phylo-k-mers are inferred from a set of related reference sequences and are equipped with scores expressing the probability of their appearance in different locations within the input reference phylogeny. Computing phylo-k-mers, however, represents a computational bottleneck to their applicability in real-world problems such as the phylogenetic analysis of metabarcoding reads and the detection of novel recombinant viruses. Here we consider the problem of phylo-k-mer computation: how can we efficiently find all k-mers whose probability lies above a given threshold for a given tree node? We describe and analyze algorithms for this problem, relying on branch-and-bound and divide-and-conquer techniques. We exploit the redundancy of adjacent windows of the alignment to save on computation. Besides computational complexity analyses, we provide an empirical evaluation of the relative performance of their implementations on simulated and real-world data. The divide-and-conquer algorithms are found to surpass the branch-and-bound approach, especially when many phylo-k-mers are found.
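
Illustrative sketch (not taken from the article): the question posed in the abstract, finding all k-mers whose probability lies above a threshold, can be pictured with a generic branch-and-bound enumeration. The sketch assumes that each window position carries an independent probability per base, so that a k-mer's probability is a simple product; the function name `kmers_above_threshold` and the toy `columns` input are hypothetical, and the authors' actual algorithms and scoring may differ.

```python
from typing import Dict, List, Tuple

ALPHABET = "ACGT"


def kmers_above_threshold(
    columns: List[Dict[str, float]], threshold: float
) -> List[Tuple[str, float]]:
    """Enumerate every k-mer whose probability exceeds `threshold`.

    columns[i][base] is the (assumed independent) probability of `base`
    at position i of the window, so a k-mer's probability is a product.
    Intended for k >= 1.
    """
    k = len(columns)
    # best_suffix[i]: largest probability any completion of positions i..k-1 can reach.
    best_suffix = [1.0] * (k + 1)
    for i in range(k - 1, -1, -1):
        best_suffix[i] = best_suffix[i + 1] * max(columns[i].values())

    results: List[Tuple[str, float]] = []

    def extend(prefix: str, prob: float, pos: int) -> None:
        if pos == k:
            results.append((prefix, prob))
            return
        for base in ALPHABET:
            p = prob * columns[pos][base]
            # Keep the branch only if its best possible completion
            # can still exceed the threshold; otherwise prune it.
            if p * best_suffix[pos + 1] > threshold:
                extend(prefix + base, p, pos + 1)

    extend("", 1.0, 0)
    return results


# Toy window of length k = 2 (made-up probabilities).
columns = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.6, "G": 0.2, "T": 0.1},
]
hits = kmers_above_threshold(columns, threshold=0.1)
print([kmer for kmer, _ in hits])  # ['AC', 'AG']
```

In this toy case every prefix starting with C, G or T is discarded after one step, because even the most probable second base cannot bring the product above 0.1.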

5.
Bioinformatics ; 39(4)2023 04 03.
Article in English | MEDLINE | ID: mdl-37010504

ABSTRACT

MOTIVATION: Seeking probabilistic motifs in a sequence is a common task to annotate putative transcription factor binding sites or other RNA/DNA binding sites. Useful motif representations include position weight matrices (PWMs), dinucleotide PWMs (di-PWMs), and hidden Markov models (HMMs). Dinucleotide PWMs not only retain the simplicity of PWMs (a matrix form and a cumulative scoring function) but also incorporate dependency between adjacent positions in the motif (unlike PWMs, which disregard any dependency). For instance, to represent binding sites, the HOCOMOCO database provides di-PWM motifs derived from experimental data. Currently, two programs, SPRy-SARUS and MOODS, can search for occurrences of di-PWMs in sequences. RESULTS: We propose a Python package called dipwmsearch, which provides an original and efficient algorithm for this task (it first enumerates matching words for the di-PWM, and then searches these all at once in the sequence, even if the latter contains IUPAC codes). The user benefits from an easy installation via PyPI or conda, comprehensive documentation, and executable scripts that facilitate the use of di-PWMs. AVAILABILITY AND IMPLEMENTATION: dipwmsearch is available at https://pypi.org/project/dipwmsearch/ and https://gite.lirmm.fr/rivals/dipwmsearch/ under the CeCILL license.
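
Illustrative sketch (not the dipwmsearch API): the cumulative scoring function of a di-PWM mentioned in the abstract can be pictured as a sum of one score per overlapping dinucleotide. The code below is a naive sliding-window scan under that assumption; the helper names (`make_column`, `dipwm_score`, `scan`) and the toy matrix are hypothetical, IUPAC codes are not handled, and the enumerate-then-search algorithm of the package itself is not reproduced here.

```python
from itertools import product
from typing import Dict, List, Tuple

DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def make_column(favoured: str, high: int, low: int = -1) -> Dict[str, int]:
    """Toy di-PWM column: one dinucleotide scores `high`, all others `low`."""
    return {d: (high if d == favoured else low) for d in DINUCLEOTIDES}


def dipwm_score(word: str, matrix: List[Dict[str, int]]) -> int:
    """Sum the scores of the overlapping dinucleotides; matrix[i] scores word[i:i+2]."""
    if len(word) != len(matrix) + 1:
        raise ValueError("word length must be number of columns + 1")
    return sum(matrix[i][word[i:i + 2]] for i in range(len(matrix)))


def scan(sequence: str, matrix: List[Dict[str, int]], threshold: int) -> List[Tuple[int, str, int]]:
    """Report (start, word, score) for every window scoring at least `threshold`."""
    width = len(matrix) + 1
    hits = []
    for start in range(len(sequence) - width + 1):
        word = sequence[start:start + width]
        score = dipwm_score(word, matrix)
        if score >= threshold:
            hits.append((start, word, score))
    return hits


# Motif of length 3: position 0 favours "AC", position 1 favours "CG".
matrix = [make_column("AC", 2), make_column("CG", 3)]
print(scan("TTACGACGT", matrix, threshold=4))  # [(2, 'ACG', 5), (5, 'ACG', 5)]
```

Because each column scores a pair of adjacent bases, the matrix for a motif of length L has L - 1 columns, which is how dependency between neighbouring positions enters the score.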


Subject(s)
Algorithms; Computational Biology; Binding Sites; Protein Binding; Position-Specific Scoring Matrices
6.
Malar J ; 22(1): 27, 2023 Jan 25.
Article in English | MEDLINE | ID: mdl-36698187

ABSTRACT

BACKGROUND: Protozoan parasites are known to attach a specific and diverse group of proteins to their plasma membrane via a GPI anchor. In malaria parasites, GPI-anchored proteins (GPI-APs) have been shown to play an important role in host-pathogen interactions and a key function in host cell invasion and immune evasion. Because of their immunogenic properties, some of these proteins have been considered as malaria vaccine candidates. However, identification of all possible GPI-APs encoded by these parasites remains challenging due to their sequence diversity and limitations of the tools used for their characterization. METHODS: The FT-GPI software was developed to detect GPI-APs based on the presence of a hydrophobic helix at both ends of the premature peptide. FT-GPI was implemented in C++ and applied to study the GPI-proteome of 46 isolates of the order Haemosporida. Using the GPI-proteome of Plasmodium falciparum strain 3D7 and Plasmodium vivax strain Sal-1, a heuristic method was defined to select the most sensitive and specific FT-GPI software parameters. RESULTS: FT-GPI enabled revision of the GPI-proteome of P. falciparum and P. vivax, including the identification of novel GPI-APs. Orthology- and synteny-based analyses showed that 19 of the 37 GPI-APs found in the order Haemosporida are conserved among Plasmodium species. Our analyses suggest that gene duplication and deletion events may have contributed significantly to the evolution of the GPI-proteome, and its composition correlates with speciation. CONCLUSION: FT-GPI-based prediction is a useful tool for mining GPI-APs and gaining further insights into their evolution and sequence diversity. This resource may also help identify new protein candidates for the development of vaccines for malaria and other parasitic diseases.


Subject(s)
GPI-Linked Proteins; Plasmodium falciparum; Plasmodium vivax; Proteome; Protozoan Proteins; GPI-Linked Proteins/genetics; Plasmodium falciparum/genetics; Plasmodium vivax/genetics; Proteome/analysis; Protozoan Proteins/genetics
7.
RNA Biol ; 19(1): 132-142, 2022.
Article in English | MEDLINE | ID: mdl-35067178

ABSTRACT

The last decade has seen mRNA modification emerge as a new layer of gene expression regulation. The Fat mass and obesity-associated protein (FTO) was the first identified eraser of N6-methyladenosine (m6A) adducts, the most widespread modification in eukaryotic messenger RNA. This discovery, of a reversible and dynamic RNA modification, aided by recent technological advances in RNA mass spectrometry and sequencing has led to the birth of the field of epitranscriptomics. FTO crystallized much of the attention of epitranscriptomics researchers and resulted in the publication of numerous, yet contradictory, studies describing the regulatory role of FTO in gene expression and central biological processes. These incongruities may be explained by a wide spectrum of FTO substrates and RNA sequence preferences: FTO binds multiple RNA species (mRNA, snRNA and tRNA) and can demethylate internal m6A in mRNA and snRNA, N6,2'-O-dimethyladenosine (m6Am) adjacent to the mRNA cap, and N1-methyladenosine (m1A) in tRNA. Here, we review current knowledge related to FTO function in healthy and cancer cells. In particular, we emphasize the divergent role(s) attributed to FTO in different tissues and subcellular and molecular contexts.


Subject(s)
Adipose Tissue/metabolism; Alpha-Ketoglutarate-Dependent Dioxygenase FTO/genetics; Alpha-Ketoglutarate-Dependent Dioxygenase FTO/metabolism; Gene Expression Regulation; Neoplasms/etiology; Neoplasms/metabolism; Adenosine/analogs & derivatives; Adipose Tissue/anatomy & histology; Adiposity; Catalysis; Disease Susceptibility; Epigenesis, Genetic; Homeostasis; Humans; Neoplasms/pathology; Organ Specificity; RNA Processing, Post-Transcriptional; RNA, Messenger/genetics; RNA, Small Nuclear/genetics; RNA, Transfer/genetics; RNA-Binding Proteins; Substrate Specificity
8.
Nat Commun ; 13(1): 173, 2022 01 10.
Article in English | MEDLINE | ID: mdl-35013311

ABSTRACT

Mechanisms of drug tolerance remain poorly understood and have been linked to genomic but also to non-genomic processes. 5-fluorouracil (5-FU), the most widely used chemotherapy in oncology, is associated with resistance. While prescribed as an inhibitor of DNA replication, 5-FU alters all RNA pathways. Here, we show that 5-FU treatment leads to the production of fluorinated ribosomes exhibiting altered translational activities. 5-FU is incorporated into ribosomal RNAs of mature ribosomes in cancer cell lines, colorectal xenografts, and human tumors. Fluorinated ribosomes appear to be functional, yet they display a selective translational activity towards mRNAs depending on the nature of their 5'-untranslated region. As a result, we find that sustained translation of IGF-1R mRNA, which encodes one of the most potent cell survival effectors, promotes the survival of 5-FU-treated colorectal cancer cells. Altogether, our results demonstrate that "man-made" fluorinated ribosomes favor the drug-tolerant cellular phenotype by promoting translation of survival genes.


Subject(s)
Antimetabolites, Antineoplastic/pharmacology; Colorectal Neoplasms/drug therapy; DNA, Neoplasm/genetics; Drug Tolerance/genetics; Fluorouracil/pharmacology; Protein Biosynthesis/drug effects; Receptor, IGF Type 1/genetics; Cell Line, Tumor; Cell Survival/drug effects; Colorectal Neoplasms/genetics; Colorectal Neoplasms/metabolism; Colorectal Neoplasms/pathology; DNA Replication; DNA, Neoplasm/metabolism; Drug Resistance, Neoplasm/genetics; HCT116 Cells; Halogenation; Humans; RNA, Messenger/genetics; RNA, Messenger/metabolism; RNA, Ribosomal/genetics; RNA, Ribosomal/metabolism; Receptor, IGF Type 1/agonists; Receptor, IGF Type 1/metabolism; Ribosomes/drug effects; Ribosomes/genetics; Ribosomes/metabolism; Xenograft Model Antitumor Assays
9.
Nat Commun ; 12(1): 1716, 2021 03 19.
Article in English | MEDLINE | ID: mdl-33741917

ABSTRACT

Cancer stem cells (CSCs) are a small but critical cell population for cancer biology since they display inherent resistance to standard therapies and give rise to metastases. Despite accruing evidence establishing a link between deregulation of epitranscriptome-related players and tumorigenic process, the role of messenger RNA (mRNA) modifications in the regulation of CSC properties remains poorly understood. Here, we show that the cytoplasmic pool of fat mass and obesity-associated protein (FTO) impedes CSC abilities in colorectal cancer through its N6,2'-O-dimethyladenosine (m6Am) demethylase activity. While m6Am is strategically located next to the m7G-mRNA cap, its biological function is not well understood and has not been addressed in cancer. Low FTO expression in patient-derived cell lines elevates m6Am level in mRNA which results in enhanced in vivo tumorigenicity and chemoresistance. Inhibition of the nuclear m6Am methyltransferase, PCIF1/CAPAM, fully reverses this phenotype, stressing the role of m6Am modification in stem-like properties acquisition. FTO-mediated regulation of m6Am marking constitutes a reversible pathway controlling CSC abilities. Altogether, our findings bring to light the first biological function of the m6Am modification and its potential adverse consequences for colorectal cancer management.


Subject(s)
Alpha-Ketoglutarate-Dependent Dioxygenase FTO/metabolism; Colorectal Neoplasms/metabolism; Cytoplasm/metabolism; Demethylation; Adaptor Proteins, Signal Transducing/metabolism; Adenosine/metabolism; Alpha-Ketoglutarate-Dependent Dioxygenase FTO/genetics; Cell Line, Tumor; Cell Nucleus/metabolism; Colorectal Neoplasms/genetics; Gene Expression Regulation, Neoplastic; Gene Silencing; Humans; Methyltransferases/metabolism; Nuclear Proteins/metabolism; RNA, Messenger/metabolism
10.
Bioinformatics ; 36(22-23): 5351-5360, 2021 Apr 01.
Article in English | MEDLINE | ID: mdl-33331849

ABSTRACT

MOTIVATION: Novel recombinant viruses may have important medical and evolutionary significance, as they sometimes display new traits not present in the parental strains. This is particularly concerning when the new viruses combine fragments coming from phylogenetically distinct viral types. Here, we consider the task of screening large collections of sequences for such novel recombinants. A number of methods already exist for this task. However, these methods rely on complex models and heavy computations that are not always practical for a quick scan of a large number of sequences. RESULTS: We have developed SHERPAS, a new program to detect novel recombinants and provide a first estimate of their parental composition. Our approach is based on the precomputation of a large database of 'phylogenetically-informed k-mers', an idea recently introduced in the context of phylogenetic placement in metagenomics. Our experiments show that SHERPAS is hundreds to thousands of times faster than existing software, and enables the analysis of thousands of whole genomes, or long-sequencing reads, within minutes or seconds, and with limited loss of accuracy. AVAILABILITY AND IMPLEMENTATION: The source code is freely available for download at https://github.com/phylo42/sherpas. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

11.
Bioinformatics ; 36(21): 5264-5266, 2021 01 29.
Article in English | MEDLINE | ID: mdl-32697844

ABSTRACT

MOTIVATION: Phylogenetic placement (PP) is a process of taxonomic identification for which several tools are now available. However, it remains difficult to assess which tool is more adapted to particular genomic data or a particular reference taxonomy. We developed Placement Evaluation WOrkflows (PEWO), the first benchmarking tool dedicated to PP assessment. Its automated workflows can evaluate PP at many levels, from parameter optimization for a particular tool, to the selection of the most appropriate genetic marker when PP-based species identifications are targeted. Our goal is that PEWO will become a community effort and a standard support for future developments and applications of PP. AVAILABILITY AND IMPLEMENTATION: https://github.com/phylo42/PEWO. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Benchmarking; Software; Genome; Phylogeny; Workflow
12.
Nat Commun ; 10(1): 3072, 2019 07 11.
Article in English | MEDLINE | ID: mdl-31296853

ABSTRACT

Faithful transcription initiation is critical for accurate gene expression, yet the mechanisms underlying specific transcription start site (TSS) selection in mammals remain unclear. Here, we show that the histone-fold domain protein NF-Y, a ubiquitously expressed transcription factor, controls the fidelity of transcription initiation at gene promoters in mouse embryonic stem cells. We report that NF-Y maintains the region upstream of TSSs in a nucleosome-depleted state while simultaneously protecting this accessible region against aberrant and/or ectopic transcription initiation. We find that loss of NF-Y binding in mammalian cells disrupts the promoter chromatin landscape, leading to nucleosomal encroachment over the canonical TSS. Importantly, this chromatin rearrangement is accompanied by upstream relocation of the transcription pre-initiation complex and ectopic transcription initiation. Further, this phenomenon generates aberrant extended transcripts that undergo translation, disrupting gene expression profiles. These results suggest NF-Y is a central player in TSS selection in metazoans and highlight the deleterious consequences of inaccurate transcription initiation.


Subject(s)
CCAAT-Binding Factor/metabolism; Nucleosomes/metabolism; Transcription Initiation Site; Transcription Initiation, Genetic; Animals; CCAAT-Binding Factor/genetics; Cell Line; Chromatin/genetics; Chromatin/metabolism; Embryonic Stem Cells; Gene Knockdown Techniques; Mice; Nucleosomes/genetics; Promoter Regions, Genetic/genetics; RNA, Small Interfering/metabolism
13.
Bioinformatics ; 35(17): 3163-3165, 2019 09 01.
Article in English | MEDLINE | ID: mdl-30649190

ABSTRACT

MOTIVATION: The visualization and interpretation of evolutionary spatiotemporal scenarios are broadly and increasingly used in infectious disease research, ecology or agronomy. Using probabilistic frameworks, well-known tools can infer from molecular data ancestral traits for internal nodes in a phylogeny, and numerous phylogenetic rendering tools can display such evolutionary trees. However, visualizing such ancestral information and its uncertainty on the tree remains tedious. For instance, ancestral nodes can be associated with several geographical annotations with close probabilities, and thus several migration or transmission scenarios exist. RESULTS: We present a web-based tool, named AQUAPONY, that facilitates such operations. Given an evolutionary tree with ancestral (e.g. geographical) annotations, the user can easily control the display of ancestral information on the entire tree or a subtree, and can view alternative phylogeographic scenarios along a branch according to a chosen uncertainty threshold. AQUAPONY interactively visualizes the tree and eases the objective interpretation of evolutionary scenarios. AQUAPONY's implementation makes it highly responsive to user interaction, and instantaneously updates the tree visualizations even for large trees (which can be exported as image files). AVAILABILITY AND IMPLEMENTATION: AQUAPONY is coded in JavaScript/HTML, available under the CeCILL license, and can be freely used at http://www.atgc-montpellier.fr/aquapony/.


Subject(s)
Phylogeny; Software; Phenotype; Phylogeography
14.
Malar J ; 16(1): 493, 2017 Dec 19.
Article in English | MEDLINE | ID: mdl-29258508

ABSTRACT

BACKGROUND: Plasmodium falciparum malaria is one of the most widespread parasitic infections in humans and remains a leading global health concern. Malaria elimination efforts are threatened by the emergence and spread of resistance to artemisinin-based combination therapy, the first-line treatment of malaria. Promising molecular markers and pathways associated with artemisinin drug resistance have been identified, but the underlying molecular mechanisms of resistance remain unknown. The genomic data from the early period of emergence of artemisinin resistance (2008-2011) were evaluated, with the aim of defining the k13-associated genetic background in Cambodia, the country identified as the epicentre of anti-malarial drug resistance, through characterization of 167 parasite isolates using a panel of 21,257 SNPs. RESULTS: Eight subpopulations were identified, suggesting a process of acquisition of artemisinin resistance consistent with an emergence-selection-diffusion model, supported by the shifting balance theory. Identification of population-specific mutations facilitated the characterization of a core set of 57 background genes associated with artemisinin resistance and associated pathways. The analysis indicates that the background of artemisinin resistance was not acquired after drug pressure, but rather is the result of fixation followed by selection on the daughter subpopulations derived from the ancestral population. CONCLUSIONS: Functional analysis of artemisinin resistance subpopulations illustrates the strong interplay between ubiquitination and cell division or differentiation in artemisinin-resistant parasites. The relationship of these pathways with the P. falciparum resistant subpopulation, and the presence of drug resistance markers in addition to k13, highlight the major role of the admixed parasite population in the diffusion of the artemisinin-resistant background. The diffusion of resistant genes in the Cambodian admixed population after selection resulted from mating of gametocytes of sensitive and resistant parasite populations.


Subject(s)
Artemisinins/pharmacology; Drug Resistance; Malaria, Falciparum/epidemiology; Plasmodium falciparum/drug effects; Plasmodium falciparum/genetics; Antimalarials/pharmacology; Cambodia/epidemiology; Genotype; Humans; Malaria, Falciparum/parasitology; Mutation; Plasmodium falciparum/classification; Plasmodium falciparum/metabolism; Polymorphism, Single Nucleotide; Protozoan Proteins/genetics
15.
Sci Adv ; 3(7): e1700239, 2017 07.
Article in English | MEDLINE | ID: mdl-28695208

ABSTRACT

Tiny photosynthetic microorganisms that form the picoplankton (between 0.3 and 3 µm in diameter) are at the base of the food web in many marine ecosystems, and their adaptability to environmental change hinges on standing genetic variation. Although the genomic and phenotypic diversity of the bacterial component of the oceans has been intensively studied, little is known about the genomic and phenotypic diversity within each of the diverse eukaryotic species present. We report the level of genomic diversity in a natural population of Ostreococcus tauri (Chlorophyta, Mamiellophyceae), the smallest photosynthetic eukaryote. Contrary to the expectations of clonal evolution or cryptic species, the spectrum of genomic polymorphism observed suggests a large panmictic population (an effective population size of 1.2 × 10^7) with pervasive evidence of sexual reproduction. De novo assemblies of low-coverage chromosomes reveal two large candidate mating-type loci with suppressed recombination, whose origin may pre-date the speciation events in the class Mamiellophyceae. This high genetic diversity is associated with large phenotypic differences between strains. Strikingly, resistance of isolates to large double-stranded DNA viruses, which abound in their natural environment, is positively correlated with the size of a single hypervariable chromosome, which contains 44 to 156 kb of strain-specific sequences. Our findings highlight the role of viruses in shaping genome diversity in marine picoeukaryotes.


Subject(s)
Chromosomes; Genetic Variation; Genetics, Population; Genomics; Phytoplankton/genetics; Disease Susceptibility; Evolution, Molecular; Genomics/methods; Mutation; Phenotype; Phylogeny; Phytoplankton/classification; Phytoplankton/virology; Polymorphism, Single Nucleotide; Selection, Genetic
16.
Genome Res ; 27(5): 835-848, 2017 05.
Article in English | MEDLINE | ID: mdl-28396522

ABSTRACT

A viral quasispecies, the ensemble of viral strains populating an infected person, can be highly diverse. For optimal assessment of virulence, pathogenesis, and therapy selection, determining the haplotypes of the individual strains can play a key role. As many viruses are subject to high mutation and recombination rates, high-quality reference genomes are often not available at the time of a new disease outbreak. We present SAVAGE, a computational tool for reconstructing individual haplotypes of intra-host virus strains without the need for a high-quality reference genome. SAVAGE makes use of either FM-index-based data structures or ad hoc consensus reference sequence for constructing overlap graphs from patient sample data. In this overlap graph, nodes represent reads and/or contigs, while edges reflect that two reads/contigs, based on sound statistical considerations, represent identical haplotypic sequence. Following an iterative scheme, a new overlap assembly algorithm that is based on the enumeration of statistically well-calibrated groups of reads/contigs then efficiently reconstructs the individual haplotypes from this overlap graph. In benchmark experiments on simulated and on real deep-coverage data, SAVAGE drastically outperforms generic de novo assemblers as well as the only specialized de novo viral quasispecies assembler available so far. When run on ad hoc consensus reference sequence, SAVAGE performs very favorably in comparison with state-of-the-art reference genome-guided tools. We also apply SAVAGE on two deep-coverage samples of patients infected by the Zika and the hepatitis C virus, respectively, which sheds light on the genetic structures of the respective viral quasispecies.


Subject(s)
Contig Mapping/methods; Genome, Viral; Genomics/methods; Sequence Analysis, DNA/methods; Software; Contig Mapping/standards; Genomics/standards; Haplotypes; Hepacivirus/genetics; Polymorphism, Genetic; Reference Standards; Sequence Analysis, DNA/standards; Zika Virus/genetics
17.
DNA Res ; 24(3): 303-310, 2017 Jun 01.
Article in English | MEDLINE | ID: mdl-28168289

ABSTRACT

Codon usage is biased between lowly and highly expressed genes in a genome-specific manner. This universal bias has been well assessed in some unicellular species, but remains problematic to assess in more complex species. We propose a new method to compute codon usage bias based on genome-wide translational data. A new technique based on sequencing of ribosome-protected mRNA fragments (Ribo-seq) allowed us to rank genes and compute codon usage bias with high precision for a great variety of species, including mammals. Gene ranking using Ribo-seq data confirms the influence of the tRNA pool on codon usage bias and shows a decreasing bias in multicellular species. Ribo-seq analysis also makes it possible to detect preferred codons without information on gene function.
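
Illustrative sketch (not taken from the article): the idea of ranking genes by a translation measure and then computing codon usage on the top-ranked ones can be pictured as follows. The abstract does not say which bias statistic is used, so the sketch assumes the classical relative synonymous codon usage (RSCU: a codon's count divided by the mean count of its synonymous codons). The gene scores, the helper names `top_ranked` and `rscu`, and the toy sequences are all hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable, List, Tuple

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def top_ranked(genes: Dict[str, Tuple[str, float]], n: int) -> List[str]:
    """Return the coding sequences of the n genes with the highest score.

    genes maps a gene name to (coding sequence, score); the score stands in
    for a Ribo-seq-derived translation level (an assumption of this sketch).
    """
    ranked = sorted(genes.values(), key=lambda item: item[1], reverse=True)
    return [cds for cds, _ in ranked[:n]]


def rscu(sequences: Iterable[str]) -> Dict[str, float]:
    """Relative synonymous codon usage over a set of in-frame coding sequences."""
    counts: Counter = Counter()
    for cds in sequences:
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1

    synonymous: Dict[str, List[str]] = {}
    for codon, aa in CODON_TABLE.items():
        synonymous.setdefault(aa, []).append(codon)

    values: Dict[str, float] = {}
    for aa, codons in synonymous.items():
        total = sum(counts[c] for c in codons)
        if aa == "*" or total == 0:
            continue  # skip stop codons and amino acids never observed
        mean = total / len(codons)
        for c in codons:
            values[c] = counts[c] / mean
    return values


genes = {"g1": ("AAAAAAAAG", 9.0), "g2": ("AAGAAGAAG", 1.0)}
usage = rscu(top_ranked(genes, 1))
print(round(usage["AAA"], 3), round(usage["AAG"], 3))  # 1.333 0.667
```

An RSCU value above 1 marks a codon used more often than its synonyms in the selected genes, which is one simple way to flag candidate preferred codons.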


Subject(s)
Codon/genetics; Genomics/methods; Protein Biosynthesis; Transcriptome; Animals; Codon/analysis; Eukaryota/genetics; Evolution, Molecular; High-Throughput Nucleotide Sequencing; Humans; RNA, Messenger; Ribosomes; Sequence Analysis, RNA
18.
Bioinformatics ; 33(6): 799-806, 2017 03 15.
Article in English | MEDLINE | ID: mdl-27273673

ABSTRACT

Motivation: New long read sequencing technologies, like PacBio SMRT and Oxford Nanopore, can produce sequencing reads up to 50 000 bp long but with an error rate of at least 15%. Reducing the error rate is necessary for subsequent utilization of the reads in, e.g., de novo genome assembly. The error correction problem has been tackled either by aligning the long reads against each other or by a hybrid approach that uses the more accurate short reads produced by second generation sequencing technologies to correct the long reads. Results: We present an error correction method that uses long reads only. The method consists of two phases: first, we use an iterative alignment-free correction method based on de Bruijn graphs with increasing length of k-mers, and second, the corrected reads are further polished using long-distance dependencies that are found using multiple alignments. According to our experiments, the proposed method is the most accurate one relying on long reads only for read sets with high coverage. Furthermore, when the coverage of the read set is at least 75×, the throughput of the new method is at least 20% higher. Availability and Implementation: LoRMA is freely available at http://www.cs.helsinki.fi/u/lmsalmel/LoRMA/. Contact: leena.salmela@cs.helsinki.fi.


Subject(s)
DNA Sequence Analysis/methods , Software , Algorithms , Escherichia coli/genetics , Genome , High-Throughput Nucleotide Sequencing/methods , Saccharomyces cerevisiae/genetics
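
The k-mer reasoning behind alignment-free correction, as described in the abstract above, can be sketched as follows. This is not LoRMA's algorithm: it only shows the detection step, in which k-mers seen at least `min_count` times are treated as trusted ("solid") and read positions covered by no solid k-mer are flagged as likely errors. All function names and the `min_count` threshold are assumptions made for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """Return all overlapping k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def solid_kmers(reads, k, min_count=2):
    """k-mers occurring at least `min_count` times across the read set."""
    counts = Counter(km for read in reads for km in kmers(read, k))
    return {km for km, c in counts.items() if c >= min_count}


def uncovered_positions(read, solid, k):
    """Positions of `read` that are not covered by any solid k-mer."""
    covered = [False] * len(read)
    for i, km in enumerate(kmers(read, k)):
        if km in solid:
            for j in range(i, i + k):
                covered[j] = True
    return [i for i, ok in enumerate(covered) if not ok]
```

An iterative scheme in the spirit of the abstract would repeat detection and correction with increasing values of `k`; that loop is left out of this sketch.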
19.
BMC Bioinformatics ; 17(1): 237, 2016 Jun 16.
Article in English | MEDLINE | ID: mdl-27306641

ABSTRACT

BACKGROUND: Next Generation Sequencing (NGS) has dramatically enhanced our ability to sequence genomes, but not to assemble them. In practice, many published genome sequences remain in the state of a large set of contigs. Each contig describes the sequence found along some path of the assembly graph; however, the set of contigs does not record all the sequence information contained in that graph. Although many subsequent analyses can be performed with the set of contigs, one may ask whether mapping reads on the contigs is as informative as mapping them on the paths of the assembly graph. Currently, practical tools to perform mapping on such graphs are lacking. RESULTS: Here, we propose a formal definition of mapping on a de Bruijn graph, analyse the complexity of the problem, which turns out to be NP-complete, and provide a practical solution. We propose a pipeline called GGMAP (Greedy Graph MAPping). Its novelty is a procedure to map reads on branching paths of the graph, for which we designed a heuristic algorithm called BGREAT (de Bruijn Graph REAd mapping Tool). For the sake of efficiency, BGREAT rewrites a read sequence as a succession of unitig sequences. GGMAP can map millions of reads per CPU hour on a de Bruijn graph built from a large set of human genomic reads. Surprisingly, results show that up to 22% more reads can be mapped on the graph but not on the contig set. CONCLUSIONS: Although mapping reads on a de Bruijn graph is a complex task, our proposal offers a practical solution combining efficiency with an improved mapping capacity compared to assembly-based mapping, even for complex eukaryotic data.


Subject(s)
Escherichia coli/genetics , Human Genome , Genomics/methods , Algorithms , Contig Mapping , High-Throughput Nucleotide Sequencing , Humans , DNA Sequence Analysis
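
The difference between mapping on contigs and mapping on graph paths, as raised in the abstract above, can be shown with a minimal exact-match sketch. It is not BGREAT: there are no mismatches, no unitigs, and no heuristics. The graph is assumed to be node-centric (a set of k-mers with implicit edges between k-mers overlapping by k-1 characters), and all function names are hypothetical.

```python
def build_dbg(sequences, k):
    """Node-centric de Bruijn graph: the set of all k-mers; edges are implicit."""
    return {s[i:i + k] for s in sequences for i in range(len(s) - k + 1)}


def maps_on_graph(read, nodes, k):
    """True if every k-mer of the read is a node.

    Consecutive k-mers of a read overlap by k-1 characters, so if all of them
    are nodes the read spells a path in the graph.
    """
    if len(read) < k:
        return False
    return all(read[i:i + k] in nodes for i in range(len(read) - k + 1))


def maps_on_contigs(read, contigs):
    """True if the read occurs exactly inside at least one contig."""
    return any(read in contig for contig in contigs)


contigs = ["AAACGTTT", "CCACGTGG"]
graph = build_dbg(contigs, k=3)
read = "AAACGTGG"  # follows a branching path through the shared k-mers ACG and CGT
print(maps_on_graph(read, graph, 3), maps_on_contigs(read, contigs))
```

Here the read crosses from one contig to the other through shared k-mers, so it maps on the graph but on neither contig.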
20.
Gigascience ; 5: 9, 2016.
Article in English | MEDLINE | ID: mdl-26870323

ABSTRACT

BACKGROUND: With next-generation sequencing (NGS) technologies, the life sciences face a deluge of raw data. Classical analysis processes for such data often begin with an assembly step, which needs large amounts of computing resources and potentially removes or modifies parts of the biological information contained in the data. Our approach proposes to focus directly on biological questions, by considering raw unassembled NGS data, through a suite of six command-line tools. FINDINGS: Dedicated to 'whole-genome assembly-free' treatments, the Colib'read tools suite uses optimized algorithms for various analyses of NGS datasets, such as variant calling or read set comparisons. Based on the use of a de Bruijn graph and a Bloom filter, such analyses can be performed in a few hours, using small amounts of memory. Applications using real data demonstrate the good accuracy of these tools compared to classical approaches. To facilitate data analysis and tool dissemination, we developed Galaxy tools and tool shed repositories. CONCLUSIONS: With the Colib'read Galaxy tools suite, we enable a broad range of life scientists to analyze raw NGS data. More importantly, our approach allows the maximum biological information to be retained in the data, and uses a very low memory footprint.


Subject(s)
Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Information Storage and Retrieval/methods , Software , Base Sequence , Cluster Analysis , Genome/genetics , Genomics/methods , Molecular Sequence Data , Reproducibility of Results
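
The low memory footprint mentioned in the abstract above relies on storing k-mers in a Bloom filter rather than an exact set. A minimal sketch of that data structure follows. It is not the Colib'read implementation: the class name, the bit-array size, the number of hash functions, and the SHA-256-based hashing are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, but false positives are possible."""

    def __init__(self, n_bits=1 << 16, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, item):
        # Derive several bit positions by prefixing the item with a seed.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.n_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def index_kmers(reads, k, bloom):
    """Insert every k-mer of every read into the filter."""
    for read in reads:
        for i in range(len(read) - k + 1):
            bloom.add(read[i:i + k])
```

Membership tests on such a filter answer "possibly present" or "definitely absent", which is the usual trade-off accepted in exchange for a fixed, small memory budget.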
...