Results 1-7 of 7
1.
Sci Rep ; 11(1): 761, 2021 01 12.
Article in English | MEDLINE | ID: mdl-33436980

ABSTRACT

Third-generation sequencing technologies can sequence long reads of tens of kbp, which are expected to solve various problems. However, these reads display high error rates, currently capped at around 10%. Self-correction is therefore routinely used in long-read analysis projects. We introduce CONSENT, a new self-correction method that relies on both multiple sequence alignment and local de Bruijn graphs. To ensure scalability, the multiple sequence alignment computation benefits from a new and efficient segmentation strategy, yielding a massive speedup. CONSENT compares well to the state of the art, and performs better on real Oxford Nanopore data. Specifically, CONSENT is the only method that scales efficiently to ultra-long reads, and it can process a full human dataset, containing reads of up to 1.5 Mbp, in 10 days. Moreover, our experiments show that error correction with CONSENT improves the quality of Flye assemblies. Additionally, CONSENT implements a polishing feature that can correct raw assemblies. Our experiments show that CONSENT is 2-38 times faster than other polishing tools, while providing comparable results. Furthermore, we show that, on a human dataset, assembling the raw data and polishing the assembly consumes fewer resources than correcting and then assembling the reads, while providing better results. CONSENT is available at https://github.com/morispi/CONSENT .
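The multiple-sequence-alignment step mentioned above ends with a consensus per window. The sketch below shows only the simplest form of that idea, a column-wise majority vote over rows that are assumed to be already aligned (gaps written as "-"). It is an illustration, not CONSENT's implementation: the segmentation strategy and the local de Bruijn graph refinement described in the abstract are not modelled, and the function name `column_consensus` is hypothetical.

```python
from collections import Counter


def column_consensus(rows, gap="-"):
    """Majority-vote consensus of pre-aligned rows of equal length.

    Each column contributes its most frequent symbol; columns whose
    majority symbol is a gap are dropped. Ties are broken by the first
    symbol encountered in the column (Counter.most_common ordering).
    """
    if not rows:
        return ""
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all aligned rows must have the same length")
    consensus = []
    for column in zip(*rows):
        symbol, _ = Counter(column).most_common(1)[0]
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)
```

For example, `column_consensus(["ACG-T", "ACGAT", "A-G-T"])` keeps the majority base of each column and drops the fourth column, where two of three rows carry a gap, giving `"ACGT"`.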


Subject(s)
Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Algorithms , Animals , Caenorhabditis elegans/genetics , Escherichia coli/genetics , Genome , Humans , Nanopores , Saccharomyces cerevisiae/genetics , Software
2.
Hum Mutat ; 41(10): 1811-1829, 2020 10.
Article in English | MEDLINE | ID: mdl-32741062

ABSTRACT

Discriminating which nucleotide variants cause disease or contribute to phenotypic traits remains a major challenge in human genetics. In theory, any intragenic variant can affect RNA splicing by altering splicing regulatory elements (SREs). However, these alterations are often ignored, mainly because pioneering SRE predictors have proved inefficient. Here, we report the first large-scale comparative evaluation of four user-friendly SRE-dedicated algorithms (QUEPASA, HEXplorer, SPANR, and HAL), tested both as standalone tools and in multiple combinations, on two independent benchmark datasets that together comprise >1,300 exonic variants studied at the messenger RNA level and mapping to 89 different disease-causing genes. These methods display good predictive power, based on decision thresholds derived from receiver operating characteristic (ROC) curve analyses, with QUEPASA and HAL having the best accuracies either standalone or in combination. Still, overall the four predictors performed very closely, suggesting that all methods may be of use. Additionally, QUEPASA and HEXplorer may also be useful for predicting variant-induced creation of pseudoexons deep within introns. Our study highlights the potential of SRE predictors as filtering tools for identifying disease-causing candidates among the plethora of variants detected by high-throughput DNA sequencing, and provides guidance for their use in genomic medicine settings.
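The abstract states that decision thresholds were derived from ROC curve analyses but does not say which criterion was used. A common choice is the cut-off that maximizes Youden's J statistic (sensitivity + specificity - 1); the sketch below assumes that criterion and assumes that a higher score means "more likely to alter splicing". Both are assumptions for illustration; for a predictor whose informative direction is negative, the scores would need to be negated first. The function name and the toy data are hypothetical.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing Youden's J = sens + spec - 1.

    A variant is called positive when score >= threshold.
    labels[i] is True for variants known to alter splicing.
    """
    positives = sum(1 for label in labels if label)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    best_j, best_t = -1.0, None
    for t in sorted(set(scores)):
        tp = sum(1 for s, label in zip(scores, labels) if s >= t and label)
        fp = sum(1 for s, label in zip(scores, labels) if s >= t and not label)
        sensitivity = tp / positives
        specificity = 1 - fp / negatives
        j = sensitivity + specificity - 1
        if j > best_j:  # keep the lowest threshold among ties
            best_j, best_t = j, t
    return best_t, best_j
```

With toy scores `[0.1, 0.4, 0.35, 0.8]` and labels `[False, False, True, True]`, the call returns `(0.35, 0.5)`: every positive is caught while half of the negatives are rejected.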


Subject(s)
RNA Splicing , Regulatory Sequences, Nucleic Acid , Alternative Splicing , Exons , Humans , Introns/genetics , RNA, Messenger/genetics , Regulatory Sequences, Nucleic Acid/genetics
3.
NAR Genom Bioinform ; 2(1): lqz015, 2020 Mar.
Article in English | MEDLINE | ID: mdl-33575566

ABSTRACT

The error rates of third-generation sequencing data remain above 5%, and these errors are mainly insertions and deletions. Consequently, an increasing number of diverse long-read correction methods have been proposed. The quality of the correction strongly affects downstream processes. Therefore, methods that evaluate error correction tools with precise and reliable statistics are crucially needed. These evaluation methods rely on costly alignments to assess the quality of the corrected reads. They must therefore allow fast comparison of different tools and scale to the increasing length of long reads. Our tool, ELECTOR, evaluates long-read correction and is directly compatible with a wide range of error correction tools. As ELECTOR is based on multiple sequence alignment, we introduce a new algorithmic strategy for alignment segmentation, which enables us to scale to large instances using reasonable resources. To our knowledge, ELECTOR is the only method that can produce reproducible correction benchmarks on the latest ultra-long reads (>100 kb). It is also faster than the current state of the art on other datasets, and provides a wider set of metrics to assess the improvement in read quality after correction. ELECTOR is available on GitHub (https://github.com/kamimrcht/ELECTOR) and Bioconda.
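Correction evaluators of this kind compare three versions of each read: the reference, the uncorrected read and the corrected read. The sketch below assumes the three sequences have already been brought into one gapped alignment of equal length and uses simplified definitions that are an assumption here, not necessarily ELECTOR's exact metrics: recall is the fraction of erroneous positions that the corrector fixed, and precision is the fraction of positions changed by the corrector that now match the reference.

```python
def correction_metrics(reference, uncorrected, corrected):
    """Simplified recall/precision of a read correction.

    All three strings are assumed to be rows of the same alignment
    (equal length, gaps as "-").
    """
    if not (len(reference) == len(uncorrected) == len(corrected)):
        raise ValueError("aligned sequences must have equal length")
    positions = range(len(reference))
    # Positions where the raw read disagrees with the reference.
    errors = [i for i in positions if uncorrected[i] != reference[i]]
    # Positions the corrector modified.
    changed = [i for i in positions if corrected[i] != uncorrected[i]]
    fixed = sum(1 for i in errors if corrected[i] == reference[i])
    good_changes = sum(1 for i in changed if corrected[i] == reference[i])
    recall = fixed / len(errors) if errors else 1.0
    precision = good_changes / len(changed) if changed else 1.0
    return recall, precision
```

For instance, with reference `"ACGTACGT"`, raw read `"ACGAAC-T"` and corrected read `"CCGTACGT"`, both errors are fixed (recall 1.0), but one of the three changes introduces a new error (precision 2/3).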

4.
Front Neurosci ; 13: 948, 2019.
Article in English | MEDLINE | ID: mdl-31619945

ABSTRACT

Neuropeptides exert essential functions in animal physiology by controlling, e.g., reproduction, development, growth, energy homeostasis, cardiovascular activity and stress response. Thus, identification of neuropeptides has been a very active field of research over the last decades. This review article presents the various methods used to discover novel bioactive peptides in vertebrates. Although most were initially identified on the basis of their biological activity, some neuropeptides have also been discovered through their ability to bind/activate a specific receptor, or based on biochemical characteristics such as C-terminal amidation, which concerns half of the known neuropeptides. More recently, sequencing of the genomes of many representative species has facilitated peptidomic approaches using mass spectrometry and in silico screening of genomic libraries. Through these different approaches, more than a hundred bioactive neuropeptides have already been identified in vertebrates. Nevertheless, researchers continue to find new neuropeptides or to identify novel functions of neuropeptides that had not been detected previously, as was recently the case for nociceptin.
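C-terminal amidation usually arises from a glycine-extended intermediate: the precursor is cleaved at basic residues, and the C-terminal Gly is then converted into an amide. In silico screens exploit this pattern. The toy sketch below assumes the simplified rule "a fragment ending in Gly immediately before a dibasic pair (KR, RR, KK, RK) is a candidate amidated peptide"; real screens also handle signal peptides, monobasic sites and other cleavage motifs. The function name and the precursor sequence are hypothetical.

```python
import re

# Pairs of basic residues: a simplified model of prohormone convertase sites.
DIBASIC_SITE = re.compile(r"[KR][KR]")


def candidate_amidated_peptides(precursor, min_len=5):
    """Return fragments ending in Gly just before a dibasic site.

    The C-terminal Gly is assumed to be consumed by amidation, so it is
    dropped and the mature peptide is tagged with "-NH2".
    """
    candidates = []
    start = 0
    for site in DIBASIC_SITE.finditer(precursor):
        fragment = precursor[start:site.start()]
        if fragment.endswith("G") and len(fragment) - 1 >= min_len:
            candidates.append(fragment[:-1] + "-NH2")
        start = site.end()
    return candidates
```

On the made-up precursor `"MSKRAYGLLFGKRPQWG"`, the fragment `AYGLLFG` sits between two dibasic sites and ends in Gly, so the sketch reports `["AYGLLF-NH2"]`.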

5.
Anthropol Anz ; 75(4): 325-338, 2018 Dec 11.
Article in English | MEDLINE | ID: mdl-30422147

ABSTRACT

The aim of this work was to analyse the diet of a Merovingian population sample of 80 individuals buried at Norroy-le-Veneur, France, with regard to their social status and chronology. A carbon and nitrogen stable isotope analysis of human adult bone collagen and of related fauna from the same cemetery showed a diet based primarily on C3 plants, supplemented with animal protein in a range comparable to other contemporary sites. No significant contribution of C4 plants (e.g. millet) or marine-derived protein was detected. In terms of socio-economic stratification, individuals buried with rich grave-good assemblages formed a narrow group with a significantly higher mean δ13C than low-ranking individuals. We argue that this may represent a step in the gradual formation of the dietary exclusivity of Frankish elites, following a progressive rise in power of the Merovingian nobility. In addition, during the timespan of the cemetery there was a population-wide decrease of 0.3 ‰ in the mean δ13C value. The role of the population's Christian conversion is questioned, but another factor influencing diet might have played a role.
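The δ values quoted above follow the standard stable-isotope delta notation, expressed in per mil (‰) relative to an international reference: VPDB for carbon and atmospheric N2 (AIR) for nitrogen. The abstract does not restate the definition; the usual convention is:

```latex
\delta^{13}\mathrm{C} = \left( \frac{R_{\mathrm{sample}}}{R_{\mathrm{VPDB}}} - 1 \right) \times 1000,
\qquad R = {}^{13}\mathrm{C} / {}^{12}\mathrm{C}

\delta^{15}\mathrm{N} = \left( \frac{R_{\mathrm{sample}}}{R_{\mathrm{AIR}}} - 1 \right) \times 1000,
\qquad R = {}^{15}\mathrm{N} / {}^{14}\mathrm{N}

\Delta \delta^{13}\mathrm{C} = -0.3
\;\Longrightarrow\;
\Delta R_{\mathrm{sample}} = \frac{-0.3}{1000} \, R_{\mathrm{VPDB}} = -0.0003 \, R_{\mathrm{VPDB}}
```

Under this definition, the population-wide decrease of 0.3 ‰ corresponds to a drop in the collagen 13C/12C ratio equal to 0.03% of the reference ratio.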


Subject(s)
Cemeteries/history , Diet/history , Adult , Animals , Carbon Isotopes/analysis , Collagen/chemistry , France , History, Medieval , Humans , Nitrogen Isotopes/analysis
6.
Bioinformatics ; 34(24): 4213-4222, 2018 12 15.
Article in English | MEDLINE | ID: mdl-29955770

ABSTRACT

Motivation: The recent rise of long-read sequencing technologies such as Pacific Biosciences and Oxford Nanopore makes it possible to solve assembly problems for larger and more complex genomes than short-read technologies allowed. However, these long reads are very noisy, reaching an error rate of around 10-15% for Pacific Biosciences, and up to 30% for Oxford Nanopore. The error correction problem has been tackled either by self-correcting the long reads, or by using complementary short reads in a hybrid approach. However, even though sequencing technologies promise to lower the error rate of long reads below 10%, it is still higher in practice, and correcting such noisy long reads remains an issue. Results: We present HG-CoLoR, a hybrid error correction method that relies on a seed-and-extend approach based on the alignment of the short reads to the long reads, followed by the traversal of a variable-order de Bruijn graph built from the short reads. Our experiments show that HG-CoLoR efficiently corrects highly noisy long reads that display an error rate as high as 44%. When compared to other state-of-the-art long-read error correction methods, our experiments also show that HG-CoLoR provides the best trade-off between runtime and quality of the results, and is the only method able to scale efficiently to eukaryotic genomes. Availability and implementation: HG-CoLoR is implemented in C++, supported on Linux platforms and freely available at https://github.com/morispi/HG-CoLoR. Supplementary information: Supplementary data are available at Bioinformatics online.
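The extend step described above links consecutive seeds by walking a de Bruijn graph built from short reads, and a variable-order graph lets the walk fall back to a smaller k when the larger one has no path. The toy sketch below illustrates that idea only: it rebuilds a fixed-order graph (stored as a set of k-mers) for each k in a hypothetical `orders` list, from largest to smallest, and runs a breadth-first search from the left seed's suffix to the right seed's prefix. HG-CoLoR's actual data structures, seed selection and path scoring are not reproduced; all names here are hypothetical.

```python
from collections import deque


def kmer_set(reads, k):
    """All k-mers present in the short reads (the graph's edge set)."""
    return {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}


def bridge(left, right, reads, orders=(9, 7, 5), max_len=50):
    """Link two seeds through the graph, falling back to smaller k.

    Returns (bridged_sequence, k_used), or (None, None) if no order
    yields a path of at most max_len added bases.
    """
    for k in orders:
        if len(left) < k - 1 or len(right) < k - 1:
            continue  # seeds too short to anchor a node of this order
        edges = kmer_set(reads, k)
        start, target = left[-(k - 1):], right[:k - 1]
        queue = deque([(start, "")])
        seen = {start}
        while queue:
            node, path = queue.popleft()
            if node == target:
                # The last k-1 bases of left+path equal right's prefix.
                return left + path + right[k - 1:], k
            if len(path) >= max_len:
                continue
            for base in "ACGT":
                edge = node + base
                nxt = edge[1:]
                if edge in edges and nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + base))
    return None, None
```

With two short reads overlapping by only four bases, orders 9 and 7 cannot span the junction, so the walk succeeds only after falling back to k = 5. This is the situation a variable-order graph is meant to handle.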


Subject(s)
High-Throughput Nucleotide Sequencing , Nanopores , Sequence Analysis, DNA , Software , Algorithms , Computational Biology
7.
BMC Bioinformatics ; 13 Suppl 14: S9, 2012.
Article in English | MEDLINE | ID: mdl-23095660

ABSTRACT

BACKGROUND: Whole exome sequencing (WES) has become the strategy of choice to identify a coding allelic variant responsible for a rare human monogenic disorder. This approach is a revolution in the history of medical genetics, impacting both fundamental research and diagnostic methods, and leading towards personalized medicine. A plethora of efficient algorithms has been developed for variant discovery. They generally yield ~20,000 variants, which must be narrowed down to find the potential pathogenic allelic variant(s) and the affected gene(s). For this purpose, commonly adopted procedures involving various filtering strategies have emerged: exclusion of common variants, selection by type of allelic variant, prediction of pathogenic effect, mode of inheritance, and comparison of exomes from multiple individuals. To cope with the expansion of WES in individual medical genomics laboratories, new user-friendly and versatile software tools must implement these filtering steps. Biologists without programming skills need to be able to combine different filtering criteria on their own and to pursue their own strategy according to their hypotheses and study design. RESULTS: We describe EVA (Exome Variation Analyzer), user-friendly web-based software dedicated to filtering strategies for medical WES. Through its different modules, EVA (i) integrates and stores annotated exome variation data, kept strictly confidential to the project owner; (ii) allows users to combine the main filters dealing with common variations, molecular types, inheritance mode and multiple samples; (iii) offers browsing of annotated data and filtered results in various interactive tables, graphical visualizations and statistical charts; and (iv) finally offers export files and cross-links to useful external databases and software for further prioritization of the small subset of sorted candidate variants and genes.
We report a demonstrative case study that led to the identification of a new candidate gene related to a rare form of Alzheimer disease. CONCLUSIONS: EVA is designed to be user-friendly, versatile and efficient software for assisting variant filtering in WES. It constitutes a platform for data storage and for drastic screening of clinically relevant genetic variations by non-programmer geneticists. Thereby, it responds to new needs in the expanding era of medical genomics investigated by WES, for both fundamental research and clinical diagnostics.
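The filter cascade listed in the BACKGROUND can be sketched in a few lines. The example below is not EVA's code. It assumes a hypothetical minimal variant record (gene, predicted effect, population allele frequency, zygosity) and an autosomal recessive hypothesis, and chains three of the filters named above: exclusion of common variants, selection of damaging molecular types, and inheritance mode, followed by a multi-sample comparison. The frequency cut-off, the set of "damaging" effects and the gene names are illustrative assumptions. Counting two heterozygous hits in the same gene as compound heterozygosity ignores phase, which a real pipeline would need to check.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    effect: str       # e.g. "missense", "nonsense", "synonymous"
    pop_freq: float   # allele frequency in a reference population
    zygosity: str     # "het" or "hom"


# Illustrative choice of molecular types kept by the type filter.
DAMAGING = {"missense", "nonsense", "frameshift", "splice_site"}


def recessive_candidates(variants, max_freq=0.01):
    """Genes with a rare damaging homozygous variant or >= 2 such het hits."""
    kept = [v for v in variants
            if v.pop_freq <= max_freq and v.effect in DAMAGING]
    by_gene = defaultdict(list)
    for v in kept:
        by_gene[v.gene].append(v)
    return {gene for gene, hits in by_gene.items()
            if any(v.zygosity == "hom" for v in hits) or len(hits) >= 2}


def shared_candidates(samples, max_freq=0.01):
    """Candidate genes shared by all affected individuals."""
    per_sample = [recessive_candidates(s, max_freq) for s in samples]
    return set.intersection(*per_sample) if per_sample else set()
```

In a toy run, a gene carrying only a common homozygous variant is removed by the frequency filter, a synonymous change is removed by the type filter, and only genes passing the recessive model in every affected sample survive the intersection.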


Subject(s)
Alzheimer Disease/genetics , Exome , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/methods , Software , Algorithms , Databases, Genetic , Humans , Sequence Analysis, DNA/instrumentation