Results 1 - 20 of 22
1.
BMC Bioinformatics ; 22(1): 120, 2021 Mar 12.
Article in English | MEDLINE | ID: mdl-33711922

ABSTRACT

BACKGROUND: Copy number variations (CNV) affecting genes involved in oncogenic pathways have recently attracted increasing attention for managing disease susceptibility. CNV is one of the most important somatic aberrations in the genome of tumor cells. Oncogene activation and tumor suppressor gene inactivation are often attributed to copy number gain/amplification and deletion, respectively, in many cancer types and stages. Recent advances in next-generation sequencing protocols allow a unique molecular identifier (UMI) to be added to each read: each targeted DNA fragment is labeled with a unique random nucleotide sequence added to the sequencing primers. UMI are especially useful for CNV detection because they make each DNA molecule in a population of reads distinct. RESULTS: Here, we present molecular Copy Number Alteration (mCNA), a new methodology for detecting copy number changes using UMI. The algorithm consists of the following main steps: construction of UMI count matrices, use of control samples to construct a pseudo-reference, computation of log-ratios, segmentation, and finally statistical inference of abnormal segmented breaks. We demonstrate the success of mCNA on a dataset of patients with Diffuse Large B-cell Lymphoma and show that mCNA results correlate strongly with comparative genomic hybridization. CONCLUSION: We provide mCNA, a new approach for CNV detection, freely available at https://gitlab.com/pierrejulien.viailly/mcna/ under the MIT license. By using UMI, mCNA can significantly improve the accuracy of CNV detection.
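The pseudo-reference and log-ratio steps can be illustrated with a minimal Python sketch. It assumes per-region UMI counts are already tallied and strictly positive, normalizes each sample by its total, and takes the per-region median of the controls as the pseudo-reference; these normalization and median choices are assumptions for illustration, not necessarily what mCNA does, and segmentation and statistical inference are omitted.

```python
import math
from statistics import median

def normalize(counts):
    """Scale per-region UMI counts so that each sample sums to 1 (library-size correction)."""
    total = sum(counts)
    return [c / total for c in counts]

def pseudo_reference(controls):
    """Per-region median of the normalized control samples (assumed pseudo-reference)."""
    normalized = [normalize(c) for c in controls]
    return [median(region) for region in zip(*normalized)]

def log2_ratios(sample, controls):
    """log2(sample / pseudo-reference) for each targeted region; counts must be positive."""
    reference = pseudo_reference(controls)
    return [math.log2(s / r) for s, r in zip(normalize(sample), reference)]

# Hypothetical counts for four targeted regions; the last region of the sample is doubled.
controls = [[100, 100, 100, 100], [120, 80, 100, 100], [100, 100, 100, 100]]
print(log2_ratios([100, 100, 100, 200], controls))
```

In this toy example only the doubled region receives a positive log-ratio; a real pipeline would also need a pseudocount or filtering for regions with zero UMI counts.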


Subjects
Comparative Genomic Hybridization , DNA Copy Number Variations , High-Throughput Nucleotide Sequencing , Adult , Humans , Male , Middle Aged , Prospective Studies , Sequence Analysis, DNA
2.
Bioinformatics ; 36(9): 2718-2724, 2020 05 01.
Article in English | MEDLINE | ID: mdl-31985795

ABSTRACT

MOTIVATION: Next-generation sequencing has become the standard method for detecting single-nucleotide variants in tumor cells. These technologies require a PCR amplification step and a sequencing step, both of which introduce artifacts at very low frequencies. These artifacts are often confused with true low-frequency variants that can be found in tumor cells and cell-free DNA. The recent use of unique molecular identifiers (UMI) in targeted sequencing protocols has offered a reliable way to filter out artifactual variants and accurately call low-frequency variants. However, integrating UMI analysis into the variant calling process has produced tools that are significantly slower and more memory-consuming than raw-read-based variant callers. RESULTS: We present UMI-VarCal, a UMI-based variant caller for targeted sequencing data with better sensitivity than other variant callers. Developed with performance in mind, UMI-VarCal stands out as one of the few variant callers that do not rely on SAMtools for their pileup. Instead, at its core runs an innovative in-house pileup algorithm specifically designed to handle the UMI tags in the reads. After the pileup, a Poisson statistical test is applied at every position to determine whether the variant frequency is significantly higher than the background error noise. Finally, the UMI tags are analyzed, and strand-bias and homopolymer-length filters are applied to improve accuracy. We illustrate the results obtained with UMI-VarCal by sequencing tumor samples and show that UMI-VarCal is both faster and more sensitive than other publicly available solutions. AVAILABILITY AND IMPLEMENTATION: The entire pipeline is available at https://gitlab.com/vincent-sater/umi-varcal-master under the MIT license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
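The per-position Poisson test described above can be sketched in a few lines of Python. This is a generic one-sided test under stated assumptions: the expected number of erroneous observations is `depth * error_rate`, where `error_rate` is a background estimate supplied by the caller, and `alpha` is a hypothetical threshold. Whether UMI-VarCal counts reads or UMI families, and how it estimates the background, is not specified here.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

def is_significant(alt_count, depth, error_rate, alpha=1e-3):
    """True if alt_count exceeds what background noise at this position would explain."""
    expected_errors = depth * error_rate
    return poisson_sf(alt_count, expected_errors) < alpha

# Hypothetical position: depth 10,000 with a 0.1% background error rate (10 expected errors).
print(is_significant(50, 10_000, 0.001), is_significant(10, 10_000, 0.001))
```

Computing the tail as `1 - cdf` loses precision for extremely small p-values; a production caller would use a dedicated survival function, but the decision logic is the same.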


Subjects
Algorithms , High-Throughput Nucleotide Sequencing , Polymerase Chain Reaction
3.
Bioinformatics ; 34(24): 4213-4222, 2018 12 15.
Article in English | MEDLINE | ID: mdl-29955770

ABSTRACT

Motivation: The recent rise of long-read sequencing technologies such as Pacific Biosciences and Oxford Nanopore makes it possible to solve assembly problems for larger and more complex genomes than short-read technologies allowed. However, these long reads are very noisy, with error rates of around 10-15% for Pacific Biosciences and up to 30% for Oxford Nanopore. The error correction problem has been tackled either by self-correcting the long reads or by using complementary short reads in a hybrid approach. However, even though sequencing technologies promise to lower the error rate of long reads below 10%, it remains higher in practice, and correcting such noisy long reads is still an issue. Results: We present HG-CoLoR, a hybrid error correction method based on a seed-and-extend approach that relies on aligning the short reads to the long reads, followed by traversal of a variable-order de Bruijn graph built from the short reads. Our experiments show that HG-CoLoR efficiently corrects highly noisy long reads with error rates as high as 44%. Compared with other state-of-the-art long-read error correction methods, HG-CoLoR also provides the best trade-off between runtime and result quality, and is the only method able to scale efficiently to eukaryotic genomes. Availability and implementation: HG-CoLoR is implemented in C++, supported on Linux platforms and freely available at https://github.com/morispi/HG-CoLoR. Supplementary information: Supplementary data are available at Bioinformatics online.
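The "extend" half of a seed-and-extend strategy over a de Bruijn graph can be sketched in Python. This is a deliberately simplified, fixed-order graph (a single k), not HG-CoLoR's variable-order graph, and the seed alignment step is omitted: the seed is assumed to be a short-read-supported sequence at least k-1 bases long, and extension stops at any branch or dead end instead of exploring alternatives.

```python
from collections import defaultdict

def build_dbg(reads, k):
    """Map each (k-1)-mer to the set of next bases observed in the short reads (fixed order k)."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[-1])
    return graph

def extend_seed(seed, graph, k, max_len=1000):
    """Greedily extend a seed to the right while the path through the graph is unambiguous."""
    contig = seed
    while len(contig) < max_len:          # max_len also guards against cycles
        successors = graph.get(contig[-(k - 1):], set())
        if len(successors) != 1:
            break                         # dead end or branch: stop rather than guess
        contig += next(iter(successors))
    return contig

# Hypothetical short reads; extension from "CGT" stops where ACG branches to T or G.
graph = build_dbg(["ACGTAC", "GTACGG"], k=4)
print(extend_seed("CGT", graph, k=4))
```

A variable-order graph would, at such a branch, retry with a smaller or larger order to resolve the ambiguity; this sketch simply stops.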


Subjects
High-Throughput Nucleotide Sequencing , Nanopores , Sequence Analysis, DNA , Software , Algorithms , Computational Biology
4.
Genes Chromosomes Cancer ; 55(3): 251-67, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26608593

ABSTRACT

Despite the many efforts already made to enumerate somatic mutations in diffuse large B-cell lymphoma (DLBCL), previous whole-genome and whole-exome studies conducted on patients with mixed outcomes failed to characterize the 30% of patients who will relapse or resist current immunochemotherapies. To address this issue, we performed whole-exome sequencing of normal/tumor DNA pairs in 14 relapsed/refractory (R/R) patients subclassified by full-transcriptome arrays (six activated B-cell like, three germinal center B-cell like, and five primary mediastinal B-cell lymphomas), from the LNH-03 LYSA clinical trial program. Aside from well-known DLBCL features, gene- and pathway-level recurrence analyses suggested several interesting leads, including TBL1XR1 and activating mutations in IRF4 or in the insulin regulation pathway. Sequencing-based copy number analysis defined 23 short recurrently altered regions involving genes such as REL, CDKN2A, HYAL2, and TP53. Moreover, it identified mutations in genes such as GNA13, CARD11, MFHAS1, and PCLO as associated with secondary variant allele amplification events. The five primary mediastinal B-cell lymphomas (PMBL), although unexpected in an R/R cohort, showed a significantly higher mutation rate (P = 0.003) and provided many insights into this subtype, which is related to classical Hodgkin lymphoma. Novel genes such as XPO1, MFHAS1, and ITPKB were found to be frequently mutated, along with various cytokine-based signaling pathways. Across these analyses, somatic events in the NF-κB pathway predominated in the three DLBCL subtypes, confirming the pathway's major role in DLBCL aggressiveness and pinpointing several new candidate genes.


Subjects
Exome , Lymphoma, Large B-Cell, Diffuse/genetics , Mutation , Neoplasm Recurrence, Local/genetics , Adult , Aged , Aged, 80 and over , DNA, Neoplasm/genetics , Female , High-Throughput Nucleotide Sequencing/methods , Humans , Interferon Regulatory Factors/metabolism , Lymphoma, Large B-Cell, Diffuse/metabolism , Male , Middle Aged , NF-kappa B/metabolism , Signal Transduction
5.
Bioinformatics ; 30(15): 2204-5, 2014 Aug 01.
Article in English | MEDLINE | ID: mdl-24753490

ABSTRACT

SUMMARY: Thanks to its free licensing and initiatives such as Bioconductor, R has become an essential part of the bioinformatics toolbox in recent years and is increasingly confronted with genomically located data. While separate solutions are available to manipulate and visualize such data, no R package currently offers the efficiency required for computationally intensive tasks such as interactive genome browsing. The package proposed here fills this specific need, providing a multilevel interface suited to most needs, from a fully interfaced genome browser to low-level classes and methods. Its time and memory efficiency were tested on a human dataset, where it outperformed existing solutions by several orders of magnitude. AVAILABILITY AND IMPLEMENTATION: R sources and packages are freely available from the CRAN repository and a dedicated website: http://bioinformatics.ovsa.fr/Rgb. Distributed under the GPL 3 license, it is compatible with most operating systems (Windows, Linux, Mac OS) and architectures. CONTACT: maressyl@gmail.com or fabrice.jardin@chb.unicancer.fr SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
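The kind of region query that interactive genome browsing depends on can be sketched with sorted start positions and binary search. Rgb itself is an R package and its internals are not described here, so this Python class is only a generic illustration; the `FeatureTrack` name and the 1-based inclusive coordinates are assumptions.

```python
import bisect

class FeatureTrack:
    """Hypothetical track of features on one chromosome, answering overlap queries."""

    def __init__(self, features):
        # features: iterable of (start, end, name), 1-based inclusive coordinates (assumption)
        self.features = sorted(features)
        self.starts = [f[0] for f in self.features]
        self.max_len = max((e - s + 1 for s, e, _ in self.features), default=0)

    def query(self, start, end):
        """Return names of features overlapping [start, end], in start order."""
        # A feature starting before start - max_len + 1 must end before `start`.
        lo = bisect.bisect_left(self.starts, start - self.max_len + 1)
        # A feature starting after `end` cannot overlap.
        hi = bisect.bisect_right(self.starts, end)
        return [name for s, e, name in self.features[lo:hi] if e >= start]

track = FeatureTrack([(100, 200, "geneA"), (150, 400, "geneB"), (500, 600, "geneC")])
print(track.query(180, 190))
```

Bounding the search by the longest feature keeps each query logarithmic plus the size of the candidate window; dedicated interval trees avoid the dependence on a single very long feature.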


Subjects
Genomics/methods , Web Browser , Algorithms , Genome, Human/genetics , Humans , User-Computer Interface
6.
BMC Bioinformatics ; 13 Suppl 14: S11, 2012.
Article in English | MEDLINE | ID: mdl-23095521

ABSTRACT

BACKGROUND: The Internet is a major source of health information, but most seekers are not familiar with medical vocabularies. As a result, their searches can fail because of poorly formulated queries. Several methods have been proposed to improve information retrieval: query expansion, syntactic and semantic techniques, or knowledge-based methods. However, it would also be useful to correct misspelled queries. In this paper, we propose a simple yet efficient method to correct misspellings in queries submitted by health information seekers to a medical online search tool. METHODS: In addition to query normalization and exact phonetic term matching, we tested two approximate string comparators: the Stoilos similarity score function and the normalized Levenshtein edit distance. We propose to combine them to increase the number of matched medical terms in French. We first used a sample of query logs to determine the thresholds and processing times. In a second, larger-scale run, we tested different combinations of query normalizations applied before or after misspelling correction, using the thresholds retained in the first run. RESULTS: Relative to the total number of suggestions (around 163, the size of the first query sample), the normalized Levenshtein edit distance gave the highest F-measure (88.15%) at a comparator score threshold of 0.3, and the Stoilos function gave the highest F-measure (84.31%) at a threshold of 0.7. When Levenshtein and Stoilos were combined, the highest F-measure (80.28%) was obtained with thresholds of 0.2 and 0.7, respectively. However, queries are composed of several terms that may be combinations of medical terms, so query normalization and segmentation are required. The highest F-measure (64.18%) was obtained when this process was performed before spelling correction.
CONCLUSIONS: Despite the widely known high performance of the normalized Levenshtein edit distance, we show in this paper that combining it with the Stoilos algorithm improved the results of misspelling correction for user queries. Accuracy is improved by combining spelling, phoneme-based information, and string normalization and segmentation into medical terms. These encouraging results have enabled the integration of this method into two projects funded by the French National Research Agency-Technologies for Health Care. The first aims to facilitate the coding of clinical free text contained in Electronic Health Records and discharge summaries, whereas the second aims to improve information retrieval from Electronic Health Records.
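The normalized Levenshtein comparator can be sketched in Python as follows. Normalizing by the longer string's length is one common convention and is an assumption here, as is reading the threshold as an upper bound on the normalized distance; the Stoilos function and the phonetic matching step are not shown, and the vocabulary is a hypothetical toy list.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution (or match)
        prev = curr
    return prev[-1]

def normalized_distance(a, b):
    """Edit distance divided by the longer length, giving a value in [0, 1]."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))

def suggest(query, vocabulary, threshold=0.3):
    """Vocabulary terms whose normalized distance to the query is within the threshold."""
    q = query.lower()
    return [t for t in vocabulary if normalized_distance(q, t.lower()) <= threshold]

print(suggest("diabete", ["diabète", "diabetes", "asthme"]))
```

Accented and unaccented forms differ by one substitution here, which is one reason the abstract stresses query normalization before correction.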


Subjects
Algorithms , Information Storage and Retrieval , Medical Informatics/methods , Humans , Internet , Language , Medical Informatics/instrumentation , Vocabulary, Controlled
7.
BMC Bioinformatics ; 13 Suppl 14: S9, 2012.
Article in English | MEDLINE | ID: mdl-23095660

ABSTRACT

BACKGROUND: Whole exome sequencing (WES) has become the strategy of choice for identifying a coding allelic variant responsible for a rare human monogenic disorder. This approach is a revolution in the history of medical genetics, affecting both fundamental research and diagnostic methods and leading toward personalized medicine. A plethora of efficient algorithms has been developed for variant discovery. They generally yield ~20,000 variations that must be narrowed down to find the potentially pathogenic allelic variant(s) and the affected gene(s). For this purpose, commonly adopted procedures involving various filtering strategies have emerged: exclusion of common variations, type of allelic variant, prediction of pathogenic effect, mode of inheritance, and comparison of exomes across multiple individuals. To cope with the spread of WES to individual medical genomics laboratories, new user-friendly and versatile software tools must implement these filtering steps. Biologists without programming skills must be able to combine different filtering criteria on their own and pursue their own strategy depending on their hypotheses and study design. RESULTS: We describe EVA (Exome Variation Analyzer), user-friendly, web-interfaced software dedicated to filtering strategies for medical WES. Through its different modules, EVA (i) integrates and stores annotated exome variation data, kept strictly confidential to the project owner, (ii) allows users to combine the main filters dealing with common variations, molecular types, inheritance mode and multiple samples, (iii) offers browsing of annotated data and filtered results in various interactive tables, graphical visualizations and statistical charts, and (iv) provides file export and cross-links to useful external databases and software for further prioritization of the small subset of sorted candidate variations and genes.
We report a demonstrative case study that led to the identification of a new candidate gene related to a rare form of Alzheimer disease. CONCLUSIONS: EVA is designed to be user-friendly, versatile and efficient filtering-assistance software for WES. It constitutes a platform for data storage and for stringent screening of clinically relevant genetic variations by geneticists without programming skills. It thereby responds to the new needs of the expanding era of WES-based medical genomics, for both fundamental research and clinical diagnostics.
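How such filters chain together can be sketched in Python. This is not EVA's code or data model: the `Variant` fields, the consequence labels, the 1% allele-frequency cutoff and the genotype strings are hypothetical, and the "recessive" mode only checks homozygous alternate genotypes (compound heterozygotes are ignored).

```python
from dataclasses import dataclass

@dataclass
class Variant:
    gene: str
    population_af: float  # allele frequency in a reference population (hypothetical field)
    consequence: str      # e.g. "missense", "synonymous"
    genotypes: dict       # sample id -> "0/1" (heterozygous) or "1/1" (homozygous alt)

def filter_variants(variants, affected, max_af=0.01,
                    keep=frozenset({"missense", "nonsense", "frameshift", "splice"}),
                    mode="dominant"):
    """Chain common-variant, consequence-type and inheritance-mode filters across samples."""
    wanted = {"dominant": {"0/1", "1/1"}, "recessive": {"1/1"}}[mode]
    return [v for v in variants
            if v.population_af <= max_af                                # drop common variants
            and v.consequence in keep                                   # keep damaging types
            and all(v.genotypes.get(s) in wanted for s in affected)]    # shared by all affected

# Hypothetical variants for two affected individuals P1 and P2.
variants = [
    Variant("GENE1", 0.001, "missense", {"P1": "0/1", "P2": "0/1"}),
    Variant("GENE2", 0.2, "missense", {"P1": "0/1", "P2": "0/1"}),
    Variant("GENE3", 0.0, "synonymous", {"P1": "0/1", "P2": "0/1"}),
    Variant("GENE4", 0.0, "nonsense", {"P1": "1/1", "P2": "1/1"}),
]
print([v.gene for v in filter_variants(variants, ["P1", "P2"], mode="recessive")])
```

Requiring the genotype in every affected sample is the multi-sample comparison in its simplest form; a tool aimed at non-programmers exposes each of these conditions as a separate, combinable filter.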


Subjects
Alzheimer Disease/genetics , Exome , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/methods , Software , Algorithms , Databases, Genetic , Humans , Sequence Analysis, DNA/instrumentation
8.
Methods Mol Biol ; 2493: 235-245, 2022.
Article in English | MEDLINE | ID: mdl-35751818

ABSTRACT

The rapid transition from traditional sequencing methods to Next-Generation Sequencing (NGS) has enabled faster and more accurate detection of somatic variants (Single-Nucleotide Variants (SNV) and Copy Number Variations (CNV)) in tumor cells. NGS technologies require a succession of steps during which false variants can be silently introduced at low frequencies. Filtering out these artifacts can be difficult, especially when experiments are designed to look for very-low-frequency variants. Recently, adding unique molecular barcodes called UMI (Unique Molecular Identifiers) to DNA fragments has emerged as a very effective strategy for specifically removing false variants from variant calling results (Kukita et al. DNA Res 22(4):269-277, 2015; Newman et al. Nat Biotechnol 34(5):547-555, 2016; Schmitt et al. Proc Natl Acad Sci U S A 109(36):14508-14513). Here, we describe UMI-VarCal (Sater et al. Bioinformatics 36:2718-2724, 2020), which uses the UMI information from UMI-tagged reads to offer faster and more accurate variant calling.
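Why UMIs help filter false variants can be shown with a generic family-collapsing sketch in Python: reads sharing a UMI derive from the same original molecule, so a base seen in only a minority of a family is more likely a PCR or sequencing error. This is not UMI-VarCal's algorithm; it assumes reads in a family are already aligned to the same start and have equal length, and `min_family_size` and `min_agreement` are hypothetical parameters.

```python
from collections import Counter, defaultdict

def umi_consensus(reads, min_family_size=3, min_agreement=0.9):
    """Collapse reads sharing a UMI into one consensus sequence per family.

    reads: iterable of (umi, sequence); sequences of a family are assumed to be
    aligned to the same start and of equal length. Columns where the family does
    not reach min_agreement are masked with 'N'.
    """
    families = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)
    consensus = {}
    for umi, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few copies to separate errors from true bases
        bases = []
        for column in zip(*seqs):
            base, count = Counter(column).most_common(1)[0]
            bases.append(base if count / len(column) >= min_agreement else "N")
        consensus[umi] = "".join(bases)
    return consensus

# Family AAA carries one PCR error out of 11 copies; CCC is too small to be trusted.
reads = [("AAA", "ACGT")] * 10 + [("AAA", "ACTT")] + [("CCC", "ACGT")] * 2
print(umi_consensus(reads))
```

A variant that appears in only one read of a large family is thus voted out before calling, whereas a true variant is present in every copy of its molecule.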


Subjects
DNA Copy Number Variations , High-Throughput Nucleotide Sequencing , Artifacts , Computational Biology , DNA/genetics , High-Throughput Nucleotide Sequencing/methods
9.
BMC Bioinformatics ; 12: 242, 2011 Jun 17.
Article in English | MEDLINE | ID: mdl-21682852

ABSTRACT

BACKGROUND: High Throughput Sequencing (HTS) is now heavily exploited for genome (re-)sequencing, metagenomics, epigenomics, and transcriptomics, and requires diverse but computationally intensive bioinformatic analyses. When a reference genome is available, mapping reads to it is the first step of the analysis. Read mapping programs owe their efficiency to sophisticated genome indexing data structures, such as the Burrows-Wheeler transform. Recent solutions index both the genome and the k-mers of the reads using hash tables to further increase efficiency and accuracy. In various contexts (e.g. assembly or transcriptome analysis), read processing requires determining the sub-collection of reads related to a given sequence, which is done by searching for certain k-mers in the reads. Many developments have so far focused on genome indexing structures for read mapping, but the question of read indexing remains largely unexplored. However, the increase in sequencing throughput calls for new algorithmic solutions to query large read collections efficiently. RESULTS: Here, we present a solution, named Gk arrays, to index large collections of reads, together with an algorithm to build the structure and procedures to query it. Once constructed, the index is kept in main memory and is repeatedly accessed to answer queries such as "given a k-mer, get the reads containing this k-mer (once/at least once)". We compared our structure with other solutions that adapt uncompressed indexing structures designed for long texts and show that it answers queries quickly while requiring much less memory. Our structure can thus handle larger read collections. We provide examples where such queries are adapted to different types of read analysis (SNP detection, assembly, RNA-Seq). CONCLUSIONS: Gk arrays constitute a versatile data structure that enables fast and more accurate read analysis in various contexts.
The Gk arrays provide a flexible building block for designing innovative programs that efficiently mine genomics, epigenomics, metagenomics, or transcriptomics reads. The Gk arrays library is available under the CeCILL (GPL-compatible) license from http://www.atgc-montpellier.fr/ngs/.
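The query "given a k-mer, get the reads containing this k-mer (once/at least once)" can be illustrated with a plain Python dictionary index. This hash-based version is only a stand-in to show the query semantics: Gk arrays use their own dedicated, memory-efficient structure, and the function names below are hypothetical.

```python
from collections import defaultdict

def build_kmer_index(reads, k):
    """Map each k-mer to {read_id: number of occurrences in that read}."""
    index = defaultdict(lambda: defaultdict(int))
    for rid, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            index[read[i:i + k]][rid] += 1
    return index

def reads_with_kmer(index, kmer, exactly_once=False):
    """Ids of reads containing kmer at least once, or exactly once if requested."""
    hits = index.get(kmer, {})
    return sorted(rid for rid, n in hits.items() if n == 1 or not exactly_once)

index = build_kmer_index(["ACGTACGT", "TTACGTT", "GGGG"], k=4)
print(reads_with_kmer(index, "ACGT"), reads_with_kmer(index, "ACGT", exactly_once=True))
```

A dictionary of dictionaries stores every k-mer string and a nested map per entry, which is exactly the memory overhead a specialized read index is designed to avoid on large collections.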


Subjects
Algorithms , Computational Biology/methods , Computers , High-Throughput Nucleotide Sequencing , Humans , Software
10.
Comput Struct Biotechnol J ; 19: 5811-5825, 2021.
Article in English | MEDLINE | ID: mdl-34765096

ABSTRACT

MicroRNAs (miRNAs) are small noncoding RNAs that regulate gene expression at the post-transcriptional level. Because of their wide network of interactions, miRNAs have been the focus of many studies over the past decade, particularly in animal species. To reduce the number of wet-lab experiments required, miRNA target prediction tools are currently used as the first step. However, predictions can vary considerably depending on the tool used, mostly because the mechanism of action of miRNAs is complex and still not fully understood. These discrepancies complicate the choice of a miRNA target prediction tool. To provide a comprehensive view of this issue, this review highlights the main characteristics of miRNA-target interactions in bilaterian animals, describes the prediction models currently in use, and offers some insights into evaluating predictor performance.
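One widely used characteristic of animal miRNA-target interactions is Watson-Crick pairing between the miRNA seed (nucleotides 2-8) and the target 3' UTR. The Python sketch below finds only such canonical "7mer-m8" sites; it ignores G:U wobble, other site types, conservation and context features that real predictors weigh, and the UTR string is a made-up example.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def seed_site(mirna):
    """Target sequence complementary to miRNA nucleotides 2-8 (a '7mer-m8' site)."""
    seed = mirna[1:8]  # positions 2-8 in 1-based miRNA numbering
    return "".join(COMPLEMENT[b] for b in reversed(seed))

def find_seed_matches(mirna, utr):
    """0-based start positions of 7mer-m8 seed matches in a 3' UTR (RNA alphabet)."""
    site = seed_site(mirna)
    return [i for i in range(len(utr) - len(site) + 1) if utr.startswith(site, i)]

let7a = "UGAGGUAGUAGGUUGUAUAGUU"  # let-7a mature sequence
print(seed_site(let7a), find_seed_matches(let7a, "AAACUACCUCAA"))
```

Because a 7-nt site occurs by chance roughly once every few kilobases, seed matching alone over-predicts, which is one source of the tool-to-tool discrepancies the review discusses.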

11.
Sci Rep ; 11(1): 761, 2021 01 12.
Article in English | MEDLINE | ID: mdl-33436980

ABSTRACT

Third-generation sequencing technologies make it possible to sequence long reads of tens of kbp, which are expected to help solve various problems. However, these reads display high error rates, currently around 10%. Self-correction is thus routinely used in long-read analysis projects. We introduce CONSENT, a new self-correction method that relies on both multiple sequence alignment and local de Bruijn graphs. To ensure scalability, the multiple sequence alignment computation benefits from a new and efficient segmentation strategy that yields a massive speedup. CONSENT compares well with the state of the art and performs better on real Oxford Nanopore data. Specifically, CONSENT is the only method that scales efficiently to ultra-long reads, and it can process a full human dataset, containing reads of up to 1.5 Mbp, in 10 days. Moreover, our experiments show that error correction with CONSENT improves the quality of Flye assemblies. Additionally, CONSENT implements a polishing feature for correcting raw assemblies. Our experiments show that CONSENT is 2 to 38 times faster than other polishing tools while providing comparable results. Furthermore, we show that, on a human dataset, assembling the raw data and then polishing the assembly consumes fewer resources than correcting and then assembling the reads, while providing better results. CONSENT is available at https://github.com/morispi/CONSENT .
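The general idea of segmenting a multiple sequence alignment problem can be sketched in Python: find k-mers shared exactly once by every sequence, keep a chain of them that appears in the same order everywhere, and cut the sequences into windows between these anchors so each window can be aligned independently. This is a generic anchor-based sketch with a greedy chaining rule, not CONSENT's actual segmentation strategy.

```python
from collections import Counter

def shared_unique_kmers(seqs, k):
    """k-mers occurring exactly once in every sequence, with their position in each."""
    per_seq = []
    for s in seqs:
        kmers = [s[i:i + k] for i in range(len(s) - k + 1)]
        counts = Counter(kmers)
        per_seq.append({km: i for i, km in enumerate(kmers) if counts[km] == 1})
    common = set(per_seq[0]).intersection(*per_seq[1:])
    return {km: [p[km] for p in per_seq] for km in common}

def segment(seqs, k):
    """Split sequences into windows at non-overlapping anchors shared in the same order."""
    anchors = sorted(shared_unique_kmers(seqs, k).values())  # ordered by first sequence
    cuts, last = [], [-k] * len(seqs)
    for pos in anchors:
        # greedy: keep an anchor only if it lies after the previous one in every sequence
        if all(p >= l + k for p, l in zip(pos, last)):
            cuts.append(pos)
            last = pos
    windows, prev = [], [0] * len(seqs)
    for pos in cuts + [[len(s) for s in seqs]]:
        window = [s[a:b] for s, a, b in zip(seqs, prev, pos)]
        if any(window):
            windows.append(window)
        prev = pos
    return windows

print(segment(["ACGTTTGCA", "ACGTATGCA"], k=3))
```

Aligning many short windows instead of whole reads keeps the quadratic cost of alignment bounded by the window length, which is the kind of speedup a segmentation strategy targets.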


Subjects
Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Algorithms , Animals , Caenorhabditis elegans/genetics , Escherichia coli/genetics , Genome , Humans , Nanopores , Saccharomyces cerevisiae/genetics , Software
12.
Stud Health Technol Inform ; 160(Pt 2): 1040-4, 2010.
Article in English | MEDLINE | ID: mdl-20841842

ABSTRACT

BACKGROUND: CCAM is a French terminology for coding clinical procedures. It is a multi-hierarchical structured classification of procedures used in France for health care reimbursement, and it is external to the UMLS. OBJECTIVE: The objective of this work is to describe a French lexical approach for mapping CCAM procedures to the UMLS Metathesaurus to achieve interoperability with multiple international terminologies. This approach used a preliminary step that keeps only the significant characters of the CCAM code, corresponding to the anatomical and action axes. RESULTS: Of the 7,926 CCAM codes used in this study, the French CCAM-to-UMLS mapping found 5,212 possible matches (exact matching, single-to-multiple matching, partial matching); 65% of the anatomical terms corresponding to the CCAM codes are mapped to at least one UMLS Concept, and 37% of the corresponding action terms are mapped to at least one UMLS Concept. Of the exact matches found (n=200), 91% were rated by a human expert as narrower than the mapped UMLS Concepts, while only 3% were irrelevant.
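The exact-matching part of a French lexical mapping can be sketched in Python: normalize case, accents and whitespace, then look the term up in an index of normalized UMLS strings. The index contents and concept identifiers below are hypothetical placeholders (real UMLS data requires the licensed Metathesaurus files), and the extraction of anatomical and action axes from CCAM codes is not shown.

```python
import unicodedata

def normalize(term):
    """Lower-case, strip accents and collapse whitespace for exact lexical matching."""
    decomposed = unicodedata.normalize("NFD", term.lower())
    no_accents = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    return " ".join(no_accents.split())

def map_terms(terms, umls_index):
    """Exact-match each French term against a {normalized string: concept id} index."""
    return {t: umls_index.get(normalize(t)) for t in terms}

# Hypothetical index entry; "CUI_HYPOTHETICAL_1" is a placeholder, not a real UMLS CUI.
umls_index = {"artere femorale": "CUI_HYPOTHETICAL_1"}
print(map_terms(["Artère  Fémorale", "Rein"], umls_index))
```

Single-to-multiple and partial matches would need token-level comparison on top of this exact lookup.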


Subjects
Clinical Coding , Terminology as Topic , Unified Medical Language System , Abstracting and Indexing , Delivery of Health Care/standards , France , Humans , Unified Medical Language System/standards
13.
NAR Genom Bioinform ; 2(1): lqz015, 2020 Mar.
Article in English | MEDLINE | ID: mdl-33575566

ABSTRACT

The error rates of third-generation sequencing data remain above 5% and consist mainly of insertions and deletions. Consequently, an increasing number of diverse long-read correction methods have been proposed. The quality of the correction has a major impact on downstream processes. Therefore, methods that evaluate error correction tools with precise and reliable statistics are crucially needed. These evaluation methods rely on costly alignments to assess the quality of the corrected reads. They must therefore allow fast comparison of different tools and scale to the increasing length of long reads. Our tool, ELECTOR, evaluates long-read correction and is directly compatible with a wide range of error correction tools. Because it is based on multiple sequence alignment, we introduce a new algorithmic strategy for alignment segmentation, which enables it to scale to large instances using reasonable resources. To our knowledge, it is the only method that can produce reproducible correction benchmarks on the latest ultra-long reads (>100 kbp). It is also faster than the current state of the art on other datasets and provides a wider set of metrics to assess the improvement in read quality after correction. ELECTOR is available on GitHub (https://github.com/kamimrcht/ELECTOR) and Bioconda.
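Once a reference, an uncorrected and a corrected read are aligned, correction quality can be summarized by recall and precision over alignment columns. The Python sketch below assumes three pre-aligned, equal-length rows (with '-' for gaps) and uses one plausible set of definitions: recall is the fraction of erroneous positions that were fixed, precision the fraction of modifications that were correct. ELECTOR's exact metric definitions may differ.

```python
def correction_metrics(reference, uncorrected, corrected):
    """Recall and precision of a read correction from three pre-aligned, equal-length rows.

    error position:    uncorrected differs from reference
    modified position: corrected differs from uncorrected
    fixed position:    modified position where corrected now equals reference
    """
    assert len(reference) == len(uncorrected) == len(corrected)
    errors = modified = fixed = 0
    for r, u, c in zip(reference, uncorrected, corrected):
        if u != r:
            errors += 1
        if c != u:
            modified += 1
            if c == r:
                fixed += 1
    recall = fixed / errors if errors else 1.0
    precision = fixed / modified if modified else 1.0
    return recall, precision

# One substitution and one insertion are fixed, but a correct base is also changed.
print(correction_metrics("ACGT-A", "AGGTTA", "ACGT-C"))
```

Computing the three-way alignment is the expensive part, which is why the abstract emphasizes segmenting the alignment for ultra-long reads.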

14.
Comput Struct Biotechnol J ; 18: 2270-2280, 2020.
Article in English | MEDLINE | ID: mdl-32952940

ABSTRACT

MOTIVATION: With Next Generation Sequencing becoming more affordable every year, NGS technologies have established themselves as the fastest and most reliable way to detect Single Nucleotide Variants (SNV) and Copy Number Variations (CNV) in cancer patients. These technologies can sequence DNA at very high depths, making it possible to detect abnormalities present at very low frequencies in tumor cells. Multiple variant callers are publicly available and are usually efficient at calling variants. However, when frequencies drop below 1%, the specificity of these tools suffers greatly, as true variants at very low frequencies can easily be confused with sequencing or PCR artifacts. The recent use of Unique Molecular Identifiers (UMI) in NGS experiments has offered a way to accurately separate true variants from artifacts. UMI-based variant callers are gradually replacing raw-read-based variant callers as the standard method for accurately detecting variants at very low frequencies. However, the benchmarks reported in tool publications are usually performed on real biological data in which the true variants are unknown, making it difficult to assess accuracy. RESULTS: We present UMI-Gen, a UMI-based read simulator for targeted paired-end sequencing data. UMI-Gen generates reference reads covering the targeted regions at a user-customizable depth. Then, using a number of control files, it estimates the background error rate at each position and modifies the generated reads to mimic real biological data. Finally, it inserts real variants from a user-provided list into the reads. AVAILABILITY: The entire pipeline is available at https://gitlab.com/vincent-sater/umigen under the MIT license.
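The three simulation stages (reference reads at a chosen depth, position-specific background errors, then known variants) can be sketched in Python. This is a simplified illustration, not UMI-Gen: UMI tagging, paired-end output and error estimation from control files are omitted, errors are modeled as uniform substitutions, and each variant is placed in exactly `round(frequency * depth)` randomly chosen reads.

```python
import random

BASES = "ACGT"

def simulate_reads(reference, depth, error_rates, variants, seed=0):
    """Simulate `depth` reads over a target: background errors first, then known variants.

    error_rates: per-position substitution probability (e.g. estimated from controls)
    variants:    list of (position, alt_base, allele_frequency)
    """
    rng = random.Random(seed)  # seeded for reproducible benchmarks
    reads = []
    for _ in range(depth):
        read = list(reference)
        for i, rate in enumerate(error_rates):
            if rng.random() < rate:
                read[i] = rng.choice([b for b in BASES if b != read[i]])
        reads.append(read)
    for pos, alt, freq in variants:
        # insert the variant into a fixed fraction of randomly chosen reads
        for idx in rng.sample(range(depth), round(freq * depth)):
            reads[idx][pos] = alt
    return ["".join(r) for r in reads]

reads = simulate_reads("ACGTACGT", 100, [0.001] * 8, [(2, "T", 0.05)])
print(sum(r[2] == "T" for r in reads))
```

Because the inserted variants are known exactly, a caller's output on such data can be scored directly, which is the gap in benchmarking the abstract points out.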

15.
Stud Health Technol Inform ; 150: 233-7, 2009.
Article in English | MEDLINE | ID: mdl-19745303

ABSTRACT

This paper proposes a methodology for automatically inheriting SNOMED CT relations between MeSH preferred terms, using the UMLS as a knowledge source server. We propose an interoperability wildcard to achieve this objective. Quantitative and qualitative analyses were performed on the top four SNOMED CT relations inherited between MeSH preferred terms. A total of 12,030 pairs of MeSH preferred terms are related by at least one SNOMED CT relationship. For the top four relations inherited between MeSH preferred terms, a medical librarian judged 79.25% of them relevant overall, 16.25% intermediate, and 4.5% irrelevant. This work should lead to the optimization of multi-terminology indexing tools, multi-terminology information retrieval, and navigation within a multi-terminology server.
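Inheriting relations through the UMLS amounts to a join: MeSH terms and SNOMED CT relations meet at shared concept identifiers. The Python sketch below assumes the term-to-concept mapping and the concept-level relations have already been extracted; the terms, concept ids and relation name are hypothetical placeholders, and the paper's interoperability wildcard is not modeled.

```python
def inherit_relations(mesh_to_cui, snomed_relations):
    """Project SNOMED CT relations between UMLS concepts onto MeSH preferred terms.

    mesh_to_cui:      {MeSH preferred term: UMLS concept id}
    snomed_relations: iterable of (source_cui, relation_name, target_cui)
    """
    cui_to_mesh = {}
    for term, cui in mesh_to_cui.items():
        cui_to_mesh.setdefault(cui, []).append(term)
    inherited = set()
    for src, rel, dst in snomed_relations:
        for a in cui_to_mesh.get(src, []):      # MeSH terms sharing the source concept
            for b in cui_to_mesh.get(dst, []):  # MeSH terms sharing the target concept
                inherited.add((a, rel, b))
    return inherited

# Hypothetical concept ids "C1"/"C2"/"C3" and relation name.
mesh_to_cui = {"Pneumonia": "C1", "Lung": "C2"}
relations = [("C1", "has_finding_site", "C2"), ("C3", "has_finding_site", "C2")]
print(inherit_relations(mesh_to_cui, relations))
```

Relations whose concepts have no MeSH counterpart (like C3 here) are simply dropped, which is why only a subset of SNOMED CT relations can be inherited.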


Subjects
Medical Subject Headings , Systematized Nomenclature of Medicine , Terminology as Topic , Algorithms
16.
Front Genet ; 10: 1330, 2019.
Article in English | MEDLINE | ID: mdl-32047509

ABSTRACT

MicroRNAs are noncoding RNAs that downregulate a large number of target mRNAs and modulate cell activity. Despite continued progress, bioinformatic prediction of microRNA targets remains a challenge, since available software still lacks accuracy and sensitivity. Moreover, these tools give fairly inconsistent results. To circumvent these difficulties, we aggregated all human results of four major prediction algorithms (miRanda, PITA, SVmicrO, and TargetScan), together with additional characteristics, in order to rerank them into a single list. Instead of having to decide which prediction tool to use, biologists can use our method to obtain the best microRNA target predictions from all aggregated databases. The resulting database is freely available through a web tool called miRabel, which accepts a list of miRNAs, genes, or signaling pathways as search input. Receiver operating characteristic and precision-recall curve analyses carried out on experimentally validated data and very large datasets show that miRabel significantly improves miRNA target prediction compared with the four algorithms used separately. Moreover, using the same analytical methods, miRabel gives significantly better predictions than other popular algorithms such as MBSTAR, miRWalk, ExprTarget and miRMap. Interestingly, an F-score analysis revealed that miRabel also significantly improves the relevance of the top results. Aggregating results from different databases is therefore a powerful approach to improving miRNA target prediction, and one that could be generalized to many other species. Thus, miRabel is an efficient tool to guide biologists in their search for miRNA targets and to place those targets in a biological context.
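The general idea of merging several ranked prediction lists into one can be sketched with mean-rank aggregation in Python. This is not miRabel's reranking method, which also uses additional characteristics; the convention of giving a missing target its list's length plus one is an assumption, and the target ids are placeholders.

```python
def aggregate_ranks(rankings):
    """Combine several ranked target lists into one by mean rank.

    rankings: list of lists of target ids, best first. A target missing from a
    list is given that list's length + 1 as its rank (a simple convention).
    """
    targets = set().union(*rankings)

    def mean_rank(t):
        return sum(r.index(t) + 1 if t in r else len(r) + 1 for r in rankings) / len(rankings)

    return sorted(targets, key=lambda t: (mean_rank(t), t))  # ties broken by id

# Three hypothetical tools ranking candidate targets A, B and C.
print(aggregate_ranks([["A", "B", "C"], ["B", "A", "C"], ["B", "C"]]))
```

Targets ranked consistently well across tools rise to the top even when no single tool puts them first, which is the motivation for aggregation given the inconsistency between tools.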

17.
Front Neurosci ; 13: 948, 2019.
Article in English | MEDLINE | ID: mdl-31619945

ABSTRACT

Neuropeptides exert essential functions in animal physiology by controlling, e.g., reproduction, development, growth, energy homeostasis, cardiovascular activity and the stress response. The identification of neuropeptides has therefore been a very active field of research over the last decades. This review article presents the various methods used to discover novel bioactive peptides in vertebrates. While neuropeptides were initially identified on the basis of their biological activity, some have also been discovered through their ability to bind or activate a specific receptor, or on the basis of biochemical characteristics such as C-terminal amidation, which concerns half of the known neuropeptides. More recently, sequencing the genomes of many representative species has facilitated peptidomic approaches using mass spectrometry and in silico screening of genomic libraries. Through these different approaches, more than a hundred bioactive neuropeptides have already been identified in vertebrates. Nevertheless, researchers continue to find new neuropeptides or to identify novel functions of neuropeptides that had not been detected previously, as was recently the case for nociceptin.

18.
Stud Health Technol Inform ; 136: 235-40, 2008.
Article in English | MEDLINE | ID: mdl-18487737

ABSTRACT

OBJECTIVE: The neighbors of a document are the documents in a corpus that are most similar to it. The objective of this paper is to develop and evaluate a related resources algorithm (CISMeF-RRA) in the context of CISMeF, a quality-controlled health gateway on the Internet. METHOD: CISMeF-RRA is inspired by PubMed's Related Citations (related articles) feature. CISMeF-RRA combines statistical distances with a semantic distance based on MeSH terms/qualifiers. MATERIAL: In this feasibility study, an evaluation was performed on 50 randomly chosen CISMeF resources. RESULTS: Overall, 49% of the related documents were rated as relevant. CONCLUSION: If this feasibility study is confirmed by an evaluation of more resources, CISMeF-RRA will be implemented in the CISMeF catalog.
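Combining a statistical distance with a MeSH-based semantic distance can be sketched in Python as a weighted sum of two similarities. The specific choices here are stand-ins, not CISMeF-RRA's actual measures: cosine similarity on word counts for the statistical part, Jaccard overlap of MeSH sets for the semantic part, and an assumed equal weight of 0.5. The documents and field names are hypothetical.

```python
import math
from collections import Counter

def cosine(text_a, text_b):
    """Cosine similarity between bag-of-words term-frequency vectors."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def jaccard(mesh_a, mesh_b):
    """Overlap of MeSH descriptor/qualifier sets, standing in for the semantic part."""
    union = mesh_a | mesh_b
    return len(mesh_a & mesh_b) / len(union) if union else 0.0

def related(doc, corpus, weight=0.5, top=5):
    """Rank other documents by a weighted mix of statistical and semantic similarity."""
    scored = [(weight * cosine(doc["text"], d["text"])
               + (1 - weight) * jaccard(doc["mesh"], d["mesh"]), d["id"])
              for d in corpus if d["id"] != doc["id"]]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)[:top]]

corpus = [
    {"id": "d1", "text": "diabetes insulin therapy", "mesh": {"Diabetes Mellitus", "Insulin"}},
    {"id": "d2", "text": "insulin therapy for diabetes", "mesh": {"Diabetes Mellitus", "Insulin"}},
    {"id": "d3", "text": "asthma inhaler", "mesh": {"Asthma"}},
]
print(related(corpus[0], corpus))
```

The MeSH component lets two resources indexed with the same descriptors be linked even when their wording differs, which plain word statistics would miss.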


Subjects
Abstracting and Indexing , Databases, Bibliographic , Information Storage and Retrieval , Internet , Quality Control , Algorithms , Feasibility Studies , Humans , MEDLINE , Medical Subject Headings , PubMed , Semantics , Vocabulary, Controlled
19.
Sci Rep ; 8(1): 14340, 2018 09 25.
Article in English | MEDLINE | ID: mdl-30254372

ABSTRACT

Phaeodactylum tricornutum is the most studied diatom and is found mainly in unstable coastal environments. It has been hypothesized that the great adaptability of P. tricornutum is probably due to its pleomorphism. Indeed, P. tricornutum is an atypical diatom since it can display three morphotypes: fusiform, triradiate and oval. Little information is currently available on the physiological significance of this morphogenesis. In this study, we adapted the P. tricornutum Pt3 strain to obtain algal cultures enriched in one dominant morphotype: fusiform, triradiate or oval. These cultures were used for high-throughput RNA sequencing, and the whole mRNA transcriptome of each morphotype was determined. Pairwise comparisons highlighted up- and down-regulated biological processes and molecular functions. Finally, intersection analysis allowed us to identify the specific features of the oval morphotype, which is of particular interest because it is often described as more resistant to stress. This study represents the first transcriptome-wide characterization of the three morphotypes of P. tricornutum performed on specifically enriched cultures derived from the same Pt3 strain. This work is an important step toward understanding morphogenesis in P. tricornutum and highlights the particular features of the oval morphotype.


Subjects
Diatoms/genetics , Phenotype , Sequence Analysis, RNA , Diatoms/physiology , Gene Expression Profiling , Stress, Physiological
20.
Int J Data Min Bioinform ; 13(3): 266-88, 2015.
Article in English | MEDLINE | ID: mdl-26547980

ABSTRACT

In the last decade, biology and medicine have undergone a fundamental change: next generation sequencing (NGS) technologies have made it possible to obtain genomic sequences very quickly and at low cost compared with the traditional Sanger method. These NGS technologies have thus made it possible to collect genomic sequences (genes, exomes or even full genomes) from individuals of the same species. Such sequences are more than 99% identical. There is therefore a strong need for efficient algorithms for indexing and fast pattern matching in such specific sets of sequences. In this paper, we propose a very efficient algorithm that solves the exact pattern matching problem in a set of highly similar DNA sequences where only the pattern can be preprocessed. This new algorithm extends variants of the Boyer-Moore exact string matching algorithm. Experimental results show that it achieves the best performance in practice.
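As a baseline for the Boyer-Moore family mentioned above, here is a Python sketch of the classic Boyer-Moore-Horspool search, where only the pattern is preprocessed into a shift table. It does not exploit similarity between the sequences of the set, which is what the paper's algorithm adds; `search_collection` is a hypothetical helper that simply searches each sequence independently.

```python
def horspool_search(text, pattern):
    """All start positions of pattern in text using the Boyer-Moore-Horspool shift rule."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Shift for each character: distance from its last occurrence (excluding the final
    # pattern position) to the end of the pattern; characters not in the table shift by m.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    hits, pos = [], 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        pos += shift.get(text[pos + m - 1], m)  # shift on the window's last character
    return hits

def search_collection(sequences, pattern):
    """Run the search independently on each sequence of a set (no similarity exploited)."""
    return {name: horspool_search(seq, pattern) for name, seq in sequences.items()}

print(search_collection({"ind1": "ACGTACGTAC", "ind2": "ACGTTCGTAC"}, "ACG"))
```

Searching each of many near-identical genomes this way repeats almost the same work every time; avoiding that redundancy is the motivation for algorithms specialized to sets of highly similar sequences.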


Subjects
Algorithms , Conserved Sequence/genetics , DNA/genetics , Pattern Recognition, Automated/methods , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Base Sequence , Machine Learning , Molecular Sequence Data , Reproducibility of Results , Sensitivity and Specificity