Search | BVS Bolivia

Improvement of large copy number variant detection by whole genome nanopore sequencing.

Cuenca-Guardiola, Javier; de la Morena-Barrio, Belén; García, Juan L; Sanchis-Juan, Alba; Corral, Javier; Fernández-Breis, Jesualdo T.

J Adv Res ; 50: 145-158, 2023 Aug.

Article in English | MEDLINE | ID: mdl-36323370

ABSTRACT

INTRODUCTION: Whole-genome sequencing using nanopore technologies can uncover structural variants, which are DNA rearrangements larger than 50 base pairs. Nanopore technologies can also characterize their boundaries with single-base accuracy, owing to kilobase-long reads that span either entire variants or their junctions. Other methods, such as next-generation short-read sequencing or PCR assays, are more limited in their ability to detect or characterize structural variants. However, existing software for nanopore sequencing data analysis still reports incomplete variant sets that also contain erroneous calls, a considerable obstacle to molecular diagnosis and to accurate genotyping of populations. METHODS: We compared multiple factors affecting variant calling, such as the reference genome version, the choice of aligner (minimap2, NGMLR, and lra), and combinations of variant callers (Sniffles, CuteSV, SVIM, and NanoVar), to find the optimal set of tools for calling large (>50 kb) deletions and duplications, using data from seven patients with gross gene defects in SERPINC1 and a reference variant set as the control. The goal was to obtain the most complete, yet reasonably specific, set of large variants from a single PromethION flow cell, which yields lower coverage depth than short-read sequencing. We also used a custom method for the statistical analysis of coverage values to refine the resulting datasets. RESULTS: For large deletions and duplications (>50 kb), the existing software performed worse than for smaller variants in terms of both sensitivity and specificity, and newer tools did not improve on this. Our novel software, disCoverage, could polish the results of variant callers, improving specificity by up to 62% and sensitivity by 15%; the sensitivity gain requires additional data or samples. CONCLUSION: We analyzed the current state of >50-kb copy number variant detection with nanopore sequencing and found that it could be improved. The methods presented in this work could help identify known deletions and duplications in a set of patients and filter out erroneous calls for these variants, which might aid efforts to characterize a not-yet-well-known fraction of the genetic variability in the human genome.
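The abstract describes disCoverage as refining variant calls through a statistical analysis of coverage, but it does not state the exact test. The following is only a minimal sketch of the general idea, assuming read depth has already been summarized into fixed-size bins: a called deletion is kept if depth over its span drops, and a called duplication if depth rises, relative to the genome-wide median. The function names, the 1,000 bp bin size, and the 0.75/1.25 ratio thresholds are hypothetical choices for illustration, not parameters of disCoverage.

```python
from statistics import median


def coverage_ratio(depths, start, end, bin_size):
    """Median depth of the bins overlapping [start, end), relative to the genome-wide median.

    `depths` holds mean read depth per fixed-size bin (hypothetical input format).
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    first = start // bin_size
    last = (end - 1) // bin_size
    region = depths[first:last + 1]
    if not region:
        raise ValueError("region lies outside the binned depth track")
    return median(region) / median(depths)


def supports_call(depths, start, end, svtype, bin_size=1000,
                  del_max=0.75, dup_min=1.25):
    """Keep a DEL/DUP call only if its depth ratio moves in the expected direction.

    A heterozygous deletion is expected near 0.5x and a single-copy duplication
    near 1.5x; the thresholds used here are illustrative assumptions.
    """
    ratio = coverage_ratio(depths, start, end, bin_size)
    if svtype == "DEL":
        return ratio <= del_max
    if svtype == "DUP":
        return ratio >= dup_min
    raise ValueError(f"unsupported SV type: {svtype}")


# Toy depth track: 100 bins of 1 kb at 30x, with a one-copy loss over bins 40-59.
depths = [30.0] * 100
for i in range(40, 60):
    depths[i] = 15.0

print(supports_call(depths, 40_000, 60_000, "DEL"))  # True: ratio 0.5
print(supports_call(depths, 40_000, 60_000, "DUP"))  # False
print(supports_call(depths, 0, 20_000, "DEL"))       # False: ratio 1.0
```

Medians are used instead of means so that isolated depth spikes (for example from repeats or mapping artifacts) move the ratio less; a real filter would also need to account for GC bias, mappability, and low-coverage regions, which this sketch ignores.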

Subject(s)

Nanopore Sequencing; Nanopores; Humans; Sequence Analysis, DNA/methods; DNA Copy Number Variations/genetics; Genome, Human

An automated process for supporting decisions in clustering-based data analysis.

Bernabé-Díaz, José Antonio; Franco, Manuel; Vivo, Juana-María; Quesada-Martínez, Manuel; Fernández-Breis, Jesualdo T.

Comput Methods Programs Biomed ; 219: 106765, 2022 Jun.

Article in English | MEDLINE | ID: mdl-35367914

ABSTRACT

BACKGROUND AND OBJECTIVE: Metrics are commonly used by biomedical researchers and practitioners to measure and evaluate properties of individuals, instruments, models, methods, or datasets. Because there is no standardized procedure for validating a metric, it is assumed that if a metric is appropriate for analyzing a dataset in a certain domain, it will be appropriate for other datasets in the same domain. However, such generalizability cannot be taken for granted, since the behavior of a metric can vary across scenarios. This paper aims to study such behavior, which allows a metric's reliability to be assessed before any conclusion is drawn about biomedical datasets. METHODS: We present a method to support the evaluation of the behavior of quantitative metrics on datasets. Our approach assesses a metric through clustering-based data analysis and supports decision-making about the optimal classification. Specifically, it applies two key criteria of unsupervised classification validation, the stability and the goodness of the clusters, computed on the clusterings generated from the metric. Our evaluomeR tool makes the method accessible to biomedical researchers. RESULTS: We demonstrate the analytical power of our method by applying it to analyze (1) the behavior of the impact factor metric for a series of journal categories; (2) which structural metrics provide a better partitioning of the content of a repository of biomedical ontologies; and (3) the sources of heterogeneity in effect size metrics of biomedical primary studies. CONCLUSIONS: Statistical properties such as the stability and goodness of classifications enable a useful analysis of the behavior of quantitative metrics, which can support decisions about which metrics to apply to a given dataset.
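The METHODS assess a metric through the stability and goodness of the clusterings it produces. As a rough, self-contained illustration (not the evaluomeR implementation), the sketch below clusters one-dimensional metric values with k-means, measures goodness as the mean silhouette width, and measures stability as the mean Jaccard similarity between each original cluster and its best-matching cluster across bootstrap resamples. The function names, the quantile-based initialization, and the 20 bootstrap replicates are assumptions chosen for brevity.

```python
import random
from statistics import mean


def kmeans_1d(values, n_clusters, iters=50):
    """Lloyd's k-means on 1-D data, initialized at evenly spaced order statistics."""
    xs = sorted(values)
    centers = [xs[(2 * i + 1) * len(xs) // (2 * n_clusters)] for i in range(n_clusters)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest center.
        labels = [min(range(n_clusters), key=lambda c: abs(v - centers[c])) for v in values]
        # Move each center to the mean of its members (keep it if the cluster is empty).
        new_centers = []
        for c in range(n_clusters):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(mean(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette(values, labels):
    """Goodness: mean silhouette width; needs at least two clusters.

    Points alone in their cluster score 0.
    """
    clusters = set(labels)
    scores = []
    for i, v in enumerate(values):
        own = [abs(v - w) for j, (w, lab) in enumerate(zip(values, labels))
               if lab == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = mean(own)  # mean distance to the point's own cluster
        b = min(mean(abs(v - w) for w, lab in zip(values, labels) if lab == c)
                for c in clusters if c != labels[i])  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def stability(values, n_clusters, n_boot=20, seed=0):
    """Stability: mean best-match Jaccard similarity of each cluster across bootstrap runs."""
    rng = random.Random(seed)
    base = kmeans_1d(values, n_clusters)
    base_sets = [{i for i, lab in enumerate(base) if lab == c} for c in range(n_clusters)]
    per_cluster = [[] for _ in range(n_clusters)]
    for _ in range(n_boot):
        # Resample indices with replacement; duplicates collapse to distinct points.
        idx = sorted(set(rng.choices(range(len(values)), k=len(values))))
        boot = kmeans_1d([values[i] for i in idx], n_clusters)
        boot_sets = [{idx[j] for j, lab in enumerate(boot) if lab == c}
                     for c in range(n_clusters)]
        for c, original in enumerate(base_sets):
            present = original & set(idx)
            if present:  # skip clusters absent from this resample
                per_cluster[c].append(max(jaccard(present, s) for s in boot_sets))
    return mean(mean(scores) for scores in per_cluster if scores)


# Toy metric values with two well-separated groups.
values = [1.0, 1.1, 0.9, 1.2, 10.0, 10.2, 9.8, 10.1]
labels = kmeans_1d(values, 2)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
print(round(silhouette(values, labels), 3))
print(round(stability(values, 2), 3))
```

Silhouette values near 1 indicate compact, well-separated clusters, and stability values near 1 indicate that the partition survives resampling; a metric whose clusterings score low on either criterion would be a weaker basis for classifying that dataset.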

Subject(s)

Biological Ontologies; Data Analysis; Benchmarking; Cluster Analysis; Humans; Reproducibility of Results

The gene regulation knowledge commons: the action area of GREEKC.

Kuiper, Martin; Bonello, Joseph; Fernández-Breis, Jesualdo T; Bucher, Philipp; Futschik, Matthias E; Gaudet, Pascale; Kulakovskiy, Ivan V; Licata, Luana; Logie, Colin; Lovering, Ruth C; Makeev, Vsevolod J; Orchard, Sandra; Panni, Simona; Perfetto, Livia; Sant, David; Schulz, Stefan; Vercruysse, Steven; Zerbino, Daniel R; Lægreid, Astrid.

Biochim Biophys Acta Gene Regul Mech ; 1865(1): 194768, 2022 Jan.

Article in English | MEDLINE | ID: mdl-34757206

ABSTRACT

As computational modeling becomes increasingly essential for analyzing and understanding biological regulatory mechanisms, governance of the many databases and knowledge bases that support this domain is crucial to guarantee the reliability and interoperability of resources. To address this, the COST Action Gene Regulation Ensemble Effort for the Knowledge Commons (GREEKC, CA15205, www.greekc.org) organized nine workshops over a four-year period starting in September 2016. The workshops brought together a wide range of experts from all over the world working on various steps of the knowledge management process focused on understanding gene regulatory mechanisms. The discussions between ontologists, curators, text miners, biologists, bioinformaticians, philosophers and computational scientists spawned a host of activities aimed at standardizing and updating existing knowledge management workflows and at involving end-users in the design of the Gene Regulation Knowledge Commons (GRKC). Here, the GREEKC consortium describes its main achievements in improving the GRKC.

Subject(s)

Gene Expression Regulation; Reproducibility of Results

BioHackathon 2015: Semantics of data for life sciences and reproducible research.

Vos, Rutger A; Katayama, Toshiaki; Mishima, Hiroyuki; Kawano, Shin; Kawashima, Shuichi; Kim, Jin-Dong; Moriya, Yuki; Tokimatsu, Toshiaki; Yamaguchi, Atsuko; Yamamoto, Yasunori; Wu, Hongyan; Amstutz, Peter; Antezana, Erick; Aoki, Nobuyuki P; Arakawa, Kazuharu; Bolleman, Jerven T; Bolton, Evan; Bonnal, Raoul J P; Bono, Hidemasa; Burger, Kees; Chiba, Hirokazu; Cohen, Kevin B; Deutsch, Eric W; Fernández-Breis, Jesualdo T; Fu, Gang; Fujisawa, Takatomo; Fukushima, Atsushi; García, Alexander; Goto, Naohisa; Groza, Tudor; Hercus, Colin; Hoehndorf, Robert; Itaya, Kotone; Juty, Nick; Kawashima, Takeshi; Kim, Jee-Hyub; Kinjo, Akira R; Kotera, Masaaki; Kozaki, Kouji; Kumagai, Sadahiro; Kushida, Tatsuya; Lütteke, Thomas; Matsubara, Masaaki; Miyamoto, Joe; Mohsen, Attayeb; Mori, Hiroshi; Naito, Yuki; Nakazato, Takeru; Nguyen-Xuan, Jeremy; Nishida, Kozo.

F1000Res ; 9: 136, 2020.

Article in English | MEDLINE | ID: mdl-32308977

ABSTRACT

We report on the activities of the 2015 edition of the BioHackathon, an annual event that brings together researchers and developers from around the world to develop tools and technologies that promote the reusability of biological data. We discuss issues surrounding the representation, publication, integration, mining and reuse of biological data and metadata across a wide range of biomedical data types relevant to the life sciences, including chemistry, genotypes and phenotypes, orthology and phylogeny, proteomics, genomics, glycomics, and metabolomics. We describe our progress in addressing ongoing challenges to the reusability and reproducibility of research results, and identify outstanding issues that continue to impede the progress of bioinformatics research. We share our perspective on the state of the art, continued challenges, and goals for future research and development for the life sciences Semantic Web.

Subject(s)

Biological Science Disciplines; Computational Biology; Semantic Web; Data Mining; Metadata; Reproducibility of Results

LinkEHR-Ed: a multi-reference model archetype editor based on formal semantics.

Maldonado, José A; Moner, David; Boscá, Diego; Fernández-Breis, Jesualdo T; Angulo, Carlos; Robles, Montserrat.

Int J Med Inform ; 78(8): 559-70, 2009 Aug.

Article in English | MEDLINE | ID: mdl-19386540

ABSTRACT

PURPOSE: To develop a powerful archetype editing framework capable of handling multiple reference models and oriented towards the semantic description and standardization of legacy data. METHODS: The main prerequisite for implementing tools that provide enhanced support for archetypes is a clear specification of archetype semantics. We propose a formalization of the definition section of archetypes based on types over tree-structured data. It covers the specialization of archetypes, the relationship between reference models and archetypes, and the conformance of data instances to archetypes. RESULTS: We developed LinkEHR-Ed, a visual archetype editor based on this formalization, with advanced processing capabilities: it supports multiple reference models, the editing and semantic validation of archetypes, the specification of mappings to data sources, and the automatic generation of data transformation scripts. CONCLUSIONS: LinkEHR-Ed is a useful tool for building, processing and validating archetypes based on any reference model.

Subject(s)

Medical Records Systems, Computerized; Programming Languages; Computer Simulation
