1.
Bioinformatics ; 40(Suppl 1): i20-i29, 2024 06 28.
Article in English | MEDLINE | ID: mdl-38940150

ABSTRACT

MOTIVATION: We learn more effectively through experience and reflection than through passive reception of information. Bioinformatics offers an excellent opportunity for project-based learning. Molecular data are abundant and accessible in open repositories, and important concepts in biology can be rediscovered by reanalyzing the data. RESULTS: In the manuscript, we report on five hands-on assignments we designed for master's computer science students to train them in bioinformatics for genomics. These assignments are the cornerstones of our introductory bioinformatics course and are centered around the study of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). They assume no prior knowledge of molecular biology but do require programming skills. Through these assignments, students learn about genomes and genes, discover their composition and function, relate SARS-CoV-2 to other viruses, and learn about the body's response to infection. Student evaluation of the assignments confirms their usefulness and value, their appropriate mastery-level difficulty, and their interesting and motivating storyline. AVAILABILITY AND IMPLEMENTATION: The course materials are freely available on GitHub at https://github.com/IB-ULFRI.


Subject(s)
COVID-19 , Computational Biology , SARS-CoV-2 , Computational Biology/methods , SARS-CoV-2/genetics , Humans , COVID-19/virology , Genomics/methods , Students
2.
Genome Res ; 31(8): 1498-1511, 2021 08.
Article in English | MEDLINE | ID: mdl-34183452

ABSTRACT

Dictyostelium development begins with single-cell starvation and ends with multicellular fruiting bodies. Developmental morphogenesis is accompanied by sweeping transcriptional changes, encompassing nearly half of the 13,000 genes in the genome. We performed time-series RNA-sequencing analyses of the wild type and 20 mutants to explore the relationships between transcription and morphogenesis. These strains show developmental arrest at different stages, accelerated development, or atypical morphologies. Considering eight major morphological transitions, we identified 1371 milestone genes whose expression changes sharply between consecutive transitions. We also identified 1099 genes as members of 21 regulons, which are groups of genes that remain coordinately regulated despite the genetic, temporal, and developmental perturbations. The gene annotations in these groups validate known transitions and reveal new developmental events. For example, DNA replication genes are tightly coregulated with cell division genes, so they are expressed in mid-development although chromosomal DNA is not replicated. Our data set includes 486 transcriptional profiles that can help identify new relationships between transcription and development and improve gene annotations. We demonstrate its utility by showing that cycles of aggregation and disaggregation in allorecognition-defective mutants involve dedifferentiation. We also show sensitivity to genetic and developmental conditions in two commonly used actin genes, act6 and act15, and robustness of the coaA gene. Finally, we propose that gpdA is a better mRNA quantitation standard because it is less sensitive to external conditions than commonly used standards. The data set is available for democratized exploration through the web application dictyExpress and the data mining environment Orange.


Subject(s)
Dictyostelium , Dictyostelium/genetics , Morphogenesis , RNA, Messenger/metabolism , Regulon , Software
3.
Clin Microbiol Infect ; 27(7): 1039.e1-1039.e7, 2021 Jul.
Article in English | MEDLINE | ID: mdl-33838303

ABSTRACT

OBJECTIVES: Seroprevalence surveys provide crucial information on cumulative severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) exposure. This Slovenian nationwide population study is the first longitudinal 6-month serosurvey using probability-based samples across all age categories. METHODS: Each participant supplied two blood samples: 1316 samples in April 2020 (first round) and 1211 in October/November 2020 (second round). The first-round sera were tested using Euroimmun Anti-SARS-CoV-2 ELISA IgG (ELISA) and, because of uncertain estimates, were retested using Elecsys Anti-SARS-CoV-2 (Elecsys-N) and Elecsys Anti-SARS-CoV-2 S (Elecsys-S). The second-round sera were concomitantly tested using Elecsys-N/Elecsys-S. RESULTS: The populations of both rounds matched the overall population (n = 3000), with minor settlement type and age differences. The first-round seroprevalence corrected for the ELISA manufacturer's specificity was 2.78% (95% highest density interval [HDI] 1.81%-3.80%), corrected using pooled ELISA specificity calculated from published data 0.93% (95% CI 0.00%-2.65%), and based on Elecsys-N/Elecsys-S results 0.87% (95% HDI 0.40%-1.38%). The second-round unadjusted lower limit of seroprevalence on 11 November 2020 was 4.06% (95% HDI 2.97%-5.16%) and on 3 October 2020, unadjusted upper limit was 4.29% (95% HDI 3.18%-5.47%). CONCLUSIONS: SARS-CoV-2 seroprevalence in Slovenia increased four-fold from late April to October/November 2020, mainly due to a devastating second wave. Significant logistic/methodological challenges accompanied both rounds. 
The main lessons learned were a need for caution when relying on manufacturer-generated assay evaluation data, the importance of multiple manufacturer-independent assay performance assessments, the need for concomitant use of highly-specific serological assays targeting different SARS-CoV-2 proteins in serosurveys conducted in low-prevalence settings or during epidemic exponential growth and the usefulness of a Bayesian approach for overcoming complex methodological challenges.


Subject(s)
COVID-19 Serological Testing/statistics & numerical data , COVID-19/epidemiology , COVID-19/immunology , Adolescent , Adult , Age Distribution , Aged , Aged, 80 and over , Antibodies, Viral/blood , Bayes Theorem , Child , Child, Preschool , Enzyme-Linked Immunosorbent Assay , Female , Humans , Immunoglobulin G/blood , Infant , Infant, Newborn , Male , Middle Aged , Pandemics , Population Surveillance , Prevalence , Sensitivity and Specificity , Seroepidemiologic Studies , Sex Distribution , Slovenia/epidemiology , Young Adult
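
The abstract of entry 3 reports seroprevalence estimates corrected for assay specificity. The following is a minimal sketch of why that correction matters in low-prevalence settings. It uses the classical Rogan-Gladen point estimator, which is a simpler stand-in and not the Bayesian approach the authors describe. The function name and all sensitivity, specificity, and prevalence values are hypothetical and chosen only for illustration.

```python
def rogan_gladen(apparent_prevalence, sensitivity, specificity):
    """Correct an observed positive fraction for imperfect test accuracy.

    Rogan-Gladen estimator: (p_obs + Sp - 1) / (Se + Sp - 1),
    clipped to the valid range [0, 1].
    """
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0:
        raise ValueError("test must perform better than chance (Se + Sp > 1)")
    corrected = (apparent_prevalence + specificity - 1.0) / denominator
    return max(0.0, min(1.0, corrected))


# Hypothetical inputs: 4% of sera test positive, assay sensitivity is 90%.
observed = 0.04
for assumed_specificity in (0.99, 0.97):
    estimate = rogan_gladen(observed, 0.90, assumed_specificity)
    print(f"specificity {assumed_specificity:.2f} -> corrected prevalence {estimate:.2%}")
```

With these made-up inputs, lowering the assumed specificity from 99% to 97% moves the corrected estimate from roughly 3.4% to roughly 1.1%. This is consistent with the abstract's caution about relying on a single manufacturer-reported specificity figure.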
4.
PLoS Comput Biol ; 17(3): e1008671, 2021 03.
Article in English | MEDLINE | ID: mdl-33661899

ABSTRACT

Overfitting is one of the critical problems in developing models by machine learning. With machine learning becoming an essential technology in computational biology, we must include training about overfitting in all courses that introduce this technology to students and practitioners. We here propose a hands-on training for overfitting that is suitable for introductory-level courses and can be carried out on its own or embedded within any data science course. We use a workflow-based design of machine learning pipelines, experimentation-based teaching, and a hands-on approach that focuses on concepts rather than underlying mathematics. We here detail the data analysis workflows we use in training and motivate them from the viewpoint of teaching goals. Our proposed approach relies on Orange, an open-source data science toolbox that combines data visualization and machine learning, and that is tailored for education in machine learning and explorative data analysis.


Subject(s)
Computational Biology , Data Science , Machine Learning , Models, Statistical , Computational Biology/education , Computational Biology/methods , Data Science/education , Data Science/methods , Humans , Models, Biological , Software
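
The overfitting concept discussed in entry 4 can be illustrated with a short experiment. The sketch below is not taken from the paper and does not use Orange. It is an assumed, standard-library-only demonstration in which a 1-nearest-neighbour classifier memorizes random data, so any gap between training and test accuracy is due to overfitting alone. All function names are hypothetical.

```python
import random


def nearest_label(train, point):
    """Return the label of the training example closest to `point` (1-NN)."""
    closest = min(train, key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], point)))
    return closest[1]


def accuracy(train, data):
    """Fraction of examples in `data` that the 1-NN model labels correctly."""
    hits = sum(nearest_label(train, features) == label for features, label in data)
    return hits / len(data)


def random_dataset(rng, n_examples, n_features=5):
    """Features and labels are drawn independently, so there is nothing to learn."""
    return [([rng.random() for _ in range(n_features)], rng.randint(0, 1))
            for _ in range(n_examples)]


rng = random.Random(0)
train = random_dataset(rng, 200)
test = random_dataset(rng, 200)

train_acc = accuracy(train, train)  # scored on the data the model memorized
test_acc = accuracy(train, test)    # scored on data the model has never seen
print(f"training accuracy: {train_acc:.2f}")
print(f"test accuracy:     {test_acc:.2f}")
```

Each training point is its own nearest neighbour, so training accuracy is perfect, while accuracy on the unseen points should stay near chance level because the labels carry no signal. Scoring a model only on its training data would hide this.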
5.
Nat Commun ; 10(1): 4551, 2019 10 07.
Article in English | MEDLINE | ID: mdl-31591416

ABSTRACT

Analysis of biomedical images requires computational expertise that is uncommon among biomedical scientists. Deep learning approaches for image analysis provide an opportunity to develop user-friendly tools for exploratory data analysis. Here, we use the visual programming toolbox Orange ( http://orange.biolab.si ) to simplify image analysis by integrating deep-learning embedding, machine learning procedures, and data visualization. Orange supports the construction of data analysis workflows by assembling components for data preprocessing, visualization, and modeling. We equipped Orange with components that use pre-trained deep convolutional networks to profile images with vectors of features. These vectors are used in image clustering and classification in a framework that enables mining of image sets for both novice and experienced users. We demonstrate the utility of the tool in image analysis of progenitor cells in mouse bone healing, identification of developmental competence in mouse oocytes, subcellular protein localization in yeast, and developmental morphology of social amoebae.


Subject(s)
Computational Biology/methods , Image Processing, Computer-Assisted/methods , Machine Learning , Neural Networks, Computer , Animals , Dictyostelium/cytology , Dictyostelium/growth & development , Dictyostelium/metabolism , Green Fluorescent Proteins/genetics , Green Fluorescent Proteins/metabolism , Internet , Life Cycle Stages , Mice, Transgenic , Oocytes/metabolism , Reproducibility of Results , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism
6.
Bioinformatics ; 35(14): i4-i12, 2019 07 15.
Article in English | MEDLINE | ID: mdl-31510695

ABSTRACT

MOTIVATION: Single-cell RNA sequencing allows us to simultaneously profile the transcriptomes of thousands of cells and to indulge in exploring cell diversity, development and discovery of new molecular mechanisms. Analysis of scRNA data involves a combination of non-trivial steps from statistics, data visualization, bioinformatics and machine learning. Training molecular biologists in single-cell data analysis and empowering them to review and analyze their data can be challenging, both because of the complexity of the methods and the steep learning curve. RESULTS: We propose a workshop-style training in single-cell data analytics that relies on an explorative data analysis toolbox and a hands-on teaching style. The training relies on scOrange, a newly developed extension of a data mining framework that features workflow design through visual programming and interactive visualizations. Workshops with scOrange can proceed much faster than similar training methods that rely on computer programming and analysis through scripting in R or Python, allowing the trainer to cover more ground in the same time-frame. We here review the design principles of the scOrange toolbox that support such workshops and propose a syllabus for the course. We also provide examples of data analysis workflows that instructors can use during the training. AVAILABILITY AND IMPLEMENTATION: scOrange is open-source software. The software, documentation and an emerging set of educational videos are available at http://singlecell.biolab.si.


Subject(s)
Computational Biology , Data Science , Software , Sequence Analysis, RNA , Workflow
7.
PLoS One ; 14(6): e0217994, 2019.
Article in English | MEDLINE | ID: mdl-31185054

ABSTRACT

Non-negative matrix tri-factorization (NMTF) is a popular technique for learning low-dimensional feature representation of relational data. Currently, NMTF learns a representation of a dataset through an optimization procedure that typically uses multiplicative update rules. This procedure has had limited success, and its failure cases have not been well understood. We here perform an empirical study involving six large datasets comparing multiplicative update rules with three alternative optimization methods: alternating least squares, projected gradients, and coordinate descent. We find that methods based on projected gradients and coordinate descent converge up to twenty-four times faster than multiplicative update rules. Furthermore, the alternating least squares method can quickly train NMTF models on sparse datasets but often fails on dense datasets. Coordinate descent-based NMTF converges up to sixteen times faster compared to well-established methods.


Subject(s)
Algorithms , Models, Theoretical , Databases, Factual
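
Entry 7 compares optimizers for non-negative matrix tri-factorization, with multiplicative update rules as the baseline. As a rough illustration of that baseline only, the sketch below factorizes a small matrix X into U S V^T using one common unconstrained form of the multiplicative updates. It is an assumption for illustration and may differ from the exact variant studied in the paper. The helper names, the toy matrix, and the ranks are all hypothetical.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def update(factor, numer, denom, eps=1e-9):
    """Element-wise multiplicative step: factor * numer / denom."""
    return [[f * n / (d + eps) for f, n, d in zip(fr, nr, dr)]
            for fr, nr, dr in zip(factor, numer, denom)]


def frobenius_error(x, u, s, v):
    """Frobenius norm of X - U S V^T."""
    approx = matmul(matmul(u, s), transpose(v))
    return sum((a - b) ** 2 for ra, rb in zip(x, approx) for a, b in zip(ra, rb)) ** 0.5


def nmtf(x, k1, k2, n_iter=200, seed=0):
    """Approximate non-negative X (n x m) as U (n x k1) S (k1 x k2) V^T (k2 x m)."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])

    def positive(rows, cols):
        # Strictly positive start so no entry is stuck at zero from the outset.
        return [[rng.random() + 0.1 for _ in range(cols)] for _ in range(rows)]

    u, s, v = positive(n, k1), positive(k1, k2), positive(m, k2)
    xt = transpose(x)
    errors = [frobenius_error(x, u, s, v)]
    for _ in range(n_iter):
        # U <- U * (X V S^T) / (U S V^T V S^T)
        vst = matmul(v, transpose(s))
        u = update(u, matmul(x, vst), matmul(u, matmul(transpose(vst), vst)))
        # V <- V * (X^T U S) / (V S^T U^T U S)
        us = matmul(u, s)
        v = update(v, matmul(xt, us), matmul(v, matmul(transpose(us), us)))
        # S <- S * (U^T X V) / (U^T U S V^T V)
        ut, vt = transpose(u), transpose(v)
        numer = matmul(matmul(ut, x), v)
        denom = matmul(matmul(matmul(ut, u), s), matmul(vt, v))
        s = update(s, numer, denom)
        errors.append(frobenius_error(x, u, s, v))
    return u, s, v, errors


# Toy matrix with two row groups and two column groups (made-up values).
x = [[5, 5, 0, 0, 1], [5, 4, 0, 0, 1], [4, 5, 0, 0, 1],
     [0, 0, 3, 3, 0], [0, 0, 3, 4, 0], [0, 0, 4, 3, 0]]
u, s, v, errors = nmtf(x, k1=2, k2=2)
print(f"error before: {errors[0]:.3f}, after: {errors[-1]:.3f}")
```

Because every step multiplies a non-negative factor by a ratio of non-negative quantities, the factors stay non-negative without any projection. The abstract reports that alternatives such as projected gradients and coordinate descent can converge faster than this baseline.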
8.
BMC Med ; 16(1): 150, 2018 08 27.
Article in English | MEDLINE | ID: mdl-30145981

ABSTRACT

BACKGROUND: Personalized, precision, P4, or stratified medicine is understood as a medical approach in which patients are stratified based on their disease subtype, risk, prognosis, or treatment response using specialized diagnostic tests. The key idea is to base medical decisions on individual patient characteristics, including molecular and behavioral biomarkers, rather than on population averages. Personalized medicine is deeply connected to and dependent on data science, specifically machine learning (often named Artificial Intelligence in the mainstream media). While in recent years there has been a lot of enthusiasm about the potential of 'big data' and machine learning-based solutions, only a few examples exist that impact current clinical practice. The lack of impact on clinical practice can largely be attributed to insufficient performance of predictive models, difficulties in interpreting complex model predictions, and lack of validation via prospective clinical trials that demonstrate a clear benefit compared to the standard of care. In this paper, we review the potential of state-of-the-art data science approaches for personalized medicine, discuss open challenges, and highlight directions that may help to overcome them in the future. CONCLUSIONS: There is a need for an interdisciplinary effort, including data scientists, physicians, patient advocates, regulatory agencies, and health insurance organizations. Partially unrealistic expectations and concerns about data science-based solutions need to be better managed. In parallel, computational methods must advance more to provide direct benefit to clinical practice.


Subject(s)
Precision Medicine/methods , Humans , Prospective Studies
9.
Genome Announc ; 6(2), 2018 Jan 11.
Article in English | MEDLINE | ID: mdl-29326223

ABSTRACT

Verticillium nonalfalfae, a soilborne vascular phytopathogenic fungus, causes wilt disease in several crop species. Of great concern are outbreaks of highly aggressive V. nonalfalfae strains, which cause a devastating wilt disease in European hops. We report here the genome sequence and annotation of V. nonalfalfae strain T2, providing genomic information that will allow better understanding of the molecular mechanisms underlying the development of highly aggressive strains.

10.
Nat Commun ; 8(1): 1541, 2017 11 16.
Article in English | MEDLINE | ID: mdl-29142246

ABSTRACT

The NUDIX enzymes are involved in cellular metabolism and homeostasis, as well as mRNA processing. Although highly conserved throughout all organisms, their biological roles and biochemical redundancies remain largely unclear. To address this, we globally resolve their individual properties and inter-relationships. We purify 18 of the human NUDIX proteins and screen 52 substrates, providing a substrate redundancy map. Using crystal structures, we generate sequence alignment analyses revealing four major structural classes. To a certain extent, their substrate preference redundancies correlate with structural classes, thus linking structure and activity relationships. To elucidate interdependence among the NUDIX hydrolases, we pairwise deplete them generating an epistatic interaction map, evaluate cell cycle perturbations upon knockdown in normal and cancer cells, and analyse their protein and mRNA expression in normal and cancer tissues. Using a novel FUSION algorithm, we integrate all data creating a comprehensive NUDIX enzyme profile map, which will prove fundamental to understanding their biological functionality.


Subject(s)
Gene Expression Profiling/methods , Gene Regulatory Networks , Multigene Family , Pyrophosphatases/genetics , A549 Cells , Cell Line , Cell Line, Tumor , Gene Expression Regulation, Enzymologic , Gene Expression Regulation, Neoplastic , Humans , MCF-7 Cells , Phylogeny , Pyrophosphatases/classification , Pyrophosphatases/metabolism , RNA Interference , Substrate Specificity , Nudix Hydrolases