Results 1 - 15 of 15
1.
BMC Bioinformatics ; 25(1): 170, 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38689247

ABSTRACT

BACKGROUND: Deep neural networks (DNNs) have the potential to revolutionize our understanding and treatment of genetic diseases. An inherent limitation of deep neural networks, however, is their high demand for data during training. To overcome this challenge, other fields, such as computer vision, use various data augmentation techniques to artificially increase the available training data for DNNs. Unfortunately, most data augmentation techniques used in other domains do not transfer well to genomic data. RESULTS: Most genomic data possesses peculiar properties and data augmentations may significantly alter the intrinsic properties of the data. In this work, we propose a novel data augmentation technique for genomic data inspired by biology: point mutations. By employing point mutations as substitutes for codons, we demonstrate that our newly proposed data augmentation technique enhances the performance of DNNs across various genomic tasks that involve coding regions, such as translation initiation and splice site detection. CONCLUSION: Silent and missense mutations are found to positively influence effectiveness, while nonsense mutations and random mutations in non-coding regions generally lead to degradation. Overall, point mutation-based augmentations in genomic datasets present valuable opportunities for improving the accuracy and reliability of predictive models for DNA sequences.


Subject(s)
Deep Learning , Genomics , Point Mutation , Genomics/methods , Humans , Reproducibility of Results , Neural Networks, Computer
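
The codon-level augmentation described in the abstract above can be pictured with a short sketch of its "silent" variant: each codon is, with some probability, swapped for a synonymous codon that differs from it at a single nucleotide, so the encoded protein is unchanged. This is a minimal illustration under stated assumptions (uppercase, in-frame DNA; the standard genetic code; a per-codon mutation `rate`), not the authors' implementation; the names `silent_point_mutations` and `translate` are hypothetical.

```python
import itertools
import random
from collections import defaultdict

# Standard genetic code (NCBI translation table 1). Codons are enumerated
# TTT, TTC, TTA, TTG, TCT, ..., GGG; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}

# Group codons by the amino acid (or stop signal) they encode.
SYNONYMS = defaultdict(list)
for _codon, _aa in CODON_TABLE.items():
    SYNONYMS[_aa].append(_codon)


def translate(cds: str) -> str:
    """Translate an in-frame, uppercase DNA coding sequence."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def silent_point_mutations(cds: str, rate: float, rng: random.Random) -> str:
    """Return an augmented copy of `cds` in which each codon is, with
    probability `rate`, replaced by a synonymous codon that differs from it
    at exactly one nucleotide (a silent point mutation)."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        # Synonymous codons reachable by a single-nucleotide substitution.
        candidates = [
            alt for alt in SYNONYMS[CODON_TABLE[codon]]
            if sum(a != b for a, b in zip(alt, codon)) == 1
        ]
        if candidates and rng.random() < rate:
            codon = rng.choice(candidates)
        out.append(codon)
    return "".join(out)


rng = random.Random(0)
original = "ATGGCTGAAAAATAA"  # translates to "MAEK*"
augmented = silent_point_mutations(original, rate=0.5, rng=rng)
assert translate(augmented) == translate(original)  # protein is unchanged
```

Codons with no single-nucleotide synonym (ATG for Met, TGG for Trp) are always left as they are. Missense variants, which the abstract also reports as helpful, would instead draw from codons encoding a different amino acid.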
2.
Sci Data ; 10(1): 716, 2023 10 18.
Article in English | MEDLINE | ID: mdl-37853038

ABSTRACT

Trypanosomiasis, a neglected tropical disease (NTD), challenges communities in sub-Saharan Africa and Latin America. The World Health Organization underscores the need for practical, field-adaptable diagnostics and rapid screening tools to address the negative impact of NTDs. While artificial intelligence has shown promising results in disease screening, the lack of curated datasets impedes progress. In response to this challenge, we developed the Tryp dataset, comprising microscopy images of unstained thick blood smears containing the Trypanosoma brucei brucei parasite. The Tryp dataset provides bounding box annotations for tightly enclosed regions containing the parasite for 3,085 positive images, and 93 images collected from negative blood samples. The Tryp dataset represents the largest of its kind. Furthermore, we provide a benchmark on three leading deep learning-based object detection techniques that demonstrate the feasibility of AI for this task. Overall, the availability of the Tryp dataset is expected to facilitate research advancements in diagnostic screening for this disease, which may lead to improved healthcare outcomes for the communities impacted.


Subject(s)
Trypanosoma brucei brucei , Trypanosoma , Trypanosomiasis, African , Animals , Humans , Artificial Intelligence , Microscopy , Neglected Diseases , Trypanosomiasis, African/diagnosis , Trypanosomiasis, African/parasitology
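
The abstract above does not say which metric the object-detection benchmark uses. Detection benchmarks of this kind commonly decide whether a predicted box matches an annotated box by their intersection over union (IoU), with 0.5 a conventional threshold. A minimal sketch of that criterion, assuming boxes given as (x_min, y_min, x_max, y_max) in pixels; the threshold value and the function names are assumptions, not taken from the paper.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned bounding boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def is_true_positive(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """Count a detection as correct when its IoU with an annotation reaches the threshold."""
    return box_iou(pred, truth) >= threshold


print(round(box_iou((0, 0, 10, 10), (5, 5, 15, 15)), 3))  # 0.143
```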
3.
Bioinformatics ; 39(6)2023 06 01.
Article in English | MEDLINE | ID: mdl-37225409

ABSTRACT

MOTIVATION: The primary regulatory step for protein synthesis is translation initiation, which makes it one of the fundamental steps in the central dogma of molecular biology. In recent years, a number of approaches relying on deep neural networks (DNNs) have demonstrated superb results for predicting translation initiation sites. These state-of-the-art results indicate that DNNs are indeed capable of learning complex features that are relevant to the process of translation. Unfortunately, most of those research efforts that employ DNNs only provide shallow insights into the decision-making processes of the trained models and lack highly sought-after novel biologically relevant observations. RESULTS: By improving upon the state-of-the-art DNNs and large-scale human genomic datasets in the area of translation initiation, we propose an innovative computational methodology to get neural networks to explain what was learned from data. Our methodology, which relies on in silico point mutations, reveals that DNNs trained for translation initiation site detection correctly identify well-established biological signals relevant to translation, including (i) the importance of the Kozak sequence, (ii) the damaging consequences of ATG mutations in the 5'-untranslated region, (iii) the detrimental effect of premature stop codons in the coding region, and (iv) the relative insignificance of cytosine mutations for translation. Furthermore, we delve deeper into the Beta-globin gene and investigate various mutations that lead to the Beta thalassemia disorder. Finally, we conclude our work by laying out a number of novel observations regarding mutations and translation initiation. AVAILABILITY AND IMPLEMENTATION: For data, models, and code, visit github.com/utkuozbulak/mutate-and-observe.


Subject(s)
Neural Networks, Computer , Humans , Mutation
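
The in silico point-mutation analysis in the abstract above can be pictured as saturation mutagenesis around a trained model: substitute every alternative base at every position and record how the model's score changes. A minimal sketch, assuming any `score_fn` that maps a DNA string to a translation-initiation score; `toy_tis_score` is a hypothetical stand-in for the trained DNN (it only checks for ATG at an assumed start position), not the authors' model, whose code is in the repository cited above.

```python
from typing import Callable, Dict, Tuple

ScoreFn = Callable[[str], float]


def point_mutation_effects(seq: str, score_fn: ScoreFn,
                           alphabet: str = "ACGT") -> Dict[Tuple[int, str], float]:
    """In silico saturation mutagenesis: for every position and every
    alternative nucleotide, return score(mutant) - score(reference)."""
    reference = score_fn(seq)
    effects = {}
    for pos, ref_base in enumerate(seq):
        for alt in alphabet:
            if alt == ref_base:
                continue
            mutant = seq[:pos] + alt + seq[pos + 1:]
            effects[(pos, alt)] = score_fn(mutant) - reference
    return effects


def toy_tis_score(seq: str) -> float:
    """Hypothetical stand-in for a trained TIS model: 1.0 if an ATG sits at
    the assumed start position (index 6), else 0.0."""
    return 1.0 if seq[6:9] == "ATG" else 0.0


# Kozak-like context GCCACC, then the start codon ATG, then G.
effects = point_mutation_effects("GCCACCATGG", toy_tis_score)
damaging = sorted(k for k, v in effects.items() if v < 0)
print(damaging)  # every substitution inside the ATG (positions 6-8)
```

With a real model, the same loop yields a position-by-base map of score changes, which is the kind of output from which signals such as the Kozak context or the effect of upstream ATGs can be read.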
4.
BMC Bioinformatics ; 24(1): 167, 2023 Apr 25.
Article in English | MEDLINE | ID: mdl-37098485

ABSTRACT

BACKGROUND: CRISPR-Cas-Docker is a web server for in silico docking experiments with CRISPR RNAs (crRNAs) and Cas proteins. This web server aims at providing experimentalists with the optimal crRNA-Cas pair predicted computationally when prokaryotic genomes have multiple CRISPR arrays and Cas systems, as frequently observed in metagenomic data. RESULTS: CRISPR-Cas-Docker provides two methods to predict the optimal Cas protein given a particular crRNA sequence: a structure-based method (in silico docking) and a sequence-based method (machine learning classification). For the structure-based method, users can either provide experimentally determined 3D structures of these macromolecules or use an integrated pipeline to generate 3D-predicted structures for in silico docking experiments. CONCLUSION: CRISPR-Cas-Docker addresses the need of the CRISPR-Cas community to predict RNA-protein interactions in silico by optimizing multiple stages of computation and evaluation, specifically for CRISPR-Cas systems. CRISPR-Cas-Docker is available at www.crisprcasdocker.org as a web server, and at https://github.com/hshimlab/CRISPR-Cas-Docker as an open-source tool.


Subject(s)
CRISPR-Cas Systems , RNA , RNA/genetics , Internet
6.
Biol Direct ; 17(1): 27, 2022 10 07.
Article in English | MEDLINE | ID: mdl-36207756

ABSTRACT

RNA-protein interactions are crucial for diverse biological processes. In prokaryotes, RNA-protein interactions enable adaptive immunity through CRISPR-Cas systems. These defence systems utilize CRISPR RNA (crRNA) templates acquired from past infections to destroy foreign genetic elements through crRNA-mediated nuclease activities of Cas proteins. Thanks to the programmability and specificity of CRISPR-Cas systems, CRISPR-based antimicrobials have the potential to be repurposed as new types of antibiotics. Unlike traditional antibiotics, these CRISPR-based antimicrobials can be designed to target specific bacteria and minimize detrimental effects on the human microbiome during antibacterial therapy. In this study, we explore the potential of CRISPR-based antimicrobials by optimizing the RNA-protein interactions of crRNAs and Cas13 proteins. CRISPR-Cas13 systems are unique as they degrade specific foreign RNAs using the crRNA template, which leads to non-specific RNase activities and cell cycle arrest. We show that a high proportion of the Cas13 systems have no colocalized CRISPR arrays, and the lack of direct association between crRNAs and Cas proteins may result in suboptimal RNA-protein interactions in the current tools. Here, we investigate the RNA-protein interactions of the Cas13-based systems by curating the validation dataset of Cas13 protein and CRISPR repeat pairs that are experimentally validated to interact, and the candidate dataset of CRISPR repeats that reside on the same genome as the currently known Cas13 proteins. To find optimal CRISPR-Cas13 interactions, we first validate the 3-D structure prediction of crRNAs based on their experimental structures. Next, we test a number of RNA-protein interaction programs to optimize the in silico docking of crRNAs with the Cas13 proteins. From this optimized pipeline, we find a number of candidate crRNAs that have comparable or better in silico docking with the Cas13 proteins of the current tools. 
This study fully automates the in silico optimization of RNA-protein interactions as an efficient preliminary step for designing effective CRISPR-Cas13-based antimicrobials.


Subject(s)
CRISPR-Cas Systems , RNA, Bacterial , Anti-Bacterial Agents , Bacteria/genetics , Humans , Ribonucleases/genetics , Ribonucleases/metabolism
7.
PLoS One ; 17(6): e0269449, 2022.
Article in English | MEDLINE | ID: mdl-35704628

ABSTRACT

Environmental monitoring of microplastics (MP) contamination has become an area of great research interest, given potential hazards associated with human ingestion of MP. In this context, determination of MP concentration is essential. However, cheap, rapid, and accurate quantification of MP remains a challenge to this date. This study proposes a deep learning-based image segmentation method that properly distinguishes fluorescent MP from other elements in a given microscopy image. A total of nine different deep learning models, six of which are based on U-Net, were investigated. These models were trained using at least 20,000 patches sampled from 99 fluorescence microscopy images of MP and their corresponding binary masks. MP-Net, which is derived from U-Net, was found to be the best performing model, exhibiting the highest mean F1-score (0.736) and mean IoU value (0.617). Test-time augmentation (using brightness, contrast, and HSV) was applied to MP-Net for robust learning. However, compared to the results obtained without augmentation, no clear improvement in predictive performance could be observed. Recovery assessment for both spiked and real images showed that, compared to already existing tools for MP quantification, the MP quantities predicted by MP-Net are those closest to the ground truth. This observation suggests that MP-Net allows creating masks that more accurately reflect the quantitative presence of fluorescent MP in microscopy images. Finally, MAP (Microplastics Annotation Package) is introduced, an integrated software environment for automated MP quantification, offering support for MP-Net, already existing MP analysis tools like MP-VAT, manual annotation, and model fine-tuning.


Subject(s)
Bivalvia , Deep Learning , Animals , Humans , Image Processing, Computer-Assisted/methods , Microplastics , Microscopy, Fluorescence , Plastics
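
The F1-score and IoU reported in the abstract above are standard overlap measures between a predicted and a ground-truth binary mask. A minimal sketch of how they can be computed for one image, assuming masks given as equally sized 2-D lists of 0/1 values; treating two empty masks as perfect agreement is an assumption, and how the paper averages across images is not specified here.

```python
from typing import Sequence, Tuple

Mask = Sequence[Sequence[int]]  # 2-D binary mask: 1 = microplastic pixel


def mask_f1_iou(pred: Mask, truth: Mask) -> Tuple[float, float]:
    """Pixel-wise F1-score and IoU between a predicted and a ground-truth mask."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    tp = fp = fn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    errors = fp + fn
    if tp + errors == 0:  # both masks empty: treated as perfect agreement
        return 1.0, 1.0
    f1 = 2 * tp / (2 * tp + errors)
    iou = tp / (tp + errors)
    return f1, iou


print(mask_f1_iou([[1, 1], [0, 0]], [[1, 0], [1, 0]]))  # F1 = 0.5, IoU = 1/3
```

For a single image the two metrics are linked by IoU = F1 / (2 - F1); means taken over many images need not satisfy that identity exactly.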
8.
Pharmaceuticals (Basel) ; 15(3)2022 Mar 04.
Article in English | MEDLINE | ID: mdl-35337108

ABSTRACT

Protein therapeutics play an important role in controlling the functions and activities of disease-causing proteins in modern medicine. Despite protein therapeutics having several advantages over traditional small-molecule therapeutics, further development has been hindered by drug complexity and delivery issues. However, recent progress in deep learning-based protein structure prediction approaches, such as AlphaFold2, opens new opportunities to exploit the complexity of these macro-biomolecules for highly specialised design to inhibit, regulate or even manipulate specific disease-causing proteins. Anti-CRISPR proteins are small proteins from bacteriophages that counter-defend against the prokaryotic adaptive immunity of CRISPR-Cas systems. They are unique examples of natural protein therapeutics that have been optimized by the host-parasite evolutionary arms race to inhibit a wide variety of host proteins. Here, we show that these anti-CRISPR proteins display diverse inhibition mechanisms through accurate structural prediction and functional analysis. We find that these phage-derived proteins are extremely distinct in structure, some of which have no homologues in the current protein structure domain. Furthermore, we find a novel family of anti-CRISPR proteins which are structurally similar to the recently discovered mechanism of manipulating host proteins through enzymatic activity, rather than through direct interference. Using highly accurate structure prediction, we present a wide variety of protein-manipulating strategies of anti-CRISPR proteins for future protein drug design.

9.
Nat Commun ; 12(1): 6414, 2021 11 05.
Article in English | MEDLINE | ID: mdl-34741024

ABSTRACT

While transcriptome- and proteome-wide technologies to assess processes in protein biogenesis are now widely available, we still lack global approaches to assay post-ribosomal biogenesis events, in particular those occurring in the eukaryotic secretory system. We here develop a method, SECRiFY, to simultaneously assess the secretability of >10⁵ protein fragments by two yeast species, S. cerevisiae and P. pastoris, using custom fragment libraries, surface display and a sequencing-based readout. Screening human proteome fragments with a median size of 50-100 amino acids, we generate datasets that enable datamining into protein features underlying secretability, revealing a striking role for intrinsic disorder and chain flexibility. The SECRiFY methodology generates sufficient amounts of annotated data for advanced machine learning methods to deduce secretability patterns. The finding that secretability is indeed a learnable feature of protein sequences provides a solid base for application-focused studies.


Subject(s)
Saccharomyces cerevisiae/metabolism , Humans , Proteome/genetics , Proteome/physiology , Transcriptome/genetics , Transcriptome/physiology
10.
Bioinformatics ; 36(21): 5159-5168, 2021 01 29.
Article in English | MEDLINE | ID: mdl-32692832

ABSTRACT

MOTIVATION: Genetically engineering food crops involves introducing proteins from other species into crop plant species or modifying already existing proteins with gene editing techniques. In addition, newly synthesized proteins can be used as therapeutic protein drugs against diseases. For both research and safety regulation purposes, being able to assess the potential toxicity of newly introduced/synthesized proteins is of high importance. RESULTS: In this study, we present ToxDL, a deep learning-based approach for in silico prediction of protein toxicity from sequence alone. ToxDL consists of (i) a module encompassing a convolutional neural network that has been designed to handle variable-length input sequences, (ii) a domain2vec module for generating protein domain embeddings and (iii) an output module that classifies proteins as toxic or non-toxic, using the outputs of the two aforementioned modules. Independent test results obtained for animal proteins and cross-species transferability results obtained for bacteria proteins indicate that ToxDL outperforms traditional homology-based approaches and state-of-the-art machine-learning techniques. Furthermore, through visualizations based on saliency maps, we are able to verify that the proposed network learns known toxic motifs. Moreover, the saliency maps allow for directed in silico modification of a sequence, thus making it possible to alter its predicted protein toxicity. AVAILABILITY AND IMPLEMENTATION: ToxDL is freely available at http://www.csbio.sjtu.edu.cn/bioinf/ToxDL/. The source code can be found at https://github.com/xypan1232/ToxDL. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Deep Learning , Machine Learning , Neural Networks, Computer , Proteins/genetics , Software
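
One common way for a convolutional network to accept variable-length input, as the ToxDL sequence module above is described as doing, is to slide filters along the one-hot-encoded sequence and then take a global maximum over positions, which yields one feature per filter whatever the length. The abstract does not give ToxDL's layers, so this is only a toy forward pass of that general technique: the hand-set "CC" and "KW" motif filters are hypothetical, there is no training, and the domain2vec and output modules are not shown.

```python
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

Matrix = List[List[float]]


def one_hot(protein: str) -> Matrix:
    """Encode a protein as a (length x 20) one-hot matrix."""
    return [[1.0 if aa == res else 0.0 for aa in AMINO_ACIDS] for res in protein]


def motif_filter(motif: str) -> Matrix:
    """Hypothetical hand-set filter that scores +1 per matching residue."""
    return one_hot(motif)


def conv_global_max(x: Matrix, filters: Sequence[Matrix]) -> List[float]:
    """Slide each filter along the sequence, apply ReLU and keep the maximum
    activation over all positions: one feature per filter, for any length."""
    features = []
    for w in filters:
        width = len(w)
        best = 0.0  # ReLU activations are >= 0, so 0.0 is a safe starting maximum
        for start in range(len(x) - width + 1):
            activation = sum(
                w[k][c] * x[start + k][c]
                for k in range(width)
                for c in range(len(AMINO_ACIDS))
            )
            best = max(best, activation)
        features.append(best)
    return features


filters = [motif_filter("CC"), motif_filter("KW")]
short = conv_global_max(one_hot("MKCCW"), filters)
long_ = conv_global_max(one_hot("MAAAAAAAAAKWAAAAAAAAAAAAAAAAA"), filters)
print(short, long_)  # two fixed-size vectors from inputs of different lengths
```

Because the pooled vector always has one entry per filter, it can feed a fixed-size classifier layer, and the position of the maximum is also what saliency-style analyses trace back to sequence motifs.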
11.
Phys Rev Lett ; 124(9): 097201, 2020 Mar 06.
Article in English | MEDLINE | ID: mdl-32202867

ABSTRACT

Although artificial neural networks have recently been proven to provide a promising new framework for constructing quantum many-body wave functions, the parametrization of a quantum wave function with non-abelian symmetries in terms of a Boltzmann machine inherently leads to biased results due to the basis dependence. We demonstrate that this problem can be overcome by sampling in the basis of irreducible representations instead of spins, for which the corresponding ansatz respects the non-abelian symmetries of the system. We apply our methodology to find the ground states of the one-dimensional antiferromagnetic Heisenberg (AFH) model with spin-1/2 and spin-1 degrees of freedom, and obtain a substantially higher accuracy than when using the s_z basis as an input to the neural network. The proposed ansatz can target excited states, which is illustrated by calculating the energy gap of the AFH model. We also generalize the framework to the case of anyonic spin chains.
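
For orientation, the standard spin-basis construction the abstract above contrasts with: the one-dimensional antiferromagnetic Heisenberg chain and the widely used restricted Boltzmann machine (RBM) wave-function ansatz, with its M hidden units h_j summed out. Here the visible variables s_i are the local s_z quantum numbers, and a_i, b_j, W_ij are the (possibly complex) network parameters. This is the generic form, not the irreducible-representation ansatz proposed in the paper, and boundary conditions are left unspecified.

```latex
H = J \sum_{i} \mathbf{S}_i \cdot \mathbf{S}_{i+1}, \qquad J > 0

\psi_{\mathrm{RBM}}(s_1, \dots, s_N)
  = \sum_{\{h_j = \pm 1\}} \exp\Big( \sum_i a_i s_i + \sum_j b_j h_j + \sum_{i,j} W_{ij} s_i h_j \Big)
  = e^{\sum_i a_i s_i} \prod_{j=1}^{M} 2\cosh\Big( b_j + \sum_i W_{ij} s_i \Big)
```

Because the inputs s_i refer to a single quantization axis, this form is not manifestly invariant under the SU(2) symmetry of H; one way to read the basis dependence the abstract describes.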

12.
ACS Appl Mater Interfaces ; 11(31): 27997-28004, 2019 Aug 07.
Article in English | MEDLINE | ID: mdl-31302998

ABSTRACT

Electrochromic devices, serving as smart glasses, are not yet intelligent enough to regulate lighting conditions independently of external photosensing devices. Meanwhile, their bulky sandwich structures have held them back as reflective displays competing with mature emissive displays. The key to resolving both problems lies in incorporating the photosensing function into electrochromic devices while simplifying their configuration by replacing ionic electrolytes. However, this has not been achieved so far because of the essential operating difference between optoelectronic devices and ionic devices. Herein, a concept of a smarter and thinner device, the "electrochromic photodetector", is proposed to solve these problems. It is all-solid-state and electrolyte-free, and operates with a simple thin metal-semiconductor-metal structure via an electrolytic mechanism. As a proof of concept, a configuration of the electrochromic photodetector is presented in this work based on a tungsten trioxide (WO3) thin film deposited on Au electrodes via facile, low-cost solution processes. The electrochromic photodetector switches between its photosensing and electrochromic functions via voltage modulation within 5 V, which is the result of the semiconductor-metal transition. The transition mechanism is further analyzed to be the voltage-triggered reversible oxygen/water vapor adsorption/intercalation from ambient air.

13.
Bioinformatics ; 34(24): 4180-4188, 2018 12 15.
Article in English | MEDLINE | ID: mdl-29931149

ABSTRACT

Motivation: During the last decade, improvements in high-throughput sequencing have generated a wealth of genomic data. Functionally interpreting these sequences and finding the biological signals that are hallmarks of gene function and regulation is currently mostly done using automated genome annotation platforms, which mainly rely on integrated machine learning frameworks to identify different functional sites of interest, including splice sites. Splicing is an essential step in the gene regulation process, and the correct identification of splice sites is a major cornerstone in a genome annotation system. Results: In this paper, we present SpliceRover, a predictive deep learning approach that outperforms the state-of-the-art in splice site prediction. SpliceRover uses convolutional neural networks (CNNs), which have been shown to obtain cutting-edge performance on a wide variety of prediction tasks. We adapted this approach to deal with genomic sequence inputs, and show it consistently outperforms already existing approaches, with relative improvements in prediction effectiveness of up to 80.9% when measured in terms of false discovery rate. However, a major criticism of CNNs concerns their 'black box' nature, as mechanisms to obtain insight into their reasoning processes are limited. To facilitate interpretability of the SpliceRover models, we introduce an approach to visualize the biologically relevant information learnt. We show that our visualization approach is able to recover features known to be important for splice site prediction (binding motifs around the splice site, presence of polypyrimidine tracts and branch points), as well as reveal new features (e.g. several types of exclusion patterns near splice sites). Availability and implementation: SpliceRover is available as a web service. The prediction tool and instructions can be found at http://bioit2.irc.ugent.be/splicerover/. 
Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Machine Learning , Neural Networks, Computer , RNA Splicing , Computational Biology , Genomics , High-Throughput Nucleotide Sequencing , Software
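
The SpliceRover abstract above reports relative improvements "in terms of false discovery rate". A minimal sketch of that arithmetic, assuming "relative improvement" means the relative reduction in FDR between a baseline and a new predictor; the paper's exact definition and decision thresholds are not given in the abstract, and the counts in the example are invented for illustration, not results.

```python
def false_discovery_rate(tp: int, fp: int) -> float:
    """FDR = FP / (TP + FP): the fraction of predicted splice sites that are wrong."""
    predicted = tp + fp
    return fp / predicted if predicted else 0.0


def relative_fdr_reduction(fdr_baseline: float, fdr_new: float) -> float:
    """Fraction of the baseline's false-discovery rate removed by the new model."""
    if fdr_baseline <= 0:
        raise ValueError("baseline FDR must be positive")
    return (fdr_baseline - fdr_new) / fdr_baseline


baseline = false_discovery_rate(tp=900, fp=100)  # 0.1 (hypothetical counts)
improved = false_discovery_rate(tp=950, fp=50)   # 0.05 (hypothetical counts)
print(f"{relative_fdr_reduction(baseline, improved):.0%}")  # 50%
```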
14.
Bioinformatics ; 34(3): 425-433, 2018 02 01.
Article in English | MEDLINE | ID: mdl-29028894

ABSTRACT

Motivation: The past decade has seen the introduction of new technologies that significantly lowered the cost of genome sequencing. As a result, the amount of genomic data that must be stored and transmitted is increasing exponentially. To mitigate storage and transmission issues, we introduce a framework for lossless compression of quality scores. Results: This article proposes AQUa, an adaptive framework for lossless compression of quality scores. To compress these quality scores, AQUa makes use of a configurable set of coding tools, extended with a Context-Adaptive Binary Arithmetic Coding scheme. When benchmarking AQUa against generic single-pass compressors, file sizes are reduced by up to 38.49% when comparing with GNU Gzip and by up to 6.48% when comparing with 7-Zip at the Ultra Setting, while still providing support for random access. When comparing AQUa with the purpose-built, single-pass, and state-of-the-art compressor SCALCE, which does not support random access, file sizes are reduced by up to 21.14%. When comparing AQUa with the purpose-built, dual-pass, and state-of-the-art compressor QVZ, which does not support random access, file sizes are larger by 6.42-33.47%. However, for one test file, the file size is 0.38% smaller, illustrating the strength of our single-pass compression framework. This work has been spurred by the current activity on genomic information representation (MPEG-G) within the ISO/IEC SC29/WG11 technical committee. Availability and implementation: The software is available on Github: https://github.com/tparidae/AQUa. Contact: tom.paridaens@ugent.be.


Subject(s)
Data Compression/methods , High-Throughput Nucleotide Sequencing/methods , Metadata , Software , Algorithms , Escherichia coli/genetics , Genomics/methods , Humans , Sequence Analysis, DNA/methods , Sequence Analysis, RNA/methods
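
Context-adaptive binary arithmetic coding, used by AQUa above, gains its efficiency from probability models that are conditioned on context and updated after every coded bit. CABAC itself (from H.264/AVC) combines binarization, a table-driven finite-state probability estimator and an actual arithmetic coder; the sketch below substitutes simple Laplace-smoothed counts and only reports the ideal code length (the sum of -log2 p) that an arithmetic coder approaches. It is a simplified illustration on hypothetical bit sequences, not AQUa's coder, and how AQUa binarizes quality scores is not described here.

```python
import math
from typing import Dict, Sequence, Tuple


def adaptive_code_length(bits: Sequence[int], context_order: int = 1) -> float:
    """Ideal code length (in bits) of a binary sequence under an adaptive model
    that conditions on the previous `context_order` bits and updates its
    counts after every symbol, as an arithmetic coder's model would."""
    counts: Dict[Tuple[int, ...], Tuple[int, int]] = {}
    history: Tuple[int, ...] = ()
    total = 0.0
    for bit in bits:
        zeros, ones = counts.get(history, (1, 1))  # Laplace-smoothed start
        p = (ones if bit else zeros) / (zeros + ones)
        total -= math.log2(p)
        counts[history] = (zeros + (0 if bit else 1), ones + (1 if bit else 0))
        if context_order > 0:
            history = (history + (bit,))[-context_order:]
    return total


alternating = [0, 1] * 500
print(adaptive_code_length(alternating, context_order=0))  # about 1000 bits
print(adaptive_code_length(alternating, context_order=1))  # under 20 bits
```

The alternating sequence looks incompressible to a context-free model but becomes almost free once the previous bit is used as context, which is the effect context modelling exploits in real symbol streams.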
15.
Bioinformatics ; 33(10): 1464-1472, 2017 May 15.
Article in English | MEDLINE | ID: mdl-28057687

ABSTRACT

MOTIVATION: The past decade has seen the introduction of new technologies that have increasingly lowered the cost of genomic sequencing. The cost of sequencing is even dropping significantly faster than the cost of storage and transmission. This motivates a need for continuous improvements in genomic data compression, not only at the level of effectiveness (compression rate), but also at the level of functionality (e.g. random access), configurability (effectiveness versus complexity, coding tool set …) and versatility (support for both sequenced reads and assembled sequences). In that regard, current approaches mostly do not support random access, requiring full files to be transmitted, and are restricted to either read or sequence compression. RESULTS: We propose AFRESh, an adaptive framework for no-reference compression of genomic data with random access functionality, targeting the effective representation of the raw genomic symbol streams of both reads and assembled sequences. AFRESh makes use of a configurable set of prediction and encoding tools, extended by a Context-Adaptive Binary Arithmetic Coding scheme (CABAC), to compress raw genetic codes. To the best of our knowledge, our paper is the first to describe an effective implementation of CABAC outside of its original application. By applying CABAC, the compression effectiveness improves by up to 19% for assembled sequences and up to 62% for reads. By applying AFRESh to the genomic symbols of the MPEG genomic compression test set for reads, a compression gain of up to 51% is achieved compared to SCALCE, 42% compared to LFQC and 44% compared to ORCOM. When compared with generic compression approaches, a compression gain of up to 41% is achieved compared to GNU Gzip and 22% compared to 7-Zip at the Ultra setting.
Additionally, when compressing assembled sequences of the Human Genome, a compression gain of up to 34% is achieved compared to GNU Gzip and 16% compared to 7-Zip at the Ultra setting. AVAILABILITY AND IMPLEMENTATION: A Windows executable version can be downloaded at https://github.com/tparidae/AFresh . CONTACT: tom.paridaens@ugent.be.


Subject(s)
Data Compression/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Software , Algorithms , Bacteria/genetics , Genome , Genomics/methods , Humans , Plants/genetics
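
As a point of reference for the random-access requirement discussed in the AFRESh abstract above, the simplest no-reference representation of a nucleotide stream is a fixed 2-bit code per base, which makes any base directly addressable without decoding the rest of the file. A minimal sketch, assuming an uppercase A/C/G/T alphabet only (no N or other IUPAC codes); this is a naive baseline, not AFRESh's predictors or CABAC coder, and the function names are hypothetical.

```python
NUCLEOTIDES = "ACGT"
CODE = {base: i for i, base in enumerate(NUCLEOTIDES)}


def pack(seq: str) -> bytes:
    """Pack an A/C/G/T string at 2 bits per base, 4 bases per byte."""
    data = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        data[i // 4] |= CODE[base] << (2 * (i % 4))
    return bytes(data)


def base_at(packed: bytes, i: int) -> str:
    """Random access: decode base i by reading a single byte."""
    return NUCLEOTIDES[(packed[i // 4] >> (2 * (i % 4))) & 0b11]


def unpack(packed: bytes, length: int) -> str:
    """Decode the first `length` bases of a packed sequence."""
    return "".join(base_at(packed, i) for i in range(length))


packed = pack("GATTACA")
print(len(packed), base_at(packed, 3), unpack(packed, 7))  # 2 T GATTACA
```

A fixed-length code cannot exploit skewed or context-dependent symbol statistics; adaptive entropy coding can, at the cost of needing explicit entry points (for example, independently coded blocks) to keep random access.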