Results 1 - 20 of 165
1.
Genes (Basel) ; 12(11), 2021 11 18.
Article in English | MEDLINE | ID: mdl-34828413

ABSTRACT

Inherited bleeding disorders (IBDs) are the most frequent congenital diseases in the Colombian population; three of them are hemophilia A (HA), hemophilia B (HB), and von Willebrand disease (VWD). Currently, diagnosis relies on multiple clinical laboratory assays to assign a phenotype. Because access to these tests is limited, patients can receive an incomplete diagnosis. In these cases, genetic studies reinforce the clinical diagnosis. The present study characterized the molecular genetic basis of 11 HA, three HB, and five VWD patients by sequencing the F8, F9, or VWF gene. Twelve variations were found in HA patients, four in HB patients, and 19 in VWD patients. Of these, 25 were novel. Disease-causing variations were used as positive controls to validate the high-resolution melting (HRM) variant-scanning technique, a low-cost genetic diagnostic method proposed for adoption in developing countries. For the data analysis, we developed accessible open-source Python code that improves HRM data analysis, reaching a sensitivity of 95% and avoiding bias across different HRM equipment and software. Amplicons longer than 300 bp can be analyzed by denaturation domains.


Subjects
Inherited Blood Coagulation Disorders/diagnosis , Computational Biology/methods , Factor IX/genetics , Genetic Testing/methods , Hemophilia A/genetics , von Willebrand Factor/genetics , Inherited Blood Coagulation Disorders/genetics , Colombia , Computational Biology/economics , Computational Biology/standards , Costs and Cost Analysis , Factor IX/chemistry , Genetic Testing/economics , Genetic Testing/standards , Hemophilia A/diagnosis , Humans , Protein Domains , Sensitivity and Specificity , von Willebrand Factor/chemistry
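Note on item 1: the abstract mentions Python code for HRM analysis and a sensitivity of 95%, but gives no implementation details. The sketch below is not the authors' code. It only illustrates two generic ideas, under the assumption that a melt curve is a list of temperatures (strictly increasing) with matching fluorescence readings: the melting temperature taken as the peak of the negative derivative -dF/dT, and sensitivity computed as TP / (TP + FN). All function names are hypothetical.

```python
def negative_derivative(temps, fluorescence):
    """Return interval midpoints and -dF/dT between consecutive readings.

    Assumes temps is strictly increasing (hypothetical input format).
    """
    mids, slopes = [], []
    for i in range(len(temps) - 1):
        dt = temps[i + 1] - temps[i]
        mids.append((temps[i] + temps[i + 1]) / 2)
        slopes.append(-(fluorescence[i + 1] - fluorescence[i]) / dt)
    return mids, slopes


def melting_temperature(temps, fluorescence):
    """Estimate Tm as the midpoint where -dF/dT is largest."""
    mids, slopes = negative_derivative(temps, fluorescence)
    peak = max(range(len(slopes)), key=slopes.__getitem__)
    return mids[peak]


def sensitivity(true_positives, false_negatives):
    """Fraction of known variant samples that the scan flags."""
    return true_positives / (true_positives + false_negatives)


if __name__ == "__main__":
    temps = [80.0, 81.0, 82.0, 83.0, 84.0]
    fluor = [1.0, 0.95, 0.5, 0.1, 0.05]  # made-up readings
    print(melting_temperature(temps, fluor))
    print(sensitivity(19, 1))
```

Real HRM pipelines normally normalize and smooth the curves before differentiating; those steps are omitted here.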
2.
Nat Genet ; 53(7): 1104-1111, 2021 07.
Article in English | MEDLINE | ID: mdl-34083788

ABSTRACT

Inexpensive genotyping methods are essential to modern genomics. Here we present QUILT, which performs diploid genotype imputation using low-coverage whole-genome sequence data. QUILT employs Gibbs sampling to partition reads into maternal and paternal sets, facilitating rapid haploid imputation using large reference panels. We show this partitioning to be accurate over many megabases, enabling highly accurate imputation close to theoretical limits and outperforming existing methods. Moreover, QUILT can impute accurately using diverse technologies, including long reads from Oxford Nanopore Technologies, and a new form of low-cost barcoded Illumina sequencing called haplotagging, with the latter showing improved accuracy at low coverages. Relative to DNA genotyping microarrays, QUILT offers improved accuracy at reduced cost, particularly for diverse populations that are traditionally underserved in modern genomic analyses, with accuracy nearly doubling at rare SNPs. Finally, QUILT can accurately impute (four-digit) human leukocyte antigen types, the first such method from low-coverage sequence data.


Subjects
Computational Biology/methods , Genotype , Genotyping Techniques , Whole Genome Sequencing , Computational Biology/economics , Diploidy , Humans , Single Nucleotide Polymorphism , Reproducibility of Results , DNA Sequence Analysis
3.
J Comput Biol ; 28(5): 485-500, 2021 05.
Article in English | MEDLINE | ID: mdl-34014778

ABSTRACT

Gene expression profiling makes it possible to conduct many biological studies in a variety of fields due to its thorough characterization of cellular states under various experimental conditions. Despite recent advances in high-throughput technology, profiling an entire set of genomes is still difficult and expensive. Due to the high correlation between expression patterns of different genes, the aforementioned problem can be solved with a cost-effective approach that collects only a small subset of genes, called landmark genes, representing the entire set of genes, and infers the remaining genes, called target genes, using a computational model. There are several shallow and deep regression models in the literature to estimate the expression of target genes from the landmark genes. However, the shallow models mostly have limited capacity to learn nonlinear and complex gene expression data and are prone to underfitting, while the deep models generally do not take advantage of the correlation among target genes in the learning process and suffer from overfitting. Considering gene expression inference as a multitask learning problem, we propose a new deep multitask learning algorithm to tackle these issues. Our learning framework automatically learns the correlation between target genes and uses this knowledge to improve its generalization. Specifically, we utilize a subnetwork with low-dimensional latent variables to discover the relationships between target genes and enforce a seamless, easy-to-implement regularization on our deep regression model. Unlike existing multitask learning methods that can only deal with dozens or hundreds of tasks, our algorithm is able to efficiently learn the relationships between ∼10,000 target genes and, thus, is scalable to a large number of tasks. Our proposed method outperforms the shallow and deep regression models for gene expression inference and alternative multitask learning algorithms on two large-scale datasets regardless of the network architecture.


Subjects
Computational Biology/methods , Gene Expression Profiling/methods , Gene Regulatory Networks , Algorithms , Computational Biology/economics , Gene Expression Profiling/economics , Gene Expression Regulation , Machine Learning , Genetic Models
4.
N Biotechnol ; 60: 113-123, 2021 Jan 25.
Article in English | MEDLINE | ID: mdl-33045418

ABSTRACT

In the area of human-made innovations to improve the quality of life, biocatalysis has already had a great impact and contributed enormously to a growing number of catalytic transformations aimed at the detection and analysis of compounds, the bioconversion of starting materials and the preparation of target compounds at any scale, from laboratory small scale to industrial large scale. The key enabling tools which have been developed in biocatalysis over the last decades also provide great opportunities for further development and numerous applications in various sectors of the global bioeconomy. Systems biocatalysis is a modular, bottom-up approach to designing the architecture of enzyme-catalyzed reaction steps in a synthetic route from starting materials to target molecules. The integration of biocatalysis and sustainable chemistry in vitro aims at ideal conversions with high molecular economy and their intensification. Retrosynthetic analysis in the chemical and biological domain has been a valuable tool for target-oriented synthesis while, on the other hand, diversity-oriented synthesis builds on forward-looking analysis. Bioinformatic tools for rapid identification of the required enzyme functions, efficient enzyme production systems, as well as generalized bioprocess design tools, are important for rapid prototyping of the biocatalytic reactions. The tools for enzyme engineering and the reaction engineering of each enzyme-catalyzed one-step reaction are also valuable for coupling reactions. The tools to overcome interaction issues with other components or enzymes are of great interest in designing multi-step reactions as well as in biocatalytic total synthesis.


Subjects
Biotechnology , Computational Biology , Enzymes/metabolism , Biocatalysis , Biotechnology/economics , Computational Biology/economics , Enzymes/economics , Humans , Protein Engineering , Quality of Life
5.
RNA ; 26(11): 1731-1742, 2020 11.
Article in English | MEDLINE | ID: mdl-32759389

ABSTRACT

The measurement of RNA abundance derived from massively parallel sequencing experiments is an essential technique. Methods that reduce ribosomal RNA levels are usually required prior to sequencing library construction because ribosomal RNA typically comprises the vast majority of a total RNA sample. For some experiments, ribosomal RNA depletion is favored over poly(A) selection because it offers a more inclusive representation of the transcriptome. However, methods to deplete ribosomal RNA are generally proprietary, complex, inefficient, applicable to only specific species, or compatible with only a narrow range of RNA input levels. Here, we describe Ribo-Pop (ribosomal RNA depletion for popular use), a simple workflow and antisense oligo design strategy that we demonstrate works over a wide input range and can be easily adapted to any organism with a sequenced genome. We provide a computational pipeline for probe selection, a streamlined 20-min protocol, and ready-to-use oligo sequences for several organisms. We anticipate that our simple and generalizable "open source" design strategy would enable virtually any laboratory to pursue full transcriptome sequencing in their organism of interest with minimal time and resource investment.


Subjects
Computational Biology/methods , Antisense Oligoribonucleotides/genetics , Ribosomal RNA/analysis , Base Sequence , Computational Biology/economics , Cost-Benefit Analysis , High-Throughput Nucleotide Sequencing , Oligonucleotide Probes/genetics , Ribosomal RNA/antagonists & inhibitors , RNA Sequence Analysis/methods , Workflow
6.
N Biotechnol ; 59: 88-96, 2020 Nov 25.
Article in English | MEDLINE | ID: mdl-32750680

ABSTRACT

The transition to a sustainable bio-based circular economy requires cutting edge technologies that ensure economic growth with environmentally responsible action. This transition will only be feasible when the opportunities of digitalization are also exploited. Digital methods and big data handling have already found their way into life sciences and generally offer huge potential in various research areas. While computational analyses of microbial metagenome data have become state of the art, the true potential of bioinformatics remains mostly untapped so far. In this article we present challenges and opportunities of digitalization including multi-omics approaches in discovering and exploiting the microbial diversity of the planet with the aim to identify robust biocatalysts for application in sustainable bioprocesses as part of the transition from a fossil-based to a bio-based circular economy. This will contribute to solving global challenges, including utilization of natural resources, food supply, health, energy and the environment.


Subjects
Biotechnology/economics , Computational Biology/economics , Economic Development , Enzymes/economics , Metagenomics/economics , Deep Learning , Enzymes/metabolism
7.
Nat Methods ; 17(8): 793-798, 2020 08.
Article in English | MEDLINE | ID: mdl-32719530

ABSTRACT

Massively parallel single-cell and single-nucleus RNA sequencing has opened the way to systematic tissue atlases in health and disease, but as the scale of data generation is growing, so is the need for computational pipelines for scaled analysis. Here we developed Cumulus, a cloud-based framework for analyzing large-scale single-cell and single-nucleus RNA sequencing datasets. Cumulus combines the power of cloud computing with improvements in algorithm and implementation to achieve high scalability, low cost, user-friendliness and integrated support for a comprehensive set of features. We benchmark Cumulus on the Human Cell Atlas Census of Immune Cells dataset of bone marrow cells and show that it substantially improves efficiency over conventional frameworks, while maintaining or improving the quality of results, enabling large-scale studies.


Subjects
Cloud Computing/economics , Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , RNA Sequence Analysis/methods , Single-Cell Analysis/methods , Computational Biology/economics , High-Throughput Nucleotide Sequencing/economics , RNA Sequence Analysis/economics
9.
PLoS Comput Biol ; 16(3): e1007531, 2020 03.
Article in English | MEDLINE | ID: mdl-32214318

ABSTRACT

Life scientists are increasingly turning to high-throughput sequencing technologies in their research programs, owing to the enormous potential of these methods. In parallel, the number of core facilities that provide bioinformatics support is also increasing. Notably, the generation of complex large datasets has necessitated the development of bioinformatics support core facilities that aid laboratory scientists with cost-effective and efficient data management, analysis, and interpretation. In this article, we address the challenges (related to communication, good laboratory practice, and data handling) that may be encountered in core support facilities when providing bioinformatics support, drawing on our own experiences working as support bioinformaticians on multidisciplinary research projects. Most importantly, the article proposes a list of guidelines that outline how these challenges can be preemptively avoided and effectively managed to increase the value of outputs to the end user, covering the entire research project lifecycle, including experimental design, data analysis, and management (i.e., sharing and storage). In addition, we highlight the importance of clear and transparent communication, comprehensive preparation, appropriate handling of samples and data using monitoring systems, and the employment of appropriate tools and standard operating procedures to provide effective bioinformatics support.


Subjects
Computational Biology/economics , Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Biomedical Research/economics , Biomedical Research/methods , Communication , Computational Biology/standards , High-Throughput Nucleotide Sequencing/economics , High-Throughput Nucleotide Sequencing/standards , Humans , Research Design/standards
10.
Methods Mol Biol ; 2072: 39-50, 2020.
Article in English | MEDLINE | ID: mdl-31541437

ABSTRACT

Transposable elements can be highly mutagenic because when they transpose they can insert into genes and disrupt their function, a propensity which has been exploited in many organisms to generate tagged mutant alleles. The Mutator (Mu) family is a family of DNA-type transposons in maize with a particularly high duplication frequency, which results in large numbers of new mutations in lineages that carry active Mu elements. Here we describe a rapid and cost-effective MiSeq-based Mu transposon profiling pipeline. This method can also be used for identifying flanking sequences of other types of long insertions such as T-DNAs.


Subjects
DNA Mutational Analysis , DNA Transposable Elements , High-Throughput Nucleotide Sequencing , Insertional Mutagenesis , Zea mays/genetics , Computational Biology/economics , Computational Biology/methods , Cost-Benefit Analysis , DNA Mutational Analysis/economics , DNA Mutational Analysis/methods , Genetic Databases , Gene Duplication , Genetic Loci , Plant Genome , Genomics/methods , High-Throughput Nucleotide Sequencing/economics , High-Throughput Nucleotide Sequencing/methods , Software
11.
Brief Bioinform ; 21(2): 486-497, 2020 03 23.
Article in English | MEDLINE | ID: mdl-30753282

ABSTRACT

A biological network is complex. A group of critical nodes determines the quality and state of such a network. Increasing studies have shown that diseases and biological networks are closely and mutually related and that certain diseases are often caused by errors occurring in certain nodes in biological networks. Thus, studying biological networks and identifying critical nodes can help determine the key targets in treating diseases. The problem is how to find the critical nodes in a network efficiently and with low cost. Existing experimental methods in identifying critical nodes generally require much time, manpower and money. Accordingly, many scientists are attempting to solve this problem by researching efficient and low-cost computing methods. To facilitate calculations, biological networks are often modeled as several common networks. In this review, we classify biological networks according to the network types used by several kinds of common computational methods and introduce the computational methods used by each type of network.


Subjects
Computational Biology/methods , Algorithms , Computational Biology/economics , Costs and Cost Analysis , Essential Genes , Proteins/metabolism
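Note on item 11: the abstract refers to low-cost computational methods for finding critical nodes but does not name a specific one. As a hedged illustration only, the sketch below ranks nodes of an undirected network by degree centrality, one of the simplest such measures. The edge-list input format, the function names, and the toy protein labels are assumptions made for this example and are not taken from the review.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Map each node to degree / (n - 1) for an undirected edge list."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops in this simple sketch
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


def top_nodes(edges, k=1):
    """Return the k highest-scoring nodes, ties broken by node name."""
    scores = degree_centrality(edges)
    return sorted(scores, key=lambda node: (-scores[node], node))[:k]


if __name__ == "__main__":
    # Toy interaction network with made-up labels.
    toy_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]
    print(top_nodes(toy_edges, k=2))
```

Degree centrality looks only at direct neighbors; measures such as betweenness or closeness centrality also account for paths through the network, at higher computational cost.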
12.
J Biotechnol ; 299: 72-78, 2019 Jun 20.
Article in English | MEDLINE | ID: mdl-31054297

ABSTRACT

Low-coverage massively parallel genome sequencing for non-invasive prenatal testing (NIPT) of common aneuploidies is one of the most rapidly adopted and relatively low-cost DNA tests. Since aggregating reads from a large number of samples overcomes the extremely low coverage of individual samples, we describe the possible re-use of data generated during NIPT for genome-scale, population-specific frequency determination of small DNA variants, at no additional cost beyond that of the NIPT test itself. We applied our method to a data set comprising 1501 original NIPT test results and evaluated the findings on different levels, from in silico population frequency comparisons up to wet lab validation analyses using a gold-standard method based on Sanger sequencing. The high reliability of variant calling and allelic frequency determination suggests that these NIPT data could serve as a valuable alternative to large-scale population studies, even for smaller countries around the world.


Subjects
Genetic Variation , High-Throughput Nucleotide Sequencing/methods , Prenatal Diagnosis/methods , Computational Biology/economics , Female , Gene Frequency , High-Throughput Nucleotide Sequencing/economics , Humans , Pregnancy , Prenatal Diagnosis/economics , Reproducibility of Results , Slovakia , Whole Genome Sequencing/economics
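Note on item 12: the central idea of the abstract is that reads too sparse to genotype any single sample can still be pooled across many samples to estimate a population allele frequency. The sketch below shows only that pooling step, assuming each sample contributes a hypothetical (reference reads, alternate reads) pair at one site. It is not the authors' pipeline, and it ignores sequencing error and the mixed maternal and fetal origin of NIPT reads.

```python
def pooled_allele_frequency(sample_counts):
    """Estimate the alternate-allele frequency at one site from pooled reads.

    sample_counts: iterable of (ref_reads, alt_reads) pairs, one per sample.
    Returns None when no sample has any read covering the site.
    """
    counts = list(sample_counts)  # allow generators to be passed in
    ref_total = sum(ref for ref, _ in counts)
    alt_total = sum(alt for _, alt in counts)
    depth = ref_total + alt_total
    if depth == 0:
        return None
    return alt_total / depth


if __name__ == "__main__":
    # Four low-coverage samples; most have zero or one read at the site.
    site = [(1, 0), (0, 1), (0, 0), (2, 0)]
    print(pooled_allele_frequency(site))
```

With per-sample coverage well below 1x, most pairs are (0, 0), so the estimate becomes usable only when the number of samples is large, which is the situation the abstract describes.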
14.
PLoS One ; 14(2): e0209523, 2019.
Article in English | MEDLINE | ID: mdl-30759172

ABSTRACT

We present elPrep 4, a reimplementation from scratch of the elPrep framework for processing sequence alignment map files in the Go programming language. elPrep 4 includes multiple new features allowing us to process all of the preparation steps defined by the GATK Best Practice pipelines for variant calling. This includes new and improved functionality for sorting, (optical) duplicate marking, base quality score recalibration, BED and VCF parsing, and various filtering options. The implementations of these options in elPrep 4 faithfully reproduce the outcomes of their counterparts in GATK 4, SAMtools, and Picard, even though the underlying algorithms are redesigned to take advantage of elPrep's parallel execution framework to vastly improve the runtime and resource use compared to these tools. Our benchmarks show that elPrep executes the preparation steps of the GATK Best Practices up to 13x faster on WES data, and up to 7.4x faster for WGS data compared to running the same pipeline with GATK 4, while utilizing fewer compute resources.


Subjects
Sequence Analysis/methods , Algorithms , Computational Biology/economics , Computational Biology/methods , Costs and Cost Analysis , Exome , Sequence Analysis/economics , Software
15.
Brief Bioinform ; 20(4): 1215-1221, 2019 07 19.
Article in English | MEDLINE | ID: mdl-29092005

ABSTRACT

Sustainable noncommercial bioinformatics infrastructures are a prerequisite to use and take advantage of the potential of big data analysis for research and economy. Consequently, funders, universities and institutes as well as users ask for a transparent value model for the tools and services offered. In this article, a generally applicable lightweight method is described by which bioinformatics infrastructure projects can estimate the value of tools and services offered without determining exactly the total costs of ownership. Five representative scenarios for value estimation from a rough estimation to a detailed breakdown of costs are presented. To account for the diversity in bioinformatics applications and services, the notion of service-specific 'service provision units' is introduced together with the factors influencing them and the main underlying assumptions for these 'value influencing factors'. Special attention is given on how to handle personnel costs and indirect costs such as electricity. Four examples are presented for the calculation of the value of tools and services provided by the German Network for Bioinformatics Infrastructure (de.NBI): one for tool usage, one for (Web-based) database analyses, one for consulting services and one for bioinformatics training events. Finally, from the discussed values, the costs of direct funding and the costs of payment of services by funded projects are calculated and compared.


Subjects
Computational Biology/economics , Computational Biology/methods , Software/economics , Big Data/economics , Computational Biology/education , Consultants , Costs and Cost Analysis , Facility Design and Construction/economics , Humans , Information Services/economics , Economic Models , Web Browser/economics
16.
Curr Drug Targets ; 20(5): 488-500, 2019.
Article in English | MEDLINE | ID: mdl-30091413

ABSTRACT

BACKGROUND: Globally, the numbers of cancer patients and deaths continue to increase yearly, and cancer has therefore become one of the world's leading causes of morbidity and mortality. In recent years, the study of anticancer drugs has become one of the most popular medical topics. OBJECTIVE: In this review, to study the application of machine learning in predicting anticancer drug activity, machine learning approaches such as Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), Support Vector Machine (SVM), Random Forest (RF), k-Nearest Neighbor (kNN), and Naïve Bayes (NB) were selected, and examples of their applications in anticancer drug design are listed. RESULTS: Machine learning contributes substantially to anticancer drug design and helps researchers by saving time and cost. However, it can only be an assisting tool for drug design. CONCLUSION: This paper introduces the application of machine learning approaches in anticancer drug design. Many successful examples of identification and prediction of anticancer drug activity are discussed, and anticancer drug research is still in active progress. Moreover, the merits of some web servers related to anticancer drugs are mentioned.


Subjects
Antineoplastic Agents/chemical synthesis , Computational Biology/methods , Algorithms , Antineoplastic Agents/chemistry , Antineoplastic Agents/pharmacology , Bayes Theorem , Computational Biology/economics , Discriminant Analysis , Drug Design , Humans , Principal Component Analysis , Structure-Activity Relationship , Support Vector Machine
18.
Curr Drug Targets ; 20(5): 540-550, 2019.
Article in English | MEDLINE | ID: mdl-30277150

ABSTRACT

Enzymes are proteins that act as biological catalysts to speed up cellular biochemical processes. According to their main Enzyme Commission (EC) numbers, enzymes are divided into six categories: EC-1: oxidoreductase; EC-2: transferase; EC-3: hydrolase; EC-4: lyase; EC-5: isomerase and EC-6: synthetase. Different enzymes have different biological functions and act on different substrates. Therefore, knowing which family an enzyme belongs to can help infer its catalytic mechanism and provide information about the relevant biological function. With the large number of protein sequences flowing into databanks in the post-genomics age, annotating the family of an enzyme is very important. Since experimental methods are not cost-effective, bioinformatics tools will be a great help for accurately classifying enzyme families. In this review, we summarize the application of machine learning methods to enzyme family prediction from different aspects. We hope that this review will provide insights and inspiration for research on enzyme family classification.


Subjects
Computational Biology/methods , Enzymes/classification , Algorithms , Animals , Computational Biology/economics , Enzymes/genetics , Enzymes/metabolism , Humans , Machine Learning , Molecular Sequence Annotation , Multigene Family
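Note on item 18: the six top-level classes listed in the abstract can be read directly from the first field of an EC number. The sketch below is a plain lookup and not a prediction method from the review. The function name and the accepted input spellings are assumptions, and the class names follow the abstract's wording (class 6 is also commonly called ligases).

```python
import re

# Top-level EC classes as named in the abstract of item 18.
EC_CLASSES = {
    1: "oxidoreductase",
    2: "transferase",
    3: "hydrolase",
    4: "lyase",
    5: "isomerase",
    6: "synthetase",
}


def ec_main_class(ec_number):
    """Return the class name for strings like 'EC 3.4.21.4' or '1.1.1.1'.

    Returns None if the string has no leading class digit or the class
    is not one of the six listed above.
    """
    match = re.match(r"\s*(?:EC[\s:-]*)?(\d+)", ec_number, flags=re.IGNORECASE)
    if match is None:
        return None
    return EC_CLASSES.get(int(match.group(1)))


if __name__ == "__main__":
    for ec in ("EC 3.4.21.4", "1.1.1.1", "EC-6.3.2.1"):
        print(ec, "->", ec_main_class(ec))
```

A machine learning classifier of the kind the review surveys would predict this label from sequence features rather than from an existing EC annotation.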
19.
BMC Genomics ; 19(1): 574, 2018 Aug 01.
Article in English | MEDLINE | ID: mdl-30068294

ABSTRACT

BACKGROUND: N6-methyladenosine (m6A) is an important epigenetic modification that plays various roles in mRNA metabolism and embryogenesis directly related to human diseases. To identify m6A on a large scale, machine learning methods have been developed to predict m6A sites. However, these methods have two main drawbacks. First, their balanced learning approaches learn inadequately from the imbalanced data, in which m6A samples are far fewer than non-m6A samples. Second, the features they use do not represent m6A sequence characteristics well. RESULTS: We propose using cost-sensitive learning to resolve the data imbalance in the human mRNA m6A prediction problem. This cost-sensitive approach is applied to the entire imbalanced dataset, without random equal-size selection of negative samples, so that learning is adequate. Along with site location and entropy features, top-ranked positions with the highest single nucleotide polymorphism specificity in the window sequences are taken as new features in our imbalance learning. On an independent dataset, our overall prediction performance is much superior to that of the existing predictors. Our method shows stronger robustness against changes in imbalance in tests on 9 datasets whose imbalance ratios range from 1:1 to 9:1. Our method also outperforms the existing predictors on 1226 individual transcripts. The new types of features are found to be highly significant in m6A prediction. The case studies on genes c-Jun and CBFB demonstrate the detailed prediction capacity to improve the prediction performance. CONCLUSION: The proposed cost-sensitive model and the new features are useful in human mRNA m6A prediction. Our method achieves better correctness and robustness than the existing predictors in independent tests and case studies. The results suggest that imbalance learning is a promising way to improve the performance of m6A prediction.


Subjects
Adenosine/analogs & derivatives , Computational Biology/methods , Messenger RNA/chemistry , Adenosine/analysis , Algorithms , Computational Biology/economics , Humans , Machine Learning
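Note on item 19: the abstract describes cost-sensitive learning on the full imbalanced dataset but does not specify the cost scheme. The sketch below shows one common heuristic, not necessarily the one used in the paper: weight each class inversely to its frequency and apply those weights inside a binary log loss, so that errors on the rare positive class cost more. Labels are assumed to be 0 or 1, and all names are hypothetical.

```python
import math


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n, k = len(labels), len(counts)
    return {y: n / (k * c) for y, c in counts.items()}


def weighted_log_loss(labels, probabilities, weights):
    """Weighted mean binary cross-entropy; probabilities are P(label == 1)."""
    eps = 1e-12  # clamp to avoid log(0)
    total, weight_sum = 0.0, 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)
        w = weights[y]
        total += -w * (math.log(p) if y == 1 else math.log(1.0 - p))
        weight_sum += w
    return total / weight_sum


if __name__ == "__main__":
    labels = [1] + [0] * 9  # one positive site among nine negatives
    weights = balanced_class_weights(labels)
    print(weights)
    print(weighted_log_loss(labels, [0.5] * 10, weights))
```

Because the weights rescale the loss rather than discard data, every negative sample still contributes to training, which matches the abstract's point about avoiding random equal-size selection of negatives.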
20.
Bioinformatics ; 33(21): 3468-3470, 2017 Nov 01.
Article in English | MEDLINE | ID: mdl-29036499

ABSTRACT

SUMMARY: The decreasing cost of high-throughput technologies has led to a number of sequencing projects comprising thousands of whole genomes. The paradigm shift from exome to whole genome brings a significant increase in the size of output files. Most existing tools developed to analyse exome files are not adequate for the larger VCF files produced by whole-genome studies. In this work we present VCF-Explorer, variant analysis software capable of handling large files. Its memory efficiency and avoidance of a computationally costly pre-processing step enable the analysis to be performed on ordinary computers. VCF-Explorer provides an easy-to-use environment where users can define various types of queries based on variant and sample genotype level annotations. VCF-Explorer can be run in different environments and computational platforms ranging from a standard laptop to a high performance server. AVAILABILITY AND IMPLEMENTATION: VCF-Explorer is freely available at: http://vcfexplorer.sourceforge.net/. CONTACT: mete.akgun@tubitak.gov.tr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genomics/methods , Software , Computational Biology/economics , Computational Biology/methods , Genomics/economics , Genotype , High-Throughput Nucleotide Sequencing , User-Computer Interface , Whole Genome Sequencing
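Note on item 20: the abstract attributes VCF-Explorer's ability to handle large files to memory efficiency and the absence of a pre-processing step, without describing the implementation. The sketch below is not VCF-Explorer's code or query language. It only illustrates the general streaming idea under simple assumptions: read a VCF one line at a time, parse the first seven fixed columns, and yield records that pass a quality filter, so memory use does not grow with file size. Function names and the default threshold are hypothetical.

```python
def iter_vcf_records(lines):
    """Yield one dict per data line of a VCF, skipping headers."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank line, meta-information, or column header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # malformed line: VCF has 8 fixed columns
        chrom, pos, var_id, ref, alt, qual, flt = fields[:7]
        yield {
            "chrom": chrom,
            "pos": int(pos),
            "id": var_id,
            "ref": ref,
            "alt": alt.split(","),
            "qual": None if qual == "." else float(qual),
            "filter": flt,
        }


def passing_variants(lines, min_qual=30.0):
    """Yield records with FILTER == PASS and QUAL >= min_qual."""
    for rec in iter_vcf_records(lines):
        if rec["filter"] == "PASS" and rec["qual"] is not None:
            if rec["qual"] >= min_qual:
                yield rec


if __name__ == "__main__":
    demo = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "1\t100\trs1\tA\tG\t50\tPASS\t.\n",
        "1\t200\t.\tC\tT,G\t10\tPASS\t.\n",
    ]
    for record in passing_variants(demo):
        print(record["chrom"], record["pos"], record["alt"])
```

Since both functions are generators, an open file handle can be passed in place of the list, and only one line is held in memory at a time.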