Results 1 - 9 of 9
1.
BMC Bioinformatics ; 24(1): 22, 2023 Jan 19.
Article in English | MEDLINE | ID: mdl-36658484

ABSTRACT

BACKGROUND: Microbial communities are known to be closely related to many diseases, such as obesity and HIV, and it is of interest to identify differentially abundant microbial species between two or more environments. Since the abundances or counts of microbial species usually have different scales and suffer from zero-inflation or over-dispersion, normalization is a critical step before conducting differential abundance analysis. Several normalization approaches have been proposed, but it is difficult to optimize the characterization of the true relationship between taxa and outcomes of interest. RESULTS: To avoid the challenge of picking an optimal normalization and to accommodate the advantages of several normalization strategies, we propose an omnibus approach. Our approach is based on a Cauchy combination test, which is flexible and powerful because it aggregates individual p values. We also consider a truncated test statistic to prevent substantial power loss. We experiment with a basic linear regression model as well as recently proposed powerful association tests for microbiome data and compare the performance of the omnibus approach with individual normalization approaches. Experimental results show that, regardless of simulation settings, the new approach exhibits power that is close to that of the best normalization strategy, while controlling the type I error well. CONCLUSIONS: The proposed omnibus test releases researchers from choosing among various normalization methods. It is an aggregated method whose power is close to that of the underlying optimal normalization, which would otherwise have to be found by tedious trial and error. While the power may not exceed that of the best normalization, it is always much better than using a poor choice of normalization.
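The aggregation step described in the abstract can be illustrated with the standard (untruncated) Cauchy combination formula. This is a minimal sketch, not the authors' implementation: the function name, the equal default weights, and the normalization labels and p-values are illustrative assumptions, and the truncated statistic mentioned in the abstract is not shown.

```python
import math

def cauchy_combination(p_values, weights=None):
    """Combine p-values with the standard Cauchy combination test."""
    if weights is None:
        # Assumption: equal weights that sum to 1.
        weights = [1.0 / len(p_values)] * len(p_values)
    # Map each p-value to a standard Cauchy variate; small p gives a large value.
    statistic = sum(w * math.tan((0.5 - p) * math.pi)
                    for w, p in zip(weights, p_values))
    # Under the null the weighted sum is approximately standard Cauchy,
    # so its upper tail probability is the combined p-value.
    return 0.5 - math.atan(statistic) / math.pi

# Hypothetical p-values for one taxon under three normalizations (made-up numbers).
p_by_normalization = {"norm_A": 0.04, "norm_B": 0.20, "norm_C": 0.01}
combined = cauchy_combination(list(p_by_normalization.values()))
print(f"combined p-value: {combined:.4f}")
```

A single p-value is returned unchanged, and the combined value is dominated by the smallest inputs, which is consistent with the abstract's claim that the omnibus test tracks the best-performing normalization.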


Subject(s)
Microbiota; Computer Simulation; Linear Models; Research
2.
BMC Bioinformatics ; 21(1): 553, 2020 Dec 01.
Article in English | MEDLINE | ID: mdl-33261552

ABSTRACT

BACKGROUND: RNA sequencing allows the study of both gene expression changes and transcribed mutations, providing a highly effective way to gain insight into cancer biology. When planning the sequencing of a large cohort of samples, library size is a fundamental factor affecting both the overall cost and the quality of the results. Here we specifically address how overall library size influences the detection of somatic mutations in RNA-seq data in two acute myeloid leukaemia datasets. RESULTS: We simulated shallower sequencing depths by downsampling 45 acute myeloid leukaemia samples (100 bp PE) that are part of the Leucegene project, which were originally sequenced at high depth. We compared the sensitivity of six methods of recovering validated mutations on the same samples. The methods compared are a combination of three popular callers (MuTect, VarScan, and VarDict) and two filtering strategies. We observed an incremental loss in sensitivity when simulating libraries of 80M, 50M, 40M, 30M and 20M fragments, with the largest loss detected with less than 30M fragments (below 90%, average loss of 7%). The sensitivity in recovering insertions and deletions varied markedly between callers, with VarDict showing the highest sensitivity (60%). Single nucleotide variant sensitivity is relatively consistent across methods, apart from MuTect, whose default filters need adjustment when using RNA-Seq. We also analysed 136 RNA-Seq samples from the TCGA-LAML cohort (50 bp PE) and assessed the change in sensitivity between the initial libraries (average 59M fragments) and after downsampling to 40M fragments. When considering single nucleotide variants in recurrently mutated myeloid genes we found a comparable performance, with a 6% average loss in sensitivity using 40M fragments. CONCLUSIONS: Between 30M and 40M 100 bp PE reads are needed to recover 90-95% of the initial variants on recurrently mutated myeloid genes. To extend this result to another cancer type, an exploration of the characteristics of its mutations and gene expression patterns is suggested.
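The two quantities behind this kind of downsampling experiment, the fraction of fragments to keep and the sensitivity against validated mutations, can be sketched as follows. This is a hedged illustration only: the function names and the placeholder variant tuples are assumptions, and the study's actual pipeline (callers, filters, downsampling tool) is not reproduced here.

```python
def downsampling_fraction(original_fragments, target_fragments):
    """Fraction of fragments to keep to simulate a shallower library."""
    if target_fragments > original_fragments:
        raise ValueError("target depth exceeds the original library size")
    return target_fragments / original_fragments

def sensitivity(called_variants, validated_variants):
    """Share of validated variants that were recovered by a caller."""
    validated = set(validated_variants)
    if not validated:
        raise ValueError("no validated variants to compare against")
    return len(validated & set(called_variants)) / len(validated)

# 59M is the average TCGA-LAML library size quoted in the abstract; 40M is the target.
fraction = downsampling_fraction(59e6, 40e6)

# Placeholder variants as (gene, position, alternate allele); not real data.
validated = {("GENE_A", 100, "T"), ("GENE_B", 250, "G"),
             ("GENE_C", 75, "A"), ("GENE_D", 310, "C")}
called_at_40m = {("GENE_A", 100, "T"), ("GENE_B", 250, "G"), ("GENE_C", 75, "A")}

print(f"keep {fraction:.3f} of fragments")
print(f"sensitivity: {sensitivity(called_at_40m, validated):.2f}")
```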


Subject(s)
Gene Library; Polymorphism, Single Nucleotide/genetics; RNA-Seq/methods; Base Sequence; Databases, Genetic; High-Throughput Nucleotide Sequencing; Humans; Neoplasms/genetics
3.
Molecules ; 24(15), 2019 Aug 05.
Article in English | MEDLINE | ID: mdl-31387220

ABSTRACT

Fragment-based drug discovery (FBDD) has become a major strategy to derive novel lead candidates for various therapeutic targets, as it promises efficient exploration of chemical space by employing fragment-sized (MW < 300) compounds. One of the first challenges in implementing a FBDD approach is the design of a fragment library, and more specifically, the choice of its size and individual members. A diverse set of fragments is required to maximize the chances of discovering novel hit compounds. However, the exact diversity of a certain collection of fragments remains underdefined, which hinders direct comparisons among different selections of fragments. Based on structural fingerprints, we herein introduced quantitative metrics for the structural diversity of fragment libraries. Structures of commercially available fragments were retrieved from the ZINC database, from which libraries with sizes ranging from 100 to 100,000 compounds were selected. The selected libraries were evaluated and compared quantitatively, resulting in interesting size-diversity relationships. Our results demonstrated that while library size does matter for its diversity, there exists an optimal size for structural diversity. It is also suggested that such quantitative measures can guide the design of diverse fragment libraries under different circumstances.
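One common way to turn structural fingerprints into a single diversity number is the mean pairwise Tanimoto distance. The abstract does not state which metrics the authors defined, so the sketch below is an assumption for illustration only: fingerprints are represented as sets of "on" bit indices, and the three toy fingerprints are made up.

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def mean_pairwise_distance(fingerprints):
    """Average (1 - Tanimoto) over all pairs; higher means more diverse."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - tanimoto(a, b) for a, b in pairs) / len(pairs)

# Toy library of three hypothetical fragment fingerprints.
library = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({7, 8})]
print(f"diversity: {mean_pairwise_distance(library):.3f}")
```

Computing such a score for libraries of increasing size is one way to examine the kind of size-diversity relationship the abstract reports. The all-pairs loop is quadratic, so libraries at the 100,000-compound end of the range would need sampling or a faster estimator.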


Subject(s)
Drug Discovery; Small Molecule Libraries; Databases, Factual; Drug Design; Drug Discovery/methods; High-Throughput Screening Assays; Humans; Structure-Activity Relationship
4.
Br J Radiol ; 97(1158): 1153-1161, 2024 May 29.
Article in English | MEDLINE | ID: mdl-38637944

ABSTRACT

OBJECTIVES: The aim of this study was to determine the number of trade-off explored (TO) library plans required for building a RapidPlan (RP) library that would generate the optimal clinical treatment plan. METHODS: We developed 2 RP models, 1 each for the 2 clinical sites, head and neck (HN) and cervix. The models were created using 100 plans and were validated using 70 plans (VP) for each site respectively. Each of the 2 libraries comprising 100 TO plans was divided into 5 different subsets of library plans comprising 20, 40, 60, 80, and 100 plans, leading to 5 different RP models for each site. For every validation patient, a TO plan (TO_VP) was created. For every patient, 5 RP plans were automatically generated using RP models. The dosimetric parameters of the 6 plans (TO_VP + 5 RP plans) were compared using Pearson correlation and Greenhouse-Geisser analysis. RESULTS: Planning target volume (PTV) dose volume parameters PTV D95% in 6 competing plans varied between 97.6 ± 0.7% and 98.1 ± 0.6% in HN cases and 98.8 ± 0.3% and 99.0 ± 0.4% in cervix cases. Overall, for both sites, the mean variations in organ at risk (OAR) doses or volumes were within 50 cGy, 0.5%, and 0.2 cc between library plans, and if TO_VP was included the variations deteriorated to 180 cGy, 0.4%, and 15 cc. All OARs in both sites, except D0.1cc of the spine, showed a statistically insignificant variation between all plans. CONCLUSIONS: Dosimetric variation among various output plans generated from 5 RP libraries is minimal and clinically insignificant. The optimal output plan can be derived from the least-weighted library consisting of 20 plans. ADVANCES IN KNOWLEDGE: This article shows that, when the constituent plans are subjected to trade-off exploration, the number of constituent plans for a knowledge-based planning module is not relevant in terms of its dosimetric output.


Subject(s)
Head and Neck Neoplasms; Radiotherapy Dosage; Radiotherapy Planning, Computer-Assisted; Uterine Cervical Neoplasms; Humans; Radiotherapy Planning, Computer-Assisted/methods; Female; Head and Neck Neoplasms/radiotherapy; Uterine Cervical Neoplasms/radiotherapy; Knowledge Bases; Radiotherapy, Intensity-Modulated/methods
5.
Methods Mol Biol ; 1772: 171-189, 2018.
Article in English | MEDLINE | ID: mdl-29754228

ABSTRACT

Saturation mutagenesis is conveniently located between the two extremes of protein engineering, namely random mutagenesis and rational design. It involves mutating a confined number of target residues to other amino acids, and hence requires knowledge regarding the sites for mutagenesis, but not their final identity. There are many different strategies for performing and designing such experiments, ranging from simple single degenerate codons to codon collections that code for distinct sets of amino acids. Here, we provide detailed information on the Dynamic Management for Codon Compression (DYNAMCC) approaches that allow us to precisely define the desired amino acid composition to be introduced to a specific target site. DYNAMCC allows us to set usage thresholds and to eliminate undesirable stop and wild-type codons, thus allowing us to control library size and subsequently downstream screening efforts. The DYNAMCC algorithms are free of charge and are implemented in a website for easy access and usage: www.dynamcc.com.


Subject(s)
Codon/genetics; Mutagenesis/genetics; Algorithms; Amino Acids/genetics; Data Compression/methods; Gene Library; Protein Engineering/methods
6.
Front Plant Sci ; 9: 108, 2018.
Article in English | MEDLINE | ID: mdl-29491871

ABSTRACT

RNA-Seq is a widely used technology that allows efficient genome-wide quantification of gene expression, for example for differential expression (DE) analysis. After a brief review of the main issues, methods and tools related to the DE analysis of RNA-Seq data, this article focuses on the impact of both the replicate number and library size in such analyses. While the main drawback of previous relevant studies is the lack of generality, we conducted both an analysis of a two-condition experiment (with eight biological replicates per condition) to compare the results with previous benchmark studies, and a meta-analysis of 17 experiments with up to 18 biological conditions, eight biological replicates and 100 million (M) reads per sample. As a global trend, we concluded that the replicate number has a larger impact than the library size on the power of the DE analysis, except for low-expressed genes, for which both parameters seem to have the same impact. Our study also provides new insights for practitioners aiming to enhance their experimental designs. For instance, by analyzing both the sensitivity and specificity of the DE analysis, we showed that the optimal threshold to control the false discovery rate (FDR) is approximately 2^(-r), where r is the replicate number. Furthermore, we showed that the false positive rate (FPR) is rather well controlled by all three studied R packages: DESeq, DESeq2, and edgeR. We also analyzed the impact of both the replicate number and library size on gene ontology (GO) enrichment analysis. Interestingly, we concluded that increases in the replicate number and library size tend to enhance the sensitivity and specificity, respectively, of the GO analysis. Finally, we recommend to RNA-Seq practitioners the production of a pilot data set to strictly analyze the power of their experimental design, or the use of a public data set, which should be similar to the data set they will obtain. For individuals working on tomato research, on the basis of the meta-analysis, we recommend at least four biological replicates per condition and 20 M reads per sample to be almost sure of obtaining about 1000 DE genes if they exist.

8.
ACS Synth Biol ; 4(5): 604-14, 2015 May 15.
Article in English | MEDLINE | ID: mdl-25303315

ABSTRACT

Saturation mutagenesis is employed in protein engineering and genome-editing efforts to generate libraries that span amino acid design space. Traditionally, this is accomplished by using degenerate/compressed codons such as NNK (N = A/C/G/T, K = G/T), which covers all amino acids and one stop codon. These solutions suffer from two types of redundancy: (a) different codons for the same amino acid lead to bias, and (b) the wild-type amino acid is included within the library. These redundancies increase library size and downstream screening efforts. Here, we present a dynamic approach to compress codons for any desired list of amino acids, taking into account codon usage. This results in a unique codon collection for every amino acid to be mutated, with the desired redundancy level. Finally, we demonstrate that this approach can be used to design precise oligo libraries amenable to recombineering and CRISPR-based genome editing to obtain a diverse population with high efficiency.
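The redundancy of NNK described above can be checked by expanding the degenerate codon against the standard genetic code. This is a minimal sketch for illustration, not the DYNAMCC algorithm; the helper name `expand` and the reduced IUPAC table are assumptions made for this example.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Only the IUPAC symbols needed for this example.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT"}

def expand(degenerate_codon):
    """List every concrete codon encoded by a degenerate codon such as 'NNK'."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in degenerate_codon))]

nnk = expand("NNK")
counts = Counter(CODON_TABLE[codon] for codon in nnk)  # '*' marks a stop codon

print(f"{len(nnk)} codons, {len(set(counts) - {'*'})} amino acids, "
      f"{counts['*']} stop codon(s)")
print("most redundant:", counts.most_common(3))
```

The expansion yields 32 codons that cover all 20 amino acids plus one stop codon (TAG), with leucine, serine and arginine each encoded three times. That uneven coverage is the bias referred to in point (a) of the abstract.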


Subject(s)
Codon/genetics; Mutagenesis/genetics; Algorithms; Amino Acids/genetics; Gene Library; Mutation/genetics; Oligonucleotides/genetics; Protein Engineering/methods
9.
J Biol Phys ; 28(3): 471-82, 2002 Sep.
Article in English | MEDLINE | ID: mdl-23345790

ABSTRACT

In an attempt to understand protein evolution, we address the issues of how much variety in the sequences is needed to prompt the evolution of an enzyme from random polypeptides and how does cellular interaction affect the dynamics of molecular evolution to allow genetic diversity in population. The experimental evolution of phage-displayed random polypeptides of about 140 amino acid residues panned with transition state analogue for an esterase reaction showed that even with a population size as small as ten, not only could significant varieties be found but also the random polypeptides in each of the generation had great promise towards developing into functional proteins. Hence, it is evident that the enzyme evolution is prompted even within a small local area of the static landscape of the sequence space. Considering that interaction among living cells is an inevitable event in natural evolution, its role was investigated through three consecutive rounds of random mutagenesis on the glutamine synthetase gene and chemostat culture of the transformed Escherichia coli cells containing the mutated genes. The molecular phylogeny and population dynamics show the coexistence of some mutants having different level of glutamine synthetase at each generation. In addition, it was confirmed that cellular interaction via the medium influences the stability of the coexistence and bring forth fitness change to the coexisting members of the population, thereby, leading to a dynamical landscape. Based on experimental results reflecting the extent of interaction among members in population, here, I proposed that protein evolution could change its mode from the optimization on static landscape to diversification on dynamical landscape.
