ABSTRACT
MOTIVATION: Research on the human microbiome has suggested associations with human health, opening opportunities to predict health outcomes using microbiome data. Studies have also suggested that diverse forms of taxa, such as rare taxa that are evolutionarily related and abundant taxa that are evolutionarily unrelated, could be associated with or predictive of a health outcome. Although prediction models have been developed for microbiome data, no prediction models currently exist that use multiple forms of microbiome-outcome associations. RESULTS: We developed MK-BMC, a Multi-Kernel framework with Boosted distance Metrics for Classification using microbiome data. We propose to first boost widely used distance metrics for microbiome data using taxon-level association signal strengths to up-weight taxa that are potentially associated with an outcome of interest. We then propose a multi-kernel prediction model in which each kernel captures one form of association between taxa and the outcome; each kernel measures similarities of microbiome compositions between pairs of samples and is obtained by transforming a proposed boosted distance metric. We demonstrated superior prediction performance of (i) boosted distance metrics for microbiome data over original ones and (ii) MK-BMC over competing methods through extensive simulations. We applied MK-BMC to predict thyroid, obesity, and inflammatory bowel disease status using gut microbiome data from the American Gut Project and observed much-improved prediction performance over that of competing methods. The learned kernel weights help us understand the contributions of individual microbiome signal forms. AVAILABILITY AND IMPLEMENTATION: Source code together with a sample input dataset is available at https://github.com/HXu06/MK-BMC.
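The two steps described in this abstract (boosting a distance metric with taxon weights, then turning distances into kernels that are combined) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the MK-BMC implementation: the weighted Bray-Curtis form, the Gower double-centering transform, the taxon weights, and the fixed kernel weights (which the method learns from data) are all hypothetical stand-ins.

```python
def boosted_bray_curtis(x, y, w):
    """Weighted Bray-Curtis distance; w[k] up-weights taxon k (assumed form)."""
    num = sum(wk * abs(xk - yk) for xk, yk, wk in zip(x, y, w))
    den = sum(wk * (xk + yk) for xk, yk, wk in zip(x, y, w))
    return num / den if den > 0 else 0.0

def distance_matrix(samples, w):
    n = len(samples)
    return [[boosted_bray_curtis(samples[i], samples[j], w) for j in range(n)]
            for i in range(n)]

def gower_kernel(D):
    """Double-center -0.5 * D^2 to obtain a similarity (kernel) matrix."""
    n = len(D)
    A = [[-0.5 * D[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(A[i]) / n for i in range(n)]
    col = [sum(A[i][j] for i in range(n)) / n for j in range(n)]
    tot = sum(row) / n
    return [[A[i][j] - row[i] - col[j] + tot for j in range(n)] for i in range(n)]

def combine_kernels(kernels, weights):
    """Weighted sum of kernels; the weights here are fixed, not learned."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
            for i in range(n)]

# Toy relative abundances: 3 samples x 4 taxa (made-up numbers).
samples = [[0.5, 0.3, 0.1, 0.1],
           [0.4, 0.4, 0.1, 0.1],
           [0.1, 0.1, 0.4, 0.4]]
uniform = [1.0, 1.0, 1.0, 1.0]   # original (unboosted) metric
boosted = [3.0, 1.0, 1.0, 1.0]   # hypothetical: taxon 0 shows the strongest signal

K_uniform = gower_kernel(distance_matrix(samples, uniform))
K_boosted = gower_kernel(distance_matrix(samples, boosted))
K_mix = combine_kernels([K_uniform, K_boosted], [0.5, 0.5])
print(K_mix)
```

The combined matrix `K_mix` could then be passed to any kernel classifier; how the kernel weights are learned is not described in the abstract and is not attempted here.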
Subjects
Gastrointestinal Microbiome, Microbiota, Humans, Phylogeny
ABSTRACT
Although there are a large number of structural variations in the chromosomes of each individual, there is a lack of accurate methods for identifying clinically pathogenic variants. Here, we propose SVPath, a machine learning-based method to predict the pathogenicity of deletion, insertion and duplication structural variations that occur in exons. We constructed three types of annotation features for each structural variation event in the ClinVar database. First, we treated complex structural variations as multiple consecutive single nucleotide polymorphism events, and annotated them with correlation scores based on single-nucleotide substitutions, such as the impact on protein function. Second, we determined which genes the variation occurred in, and constructed gene-based annotation features for each structural variation. Third, we calculated related features based on the transcriptome, such as histone signal and the overlap ratio of the variation with genomic element definitions. Finally, we employed a gradient boosting decision tree machine learning method, and used the deletions, insertions and duplications in the ClinVar database to train a structural variation pathogenicity prediction model, SVPath. These structural variations are clearly labeled as pathogenic or benign. Experimental results show that SVPath achieves excellent predictive performance and outperforms existing state-of-the-art tools. SVPath is very promising for evaluating the clinical pathogenicity of structural variants. It can be used in clinical research to predict the clinical significance of new structural variations and those of unknown pathogenicity, so as to explore the relationship between diseases and structural variations computationally.
Subjects
Machine Learning, Single Nucleotide Polymorphism, Exons, Humans, Molecular Sequence Annotation, Virulence
ABSTRACT
In this work, alkali metal Rb-loaded ZnO/In2O3 heterojunctions were synthesized using a combination of hydrothermal and impregnation methods. The morphology and structure of the synthesized samples were characterized by X-ray diffraction, field emission scanning electron microscopy, and transmission electron microscopy. The enhancement mechanism of the nitrogen dioxide gas sensing performance of the Rb-loaded ZnO/In2O3 heterojunctions was systematically investigated at room temperature using density-functional theory calculations and experimental validation. The experimental tests showed that the Rb-loaded ZnO/In2O3 sensor achieved an excellent response value of 24.2 for 1 ppm NO2, with response and recovery times of 55 and 21 s, respectively. This result is 20 times higher than that of pure ZnO sensors and two times higher than that of ZnO/In2O3 sensors, indicating that the Rb-loaded ZnO/In2O3 sensor has a more pronounced enhancement in performance for NO2. This study not only revealed the mechanism by which Rb loading affects the electronic structure and gas molecule adsorption behavior on the surface of ZnO/In2O3 heterojunctions but also provides theoretical guidance and technical support for the development of high-performance room-temperature NO2 sensors.
ABSTRACT
The main challenge in cancer genomics is to distinguish the driver genes from passenger or neutral genes. Cancer genomes exhibit extensive mutational heterogeneity, such that no two genomes contain exactly the same somatic mutations. Mutual exclusivity (ME) of mutations has been observed in cancer data and is associated with functional pathways. Analysis of ME patterns may provide useful clues to driver genes or pathways and may suggest novel understandings of cancer progression. In this article, we consider a probabilistic, generative model of ME, and propose a powerful greedy algorithm to select mutually exclusive gene sets. The greedy method includes a pre-selection procedure and a stepwise forward algorithm, which can significantly reduce computation time. Power calculations suggest that the new method is efficient and powerful for one ME set or multiple ME sets with overlapping genes. We illustrate this approach by analysis of the whole-exome sequencing data of cancer types from TCGA.
Subjects
Computational Biology, Neoplasms, Algorithms, Computational Biology/methods, Genomics/methods, Humans, Mutation, Neoplasms/genetics
ABSTRACT
PURPOSE OF REVIEW: The application of tyrosine kinase inhibitors (TKIs) has successfully changed the standard of care in epidermal growth factor receptor (EGFR)-positive non-small cell lung cancer. However, survival for patients with EGFR exon 20 insertions has failed to improve over a long period, and the mutation has appeared resistant to EGFR-TKIs. This overview focuses on the current treatment strategies, summarizes the emerging regimens for patients with EGFR exon 20 insertions, and discusses historical challenges and future development. RECENT FINDINGS: Current clinical trials suggest that several regimens selectively targeting EGFR exon 20 insertions, such as mobocertinib and the bispecific anti-EGFR-MET monoclonal antibody amivantamab, show potent antitumor activity and were approved by the Food and Drug Administration (FDA) for patients who progressed beyond first-line treatment. Novel treatments, including DZD9008 and CLN-081, have also shown modest clinical efficacy, and clinical trials are underway, which may lead to improved survival outcomes. SUMMARY: Recent clinical evidence indicates that targeted therapies could improve survival to some extent. More drug development efforts are underway to bring higher extracranial and intracranial response rates, sustained clinical remission, and better survival benefits.
Subjects
Non-Small-Cell Lung Carcinoma, Lung Neoplasms, United States, Humans, Lung Neoplasms/drug therapy, Lung Neoplasms/genetics
ABSTRACT
Genome-wide association studies (GWAS) using longitudinal phenotypes collected over time are appealing because of the improvement in power. However, computational burden has been a challenge because of the complex algorithms for modeling longitudinal data. Approximation methods based on empirical Bayesian estimates (EBEs) from mixed-effects modeling have been developed to expedite the analysis. However, our analysis demonstrated that bias in both association testing and estimation remains an issue for the existing EBE-based methods. We propose a fast and unbiased method (simultaneous correction for EBE, SCEBE) that can correct the bias in the naive EBE approach and provide unbiased P-values and estimates of effect size. Through application to Alzheimer's Disease Neuroimaging Initiative data with 6 414 695 single nucleotide polymorphisms, we demonstrated that SCEBE can efficiently perform large-scale GWAS with longitudinal outcomes, providing a nearly 10 000-fold improvement in computational efficiency and shortening the computation time from months to minutes. The SCEBE package and the example datasets are available at https://github.com/Myuan2019/SCEBE.
Subjects
Algorithms, Alzheimer Disease/genetics, Single Nucleotide Polymorphism, Software, Genome-Wide Association Study, Humans
ABSTRACT
Identifying the interactions between T-cell receptors (TCRs) and human antigens is a crucial step in developing new vaccines, diagnostics, and immunotherapy. Current methods primarily focus on learning binding patterns from known TCR binding repertoires by using sequence information alone, without considering the binding specificity of new antigens or exogenous peptides that have not appeared in the training set. Furthermore, the spatial structure of antigens plays a critical role in immune studies and immunotherapy, and should be addressed properly in the identification of interacting TCR-antigen pairs. In this study, we introduce a novel deep learning framework based on generative graph structures, GGNpTCR, for predicting interactions between TCRs and peptides from sequence information. Results of real data analysis indicate that our model achieved excellent prediction for new antigens unseen in the training data set, making significant improvements compared to existing methods. We also applied the model to a large COVID-19 data set with no antigens in the training data set, and the improvement was also significant. Furthermore, through incorporation of additional supervised mechanisms, GGNpTCR demonstrated the ability to precisely forecast the locations of peptide-TCR interactions within 3D configurations. This enhancement substantially improved the model's interpretability. In summary, based on the results on multiple data sets, GGNpTCR has made significant progress in terms of performance, universality, and interpretability.
Subjects
Peptides, T-Lymphocytes, Humans, T-Lymphocytes/metabolism, Peptides/chemistry, T-Cell Antigen Receptors/chemistry, T-Cell Antigen Receptors/metabolism, Immunity, Computer Neural Networks
ABSTRACT
Aim: To assess the effectiveness of different types of taxanes, including nab-paclitaxel, paclitaxel and docetaxel, and further compare the effectiveness of taxane-based chemotherapy, taxane-based chemotherapy plus angiogenesis inhibitors or taxane-based chemotherapy plus immune checkpoint inhibitors in HER2-altered non-small-cell lung cancer in the second- or third-line setting. Materials & methods: A total of 52 patients were included in the study. Progression-free survival was compared between subgroups. Results: A clinically meaningful improvement in progression-free survival was observed among patients in the nab-paclitaxel group compared with the docetaxel group. Taxane-based chemotherapy plus immune checkpoint inhibitors achieved longer progression-free survival than taxane-based chemotherapy. There was no difference between taxane-based chemotherapy plus immune checkpoint inhibitors and taxane-based chemotherapy plus angiogenesis inhibitors. Conclusion: Nab-paclitaxel appears to be a reasonable alternative to docetaxel. Chemotherapy plus immune checkpoint inhibitors might yield more survival benefits than chemotherapy alone.
Subjects
Non-Small-Cell Lung Carcinoma, Lung Neoplasms, Humans, Non-Small-Cell Lung Carcinoma/drug therapy, Docetaxel/therapeutic use, Angiogenesis Inhibitors/therapeutic use, Immune Checkpoint Inhibitors/adverse effects, Lung Neoplasms/drug therapy, Paclitaxel/adverse effects, Taxoids/therapeutic use, Immunotherapy, Antineoplastic Combined Chemotherapy Protocols/adverse effects
ABSTRACT
BACKGROUND: Although targeted agents have been gradually applied in the treatment of HER2-mutated non-small cell lung cancer (NSCLC) in recent years, patients' therapeutic demands are far from being met. PATHER2 was the first phase 2 trial to explore the efficacy and safety of the HER2-targeted tyrosine kinase inhibitor (TKI) pyrotinib plus the antiangiogenic agent apatinib in previously treated HER2-altered metastatic NSCLC patients. METHODS: HER2-mutated or HER2-amplified metastatic NSCLC patients who had failed at least first-line chemotherapy or HER2-targeted TKIs received oral pyrotinib 400 mg plus apatinib 250 mg once daily until disease progression, intolerable toxicity, or death. The primary endpoint was the investigator-assessed objective response rate (ORR). RESULTS: Between March 2019 and December 2020, 33 patients were enrolled; 13 (39.4%) presented brain metastases, and 16 (48.5%) had received at least two lines of prior chemotherapy or HER2-targeted TKIs. As of September 20, 2021, the median follow-up duration was 11.3 (range, 3.5-26.0) months. The investigator-assessed ORR was 51.5% (17/33; 95% CI, 33.5 to 69.2%), and the disease control rate was 93.9% (31/33; 95% CI, 79.8 to 99.3%). The median duration of response, progression-free survival, and overall survival were 6.0 (95% CI, 4.4 to 8.6) months, 6.9 (95% CI, 5.8 to 8.5) months, and 14.8 (95% CI, 10.4 to 23.8) months, respectively. The most frequent grade ≥ 3 treatment-related adverse events included diarrhea (3.0%) and hypertension (9.1%). No treatment-related deaths were reported. CONCLUSIONS: Pyrotinib plus apatinib demonstrated promising antitumor activity and a manageable safety profile in HER2-mutated or HER2-amplified metastatic NSCLC patients. TRIAL REGISTRATION: Chinese Clinical Trial Registry Identifier: ChiCTR1900021684 .
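The response rates above are simple proportions (17/33 and 31/33) with 95% confidence intervals. The abstract does not say which interval method was used; as an illustration only, the sketch below computes an exact (Clopper-Pearson) binomial interval by bisection, which is one common choice for small single-arm trials. The function names are hypothetical.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided binomial confidence interval for k successes in n trials."""
    def solve(f, target):
        # f is increasing in p on [0, 1]; find p with f(p) = target by bisection.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= k | p) = alpha / 2
    lower = 0.0 if k == 0 else solve(lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: P(X <= k | p) = alpha / 2, i.e. P(X > k | p) = 1 - alpha / 2
    upper = 1.0 if k == n else solve(lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper

orr_low, orr_high = clopper_pearson(17, 33)   # objective response rate
dcr_low, dcr_high = clopper_pearson(31, 33)   # disease control rate
print(f"ORR 95% CI: {orr_low:.3f} to {orr_high:.3f}")
print(f"DCR 95% CI: {dcr_low:.3f} to {dcr_high:.3f}")
```

Comparing the printed bounds with the reported intervals is a quick way to check whether an exact method is consistent with the published numbers.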
Subjects
Non-Small-Cell Lung Carcinoma, Lung Neoplasms, Acrylamides, Aminoquinolines, Antineoplastic Combined Chemotherapy Protocols, Humans, Prospective Studies, Pyridines
ABSTRACT
BACKGROUND: Cytokines have been reported to alter the response to immune checkpoint inhibitors (ICIs) in patients with tumors, depending on their plasma concentrations. Here, we aimed to identify the key cytokines that influence responses and stimulate resistance to ICIs, with the goal of improving immunological response and developing novel clinical treatments in non-small cell lung cancer (NSCLC). METHODS: The promising predictive cytokines were analyzed via a multi-analyte flow assay. Next, we explored the correlation between baseline levels of plasma cytokines and clinical outcomes in 45 NSCLC patients treated with ICIs. The mechanism of the potential candidate cytokine in predicting response and inducing resistance to ICIs was then investigated. RESULTS: In the patient cohort, we found that NSCLC patients with a low baseline concentration of IL-6 in plasma specimens or tumor tissues could derive more benefit from ICIs. Further analyses revealed a positive relationship between PD-L1 and IL-6 expression in NSCLC specimens. In vitro results showed that PD-L1 expression in the tumor was enhanced by IL-6 via the JAK1/Stat3 pathway, which induced immune evasion. Notably, an inverse correlation was found between IL-6 levels and CD8+ T cells. A positive association between IL-6 levels and myeloid-derived suppressor cells, M2 macrophages and regulatory T cells was also seen in tumor samples, which may result in an inferior response to ICIs. Results from murine models of NSCLC suggested that the dual blockade of IL-6 and PD-L1 attenuated tumor growth. Further analyses showed that IL-6 inhibition stimulated the infiltration of CD8+ T cells and yielded an inflammatory phenotype. CONCLUSIONS: This study elucidated the role of baseline IL-6 levels in predicting responses and promoting resistance to immunotherapy in patients with NSCLC. Our results indicate that treatment targeting IL-6 may enhance the benefit of ICIs in NSCLC.
Subjects
Non-Small-Cell Lung Carcinoma, Lung Neoplasms, Animals, B7-H1 Antigen, Biomarkers, CD8-Positive T-Lymphocytes/pathology, Non-Small-Cell Lung Carcinoma/drug therapy, Non-Small-Cell Lung Carcinoma/genetics, Humans, Immunotherapy/methods, Interleukin-6, Lung Neoplasms/drug therapy, Lung Neoplasms/genetics, Mice
ABSTRACT
MOTIVATION: Predicting entity relationships can greatly benefit important biomedical problems. Recently, a large number of biomedical heterogeneous networks (BioHNs) have been generated, offering opportunities for developing network-based learning approaches to predict relationships among entities. However, current research has only slightly explored BioHN-based self-supervised representation learning methods, and existing approaches struggle to capture local- and global-level association information among entities simultaneously. RESULTS: In this study, we propose a BioHN-based self-supervised representation learning approach for entity relationship predictions, termed BioERP. A self-supervised meta path detection mechanism is proposed to train a deep Transformer encoder model that can capture the global structure and semantic features in BioHNs. Meanwhile, a biomedical entity mask learning strategy is designed to reflect local associations of vertices. Finally, the representations from different task models are concatenated to generate two-level representation vectors for predicting relationships among entities. The results on eight datasets show that BioERP outperforms 30 state-of-the-art methods. In particular, BioERP achieves AUC and AUPR values close to 1 on drug-target interaction prediction. In summary, BioERP is a promising bio-entity relationship prediction approach. AVAILABILITY AND IMPLEMENTATION: Source code and data can be downloaded from https://github.com/pengsl-lab/BioERP.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Deep Learning, Software, Semantics
ABSTRACT
MOTIVATION: With the emergence of high-dimensional genomic data, genetic analyses such as genome-wide association studies (GWAS) have played an important role in identifying disease-related genetic variants and novel treatments. Complex longitudinal phenotypes are commonly collected in medical studies. However, since limited analytical approaches are available for longitudinal traits, these data are often underutilized. In this article, we develop a high-throughput machine learning approach for multilocus GWAS using longitudinal traits by coupling Empirical Bayesian Estimates from mixed-effects modeling with a novel ℓ0-norm algorithm. RESULTS: Extensive simulations demonstrated that the proposed approach provided not only accurate selection of single nucleotide polymorphisms (SNPs) with comparable or higher power but also robust control of false positives. More importantly, this novel approach is highly scalable and could be more than 1000 times faster than recently published approaches, making genome-wide multilocus analysis of longitudinal traits possible. In addition, our proposed approach can simultaneously analyze millions of SNPs if the computer memory allows, thereby potentially allowing a true multilocus analysis for high-dimensional genomic data. With application to the data from the Alzheimer's Disease Neuroimaging Initiative, we confirmed that our approach can identify well-known SNPs associated with AD and was much faster than recently published approaches (≥6000 times). AVAILABILITY AND IMPLEMENTATION: The source code and the testing datasets are available at https://github.com/Myuan2019/EBE_APML0. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Genome-Wide Association Study, Software, Algorithms, Bayes Theorem, Phenotype, Single Nucleotide Polymorphism
ABSTRACT
Mutual exclusivity analyses provide an effective tool to distinguish driver genes from passenger genes in cancer studies. Various algorithms have been developed for the detection of mutual exclusivity, but controlling false positives and improving accuracy remain challenging. In this paper, we propose a forward selection algorithm for identification of mutually exclusive gene sets (FSME). The method includes an initial search for a seed pair of mutually exclusive (ME) genes, followed by the inclusion of more genes into the current ME set. Simulations demonstrated that, compared to recently published approaches (i.e., CoMEt, WExT, and MEGSA), FSME could provide a higher precision or recall rate in identifying ME gene sets, and had superior control of false positive rates. With application to TCGA real data sets for AML, BRCA, and GBM, we confirmed that FSME can be utilized to discover cancer driver genes.
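The seed-pair-then-extend procedure described in this abstract can be sketched as a greedy forward search. The abstract does not give the statistic FSME uses to rank gene sets, so the sketch below swaps in a simple coverage-minus-overlap score as a labeled stand-in; the gene names, the `max_size` cap, and the stopping rule are likewise hypothetical.

```python
from itertools import combinations

def me_score(genes, mutations):
    """Stand-in score: samples covered by the set minus the overlap among genes.

    Equals 2*|union| - sum of per-gene counts; perfectly exclusive sets score
    their full coverage, overlapping mutations are penalized.
    """
    covered = set().union(*(mutations[g] for g in genes))
    return 2 * len(covered) - sum(len(mutations[g]) for g in genes)

def forward_select(mutations, max_size=5):
    """Pick the best-scoring seed pair, then add genes while the score improves."""
    genes = sorted(mutations)
    seed = max(combinations(genes, 2), key=lambda pair: me_score(pair, mutations))
    current = list(seed)
    while len(current) < max_size:
        candidates = [g for g in genes if g not in current]
        if not candidates:
            break
        best = max(candidates, key=lambda g: me_score(current + [g], mutations))
        if me_score(current + [best], mutations) <= me_score(current, mutations):
            break  # no remaining gene improves the set
        current.append(best)
    return current

# Toy data: gene -> set of sample IDs carrying a mutation (made up).
mutations = {
    "G1": {1, 2, 3},
    "G2": {4, 5, 6},
    "G3": {7, 8},
    "G4": {1, 2, 4, 5, 7},   # overlaps heavily with the other genes
}
selected = forward_select(mutations)
print(selected)
```

A real analysis would replace the stand-in score with a proper test statistic and add a significance assessment, which is where control of false positives comes in.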
Subjects
Algorithms, Computational Biology/methods, Neoplastic Gene Expression Regulation, Neoplasms/genetics, Carcinogenesis/genetics, False Positive Reactions, Humans, Markov Chains, Monte Carlo Method, Mutagenesis/genetics, Oncogenes
ABSTRACT
Reducing false discoveries caused by population stratification (PS) has always been a challenge in genome-wide association studies (GWAS). The current literature has established several single-marker approaches, including genomic control (GC), EIGENSTRAT and the generalized linear mixed model association test (GMMAT), and multi-marker methods such as the LASSO mixed model (LASSOMM). However, the single-marker methods require prespecifying an arbitrary p-value threshold in the selection process, likely resulting in suboptimal precision or recall. On the other hand, LASSOMM appears to be extremely computationally intensive and may not be suitable for large-scale GWAS. In this paper, we propose a simple multi-marker approach (PCA-LASSO) combining principal component analysis (PCA) and the least absolute shrinkage and selection operator (LASSO). We use PCA to correct for the confounding effects of PS and LASSO with built-in cross-validation for data-driven selection. Compared to the current single-marker approaches, the proposed PCA-LASSO provides an optimal balance between precision and recall, and consequently superior F1 scores. Similarly, compared to LASSOMM, PCA-LASSO markedly increases the precision while minimizing the loss of recall, and therefore improves the overall F1 score in the presence of PS. More importantly, PCA-LASSO reduces the computational time by more than 1000 times when compared to LASSOMM. We applied PCA-LASSO to a real dataset of Alzheimer's disease and successfully identified SNP rs429358 (Gene APOE4), which has been widely reported to be associated with the onset and elevated risk of Alzheimer's disease. In conclusion, PCA-LASSO is a simple, fast, but accurate approach for GWAS in the presence of latent PS.
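The comparison above is stated in terms of precision, recall and F1 for the set of selected SNPs. As a reminder of how those quantities relate, here is a minimal sketch; the SNP identifiers in the example are made up, and how the abstract's simulations define a "true" SNP is not stated and is assumed here to be a known causal set.

```python
def precision_recall_f1(selected, causal):
    """Precision, recall and F1 of a selected SNP set against a known causal set."""
    selected, causal = set(selected), set(causal)
    true_pos = len(selected & causal)
    precision = true_pos / len(selected) if selected else 0.0
    recall = true_pos / len(causal) if causal else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical example: 4 SNPs selected, 3 truly causal, 2 in common.
precision, recall, f1 = precision_recall_f1(
    selected={"snp_a", "snp_b", "snp_c", "snp_d"},
    causal={"snp_a", "snp_b", "snp_e"},
)
print(precision, recall, f1)
```

Because F1 is the harmonic mean of precision and recall, a method that gains precision at a small cost in recall can still improve F1, which is the trade-off the abstract describes relative to LASSOMM.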
Subjects
Genetic Predisposition to Disease, Genome-Wide Association Study/methods, Genome-Wide Association Study/standards, Alzheimer Disease/genetics, Datasets as Topic, Genomics, Humans, Principal Component Analysis, Time Factors
ABSTRACT
Background: In genetic association studies with quantitative trait loci (QTL), the association between a candidate genetic marker and the trait of interest is commonly examined by the omnibus F test or by the t-test corresponding to a given genetic model or mode of inheritance. It is known that the t-test with a correct model specification is more powerful than the F test. However, since the underlying genetic model is rarely known in practice, the use of a model-specific t-test may incur substantial power loss. Robust-efficient tests, such as the Maximin Efficiency Robust Test (MERT) and MAX3 have been proposed in the literature. Methods: In this paper, we propose a novel two-step robust-efficient approach, namely, the genetic model selection (GMS) method for quantitative trait analysis. GMS selects a genetic model by testing Hardy-Weinberg disequilibrium (HWD) with extremal samples of the population in the first step and then applies the corresponding genetic model-specific t-test in the second step. Results: Simulations show that GMS is not only more efficient than MERT and MAX3, but also has comparable power to the optimal t-test when the genetic model is known. Conclusion: Application to the data from Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort demonstrates that the proposed approach can identify meaningful biological SNPs on chromosome 19.
ABSTRACT
Although rapid progress has been made in computational approaches for prioritizing cancer driver genes, research is far from achieving the ultimate goal of discovering a complete catalog of genes truly associated with cancer. Driver gene lists predicted from these computational tools lack consistency and are prone to false positives. Here, we developed an approach (DriverML) integrating Rao's score test and supervised machine learning to identify cancer driver genes. The weight parameters in the score statistics quantified the functional impacts of mutations on the protein. To obtain optimized weight parameters, the score statistics of prior driver genes were maximized on pan-cancer training data. We conducted rigorous and unbiased benchmark analysis and comparisons of DriverML with 20 other existing tools in 31 independent datasets from The Cancer Genome Atlas (TCGA). Our comprehensive evaluations demonstrated that DriverML was robust and powerful among various datasets and outperformed the other tools with a better balance of precision and sensitivity. In vitro cell-based assays further proved the validity of the DriverML prediction of novel driver genes. In summary, DriverML uses an innovative, machine learning-based approach to prioritize cancer driver genes and provides dramatic improvements over currently existing methods. Its source code is available at https://github.com/HelloYiHan/DriverML.
Subjects
Neoplastic Gene Expression Regulation, Machine Learning/statistics & numerical data, Neoplasm Proteins/genetics, Neoplasms/genetics, Oncogenes, Software, Atlases as Topic, Cell Cycle Proteins/genetics, Cell Cycle Proteins/metabolism, Tumor Cell Line, Cell Movement, Cell Proliferation, Datasets as Topic, Humans, Monte Carlo Method, Mutation, Neoplasm Proteins/metabolism, Neoplasms/diagnosis, Neoplasms/pathology, Nuclear Proteins/genetics, Nuclear Proteins/metabolism
ABSTRACT
Studies of radiation damage in vivo are important for health risk assessment as well as cancer radiotherapy. Ceramide, as a second messenger, has been found to be related to radiation-induced apoptosis. However, the detailed mechanisms in living systems are still not fully understood. In the present study, the effects of ceramide on the gamma radiation-induced response were investigated using Caenorhabditis elegans. Our results indicated that ceramide was required for gamma radiation-induced whole-body germ cell apoptosis through the production of reactive oxygen species and a decrease in mitochondrial transmembrane potential. Using ceramide synthase-related mutant strains and exogenous C16-ceramide, we showed that ceramide could regulate the DNA damage response (DDR) pathway to mediate radiation-induced germ cell apoptosis. Moreover, ceramide was found to function epistatically to pmk-1 and mpk-1 in the MAPK pathway to promote radiation-induced apoptosis in Caenorhabditis elegans. These results demonstrate that ceramide could potentially mediate gamma radiation-induced apoptosis by regulating mitochondrial function, the DDR pathway and the MAPK pathway.
Subjects
Caenorhabditis elegans/physiology, Ceramides/pharmacology, Radiation-Protective Agents/pharmacology, Animals, Apoptosis/drug effects, Caenorhabditis elegans/metabolism, Caenorhabditis elegans/radiation effects, Caenorhabditis elegans Proteins/genetics, Ceramides/metabolism, DNA Damage, Germ Cells/drug effects, Mitochondria/drug effects, Radiation, Reactive Oxygen Species/metabolism
ABSTRACT
MOTIVATION: Estimating haplotype frequencies from genotype data plays an important role in genetic analysis. In silico methods are usually computationally involved since phase information is not available. Due to tight linkage disequilibrium and low recombination rates, the number of haplotypes observed in human populations is far smaller than the number of all possible haplotypes. This motivates us to solve the estimation problem by maximizing the sparsity of existing haplotypes. Here, we propose a new algorithm, compressive sensing haplotype inference (CSHAP), which applies compressive sensing (CS) theory from the field of signal processing to solve for a sparse representation of haplotype frequencies based on allele frequencies and between-allele covariances. RESULTS: Our proposed approach can handle both individual genotype data and pooled DNA data with hundreds of loci. CSHAP achieves the same accuracy as state-of-the-art methods, but runs several orders of magnitude faster. CSHAP can also handle imputation of missing genotype data efficiently. AVAILABILITY AND IMPLEMENTATION: CSHAP is implemented in R; the source code and the testing datasets are available at http://home.ustc.edu.cn/~zhouys/CSHAP/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
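The abstract describes recovering sparse haplotype frequencies from allele frequencies and between-allele covariances. The sketch below shows only the forward direction of that relationship, i.e. how those summary quantities follow from a given set of haplotype frequencies; the sparse recovery step itself is not shown, and the 0/1 string encoding of haplotypes is an assumption made for illustration.

```python
def allele_moments(hap_freqs):
    """Allele frequencies and between-allele covariances implied by haplotypes.

    hap_freqs maps haplotype strings over {'0', '1'} (one character per locus)
    to population frequencies that sum to 1.
    """
    n_loci = len(next(iter(hap_freqs)))
    # Frequency of allele '1' at each locus.
    p = [sum(f for h, f in hap_freqs.items() if h[i] == "1") for i in range(n_loci)]
    # cov[i][j] = P(allele 1 at i and at j) - p[i] * p[j]
    cov = [[sum(f for h, f in hap_freqs.items() if h[i] == "1" and h[j] == "1")
            - p[i] * p[j]
            for j in range(n_loci)]
           for i in range(n_loci)]
    return p, cov

# Two loci in complete linkage disequilibrium: only 2 of the 4 possible
# haplotypes exist, which is the kind of sparsity the abstract relies on.
p, cov = allele_moments({"00": 0.5, "11": 0.5})
print(p, cov)
```

With L loci there are 2^L possible haplotypes but only on the order of L^2 such moments, so the inverse problem is underdetermined unless most haplotype frequencies are zero; that is the motivation the abstract gives for a sparsity-based solution.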
Subjects
Algorithms, Software, Gene Frequency, Genotype, Haplotypes, Humans, Linkage Disequilibrium, Single Nucleotide Polymorphism
ABSTRACT
Nonlinear mixed-effects (NLME) modeling is one of the most powerful tools for analyzing longitudinal data especially under the sparse sampling design. The determinant of the Fisher information matrix is a commonly used global metric of the information that can be provided by the data under a given model. However, in clinical studies, it is also important to measure how much information the data provide for a certain parameter of interest under the assumed model, for example, the clearance in population pharmacokinetic models. This paper proposes a new, easy-to-interpret information metric, the "relative information" (RI), which is designed for specific parameters of a model and takes a value between 0% and 100%. We establish the relationship between interindividual variability for a specific parameter and the variance of the associated parameter estimator, demonstrating that, under a "perfect" experiment (eg, infinite samples or/and minimum experimental error), the RI and the variance of the model parameter estimator converge, respectively, to 100% and the ratio of the interindividual variability for that parameter and the number of subjects. Extensive simulation experiments and analyses of three real datasets show that our proposed RI metric can accurately characterize the information for parameters of interest for NLME models. The new information metric can be readily used to facilitate study designs and model diagnosis.
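The limiting behavior stated in this abstract can be written compactly. The notation is an assumption introduced here, not taken from the paper: $\theta_k$ is the parameter of interest, $\omega_k^2$ its interindividual variability (variance), $N$ the number of subjects, and $\mathrm{RI}_k$ the relative information for that parameter. The definition of RI itself is not given in the abstract and is not reconstructed.

```latex
% Under a "perfect" experiment (infinite samples per subject and/or
% minimal experimental error), as stated in the abstract:
\[
  \mathrm{RI}_k \;\longrightarrow\; 100\%,
  \qquad
  \operatorname{Var}\bigl(\hat{\theta}_k\bigr) \;\longrightarrow\; \frac{\omega_k^{2}}{N}.
\]
```

Read this way, even a perfectly informative experiment leaves a variance floor set by between-subject variability, which only more subjects can reduce.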
Subjects
Statistical Data Interpretation, Statistical Models, Computer Simulation, Biological Models, Nonlinear Dynamics, Research Design
ABSTRACT
BACKGROUND: Family-based design is one of the most popular designs in genetic studies. The transmission disequilibrium test (TDT) for the family trio design is optimal only under the additive trait model and may lose power under other trait models. TDT-type tests are powerful only when the underlying trait model is correctly specified. Usually, the true trait model is unknown, and the selection of the TDT-type test is problematic. Several methods that are robust against mis-specification of the trait model have been proposed. In this paper, we propose a new efficiency robust procedure for the family trio design, namely, the weighted TDT (WTDT) test. METHODS: We combine information from the largest two TDT-type tests by using weights related to the three TDT-type tests and take the weighted sum as the test statistic. RESULTS: Simulation results demonstrate that WTDT has power close to that of, but is much more robust than, the optimal TDT-type test based on a single trait model. WTDT also outperforms other efficiency robust methods in terms of power. Applications to real and simulated data from Genetic Analysis Workshop (GAW15) illustrate the practical application of the WTDT method. CONCLUSION: WTDT is efficiency robust not only to model mis-specification but also to mis-specification of the risk allele.
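For context, the classic TDT that this work builds on compares how often heterozygous parents transmit versus do not transmit a given allele to affected offspring. The sketch below implements only that standard McNemar-type statistic; the WTDT weights and the other two TDT-type tests are not specified in the abstract and are not reproduced. The counts in the example are made up.

```python
from math import erfc, sqrt

def tdt(transmitted, untransmitted):
    """Classic TDT: chi-square statistic (1 df) and its P-value.

    transmitted / untransmitted: number of heterozygous parents who did /
    did not transmit the candidate allele to an affected child.
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        return 0.0, 1.0  # no informative parents
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

chi2, p_value = tdt(transmitted=60, untransmitted=40)
print(chi2, p_value)
```

Under the null hypothesis of no linkage or association, transmission and non-transmission are equally likely, so a large imbalance between the two counts is evidence against the null.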