Results 1 - 20 of 35
1.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34791019

ABSTRACT

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is responsible for millions of deaths around the world. To contribute to the understanding of SARS-CoV-2 and human protein interactions and to generate new hypotheses about them, we make use of the information-rich Biomine probabilistic database and extend the experimentally identified SARS-CoV-2-human protein-protein interaction (PPI) network in silico. We generate an extended network by integrating information from the Biomine database, the PPI network and other experimentally validated results. To generate novel hypotheses, we focus on the high-connectivity sub-communities that overlap most with the integrated experimentally validated results in the extended network. To this end, we propose a new data analysis pipeline that can efficiently compute core decomposition on the extended network and identify dense subgraphs. We then evaluate the identified dense subgraph and the generated hypotheses in three contexts: literature validation for uncovered virus targeting genes and proteins, gene function enrichment analysis on subgraphs and literature support on drug repurposing for identified tissues and diseases related to COVID-19. The major types of the generated hypotheses are proteins with their encoding genes, and we rank them by sorting their connections to the integrated experimentally validated nodes. In addition, we compile a comprehensive list of novel genes and proteins potentially related to COVID-19, as well as novel diseases which might be comorbidities. Together with the generated hypotheses, our results provide novel knowledge relevant to COVID-19 for further validation.


Subjects
COVID-19; Computer Simulation; Models, Biological; Protein Interaction Maps; COVID-19/genetics; COVID-19/metabolism; Humans; SARS-CoV-2/chemistry; SARS-CoV-2/genetics; SARS-CoV-2/metabolism
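
Illustrative note on record 1: the abstract mentions core decomposition and dense-subgraph identification without giving details. The following is a minimal sketch of a generic k-core peeling procedure on an undirected graph, not the authors' pipeline; the function names `core_numbers` and `innermost_core` are hypothetical, and the simple quadratic-time peeling is an assumption chosen for readability.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: core number} for an undirected graph given as edge pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel the node with the smallest remaining degree.
        node = min(remaining, key=degree.get)
        # The core number never decreases as peeling proceeds.
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nbr in adj[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


def innermost_core(edges):
    """Return the set of nodes with the highest core number (a dense subgraph)."""
    core = core_numbers(edges)
    if not core:
        return set()
    top = max(core.values())
    return {node for node, c in core.items() if c == top}
```

On a triangle with one pendant node attached, the triangle nodes get core number 2 and the pendant node gets 1, so `innermost_core` returns the triangle.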
2.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35368077

ABSTRACT

Survival analysis is a technique for identifying prognostic biomarkers and genetic vulnerabilities in cancer studies. Large-scale consortium-based projects have profiled >11 000 adult and >4000 pediatric tumor cases with clinical outcomes and multiomics approaches. This provides a resource for investigating molecular-level cancer etiologies using clinical correlations. Although cancers often arise from multiple genetic vulnerabilities and have deregulated gene sets (GSs), existing survival analysis protocols can report only on individual genes. Additionally, there is no systematic method to connect clinical outcomes with experimental (cell line) data. To address these gaps, we developed cSurvival (https://tau.cmmt.ubc.ca/cSurvival). cSurvival provides a user-adjustable analytical pipeline with a curated, integrated database and offers three main advances: (i) joint analysis with two genomic predictors to identify interacting biomarkers, including new algorithms to identify optimal cutoffs for two continuous predictors; (ii) survival analysis not only at the gene, but also the GS level; and (iii) integration of clinical and experimental cell line studies to generate synergistic biological insights. To demonstrate these advances, we report three case studies. We confirmed findings of autophagy-dependent survival in colorectal cancers and of synergistic negative effects between high expression of SLC7A11 and SLC2A1 on outcomes in several cancers. We further used cSurvival to identify high expression of the Nrf2-antioxidant response element pathway as a main indicator for lung cancer prognosis and for cellular resistance to oxidative stress-inducing drugs. Altogether, these analyses demonstrate cSurvival's ability to support biomarker prognosis and interaction analysis via gene- and GS-level approaches and to integrate clinical and experimental biomedical studies.


Subjects
Biomarkers, Tumor; Lung Neoplasms; Adult; Biomarkers, Tumor/genetics; Biomarkers, Tumor/metabolism; Cell Line; Child; Gene Expression Regulation, Neoplastic; Humans; Lung Neoplasms/genetics; Survival Analysis
3.
BMC Med Res Methodol ; 24(1): 83, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38589775

ABSTRACT

BACKGROUND: The timing of treating cancer patients is an essential factor in the efficacy of treatment. Patients who will not respond to current therapy should therefore receive a different treatment as early as possible. Machine learning models can be built to classify responders and non-responders. Such classification models predict the probability of a patient being a responder. Most methods use a probability threshold of 0.5 to convert the probabilities into binary group membership. However, the cutoff of 0.5 is not always the optimal choice. METHODS: In this study, we propose a novel data-driven approach to select a better cutoff value based on the optimal cross-validation technique. To illustrate our novel method, we applied it to three clinical trial datasets of small-cell lung cancer patients. We used two different datasets to build a scoring system to segment patients. Then the models were applied to segment patients in the test data. RESULTS: We found that, in test data, the predicted responders and non-responders had significantly different long-term survival outcomes. Our proposed novel method segments patients better than the standard approach using a cutoff of 0.5. Comparing clinical outcomes of responders versus non-responders, our novel method had a p-value of 0.009 with a hazard ratio of 0.668 for grouping patients using the Cox proportional hazards model and a p-value of 0.011 using the accelerated failure time model, which confirmed a significant difference between responders and non-responders. In contrast, the standard approach had a p-value of 0.194 with a hazard ratio of 0.823 using the Cox proportional hazards model and a p-value of 0.240 using the accelerated failure time model, indicating that the responders and non-responders do not differ significantly in survival. CONCLUSION: In summary, our novel prediction method can successfully segment new patients into responders and non-responders. Clinicians can use our prediction to decide if a patient should receive a different treatment or stay with the current treatment.


Subjects
Lung Neoplasms; Small Cell Lung Carcinoma; Humans; Lung Neoplasms/diagnosis; Lung Neoplasms/therapy; Small Cell Lung Carcinoma/therapy; Treatment Outcome; Research Design
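
Illustrative note on record 3: the abstract describes replacing the default 0.5 probability cutoff with a data-driven one chosen by cross-validation, but does not state the selection criterion. The sketch below assumes held-out predicted probabilities are already available per fold and uses Youden's J statistic (sensitivity + specificity - 1) as a stand-in criterion; both the criterion and the names `youden_j` and `select_cutoff` are assumptions for illustration, not the authors' method.

```python
def youden_j(probs, labels, cutoff):
    """Youden's J for one fold: sensitivity + specificity - 1 at the given cutoff."""
    pairs = list(zip(probs, labels))
    tp = sum(1 for p, y in pairs if p >= cutoff and y == 1)
    fn = sum(1 for p, y in pairs if p < cutoff and y == 1)
    tn = sum(1 for p, y in pairs if p < cutoff and y == 0)
    fp = sum(1 for p, y in pairs if p >= cutoff and y == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity + specificity - 1.0


def select_cutoff(folds, candidates):
    """Pick the candidate cutoff with the highest mean Youden's J across folds.

    folds: list of (probs, labels) pairs holding held-out predictions per fold.
    candidates: iterable of cutoff values to compare.
    """
    def mean_j(cutoff):
        return sum(youden_j(p, y, cutoff) for p, y in folds) / len(folds)

    return max(candidates, key=mean_j)
```

When a model's probabilities are systematically below 0.5 for true responders, a cutoff of 0.5 labels everyone a non-responder, and a lower candidate cutoff scores better under this criterion.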
4.
Environ Res ; 252(Pt 2): 118944, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38636647

ABSTRACT

Paralytic shellfish toxins (PST) in shellfish products have led to severe risks to human health. To monitor the risk, the Canadian Shellfish Sanitation Program has been collecting longitudinal PST measurements in blue mussel (Mytilus edulis) and soft-shell clam (Mya arenaria) samples in six coastal provinces of Canada. The spatial distributions of major temporal variation patterns were studied via Functional Principal Component Analysis. Seasonal increases in PST contamination were found to vary the most in terms of magnitude along the coastlines, which provides support for location-specific management of the time-sensitive PST contamination. In British Columbia, the first functional principal component (FPC1) indicated the variance among the magnitudes, while FPC2 indicated the seasonality of the PST levels. The temporal variations tended to be positively correlated with the abundance of dinoflagellates Alexandrium spp., and negatively with precipitation and inorganic nutrients. These findings indicate the underlying mechanism of PST variation in various geographical settings. In New Brunswick, Prince Edward Island, and Nova Scotia, the top FPCs indicated that the PST contamination differed mostly in the seasonal increase of the PST level during summer.


Subjects
Marine Toxins; Seasons; Animals; Longitudinal Studies; Marine Toxins/analysis; Canada; Environmental Monitoring; Mytilus edulis; Bivalvia; Principal Component Analysis; Dinoflagellida; Shellfish Poisoning
5.
J Biomed Inform ; 146: 104501, 2023 10.
Article in English | MEDLINE | ID: mdl-37742781

ABSTRACT

BACKGROUND: We often must conduct diagnostic tests on a massive volume of samples within a limited time during outbreaks of infectious diseases (e.g., COVID-19 screening) or repeat them many times routinely (e.g., regular and massive screening for plant virus infections in farms). These tests aim to obtain the diagnostic result of all samples within a limited time. In such scenarios, the limitation of testing resources and human labor drives the need to pool individual samples and test them together to improve testing efficiency. When a pool is positive, further testing is required to identify the affected individuals; whereas when a pool is negative, we conclude all individuals in the pool are negative. How one splits the samples into pools is a critical factor affecting testing efficiency. OBJECTIVE: We aim to find the optimal strategy that adaptively guides users on optimally splitting the sample cohort into test-pools. METHODS: We developed an algorithm that minimizes the expected number of tests needed to obtain the diagnostic results of all samples. Our algorithm dynamically updates the critical information according to the result of the most recent test and calculates the optimal pool size for the next test. We implemented our novel adaptive sample pooling strategy in a web-based application, ADSP (https://ADSP.uvic.ca). ADSP interactively guides users on how many samples to pool for the current test, asks users to report the test result back and uses it to update the best strategy on how many samples to pool for the next test. RESULTS: We compared ADSP with other popular pooling methods in simulation studies, and found that ADSP requires fewer tests to diagnose a cohort and is more robust to an inaccurate initial estimate of the test cohort's disease prevalence. CONCLUSION: Our web-based application can help researchers decide how to pool their samples for grouped diagnostic tests. It improves test efficiency when grouped tests are conducted.


Subjects
COVID-19; Diagnostic Techniques and Procedures; Humans; COVID-19/diagnosis; COVID-19/epidemiology; COVID-19 Testing; Sensitivity and Specificity
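
Illustrative note on record 5: the abstract explains why pool size drives testing efficiency but does not give the ADSP algorithm itself. As background, the sketch below computes the expected number of tests per sample for the classical two-stage (Dorfman) pooling scheme, in which a positive pool is followed by individual retests. It assumes a perfectly accurate test and independent samples; it is not the adaptive strategy described in the abstract, and the function names are hypothetical.

```python
def expected_tests_per_sample(prevalence, pool_size):
    """Expected tests per sample under two-stage pooling with a perfect test."""
    if pool_size == 1:
        return 1.0  # no pooling: one test per sample
    # A pool is positive if at least one of its samples is positive.
    p_pool_positive = 1.0 - (1.0 - prevalence) ** pool_size
    # One pooled test shared by pool_size samples, plus one individual retest
    # per sample whenever the pool is positive.
    return 1.0 / pool_size + p_pool_positive


def best_pool_size(prevalence, max_pool_size=50):
    """Pool size in 1..max_pool_size minimizing expected tests per sample."""
    return min(
        range(1, max_pool_size + 1),
        key=lambda k: expected_tests_per_sample(prevalence, k),
    )
```

The formula shows the trade-off the abstract alludes to: larger pools share one test among more samples, but are more likely to be positive and trigger retesting, so the best size shrinks as prevalence rises.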
6.
Chaos ; 32(5): 053127, 2022 May.
Article in English | MEDLINE | ID: mdl-35649972

ABSTRACT

User opinion greatly affects the performance of network reconstruction since it plays a crucial role in the network structure. In this paper, we present a novel model for reconstructing a social network with community structure by taking into account the Hegselmann-Krause bounded confidence model of opinion dynamics and the compressive sensing method of network reconstruction. Three types of user opinion, including the random opinion, the polarity opinion, and the overlap opinion, are constructed. First, in Zachary's karate club network, the reconstruction accuracies are compared among the three types of opinions. Second, synthetic networks generated by the Stochastic Block Model are further examined. The experimental results show that user opinions play a more important role than the community structure in network reconstruction. Moreover, the polarity of opinions can increase the inter-community reconstruction accuracy, and the overlap of opinions can improve the intra-community reconstruction accuracy. This work helps reveal the mechanism between information propagation and social relation prediction.


Subjects
Attitude; Mental Processes; Social Networking
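
Illustrative note on record 6: the abstract names the Hegselmann-Krause bounded confidence model without describing its update rule. In the standard formulation, each agent moves to the average opinion of all agents whose opinions lie within a confidence bound of its own. The sketch below implements that rule on a fully mixed population; the names `hk_step` and `hk_run` are hypothetical, and the sketch omits the network and community structure that the paper studies.

```python
def hk_step(opinions, epsilon):
    """One synchronous Hegselmann-Krause update with confidence bound epsilon."""
    updated = []
    for x in opinions:
        # Peers within the confidence bound; this always includes the agent itself.
        peers = [y for y in opinions if abs(x - y) <= epsilon]
        updated.append(sum(peers) / len(peers))
    return updated


def hk_run(opinions, epsilon, steps=100, tol=1e-9):
    """Iterate hk_step until opinions stop changing or the step budget is used."""
    current = list(opinions)
    for _ in range(steps):
        nxt = hk_step(current, epsilon)
        change = max((abs(a - b) for a, b in zip(current, nxt)), default=0.0)
        current = nxt
        if change < tol:
            break
    return current
```

A small confidence bound leaves separated opinion clusters in place, while a bound covering the whole opinion range pulls every agent to a single consensus value.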
7.
Bioinformatics ; 36(1): 65-72, 2020 01 01.
Article in English | MEDLINE | ID: mdl-31263871

ABSTRACT

MOTIVATION: HIV is difficult to treat because its virus mutates at a high rate and mutated viruses easily develop resistance to existing drugs. If the relationships between mutations and drug resistances can be determined from historical data, patients can be provided personalized treatment according to their own mutation information. The HIV Drug Resistance Database was built to investigate the relationships. Our goal is to build a model using data in this database, which simultaneously predicts the resistance of multiple drugs using mutation information from sequences of viruses for any new patient. RESULTS: We propose two variations of a stacking algorithm which borrow information among multiple prediction tasks to improve multivariate prediction performance. The most attractive feature of our proposed methods is the flexibility with which complex multivariate prediction models can be constructed using any univariate prediction models. Using cross-validation studies, we show that our proposed methods outperform other popular multivariate prediction methods. AVAILABILITY AND IMPLEMENTATION: An R package is being developed. In the meantime, R code can be requested by email. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms; Drug Resistance, Viral; HIV Infections; HIV-1; Computational Biology/methods; Drug Resistance, Viral/genetics; HIV Infections/virology; HIV-1/drug effects; HIV-1/genetics; Humans; Mutation; Software
8.
Stat Med ; 40(7): 1752-1766, 2021 03 30.
Article in English | MEDLINE | ID: mdl-33426649

ABSTRACT

As a future trend of healthcare, personalized medicine tailors medical treatments to individual patients. It requires identifying a subset of patients with the best response to treatment. The subset can be defined by a biomarker (eg, expression of a gene) and its cutoff value. Topics on subset identification have received massive attention. There are over two million hits by keyword searches on Google Scholar. However, designing clinical trials that utilize the discovered uncertain subsets/biomarkers is not trivial and rarely discussed in the literature. This leads to a gap between research results and real-world drug development. To fill this gap, we formulate the problem of clinical trial design as an optimization problem involving high-dimensional integration, and propose a novel computational solution based on Monte Carlo and smoothing methods. Our method utilizes the modern techniques of general purpose computing on graphics processing units for large-scale parallel computing. Compared to a published method in three-dimensional problems, our approach is more accurate and 133 times faster. This advantage increases when dimensionality increases. Our method is scalable to higher dimensional problems since the precision bound of our estimated study power is a finite number not affected by dimensionality. To design clinical trials incorporating the potential biomarkers, users can use our software "DesignCTPB". This software can be found on GitHub and will be available as an R package on CRAN. Although our research is motivated by the design of clinical trials, the method can be used widely to solve other optimization problems involving high-dimensional integration.


Subjects
Computer Graphics; Software; Algorithms; Biomarkers; Humans; Monte Carlo Method
9.
Invest New Drugs ; 36(4): 629-637, 2018 08.
Article in English | MEDLINE | ID: mdl-29196957

ABSTRACT

Background The signaling protein p38 mitogen-activated protein kinase (MAPK) regulates the tumor cell microenvironment, modulating cell survival, migration, and invasion. This phase 1 study evaluated the safety of p38 MAPK inhibitor LY3007113 in patients with advanced cancer to establish a recommended phase 2 dose. Methods In part A (dose escalation), LY3007113 was administered orally every 12 h (Q12H) at doses ranging from 20 mg to 200 mg daily on a 28-day cycle until the maximum tolerated dose (MTD) was reached. In part B (dose confirmation), patients received MTD. Safety, pharmacokinetics, pharmacodynamics, and tumor response data were evaluated. Results MTD was 30 mg Q12H. The most frequent treatment-related adverse events (>10%) were tremor, rash, stomatitis, increased blood creatine phosphokinase, and fatigue. Grade ≥ 3 treatment-related adverse events included upper gastrointestinal haemorrhage and increased hepatic enzyme, both occurring at 40 mg Q12H and considered dose-limiting toxicities. LY3007113 exhibited an approximately dose-proportional increase in exposure and time-independent pharmacokinetics after repeated dosing. Maximal inhibition (80%) of primary biomarker MAPK-activated protein kinase 2 in peripheral blood mononuclear cells was not reached, and sustained minimal inhibition (60%) was not maintained for 6 h after dosing to achieve a biologically effective dose (BED). The best overall response in part B was stable disease in 3 of 27 patients. Conclusions The recommended phase 2 dosage of LY3007113 was 30 mg Q12H. Three patients continued treatment after the first radiographic assessment, and the BED was not achieved. Further clinical development of this compound is not planned as toxicity precluded achieving a biologically effective dose.


Subjects
Antineoplastic Agents/pharmacokinetics; Antineoplastic Agents/therapeutic use; Neoplasms/drug therapy; Protein Kinase Inhibitors/pharmacokinetics; Protein Kinase Inhibitors/therapeutic use; p38 Mitogen-Activated Protein Kinases/antagonists & inhibitors; Adult; Aged; Biomarkers, Tumor/metabolism; Dose-Response Relationship, Drug; Female; Humans; Male; Maximum Tolerated Dose; Middle Aged; Neoplasms/metabolism; Treatment Outcome
10.
Bioinformatics ; 29(16): 2049-50, 2013 Aug 15.
Article in English | MEDLINE | ID: mdl-23786769

ABSTRACT

SUMMARY: MNase-Seq and ChIP-Seq have evolved as popular techniques to study chromatin and histone modification. Although many tools have been developed to identify enriched regions, software tools for nucleosome positioning are still limited. We introduce a flexible and powerful open-source R package, PING 2.0, for nucleosome positioning using MNase-Seq data or MNase- or sonicated- ChIP-Seq data combined with either single-end or paired-end sequencing. PING uses a model-based approach, which enables nucleosome predictions even in the presence of low read counts. We illustrate PING using two paired-end datasets from Saccharomyces cerevisiae and compare its performance with nucleR and ChIPseqR. AVAILABILITY: PING 2.0 is available from the Bioconductor website at http://bioconductor.org. It can run on Linux, Mac and Windows.


Subjects
High-Throughput Nucleotide Sequencing/methods; Nucleosomes/chemistry; Sequence Analysis, DNA/methods; Software; Saccharomyces cerevisiae/genetics
11.
Sci Total Environ ; 933: 172817, 2024 Jul 10.
Article in English | MEDLINE | ID: mdl-38688372

ABSTRACT

Shellfish poisonings have posed severe risks to human health globally. The Canadian Shellfish Sanitation Program was established in 1948 to monitor the toxin levels at shellfish harvesting sites along the coast of six provinces in Canada. Domoic acid (DA) has been a causal toxin for amnesic shellfish poisoning, and a macro-scale analysis of the temporal and spatial variation of domoic acid along Canada's coast was conducted in this study. We aggregated the toxin levels by week in blue mussel (Mytilus edulis) and soft-shell clam (Mya arenaria) samples, respectively, over a one-year scale. The subsequent application of Functional Principal Component Analysis unveiled that magnitudes of seasonal variation and peaked DA levels around early summer, spring, or mid-fall formed the largest variation in the toxin levels in blue mussels along the coastlines of British Columbia and Prince Edward Island and in soft-shell clams along those of New Brunswick and Nova Scotia. In Quebec, the DA levels were low and varied mostly in terms of the overall magnitude from spring to fall. Downstream correlation analyses in British Columbia further discovered that, at most sites, the strongest correlations were negative between precipitation as well as inorganic nutrients (including nitrate, nitrite, phosphate, and silicate) on one side and DA a few weeks afterward on the other. These findings indicated associations between amnesic shellfish poisoning and environmental stresses.


Subjects
Environmental Monitoring; Kainic Acid; Water Pollutants, Chemical; Kainic Acid/analogs & derivatives; Kainic Acid/analysis; Animals; Canada; Water Pollutants, Chemical/analysis; Marine Toxins/analysis; Bivalvia; Mytilus edulis; Shellfish Poisoning; Seasons
12.
Contemp Clin Trials Commun ; 36: 101229, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38034840

ABSTRACT

This short communication concerns a biomarker adaptive Phase 2/3 design for new oncology drugs with an uncertain biomarker effect. Depending on the outcome of an interim analysis for adaptive decision, a Phase 2 study that starts in a biomarker enriched subpopulation may continue to the end without expansion to Phase 3, expand to Phase 3 in the same population or expand to Phase 3 in a broader population. Each path can enjoy full alpha for hypothesis testing without inflating the overall Type I error.

13.
Financ Innov ; 9(1): 39, 2023.
Article in English | MEDLINE | ID: mdl-36687790

ABSTRACT

Full electronic automation in stock exchanges has recently become popular, generating high-frequency intraday data and motivating the development of near real-time price forecasting methods. Machine learning algorithms are widely applied to mid-price stock predictions. Processing raw data as inputs for prediction models (e.g., data thinning and feature engineering) can primarily affect the performance of the prediction methods. However, researchers rarely discuss this topic. This motivated us to propose three novel modelling strategies for processing raw data. We illustrate how our novel modelling strategies improve forecasting performance by analyzing high-frequency data of the Dow Jones 30 component stocks. In these experiments, our strategies often lead to statistically significant improvement in predictions. The three strategies improve the F1 scores of the SVM models by 0.056, 0.087, and 0.016, respectively. Supplementary Information: The online version contains supplementary material available at 10.1186/s40854-022-00431-9.

14.
Bioinform Adv ; 3(1): vbad030, 2023.
Article in English | MEDLINE | ID: mdl-36949780

ABSTRACT

Motivation: Single-cell RNA-sequencing (scRNA-seq) technology enables researchers to investigate a genome at the cellular level with unprecedented resolution. An organism consists of a heterogeneous collection of cell types, each of which plays a distinct role in various biological processes. Hence, the first step of scRNA-seq data analysis is often to distinguish cell types so they can be investigated separately. Researchers have recently developed several automated cell-type annotation tools, requiring neither biological knowledge nor subjective human decisions. Dropout is a crucial characteristic of scRNA-seq data widely used in differential expression analysis. However, no current cell annotation method explicitly utilizes dropout information. Fully utilizing dropout information motivated this work. Results: We present scAnnotate, a cell annotation tool that fully utilizes dropout information. We model every gene's marginal distribution using a mixture model, which describes both the dropout proportion and the distribution of the non-dropout expression levels. Then, using an ensemble machine learning approach, we combine the mixture models of all genes into a single model for cell-type annotation. This combining approach can avoid estimating numerous parameters in the high-dimensional joint distribution of all genes. Using 14 real scRNA-seq datasets, we demonstrate that scAnnotate is competitive against nine existing annotation methods. Furthermore, because of its distinct modelling strategy, scAnnotate's misclassified cells differ greatly from competitor methods. This suggests using scAnnotate together with other methods could further improve annotation accuracy. Availability and implementation: We implemented scAnnotate as an R package and made it publicly available from CRAN: https://cran.r-project.org/package=scAnnotate. Supplementary information: Supplementary data are available at Bioinformatics Advances online.

15.
Mar Pollut Bull ; 189: 114712, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36827773

ABSTRACT

The vast coastline provides Canada with a flourishing seafood industry including bivalve shellfish production. To sustain a healthy bivalve molluscan shellfish production, the Canadian Shellfish Sanitation Program was established to monitor the health of shellfish harvesting habitats, and fecal coliform bacteria data have been collected at nearly 15,000 marine sample sites across six coastal provinces in Canada since 1979. We applied Functional Principal Component Analysis and subsequent correlation analyses to find annual variation patterns of bacteria levels at sites in each province. The overall magnitude and the seasonality of fecal contamination were modelled by functional principal component one and two, respectively. The amplitude was related to human and warm-blooded animal activities; the seasonality was strongly correlated with river discharge driven by precipitation and snow melt in British Columbia, but such correlation in provinces along the Atlantic coast could not be properly evaluated due to lack of data during winter.


Subjects
Bivalvia; Animals; Humans; Seasons; Shellfish; Gram-Negative Bacteria; British Columbia
16.
Cells ; 12(24)2023 12 05.
Article in English | MEDLINE | ID: mdl-38132091

ABSTRACT

BACKGROUND: Macrophages and monocytes orchestrate inflammatory processes in the lungs. However, their role in the pathogenesis of chronic obstructive pulmonary disease (COPD), an inflammatory condition, is not well known. Here, we determined the characteristics of these cells in lungs of COPD patients and identified novel therapeutic targets. METHODS: We analyzed the RNA sequencing (scRNA-seq) data of explanted human lung tissue from COPD (n = 18) and control (n = 28) lungs and found 16 transcriptionally distinct groups of macrophages and monocytes. We performed pathway and gene enrichment analyses to determine the characteristics of macrophages and monocytes from COPD (versus control) lungs and to identify the therapeutic targets, which were then validated using data from a randomized controlled trial of COPD patients (DISARM). RESULTS: In the alveolar macrophages, 176 genes were differentially expressed (83 up- and 93 downregulated; Padj < 0.05, |log2FC| > 0.5) and were enriched in downstream biological processes predicted to cause poor lipid uptake and impaired cell activation, movement, and angiogenesis in COPD versus control lungs. Classical monocytes from COPD lungs harbored a differential gene set predicted to cause the activation, mobilization, and recruitment of cells and a hyperinflammatory response to influenza. In silico, the corticosteroid fluticasone propionate was one of the top compounds predicted to modulate the abnormal transcriptional profiles of these cells. In vivo, a fluticasone-salmeterol combination significantly modulated the gene expression profiles of bronchoalveolar lavage cells of COPD patients (p < 0.05). CONCLUSIONS: COPD lungs harbor transcriptionally distinct lung macrophages and monocytes, reflective of a dysfunctional and hyperinflammatory state. Inhaled corticosteroids and other compounds can modulate the transcriptomic profile of these cells in patients with COPD.


Subjects
Macrophages, Alveolar; Monocytes; Pulmonary Disease, Chronic Obstructive; Humans; Adrenal Cortex Hormones/pharmacology; Adrenal Cortex Hormones/therapeutic use; Lung/metabolism; Macrophages/metabolism; Macrophages, Alveolar/metabolism; Monocytes/metabolism; Non-Randomized Controlled Trials as Topic; Pulmonary Disease, Chronic Obstructive/drug therapy; Pulmonary Disease, Chronic Obstructive/genetics; Pulmonary Disease, Chronic Obstructive/metabolism
17.
Am J Respir Crit Care Med ; 183(9): 1187-92, 2011 May 01.
Article in English | MEDLINE | ID: mdl-21216880

ABSTRACT

RATIONALE: There are no accepted blood-based biomarkers in chronic obstructive pulmonary disease (COPD). Pulmonary and activation-regulated chemokine (PARC/CCL-18) is a lung-predominant inflammatory protein that is found in serum. OBJECTIVES: To determine whether PARC/CCL-18 levels are elevated and modifiable in COPD and to determine their relationship to clinical end points of hospitalization and mortality. METHODS: PARC/CCL-18 was measured in serum samples from individuals who participated in the ECLIPSE (Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints) and LHS (Lung Health Study) studies and a prednisolone intervention study. MEASUREMENTS AND MAIN RESULTS: Serum PARC/CCL-18 levels were higher in subjects with COPD than in smokers or lifetime nonsmokers without COPD (105 vs. 81 vs. 80 ng/ml, respectively; P < 0.0001). Elevated PARC/CCL-18 levels were associated with increased risk of cardiovascular hospitalization or mortality in the LHS cohort and with total mortality in the ECLIPSE cohort. CONCLUSIONS: Serum PARC/CCL-18 levels are elevated in COPD and track clinical outcomes. PARC/CCL-18, a lung-predominant chemokine, could be a useful blood biomarker in COPD.


Subjects
Chemokines, CC/blood; Pulmonary Disease, Chronic Obstructive/blood; Anti-Inflammatory Agents/therapeutic use; Biomarkers/blood; Cohort Studies; Enzyme-Linked Immunosorbent Assay; Female; Hospitalization/statistics & numerical data; Humans; Kaplan-Meier Estimate; Longitudinal Studies; Male; Middle Aged; Prednisolone/therapeutic use; Pulmonary Disease, Chronic Obstructive/drug therapy; Risk Factors; Smoking/blood
18.
Front Genet ; 13: 992070, 2022.
Article in English | MEDLINE | ID: mdl-36212148

ABSTRACT

Deep Learning (DL) has been broadly applied to solve big data problems in biomedical fields and has been most successful in image processing. Recently, many DL methods have been applied to genomic studies. However, genomic data usually have too small a sample size to fit a complex network, and, unlike images, they lack common structural patterns that would allow the use of pre-trained networks or convolution layers. Concern about the overuse of DL methods motivates us to evaluate their performance against popular non-deep Machine Learning (ML) methods for analyzing genomic data across a wide range of sample sizes. In this paper, we conduct a benchmark study using the UK Biobank data and many of its random subsets with different sample sizes. The original UK Biobank data has about 500k participants. Each participant has comprehensive characteristics, disease histories, and genomic information, i.e., the genotypes of millions of Single-Nucleotide Polymorphisms (SNPs). We are interested in predicting the risk of three lung diseases: asthma, COPD, and lung cancer. A total of 205,238 participants have recorded disease outcomes for these three diseases. Five prediction models are investigated in this benchmark study, including three non-deep machine learning methods (Elastic Net, XGBoost, and SVM) and two deep learning methods (DNN and LSTM). Besides the most popular performance metrics, such as the F1-score, we promote the hit curve, a visual tool for describing performance in predicting rare events. We found that DL methods frequently fail to outperform non-deep ML in analyzing genomic data, even in large datasets with over 200k samples. The experimental results suggest that DL methods should not be overused in genomic studies, even with biobank-level sample sizes. The performance differences between DL and non-deep ML decrease as the sample size increases. This suggests that, when the sample size is already large, further increases bring greater performance gains for DL methods. Hence, DL methods could perform better on genomic datasets larger than those in this study.
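The hit curve mentioned in this abstract can be illustrated with a short sketch. It assumes the usual definition, which the abstract itself does not spell out: cases are ranked by predicted risk, and the curve gives the number of true events captured among the top k cases. The function name `hit_curve` and the toy labels and scores are hypothetical and are not taken from the paper.

```python
def hit_curve(y_true, scores):
    """Cumulative number of true events among the top-k ranked cases.

    Assumed definition: rank cases by descending predicted score, then
    count how many true events (label 1) fall within the first k cases.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = [], 0
    for i in order:
        total += y_true[i]
        hits.append(total)
    return hits


# Toy data (hypothetical): 3 rare events among 10 cases.
y_true = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
scores = [0.10, 0.90, 0.20, 0.80, 0.70, 0.05, 0.30, 0.15, 0.40, 0.25]

curve = hit_curve(y_true, scores)
print(curve)  # → [1, 1, 2, 3, 3, 3, 3, 3, 3, 3]
```

In this toy case all three events are captured within the top four ranked cases, which is the kind of behavior such a curve makes visible for rare outcomes.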

19.
Front Genet ; 13: 836798, 2022.
Article in English | MEDLINE | ID: mdl-35281805

ABSTRACT

The new technology of single-cell RNA sequencing (scRNA-seq) can yield valuable insights into gene expression and give critical information about the cellular compositions of complex tissues. In recent years, vast numbers of scRNA-seq datasets have been generated and made publicly available, and this has enabled researchers to train supervised machine learning models for predicting or classifying various cell-level phenotypes. This has led to the development of many new methods for analyzing scRNA-seq data. Despite the popularity of such applications, there has as yet been no systematic investigation of the performance of these supervised algorithms using predictors from scRNA-seq datasets of various sizes. In this study, 13 popular supervised machine learning algorithms for cell phenotype classification were evaluated using published real and simulated datasets containing diverse numbers of cells. This benchmark comprises two parts. In the first, real datasets were used to assess the computing speed and cell phenotype classification performance of popular supervised algorithms. Classification performance was evaluated using the area under the receiver operating characteristic curve, F1-score, precision, recall, and false-positive rate. In the second part, we evaluated gene-selection performance using published simulated datasets with a known list of real genes. The results showed that ElasticNet with interactions performed the best for small and medium-sized datasets. The NaiveBayes classifier was found to be another appropriate method for medium-sized datasets. With large datasets, the performance of the XGBoost algorithm was found to be excellent. Ensemble algorithms were not found to be significantly superior to individual machine learning methods. Including interactions in the ElasticNet algorithm caused a significant performance improvement for small datasets. The linear discriminant analysis algorithm was found to be the best choice when speed is critical; it is the fastest method, it can scale to handle large sample sizes, and its performance is not much worse than that of the top performers.
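Several of the classification metrics named in this abstract (precision, recall, F1-score, and false-positive rate) follow directly from the confusion matrix. The sketch below uses their standard textbook definitions for a binary phenotype. The function name `binary_metrics` and the toy labels are hypothetical, and nothing here reproduces the benchmark's own code or data.

```python
def binary_metrics(y_true, y_pred):
    """Precision, recall, F1 and false-positive rate for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}


# Toy labels (hypothetical): 3 cells of the target phenotype among 8.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Here two of the three target cells are recovered with one false alarm, so precision, recall, and F1 all equal 2/3 and the false-positive rate is 1/5.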

20.
Biometrics ; 67(1): 151-63, 2011 Mar.
Article in English | MEDLINE | ID: mdl-20528864

ABSTRACT

ChIP-seq combines chromatin immunoprecipitation with massively parallel short-read sequencing. While it can profile genome-wide in vivo transcription factor-DNA association with higher sensitivity, specificity, and spatial resolution than ChIP-chip, it poses new challenges for statistical analysis that derive from the complexity of the biological systems characterized and from variability and biases in its sequence data. We propose a method called PICS (Probabilistic Inference for ChIP-seq) for identifying regions bound by transcription factors from aligned reads. PICS identifies binding event locations by modeling local concentrations of directional reads, and uses DNA fragment length prior information to discriminate closely adjacent binding events via a Bayesian hierarchical t-mixture model. It uses precalculated, whole-genome read mappability profiles and a truncated t-distribution to adjust binding event models for reads that are missing due to local genome repetitiveness. It estimates uncertainties in model parameters that can be used to define confidence regions on binding event locations and to filter estimates. Finally, PICS calculates a per-event enrichment score relative to a control sample, and can use a control sample to estimate a false discovery rate. Using published GABP and FOXA1 data from human cell lines, we show that PICS' predicted binding sites were more consistent with computationally predicted binding motifs than the alternative methods MACS, QuEST, CisGenome, and USeq. We then use a simulation study to confirm that PICS compares favorably to these methods and is robust to model misspecification.


Subjects
Algorithms , Chromatin Immunoprecipitation/methods , DNA/genetics , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Base Sequence , Computer Simulation , Models, Genetic , Models, Statistical , Molecular Sequence Data