ABSTRACT
We present a robust Dirichlet process for estimating survival functions from samples with right-censored data. It adopts a prior near-ignorance approach, which avoids almost any assumption about the distribution of the population lifetimes and, when prior information is lacking, removes the need to elicit an infinite-dimensional parameter, as is required with the usual Dirichlet process prior. We show how such a model can be used to derive robust inferences from right-censored lifetime data. Robustness comes from identifying the decisions that are prior-dependent, and can be interpreted as a sensitivity analysis with respect to the hypothetical inclusion of fictitious new samples in the data. In particular, we derive a nonparametric estimator of the survival probability and a hypothesis test about the probability that the lifetime of an individual from one population is shorter than the lifetime of an individual from another. We evaluate these ideas on simulated data and on the Australian AIDS survival dataset. The methods are publicly available through an easy-to-use R package.
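For readers unfamiliar with right-censored data, the following minimal sketch shows the classical Kaplan-Meier estimator of a survival function. It is not the robust Dirichlet process estimator described in the abstract, only the standard nonparametric baseline for the same problem. The function name `kaplan_meier` and the toy data are assumptions made for illustration.

```python
from collections import Counter


def kaplan_meier(times, observed):
    """Classical Kaplan-Meier estimate (illustrative baseline only).

    times:    follow-up duration of each individual
    observed: True if the event was seen, False if right-censored
    Returns a list of (t, S(t)) pairs, one per distinct event time.
    """
    # Number of observed events at each time point.
    events = Counter(t for t, seen in zip(times, observed) if seen)
    # Everyone (event or censored) leaves the risk set at their own time.
    leaving = Counter(times)

    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = events.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Censored individuals count as at risk at t, then drop out.
        at_risk -= leaving[t]
    return curve


if __name__ == "__main__":
    # Toy data: five individuals, two of them right-censored.
    for t, s in kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False]):
        print(f"S({t}) = {s:.3f}")
```

Censored individuals contribute to the risk set up to their censoring time but never trigger a drop in the curve, which is the basic mechanism any estimator for right-censored data has to handle.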
Subjects
Biometry/methods , Acquired Immunodeficiency Syndrome/epidemiology , Female , Humans , Male , Statistical Models , Probability , Survival Analysis
ABSTRACT
Marginal zone B-cell lymphomas (MZLs) have been divided into 3 distinct subtypes (extranodal MZLs of mucosa-associated lymphoid tissue [MALT] type, nodal MZLs, and splenic MZLs). Nevertheless, the relationship between the subtypes is still unclear. To address this question, we performed a comprehensive analysis of genomic DNA copy number changes in a very large series of MZL cases. Samples from 218 MZL patients (25 nodal, 57 MALT, 134 splenic, and 2 MZLs not otherwise specified) were analyzed with the Affymetrix Human Mapping 250K SNP arrays, and the data were combined with matched gene expression in 33 of the 218 cases. MALT lymphoma showed significantly more frequent gains at 3p, 6p, and 18p, as well as del(6q23) (TNFAIP3/A20), whereas splenic MZLs were associated with del(7q31) and del(8p). Nodal MZLs did not show statistically significant differences compared with MALT lymphoma, while lacking the 7q losses characteristic of splenic MZLs. Gains of 3q and 18q were common to all 3 subtypes. del(8p) was often present together with del(17p) (TP53). Although del(17p) did not determine a worse outcome and del(8p) was only of borderline significance, the presence of both deletions had a highly significant negative impact on the outcome of splenic MZLs.
Subjects
DNA Fingerprinting , Gene Expression Profiling , Marginal Zone B-Cell Lymphoma/genetics , Single Nucleotide Polymorphism/genetics , Splenic Neoplasms/genetics , Adult , Aged , Aged 80 and over , Chromosome Aberrations , Comparative Genomic Hybridization , Female , Neoplastic Gene Expression Regulation , Human Genome , Humans , Marginal Zone B-Cell Lymphoma/classification , Marginal Zone B-Cell Lymphoma/pathology , Male , Middle Aged , Prognosis , Splenic Neoplasms/classification , Splenic Neoplasms/pathology , Young Adult
ABSTRACT
Despite recent therapeutic improvements, the clinical course of diffuse large B-cell lymphoma (DLBCL) still differs considerably among patients. We conducted this retrospective multi-centre study to evaluate the impact of genomic aberrations, detected using a high-density genome-wide single nucleotide polymorphism-based array, on clinical outcome in a population of DLBCL patients treated with R-CHOP-21 (rituximab, cyclophosphamide, doxorubicin, vincristine and prednisone repeated every 21 days). A total of 166 DNA samples were analysed using the GeneChip Human Mapping 250K NspI array. Genomic anomalies were analysed regarding their impact on the clinical course of 124 patients treated with R-CHOP-21. Unsupervised clustering was performed to identify genetically related subgroups of patients with different clinical outcomes. Twenty recurrent genetic lesions showed an impact on the clinical course. Loss of genomic material at 8p23.1 showed the strongest statistical significance and was associated with additional aberrations, such as 17p- and 15q-. Unsupervised clustering identified five DLBCL clusters with distinct genetic profiles, clinical characteristics and outcomes. Genetic features and clusters associated with different outcomes in patients treated with R-CHOP have thus been identified by arrayCGH.
Subjects
Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Chromosome Aberrations , Diffuse Large B-Cell Lymphoma/drug therapy , Diffuse Large B-Cell Lymphoma/genetics , Adolescent , Adult , Aged , Aged 80 and over , Murine-Derived Monoclonal Antibodies/administration & dosage , Antineoplastic Combined Chemotherapy Protocols/administration & dosage , Chromosome Deletion , Human Chromosomes Pair 8/genetics , Comparative Genomic Hybridization , Cyclophosphamide/administration & dosage , Doxorubicin/administration & dosage , Epidemiologic Methods , Female , Humans , Male , Middle Aged , Single Nucleotide Polymorphism , Prednisone/administration & dosage , Rituximab , Signal Transduction/genetics , Treatment Outcome , Vincristine/administration & dosage , Young Adult
ABSTRACT
The International Collegiate Programming Contest (ICPC) is an annual, multi-tier competition held among college students on a global scale, with world championships every year. Last year alone, around fifty thousand students from three thousand universities participated in ICPC regional competitions. Because of its size and the many talented and skilled participants it attracts, multiple stakeholders are interested in the competition. Each competition produces scoreboards that contain valuable data about the performance of teams. Until now, however, these data have never been collected and stored in an open and free repository. The ICPC does keep track of basic information such as teams' names and their final scores, but more detailed information has remained scattered across the internet. This paper describes the data collected and cleaned from the European, Latin American, North American, South Pacific and World Finals contests from 2012 to 2018, opening up research opportunities for an in-depth look into programming competitions.
ABSTRACT
In the study of complex genetic diseases, identifying subgroups of patients who share similar genetic characteristics is a challenging task that can, for example, improve treatment decisions. One type of genetic lesion frequently investigated in such disorders is the change of the DNA copy number (CN) at specific genomic traits. Non-negative Matrix Factorization (NMF) is a standard technique for reducing the dimensionality of a data set and clustering data samples while keeping the most relevant information in meaningful components. Thus, it can be used to discover subgroups of patients from CN profiles. It is, however, computationally impractical for very high-dimensional data, such as CN microarray data. Deciding the most suitable number of subgroups is also a challenging problem. The aim of this work is to derive a procedure to compact high-dimensional data, in order to improve NMF applicability without compromising the quality of the clustering. This is particularly important for analyzing high-resolution microarray data. Many commonly used quality measures, as well as our own measures, are employed to decide the number of subgroups and to assess the quality of the results. Our measures are based on the idea of identifying robust subgroups, inspired by biological and clinical relevance instead of simply aiming at well-separated clusters. We evaluate our procedure using four real independent data sets. In these data sets, our method was able to find accurate subgroups with individual molecular and clinical features and outperformed the standard NMF in terms of accuracy in the factorization fitness function. Hence, it can be useful for the discovery of subgroups of patients with similar CN profiles in the study of heterogeneous diseases.
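To make the role of NMF in this abstract concrete, here is a minimal sketch of standard NMF with multiplicative updates for the Frobenius objective, followed by a simple cluster assignment. It does not reproduce the data-compaction procedure or the quality measures of the paper. The function names, the toy matrix, and the convention that rows are genomic regions and columns are patients are all assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Frobenius norm of V - WH, the fitness function being minimized."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh)) ** 0.5


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


def assign_clusters(h):
    """Assign each column (sample) to the component with the largest weight."""
    return [max(range(len(h)), key=lambda k: h[k][j]) for j in range(len(h[0]))]


if __name__ == "__main__":
    # Hypothetical toy profile matrix: 4 regions (rows) x 4 patients (columns).
    v = [[5, 5, 0, 0], [4, 4, 0, 0], [0, 0, 3, 3], [0, 0, 2, 2]]
    w, h = nmf(v, rank=2)
    print("reconstruction error:", frobenius_error(v, w, h))
    print("cluster labels:", assign_clusters(h))
```

Each update multiplies two dense matrices whose size grows with the number of rows, which is why the cost becomes a concern for high-resolution arrays and why reducing the number of rows beforehand, as the abstract proposes, can help.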