1.
Ann Hematol ; 102(2): 447-456, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36422672

ABSTRACT

The SARS-CoV-2 pandemic has favored the expansion of telemedicine. Philadelphia-negative chronic myeloproliferative neoplasms (Ph-MPN) might be good candidates for virtual follow-up. In this study, we aimed to analyze the follow-up of patients with Ph-MPN in Spain during COVID-19, its effectiveness, and its acceptance among patients. We present a multicenter retrospective study from 30 centers. Five hundred forty-one patients with a median age of 67 years (yr) were included. With a median follow-up of 19 months, 4410 appointments were recorded. The median number of visits per patient was 7 and the median periodicity was 2.7 months; significantly more visits, at a higher frequency, were registered in myelofibrosis (MF) patients. 60.1% of visits were in-person, 39.5% were by telephone, and 0.3% were videocall visits, with a predominance of telephone visits for essential thrombocythemia (ET) and polycythemia vera (PV) patients over MF, as well as for younger patients (< 50 yr). The proportion of phone visits significantly decreased after the first semester of the pandemic. Pharmacological modifications were performed in only 25.7% of the visits, and, considering overall management, ET patients needed fewer global treatment changes. Telephone contact effectiveness reached 90% and only 5.4% required a complementary in-person appointment. Although 56.2% of the cohort preferred in-person visits, 90.5% of our patients claimed to be satisfied with follow-up during the pandemic, with 83% positive comments. In view of our results, telemedicine has proven effective and efficient, and might continue to play a complementary role in Ph-MPN patients' follow-up.


Subject(s)
COVID-19 , Myeloproliferative Disorders , Polycythemia Vera , Primary Myelofibrosis , Essential Thrombocythemia , Humans , Aged , Pandemics , Retrospective Studies , Patient Satisfaction , Spain/epidemiology , SARS-CoV-2 , Myeloproliferative Disorders/epidemiology , Myeloproliferative Disorders/therapy , Polycythemia Vera/epidemiology , Primary Myelofibrosis/epidemiology , Essential Thrombocythemia/epidemiology
2.
J Transl Med ; 20(1): 373, 2022 08 18.
Article in English | MEDLINE | ID: mdl-35982500

ABSTRACT

BACKGROUND: Recently, extensive cancer genomic studies have revealed mutational and clinical data of large cohorts of cancer patients. For example, the Pan-Lung Cancer 2016 dataset (part of The Cancer Genome Atlas project) summarises the mutational and clinical profiles of different subtypes of Lung Cancer (LC). Mutational and clinical signatures have been used independently for tumour typification and prediction of metastasis in LC patients. Is it then possible to achieve better typifications and predictions when combining both data streams? METHODS: In a cohort of 1144 Lung Adenocarcinoma (LUAD) and Lung Squamous Cell Carcinoma (LSCC) patients, we studied the number of missense mutations (hereafter, the Total Mutational Load, TML) and the distribution of clinical variables for different classes of patients. Using the TML and different sets of clinical variables (tumour stage, age, sex, smoking status, and packs of cigarettes smoked per year), we built Random Forest classification models that calculate the likelihood of developing metastasis. RESULTS: We found that LC patients differing in age, smoking status, and tumour type had significantly different mean TMLs. Although TML was an informative feature, its effect was secondary to the "tumour stage" feature. However, its contribution to the classification is not redundant with the latter; models trained using both TML and tumour stage performed better than models trained using only one of these variables. We found that models trained on the entire dataset (i.e., without using dimensionality reduction techniques) and without resampling achieved the highest performance, with an F1 score of 0.64 (95%CrI [0.62, 0.66]). CONCLUSIONS: Clinical variables and TML should be considered together when assessing the likelihood of LC patients progressing to metastatic states, as the information these encode is not redundant. Altogether, we provide new evidence of the need for comprehensive diagnostic tools for metastasis.


Subject(s)
Lung Adenocarcinoma , Non-Small-Cell Lung Carcinoma , Squamous Cell Carcinoma , Lung Neoplasms , Lung Adenocarcinoma/genetics , Lung Adenocarcinoma/pathology , Non-Small-Cell Lung Carcinoma/pathology , Squamous Cell Carcinoma/genetics , Humans , Lung Neoplasms/genetics , Lung Neoplasms/pathology , Mutation/genetics
3.
Bioinformatics ; 35(20): 4120-4128, 2019 10 15.
Article in English | MEDLINE | ID: mdl-30887042

ABSTRACT

MOTIVATION: Genome repositories are growing faster than our storage capacities, challenging our ability to store, transmit, process and analyze them. While genomes are not very compressible individually, those repositories usually contain myriads of genomes or genome reads of the same species, thereby creating opportunities for orders-of-magnitude compression by exploiting inter-genome similarities. A useful compression system, however, cannot be usable only for archival; it must also allow direct access to the sequences, ideally in transparent form so that applications do not need to be rewritten. RESULTS: We present a highly compressed filesystem that specializes in storing large collections of genomes and reads. The system obtains orders-of-magnitude compression by using Relative Lempel-Ziv, which exploits the high similarities between genomes of the same species. The filesystem transparently stores the files in compressed form, intercepting the system calls of the applications without the need to modify them. A client/server variant of the system stores the compressed files in a server, while the client's filesystem transparently retrieves and updates the data from the server. The data between client and server are also transferred in compressed form, which saves an order of magnitude in network time. AVAILABILITY AND IMPLEMENTATION: The C++ source code of our implementation is available for download at https://github.com/vsepulve/relz_fs.


Subject(s)
Data Compression , Genome , Software
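
The Relative Lempel-Ziv idea named in the abstract above can be illustrated with a minimal Python sketch. It is not the relz_fs implementation: the function names `rlz_factorize` and `rlz_decode` are hypothetical, and the longest-match search is a naive scan, whereas a practical system would presumably index the reference (for example with a suffix array). Each factor records a position in the reference, a match length, and one literal character.

```python
def rlz_factorize(reference: str, target: str) -> list[tuple[int, int, str]]:
    """Greedy Relative Lempel-Ziv parse of `target` against `reference`.

    Returns factors (position in reference, match length, next literal).
    The literal is "" when the target ends exactly at the end of a match.
    """
    factors = []
    i = 0
    while i < len(target):
        best_pos, best_len = 0, 0
        # Naive search for the longest prefix of target[i:] found in reference.
        for pos in range(len(reference)):
            length = 0
            while (pos + length < len(reference)
                   and i + length < len(target)
                   and reference[pos + length] == target[i + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = pos, length
        i += best_len
        literal = target[i] if i < len(target) else ""
        factors.append((best_pos, best_len, literal))
        if literal:
            i += 1
    return factors


def rlz_decode(reference: str, factors: list[tuple[int, int, str]]) -> str:
    """Rebuild the target by copying from the reference and appending literals."""
    pieces = []
    for pos, length, literal in factors:
        pieces.append(reference[pos:pos + length])
        pieces.append(literal)
    return "".join(pieces)


reference = "ACGTACGTTTGA"
target = "ACGTACGATTGA"   # differs from the reference in one position
factors = rlz_factorize(reference, target)
print(factors)            # [(0, 7, 'A'), (8, 4, '')]
```

A target that is highly similar to the reference collapses into a handful of factors, which is the source of the compression the abstract describes for genomes of the same species.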
4.
Rev Clin Esp (Barc) ; 223(6): 340-349, 2023.
Article in English | MEDLINE | ID: mdl-37105383

ABSTRACT

AIMS: The aim is to evaluate a management program for direct oral anticoagulants (DOACs) in non-valvular atrial fibrillation (NVAF) patients according to their profiles, appropriateness of dosing, patterns of crossover, effectiveness and safety. This is an observational and longitudinal prospective study in a cohort of patients seen in daily clinical practice in a regional hospital in Spain, with a 3-year follow-up plan for patients initiating dabigatran, rivaroxaban or apixaban between January 2012 and December 2016. METHODS: We analyzed 490 episodes of treatment (apixaban 2.5 9.4%, apixaban 5 21.4%, dabigatran 75 0.6%, dabigatran 110 12.4%, dabigatran 150 19.8%, rivaroxaban 15 17.8% and rivaroxaban 20 18.6%) in 445 patients. 13.6% of patients on dabigatran, 9.7% on rivaroxaban, and 3.9% on apixaban switched to other DOACs or changed dosing. RESULTS: Apixaban was the DOAC most frequently switched to. The most frequent reasons for switching were toxicity (23.8%), bleeding (21.4%) and renal deterioration (16.7%). Inappropriate dosing was found in 23.8% of episodes. Rates of stroke/transient ischemic attack (TIA) were 1.64/0.54 events/100 patient-years, while rates of major bleeding, clinically relevant non-major (CRNM) bleeding and intracranial bleeding were 2.4, 5, and 0.5 events/100 patient-years. Gastrointestinal and genitourinary bleeding were the most common types of bleeding events (BE). On multivariable analysis, prior stroke and age were independent predictors of stroke/TIA. Concurrent platelet inhibitors, male gender and age were independent predictors of BE. CONCLUSION: This study complements the scant data available on the use of DOACs in NVAF patients in Spain, confirming a good safety and effectiveness profile.


Subject(s)
Atrial Fibrillation , Transient Ischemic Attack , Stroke , Humans , Male , Atrial Fibrillation/complications , Atrial Fibrillation/drug therapy , Atrial Fibrillation/chemically induced , Rivaroxaban/adverse effects , Dabigatran/adverse effects , Anticoagulants/adverse effects , Transient Ischemic Attack/chemically induced , Transient Ischemic Attack/drug therapy , Prospective Studies , Spain , Stroke/prevention & control , Stroke/chemically induced , Hemorrhage/chemically induced , Hemorrhage/epidemiology , Hemorrhage/drug therapy , Retrospective Studies
5.
Hemasphere ; 7(8): e936, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37476303

ABSTRACT

The International Prognostic Score of thrombosis in Essential Thrombocythemia (IPSET-thrombosis) and its revised version have been proposed to guide thrombosis prevention strategies. We evaluated both classifications to prognosticate thrombosis in 1366 contemporary essential thrombocythemia (ET) patients prospectively followed from the Spanish Registry of ET. The cumulative incidence of thrombosis at 10 years, taking death as a competing risk, was 11.4%. The risk of thrombosis was significantly higher in the high-risk IPSET-thrombosis and high-risk revised IPSET-thrombosis, but no differences were observed among the lower risk categories. Patients allocated in high-risk IPSET-thrombosis (subdistribution hazard ratios [SHR], 3.7 [95% confidence interval, CI, 1.6-8.7]) and high-risk revised IPSET-thrombosis (SHR, 3.2 [95% CI, 1.4-7.45]) showed an increased risk of arterial thrombosis, whereas both scoring systems failed to predict venous thrombosis. The incidence rate of thrombosis in intermediate risk revised IPSET-thrombosis (aged >60 years, JAK2-negative, and no history of thrombosis) was very low regardless of the treatment administered (0.9% and 0% per year with and without cytoreduction, respectively). Dynamic application of the revised IPSET-thrombosis showed a low rate of thrombosis when patients without history of prior thrombosis switched to a higher risk category after reaching 60 years of age. In conclusion, IPSET-thrombosis scores are useful for identifying patients at high risk of arterial thrombosis, whereas they fail to predict venous thrombosis. Controlled studies are needed to determine the appropriate treatment of ET patients assigned to the non-high-risk categories.

6.
J Clin Med ; 12(20)2023 Oct 10.
Article in English | MEDLINE | ID: mdl-37892566

ABSTRACT

Primary immune thrombocytopenia (ITP) is a complex autoimmune disease whose hallmark is a deregulation of cellular and humoral immunity leading to increased destruction and reduced production of platelets. The heterogeneity of presentation and clinical course hampers personalized approaches for diagnosis and management. In 2021, the Spanish ITP Group (GEPTI) of the Spanish Society of Hematology and Hemotherapy (SEHH) updated a consensus document that had been launched in 2011. The updated guidelines have been the reference for the diagnosis and management of primary ITP in Spain ever since. Nevertheless, the emergence of new tools and strategies makes it advisable to review them again. For this reason, we have updated the main recommendations appropriately. Our aim is to provide a practical tool to facilitate the integral management of all aspects of primary ITP management.

7.
BMC Ecol ; 12: 1, 2012 Jan 27.
Article in English | MEDLINE | ID: mdl-22284854

ABSTRACT

BACKGROUND: The Andes-Amazon basin of Peru and Bolivia is one of the most data-poor, biologically rich, and rapidly changing areas of the world. Conservation scientists agree that this area hosts extremely high endemism, perhaps the highest in the world, yet we know little about the geographic distributions of these species and ecosystems within country boundaries. To address this need, we have developed conservation data on endemic biodiversity (~800 species of birds, mammals, amphibians, and plants) and terrestrial ecological systems (~90; groups of vegetation communities resulting from the action of ecological processes, substrates, and/or environmental gradients) with which we conduct a fine scale conservation prioritization across the Amazon watershed of Peru and Bolivia. We modelled the geographic distributions of 435 endemic plants and all 347 endemic vertebrate species, from existing museum and herbaria specimens at a regional conservation practitioner's scale (1:250,000-1:1,000,000), based on the best available tools and geographic data. We mapped ecological systems, endemic species concentrations, and irreplaceable areas with respect to national level protected areas. RESULTS: We found that sizes of endemic species distributions ranged widely (< 20 km2 to > 200,000 km2) across the study area. Bird and mammal endemic species richness was greatest within a narrow 2500-3000 m elevation band along the length of the Andes Mountains. Endemic amphibian richness was highest at 1000-1500 m elevation and concentrated in the southern half of the study area. Geographical distribution of plant endemism was highly taxon-dependent. Irreplaceable areas, defined as locations with the highest number of species with narrow ranges, overlapped slightly with areas of high endemism, yet generally exhibited unique patterns across the study area by species group. 
We found that many endemic species and ecological systems are lacking national-level protection; a third of endemic species have distributions completely outside of national protected areas. Protected areas cover only 20% of areas of high endemism and 20% of irreplaceable areas. Almost 40% of the 91 ecological systems are in serious need of protection (≤ 2% of their ranges protected). CONCLUSIONS: We identify, for the first time, areas of high endemic species concentrations and high irreplaceability that have only been roughly indicated in the past at the continental scale. We conclude that new complementary protected areas are needed to safeguard these endemics and ecosystems. An expansion in protected areas will be challenged by geographically isolated micro-endemics, varied endemic patterns among taxa, increasing deforestation, resource extraction, and changes in climate. Relying on pre-existing collections and publicly accessible datasets and tools, this working framework is exportable to other regions plagued by incomplete conservation data.


Subject(s)
Biodiversity , Conservation of Natural Resources/methods , Demography , Ecosystem , Theoretical Models , Animals , Bolivia , Geography , Maps as Topic , Peru , Species Specificity
9.
Vaccines (Basel) ; 10(6)2022 Jun 16.
Article in English | MEDLINE | ID: mdl-35746569

ABSTRACT

Worldwide vaccination against SARS-CoV-2 has allowed the detection of hematologic autoimmune complications. Adverse events (AEs) of this nature had been previously observed in association with other vaccines. The underlying mechanisms are not totally understood, although mimicry between viral and self-antigens plays a relevant role. It is important to remark that, although the incidence of these AEs is extremely low, their evolution may lead to life-threatening scenarios if treatment is not readily initiated. Hematologic autoimmune AEs have been associated with both mRNA and adenoviral vector-based SARS-CoV-2 vaccines. The main reported entities are secondary immune thrombocytopenia, immune thrombotic thrombocytopenic purpura, autoimmune hemolytic anemia, Evans syndrome, and a newly described disorder, so-called vaccine-induced immune thrombotic thrombocytopenia (VITT). The hallmark of VITT is the presence of anti-platelet factor 4 autoantibodies able to trigger platelet activation. Patients with VITT present with thrombocytopenia and may develop thrombosis in unusual locations such as cerebral beds. The management of hematologic autoimmune AEs does not differ significantly from that of these disorders in a non-vaccine context, thus addressing autoantibody production and bleeding/thromboembolic risk. This means that clinicians must be aware of their distinctive signs in order to diagnose them and initiate treatment as soon as possible.

10.
Pharmaceuticals (Basel) ; 15(7)2022 Jun 23.
Article in English | MEDLINE | ID: mdl-35890078

ABSTRACT

Primary immune thrombocytopenia (ITP) is an autoimmune disorder that causes low platelet counts and subsequent bleeding risk. Although current corticosteroid-based ITP therapies are able to improve platelet counts, up to 70% of subjects with an ITP diagnosis do not achieve a sustained clinical response in the absence of treatment, thus requiring a second-line therapy option as well as additional care to prevent bleeding. Less than 40% of patients treated with thrombopoietin analogs, 60% of those treated with splenectomy, and 20% or fewer of those treated with rituximab or fostamatinib reach sustained remission in the absence of treatment. Therefore, optimizing therapeutic options for ITP management is mandatory. The pathophysiology of ITP is complex and involves several mechanisms that are apparently unrelated. These include the clearance of autoantibody-coated platelets by splenic macrophages or by the complement system, hepatic desialylated platelet destruction, and the inhibition of platelet production from megakaryocytes. The number of pathways involved may challenge treatment, but, at the same time, offer the possibility of unveiling a variety of new targets as the knowledge of the involved mechanisms progresses. The aim of this work, after revising the limitations of the current treatments, is to perform a thorough review of the mechanisms of action, pharmacokinetics/pharmacodynamics, efficacy, safety, and development stage of the novel ITP therapies under investigation. Hopefully, several of the options included herein may allow us to personalize ITP management according to the needs of each patient in the near future.

11.
BMJ Open ; 12(11): e062873, 2022 11 04.
Article in English | MEDLINE | ID: mdl-36332946

ABSTRACT

INTRODUCTION: To date, no pancreatic stump closure technique has been shown to be superior to any other in distal pancreatectomy. Although several studies have shown a trend towards better results in transection using a radiofrequency device (radiofrequency-assisted transection (RFT)), no randomised trial for this purpose has been performed to date. Therefore, we designed a randomised clinical trial, with the hypothesis that this technique used in distal pancreatectomies is superior to mechanical closures in reducing clinically relevant postoperative pancreatic fistula (CR-POPF). METHODS AND ANALYSIS: TRANSPAIRE is a multicentre randomised controlled trial conducted in seven Spanish pancreatic centres that includes 112 patients undergoing elective distal pancreatectomy for any indication who will be randomly assigned to RFT or classic stapler transections (control group) in a ratio of 1:1. The primary outcome is the CR-POPF percentage. Sample size is calculated with the following assumptions: 5% one-sided significance level (α), 80% power (1-β), expected POPF in control group of 32%, expected POPF in RFT group of 10% and a clinically relevant difference of 22%. Secondary outcomes include postoperative results, complications, radiological evaluation of the pancreatic stump, metabolomic profile of postoperative peritoneal fluid, survival and quality of life. Follow-ups will be carried out in the external consultation at 1, 6 and 12 months postoperatively. ETHICS AND DISSEMINATION: TRANSPAIRE has been approved by the CEIM-PSMAR Ethics Committee. This project is being carried out in accordance with national and international guidelines and with the basic principles of protection of human rights and dignity established in the Declaration of Helsinki (64th General Assembly, Fortaleza, Brazil, October 2013). In accordance with regulations on studies with biological samples, Law 14/2007 on Biomedical Research will be followed. 
We have defined a dissemination strategy, whose main objective is the participation of stakeholders and the transfer of knowledge to support the exploitation of activities. REGISTRATION DETAILS: ClinicalTrials.gov Registry (NCT04402346).


Subject(s)
Pancreatectomy , Humans , Multicenter Studies as Topic , Pancreas/surgery , Pancreatectomy/adverse effects , Pancreatectomy/methods , Pancreatic Fistula/etiology , Pancreatic Fistula/prevention & control , Postoperative Complications/etiology , Quality of Life , Randomized Controlled Trials as Topic , Risk Factors
12.
Proc Worksh Algorithm Eng Exp ; 2021: 60-72, 2021.
Article in English | MEDLINE | ID: mdl-35355938

ABSTRACT

Prefix-free parsing (PFP) was introduced by Boucher et al. (2019) as a preprocessing step to ease the computation of Burrows-Wheeler Transforms (BWTs) of genomic databases. Given a string S, it produces a dictionary D and a parse P of overlapping phrases such that BWT(S) can be computed from D and P in time and workspace bounded in terms of their combined size |PFP(S)|. In practice D and P are significantly smaller than S and computing BWT(S) from them is more efficient than computing it from S directly, at least when S is the concatenation of many genomes. In this paper, we consider PFP(S) as a data structure and show how it can be augmented to support full suffix tree functionality, still built and fitting within O(|PFP(S)|) space. This entails the efficient computation of various primitives to simulate the suffix tree: computing a longest common extension (LCE) of two positions in S; reading any cell of its suffix array (SA), of its inverse (ISA), of its BWT, and of its longest common prefix array (LCP); and computing minima over ranges and next/previous smaller value queries over the LCP. Our experimental results show that the PFP suffix tree can be efficiently constructed for very large repetitive datasets and that its operations perform competitively with other compressed suffix trees that can only handle much smaller datasets.
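
The parsing step described in the abstract above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' code: the names `prefix_free_parse` and `reconstruct`, the window size `w` and the modulus `p` are chosen here for illustration, `zlib.crc32` stands in for the rolling hash a practical implementation would use, and the characters `#` and `$` are assumed not to occur in the input. A window of length `w` is treated as a trigger string when its hash is 0 modulo `p`; consecutive phrases overlap by `w` characters.

```python
import zlib


def prefix_free_parse(s: str, w: int = 2, p: int = 3) -> tuple[list[str], list[int]]:
    """Return (dictionary D, parse P) for '#' + s + '$' * w."""
    text = "#" + s + "$" * w
    terminator = "$" * w

    def is_trigger(window: str) -> bool:
        # The terminator is always a trigger so the last phrase is closed.
        return window == terminator or zlib.crc32(window.encode()) % p == 0

    phrases = []
    start = 0
    for i in range(1, len(text) - w + 1):
        if is_trigger(text[i:i + w]):
            phrases.append(text[start:i + w])  # phrase ends with the trigger
            start = i                          # next phrase starts with it
    dictionary = sorted(set(phrases))
    rank = {phrase: r for r, phrase in enumerate(dictionary)}
    parse = [rank[phrase] for phrase in phrases]
    return dictionary, parse


def reconstruct(dictionary: list[str], parse: list[int], w: int = 2) -> str:
    """Undo the parse by dropping the w-character overlap between phrases."""
    phrases = [dictionary[r] for r in parse]
    return phrases[0] + "".join(phrase[w:] for phrase in phrases[1:])


s = "GATTACAGATTACAGATTACA"
dictionary, parse = prefix_free_parse(s)
print(len(dictionary), "distinct phrases;", len(parse), "phrases in the parse")
```

On repetitive input the same phrases recur, so the dictionary stays small while the parse is a short sequence of integers; this is the property the abstract relies on when it bounds space in terms of |PFP(S)|.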

13.
Proc Data Compress Conf ; 2021: 193-202, 2021 Mar.
Article in English | MEDLINE | ID: mdl-34778549

ABSTRACT

Computing the matching statistics of patterns with respect to a text is a fundamental task in bioinformatics, but a formidable one when the text is a highly compressed genomic database. Bannai et al. gave an efficient solution for this case, which Rossi et al. recently implemented, but it uses two passes over the patterns and buffers a pointer for each character during the first pass. In this paper, we simplify their solution and make it streaming, at the cost of slowing it down slightly. This means that, first, we can compute the matching statistics of several long patterns (such as whole human chromosomes) in parallel while still using a reasonable amount of RAM; second, we can compute matching statistics online with low latency and thus quickly recognize when a pattern becomes incompressible relative to the database. Our code is available at https://github.com/koeppl/phoni.
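
For readers unfamiliar with the term, the following Python sketch shows what matching statistics are, using a brute-force definition rather than the compressed-index algorithm the abstract refers to. It assumes the common lengths-only formulation, in which entry i is the length of the longest prefix of the pattern suffix starting at i that occurs somewhere in the text; the function name `matching_statistics` is hypothetical.

```python
def matching_statistics(text: str, pattern: str) -> list[int]:
    """Brute-force matching statistics (lengths only) of `pattern` w.r.t. `text`."""
    lengths = []
    for i in range(len(pattern)):
        length = 0
        # Extend the match while the longer substring still occurs in the text.
        while (i + length < len(pattern)
               and pattern[i:i + length + 1] in text):
            length += 1
        lengths.append(length)
    return lengths


ms = matching_statistics("GATTACA", "TACAG")
print(ms)  # [4, 3, 2, 1, 1]
```

A run of short entries signals a stretch of the pattern that the text does not explain well, which is the kind of signal the abstract mentions for recognizing when a pattern becomes incompressible relative to the database.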

15.
J Bioinform Comput Biol ; 17(3): 1950011, 2019 06.
Article in English | MEDLINE | ID: mdl-31230498

ABSTRACT

Signaling pathways are responsible for the regulation of cell processes, such as monitoring the external environment, transmitting information across membranes, and making cell fate decisions. Given the increasing amount of biological data available and the recent discoveries showing that many diseases are related to the disruption of cellular signal transduction cascades, in silico discovery of signaling pathways in cell biology has become an active research topic in past years. However, reconstruction of signaling pathways remains a challenge mainly because of the need for systematic approaches for predicting causal relationships, like edge direction and activation/inhibition among interacting proteins in the signal flow. We propose an approach for predicting signaling pathways that integrates protein interactions, gene expression, phenotypes, and protein complex information. Our method first finds candidate pathways using a directed-edge-based algorithm and then defines a graph model to include causal activation relationships among proteins, in candidate pathways using cell cycle gene expression and phenotypes to infer consistent pathways in yeast. Then, we incorporate protein complex coverage information for deciding on the final predicted signaling pathways. We show that our approach improves the predictive results of the state of the art using different ranking metrics.


Subject(s)
Cell Cycle , Computational Biology/methods , Multiprotein Complexes/metabolism , Signal Transduction , Algorithms , Cell Cycle/genetics , Computer Graphics , Data Visualization , Gene Expression , Protein Interaction Mapping/methods , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism
16.
IEEE Trans Pattern Anal Mach Intell ; 30(9): 1647-58, 2008 Sep.
Article in English | MEDLINE | ID: mdl-18617721

ABSTRACT

We introduce a new probabilistic proximity search algorithm for range and K-nearest neighbor (K-NN) searching in both coordinate and metric spaces. Although there exist solutions for these problems, they boil down to a linear scan when the space is intrinsically high-dimensional, as is the case in many pattern recognition tasks. This, for example, renders the K-NN approach to classification rather slow in large databases. Our novel idea is to predict closeness between elements according to how they order their distances towards a distinguished set of anchor objects. Each element in the space sorts the anchor objects from closest to farthest to it, and the similarity between orders turns out to be an excellent predictor of the closeness between the corresponding elements. We present extensive experiments comparing our method against state-of-the-art exact and approximate techniques, both in synthetic and real, metric and non-metric databases, measuring both CPU time and distance computations. The experiments demonstrate that our technique almost always improves upon the performance of alternative techniques, in some cases by a wide margin.


Subject(s)
Algorithms , Artificial Intelligence , Image Enhancement/methods , Computer-Assisted Image Interpretation/methods , Information Storage and Retrieval/methods , Automated Pattern Recognition/methods , Subtraction Technique , Computer Simulation , Statistical Data Interpretation , Statistical Models
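
The core idea of the abstract above, predicting closeness from how elements order a set of anchor objects, can be sketched in Python as follows. This is an illustrative sketch only: the abstract does not state which measure compares the orders, so the Spearman footrule used here is an assumption, as are the function names and the use of Euclidean distance via `math.dist`.

```python
import math

Point = tuple[float, ...]


def permutation_of(point: Point, anchors: list[Point]) -> list[int]:
    """Anchor indices sorted from closest to farthest from `point`."""
    return sorted(range(len(anchors)), key=lambda j: math.dist(point, anchors[j]))


def spearman_footrule(perm_a: list[int], perm_b: list[int]) -> int:
    """Sum over anchors of the difference between their ranks in the two orders."""
    rank_in_b = {anchor: rank for rank, anchor in enumerate(perm_b)}
    return sum(abs(rank - rank_in_b[anchor]) for rank, anchor in enumerate(perm_a))


def candidate_order(query: Point, database: list[Point], anchors: list[Point]) -> list[int]:
    """Database indices ordered by how similar their anchor order is to the query's."""
    query_perm = permutation_of(query, anchors)
    return sorted(
        range(len(database)),
        key=lambda i: spearman_footrule(query_perm, permutation_of(database[i], anchors)),
    )


anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
database = [(3.0, 1.0), (9.0, 1.0), (1.0, 9.0)]
order = candidate_order((2.0, 1.0), database, anchors)
print(order)  # [0, 1, 2]
```

In a real index the database permutations would be computed once and stored; a search would then compare true distances only for a prefix of this candidate order, which is what makes the method probabilistic rather than exact.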
17.
Comput J ; 61(5): 773-788, 2018 May.
Article in English | MEDLINE | ID: mdl-29795706

ABSTRACT

Suffix trees are one of the most versatile data structures in stringology, with many applications in bioinformatics. Their main drawback is their size, which can be tens of times larger than the input sequence. Much effort has been put into reducing the space usage, leading ultimately to compressed suffix trees. These compressed data structures can efficiently simulate the suffix tree, while using space proportional to a compressed representation of the sequence. In this work, we take a new approach to compressed suffix trees for repetitive sequence collections, such as collections of individual genomes. We compress the suffix trees of individual sequences relative to the suffix tree of a reference sequence. These relative data structures provide competitive time/space trade-offs, being almost as small as the smallest compressed suffix trees for repetitive collections, and competitive in time with the largest and fastest compressed suffix trees.

18.
PLoS One ; 12(9): e0183460, 2017.
Article in English | MEDLINE | ID: mdl-28937982

ABSTRACT

Many proteins work together with others in groups called complexes in order to achieve a specific function. Discovering protein complexes is important for understanding biological processes and predicting protein functions in living organisms. Large-scale, high-throughput techniques have made it possible to compile protein-protein interaction networks (PPI networks), which have been used in several computational approaches for detecting protein complexes. Those predictions might guide future biological experimental research. Some approaches are topology-based, where highly connected proteins are predicted to be complexes; some propose different clustering algorithms using partitioning or overlaps among clusters, for networks modeled with unweighted or weighted graphs; and others use density of clusters and information based on protein functionality. However, some schemes still require much processing time, or the quality of their results can be improved. Furthermore, most of the results obtained with computational tools are not accompanied by an analysis of false positives. We propose an effective and efficient mining algorithm for discovering highly connected subgraphs, which is our base for defining protein complexes. Our representation is based on transforming the PPI network into a directed acyclic graph that reduces the number of represented edges and the search space for discovering subgraphs. Our approach considers weighted and unweighted PPI networks. We compare our best alternative using PPI networks from Saccharomyces cerevisiae (yeast) and Homo sapiens (human) with state-of-the-art approaches in terms of clustering, biological metrics and execution times, as well as three gold standards for yeast and two for human. Furthermore, we analyze false positive predicted complexes by searching the PDBe (Protein Data Bank in Europe) database in order to identify matching protein complexes that have been purified and structurally characterized. 
Our analysis shows that more than 50 yeast protein complexes and more than 300 human protein complexes found to be false positives according to our prediction method, i.e., not described in the gold standard complex databases, in fact contain protein complexes that have been characterized structurally and documented in PDBe. We also found that some of these protein complexes have recently been classified as part of a Periodic Table of Protein Complexes. The latest version of our software is publicly available at http://doi.org/10.6084/m9.figshare.5297314.v1.


Subject(s)
Algorithms , Molecular Models , Protein Interaction Mapping/methods , Proteins/metabolism , Humans , Internet , Saccharomyces cerevisiae , Software
19.
Inf Retr Boston ; 20(3): 253-291, 2017.
Article in English | MEDLINE | ID: mdl-28596702

ABSTRACT

Most of the fastest-growing string collections today are repetitive, that is, most of the constituent documents are similar to many others. As these collections keep growing, a key approach to handling them is to exploit their repetitiveness, which can reduce their space usage by orders of magnitude. We study the problem of indexing repetitive string collections in order to perform efficient document retrieval operations on them. Document retrieval problems are routinely solved by search engines on large natural language collections, but the techniques are less developed on generic string collections. The case of repetitive string collections is even less understood, and there are very few existing solutions. We develop two novel ideas, interleaved LCPs and precomputed document lists, that yield highly compressed indexes solving the problem of document listing (find all the documents where a string appears), top-k document retrieval (find the k documents where a string appears most often), and document counting (count the number of documents where a string appears). We also show that a classical data structure supporting the latter query becomes highly compressible on repetitive data. Finally, we show how the tools we developed can be combined to solve ranked conjunctive and disjunctive multi-term queries under the simple [Formula: see text] model of relevance. We thoroughly evaluate the resulting techniques in various real-life repetitiveness scenarios, and recommend the best choices for each case.
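
The three queries defined in the abstract above (document listing, top-k retrieval, and document counting) can be pinned down with a brute-force Python sketch. It only fixes the semantics; the paper's contribution is answering these queries from highly compressed indexes, which this sketch does not attempt. All function names are hypothetical, and patterns are assumed to be non-empty.

```python
def count_occurrences(doc: str, pattern: str) -> int:
    """Number of (possibly overlapping) occurrences of `pattern` in `doc`."""
    count = 0
    start = doc.find(pattern)
    while start != -1:
        count += 1
        start = doc.find(pattern, start + 1)
    return count


def document_listing(docs: list[str], pattern: str) -> list[int]:
    """Indices of all documents in which `pattern` appears."""
    return [i for i, doc in enumerate(docs) if pattern in doc]


def document_counting(docs: list[str], pattern: str) -> int:
    """Number of documents in which `pattern` appears."""
    return len(document_listing(docs, pattern))


def top_k(docs: list[str], pattern: str, k: int) -> list[int]:
    """The k documents where `pattern` appears most often (ties by index)."""
    hits = document_listing(docs, pattern)
    hits.sort(key=lambda i: (-count_occurrences(docs[i], pattern), i))
    return hits[:k]


docs = ["abracadabra", "cadabra", "banana", "abba"]
print(document_listing(docs, "ab"))   # [0, 1, 3]
print(document_counting(docs, "ab"))  # 3
print(top_k(docs, "ab", 2))           # [0, 1]
```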
