1.
Article in English | MEDLINE | ID: mdl-38895877

ABSTRACT

OBJECTIVE: To develop a machine learning-based prediction model for identifying hyperuricemic participants at risk of developing gout. METHODS: This retrospective nationwide Israeli cohort study used the Clalit Health Insurance database of 473 124 individuals to identify adults aged 18 years or older with at least two serum urate measurements exceeding 6.8 mg/dL between January 2007 and December 2022. Patients with a prior gout diagnosis or taking gout medications were excluded. Patients' demographic characteristics, community and hospital diagnoses, routine medication prescriptions and laboratory results were used to train a risk prediction model. A machine learning model (XGBoost) was developed to predict the risk of gout, and feature selection methods were used to identify relevant variables. Model performance was evaluated using the area under the receiver operating characteristic curve (ROC AUC) and the area under the precision-recall curve (PR AUC). The primary outcome was a diagnosis of gout among hyperuricemic patients. RESULTS: Of the 301 385 participants with hyperuricemia included in the analysis, 15 055 (5%) were diagnosed with gout. The XGBoost model had a ROC AUC of 0.781 (95% CI 0.78-0.784) and a PR AUC of 0.208 (95% CI 0.195-0.22). The variables most strongly associated with a gout diagnosis were serum uric acid level, age, hyperlipidemia, and purchases of non-steroidal anti-inflammatory drugs and of diuretics. A compact model using only these five variables yielded a ROC AUC of 0.714 (95% CI 0.706-0.723) and a negative predictive value (NPV) of 95%. CONCLUSIONS: The findings of this cohort study suggest that a machine learning-based prediction model had relatively good performance and a high NPV for identifying hyperuricemic participants at risk of developing gout.
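The two headline metrics above can be illustrated on toy data. The following is a minimal Python sketch, not the study's code. `roc_auc` uses the rank (Mann-Whitney) formulation of the ROC AUC. `negative_predictive_value` counts true and false negatives below an assumed risk threshold. The labels, scores and 0.5 threshold are hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a randomly chosen positive outscores a randomly chosen negative.

    This is the Mann-Whitney formulation of the ROC AUC; ties count as half a win.
    The O(P*N) double loop is fine for a toy example, not for a national cohort.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def negative_predictive_value(labels: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """NPV = TN / (TN + FN) among patients whose risk score falls below the threshold."""
    tn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
    return tn / (tn + fn)


# Hypothetical toy data: 1 = later diagnosed with gout, 0 = not diagnosed.
labels = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.90]
print(roc_auc(labels, scores))                         # 14 of 15 pairs ranked correctly
print(negative_predictive_value(labels, scores, 0.5))  # 4 TN and 1 FN below 0.5
```

NPV depends on outcome prevalence as well as on the model. It is therefore usually read together with the base rate, which is 5% gout incidence in this cohort.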

2.
Proc Natl Acad Sci U S A ; 118(15)2021 04 13.
Article in English | MEDLINE | ID: mdl-33833052

ABSTRACT

Interaction between genetic variants (epistasis) is pervasive in model systems and can profoundly affect evolutionary adaptation, population disease dynamics, genetic mapping, and precision medicine efforts. In this work, we develop a model for structured polygenic epistasis, called coordinated epistasis (CE), and prove that several recent theories of genetic architecture fall under the formal umbrella of CE. Unlike standard epistasis models, which assume that epistasis and main effects are independent, CE captures systematic correlations between epistasis and main effects that result from pathway-level epistasis and that, on balance, skew the penetrance of genetic effects. To test for the existence of CE, we propose the even-odd (EO) test and prove that it is calibrated in a range of realistic biological models. Applying the EO test in the UK Biobank, we find evidence of CE in 18 of 26 traits spanning disease, anthropometric, and blood categories. Finally, we extend the EO test to tissue-specific enrichment and identify several plausible tissue-trait pairs. Overall, CE is a dimension of genetic architecture that can capture structured, systemic forms of epistasis in complex human traits.


Subject(s)
Epistasis, Genetic; Models, Genetic; Multifactorial Inheritance/genetics; Evolution, Molecular; Genetic Predisposition to Disease; Humans; Quantitative Trait, Heritable
3.
Bioinformatics ; 38(8): 2102-2110, 2022 04 12.
Article in English | MEDLINE | ID: mdl-35020807

ABSTRACT

SUMMARY: Self-supervised deep language modeling has shown unprecedented success across natural language tasks and has recently been repurposed for biological sequences. However, existing models and pretraining methods are designed and optimized for text analysis. We introduce ProteinBERT, a deep language model specifically designed for proteins. Our pretraining scheme combines language modeling with a novel task of Gene Ontology (GO) annotation prediction. We introduce novel architectural elements that make the model highly efficient and flexible enough to handle long sequences. The architecture of ProteinBERT consists of both local and global representations, allowing end-to-end processing of both types of inputs and outputs. ProteinBERT obtains near state-of-the-art performance, and sometimes exceeds it, on multiple benchmarks covering diverse protein properties (including protein structure, post-translational modifications and biophysical attributes), despite using a far smaller and faster model than competing deep-learning methods. Overall, ProteinBERT provides an efficient framework for rapidly training protein predictors, even with limited labeled data. AVAILABILITY AND IMPLEMENTATION: Code and pretrained model weights are available at https://github.com/nadavbra/protein_bert. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Deep Learning; Amino Acid Sequence; Proteins/chemistry; Language; Natural Language Processing
4.
Bioinformatics ; 35(21): 4515-4518, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31214700

ABSTRACT

MOTIVATION: Electronic health records (EHRs) are quickly becoming omnipresent in healthcare, but interoperability issues and technical demands limit their use for biomedical and clinical research. Interactive and flexible software that interfaces directly with EHR data structured around a common data model (CDM) could accelerate EHR-based research by making the data more accessible to researchers who lack computational expertise and/or domain knowledge. RESULTS: We present PatientExploreR, an extensible application built on the R/Shiny framework that interfaces with a relational database of EHR data in the Observational Medical Outcomes Partnership (OMOP) CDM format. PatientExploreR produces interactive and dynamic patient-level reports and facilitates visualization of clinical data without any programming. It allows researchers to easily construct and export patient cohorts from the EHR for analysis with other software. This application could make it easier for physicians and researchers to explore patient-level data. PatientExploreR can incorporate EHR data from any institution that employs the CDM, for users with approved access. The software code is free and open source under the MIT license, enabling institutions to install the application and users to expand and modify it for their own purposes. AVAILABILITY AND IMPLEMENTATION: PatientExploreR can be freely obtained from GitHub: https://github.com/BenGlicksberg/PatientExploreR. We provide instructions on how researchers with approved access to their institutional EHR can use this package. We also release an open sandbox server of synthesized patient data for users without EHR access to explore: http://patientexplorer.ucsf.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
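The sketch below gives a rough idea of the kind of cohort query such a tool runs against an OMOP CDM database. It uses Python's built-in `sqlite3`. It is not PatientExploreR's code, which is written in R/Shiny. It assumes only two simplified CDM tables (`person`, `condition_occurrence`) with a handful of their standard columns. The concept ID and rows are arbitrary placeholders.

```python
import csv
import io
import sqlite3

# A tiny subset of two OMOP CDM tables; the real tables have many more columns.
SCHEMA = """
CREATE TABLE person (
    person_id     INTEGER PRIMARY KEY,
    year_of_birth INTEGER
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id               INTEGER,
    condition_concept_id    INTEGER,
    condition_start_date    TEXT
);
"""


def build_cohort(conn, concept_id, min_year_of_birth):
    """Person IDs with at least one occurrence of concept_id, born in or after min_year_of_birth."""
    rows = conn.execute(
        """
        SELECT DISTINCT p.person_id
        FROM person AS p
        JOIN condition_occurrence AS co ON co.person_id = p.person_id
        WHERE co.condition_concept_id = ? AND p.year_of_birth >= ?
        ORDER BY p.person_id
        """,
        (concept_id, min_year_of_birth),
    ).fetchall()
    return [row[0] for row in rows]


def export_cohort_csv(person_ids):
    """Serialize a cohort as CSV text for analysis in other software."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["person_id"])
    writer.writerows([pid] for pid in person_ids)
    return buf.getvalue()


# Synthetic data; concept ID 4242 is an arbitrary placeholder, not chosen to match a real concept.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO person VALUES (?, ?)", [(1, 1950), (2, 1980), (3, 1990)])
conn.executemany(
    "INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)",
    [(10, 1, 4242, "2015-01-01"), (11, 2, 4242, "2016-03-02"), (12, 3, 9999, "2017-05-05")],
)
cohort = build_cohort(conn, 4242, 1960)  # person 1 is born too early, person 3 lacks the condition
print(export_cohort_csv(cohort))
```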


Subject(s)
Electronic Health Records; Software; Computers; Databases, Factual; Humans; Observational Studies as Topic
5.
Nucleic Acids Res ; 42(Web Server issue): W182-6, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24792159

ABSTRACT

Neuropeptides (NPs) are short secreted peptides produced in neurons. NPs act by activating signaling cascades that govern broad functions such as metabolism, sensation and behavior throughout the animal kingdom. NPs are the products of multistep processing of longer proteins, the NP precursors (NPPs). We present NeuroPID (Neuropeptide Precursor Identifier), an online machine-learning tool that identifies metazoan NPPs. NeuroPID was trained on 1418 NPPs annotated as such by UniProtKB. A large number of sequence-based features were extracted for each sequence to capture the biophysical and informational-statistical properties that distinguish NPPs from other proteins. Training several machine-learning models, including support vector machines and ensemble decision trees, led to high accuracy (89-94%) and precision (90-93%) in cross-validation tests. For inputs of thousands of unseen sequences, the tool provides a ranked list of high-quality predictions based on the results of four machine-learning classifiers. The output reveals many uncharacterized NPPs and secreted cell modulators that are rich in potential cleavage sites. NeuroPID is a discovery and prediction tool that can be used to identify NPPs from unannotated transcriptomes and mass spectrometry experiments. Sequences predicted by NeuroPID are attractive targets for investigating behavior, physiology and cell modulation. The NeuroPID web tool is available at http://neuropid.cs.huji.ac.il.
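The Python sketch below computes a few simple "sequence-based features" to make the term concrete. The feature set is a hypothetical illustration, not NeuroPID's actual feature set. It covers sequence length, amino acid composition, and a count of paired basic residues (KR, RR, KK, RK). These dibasic pairs are typical prohormone-convertase cleavage sites in neuropeptide precursors.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Paired basic residues are classic prohormone-convertase cleavage motifs.
DIBASIC_MOTIFS = {"KR", "RR", "KK", "RK"}


def sequence_features(seq: str) -> dict:
    """Illustrative per-sequence features: length, amino acid composition, dibasic sites."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    n = len(seq)
    features = {"length": n}
    for aa in AMINO_ACIDS:
        features[f"frac_{aa}"] = counts[aa] / n
    # Count every position i where residues i and i+1 form a dibasic pair (overlaps allowed).
    features["dibasic_sites"] = sum(1 for i in range(n - 1) if seq[i:i + 2] in DIBASIC_MOTIFS)
    return features


print(sequence_features("MKRAGRRK")["dibasic_sites"])  # KR, RR and RK
```

Vectors like these would then be fed to standard classifiers, such as the support vector machines and decision-tree ensembles named above.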


Subject(s)
Neuropeptides/classification; Protein Precursors/classification; Software; Animals; Artificial Intelligence; Genomics; Humans; Internet; Neuropeptides/chemistry; Neuropeptides/genetics; Protein Precursors/chemistry; Protein Precursors/genetics; Sequence Analysis, Protein
6.
BMC Genomics ; 16: 583, 2015 Aug 07.
Article in English | MEDLINE | ID: mdl-26251035

ABSTRACT

BACKGROUND: Insects belong to a class that accounts for the majority of animals on Earth. With over one million identified species, insects display a huge diversity and occupy extreme environments. At present, there are dozens of fully sequenced insect genomes that cover a range of habitats, social behaviors and morphologies. Given such a diverse collection of genomes, revealing evolutionary trends and charting functional relationships of proteins remain challenging. RESULTS: We analyzed the relatedness of 17 complete proteomes representative of insects, including louse, bee, beetle, ants, flies and mosquitoes, as well as a crustacean out-group. The analyzed proteomes mostly represented the orders Hymenoptera and Diptera. The 287,405 protein sequences from the 18 proteomes were automatically clustered into 20,933 families, including 799 singletons. A comprehensive statistical analysis identified the families that were significantly expanded or reduced in any of the studied organisms. Among all the tested species, ants are characterized by an exceptionally high rate of family gain and loss. Assigning annotations to hundreds of species-specific families revealed the functional diversity among species and between the major clades (Diptera and Hymenoptera). We found that many species-specific families are associated with receptor signaling, stress-related functions and proteases. The highest variability among insects is associated with transposition and nucleic acid processes (collectively termed TNAP). Specifically, the wasp and ants have an order of magnitude more TNAP families and proteins than species belonging to Diptera (mosquitoes and flies). CONCLUSIONS: An unsupervised clustering methodology combined with a comparative functional analysis unveiled proteomic signatures in the major clades of winged insects. We propose that the expansion of TNAP families in Hymenoptera potentially contributes to the accelerated genome dynamics that characterize the wasp and ants.


Subject(s)
Genome, Insect/genetics; Insects/genetics; Proteome/genetics; Animals; Evolution, Molecular; Phylogeny; Proteomics/methods; Species Specificity; Transcription, Genetic/genetics
7.
Bioinformatics ; 30(17): i624-30, 2014 Sep 01.
Article in English | MEDLINE | ID: mdl-25161256

ABSTRACT

MOTIVATION: Modern protein sequencing techniques have led to the determination of >50 million protein sequences. ProtoNet is a clustering system that provides a continuous hierarchical agglomerative clustering tree for all proteins. While ProtoNet performs unsupervised classification of all included proteins, finding the optimal level of granularity for focusing on functional protein groups remains elusive. Here, we ask whether knowledge-based annotations of protein families can support automatic unsupervised methods in identifying high-quality protein families. We present a method that yields, within the ProtoNet hierarchy, an optimal partition of clusters relative to manual annotation schemes. The method's principle is to minimize the entropy-derived distance between annotation-based partitions and all available hierarchical partitions. We describe the best front (BF) partition of 2,478,328 proteins from UniRef50. Of the 4,929,553 ProtoNet tree clusters, the BF based on Pfam annotations contains 26,891 clusters. The high quality of the partition is validated by its close correspondence with the set of clusters that best describe thousands of Pfam keywords. The BF is shown to be superior to a naïve cut of the ProtoNet tree that yields a similar number of clusters. Finally, we used parameters intrinsic to the clustering process to enrich the BF's clusters a priori. We present the entropy-based method's benefit in overcoming the unavoidable limitations of nested clusters in ProtoNet. We suggest that this automatic information-based cluster selection can be useful for other large-scale annotation schemes, as well as for systematically testing and comparing putative families derived from alternative clustering methods. AVAILABILITY AND IMPLEMENTATION: A catalog of BF clusters for thousands of Pfam keywords is provided at http://protonet.cs.huji.ac.il/bestFront/.
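The abstract does not name the entropy-derived distance it minimizes. One standard choice is the variation of information, VI(A, B) = 2H(A, B) - H(A) - H(B). The Python sketch below uses it to pick the candidate partition closest to an annotation-based reference, where each candidate is a whole partition such as one cut of a hierarchy. Using VI here is an assumption, not the published BF algorithm. The abstract contrasts the BF with a naïve cut of the tree, so the actual method is more involved than choosing among whole cuts as done here.

```python
from collections import Counter
from math import log


def entropy(labels) -> float:
    """Shannon entropy (nats) of the partition induced by a list of cluster labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def variation_of_information(a, b) -> float:
    """VI(A, B) = 2 H(A, B) - H(A) - H(B); zero exactly when A and B are the same partition."""
    joint = list(zip(a, b))  # joint cluster membership of each item
    return 2 * entropy(joint) - entropy(a) - entropy(b)


def closest_partition(reference, candidates):
    """Pick the candidate partition (e.g., one cut of a hierarchy) nearest to the reference."""
    return min(candidates, key=lambda p: variation_of_information(reference, p))


# Four proteins; the reference comes from a hypothetical annotation with two families.
reference = [0, 0, 1, 1]
candidates = [
    [0, 0, 0, 0],  # one cluster: too coarse
    [5, 5, 7, 7],  # same grouping, different cluster IDs
    [0, 1, 2, 3],  # all singletons: too fine
]
print(closest_partition(reference, candidates))  # [5, 5, 7, 7]
```

Because VI depends only on which items are grouped together, renaming cluster IDs does not change the distance. This is why the second candidate scores zero.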


Subject(s)
Proteins/classification; Algorithms; Cluster Analysis; Molecular Sequence Annotation; Sequence Analysis, Protein
8.
Nucleic Acids Res ; 40(Database issue): D313-20, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22121228

ABSTRACT

ProtoNet 6.0 (http://www.protonet.cs.huji.ac.il) is a data structure of protein families that cover the protein sequence space. These families are generated by an unsupervised bottom-up clustering algorithm, which organizes large sets of proteins in a hierarchical tree that yields high-quality protein families. The 2012 ProtoNet (version 6.0) tree includes over 9 million proteins, of which 5.5% come from UniProtKB/Swiss-Prot and the rest from UniProtKB/TrEMBL. The hierarchical tree structure is based on an all-against-all comparison of 2.5 million representatives of UniRef50. Rigorous annotation-based quality tests prune the tree to the 162,088 most informative clusters. Every high-quality cluster is assigned a ProtoName that reflects the most significant annotations of its proteins. These annotations are dominated by GO terms, UniProt/Swiss-Prot keywords and InterPro. ProtoNet 6.0 operates in a default mode; in the advanced mode, this data structure offers the user a view of the family tree at any desired level of resolution. Systematic comparisons with previous versions of ProtoNet show how our view of protein families evolves as larger parts of the sequence space become known. ProtoNet 6.0 provides numerous tools to navigate the hierarchy of clusters.


Subject(s)
Databases, Protein; Proteins/classification; Sequence Analysis, Protein; Algorithms; Cluster Analysis; Internet; Metagenome; Molecular Sequence Annotation
9.
IEEE J Biomed Health Inform ; 28(7): 4216-4223, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38457316

ABSTRACT

Efficient optimization of operating room (OR) activity poses a significant challenge for hospital managers because of the complex and risky nature of the environment. The traditional "one size fits all" approach to OR scheduling is no longer practical, and personalized medicine is required to meet the diverse needs of patients, care providers, medical procedures, and system constraints within limited resources. This paper introduces a scientific and practical tool for predicting surgery durations and improving OR performance to the maximum benefit of patients and the hospital. Previous works used machine-learning models to predict surgery duration from preoperative data; these models consider covariates known to the medical staff at the time the surgery is scheduled. Given the large number of covariates, model selection becomes crucial, and the number of covariates that can be used for prediction depends on the available sample size. Our proposed approach uses multi-task regression to select a common subset of predictive covariates for all tasks with the same sample size, while allowing the model's coefficients to vary between tasks. A regression task can refer to a single surgeon, an operation type, or the interaction between them. By considering these diverse factors, our method provides more accurate overall estimates of surgery durations, and the selected covariates may help identify the resources required for a specific surgery. When the regression tasks were surgeon-based or based on the pair of operation type and surgeon, our approach outperformed the baseline proposed in a previous study; however, it failed to reach the baseline for operation-type-based tasks. By accurately estimating surgery durations, hospital managers can provide care to a greater number of patients, optimize resource allocation and utilization, and reduce waste. This research contributes to the advancement of personalized medicine and provides a valuable tool for improving operational efficiency in the dynamic world of medicine.
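The abstract does not specify which multi-task regression method is used. The Python sketch below is a deliberately simplified stand-in based on greedy forward selection. At each step it adds the covariate that most lowers the residual sum of squares summed over all tasks. Every task therefore shares the selected covariates, while each task gets its own least-squares coefficients. The data, task names and `shared_forward_selection` helper are hypothetical. A penalized approach such as a group (L2,1) lasso would be a more typical multi-task choice.

```python
def solve(a, b):
    """Solve the small square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols_sse(xs, ys, cols):
    """Residual sum of squares of an OLS fit (intercept + the selected columns) for one task."""
    design = [[1.0] + [row[c] for c in cols] for row in xs]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(p)]
    beta = solve(xtx, xty)  # task-specific coefficients
    return sum((y - sum(b * v for b, v in zip(beta, r))) ** 2 for r, y in zip(design, ys))


def shared_forward_selection(tasks, n_features, k):
    """Greedily add the covariate that most reduces the total SSE summed over all tasks.

    Every task uses the same selected covariates, but each task gets its own OLS coefficients.
    """
    selected = []
    for _ in range(k):
        remaining = [f for f in range(n_features) if f not in selected]
        best = min(remaining, key=lambda f: sum(ols_sse(x, y, selected + [f]) for x, y in tasks))
        selected.append(best)
    return selected


# Hypothetical data: two tasks (e.g., two surgeons) whose durations depend on covariate 0
# with different coefficients; covariates 1 and 2 are noise.
xs = [[1, 3, 0], [2, 1, 1], [3, 4, 0], [4, 1, 1], [5, 5, 0], [6, 2, 1]]
surgeon_a = [2 * row[0] + 1 for row in xs]
surgeon_b = [-3 * row[0] + 5 for row in xs]
print(shared_forward_selection([(xs, surgeon_a), (xs, surgeon_b)], n_features=3, k=1))  # [0]
```

The shared selection step mirrors the idea of a common covariate subset, while per-task coefficients let, for example, each surgeon's duration respond differently to the same covariate.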


Subject(s)
Operating Rooms; Humans; Operative Time; Machine Learning; Algorithms; Models, Statistical; Surgical Procedures, Operative/methods
10.
medRxiv ; 2024 Apr 12.
Article in English | MEDLINE | ID: mdl-38680842

ABSTRACT

Objectives: Biases inherent in electronic health records (EHRs), and therefore in medical artificial intelligence (AI) models, may significantly exacerbate health inequities and challenge the adoption of ethical and responsible AI in healthcare. Biases arise from multiple sources, some of which are poorly documented in the literature. Biases are encoded in how the data have been collected and labeled, through implicit and unconscious biases of clinicians, or by the tools used for data processing. These biases and their encoding in healthcare records undermine the reliability of such data and bias clinical judgments and medical outcomes. Moreover, when healthcare records are used to build data-driven solutions, the biases are further exacerbated, resulting in systems that perpetuate biases and induce healthcare disparities. This literature scoping review aims to categorize the main sources of bias inherent in EHRs. Methods: We queried PubMed and Web of Science on January 19th, 2023, for peer-reviewed sources in English published between 2016 and 2023, using the PRISMA approach to stepwise scoping of the literature. To select the papers that empirically analyze bias in EHRs, from the initial yield of 430 papers, 27 duplicates were removed, and 403 studies were screened for eligibility. 196 articles were removed after title and abstract screening, and 96 articles were excluded after full-text review, resulting in a final selection of 116 articles. Results: Systematic categorizations of diverse sources of bias are scarce in the literature, while the effects reported by separate studies are often convoluted and methodologically contestable. Our categorization of published empirical evidence identified six main sources of bias: a) bias arising from past clinical trials; b) data-related biases arising from missing or incomplete information or poor labeling of data; human-related bias induced by c) implicit clinician bias and d) referral and admission bias; e) diagnosis or risk disparities bias; and, finally, f) biases in machinery and algorithms. Conclusions: Machine learning and data-driven solutions can potentially transform healthcare delivery, but not without limitations. The core inputs to these systems (data and human factors) currently contain several sources of bias that are poorly documented and rarely analyzed for remedies. The current evidence focuses heavily on data-related biases, while other sources are analyzed less often or only anecdotally. However, these different sources of bias add to one another exponentially. Therefore, to understand the issues holistically, we need to explore these diverse sources of bias. While racial biases in EHRs have often been documented, other sources of bias have been investigated and documented less frequently (e.g., gender-related biases, sexual orientation discrimination, socially induced biases, and implicit, often unconscious, human-related cognitive biases). Moreover, some existing studies lack causal evidence, merely illustrating different prevalences of disease across groups, which does not per se prove causality. Our review shows that data-, human- and machine-related biases are prevalent in healthcare, significantly affect healthcare outcomes and judgments, and exacerbate disparities and differential treatment. Understanding how diverse biases affect AI systems and recommendations is critical. We suggest that researchers and medical personnel develop safeguards and adopt data-driven solutions with a "bias-in-mind" approach. More empirical evidence is needed to tease out the effects of different sources of bias on health outcomes.

11.
J Am Med Inform Assoc ; 31(2): 536-541, 2024 Jan 18.
Article in English | MEDLINE | ID: mdl-38037121

ABSTRACT

OBJECTIVE: Given the importance of AI in genomics and its potential impact on human health, the Genomics and Translational Biomedical Informatics (GenTBI) Workgroup of the American Medical Informatics Association developed this assessment of factors that can further enable the clinical application of AI in this space. PROCESS: A list of relevant factors was developed through GenTBI workgroup discussions in multiple in-person and online meetings, along with a review of pertinent publications. This list was then summarized and reviewed to achieve consensus among the group members. CONCLUSIONS: Substantial informatics research and development are needed to fully realize the clinical potential of such technologies. The development of larger datasets is crucial to emulating the success AI is achieving in other domains. It is important that AI methods do not exacerbate existing socio-economic, racial, and ethnic disparities. Genomic data standards are critical to effectively scaling such technologies across institutions. With so much uncertainty, complexity and novelty in genomics and medicine, and with an evolving regulatory environment, the current focus should be on using these technologies in an interface with clinicians that emphasizes the value each brings to clinical decision-making.


Subject(s)
Artificial Intelligence; Medicine; Humans; Computational Biology; Genomics
12.
BMC Bioinformatics ; 14 Suppl 3: S11, 2013.
Article in English | MEDLINE | ID: mdl-23514195

ABSTRACT

BACKGROUND: Daphnia pulex (water flea) is the first crustacean whose genome has been fully sequenced. Crustaceans and insects diverged from a common ancestor. Daphnia is a model organism for studying the molecular makeup that enables coping with environmental challenges. Its complete proteome contains 30,550 putative proteins, about 10,000 of which have no known homologues. Currently, UniProtKB lists 95% of Daphnia proteins as putative and uncharacterized. RESULTS: We applied ProtoNet, an unsupervised hierarchical protein clustering method that covers about 10 million sequences, to automatically annotate the Daphnia proteome. Of the Daphnia full-length proteins, 98.7% (26,625) were successfully mapped to 13,880 ProtoNet stable clusters, and only 1.3% remained unmapped. We compared the properties of Daphnia protein families with those of the mouse and fruit fly proteomes. Functional annotations were successfully assigned to 86% of the proteins. Most proteins (61%) were mapped to only 2953 clusters that contain Daphnia duplicated genes. We focused on the functionality of maximally amplified paralogs. Cuticle structure components and a variety of ion channel protein families were associated with a maximal level of gene amplification. We focused on gene amplification as a leading strategy of Daphnia in coping with environmental toxicity. CONCLUSIONS: Automatic inference is achieved by mapping sequences to the protein family tree of ProtoNet 6.0. Applying a careful inference protocol resulted in functional assignments for over 86% of the complete proteome. We conclude that the ProtoNet scaffold can be used as an alignment-free protocol for large-scale annotation of uncharacterized proteomes.


Subject(s)
Arthropod Proteins/classification; Daphnia/genetics; Molecular Sequence Annotation; Proteome/classification; Animals; Arthropod Proteins/chemistry; Arthropod Proteins/genetics; Classification/methods; Cluster Analysis; Genes, Duplicate; Ion Channels/classification; Ion Channels/genetics; Mice; Proteome/genetics; Receptors, Cell Surface/classification; Receptors, Cell Surface/genetics; Sequence Analysis, Protein
13.
PLoS Comput Biol ; 8(2): e1002364, 2012 Feb.
Article in English | MEDLINE | ID: mdl-22319434

ABSTRACT

The infection cycle of viruses creates many opportunities for the exchange of genetic material with the host. Many viruses integrate their sequences into the genome of their host for replication. These processes may lead to the acquisition of host sequences by the virus. Such sequences are prone to the accumulation of mutations and deletions. However, in rare instances, sequences acquired from a host become beneficial to the virus. We searched for unexpected sequence similarity between the 900,000 viral proteins and all proteins from cellular organisms. Here, we focus on viruses that infect metazoa. The high-conservation analysis yielded 187 instances of highly similar viral-host sequences. Only a small number of them represent viruses that hijacked host sequences. The low-conservation sequence analysis uses the Pfam family collection. About 5% of the 12,000 statistical models archived in Pfam are composed of viral-metazoan proteins. For about half of these Pfam families, we provide indirect support for host-to-virus directionality. The other families are either wrongly annotated or reflect extensive sequence exchange between the viruses and their hosts. In about 75% of cross-taxa Pfam families, the viral proteins are significantly shorter than their metazoan counterparts. This tendency of viral proteins to be shorter than their related host proteins reflects the acquisition of only a fragment of the host gene, the elimination of an internal domain, and the shortening of linkers between domains. We conclude that, over the course of viral evolution, host-originated sequences accommodate simplified domain compositions. We postulate that the trimmed proteins act by interfering with fundamental host functions, including intracellular signaling, post-translational modification, protein-protein interaction networks and cellular trafficking. We compiled a collection of hijacked protein sequences. These sequences are attractive targets for manipulating viral infection.


Subject(s)
Gene Transfer, Horizontal; Host-Pathogen Interactions; Models, Genetic; Viral Proteins/chemistry; Amino Acid Sequence; Animals; Cluster Analysis; Conserved Sequence; DNA, Viral/chemistry; Humans; Mammals/genetics; Molecular Sequence Data; Mutagenesis, Insertional; Protein Structure, Tertiary; RNA, Viral/chemistry; Sequence Alignment; Viral Proteins/genetics; Viruses/chemistry; Viruses/genetics
14.
Ann Thorac Surg ; 116(2): 287-295, 2023 08.
Article in English | MEDLINE | ID: mdl-36328096

ABSTRACT

BACKGROUND: We assessed volume-outcome relationships for resternotomy coronary artery bypass grafting (CABG). METHODS: We studied 1,362,218 first-time CABG and 93,985 resternotomy CABG patients reported to The Society of Thoracic Surgeons Adult Cardiac Surgery Database between 2010 and 2019. Primary outcomes were in-hospital mortality and mortality and morbidity (M&M) rates, calculated per hospital and per surgeon. Outcomes were compared across 6 total cardiac surgery volume categories. Multivariable generalized linear mixed-effects models were used with continuous case volume as the main exposure, adjusting for patient characteristics and within-surgeon and within-hospital variation. RESULTS: We observed a decline in resternotomy CABG unadjusted mortality and M&M from the lowest to the highest case-volume categories (hospital-level mortality, 3.9% ± 0.6% to 3.3% ± 0.1%; M&M, 18.5% ± 1.1% to 15.7% ± 0.4%, P < .001; surgeon-level mortality, 4.1% ± 0.3% to 4.1% ± 1.3%; M&M, 18.5% ± 0.6% to 14.5% ± 2.2%, P < .001). Analyzing outcomes against continuous volume showed that beyond a minimum annual volume (approximately 200-300 cases per hospital and 100-150 cases per surgeon), mortality and M&M rates did not improve further. Using individual-level data and adjusting for patient characteristics and clustering within surgeon and hospital, we found that higher procedural volume was associated with improved surgeon-level outcomes (mortality adjusted odds ratio, 0.39/100 procedures; 95% CI, 0.24-0.61; M&M adjusted odds ratio, 0.37/100 procedures; 95% CI, 0.28-0.48; P < .001 for both). Hospital-level adjusted volume-outcome associations were not statistically significant. CONCLUSIONS: We observed an inverse relationship between total cardiac case volume and resternotomy CABG outcomes at the surgeon level only, indicating that the individual surgeon's experience, rather than institutional volume, is the key determinant.
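Odds ratios reported "per 100 procedures" can be rescaled to other volume increments if volume enters the model linearly on the log-odds scale. That linearity is an assumption of this Python sketch, not something the abstract states, and the helper name is hypothetical.

```python
import math


def odds_ratio_for_increment(or_per_100: float, increment: float) -> float:
    """Rescale an odds ratio reported per 100 procedures to another volume increment.

    Assumes annual volume enters the logistic model linearly on the log-odds scale,
    so each additional procedure shifts the log-odds by ln(OR_100) / 100.
    """
    beta_per_procedure = math.log(or_per_100) / 100
    return math.exp(beta_per_procedure * increment)


# Surgeon-level mortality OR of 0.39 per 100 procedures, rescaled to 50 procedures:
print(round(odds_ratio_for_increment(0.39, 50), 3))  # 0.624, i.e. sqrt(0.39) rounded
```

The abstract also reports that outcomes did not improve further beyond roughly 100-150 cases per surgeon. Such log-linear rescaling is therefore only a rough reading of the surgeon-level association.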


Subject(s)
Coronary Artery Bypass; Hospitals; Adult; Humans; Coronary Artery Bypass/methods; Morbidity; Hospital Mortality; Linear Models
15.
J Clin Med ; 12(21)2023 Oct 26.
Article in English | MEDLINE | ID: mdl-37959230

ABSTRACT

(1) Background: The "obesity paradox" refers to a protective effect of higher body mass index (BMI) on mortality in patients with acute infectious diseases. However, the long-term impact of this paradox remains uncertain. (2) Methods: A retrospective study was conducted of patients diagnosed with community-acquired acute infectious diseases at Shamir Medical Center, Israel (2010-2020). Patients were grouped by BMI: underweight, normal weight, overweight, and obesity classes I-III. Short- and long-term mortality rates were compared across these groups. (3) Results: Among the 25,226 patients, demographics and comorbidities varied across BMI categories. Short-term (90-day) and long-term (one-year) mortality rates were notably higher in the underweight and normal-weight groups than in the others. Specifically, 90-day mortality was 22% and 13.2% for underweight and normal weight, respectively, versus 7-9% for the other groups (p < 0.001). Multivariate time series analysis revealed that underweight individuals had a significantly higher 5-year mortality risk (HR 1.41, 95% CI 1.27-1.58, p < 0.001), while overweight and obese categories had a reduced risk (overweight: HR 0.76, 95% CI 0.72-0.80, p < 0.001; obesity class I: HR 0.71, 95% CI 0.66-0.76, p < 0.001; obesity class II: HR 0.77, 95% CI 0.70-0.85, p < 0.001; obesity class III: HR 0.79, 95% CI 0.67-0.92, p = 0.003). (4) Conclusions: In this comprehensive study, obesity was independently associated with decreased short- and long-term mortality. These unexpected results prompt further exploration of this counterintuitive phenomenon.
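The abstract does not list its BMI cut-offs. The Python sketch below assumes the standard WHO adult categories, which match the group names used here:

- underweight < 18.5
- normal 18.5-24.9
- overweight 25-29.9
- obesity class I 30-34.9
- obesity class II 35-39.9
- obesity class III ≥ 40 kg/m²

The example weights and heights are hypothetical.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Map a BMI value to the WHO adult categories (lower bound of each band inclusive)."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal weight"
    if value < 30:
        return "overweight"
    if value < 35:
        return "obesity class I"
    if value < 40:
        return "obesity class II"
    return "obesity class III"


print(bmi_category(bmi(70, 1.75)))   # BMI of about 22.9 -> normal weight
print(bmi_category(bmi(120, 1.70)))  # BMI of about 41.5 -> obesity class III
```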

16.
Ann Thorac Surg ; 115(1): 62-71, 2023 01.
Article in English | MEDLINE | ID: mdl-35618047

ABSTRACT

BACKGROUND: We sought to quantify the risk trend of resternotomy coronary artery bypass grafting (CABG) over the past 2 decades. METHODS: We compared the outcomes of 194 804 consecutive resternotomy CABG patients and 1 445 894 randomly selected first-time CABG patients (50% of total) reported to The Society of Thoracic Surgeons Adult Cardiac Surgery Database between 1999 and 2018. Primary outcomes were in-hospital mortality and overall morbidity. Using multiple logistic regression for each outcome in each year, we computed annual trends in risk-adjusted odds ratios for the primary outcomes in the entire cohort and in 194 776 propensity-matched pairs. RESULTS: The annual resternotomy CABG case volume at participating centers declined by 68%, from a median of 25 (range, 14-44) to a median of 8 (range, 4-15). Compared with first-time CABG patients, resternotomy CABG patients were consistently older, with higher proportions of comorbidities. After propensity matching, primary outcomes of resternotomy and first-time CABG were similar (mortality: 3.5% vs 2.3%, standardized difference [SDiff], 7.5%; morbidity: 40.7% vs 40.3%, SDiff, 0.9%). Mortality of resternotomy CABG performed after a prior CABG was higher than that after a prior non-CABG operation (4.3% vs 2.4%; SDiff, 10.8%). Morbidity was similar between these subgroups (41.0% vs 39.1%; SDiff, 2.9%). The adjusted odds ratio for mortality after resternotomy CABG declined from 1.93 (95% CI, 1.73-2.16) to 1.22 (95% CI, 0.92-1.62), and that for morbidity declined from 1.13 (95% CI, 1.08-1.18) to 0.91 (95% CI, 0.87-0.95), P < .001 for both. CONCLUSIONS: The risk of resternotomy CABG has decreased substantially over time. Resternotomy CABG performed after a prior CABG carries a higher risk than that performed after a non-CABG operation.
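The standardized differences (SDiff) above compare matched groups on a scale that does not depend on sample size. For a binary outcome, a common definition divides the difference in proportions by the pooled binomial standard deviation. The Python sketch below assumes that definition, which the abstract does not spell out.

```python
import math


def standardized_difference_binary(p1: float, p2: float) -> float:
    """Standardized difference between two proportions, using the pooled binomial SD."""
    pooled_sd = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / 2)
    return (p1 - p2) / pooled_sd


# Propensity-matched mortality, 3.5% vs 2.3% (rounded values from the abstract):
print(f"{standardized_difference_binary(0.035, 0.023):.1%}")
```

With the rounded mortality figures, the sketch prints about 7.2%. This is close to the reported 7.5%, and the gap is plausibly due to rounding of the published percentages. An absolute SDiff below 10% is a commonly cited rule of thumb for adequate balance.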


Subject(s)
Coronary Artery Disease , Postoperative Complications , Humans , Adult , Postoperative Complications/etiology , Coronary Artery Bypass/adverse effects , Comorbidity , Logistic Models , Treatment Outcome , Retrospective Studies
17.
J Am Med Inform Assoc ; 30(5): 859-868, 2023 Apr 19.
Article in English | MEDLINE | ID: mdl-36826399

ABSTRACT

OBJECTIVE: Observational studies can affect patient care but must be robust and reproducible. Nonreproducibility is primarily caused by unclear reporting of design choices and analytic procedures. This study aimed to: (1) assess how independent researchers interpret the study logic described in an observational study and (2) quantify the impact of variability in these interpretations on patient characteristics. MATERIALS AND METHODS: Nine teams of highly qualified researchers reproduced a cohort from a study by Albogami et al. The teams were given the clinical codes and access to tools for creating cohort definitions, so that the only variable element was their choice of logic. We executed the teams' cohort definitions against the database and compared the number of subjects, patient overlap, and patient characteristics. RESULTS: On average, the teams' interpretations fully aligned with the master implementation in 4 out of 10 inclusion criteria, with at least 4 deviations per team. Cohort sizes ranged from about one-third to 10 times the size of the master cohort (2159-63 619 subjects, compared with 6196 subjects). Median agreement was 9.4% (interquartile range 15.3-16.2%). The teams' cohorts differed significantly from the master implementation in at least 2 baseline characteristics, and most teams differed in at least 5. CONCLUSIONS: Independent research teams attempting to reproduce the study from its free-text description alone produced different implementations that varied in population size and composition. Sharing analytical code supported by a common data model and open-source tools allows a study to be reproduced unambiguously, thereby preserving the initial design choices.
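The comparison step (cohort size relative to the master implementation, and patient overlap) can be illustrated with plain set operations. This is a minimal sketch. The abstract does not define its agreement measure, so the Jaccard index used here is an assumption, and the function names and patient IDs are hypothetical. The size ratios reproduce the arithmetic behind the abstract's "one-third to 10 times" range (2159/6196 ≈ 0.35, 63 619/6196 ≈ 10.3).

```python
import statistics


def jaccard_agreement(team: set, master: set) -> float:
    """Patients common to both cohorts over patients in either (assumed metric)."""
    union = team | master
    if not union:
        return 1.0  # two empty cohorts are trivially identical
    return len(team & master) / len(union)


def size_ratio(team_size: int, master_size: int) -> float:
    """Team cohort size relative to the master implementation."""
    return team_size / master_size


def median_agreement(team_cohorts: list, master: set) -> float:
    """Median of the per-team agreement values."""
    return statistics.median(jaccard_agreement(t, master) for t in team_cohorts)


if __name__ == "__main__":
    master = {"p1", "p2", "p3", "p4"}  # hypothetical patient IDs
    teams = [{"p3", "p4", "p5"}, {"p1", "p2", "p3", "p4", "p6"}, {"p7"}]
    print(f"median agreement: {median_agreement(teams, master):.2f}")
    print(f"size ratios: {size_ratio(2159, 6196):.2f} to {size_ratio(63619, 6196):.1f}")
```

A set-based measure like this penalizes both missing and extra patients, which is why a team whose cohort is ten times larger can score low agreement even if it contains the whole master cohort.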


Subject(s)
Research Personnel , Humans , Databases, Factual
18.
Nucleic Acids Res ; 38(Web Server issue): W84-9, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20444873

ABSTRACT

Derivation of biological meaning from large sets of proteins or genes is a frequent task in genomic and proteomic studies. Such sets often arise from experimental methods including large-scale gene expression experiments and mass spectrometry (MS) proteomics. Large sets of genes or proteins are also the outcome of computational methods such as BLAST search and homology-based classifications. We have developed the PANDORA web server, which functions as a platform for the advanced biological analysis of sets of genes, proteins, or proteolytic peptides. First, the input set is mapped to a set of corresponding proteins. Then, an analysis of the protein set produces a graph-based hierarchy which highlights intrinsic relations amongst biological subsets, in light of their different annotations from multiple annotation resources. PANDORA integrates a large collection of annotation sources (GO, UniProt Keywords, InterPro, Enzyme, SCOP, CATH, Gene-3D, NCBI taxonomy and more) that comprise approximately 200,000 different annotation terms associated with approximately 3.2 million sequences from UniProtKB. Statistical enrichment based on a binomial approximation of the hypergeometric distribution and corrected for multiple hypothesis tests is calculated using several background sets, including major gene-expression DNA-chip platforms. Users can also visualize either standard or user-defined binary and quantitative properties alongside the proteins. PANDORA 4.2 is available at http://www.pandora.cs.huji.ac.il.
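The enrichment statistic described above can be sketched as follows. For an annotation term carried by K of N background proteins, the chance that k or more of n input proteins carry it is a hypergeometric tail, which the abstract says is approximated by a binomial with p = K/N. The abstract does not name the multiple-testing correction, so Bonferroni is used here purely as an assumption. The function names and example counts are hypothetical and are not taken from PANDORA's implementation.

```python
import math


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """Exact P(X >= k) when drawing n of N items, K of which carry the term."""
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / math.comb(N, n)


def enrichment_pvalue(k: int, n: int, K: int, N: int) -> float:
    """Binomial approximation of the hypergeometric tail, using p = K / N."""
    return binomial_upper_tail(k, n, K / N)


def bonferroni(pvalues: list) -> list:
    """Assumed correction: multiply by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


if __name__ == "__main__":
    # Hypothetical counts: 10 of 50 input proteins carry a term that
    # 400 of 20,000 background proteins carry.
    approx = enrichment_pvalue(10, 50, 400, 20_000)
    exact = hypergeom_upper_tail(10, 50, 400, 20_000)
    print(f"binomial approx: {approx:.3g}, exact hypergeometric: {exact:.3g}")
    print(bonferroni([approx, 0.01, 0.4]))
```

The binomial form treats draws as independent (sampling with replacement), which is a good approximation when the input set is small relative to the background, as it is here (50 of 20,000).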


Subject(s)
Peptides/chemistry , Peptides/metabolism , Proteins/chemistry , Proteins/metabolism , Software , Animals , Data Interpretation, Statistical , Databases, Protein , Humans , Internet , Mass Spectrometry , Mice , Peptides/physiology , Proteins/physiology , Proteomics , Rats , Systems Integration , User-Computer Interface
19.
Stud Health Technol Inform ; 294: 224-228, 2022 May 25.
Article in English | MEDLINE | ID: mdl-35612061

ABSTRACT

Biological age may be more important than chronological age, yet it is not trivial to estimate. This study presents a regression model that predicts age from routine clinical tests, such as laboratory tests, using UK Biobank (UKBB) data. We ran several machine learning regression models for this prediction task and compared their performance by RMSE. The models were trained on data from 472,189 subjects aged 37-82 years, using 61 different laboratory test results. Our chosen model, XGBoost, achieved an RMSE of 6.67 years. Subjects whom the model predicted to be younger than their actual age were found to be healthier, as they had fewer diagnoses, fewer operations, and a lower prevalence of specific diseases than age-matched controls. In contrast, subjects predicted to be older than their chronological age did not differ significantly from age-matched controls in the number of diagnoses, the number of operations, or the prevalence of specific diseases.
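The evaluation described above rests on two simple computations: the RMSE used to compare regression models, and the sign of the gap between predicted and chronological age, used to split subjects into "predicted younger" and "predicted older" groups. A minimal sketch, with hypothetical function names and made-up ages standing in for model output (not UK Biobank data):

```python
import math


def rmse(y_true, y_pred) -> float:
    """Root-mean-square error between chronological and predicted ages."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def split_by_age_gap(chronological, predicted):
    """Return indices of subjects predicted younger and predicted older."""
    younger, older = [], []
    for i, (age, pred) in enumerate(zip(chronological, predicted)):
        gap = pred - age  # negative: the model places the subject below their real age
        if gap < 0:
            younger.append(i)
        elif gap > 0:
            older.append(i)
    return younger, older


if __name__ == "__main__":
    ages = [45, 58, 63, 71]  # made-up chronological ages
    predicted = [41.0, 60.5, 63.0, 66.0]  # made-up model outputs
    print(f"RMSE: {rmse(ages, predicted):.2f} years")
    print(split_by_age_gap(ages, predicted))
```

Subjects with a gap of exactly zero fall into neither group; each group would then be compared with age-matched controls as the abstract describes.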


Subject(s)
Aging , Health Status , Adult , Aged , Aged, 80 and over , Aging/physiology , Humans , Machine Learning , Middle Aged , Regression Analysis
20.
Stud Health Technol Inform ; 294: 219-223, 2022 May 25.
Article in English | MEDLINE | ID: mdl-35612060

ABSTRACT

The standard of care when a physician reviews laboratory test results is to weigh each result individually and compare it against a standard reference range. This kind of scanning can miss higher-level information. Other methods have tried to address part of this problem by creating new types of reference values. This research proposes examining test results jointly in a higher-dimensional space and using a machine learning approach to determine whether a subject has an abnormal combination of test results even though each result, according to current practice, would be considered normal, which may indicate a possible disease or illness. To determine health status, we look at both a disease-specific and a disease-independent level, across several different outcomes.
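The idea of assessing results jointly rather than one range at a time can be illustrated with two correlated tests. The abstract does not name its machine learning method, so this sketch substitutes a classical technique, the Mahalanobis distance under an assumed bivariate normal distribution. The z-scored values, the correlation of 0.9, the ±2 reference ranges, and the function names are all hypothetical. In two dimensions the squared distance follows a chi-square distribution with 2 degrees of freedom, whose upper tail is exactly exp(-q/2).

```python
import math


def within_reference_ranges(values, ranges):
    """Conventional review: every result lies inside its own (low, high) range."""
    return all(low <= v <= high for v, (low, high) in zip(values, ranges))


def mahalanobis_sq_2d(x, mean, cov):
    """Squared Mahalanobis distance of a 2-D point from a bivariate distribution."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance matrix must have a positive determinant")
    # Closed-form inverse of a 2x2 matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - mean[0], x[1] - mean[1])
    return sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))


def joint_tail_probability(q):
    """P(chi-square with 2 df > q); exact closed form in two dimensions."""
    return math.exp(-q / 2)


if __name__ == "__main__":
    # Hypothetical z-scored results for two strongly correlated tests.
    mean = (0.0, 0.0)
    cov = ((1.0, 0.9), (0.9, 1.0))
    ranges = [(-2.0, 2.0), (-2.0, 2.0)]
    for subject in [(1.5, 1.5), (1.5, -1.5)]:
        q = mahalanobis_sq_2d(subject, mean, cov)
        print(
            subject,
            "in ranges:", within_reference_ranges(subject, ranges),
            f"q={q:.2f}",
            f"joint p={joint_tail_probability(q):.2g}",
        )
```

Both hypothetical subjects pass the per-test check, but the second one moves the two correlated tests in opposite directions, which is highly unusual jointly (squared distance 45 versus about 2.4). This is the kind of signal that a one-range-at-a-time review misses.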


Subject(s)
Clinical Laboratory Techniques , Machine Learning , Humans