Results 1-20 of 7,525
1.
Sci Data ; 11(1): 358, 2024 Apr 09.
Article in English | MEDLINE | ID: mdl-38594314

ABSTRACT

This paper presents a standardised dataset versioning framework for improved reusability, recognition and data version tracking, facilitating comparisons and informed decision-making for data usability and workflow integration. The framework adopts a software engineering-like data versioning nomenclature ("major.minor.patch") and incorporates data schema principles to promote reproducibility and collaboration. To quantify changes in statistical properties over time, the concept of data drift metrics (d) is introduced. Three metrics (d_P, d_{E,PCA} and d_{E,AE}) based on unsupervised Machine Learning techniques (Principal Component Analysis and Autoencoders) are evaluated for dataset creation, update, and deletion. The optimal choice is the d_{E,PCA} metric, which combines PCA models with splines. It is computationally efficient, yields values below 50 for new dataset batches, and produces values consistent with seasonal or trend variations. Major updates (i.e., values of 100) occur when scaling transformations are applied to over 30% of variables, while information loss is handled efficiently, yielding values close to 0. This metric achieved a favourable trade-off between interpretability, robustness against information loss, and computation time.
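
The "major.minor.patch" nomenclature can be illustrated with a version-bump rule driven by a drift score on the 0-100 scale described above. This is a minimal sketch under stated assumptions, not the authors' framework: the class `DatasetVersion`, the function `bump` and both thresholds are hypothetical, since the abstract only reports that new batches scored below 50 and major updates reached 100.

```python
from dataclasses import dataclass

# Hypothetical cutoffs: the abstract only says new batches score below 50
# and major updates reach 100, so any major cutoff in between is a guess.
MAJOR_THRESHOLD = 50.0
MINOR_THRESHOLD = 5.0


@dataclass(frozen=True)
class DatasetVersion:
    """A 'major.minor.patch' dataset version (hypothetical helper)."""
    major: int
    minor: int
    patch: int

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


def bump(version: DatasetVersion, drift: float) -> DatasetVersion:
    """Return the next version for a drift score d in [0, 100]."""
    if not 0.0 <= drift <= 100.0:
        raise ValueError("drift must lie in [0, 100]")
    if drift >= MAJOR_THRESHOLD:
        # Large change in statistical properties: reset minor and patch.
        return DatasetVersion(version.major + 1, 0, 0)
    if drift >= MINOR_THRESHOLD:
        # Moderate drift, e.g. a new batch with seasonal variation.
        return DatasetVersion(version.major, version.minor + 1, 0)
    # Negligible drift: treat as a patch-level correction.
    return DatasetVersion(version.major, version.minor, version.patch + 1)
```

Starting from `1.2.3`, a score of 100 gives `2.0.0`, a score of 20 gives `1.3.0` and a score of 0.5 gives `1.2.4`. Keeping the thresholds as named constants makes it easy to recalibrate them against a real drift metric.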


Subjects
Datasets as Topic , Software , Principal Component Analysis , Reproducibility of Results , Workflow , Datasets as Topic/standards , Machine Learning
2.
Nature ; 627(8002): 108-115, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38448695

ABSTRACT

The sea level along the US coastlines is projected to rise by 0.25-0.3 m by 2050, increasing the probability of more destructive flooding and inundation in major cities [1-3]. However, these impacts may be exacerbated by coastal subsidence (the sinking of coastal land areas [4]), a factor that is often underrepresented in coastal-management policies and long-term urban planning [2,5]. In this study, we combine high-resolution vertical land motion (that is, raising or lowering of land) and elevation datasets with projections of sea-level rise to quantify the potential inundated areas in 32 major US coastal cities. Here we show that, even when considering the current coastal-defence structures, an additional land area of between 1,006 and 1,389 km² is threatened by relative sea-level rise by 2050, putting 55,000-273,000 people and 31,000-171,000 properties at risk. Our analysis shows that not accounting for spatially variable land subsidence within the cities may lead to inaccurate projections of expected exposure. These potential consequences show the scale of the adaptation challenge, which is not appreciated in most US coastal cities.
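
The core idea of combining vertical land motion (VLM) with sea-level projections can be illustrated with a simplified "bathtub" sketch over a flattened elevation grid. This is an assumption-laden illustration, not the study's method: unlike the study, it ignores coastal-defence structures and whether low-lying cells are hydraulically connected to the sea, and the function and parameter names are hypothetical.

```python
def inundated_area_km2(elevations_m, vlm_mm_per_yr, slr_m, years, cell_area_km2):
    """Area of grid cells lying below projected relative sea level.

    elevations_m: present-day cell elevations above the tidal datum (m).
    vlm_mm_per_yr: vertical land motion per cell (mm/yr); negative = subsidence.
    slr_m: projected sea-level rise over the period (m).
    years: length of the projection period (years).
    cell_area_km2: area of one grid cell (km^2).
    """
    if len(elevations_m) != len(vlm_mm_per_yr):
        raise ValueError("elevation and land-motion grids must have the same size")
    flooded = 0
    for elevation, vlm in zip(elevations_m, vlm_mm_per_yr):
        # Project the land surface forward: mm/yr * yr -> mm, then to metres.
        future_elevation = elevation + vlm * years / 1000.0
        if future_elevation < slr_m:
            flooded += 1
    return flooded * cell_area_km2
```

With cell elevations of 0.1, 0.3, 0.5 and 1.0 m, a 0.3 m rise over 26 years and 5 mm/yr of subsidence in the second and fourth cells, two cells flood; with no land motion only the first does. This is the kind of difference the abstract attributes to ignoring spatially variable subsidence.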


Subjects
Altitude , Cities , City Planning , Floods , Motion , Sea Level Rise , Cities/statistics & numerical data , City Planning/methods , City Planning/trends , Floods/prevention & control , Floods/statistics & numerical data , United States , Datasets as Topic , Sea Level Rise/statistics & numerical data , Acclimatization
3.
Sci Data ; 11(1): 290, 2024 Mar 12.
Article in English | MEDLINE | ID: mdl-38472209

ABSTRACT

Fat infiltration in skeletal muscle is now recognized as a standard feature of aging and is directly related to the decline in muscle function. However, systematic integration and exploration of the mechanisms underlying myosteatosis in aging across species remain limited. Here, we re-analyzed bulk RNA-seq datasets to investigate the association between fat infiltration in skeletal muscle and aging. Our integrated analysis of single-nucleus transcriptomics in aged humans and in Laiwu pigs with high intramuscular fat content identified species-preference subclusters and revealed core gene programs associated with myosteatosis. Furthermore, we found that fibro/adipogenic progenitors (FAPs) had the potential to differentiate into PDE4D+/PDE7B+ preadipocytes across species. Additionally, cell-cell communication analysis revealed that FAPs may be associated with other clusters with adipogenic potential via the COL4A2 and COL6A3 pathways. Our study elucidates the mechanism linking aging and fat infiltration in skeletal muscle, and these consensus signatures in both humans and pigs may help increase reproducibility and reliability in future studies in the field of muscle research.


Subjects
Adipogenesis , Aging , Muscle, Skeletal , Aged , Animals , Humans , Adipogenesis/physiology , Cell Differentiation , Muscle, Skeletal/physiology , Swine , Datasets as Topic , RNA-Seq , Transcriptome , Adipocytes , Stem Cells
4.
Sci Data ; 11(1): 289, 2024 Mar 12.
Article in English | MEDLINE | ID: mdl-38472225

ABSTRACT

The high heterogeneity and complex interactions of malignant cells in breast cancer have been recognized as drivers of cancer progression and therapeutic failure. However, a complete understanding of common cancer cell states and their underlying driver factors remains scarce and challenging. Here, we revealed seven consensus cancer cell states recurring across patients by integrative analysis of single-cell RNA sequencing data of breast cancer. The distinct biological functions, the subtype-specific distribution, the potential cells of origin and the interrelation of consensus cancer cell states were systematically elucidated and validated in multiple independent datasets. We further uncovered the internal regulons and external cell components in tumor microenvironments, which contribute to the consensus cancer cell states. Using the state-specific signature, we also inferred the abundance of cells with each consensus cancer cell state by deconvolution of large breast cancer RNA-seq cohorts, revealing the association of the immune-related state with better survival. Our study provides new insights into the cancer cell state composition and potential therapeutic strategies of breast cancer.


Subjects
Breast Neoplasms , Single-Cell Analysis , Female , Humans , Breast Neoplasms/diagnosis , Breast Neoplasms/genetics , Clinical Relevance , Tumor Microenvironment , Datasets as Topic , Sequence Analysis, RNA
7.
Sci Data ; 11(1): 204, 2024 Feb 14.
Article in English | MEDLINE | ID: mdl-38355867

ABSTRACT

Public health and safety measures (PHSM) made in response to the COVID-19 pandemic have been singular, rapid, and profuse compared to the content, speed, and volume of normal policy-making. Not only can they have a profound effect on the spread of the disease, but they may also have multitudinous secondary effects, in both the social and natural worlds. Unfortunately, despite the best efforts by numerous research groups, existing data on COVID-19 PHSM only partially captures their full geographical scale and policy scope for any significant duration of time. This paper introduces our effort to harmonize data from the eight largest such efforts for policies made before September 21, 2021 into the taxonomy developed by the CoronaNet Research Project in order to respond to the need for comprehensive, high quality COVID-19 data. In doing so, we present a comprehensive comparative analysis of existing data from different COVID-19 PHSM datasets, introduce our novel methodology for harmonizing COVID-19 PHSM data, and provide a clear-eyed assessment of the pros and cons of our efforts.


Subjects
COVID-19 , Pandemics , Policy Making , Humans , Government , Public Health , Datasets as Topic
8.
Sci Data ; 11(1): 210, 2024 Feb 15.
Article in English | MEDLINE | ID: mdl-38360815

ABSTRACT

Exosomes play a crucial role in intercellular communication and can be used as biomarkers for diagnostic and therapeutic clinical applications. However, systematic studies of cancer-associated exosomal nucleic acids remain a major challenge. Here, we developed ExMdb, a comprehensive database of exosomal nucleic acid biomarkers and disease-gene associations curated from published literature and high-throughput datasets. We performed a comprehensive curation of exosome properties, including 4,586 experimentally supported gene-disease associations, 13,768 diagnostic and therapeutic biomarkers, and 312,049 nucleic acid subcellular locations. To characterize expression variation of exosomal molecules and identify causal factors of complex diseases, we have also collected 164 high-throughput datasets, including bulk and single-cell RNA sequencing (scRNA-seq) data. Based on these datasets, we performed various bioinformatics and statistical analyses to support our conclusions and advance our knowledge of exosome biology. Collectively, our dataset will serve as an essential resource for investigating the regulatory mechanisms of complex diseases and improving the development of diagnostic and therapeutic biomarkers.


Subjects
Datasets as Topic , Exosomes , Neoplasms , Nucleic Acids , Humans , Biomarkers , Biomarkers, Tumor , Computational Biology , Exosomes/genetics , Neoplasms/diagnosis , Neoplasms/genetics , Nucleic Acids/genetics , Databases, Genetic
9.
Transl Vis Sci Technol ; 13(2): 16, 2024 02 01.
Article in English | MEDLINE | ID: mdl-38381447

ABSTRACT

Purpose: Retinal images contain rich biomarker information for neurodegenerative disease. Recently, deep learning models have been used for automated neurodegenerative disease diagnosis and risk prediction using retinal images with good results. Methods: In this review, we systematically report studies with datasets of retinal images from patients with neurodegenerative diseases, including Alzheimer's disease, Huntington's disease, Parkinson's disease, amyotrophic lateral sclerosis, and others. We also review and characterize the models in the current literature that have been used for classification, regression, or segmentation problems using retinal images in patients with neurodegenerative diseases. Results: Our review found several existing datasets and models with various imaging modalities primarily in patients with Alzheimer's disease, with most datasets on the order of tens to a few hundred images. We found limited data available for the other neurodegenerative diseases. Although cross-sectional imaging data for Alzheimer's disease are becoming more abundant, datasets with longitudinal imaging of any disease are lacking. Conclusions: The use of bilateral and multimodal imaging together with metadata seems to improve model performance; thus, multimodal bilateral image datasets with patient metadata are needed. We identified several deep learning tools that have been useful in this context, including feature extraction algorithms specifically for retinal images, retinal image preprocessing techniques, transfer learning, feature fusion, and attention mapping. Importantly, we also consider the limitations common to these models in real-world clinical applications. Translational Relevance: This systematic review evaluates the deep learning models and retinal features relevant in the evaluation of retinal images of patients with neurodegenerative disease.


Subjects
Alzheimer Disease , Deep Learning , Neurodegenerative Diseases , Retina , Humans , Algorithms , Alzheimer Disease/diagnostic imaging , Machine Learning , Neurodegenerative Diseases/diagnostic imaging , Datasets as Topic , Retina/diagnostic imaging
10.
Sci Data ; 11(1): 259, 2024 Feb 29.
Article in English | MEDLINE | ID: mdl-38424097

ABSTRACT

Large annotated datasets are required for training deep learning models, but in medical imaging, data sharing is often complicated by ethics, anonymization and data protection legislation. Generative AI models, such as generative adversarial networks (GANs) and diffusion models, can now produce very realistic synthetic images and can potentially facilitate data sharing. However, before synthetic medical images can be shared, it must first be demonstrated that they can be used to train different networks with acceptable performance. Here, we therefore comprehensively evaluate four GANs (progressive GAN, StyleGAN 1-3) and a diffusion model for the task of brain tumor segmentation (using two segmentation networks, U-Net and a Swin transformer). Our results show that segmentation networks trained on synthetic images reach Dice scores that are 80%-90% of the Dice scores obtained when training with real images, but that memorization of the training images can be a problem for diffusion models if the original dataset is too small. Our conclusion is that sharing synthetic medical images is a viable alternative to sharing real images, but that further work is required. The trained generative models and the generated synthetic images are shared on AIDA data hub.
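
The "80%-90% of Dice scores" comparison is a ratio of two Dice coefficients, where Dice = 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a reference mask B. A minimal sketch on flattened binary masks; the function names and the empty-mask convention are assumptions, not taken from the study.

```python
def dice(pred, truth):
    """Dice coefficient 2|A intersect B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if size_sum == 0:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return 2.0 * intersection / size_sum


def relative_dice(dice_synthetic: float, dice_real: float) -> float:
    """Dice after training on synthetic images as a fraction of Dice after training on real images."""
    return dice_synthetic / dice_real
```

For example, a network reaching a Dice of 0.72 after synthetic training, against 0.90 after real training, sits at 80% of the real-data score.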


Subjects
Brain Neoplasms , Humans , Brain Neoplasms/diagnostic imaging , Image Processing, Computer-Assisted , Information Dissemination , Datasets as Topic
11.
Sci Rep ; 14(1): 2858, 2024 02 03.
Article in English | MEDLINE | ID: mdl-38310165

ABSTRACT

Radiomic datasets can be class-imbalanced, for instance, when the prevalence of diseases varies notably, meaning that the number of positive samples is much smaller than that of negative samples. In these cases, the majority class may dominate the model's training and thus negatively affect the model's predictive performance, leading to bias. Therefore, resampling methods are often used to class-balance the data. However, several resampling methods exist, and neither their relative predictive performance nor their impact on feature selection has been systematically analyzed. In this study, we aimed to measure the impact of nine resampling methods on the predictive performance of radiomic models, using a set of fifteen publicly available datasets. Furthermore, we evaluated the agreement and similarity of the sets of selected features. Our results show that applying resampling methods did not improve the predictive performance on average. On specific datasets, slight improvements in predictive performance (+0.015 AUC) could be seen. Considerable disagreement between the sets of selected features was seen (only 28.7% of features agreed), which strongly impedes feature interpretability. However, the selected features are similar when their correlation is considered (82.9% of features correlated on average).
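
Random oversampling is one of the simplest resampling methods of the kind compared above; the abstract does not list the nine methods, so this is an illustration, not the study's pipeline. A minimal standard-library sketch with hypothetical names: it duplicates randomly chosen minority-class samples until every class matches the majority count. Resampling should be applied to the training data only, since resampling the test set would distort the performance estimate.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of each smaller class until every
    class has as many samples as the largest one."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        # Indices of the original samples belonging to this class.
        members = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - count):
            i = rng.choice(members)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out
```

Because oversampling only repeats existing rows, it adds no new information; this is one plausible reason such methods may not improve predictive performance on average.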


Subjects
Data Analysis , 60570 , Datasets as Topic
12.
JAMA Netw Open ; 7(2): e2356619, 2024 Feb 05.
Article in English | MEDLINE | ID: mdl-38393731

ABSTRACT

Importance: Nonadherence to antihypertensive medications is associated with uncontrolled blood pressure, higher mortality rates, and increased health care costs, and food insecurity is one of the modifiable risk factors for medication nonadherence. The Supplemental Nutrition Assistance Program (SNAP), a social intervention program for addressing food insecurity, may help improve adherence to antihypertensive medications. Objective: To evaluate whether receipt of SNAP benefits can modify the consequences of food insecurity on nonadherence to antihypertensive medications. Design, Setting, and Participants: A retrospective cohort study design was used to assemble a cohort of antihypertensive medication users from the linked Medical Expenditure Panel Survey (MEPS)-National Health Interview Survey (NHIS) dataset for 2016 to 2017. The MEPS is a national longitudinal survey on verified self-reported prescribed medication use and health care access measures, and the NHIS is an annual cross-sectional survey of US households that collects comprehensive health information, health behavior, and sociodemographic data, including receipt of SNAP benefits. Receipt of SNAP benefits in the past 12 months and food insecurity status in the past 30 days were assessed through standard questionnaires during the study period. Data analysis was performed from March to December 2021. Exposure: Status of SNAP benefit receipt. Main Outcomes and Measures: The main outcome was nonadherence to antihypertensive medications, based on medication refill adherence (MRA) defined using the MEPS data as the total days' supply divided by 365 days for each antihypertensive medication class. Patients were considered nonadherent if their overall MRA was less than 80%. Food insecurity status in the 30 days prior to the survey was modeled as the effect modifier. Inverse probability of treatment (IPT) weighting was used to control for measured confounding effects of baseline covariates. A probit model, weighted by the product of the computed IPT weights and MEPS weights, was used to estimate the population average treatment effects (PATEs) of SNAP benefit receipt on nonadherence. A stratified analysis approach was used to assess potential effect modification by food insecurity status. Results: This analysis involved 6692 antihypertensive medication users, of whom 1203 (12.8%) reported receiving SNAP benefits and 1338 (14.8%) were considered food insecure. The mean (SD) age was 63.0 (13.3) years; 3632 (51.3%) of the participants were women and 3060 (45.7%) were men. Although SNAP was not associated with nonadherence to antihypertensive medications in the overall population, it was associated with a 13.6-percentage point reduction in nonadherence (PATE, -13.6 [95% CI, -25.0 to -2.3]) among the food-insecure subgroup but not among their food-secure counterparts. Conclusions and Relevance: This analysis of a national observational dataset suggests that patients with hypertension who receive SNAP benefits may be less likely to become nonadherent to antihypertensive medication, especially if they are experiencing food insecurity. Further examination of the role of SNAP as a potential intervention for preventing nonadherence to antihypertensive medications through prospectively designed interventional studies or natural experiment study designs is needed.
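
The MRA definition (total days' supply divided by 365 per medication class, with nonadherence below 80%) is easy to state as code. This is a minimal sketch with hypothetical names and two labeled assumptions the abstract does not settle: each class's MRA is capped at 1.0, and the "overall MRA" is taken as the mean across classes.

```python
def mra(total_days_supply: float, period_days: int = 365) -> float:
    """Medication refill adherence for one drug class over the period.

    Assumption: capped at 1.0 so that oversupply cannot offset gaps; the
    abstract does not state whether MRA was capped.
    """
    return min(total_days_supply / period_days, 1.0)


def is_nonadherent(days_supply_by_class: dict[str, float], cutoff: float = 0.8) -> bool:
    """Flag nonadherence when the overall MRA falls below the 80% cutoff.

    Assumption: 'overall MRA' is the mean of the per-class values; the
    abstract does not say how classes were combined.
    """
    if not days_supply_by_class:
        raise ValueError("need at least one medication class")
    per_class = [mra(days) for days in days_supply_by_class.values()]
    overall = sum(per_class) / len(per_class)
    return overall < cutoff
```

Under these assumptions, a patient with 365 days' supply of one class and 180 days of another has an overall MRA of about 0.75 and is classed as nonadherent, while 330 days of a single class (about 0.90) is adherent.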


Subjects
Food Assistance , Female , Humans , Male , Middle Aged , Antihypertensive Agents/therapeutic use , Cross-Sectional Studies , Poverty , Retrospective Studies , Aged , Datasets as Topic
13.
Nature ; 627(8003): 335-339, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38418873

ABSTRACT

The latitudinal diversity gradient (LDG) dominates global patterns of diversity [1,2], but the factors that underlie the LDG remain elusive. Here we use a unique global dataset [3] to show that vascular plants on oceanic islands exhibit a weakened LDG and explore potential mechanisms for this effect. Our results show that traditional physical drivers of island biogeography [4], namely area and isolation, contribute to the difference between island and mainland diversity at a given latitude (that is, the island species deficit), as smaller and more distant islands experience reduced colonization. However, plant species with mutualists are underrepresented on islands, and we find that this plant mutualism filter explains more variation in the island species deficit than abiotic factors. In particular, plant species that require animal pollinators or microbial mutualists such as arbuscular mycorrhizal fungi contribute disproportionately to the island species deficit near the Equator, with contributions decreasing with distance from the Equator. Plant mutualist filters on species richness are particularly strong at low absolute latitudes where mainland richness is highest, weakening the LDG of oceanic islands. These results provide empirical evidence that mutualisms, habitat heterogeneity and dispersal are key to the maintenance of high tropical plant diversity and mediate the biogeographic patterns of plant diversity on Earth.


Subjects
Biodiversity , Geographic Mapping , Islands , Plants , Symbiosis , Animals , Datasets as Topic , Mycorrhizae/physiology , Plants/microbiology , Pollination , Tropical Climate , Oceans and Seas , Phylogeography
14.
Nature ; 627(8003): 340-346, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38374255

ABSTRACT

Comprehensively mapping the genetic basis of human disease across diverse individuals is a long-standing goal for the field of human genetics [1-4]. The All of Us Research Program is a longitudinal cohort study aiming to enrol a diverse group of at least one million individuals across the USA to accelerate biomedical research and improve human health [5,6]. Here we describe the programme's genomics data release of 245,388 clinical-grade genome sequences. This resource is unique in its diversity as 77% of participants are from communities that are historically under-represented in biomedical research and 46% are individuals from under-represented racial and ethnic minorities. All of Us identified more than 1 billion genetic variants, including more than 275 million previously unreported genetic variants, more than 3.9 million of which had coding consequences. Leveraging linkage between genomic data and the longitudinal electronic health record, we evaluated 3,724 genetic variants associated with 117 diseases and found high replication rates across both participants of European ancestry and participants of African ancestry. Summary-level data are publicly available, and individual-level data can be accessed by researchers through the All of Us Researcher Workbench using a unique data passport model with a median time from initial researcher registration to data access of 29 hours. We anticipate that this diverse dataset will advance the promise of genomic medicine for all.


Subjects
Datasets as Topic , Genetics, Medical , Genetics, Population , Genome, Human , Genomics , Minority Groups , Racial Groups , Humans , Access to Information , Black People/genetics , Electronic Health Records , Ethnicity/genetics , European People/genetics , Genetic Predisposition to Disease/genetics , Genetic Variation/genetics , Genome, Human/genetics , Longitudinal Studies , Racial Groups/genetics , Reproducibility of Results , Research Personnel , Time Factors , Vulnerable Populations
15.
Eur Rev Med Pharmacol Sci ; 28(3): 1213-1226, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38375726

ABSTRACT

OBJECTIVE: This study aimed to classify data for the diagnosis of COVID-19 by extracting features from tomographic images using image processing and transfer learning. MATERIALS AND METHODS: In the proposed study, CT images are made easier for the artificial intelligence models to analyse through preprocessing steps such as masking and segmentation. The dataset was then enlarged by applying data augmentation. The resulting dataset contains a large number of images, which makes the results of the models more reliable. The dataset is split into 70% training and 30% testing. In this way, different features of the applied models were identified, which had a positive effect on the results. Transfer learning was used to reduce training times and further increase the success rate. To find the best method, many different pre-trained transfer learning models were tried, and the results were compared with those of many other studies. RESULTS: A total of 8,354 images were used in the research. Of these, 2,695 are from COVID-19 patients and the remainder are healthy chest tomography images. All of these images were passed through the masking and segmentation steps before being given to the models. In the experimental evaluation, ResNet-50 was the best model, with the highest results (accuracy 95.7%, precision 94.7%, recall 99.2%, specificity 88.3%, F1 score 96.9%, ROC-AUC score 97%). CONCLUSIONS: The presence of a COVID-19 lesion in the images was identified with high accuracy and recall using the transfer learning model we developed on thorax CT images. This outcome demonstrates that the strategy will speed up the diagnosis of COVID-19.
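
Accuracy, precision, recall, specificity and F1 all derive from a binary confusion matrix (treating COVID-19 as the positive class is an assumption). A minimal sketch of the standard definitions with hypothetical names; ROC-AUC is omitted because it needs per-image scores rather than counts.

```python
def _ratio(num: float, den: float) -> float:
    # Convention (assumption): report 0.0 when the denominator is zero.
    return num / den if den else 0.0


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary metrics from confusion-matrix counts."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": _ratio(tn, tn + fp),
        "f1": _ratio(2 * precision * recall, precision + recall),
    }
```

With hypothetical counts of 90 true positives, 10 false positives, 80 true negatives and 20 false negatives, accuracy is 0.85 and precision 0.90, while recall (about 0.82) and specificity (about 0.89) show which error type dominates, something accuracy alone hides.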


Subjects
Artificial Intelligence , COVID-19 , Humans , COVID-19/diagnostic imaging , COVID-19 Testing , Lung/diagnostic imaging , Machine Learning , Tomography, X-Ray Computed , Datasets as Topic
16.
Sci Data ; 11(1): 224, 2024 Feb 21.
Article in English | MEDLINE | ID: mdl-38383523

ABSTRACT

The cutaneous absorption parameters of xenobiotics are crucial for the development of drugs and cosmetics, as well as for assessing environmental and occupational chemical risks. Despite the great variability in experimental design caused by unclear international guidelines, datasets such as HuskinDB have been created to report skin absorption endpoints. This review updates the available skin permeability data by rigorously compiling research published between 2012 and 2021. Inclusion and exclusion criteria were selected to build the most harmonized and reusable dataset possible. The Generative Topographic Mapping method was applied to the present dataset and compared to HuskinDB to monitor the progress in skin permeability research and locate chemotypes of particular concern. The open-source dataset (SkinPiX) includes steady-state flux, maximum flux, lag time and permeability coefficient results for the substances tested, as well as relevant information on experimental parameters that can impact the data. It can be used to extract subsets of data for comparisons and to build predictive models.
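
Steady-state flux, lag time and permeability coefficient are commonly derived together with the lag-time method: fit a straight line to the linear (steady-state) part of the cumulative amount permeated per unit area versus time; the slope is the steady-state flux Jss, the x-intercept is the lag time, and Kp = Jss / Cd under infinite-dose conditions (Fick's first law). The abstract does not say how each SkinPiX entry was computed, so this is a generic sketch with hypothetical names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def permeation_parameters(times_h, cumulative_ug_cm2, donor_conc_ug_cm3):
    """Steady-state flux, lag time and permeability coefficient.

    times_h and cumulative_ug_cm2 must cover only the linear (steady-state)
    part of the cumulative permeation curve.
    """
    jss, intercept = fit_line(times_h, cumulative_ug_cm2)  # ug/cm^2/h
    lag_time_h = -intercept / jss       # x-intercept of the fitted line
    kp_cm_h = jss / donor_conc_ug_cm3   # Kp = Jss / Cd, in cm/h
    return jss, lag_time_h, kp_cm_h
```

For cumulative amounts of 5, 9, 13 and 17 µg/cm² at 4, 6, 8 and 10 h and a donor concentration of 1000 µg/cm³, the fit gives Jss = 2 µg/cm²/h, a lag time of 1.5 h and Kp = 0.002 cm/h.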


Subjects
Skin Absorption , Skin , Xenobiotics , Permeability , Skin/metabolism , Xenobiotics/metabolism , Datasets as Topic , Humans
17.
BMC Res Notes ; 17(1): 18, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-38183153

ABSTRACT

OBJECTIVES: This article presents the process of extracting and processing two datasets from the General Ombudsman of the Brazilian Unified Health System (OUVSUS). The resulting datasets allow analysis of the characteristics of the manifestations and of the sociodemographic profile of the citizens who submitted them. DATA DESCRIPTION: The first dataset describes the characteristics of the manifestations registered by the General Ombudsman. Each row represents an individual manifestation and contains information such as the registration date, classification, input channel, and subject, among others. The second dataset consists of sociodemographic information for each citizen who submitted a manifestation, including characteristics such as sexual orientation, race, age, and geographic location, among others.


Subjects
Datasets as Topic , Demography , Humans , Brazil
18.
PLoS One ; 19(1): e0296929, 2024.
Article in English | MEDLINE | ID: mdl-38277376

ABSTRACT

Every day, thousands of news articles are published on the web, and filtering tools can be used to extract knowledge on specific topics. The categorization of news into a predefined set of topics is widely studied in the literature; however, most works are restricted to documents in English. In this work, we make two contributions. First, we introduce a Portuguese news dataset collected from WikiNews, an open-source outlet that publishes news from different sources. Since datasets for Portuguese are scarce and an existing one comes from a single news channel, we aim to provide a dataset drawn from different news channels. The availability of comprehensive datasets plays a key role in advancing research. Second, we compare different architectures for Portuguese news classification, exploring different text representations (BoW, TF-IDF, Embedding) and classification techniques (SVM, CNN, DJINN, BERT) for documents in Portuguese, covering both classical methods and current technologies. We show the trade-off between accuracy and training time for this application. We aim to show the capabilities of available algorithms and the challenges faced in the area.
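
The TF-IDF representation weights a term by its frequency in a document times the log of how rare it is across documents. A minimal standard-library sketch of one common variant (length-normalised term frequency, idf = ln(N / df)) on already tokenised text; the function name and variant are assumptions, not the study's exact configuration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenised document.

    tf = count / document length; idf = ln(N / df). This is one common
    variant; libraries often smooth idf and normalise vectors differently.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        vectors.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors
```

A term that appears in every document (for example "governo" in two short headlines that both mention it) gets weight 0, while terms unique to one headline are up-weighted. scikit-learn's `TfidfVectorizer`, by contrast, uses a smoothed idf and L2-normalises each vector by default, so its numbers differ from this sketch.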


Subjects
Datasets as Topic , Internet , Humans , Algorithms , Portugal
19.
J Mol Biol ; 436(2): 168374, 2024 01 15.
Article in English | MEDLINE | ID: mdl-38182301

ABSTRACT

Variant effect predictors assess whether a substitution is pathogenic or benign. Most predictors, including those that are structure-based, are designed for globular proteins in aqueous environments and do not take into account that the variant residue may be located within the membrane. We report Missense3D-TM, which provides a structure-based assessment of the impact of a missense variant located within a membrane. On a dataset of 2,078 pathogenic and 1,060 benign variants, spanning 711 proteins from 706 structures, Missense3D-TM achieved an accuracy of 66%, a Matthews correlation coefficient of 0.37, a sensitivity of 58% and a specificity of 81%. Missense3D-TM performed similarly to mCSM-membrane: accuracy 66% vs 61% (p = 0.02) on an unbalanced test set and 70% vs 67% (p = 0.20) on a balanced test set. The Missense3D-TM website provides an analysis of the structural effects of the variant along with its predicted position within the membrane. The web server is available at http://missense3d.bc.ic.ac.uk/.
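
The Matthews correlation coefficient summarises all four confusion-matrix cells in one number, which makes it more informative than accuracy on an unbalanced set such as this one: MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A minimal sketch with a hypothetical function name; returning 0.0 when a marginal is empty is a common convention, not something the abstract specifies.

```python
import math


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 if any marginal is empty."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0
```

Counts back-calculated from the reported rates (about 1,205 of 2,078 pathogenic and 859 of 1,060 benign variants classified correctly) give an accuracy of about 66% and an MCC of about 0.37, consistent with the abstract. These counts are an approximation, not the published confusion matrix.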


Subjects
Membrane Proteins , Mutation, Missense , Protein Domains , Imaging, Three-Dimensional , Datasets as Topic , Membrane Proteins/chemistry , Membrane Proteins/genetics
20.
J Mol Biol ; 436(4): 168444, 2024 02 15.
Article in English | MEDLINE | ID: mdl-38218366

ABSTRACT

Many examples are known of regions of intrinsically disordered proteins that fold into α-helices upon binding to their targets. These helical binding motifs (HBMs) can also be partially helical in the unbound state, and this so-called residual structure can affect binding affinity and kinetics. To investigate the underlying mechanisms governing the formation of residual helical structure, we assembled a dataset of experimental helix contents of 65 peptides containing HBMs that fold upon binding. The average residual helicity is 17% and increases to 60% upon target binding. The helix contents of residual and target-bound structures do not correlate; however, the relative locations of helix elements in the two states overlap strongly. Compared with disordered regions in general, HBMs are enriched in amino acids with high helix preference, and these residues are typically involved in target binding, explaining the overlap in helix positions. In particular, we find that leucine residues and leucine motifs in HBMs are the major contributors to helix stabilization and target binding. For the two model peptides, we show that substitution of leucine motifs with other hydrophobic residues (valine or isoleucine) reduces residual helicity, supporting the role of leucine as a helix stabilizer. Of the three hydrophobic residues, only leucine can efficiently stabilize residual helical structure. We suggest that the high occurrence of leucine motifs and a general preference for leucine at binding interfaces in HBMs can be explained by its unique ability to stabilize helical elements.


Subjects
Intrinsically Disordered Proteins , Leucine , Intrinsically Disordered Proteins/chemistry , Leucine/chemistry , Peptides/chemistry , Protein Structure, Secondary , Amino Acid Motifs , Datasets as Topic , Hydrophobic and Hydrophilic Interactions , Protein Binding , Models, Chemical