Results 1 - 20 of 89
1.
Nutr J ; 21(1): 67, 2022 11 08.
Article in English | MEDLINE | ID: mdl-36348423

ABSTRACT

BACKGROUND: Household food purchases (HFP) lie in the pathway between the community food environment and the foods available in households for consumption. As such, HFP data have emerged as an alternative for monitoring population dietary trends over time. In this paper, we investigate the use of loyalty card datasets as unexplored sources of continuously collected HFP data to describe temporal trends in household produce purchases. METHODS: We partnered with a grocery store chain to obtain a loyalty card database of grocery transactions by household from January 2016 to October 2018. We included households in an urban county with complete observations for head-of-household age group, household income group, and family size. Data were summarized as weighted averages (95% CI) of the percent of produce purchased out of all foods purchased by each household per month. We modeled seasonal and linear trends in the proportion of produce purchases by age group and income while accounting for repeated observations per household using generalized estimating equations. RESULTS: There are 290,098 households in the database (88% of all county households). At baseline, the smallest and largest percent produce purchases are observed among the youngest, lowest-income households (12.2%, CI 11.1-13.3) and the oldest, highest-income households (19.3%, CI 18.9-19.6), respectively. The seasonal variations are consistent across all age and income groups, with an April-June peak gradually descending until December. However, the average linear change in percent produce purchased per household per year varies by age and income, being steepest among the youngest households at each income level (from 1.42%, CI 0.98-1.80, to 0.69%, CI 0.42-0.95), while the oldest households experience almost no annual change. CONCLUSIONS: We explored the potential of a collaboration with a food retailer to use continuously collected loyalty card data for public health nutrition purposes. Our findings suggest a trend towards a healthier pattern in long-term food purchases and household food availability among the youngest households that, if sustained, may lessen the population chronic disease burden. Understanding the foods available for consumption within households allows public health advocates to develop and evaluate policies and programs promoting foods and nutrients across the life course.
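As a rough illustration of the trend model described above, a seasonal-plus-linear fit can be sketched on synthetic monthly data. This sketch uses plain least squares rather than the paper's GEE (which additionally accounts for repeated observations per household), and every number below is made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
months = np.arange(34)  # Jan 2016 - Oct 2018

# Synthetic percent-produce series: linear trend + spring seasonal peak + noise
true = 14.0 + 0.08 * months + 2.0 * np.sin(2 * np.pi * (months - 1) / 12)
pct_produce = true + rng.normal(0, 0.3, size=months.size)

# Design matrix: intercept, linear month term, and annual sine/cosine seasonal terms
X = np.column_stack([
    np.ones_like(months, dtype=float),
    months,
    np.sin(2 * np.pi * months / 12),
    np.cos(2 * np.pi * months / 12),
])
coef, *_ = np.linalg.lstsq(X, pct_produce, rcond=None)
annual_change = coef[1] * 12  # estimated change in percent produce per year
```

The sine/cosine pair captures the annual peak-and-trough cycle, while `coef[1]` isolates the long-run linear drift the paper reports per age and income group.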


Subjects
Consumer Behavior, Family Characteristics, Humans, Income, Diet, Food Preferences
2.
J Med Internet Res ; 22(6): e17496, 2020 06 22.
Article in English | MEDLINE | ID: mdl-32568093

ABSTRACT

BACKGROUND: In recent years, flavored electronic cigarettes (e-cigarettes) have become popular among teenagers and young adults. Discussions about e-cigarettes and e-cigarette use (vaping) experiences are prevalent online, making social media an ideal resource for understanding the health risks associated with e-cigarette flavors from the users' perspective. OBJECTIVE: This study aimed to investigate the potential associations between electronic cigarette liquid (e-liquid) flavors and the reporting of health symptoms using social media data. METHODS: A dataset consisting of 2.8 million e-cigarette-related posts was collected using keyword filtering from Reddit, a social media platform, from January 2013 to April 2019. Temporal analysis for nine major health symptom categories was used to understand the trend of public concerns related to e-cigarettes. Sentiment analysis was conducted to obtain the proportions of positive and negative sentiment scores for all reported health symptom categories. Topic modeling was applied to reveal the topics related to e-cigarettes and health symptoms. Furthermore, generalized estimating equation (GEE) models were used to quantitatively measure potential associations between e-liquid flavors and the reporting of health symptoms. RESULTS: Temporal analysis showed that the Respiratory category was consistently the most discussed health symptom category among all categories related to e-cigarettes on Reddit, followed by the Throat category. Sentiment analysis showed higher proportions of positive sentiment scores for all reported health symptom categories, except for the Cancer category. Topic modeling conducted on all health-related posts showed that 17 of the top 100 topics were flavor related. 
GEE models showed different associations between the reporting of health symptoms and e-liquid flavor categories: for example, Beverage flavors showed a lower association with Respiratory symptoms than other flavors, while Fruit flavors showed a higher association with Cardiovascular symptoms than other flavors. CONCLUSIONS: This study identified different potential associations between e-liquid flavors and the reporting of health symptoms using social media data. The results provide valuable information for further investigation of the health effects associated with different e-liquid flavors.
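The first step of the temporal analysis described above, tallying symptom-category mentions per month by keyword matching, can be sketched as follows; the keyword lexicons and the sample posts are hypothetical stand-ins, not the study's actual category definitions:

```python
from collections import Counter, defaultdict

# Hypothetical keyword lexicons for two of the nine symptom categories
CATEGORIES = {
    "Respiratory": {"cough", "wheezing", "breath"},
    "Throat": {"throat", "hoarse"},
}

# (year-month, post text) pairs standing in for the Reddit dataset
posts = [
    ("2019-03", "my cough got worse after vaping all week"),
    ("2019-03", "sore throat from the new juice"),
    ("2019-04", "short of breath and wheezing today"),
]

# Count, per month, how many posts mention each symptom category
monthly = defaultdict(Counter)
for month, text in posts:
    tokens = set(text.lower().split())
    for cat, keywords in CATEGORIES.items():
        if tokens & keywords:
            monthly[month][cat] += 1
```

A real pipeline would add tokenization robust to punctuation and plurals, but the per-month category counts are the quantity whose trend the paper plots.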


Subjects
Electronic Nicotine Delivery Systems/standards, Flavoring Agents/adverse effects, Social Media/standards, Vaping/adverse effects, Adolescent, Female, Humans, Male, Young Adult
3.
J Med Internet Res ; 22(6): e17280, 2020 06 24.
Article in English | MEDLINE | ID: mdl-32579123

ABSTRACT

BACKGROUND: The number of electronic cigarette (e-cigarette) users has been increasing rapidly in recent years, especially among youth and young adults. More e-cigarette products have become available, including e-liquids with various brands and flavors. Various e-liquid flavors have been frequently discussed by e-cigarette users on social media. OBJECTIVE: This study aimed to examine the longitudinal prevalence of mentions of electronic cigarette liquid (e-liquid) flavors and user perceptions on social media. METHODS: We applied a data-driven approach to analyze the trends and macro-level user sentiments of different e-cigarette flavors on social media. With data collected from web-based stores, e-liquid flavors were classified into categories in a flavor hierarchy based on their ingredients. The e-cigarette-related posts were collected from social media platforms, including Reddit and Twitter, using e-cigarette-related keywords. The temporal trend of mentions of e-liquid flavor categories was compiled using Reddit data from January 2013 to April 2019. Twitter data were analyzed using a sentiment analysis from May to August 2019 to explore the opinions of e-cigarette users toward each flavor category. RESULTS: More than 1000 e-liquid flavors were classified into 7 major flavor categories. The fruit and sweets categories were the 2 most frequently discussed e-liquid flavors on Reddit, contributing to approximately 58% and 15%, respectively, of all flavor-related posts. We showed that mentions of the fruit flavor category had a steady overall upward trend compared with other flavor categories that did not show much change over time. Results from the sentiment analysis demonstrated that most e-liquid flavor categories had significant positive sentiments, except for the beverage and tobacco categories. 
CONCLUSIONS: We compiled up-to-date information on the popular e-liquid flavors mentioned on social media and found that the prevalence of flavor mentions and user perceptions differed across flavor categories. Fruit was the most frequently discussed flavor category on social media. Our study provides valuable information for future regulation of flavored e-cigarettes.


Subjects
Electronic Nicotine Delivery Systems/standards, Flavoring Agents/chemistry, Social Media/standards, Female, Flavoring Agents/analysis, Humans, Longitudinal Studies, Male, Perception
4.
Prev Chronic Dis ; 16: E130, 2019 09 19.
Article in English | MEDLINE | ID: mdl-31538566

ABSTRACT

INTRODUCTION: As one of the most prevalent chronic diseases in the United States, diabetes, especially type 2 diabetes, affects the health of millions of people and puts an enormous financial burden on the US economy. We aimed to develop predictive models to identify risk factors for type 2 diabetes, which could help facilitate early diagnosis and intervention and also reduce medical costs. METHODS: We analyzed cross-sectional data on 138,146 participants, including 20,467 with type 2 diabetes, from the 2014 Behavioral Risk Factor Surveillance System. We built several machine learning models for predicting type 2 diabetes, including support vector machine, decision tree, logistic regression, random forest, neural network, and Gaussian Naive Bayes classifiers. We used univariable and multivariable weighted logistic regression models to investigate the associations of potential risk factors with type 2 diabetes. RESULTS: All predictive models for type 2 diabetes achieved a high area under the curve (AUC), ranging from 0.7182 to 0.7949. Although the neural network model had the highest accuracy (82.4%), specificity (90.2%), and AUC (0.7949), the decision tree model had the highest sensitivity (51.6%) for type 2 diabetes. We found that people who slept 9 or more hours per day (adjusted odds ratio [aOR] = 1.13, 95% confidence interval [CI], 1.03-1.25) or had checkup frequency of less than 1 year (aOR = 2.31, 95% CI, 1.86-2.85) had higher risk for type 2 diabetes. CONCLUSION: Of the 8 predictive models, the neural network model gave the best model performance with the highest AUC value; however, the decision tree model is preferred for initial screening for type 2 diabetes because it had the highest sensitivity and, therefore, detection rate. We confirmed previously reported risk factors and also identified sleeping time and frequency of checkup as 2 new potential risk factors related to type 2 diabetes.
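The kind of model comparison described above can be sketched on synthetic data standing in for BRFSS records; the two classifiers and the AUC evaluation follow the abstract, but the dataset, class balance, and hyperparameters here are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for tabular risk-factor data with an imbalanced outcome
X, y = make_classification(n_samples=2000, n_features=12, n_informative=6,
                           weights=[0.85, 0.15], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=42, stratify=y)

# Fit each candidate model and score it by area under the ROC curve
aucs = {}
for name, model in [("logistic", LogisticRegression(max_iter=1000)),
                    ("tree", DecisionTreeClassifier(max_depth=5, random_state=42))]:
    model.fit(X_tr, y_tr)
    aucs[name] = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
```

As in the study, AUC summarizes ranking quality across thresholds, while sensitivity at a chosen operating point would drive the screening-tool decision.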


Subjects
Decision Trees, Diabetes Mellitus Type 2/epidemiology, Biological Models, Neural Networks (Computer), Area Under Curve, Behavioral Risk Factor Surveillance System, Humans, Risk Factors
5.
Perspect Behav Sci ; 47(1): 283-310, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38660506

ABSTRACT

A complete science of human behavior requires a comprehensive account of the verbal behavior those humans exhibit. Existing behavioral theories of such verbal behavior have produced compelling insight into language's underlying function, but the expansive program of research those theories deserve has unfortunately been slow to develop. We argue that the status quo's manually implemented and study-specific coding systems are too resource intensive to be worthwhile for most behavior analysts. These high input costs in turn discourage research on verbal behavior overall. We propose lexicon-based sentiment analysis as a more modern and efficient approach to the study of human verbal products, especially naturally occurring ones (e.g., psychotherapy transcripts, social media posts). In the present discussion, we introduce the reader to principles of sentiment analysis, highlighting its usefulness as a behavior analytic tool for the study of verbal behavior. We conclude with an outline of approaches for handling some of the more complex forms of speech, like negation, sarcasm, and speculation. The appendix also provides a worked example of how sentiment analysis could be applied to existing questions in behavior analysis, complete with code that readers can incorporate into their own work.
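The lexicon-based approach advocated above can be illustrated with a minimal scorer; the tiny lexicon and the one-token negation rule are toy assumptions for illustration, not a validated instrument:

```python
# Minimal lexicon-based sentiment scorer with simple negation handling.
# The lexicon below is a toy example; real analyses use validated lexicons.
LEXICON = {"good": 1, "great": 2, "calm": 1, "bad": -1, "awful": -2, "angry": -1}
NEGATORS = {"not", "never", "no"}

def sentiment(text):
    """Sum lexicon weights over tokens, flipping the sign after a negator."""
    score, negate = 0, False
    for word in text.lower().replace(".", " ").split():
        if word in NEGATORS:
            negate = True
            continue
        if word in LEXICON:
            score += -LEXICON[word] if negate else LEXICON[word]
        negate = False  # negation only applies to the immediately following word
    return score
```

For example, `sentiment("not good")` flips the polarity of "good", which is the sort of complication (alongside sarcasm and speculation) the article's final section addresses.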

6.
IEEE Trans Image Process ; 33: 1938-1951, 2024.
Article in English | MEDLINE | ID: mdl-38224517

ABSTRACT

Generalized Zero-Shot Learning (GZSL) aims at recognizing images from both seen and unseen classes by constructing correspondences between visual images and semantic embeddings. However, existing methods suffer from a strong bias problem, where unseen images in the target domain tend to be recognized as seen classes from the source domain. To address this issue, we propose a Prototype-augmented Self-supervised Generative Network that integrates self-supervised learning and prototype learning into a feature-generating model for GZSL. The proposed model enjoys several advantages. First, we propose a Self-supervised Learning Module to exploit inter-domain relationships, introducing anchors as a bridge between seen and unseen categories. In the shared space, we pull the distribution of the target domain away from the source domain and obtain domain-aware features. To the best of our knowledge, this is the first work to introduce self-supervised learning into GZSL as learning guidance. Second, a Prototype Enhancing Module is proposed to utilize class prototypes to model a reliable target domain distribution at finer granularity. In this module, a Prototype Alignment mechanism and a Prototype Dispersion mechanism are combined to guide the generation of better target class features with intra-class compactness and inter-class separability. Extensive experimental results on five standard benchmarks demonstrate that our model performs favorably against state-of-the-art GZSL methods.

7.
J Endourol ; 2024 May 31.
Article in English | MEDLINE | ID: mdl-38753704

ABSTRACT

Introduction: Chemical composition analysis is important in prevention counseling for kidney stone disease. Advances in laser technology have made dusting techniques more prevalent, but dusting offers no consistent way to collect enough material for chemical analysis, leading many to forgo this test. We developed a novel machine learning (ML) model to assess stone composition from intraoperative endoscopic video data. Methods: Two endourologists performed ureteroscopy for kidney stones ≥ 10 mm. Representative videos were recorded intraoperatively. Individual frames were extracted from the videos, and the stone was outlined by human tracing. An ML model, UroSAM, was built and trained to automatically identify kidney stones in the images and predict the majority stone composition as follows: calcium oxalate monohydrate (COM), dihydrate (COD), calcium phosphate (CAP), or uric acid (UA). UroSAM was built on top of the publicly available Segment Anything Model (SAM) and incorporated a U-Net convolutional neural network (CNN). Results: A total of 78 ureteroscopy videos were collected; 50 were used for the model after exclusions (32 COM, 8 COD, 8 CAP, 2 UA). The ML model segmented the images with 94.77% precision, and the Dice coefficient (0.9135) and Intersection over Union (0.8496) confirmed good segmentation performance. A video-wise evaluation demonstrated 60% correct classification of stone composition. Subgroup analysis showed correct classification in 84.4% of COM videos. A post hoc adaptive threshold technique was used to mitigate biasing of the model toward COM due to data imbalance; this improved the overall correct classification to 62% while also improving the classification of COD, CAP, and UA videos. Conclusions: This study demonstrates the effective development of UroSAM, an ML model that precisely identifies kidney stones in natural endoscopic video data. More high-quality video data will improve the model's performance in classifying the majority stone composition.
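The two segmentation metrics reported above (Dice coefficient and Intersection over Union) are straightforward to compute from binary masks; the masks below are small synthetic stand-ins, not endoscopic data:

```python
import numpy as np

def dice(a, b):
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

def iou(a, b):
    """Intersection over Union between two binary masks: |A∩B| / |A∪B|."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union

# Two overlapping 4x4 squares on an 8x8 grid: 16 px each, 9 px overlap
pred = np.zeros((8, 8), dtype=bool); pred[2:6, 2:6] = True
truth = np.zeros((8, 8), dtype=bool); truth[3:7, 3:7] = True
```

Dice weights the overlap against the average mask size, so it is always at least as large as IoU for the same pair of masks, which is why the paper's Dice (0.9135) exceeds its IoU (0.8496).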

8.
IEEE Trans Image Process ; 33: 625-638, 2024.
Article in English | MEDLINE | ID: mdl-38198242

ABSTRACT

Modeling the effect of reflection is crucial for the single image reflection removal (SIRR) task. Modern SIRR methods usually simplify the reflection formulation by assuming a linear combination of a transmission layer and a reflection layer. However, the large variations in image content and real-world picture-taking conditions often produce far more complex reflections. In this paper, we introduce a new screen-blur combination based on two important factors, namely the intensity and the blurriness of reflection, to better characterize the reflection formulation in SIRR. Specifically, we present Screen-blur Reflection Networks (SRNet), which executes the screen-blur formulation in its network design and adapts to complex reflections in real scenes. Technically, SRNet consists of three components: a blended image generator, a reflection estimator, and a reflection removal module. The image generator exploits the screen-blur combination to synthesize the training blended images. The reflection estimator learns the reflection layer and a blur degree that measures the level of blurriness of the reflection. The reflection removal module then uses the blended image, blur degree, and reflection layer to filter out the transmission layer in a cascaded manner. Three different SIRR methods achieve superior results when their training data are generated according to the screen-blur combination. Moreover, extensive experiments on six datasets quantitatively and qualitatively demonstrate the efficacy of SRNet over state-of-the-art methods.
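The screen-blur formulation can be sketched directly: a screen blend of the transmission layer with a blurred, intensity-scaled reflection layer. The box-blur kernel and intensity factor below are illustrative assumptions, not the quantities SRNet learns:

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.uniform(0, 1, (16, 16))    # transmission layer (the desired scene)
R = rng.uniform(0, 0.5, (16, 16))  # reflection layer
alpha = 0.6                        # assumed reflection intensity factor

def box_blur(img, k=3):
    """Simple box blur standing in for the learned blurriness of the reflection."""
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

R_blur = alpha * box_blur(R)
# Screen blending: brightens T by the blurred reflection and never clips above 1
blended = 1.0 - (1.0 - T) * (1.0 - R_blur)
```

Unlike a plain additive model `T + R`, the screen blend saturates smoothly, which better matches how bright reflections compose with the transmitted scene.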

9.
Article in English | MEDLINE | ID: mdl-38669165

ABSTRACT

Structure-guided image completion aims to inpaint a local region of an image according to an input guidance map from users. While such a task enables many practical applications for interactive editing, existing methods often struggle to hallucinate realistic object instances in complex natural scenes. Such a limitation is partially due to the lack of semantic-level constraints inside the hole region as well as the lack of a mechanism to enforce realistic object generation. In this work, we propose a learning paradigm that consists of semantic discriminators and object-level discriminators for improving the generation of complex semantics and objects. Specifically, the semantic discriminators leverage pretrained visual features to improve the realism of the generated visual concepts. Moreover, the object-level discriminators take aligned instances as inputs to enforce the realism of individual objects. Our proposed scheme significantly improves the generation quality and achieves state-of-the-art results on various tasks, including segmentation-guided completion, edge-guided manipulation and panoptically-guided manipulation on Places2 datasets. Furthermore, our trained model is flexible and can support multiple editing use cases, such as object insertion, replacement, removal and standard inpainting. In particular, our trained model combined with a novel automatic image completion pipeline achieves state-of-the-art results on the standard inpainting task.

10.
Comput Biol Med ; 165: 107423, 2023 10.
Article in English | MEDLINE | ID: mdl-37672926

ABSTRACT

BACKGROUND: Despite declines in infant death rates in recent decades in the United States, the national goal of reducing infant death has not been reached. This study aims to predict infant death using machine-learning approaches. METHODS: A population-based retrospective study of live births in the United States between 2016 and 2021 was conducted. Thirty-three factors related to birth facility, prenatal care and pregnancy history, labor and delivery, and newborn characteristics were used to predict infant death. RESULTS: XGBoost demonstrated superior performance compared to the four other machine learning models evaluated. The original imbalanced dataset yielded better results than the balanced datasets created through oversampling procedures. Cross-validation of the XGBoost-based model consistently achieved high performance during both the pre-pandemic (2016-2019) and pandemic (2020-2021) periods. Specifically, the XGBoost-based model performed exceptionally well in predicting neonatal death (AUC: 0.98). The key predictors of infant death were identified as gestational age, birth weight, 5-min APGAR score, and prenatal visits. A simplified model based on these four predictors yielded slightly inferior yet comparable performance to the all-predictor model (AUC: 0.91 vs. 0.93). Furthermore, the four-factor risk classification system effectively identified infant deaths in 2020 and 2021 in the high-risk (88.7%-89.0%), medium-risk (4.6%-5.4%), and low-risk (0.1%) groups, outperforming the risk screening tool based on accumulated risk factors. CONCLUSIONS: XGBoost-based models excel in predicting infant death, providing valuable prognostic information for perinatal care education and counselling. The simplified four-predictor classification system could serve as a practical alternative for infant death risk prediction.


Subjects
Infant Death, Machine Learning, Infant, Newborn, Female, Pregnancy, Humans, Retrospective Studies, Birth Weight, Gestational Age
11.
Quintessence Int ; 54(1): 64-76, 2023 Jan 13.
Article in English | MEDLINE | ID: mdl-36268943

ABSTRACT

OBJECTIVES: To assess self-reported population oral health conditions amid the COVID-19 pandemic using user reports on Twitter. METHOD AND MATERIALS: Oral health-related tweets during the COVID-19 pandemic were collected from 9,104 Twitter users across 26 states (with sufficient samples) in the United States between 12 November 2020 and 14 June 2021. User demographics were inferred by leveraging the visual information from user profile images. Other characteristics, including income, population density, poverty rate, health insurance coverage rate, community water fluoridation rate, and relative change in the number of daily confirmed COVID-19 cases, were acquired or inferred from information retrieved from user profiles. Logistic regression was performed to examine whether discussions varied across user characteristics. RESULTS: Overall, 26.70% of the Twitter users discussed "Wisdom tooth pain/jaw hurt," 23.86% tweeted about "Dental service/cavity," 18.97% discussed "Chipped tooth/tooth break," 16.23% talked about "Dental pain," and the rest tweeted about "Tooth decay/gum bleeding." Women and younger adults (19 to 29 years) were more likely to talk about oral health problems. Health insurance coverage rate was the most significant predictor in the logistic regression for topic prediction. CONCLUSION: Tweets reveal social disparities in oral health during the pandemic. For instance, people from counties at a higher risk of COVID-19 talked more about "Tooth decay/gum bleeding" and "Chipped tooth/tooth break." Older adults, who are vulnerable to COVID-19, were more likely to discuss "Dental pain." Topics of interest varied across user characteristics. Through the lens of social media, these findings may provide insights for oral health practitioners and policy makers.


Subjects
COVID-19, Social Media, Female, Humans, United States/epidemiology, Aged, COVID-19/epidemiology, Pandemics, Oral Health, Social Determinants of Health, Pain
12.
Front Big Data ; 6: 1099182, 2023.
Article in English | MEDLINE | ID: mdl-37091459

ABSTRACT

Since the World Health Organization (WHO) characterized COVID-19 as a pandemic in March 2020, there have been over 600 million confirmed cases of COVID-19 and more than six million deaths as of October 2022. The relationship between the COVID-19 pandemic and human behavior is complicated. On one hand, human behavior is found to shape the spread of the disease. On the other hand, the pandemic has impacted and even changed human behavior in almost every aspect. To provide a holistic understanding of the complex interplay between human behavior and the COVID-19 pandemic, researchers have been employing big data techniques such as natural language processing, computer vision, audio signal processing, frequent pattern mining, and machine learning. In this study, we present an overview of the existing studies on using big data techniques to study human behavior in the time of the COVID-19 pandemic. In particular, we categorize these studies into three groups-using big data to measure, model, and leverage human behavior, respectively. The related tasks, data, and methods are summarized accordingly. To provide more insights into how to fight the COVID-19 pandemic and future global catastrophes, we further discuss challenges and potential opportunities.

13.
Article in English | MEDLINE | ID: mdl-37141054

ABSTRACT

Some cognitive research has found that humans accomplish event segmentation as a side effect of event anticipation. Inspired by this discovery, we propose a simple yet effective end-to-end self-supervised learning framework for event segmentation/boundary detection. Unlike mainstream clustering-based methods, our framework exploits a transformer-based feature reconstruction scheme to detect event boundaries by reconstruction errors. This is consistent with the fact that humans spot new events by leveraging the deviation between their prediction and what is perceived. Because of their semantic heterogeneity, the frames at boundaries are difficult to reconstruct (generally yielding large reconstruction errors), which is favorable for event boundary detection. In addition, since the reconstruction occurs on the semantic feature level instead of the pixel level, we develop a temporal contrastive feature embedding (TCFE) module to learn the semantic visual representation for frame feature reconstruction (FFR). This procedure is like humans building up experiences with "long-term memory." The goal of our work is to segment generic events rather than localize specific ones, and we focus on achieving accurate event boundaries. As a result, we adopt the F1 score (Precision/Recall) as our primary evaluation metric for a fair comparison with previous approaches. We also calculate the conventional frame-based mean over frames (MoF) and intersection over union (IoU) metrics. We thoroughly benchmark our work on four publicly available datasets and demonstrate much better results. The source code is available at https://github.com/wang3702/CoSeg.
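The core idea of flagging boundaries where reconstruction error spikes can be illustrated with a toy stand-in for the learned reconstruction; here each frame is "reconstructed" as the mean of its temporal neighbors, rather than by the paper's transformer:

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic frame features: three "events" with distinct mean feature vectors
segments = [rng.normal(m, 0.1, size=(40, 8)) for m in (0.0, 1.0, 2.0)]
frames = np.vstack(segments)  # 120 frames x 8 dims, boundaries near 40 and 80

# Toy reconstruction: predict each frame as the mean of its two neighbors;
# frames at event boundaries straddle two events and reconstruct poorly
recon = (np.roll(frames, 1, axis=0) + np.roll(frames, -1, axis=0)) / 2
errors = np.linalg.norm(frames - recon, axis=1)
errors[0] = errors[-1] = 0  # edge frames have no valid two-sided neighbors

# Flag frames whose reconstruction error is an outlier as event boundaries
threshold = errors.mean() + 3 * errors.std()
boundaries = np.flatnonzero(errors > threshold)
```

Within an event the neighbor average is a good predictor, so the error stays low; only frames adjacent to an event change produce the large errors the threshold picks up.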

14.
Neural Netw ; 168: 450-458, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37806138

ABSTRACT

Time series data continuously collected by different sensors play an essential role in monitoring and predicting events in many real-world applications, and anomaly detection for time series has received increasing attention during the past decades. In this paper, we propose an anomaly detection method by densely contrasting the whole time series with its sub-sequences at different timestamps in a latent space. Our approach leverages the locality property of convolutional neural networks (CNN) and integrates position embedding to effectively capture local features for sub-sequences. Simultaneously, we employ an attention mechanism to extract global features from the entire time series. By combining these local and global features, our model is trained using both instance-level contrastive learning loss and distribution-level alignment loss. Furthermore, we introduce a reconstruction loss applied to the extracted global features to prevent the potential loss of information. To validate the efficacy of our proposed technique, we conduct experiments on publicly available time-series datasets for anomaly detection. Additionally, we evaluate our method on an in-house mobile phone dataset aimed at monitoring the status of Parkinson's disease, all within an unsupervised learning framework. Our results demonstrate the effectiveness and potential of the proposed approach in tackling anomaly detection in time series data, offering promising applications in real-world scenarios.


Subjects
Neural Networks (Computer), Parkinson Disease, Humans, Time Factors
15.
IEEE Trans Neural Netw Learn Syst ; 34(3): 1601-1612, 2023 Mar.
Article in English | MEDLINE | ID: mdl-34460400

ABSTRACT

The goal of domain adaptation (DA) is to train a good model for a target domain, with a large amount of labeled data in a source domain but only limited labeled data in the target domain. Conventional closed set domain adaptation (CSDA) assumes source and target label spaces are the same. However, this is not quite practical in real-world applications. In this work, we study the problem of open set domain adaptation (OSDA), which only requires the target label space to partially overlap with the source label space. Consequently, the solution to OSDA requires unknown classes detection and separation, which is normally achieved by introducing a threshold for the prediction of target unknown classes; however, the performance can be quite sensitive to that threshold. In this article, we tackle the above issues by proposing a novel OSDA method to perform soft rejection of unknown target classes and simultaneously match the source and target domains. Extensive experiments on three standard datasets validate the effectiveness of the proposed method over the state-of-the-art competitors.

16.
Article in English | MEDLINE | ID: mdl-37467094

ABSTRACT

Audiovisual event localization aims to localize the event that is both visible and audible in a video. Previous works focus on segment-level audio and visual feature sequence encoding and neglect the event proposals and boundaries, which are crucial for this task. The event proposal features provide event internal consistency between several consecutive segments constructing one proposal, while the event boundary features offer event boundary consistency to make segments located at boundaries be aware of the event occurrence. In this article, we explore the proposal-level feature encoding and propose a novel context-aware proposal-boundary (CAPB) network to address audiovisual event localization. In particular, we design a local-global context encoder (LGCE) to aggregate local-global temporal context information for visual sequence, audio sequence, event proposals, and event boundaries, respectively. The local context from temporally adjacent segments or proposals contributes to event discrimination, while the global context from the entire video provides semantic guidance of temporal relationship. Furthermore, we enhance the structural consistency between segments by exploiting the above-encoded proposal and boundary representations. CAPB leverages the context information and structural consistency to obtain context-aware event-consistent cross-modal representation for accurate event localization. Extensive experiments conducted on the audiovisual event (AVE) dataset show that our approach outperforms the state-of-the-art methods by clear margins in both supervised event localization and cross-modality localization.

17.
IEEE Trans Pattern Anal Mach Intell ; 45(10): 11707-11719, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37339034

ABSTRACT

Unpaired image-to-image translation (UNIT) aims to map images between two visual domains without paired training data. However, given a UNIT model trained on certain domains, it is difficult for current methods to incorporate new domains because they often need to train the full model on both existing and new domains. To address this problem, we propose a new domain-scalable UNIT method, termed as latent space anchoring, which can be efficiently extended to new visual domains and does not need to fine-tune encoders and decoders of existing domains. Our method anchors images of different domains to the same latent space of frozen GANs by learning lightweight encoder and regressor models to reconstruct single-domain images. In the inference phase, the learned encoders and decoders of different domains can be arbitrarily combined to translate images between any two domains without fine-tuning. Experiments on various datasets show that the proposed method achieves superior performance on both standard and domain-scalable UNIT tasks in comparison with the state-of-the-art methods.

18.
IEEE Trans Pattern Anal Mach Intell ; 45(10): 11824-11841, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37167050

ABSTRACT

Real-world data often come with multiple views. Fully exploring the information in each view is important for making data more representative. However, due to various limitations and failures in data collection and pre-processing, it is inevitable that real data suffer from missing views and data scarcity. The coexistence of these two issues makes pattern classification even more challenging. To the best of our knowledge, few existing methods can handle both issues well simultaneously. Aiming to draw more attention from the community to this challenge, we propose a new task in this paper, called few-shot partial multi-view learning, which focuses on overcoming the negative impact of the view-missing issue in the low-data regime. The challenges of this task are twofold: (i) it is difficult to overcome the impact of data scarcity under the interference of missing views; (ii) the limited amount of data exacerbates information scarcity, making it harder in turn to address the view-missing issue. To address these challenges, we propose a new unified Gaussian dense-anchoring method. Unified dense anchors are learned for the limited partial multi-view data, anchoring them into a unified dense representation space where the influence of data scarcity and missing views can be alleviated. We conduct extensive experiments to evaluate our method. The results on the Cub-googlenet-doc2vec, Handwritten, Caltech102, Scene15, Animal, ORL, tieredImagenet, and Birds-200-2011 datasets validate its effectiveness. The code will be released at https://github.com/zhouyuan888888/UGDA.

19.
IEEE Trans Med Imaging ; 42(10): 2817-2831, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37037257

ABSTRACT

Surgical workflow analysis aims to recognise surgical phases from untrimmed surgical videos. It is an integral component of context-aware computer-aided surgical operating systems. Many deep learning-based methods have been developed for this task, but most existing works aggregate homogeneous temporal context for all frames at a single level and neglect the fact that each frame needs information at multiple levels for accurate phase prediction. To fill this gap, we propose the Cascade Multi-Level Transformer Network (CMTNet), composed of cascaded Adaptive Multi-Level Context Aggregation (AMCA) modules. Each AMCA module first extracts temporal context at the frame level and the phase level, and then adaptively fuses the frame-specific spatial feature, frame-level temporal context, and phase-level temporal context for each frame. By cascading multiple AMCA modules, CMTNet gradually enriches the representation of each frame with the multi-level semantics it specifically requires, achieving better phase prediction in a frame-adaptive manner. In addition, we propose a novel refinement loss for CMTNet, which explicitly guides each AMCA module to focus on extracting the key context for refining the prediction of the previous stage in terms of both prediction confidence and smoothness. This further improves the quality of the extracted context. Extensive experiments on the Cholec80 and M2CAI datasets demonstrate that CMTNet achieves state-of-the-art performance.
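The cascade-of-refinement idea in the abstract above can be reduced to a toy sketch: each stage takes the previous stage's per-frame phase scores and refines them with the context it extracts, here simplified to a neighbour-smoothing pass. The real AMCA modules fuse frame- and phase-level temporal context; this is only the structural skeleton.

```python
def refine(scores):
    """One cascade stage: smooth each frame's score with its neighbours,
    a stand-in for adaptive temporal context aggregation."""
    out = []
    for i, s in enumerate(scores):
        left = scores[i - 1] if i > 0 else s
        right = scores[i + 1] if i < len(scores) - 1 else s
        out.append((left + s + right) / 3.0)
    return out

def cascade(scores, stages=3):
    """Run several refinement stages, mirroring the stacked AMCA modules:
    each stage refines the prediction handed over by the previous one."""
    for _ in range(stages):
        scores = refine(scores)
    return scores

# An isolated spike (a likely misclassified frame) is damped stage by stage,
# which is the smoothness effect the refinement loss encourages.
smoothed = cascade([0.0, 0.0, 1.0, 0.0, 0.0])
```

The refinement loss described in the abstract would supervise each intermediate stage's output, not just the final one, so every stage learns to improve on its predecessor.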


Subjects
Tranexamic Acid, Workflow, Semantics
20.
IEEE Trans Pattern Anal Mach Intell ; 45(6): 7711-7725, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37015417

ABSTRACT

We study the problem of localizing audio-visual events that are both audible and visible in a video. Existing works focus on encoding and aligning audio and visual features at the segment level, while neglecting the informative correlations between segments of the two modalities and between multi-scale event proposals. We propose a novel Semantic and Relation Modulation Network (SRMN) to learn these correlations and leverage them to modulate the related auditory, visual, and fused features. In particular, for semantic modulation, we propose intra-modal normalization and cross-modal normalization. The former modulates features of a single modality with event-relevant semantic guidance from the same modality. The latter modulates features of two modalities by establishing and exploiting the cross-modal relationship. For relation modulation, we propose a multi-scale proposal modulating module and a multi-alignment segment modulating module to introduce multi-scale event proposals and enable dense matching between cross-modal segments, which strengthens correlations between successive segments within one proposal and between all segments. With features modulated by the correlation information of audio-visual events, SRMN performs accurate event localization. Extensive experiments on the public AVE dataset demonstrate that our method outperforms state-of-the-art methods on both supervised event localization and cross-modality localization tasks.
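The semantic-modulation mechanism described above can be sketched as a FiLM-style operation: normalize a feature vector, then re-scale and shift it with parameters derived from a guidance signal (the same modality for intra-modal normalization, the other modality for cross-modal). The guidance-to-(gamma, beta) mapping below is a placeholder average, not the paper's learned projection.

```python
def modulate(feature, guidance):
    """FiLM-style modulation: normalized feature * gamma + beta, with
    gamma/beta produced from the event-relevant guidance signal."""
    mean = sum(feature) / len(feature)
    var = sum((f - mean) ** 2 for f in feature) / len(feature)
    normed = [(f - mean) / (var + 1e-5) ** 0.5 for f in feature]
    g = sum(guidance) / len(guidance)   # stand-in for a learned projection
    gamma, beta = 1.0 + g, g
    return [gamma * n + beta for n in normed]

# Visual segment features modulated by event-level (here: audio) guidance.
mod = modulate([1.0, 2.0, 3.0], [0.2, 0.4])
```

Intra-modal and cross-modal normalization would differ only in where `guidance` comes from; the modulation arithmetic itself is shared.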
