Results 1 - 20 of 4,442
1.
Elife ; 13, 2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38568075

ABSTRACT

Learning invariances allows us to generalise. In the visual modality, invariant representations allow us to recognise objects despite translations or rotations in physical space. However, how we learn the invariances that allow us to generalise abstract patterns of sensory data ('concepts') is a longstanding puzzle. Here, we study how humans generalise relational patterns in stimulation sequences that are defined by either transitions on a nonspatial two-dimensional feature manifold, or by transitions in physical space. We measure rotational generalisation, i.e., the ability to recognise concepts even when their corresponding transition vectors are rotated. We find that humans naturally generalise to rotated exemplars when stimuli are defined in physical space, but not when they are defined as positions on a nonspatial feature manifold. However, if participants are first pre-trained to map auditory or visual features to spatial locations, then rotational generalisation becomes possible even in nonspatial domains. These results imply that space acts as a scaffold for learning more abstract conceptual invariances.
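To make the notion of rotational generalisation concrete, the sketch below rotates a sequence of two-dimensional transition vectors by a fixed angle and compares the turning angles between successive transitions before and after rotation. The function names and the example sequence are illustrative assumptions, not the stimuli or analysis used in the study.

```python
import math

def rotate(vec, angle_deg):
    """Rotate a 2-D vector counter-clockwise by angle_deg degrees."""
    theta = math.radians(angle_deg)
    x, y = vec
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

def turning_angles(transitions):
    """Signed angle (degrees) between each pair of successive transition vectors."""
    angles = []
    for (x1, y1), (x2, y2) in zip(transitions, transitions[1:]):
        cross = x1 * y2 - y1 * x2
        dot = x1 * x2 + y1 * y2
        angles.append(math.degrees(math.atan2(cross, dot)))
    return angles

# Hypothetical "concept": three transitions on a 2-D manifold (right, up, left).
concept = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
rotated = [rotate(v, 90) for v in concept]

# The relational pattern (the turning angles) is the same for both exemplars.
same_pattern = all(
    abs(a - b) < 1e-9
    for a, b in zip(turning_angles(concept), turning_angles(rotated))
)
```

The individual vectors change under rotation while the relations between them do not, which is the invariance a learner would need to recognise in order to generalise to rotated exemplars.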


Subjects
Generalization, Psychological , Learning , Humans
2.
Cogn Sci ; 48(4): e13440, 2024 04.
Article in English | MEDLINE | ID: mdl-38606615

ABSTRACT

People implicitly generalize the actions of known individuals in a social group to unknown members. However, actions have social goals and evaluative valences, and the extent to which actions with different valences (helpful and harmful) are implicitly generalized among group members remains unclear. We used computer animations to simulate social group actions, where helping and hindering actions were represented by aiding and obstructing another's climb up a hill. Study 1 found that helpful actions are implicitly expected to be shared among members of the same group but not among members of different groups, but no such effect was found for harmful actions. This suggests that helpful actions are more likely than harmful actions to be implicitly generalized to group members. This finding was replicated in Study 2 by increasing the group size from three to five. Study 3 found that the null effect for generalizing harmful actions among group members is not due to the difficulty of detecting action generalization, as both helpful and harmful actions are similarly generalized within particular individuals. Moreover, Study 4 demonstrated that weakening social group information resulted in the absence of implicit generalization for helpful actions, suggesting the specificity of group membership. Study 5 revealed that the generalization of helping actions occurred when actions were performed by multiple group members rather than being repeated by one group member, showing group-based inductive generalization. Overall, these findings support valence-dependent implicit action generalization among group members. This implies that people may possess different knowledge regarding valenced actions on category-based generalization.


Subjects
Generalization, Psychological , Group Dynamics , Humans
3.
J Speech Lang Hear Res ; 67(5): 1558-1600, 2024 May 07.
Article in English | MEDLINE | ID: mdl-38629966

ABSTRACT

PURPOSE: The present meta-analysis investigated the efficacy of anomia treatment in bilingual and multilingual persons with aphasia (BPWAs) by assessing the magnitudes of six anomia treatment outcomes. Three of the treatment outcomes pertained to the "trained language": improvement of trained words (treatment effect [TE]), within-language generalization of semantically related untrained words (WLG-Related), and within-language generalization of unrelated words (WLG-Unrelated). Three treatment outcomes were for the "untrained language": improvement of translations of the trained words (cross-language generalization of trained words [CLG-Tx]), cross-language generalization of semantically related untrained words (CLG-Related), and cross-language generalization of unrelated untrained words (CLG-Unrelated). This study also examined participant- and treatment-related predictors of these treatment outcomes. METHOD: This study is registered in the International Prospective Register of Systematic Reviews (PROSPERO) under the number CRD42023418147. Nine electronic databases were searched to identify word retrieval treatment studies of poststroke BPWAs of at least 6 months postonset. Pre- and posttreatment single-word naming scores were extracted for each eligible participant and used to calculate effect sizes (within-case Cohen's d) of the six treatment outcomes. Random-effects meta-analyses were conducted to assess weighted mean effect sizes of the treatment outcomes across studies. Multiple linear regression analyses were used to examine the effects of participant-related variables (pretreatment single-word naming and comprehension representing poststroke lexical processing abilities) and treatment-related variables (type, language, and duration). The methodological quality of eligible studies and the risk of bias in this meta-analysis were assessed. RESULTS: A total of 17 published studies with 39 BPWAs were included in the meta-analysis. The methodological quality of the included studies ranged from fair (n = 4) to good (n = 13). Anomia treatment produced a medium effect size for TE (M = 8.36) and marginally small effect sizes for WLG-Related (M = 1.63), WLG-Unrelated (M = 0.68), and CLG-Tx (M = 1.56). Effect sizes were nonsignificant for CLG-Related and CLG-Unrelated. TE was significantly larger than the other five types of treatment outcomes. TE and WLG-Related effect sizes were larger for BPWAs with milder comprehension or naming impairments and for treatments of longer duration. WLG-Unrelated was larger when BPWAs received phonological treatment than semantic and mixed treatments. The overall risk of bias in the meta-analysis was low with a potential risk of bias present in the study identification process. CONCLUSIONS: Current anomia treatment practices for bilingual speakers are efficacious in improving trained items but produce marginally small within-language generalization and cross-language generalization to translations of the trained items. These results highlight the need to provide treatment in each language of BPWAs and/or investigate other approaches to promote cross-language generalization. Furthermore, anomia treatment outcomes are influenced by BPWAs' poststroke single-word naming and comprehension abilities as well as treatment duration and the provision of phonological treatment. SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.25595712.
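As a rough illustration of how a within-case effect size can be computed from pre- and posttreatment naming scores, the sketch below uses one common single-case variant: the difference between posttreatment and pretreatment means divided by the standard deviation of the pretreatment scores. The abstract does not state which formula the authors used, so the formula, the function name `within_case_d`, and the scores are all assumptions for illustration.

```python
import statistics

def within_case_d(pre_scores, post_scores):
    """Effect size for one participant: (mean post - mean pre) / SD of pre scores.

    Assumption: this is one common single-case variant; the abstract does not
    say which formula was applied.
    """
    sd_pre = statistics.stdev(pre_scores)  # sample SD, needs >= 2 baseline probes
    if sd_pre == 0:
        raise ValueError("baseline has zero variance; d is undefined")
    return (statistics.mean(post_scores) - statistics.mean(pre_scores)) / sd_pre

# Made-up naming scores (items correct) for one hypothetical participant.
pre = [4, 6, 5]
post = [12, 14, 13]
d = within_case_d(pre, post)
```

Because the denominator is the variability of a single participant's baseline, values computed this way tend to be much larger than between-group Cohen's d values and are interpreted against different benchmarks.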


Subjects
Anomia , Generalization, Psychological , Multilingualism , Humans , Anomia/therapy , Treatment Outcome , Language Therapy/methods , Aphasia/therapy
4.
Sci Rep ; 14(1): 8906, 2024 04 17.
Article in English | MEDLINE | ID: mdl-38632252

ABSTRACT

People correct for movement errors when acquiring new motor skills (de novo learning) or adapting well-known movements (motor adaptation). While de novo learning establishes new control policies, adaptation modifies existing ones, and previous work has distinguished behavioral and underlying brain mechanisms for each motor learning type. However, it is still unclear whether learning in each type interferes with the other. In study 1, we use a within-subjects design where participants train with both 30° visuomotor rotation and mirror reversal perturbations, to compare adaptation and de novo learning respectively. We find no perturbation order effects, and no evidence for differences in learning rates or asymptotes between the two perturbations. Explicit instructions also provide an advantage during early learning in both perturbations. However, mirror reversal learning shows larger inter-participant variability and slower movement initiation. Furthermore, we only observe reach aftereffects following rotation training. In study 2, we incorporate the mirror reversal in a browser-based task, to investigate under-studied de novo learning mechanisms like retention and generalization. Learning persists across three or more days, and substantially transfers to the untrained hand and to targets on both sides of the mirror axis. Our results extend insights for distinguishing motor skill acquisition from adapting well-known movements.


Subjects
Generalization, Psychological , Psychomotor Performance , Humans , Motor Skills , Movement , Reversal Learning , Adaptation, Physiological
5.
PLoS One ; 19(4): e0300502, 2024.
Article in English | MEDLINE | ID: mdl-38635515

ABSTRACT

Fire and smoke detection is crucial for the safe mining of coal energy, but previous fire-smoke detection models did not strike a perfect balance between complexity and accuracy, which makes it difficult to deploy efficient fire-smoke detection in coal mines with limited computational resources. Therefore, we improve the current advanced object detection model YOLOv8s based on two core ideas: (1) we reduce the model computational complexity and ensure real-time detection by applying faster convolutions to the backbone and neck parts; (2) to strengthen the model's detection accuracy, we integrate attention mechanisms into both the backbone and head components. In addition, we improve the model's generalization capacity by augmenting the data. Our method has 23.0% and 26.4% fewer parameters and FLOPs (Floating-Point Operations) than YOLOv8s, which means that we have effectively reduced the computational complexity. Our model also achieves a mAP (mean Average Precision) of 91.0%, which is 2.5% higher than the baseline model. These results show that our method can improve the detection accuracy while reducing complexity, making it more suitable for real-time fire-smoke detection in resource-constrained environments.


Subjects
Algorithms , Smoke , Coal , Generalization, Psychological
6.
PLoS One ; 19(4): e0300473, 2024.
Article in English | MEDLINE | ID: mdl-38635663

ABSTRACT

High-resolution imagery and deep learning models have gained increasing importance in land-use mapping. In recent years, several new deep learning network modeling methods have surfaced. However, a clear understanding of the performance of these models has been lacking. In this study, we applied four well-established and robust deep learning models (FCN-8s, SegNet, U-Net, and Swin-UNet) to an open benchmark high-resolution remote sensing dataset to compare their performance in land-use mapping. The results indicate that FCN-8s, SegNet, U-Net, and Swin-UNet achieved overall accuracies of 80.73%, 89.86%, 91.90%, and 96.01%, respectively, on the test set. Furthermore, we assessed the generalization ability of these models using two measures, intersection over union and F1 score, which highlight Swin-UNet's superior robustness compared to the other three models. In summary, our study provides a systematic analysis of the classification differences among these four deep learning models through experiments. It serves as a valuable reference for selecting models in future research, particularly in scenarios such as land-use mapping, urban functional area recognition, and natural resource management.
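The two measures named in the abstract can be computed from true-positive, false-positive, and false-negative counts. The sketch below does this for a single class on flattened binary masks; the function name `iou_and_f1` and the toy masks are assumptions for illustration, and the study's own multi-class evaluation procedure is not described in the abstract.

```python
def iou_and_f1(pred, truth):
    """Return (IoU, F1) for two equally sized binary label sequences."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    union = tp + fp + fn
    iou = tp / union if union else 1.0       # both masks empty: treat as perfect
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 1.0
    return iou, f1

# Toy flattened masks for one land-use class (1 = pixel belongs to the class).
pred  = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 1, 1, 0]
iou, f1 = iou_and_f1(pred, truth)
```

For a single class the two scores are monotonically related (F1 = 2·IoU / (1 + IoU)), so they rank models identically on that class; they can diverge once scores are averaged across classes.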


Subjects
Deep Learning , Remote Sensing Technology , Benchmarking , Generalization, Psychological , Imagery, Psychotherapy
7.
PLoS One ; 19(4): e0296841, 2024.
Article in English | MEDLINE | ID: mdl-38568960

ABSTRACT

Recent research has shown that comparisons of multiple learning stimuli which are associated with the same novel noun favor taxonomic generalization of this noun. These findings contrast with single-stimulus learning in which children follow so-called lexical biases. However, little is known about the underlying search strategies. The present experiment provides an eye-tracking analysis of search strategies during novel word learning in a comparison design. We manipulated both the conceptual distance between the two learning items, i.e., children saw examples which were associated with a noun (e.g., the two learning items were either two bracelets in a "close" comparison condition or a bracelet and a watch in a "far" comparison condition), and the conceptual distance between the learning items and the taxonomically related items in the generalization options (e.g., for the taxonomic generalization answer, a pendant as a near generalization item versus a bow tie as a distant generalization item). We tested 5-, 6- and 8-year-old children's taxonomic (versus perceptual and thematic) generalization of novel names for objects. The search patterns showed that participants first focused on the learning items and then compared them with each of the possible choices. They also spent less time comparing the various options with one another; this search profile remained stable across age groups. Data also revealed that early comparisons (i.e., those reflecting alignment strategies) predicted generalization performance. We discuss four search strategies as well as the effect of age and conceptual distance on these strategies.


Subjects
Eye-Tracking Technology , Vocabulary , Child , Humans , Language , Learning , Generalization, Psychological
8.
PLoS One ; 19(4): e0297068, 2024.
Article in English | MEDLINE | ID: mdl-38593127

ABSTRACT

Compared with visible light images, thermal infrared images have poor resolution, low contrast and signal-to-noise ratio, blurred visual effects, and less information. Thermal infrared sports target detection methods relying on traditional convolutional networks capture the rich semantics in high-level features but blur the spatial details. The differences in physical information content and spatial distribution between high-level and low-level features are ignored, resulting in a mismatch between the region of interest and the target. To address these issues, we propose a local attention-guided Swin-transformer thermal infrared sports object detection method (LAGSwin) to encode sports objects' spatial transformation and orientation information. On the one hand, Swin-transformer guided by local attention is adopted to enrich the semantic knowledge of low-level features by embedding local focus from high-level features and generating high-quality anchors while increasing the embedding of contextual information. On the other hand, an active rotation filter is employed to encode orientation information, resulting in orientation-sensitive and invariant features to reduce the inconsistency between classification and localization regression. A bidirectional criss-cross fusion strategy is adopted in the feature fusion stage to enable better interaction and embedding of features of different resolutions. Finally, evaluation and verification on multiple open-source sports target datasets show that the proposed LAGSwin detection framework has good robustness and generalization ability.


Subjects
Electric Power Supplies , Physical Examination , Generalization, Psychological , Knowledge , Light
9.
J Exp Anal Behav ; 121(3): 327-345, 2024 May.
Article in English | MEDLINE | ID: mdl-38629655

ABSTRACT

Can simple conditional-discrimination choice be accounted for by recent quantitative models of combined stimulus and reinforcer control? In Experiment 1, two sets of five blackout durations, one using shorter intervals and one using longer intervals, conditionally signaled which subsequent choice response might provide food. In seven conditions, the distribution of blackout durations across the sets was varied. An updated version of the generalization-across-dimensions model nicely described the way that choice changed across durations. In Experiment 2, just two blackout durations acted as the conditional stimuli and the durations were varied over 10 conditions. The parameters of the model obtained in Experiment 1 failed to adequately predict choice in Experiment 2, but the model again fitted the data nicely. The failure to predict the Experiment 2 data from the Experiment 1 parameters occurred because in Experiment 1 differential control by reinforcer locations progressively decreased with blackout durations, whereas in Experiment 2 this control remained constant. These experiments extend the ability of the model to describe data from procedures based on concurrent schedules in which reinforcer ratios reverse at fixed times to those from conditional-discrimination procedures. Further research is needed to understand why control by reinforcer location differed between the two experiments.


Subjects
Choice Behavior , Discrimination Learning , Generalization, Psychological , Models, Psychological , Reinforcement Schedule , Animals , Reinforcement, Psychology , Conditioning, Operant , Discrimination, Psychological , Columbidae , Time Factors
10.
Neural Netw ; 174: 106219, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38442489

ABSTRACT

Extrapolating future events based on historical information in temporal knowledge graphs (TKGs) holds significant research value and practical applications. In this field, the methods currently utilized can be classified as either embedding-based or logical rule-based. Embedding-based methods depend on learned entity and relation embeddings for prediction, but they suffer from the lack of interpretability due to the opaque reasoning process. On the other hand, logical rule-based methods face scalability challenges as they heavily rely on predefined logical rules. To overcome these limitations, we propose a hybrid model that combines embedding-based and logical rule-based methods to capture deep causal logic. Our model, called the Inductive Reasoning Model based on Interpretable Logical Rule (ILR-IR), aims to provide interpretable insights while effectively predicting future events in TKGs. ILR-IR delves into historical information, extracting valuable insights from logical rules embedded within relations and interaction preferences between entities. By considering both logical rules and interaction preferences, ILR-IR offers a comprehensive perspective for predicting future events. In addition, we propose the incorporation of a one-class augmented matching loss during optimization, which serves to enhance performance of the model during training. We evaluate ILR-IR on multiple datasets, including ICEWS14, ICEWS0515, and ICEWS18. Experimental results demonstrate that ILR-IR outperforms state-of-the-art baselines, showcasing its superior performance in TKG extrapolation reasoning. Moreover, ILR-IR demonstrates remarkable generalization capabilities, even when applied to related datasets that share a common relation vocabulary. This suggests that our proposed model exhibits robust zero-shot reasoning abilities. For interested parties, we have made our code publicly available at https://github.com/mxadorable/ILR-IR.


Subjects
Pattern Recognition, Automated , Problem Solving , Learning , Generalization, Psychological , Knowledge
11.
Neural Netw ; 174: 106129, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38508044

ABSTRACT

Multi-task multi-agent systems (MASs) are challenging to model because they involve heterogeneous agents with different behavior patterns that need to cooperate across various tasks. Existing networks for single-agent policies are not suitable for this setting, as they cannot share policies among agents without losing task-specific performance. We propose a novel framework called Role-based Multi-Agent Transformer (RoMAT), which uses a sequence modeling technique and a role-based actor to enable agents to adapt to different tasks and roles in MASs. RoMAT has a modular model architecture, where backbone networks are shared by all agents, but a small part of the parameters (role-based actor) is independent, depending on the agents' exclusive structures. We evaluate RoMAT on several benchmark tasks and show that it can capture the behavior patterns of heterogeneous agents and achieve better performance and generalization than other methods in both single and multi-task settings.


Subjects
Benchmarking , Generalization, Psychological , Policy
12.
Neural Netw ; 174: 106258, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38555722

ABSTRACT

Cropping-and-segmenting pattern parsers often combine diverse inner correlations into a single metric/scheme, resulting in over-generalizations and redundant representations. It is proposed to streamline pattern parsing by presenting a redundant association elimination network (RAEN) with capsule attention twisters (CATs) and capsule-attention routing agreement (CARA). CATs trim delicate relationships between parts and wholes that are weak and interchangeable. Senior entities can only be updated by primary entities that meet the requirements of inter-part diversity and intra-object cohesiveness. In order to enhance results, CARA is designed to protect against the unnecessary voting signals of traditional routing protocols. Experiments involving facial and human segmentation show that RAEN is better than current remarkable methods, particularly for defining detailed semantic boundaries.


Subjects
Face , Generalization, Psychological , Humans , Semantics , Software , Voting
13.
Nat Neurosci ; 27(5): 988-999, 2024 May.
Article in English | MEDLINE | ID: mdl-38499855

ABSTRACT

A fundamental human cognitive feat is to interpret linguistic instructions in order to perform novel tasks without explicit task experience. Yet, the neural computations that might be used to accomplish this remain poorly understood. We use advances in natural language processing to create a neural model of generalization based on linguistic instructions. Models are trained on a set of common psychophysical tasks, and receive instructions embedded by a pretrained language model. Our best models can perform a previously unseen task with an average performance of 83% correct based solely on linguistic instructions (that is, zero-shot learning). We found that language scaffolds sensorimotor representations such that activity for interrelated tasks shares a common geometry with the semantic representations of instructions, allowing language to cue the proper composition of practiced skills in unseen settings. We show how this model generates a linguistic description of a novel task it has identified using only motor feedback, which can subsequently guide a partner model to perform the task. Our models offer several experimentally testable predictions outlining how linguistic information must be represented to facilitate flexible and general cognition in the human brain.


Subjects
Neurons , Humans , Neurons/physiology , Models, Neurological , Language , Generalization, Psychological/physiology , Natural Language Processing , Learning/physiology , Neural Networks, Computer , Brain/physiology , Nerve Net/physiology
14.
Sci Rep ; 14(1): 5695, 2024 03 08.
Article in English | MEDLINE | ID: mdl-38459104

ABSTRACT

The successful integration of neural networks in a clinical setting is still uncommon despite major successes achieved by artificial intelligence in other domains. This is mainly due to the black box characteristic of most optimized models and the undetermined generalization ability of the trained architectures. The current work tackles both issues in the radiology domain by focusing on developing an effective and interpretable cardiomegaly detection architecture based on segmentation models. The architecture consists of two distinct neural networks performing the segmentation of both cardiac and thoracic areas of a radiograph. The respective segmentation outputs are subsequently used to estimate the cardiothoracic ratio, and the corresponding radiograph is classified as a case of cardiomegaly based on a given threshold. Due to the scarcity of pixel-level labeled chest radiographs, both segmentation models are optimized in a semi-supervised manner. This results in a significant reduction in the costs of manual annotation. The resulting segmentation outputs significantly improve the interpretability of the architecture's final classification results. The generalization ability of the architecture is assessed in a cross-domain setting. The assessment shows the effectiveness of the semi-supervised optimization of the segmentation models and the robustness of the ensuing classification architecture.
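The classification step described above, estimating the cardiothoracic ratio from two segmentation masks and comparing it with a threshold, can be sketched as follows. This is a minimal illustration assuming binary masks stored as lists of rows and a ratio defined as maximal cardiac width over maximal thoracic width; the function names and the default threshold of 0.5 are assumptions, since the abstract only refers to "a given threshold".

```python
def max_width(mask):
    """Widest horizontal extent (in pixels) of the foreground in a binary mask."""
    best = 0
    for row in mask:
        cols = [i for i, v in enumerate(row) if v]
        if cols:
            best = max(best, cols[-1] - cols[0] + 1)
    return best

def cardiothoracic_ratio(heart_mask, thorax_mask):
    """Ratio of maximal cardiac width to maximal thoracic width."""
    thorax = max_width(thorax_mask)
    if thorax == 0:
        raise ValueError("empty thorax mask")
    return max_width(heart_mask) / thorax

def is_cardiomegaly(heart_mask, thorax_mask, threshold=0.5):
    # threshold=0.5 is an assumed default; the abstract only says "a given threshold".
    return cardiothoracic_ratio(heart_mask, thorax_mask) > threshold

# Toy 2 x 10 masks: the heart spans 6 pixels at its widest, the thorax 10.
heart = [[0, 0, 1, 1, 1, 1, 1, 1, 0, 0],
         [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]]
thorax = [[1] * 10, [1] * 10]
ctr = cardiothoracic_ratio(heart, thorax)
```

Because the decision depends only on two measured widths, a reader can inspect the masks and verify why a radiograph was flagged, which is the kind of interpretability the abstract attributes to the segmentation-based design.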


Subjects
Artificial Intelligence , Cardiomegaly , Humans , Cardiomegaly/diagnostic imaging , Generalization, Psychological , Heart , Image Processing, Computer-Assisted , Neural Networks, Computer
15.
PLoS One ; 19(3): e0299471, 2024.
Article in English | MEDLINE | ID: mdl-38451909

ABSTRACT

Structural planes decrease the strength and stability of rock masses, severely affecting their mechanical properties and deformation and failure characteristics. Therefore, investigation and analysis of structural planes are crucial tasks in mining rock mechanics. The drilling camera obtains image information of deep structural planes of rock masses through high-definition camera methods, providing important data sources for the analysis of deep structural planes of rock masses. This paper addresses the problems of high workload, low efficiency, high subjectivity, and poor accuracy brought about by manual processing based on current borehole image analysis and conducts an intelligent segmentation study of borehole image structural planes based on the U2-Net network. By collecting data from 20 different borehole images in different lithological regions, a dataset consisting of 1,013 borehole images with structural plane type, lithology, and color was established. Data augmentation methods such as image flipping, color jittering, blurring, and mixup were applied to expand the dataset to 12,421 images, meeting the requirements for deep network training data. Based on the PyTorch deep learning framework, the initial U2-Net network weights were set, the learning rate was set to 0.001, the training batch was 4, and the Adam optimizer adaptively adjusted the learning rate during the training process. A dedicated network model for segmenting structural planes was obtained, and the model achieved a maximum F-measure value of 0.749 when the confidence threshold was set to 0.7, with an accuracy rate of up to 0.85 within the range of recall rate greater than 0.5. Overall, the model has high accuracy for segmenting structural planes and very low mean absolute error, indicating good segmentation accuracy and certain generalization of the network. The research method in this paper can serve as a reference for the study of intelligent identification of structural planes in borehole images.


Subjects
Mental Recall , Recognition, Psychology , Compulsive Behavior , Generalization, Psychological , Image Processing, Computer-Assisted
16.
Sci Rep ; 14(1): 5644, 2024 03 07.
Article in English | MEDLINE | ID: mdl-38453977

ABSTRACT

Visual perceptual learning is traditionally thought to arise in visual cortex. However, typical perceptual learning tasks also involve systematic mapping of visual information onto motor actions. Because the motor system contains both effector-specific and effector-unspecific representations, the question arises whether visual perceptual learning is effector-specific itself, or not. Here, we study this question in an orientation discrimination task. Subjects learn to indicate their choices either with joystick movements or with manual reaches. After training, we challenge them to perform the same task with eye movements. We dissect the decision-making process using the drift diffusion model. We find that learning effects on the rate of evidence accumulation depend on effectors, albeit not fully. This suggests that during perceptual learning, visual information is mapped onto effector-specific integrators. Overlap of the populations of neurons encoding motor plans for these effectors may explain partial generalization. Taken together, visual perceptual learning is not limited to visual cortex, but also affects sensorimotor mapping at the interface of visual processing and decision making.
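For readers unfamiliar with the drift diffusion model, the sketch below simulates a single decision as noisy evidence accumulating toward one of two bounds; the drift parameter plays the role of the "rate of evidence accumulation" mentioned in the abstract. It is a generic textbook-style simulation with assumed parameter names and default values, not the fitting procedure used in the study.

```python
import random

def simulate_ddm(drift, threshold=1.0, noise=1.0, dt=0.001, max_time=10.0, rng=None):
    """Simulate one trial; return (choice, decision_time).

    choice is "upper", "lower", or None if no bound is reached by max_time.
    All parameter names and defaults are illustrative assumptions.
    """
    rng = rng or random.Random()
    evidence, t = 0.0, 0.0
    while t < max_time:
        # Euler-Maruyama step: deterministic drift plus scaled Gaussian noise.
        evidence += drift * dt + noise * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        t += dt
        if evidence >= threshold:
            return "upper", t
        if evidence <= -threshold:
            return "lower", t
    return None, t

# Noise-free check: with drift 2.0 the upper bound at 1.0 is reached after about 0.5 s.
choice, decision_time = simulate_ddm(drift=2.0, noise=0.0, dt=0.01)
```

In this framing, a learning effect that transfers only partly to a new effector would appear as a drift rate for the untrained effector that lies between the pre-training and post-training values of the trained one.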


Subjects
Visual Cortex , Visual Perception , Humans , Visual Perception/physiology , Eye Movements , Visual Cortex/physiology , Spatial Learning , Generalization, Psychological
17.
Neuropsychologia ; 196: 108848, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38432323

ABSTRACT

This study aimed to investigate whether neurological patients presenting with a bias in line bisection show specific problems in bisecting a line into two equal parts or their line bisection bias rather reflects a special case of a deficit in proportional reasoning more generally. In the latter case, the bias should also be observed for segmentations into thirds or quarters. To address this question, six neglect patients with a line bisection bias were administered additional tasks involving horizontal lines (e.g., segmentation into thirds and quarters, number line estimation, etc.). Their performance was compared to five neglect patients without a line bisection bias, 10 patients with right hemispheric lesions without neglect, and 32 healthy controls. Most interestingly, results indicated that neglect patients with a line bisection bias also overestimated segments on the left of the line (e.g., one third, one quarter) when dissecting lines into parts smaller than halves. In contrast, such segmentation biases were more nuanced when the required line segmentation was framed as a number line estimation task with either fractions or whole numbers. Taken together, this suggests a generalization of line bisection bias towards a segmentation or proportional processing bias, which is congruent with attentional weighting accounts of line bisection/neglect. As such, patients with a line bisection bias do not seem to have specific problems bisecting a line, but seem to suffer from a more general deficit processing proportions.


Subjects
Functional Laterality , Perceptual Disorders , Humans , Perceptual Disorders/etiology , Attention , Bias , Generalization, Psychological , Space Perception
18.
PLoS One ; 19(3): e0293440, 2024.
Article in English | MEDLINE | ID: mdl-38512838

ABSTRACT

Recent work has suggested that feedforward residual neural networks (ResNets) approximate iterative recurrent computations. Iterative computations are useful in many domains, so they might provide good solutions for neural networks to learn. However, principled methods for measuring and manipulating iterative convergence in neural networks remain lacking. Here we address this gap by 1) quantifying the degree to which ResNets learn iterative solutions and 2) introducing a regularization approach that encourages the learning of iterative solutions. Iterative methods are characterized by two properties: iteration and convergence. To quantify these properties, we define three indices of iterative convergence. Consistent with previous work, we show that, even though ResNets can express iterative solutions, they do not learn them when trained conventionally on computer-vision tasks. We then introduce regularizations to encourage iterative convergent computation and test whether this provides a useful inductive bias. To make the networks more iterative, we manipulate the degree of weight sharing across layers using soft gradient coupling. This new method provides a form of recurrence regularization and can interpolate smoothly between an ordinary ResNet and a "recurrent" ResNet (i.e., one that uses identical weights across layers and thus could be physically implemented with a recurrent network computing the successive stages iteratively across time). To make the networks more convergent we impose a Lipschitz constraint on the residual functions using spectral normalization. The three indices of iterative convergence reveal that the gradient coupling and the Lipschitz constraint succeed at making the networks iterative and convergent, respectively. 
To showcase the practicality of our approach, we study how iterative convergence impacts generalization on standard visual recognition tasks (MNIST, CIFAR-10, CIFAR-100) or challenging recognition tasks with partial occlusions (Digitclutter). We find that iterative convergent computation, in these tasks, does not provide a useful inductive bias for ResNets. Importantly, our approach may be useful for investigating other network architectures and tasks as well and we hope that our study provides a useful starting point for investigating the broader question of whether iterative convergence can help neural networks in their generalization.
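
The abstract states that a Lipschitz constraint is imposed on the residual functions using spectral normalization. As a rough illustration only, the sketch below estimates the largest singular value of a weight matrix by power iteration and rescales the matrix so that this value does not exceed a chosen bound. The function names (`spectral_norm`, `normalize_weight`), the `bound` parameter, and the pure-Python list-of-rows representation are assumptions made for this example; they are not taken from the paper, which may implement the constraint differently.

```python
import math
import random


def spectral_norm(weight, n_iter=50, seed=0):
    """Estimate the largest singular value of `weight` (a list of rows)
    by power iteration on W^T W. Hypothetical helper for illustration."""
    rng = random.Random(seed)
    n_rows, n_cols = len(weight), len(weight[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]
    for _ in range(n_iter):
        # u = W v
        u = [sum(weight[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        # v = W^T u, then renormalise to unit length
        v = [sum(weight[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm_v = math.sqrt(sum(x * x for x in v))
        if norm_v == 0.0:
            return 0.0  # zero matrix (or degenerate start vector)
        v = [x / norm_v for x in v]
    # With v a unit vector, ||W v|| approximates the top singular value.
    u = [sum(weight[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    return math.sqrt(sum(x * x for x in u))


def normalize_weight(weight, bound=1.0):
    """Rescale `weight` so its spectral norm is at most `bound`.
    Matrices already within the bound are returned unchanged (copied)."""
    sigma = spectral_norm(weight)
    scale = 1.0 if sigma <= bound else bound / sigma
    return [[scale * x for x in row] for row in weight]
```

For a linear map, the spectral norm equals its Lipschitz constant under the Euclidean norm, so bounding it bounds how much the layer can stretch its input; this is the general idea behind spectral normalization, independent of the specific setup studied in the paper.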


Subjects
Learning , Neural Networks, Computer , Generalization, Psychological
19.
PLoS One ; 19(3): e0299902, 2024.
Article in English | MEDLINE | ID: mdl-38512917

ABSTRACT

Accurate identification of small tea buds is a key technology for tea harvesting robots, which directly affects tea quality and yield. However, due to the complexity of the tea plantation environment and the diversity of tea buds, accurate identification remains an enormous challenge. Current methods based on traditional image processing and machine learning fail to effectively extract subtle features and morphology of small tea buds, resulting in low accuracy and robustness. To achieve accurate identification, this paper proposes a small object detection algorithm called STF-YOLO (Small Target Detection with Swin Transformer and Focused YOLO), which integrates the Swin Transformer module and the YOLOv8 network to improve the detection ability of small objects. The Swin Transformer module extracts visual features based on a self-attention mechanism, which captures global and local context information of small objects to enhance feature representation. The YOLOv8 network is an object detector based on deep convolutional neural networks, offering high speed and precision. Based on the YOLOv8 network, modules including Focus and Depthwise Convolution are introduced to reduce computation and parameters, increase receptive field and feature channels, and improve feature fusion and transmission. Additionally, the Wise Intersection over Union loss is utilized to optimize the network. Experiments conducted on a self-created dataset of tea buds demonstrate that the STF-YOLO model achieves outstanding results, with an accuracy of 91.5% and a mean Average Precision of 89.4%. These results are significantly better than other detectors. Results show that, compared to mainstream algorithms (YOLOv8, YOLOv7, YOLOv5, and YOLOx), the model improves accuracy and F1 score by 5-20.22 percentage points and 0.03-0.13, respectively, proving its effectiveness in enhancing small object detection performance. 
This research provides technical means for the accurate identification of small tea buds in complex environments and offers insights into small object detection. Future research can further optimize model structures and parameters for more scenarios and tasks, as well as explore data augmentation and model fusion methods to improve generalization ability and robustness.
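
The abstract mentions a Wise Intersection over Union loss. That loss is not reproduced here; the sketch below only shows the plain Intersection over Union (IoU) overlap measure on which IoU-based losses are built, for two axis-aligned boxes. The function name `box_iou` and the `(x_min, y_min, x_max, y_max)` box format are assumptions chosen for this example.

```python
def box_iou(box_a, box_b):
    """Plain IoU of two axis-aligned boxes given as
    (x_min, y_min, x_max, y_max). Illustrative helper, not Wise-IoU."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    # Union = sum of both areas minus the doubly counted overlap.
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

A simple IoU-based loss is `1 - box_iou(pred, target)`. For small objects such as tea buds, a shift of a few pixels changes the IoU sharply, which is one common motivation for refined IoU loss variants.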


Subjects
Algorithms , Neural Networks, Computer , Electric Power Supplies , Generalization, Psychological , Tea
20.
Neural Netw ; 174: 106224, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38479186

ABSTRACT

Adversarial training has become the mainstream method to boost adversarial robustness of deep models. However, it often suffers from the trade-off dilemma, where the use of adversarial examples hurts the standard generalization of models on natural data. To study this phenomenon, we investigate it from the perspective of spatial attention. In brief, standard training typically encourages a model to conduct a comprehensive check to input space. But adversarial training often causes a model to overly concentrate on sparse spatial regions. This reduced tendency is beneficial to avoid adversarial accumulation but easily makes the model ignore abundant discriminative information, thereby resulting in weak generalization. To address this issue, this paper introduces an Attention-Enhanced Learning Framework (AELF) for robustness training. The main idea is to enable the model to inherit the attention pattern of standard pre-trained model through an embedding-level regularization. To be specific, given a teacher model built on natural examples, the embedding distribution of teacher model is used as a static constraint to regulate the embedding outputs of the objective model. This design is mainly supported with that the embedding feature of standard model is usually recognized as a rich semantic integration of input. For implementation, we present a simplified AELFs that can achieve the regularization with single cross entropy loss via the parameter initialization and parameter update strategy. This avoids the extra consistency comparison operation between embedding vectors. Experimental observations verify the rationality of our argument, and experimental results demonstrate that it can achieve remarkable improvements in generalization under the high-level robustness.


Subjects
Generalization, Psychological , Learning , Entropy , Semantics
...