ABSTRACT
It has been demonstrated that observers can accurately estimate their self-motion direction (i.e., heading) from optic flow, and that this estimation can be affected by attention. However, it remains unclear how attention affects the serial dependence in the estimation. In the current study, participants completed two experiments. The results showed that the estimation accuracy decreased when attentional resources allocated to the heading estimation task were reduced. Additionally, the estimates of currently presented headings were biased toward previously seen headings, showing serial dependence. Notably, this effect decreased when the attentional resources allocated to the previously seen headings were reduced, and increased when those allocated to the currently seen headings were reduced. Importantly, we developed a Bayesian inference model, which incorporated attention-modulated likelihoods and qualitatively predicted changes in the estimation accuracy and serial dependence. In summary, the current study shows that attention affects the serial dependence in heading estimation from optic flow and reveals the Bayesian computational mechanism behind the heading estimation.
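The direction of these attention effects can be illustrated with a minimal Gaussian sketch. This is not the authors' model: it assumes the previous heading acts as a Gaussian prior, the current heading yields a Gaussian likelihood, and reduced attention simply widens the corresponding standard deviation. The function names and all numeric values are hypothetical.

```python
def prior_weight(sigma_current, sigma_previous):
    """Weight given to the previous heading in the posterior mean.

    For a Gaussian likelihood (current heading) and a Gaussian prior
    (previous heading), the posterior mean is a precision-weighted average,
    so the prior's weight is sigma_current^2 / (sigma_current^2 + sigma_previous^2).
    """
    return sigma_current ** 2 / (sigma_current ** 2 + sigma_previous ** 2)


def serial_bias(current, previous, sigma_current, sigma_previous):
    """Shift of the estimate away from the current heading, toward the previous one."""
    weight = prior_weight(sigma_current, sigma_previous)
    return weight * (previous - current)


# Hypothetical headings (degrees) and noise levels (degrees).
baseline = serial_bias(10.0, 20.0, sigma_current=4.0, sigma_previous=8.0)
# Assumption: less attention to the current heading widens its likelihood.
less_attention_current = serial_bias(10.0, 20.0, sigma_current=8.0, sigma_previous=8.0)
# Assumption: less attention to the previous heading widens the prior it leaves behind.
less_attention_previous = serial_bias(10.0, 20.0, sigma_current=4.0, sigma_previous=16.0)
```

Under these assumptions the bias toward the previous heading grows when the current heading is encoded less precisely and shrinks when the previous heading was encoded less precisely, which is the qualitative pattern the abstract reports.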
Subjects
Attention , Bayes Theorem , Motion Perception , Optic Flow , Humans , Attention/physiology , Optic Flow/physiology , Motion Perception/physiology , Young Adult , Photic Stimulation/methods , Male , Adult , Female
ABSTRACT
Complex steerable pyramid (CSP) performs well when applied to magnify subtle motions of structures for observing the dynamic characteristics of facilities. However, the impact of the types and parameters of CSP filters upon the performance of phase-based optical flow (PBOF) in measuring motion parameters has not been systematically studied. The purpose of this study is to comprehensively evaluate the impact of different CSP filter types (Octave, HalfOctave, SmoothHalfOctave, and QuarterOctave) and parameters on the performance of PBOF in measuring motion parameters. Firstly, by measuring simulated translational motion, the influence of the CSP's down-sampling rates on the displacement measurement accuracy of PBOF is analyzed to determine appropriate settings. Subsequently, the effective displacement measurement interval and accuracy of PBOF using the CSP are studied through simulated and experimental translational motion measurements. Further, the vibration parameter's accuracy is analyzed through simulated periodic vibration measurements. Finally, the characteristics of PBOF using the four kinds of CSP and practical considerations are discussed. Simulation and experimental results demonstrate that when using middle-level filters within the effective level range of HalfOctave, PBOF achieves the best overall displacement measurement performance. Additionally, this method can easily integrate with signal processing techniques in analyzing structural dynamic characteristics under field conditions.
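The core idea of phase-based displacement measurement can be sketched in one dimension. The example below is an illustration only, not the CSP filters evaluated in the study: it assumes a single complex Gabor filter stands in for one pyramid sub-band, and the function names, filter width, and test signal are hypothetical. The displacement follows from the phase difference divided by the filter's centre frequency, and it is only unambiguous while the shift stays below half the filter's wavelength, which is one reason the effective measurement interval depends on the filter level.

```python
import cmath
import math


def gabor_response(signal, center, omega, sigma):
    """Complex response of a 1D Gabor filter centred at `center`."""
    total = 0j
    for x, value in enumerate(signal):
        envelope = math.exp(-((x - center) ** 2) / (2 * sigma ** 2))
        total += value * envelope * cmath.exp(-1j * omega * (x - center))
    return total


def phase_displacement(frame_a, frame_b, center, omega, sigma):
    """Estimate the shift (in samples) between two frames from the phase change."""
    r_a = gabor_response(frame_a, center, omega, sigma)
    r_b = gabor_response(frame_b, center, omega, sigma)
    # Phase difference wrapped to [-pi, pi]; a shift by d changes the phase by -omega * d.
    dphi = cmath.phase(r_b * r_a.conjugate())
    return -dphi / omega


# Hypothetical test pattern: a cosine with a 16-sample period, shifted by 0.3 samples.
omega = 2 * math.pi / 16
frame_a = [math.cos(omega * x) for x in range(128)]
frame_b = [math.cos(omega * (x - 0.3)) for x in range(128)]
estimated_shift = phase_displacement(frame_a, frame_b, center=64, omega=omega, sigma=12.0)
```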
Subjects
Optic Flow , Motion , Computer Simulation , Vibration
ABSTRACT
A core challenge in perception is recognizing objects across the highly variable retinal input that occurs when objects are viewed from different directions (e.g. front versus side views). It has long been known that certain views are of particular importance, but it remains unclear why. We reasoned that characterizing the computations underlying visual comparisons between objects could explain the privileged status of certain qualitatively special views. We measured pose discrimination for a wide range of objects, finding large variations in performance depending on the object and the viewing angle, with front and back views yielding particularly good discrimination. Strikingly, a simple and biologically plausible computational model based on measuring the projected three-dimensional optical flow between views of objects accurately predicted both successes and failures of discrimination performance. This provides a computational account of why certain views have a privileged status.
Subjects
Optic Flow , Humans , Visual Perception , Biological Models , Psychological Discrimination
ABSTRACT
Hippocampal place cells are influenced by both self-motion (idiothetic) signals and external sensory landmarks as an animal navigates its environment. To continuously update a position signal on an internal 'cognitive map', the hippocampal system integrates self-motion signals over time, a process that relies on a finely calibrated path integration gain that relates movement in physical space to movement on the cognitive map. It is unclear whether idiothetic cues alone, such as optic flow, exert sufficient influence on the cognitive map to enable recalibration of path integration, or if polarizing position information provided by landmarks is essential for this recalibration. Here, we demonstrate both recalibration of path integration gain and systematic control of place fields by pure optic flow information in freely moving rats. These findings demonstrate that the brain continuously rebalances the influence of conflicting idiothetic cues to fine-tune the neural dynamics of path integration, and that this recalibration process does not require a top-down, unambiguous position signal from landmarks.
Subjects
Optic Flow , Place Cells , Long-Evans Rats , Animals , Optic Flow/physiology , Rats , Male , Place Cells/physiology , Cues , Space Perception/physiology , Hippocampus/physiology , Hippocampus/cytology
ABSTRACT
Optic flow provides useful information in service of spatial navigation. However, whether brain networks supporting these two functions overlap is still unclear. Here we used Activation Likelihood Estimation (ALE) to assess the correspondence between brain correlates of optic flow processing and spatial navigation and their specific neural activations. Since computational and connectivity evidence suggests that visual input from optic flow provides information mainly during egocentric navigation, we further tested the correspondence between brain correlates of optic flow processing and those of both egocentric and allocentric navigation. Optic flow processing shared activation with egocentric (but not allocentric) navigation in the anterior precuneus, suggesting its role in providing information about self-motion, as derived from the analysis of optic flow, in service of egocentric navigation. We further documented that optic flow perception and navigation are partially segregated into two functional and anatomical networks, i.e., the dorsal and the ventromedial networks. The present results point to a dynamic interplay between the dorsal and ventral visual pathways aimed at coordinating visually guided navigation in the environment.
Subjects
Brain Mapping , Brain , Optic Flow , Spatial Navigation , Humans , Optic Flow/physiology , Brain/physiology , Brain/diagnostic imaging , Spatial Navigation/physiology , Brain Mapping/methods , Neuroimaging/methods , Visual Pathways/physiology , Visual Pathways/diagnostic imaging , Visual Perception/physiology
ABSTRACT
BACKGROUND: Flight accidents caused by spatial disorientation (SD) greatly affect flight safety. OBJECTIVE: Few studies have been devoted to the evaluation of SD. METHODS: Ten pilots and ten non-pilots were recruited for the experimental induction of SD. Videos providing optical flow stimuli were played at two different flow speeds to induce SD. Subjective judgment and center of foot pressure (CoP) data were collected from the tests. The data were combined to determine the occurrence of SD and analyze the SD types. RESULTS: The number of self-reported SD events was slightly smaller in the pilots than in the non-pilots. The average upper bound of the confidence interval for the standard deviation of CoP was 0.32 ± 0.09 cm and 0.38 ± 0.12 cm in the pilots and non-pilots, respectively. This indicator was significantly lower in the pilots than in the non-pilots (P = 0.03). The success rate of the experimental induction of unrecognized SD was 26.7% and 45.0% in the pilots and non-pilots, respectively. CONCLUSION: The method offered a new way to analyze unrecognized SD. We could determine the occurrence of unrecognized SD. This is an essential means of reducing flight accidents caused by unrecognized SD.
Subjects
Confusion , Optic Flow , Humans , Male , Adult , Optic Flow/physiology , Pilots , Female
ABSTRACT
We provide the first perceptual quantification of users' sensitivity to radial optic flow artifacts and demonstrate a promising approach for masking this optic flow artifact via blink suppression. Near-eye HMDs allow users to feel immersed in virtual environments by providing visual cues, like motion parallax and stereoscopy, that mimic how we view the physical world. However, these systems exhibit a variety of perceptual artifacts that can limit their usability and the user's sense of presence in VR. One well-known artifact is the vergence-accommodation conflict (VAC). Varifocal displays can mitigate VAC, but bring with them other artifacts such as a change in virtual image size (radial optic flow) when the focal plane changes. We conducted a set of psychophysical studies to measure users' ability to perceive this radial flow artifact before, during, and after self-initiated blinks. Our results showed that visual sensitivity was reduced by a factor of 10 at the start and for ~70 ms after a blink was detected. Pre- and post-blink sensitivity was, on average, ~0.15% image size change during normal viewing and increased to ~1.5-2.0% during blinks. Our results imply that a rapid (under 70 ms) radial optic flow distortion can go unnoticed during a blink. Furthermore, our results provide empirical data that can be used to inform engineering requirements for both hardware design and software-based graphical correction algorithms for future varifocal near-eye displays. Our project website is available at https://gamma.umd.edu/ROF/.
Subjects
Optic Flow , Computer Graphics , Ocular Accommodation , Algorithms , Software
ABSTRACT
The independent effects of short- and long-term experiences on visual perception have been discussed for decades. However, no study has investigated whether and how these experiences simultaneously affect our visual perception. To address this question, we asked participants to estimate their self-motion directions (i.e., headings) simulated from optic flow, in which a long-term experience learned in everyday life (i.e., straight-forward motion being more common than lateral motion) plays an important role. The headings were selected from three distributions that resembled a peak, a hill, and a flat line, creating different short-term experiences. Importantly, the proportions of headings deviating from the straight-forward motion gradually increased in the peak, hill, and flat distributions, leading to a greater conflict between long- and short-term experiences. The results showed that participants biased their heading estimates towards the straight-ahead direction and previously seen headings, which increased with the growing experience conflict. This suggests that both long- and short-term experiences simultaneously affect visual perception. Finally, we developed two Bayesian models (Model 1 vs. Model 2) based on two assumptions that the experience conflict altered the likelihood distribution of sensory representation or the motor response system. The results showed that both models accurately predicted participants' estimation biases. However, Model 1 predicted a higher variance of serial dependence compared to Model 2, while Model 2 predicted a higher variance of the bias towards the straight-ahead direction compared to Model 1. This suggests that the experience conflict can influence visual perception by affecting both sensory and motor response systems. Taken together, the current study systematically revealed the effects of long- and short-term experiences on visual perception and the underlying Bayesian processing mechanisms.
Subjects
Motion Perception , Optic Flow , Humans , Motion Perception/physiology , Bayes Theorem , Visual Perception/physiology , Learning
ABSTRACT
This paper proposes an adaptive river discharge measurement method based on spatiotemporal image velocimetry (STIV) and optical flow to solve the problem of blurred texture features and limited measurement accuracy under complex natural environmental conditions. Optical flow tracking generates spatiotemporal images by following the flow mainstream direction of rivers with both regular and irregular natural banks. A texture similarity function filtering method effectively enhances spatiotemporal texture features. The proposed method is applied to a natural river, with measurement results from a propeller-type current meter used as truth values. It is evaluated and compared with three other methods regarding measurement accuracy, error, and other evaluation indices. The results demonstrate that the method significantly improves spatiotemporal image quality. Its estimation outcomes perform better across all evaluation metrics, enhancing the adaptability and accuracy of the flow measurement method.
Subjects
Optic Flow , Rivers , Rheology/methods
ABSTRACT
Minimally invasive percutaneous insertion procedures are widely used techniques in medicine. Their success is highly dependent on the skills of the practitioner. This paper presents a haptic simulator for training in these procedures, whose key component is a real percutaneous insertion needle with a sensory system incorporated to track its 3D location at every instant. By means of the proposed embedded vision system, the attitude (spatial orientation) and depth of insertion of a real needle are estimated. The proposal is founded on a novel depth estimation procedure based on optical flow techniques, complemented by sensory fusion techniques with the attitude calculated with data from an Inertial Measurement Unit (IMU) sensor. This procedure allows estimating the needle attitude with an accuracy of tenths of a degree and the displacement with an accuracy of millimeters. The computational algorithm runs on an embedded computer with real-time constraints for tracking the movement of a real needle. This haptic needle location data is used to reproduce the movement of a virtual needle within a simulation app. As a fundamental result, an ergonomic and realistic training simulator has been successfully constructed for healthcare professionals to acquire the mental model and motor skills necessary to practice percutaneous procedures successfully.
Subjects
Optic Flow , Humans , Needles , Computer Simulation , Movement , Algorithms , User-Computer Interface
ABSTRACT
We evaluated the role of visual stimulation on postural muscles and the changes in the center of pressure (CoP) during standing posture in expert and amateur basketball players. Participants were instructed to look at a fixation point presented on a screen during foveal, peripheral, and full field optic flow stimuli. Postural mechanisms and motor strategies were assessed by simultaneous recordings of stabilometric, oculomotor, and electromyographic data during visual stimulation. We found significant differences between experts and amateurs in the orientation of visual attention. Experts oriented attention to the right of their visual field, whereas amateurs oriented it to the bottom-right. The displacement in the CoP mediolateral direction showed that experts had greater postural sway on the right leg, whereas amateurs had greater sway on the left leg. The entropy-based data analysis of the CoP mediolateral direction exhibited a greater value in amateurs than in experts. The root-mean-square and the coactivation index analysis showed that experts activated mainly the right leg, whereas amateurs activated mainly the left leg. In conclusion, playing sports for years seems to have induced some strong differences in the standing posture between the right and left sides. Even during non-ecological visual stimulation, athletes maintain postural adaptations to counteract the body oscillation.
Subjects
Basketball , Optic Flow , Humans , Skeletal Muscle/physiology , Leg , Posture/physiology , Postural Balance/physiology
ABSTRACT
Detecting violent behavior in videos to ensure public safety and security poses a significant challenge. Precisely identifying and categorizing instances of violence in real-life closed-circuit television, which vary across specifications and locations, requires comprehensive understanding and processing of the sequential information embedded in these videos. This study aims to introduce a model that adeptly grasps the spatiotemporal context of videos within diverse settings and specifications of violent scenarios. We propose a method to accurately capture spatiotemporal features linked to violent behaviors using optical flow and RGB data. The approach leverages a Conv3D-based ResNet-3D model as the foundational network, capable of handling high-dimensional video data. The efficiency and accuracy of violence detection are enhanced by integrating an attention mechanism, which assigns greater weight to the most crucial frames within the RGB and optical-flow sequences during instances of violence. Our model was evaluated on the UBI-Fight, Hockey, Crowd, and Movie-Fights datasets; the proposed method outperformed existing state-of-the-art techniques, achieving area under the curve scores of 95.4, 98.1, 94.5, and 100.0 on the respective datasets. Moreover, this research not only has the potential to be applied in real-time surveillance systems but also promises to contribute to a broader spectrum of research in video analysis and understanding.
Subjects
Optic Flow , Violence , Computer Systems
ABSTRACT
Optic flow provides information on movement direction and speed during locomotion. Changing the relationship between optic flow and walking speed via training has been shown to influence subsequent distance and hill steepness estimations. Previous research has shown that experience with slow optic flow at a given walking speed was associated with increased effort and distance overestimation in comparison to experience with fast optic flow at the same walking speed. Here, we investigated whether exposure to different optic flow speeds relative to gait influences perceptions of leaping and jumping ability. Participants estimated their maximum leaping and jumping ability after exposure to either fast or moderate optic flow at the same walking speed. Those calibrated to fast optic flow estimated farther leaping and jumping abilities than those calibrated to moderate optic flow. Findings suggest that recalibration between optic flow and walking speed may specify an action boundary when calibrated or scaled to actions such as leaping. Possibly, the manipulation of optic flow speed changed the anticipated effort associated with walking a prescribed distance, which in turn influences one's perceived action capabilities for jumping and leaping.
Subjects
Optic Flow , Humans , Optic Flow/physiology , Adult , Young Adult , Male , Female , Walking Speed/physiology , Walking/physiology , Psychomotor Performance/physiology , Locomotion/physiology
ABSTRACT
OBJECTIVES: The paralaryngeal muscles are thought to be hyperfunctional with phonation in patients with primary muscle tension dysphonia (pMTD). However, objective, quantitative tools to assess paralaryngeal movement patterns are lacking. The objectives of this study were to (1) validate the use of optical flow to characterize paralaryngeal movement patterns with phonation, (2) characterize phonatory optical flow velocities and variability of the paralaryngeal muscles before and after a vocal load challenge, and (3) compare phonatory optical flow measures to standard laryngoscopic, acoustic, and self-perceptual assessments. METHODS: Phonatory movement velocities and variability of the paralaryngeal muscles at vocal onsets and offsets were quantified from ultrasound videos and optical flow methods across 42 subjects with and without a diagnosis of pMTD, before and after a vocal load challenge. Severity of laryngoscopic mediolateral supraglottic compression, acoustic perturbation, and ratings of vocal effort and discomfort were also obtained at both time points. RESULTS: There were no significant differences in optical flow measures of the paralaryngeal muscles with phonation between patients with pMTD and controls. Patients with pMTD had significantly more supraglottic compression, higher acoustic perturbations, and higher vocal effort and vocal tract discomfort ratings. Vocal load had a significant effect on vocal effort and discomfort but not on supraglottic compression, acoustics, or optical flow measures of the paralaryngeal muscles. CONCLUSION: Optical flow methods can be used to study paralaryngeal muscle movement velocity and variability patterns during vocal productions, although the role of the paralaryngeal muscles in pMTD diagnostics (e.g., vocal hyperfunction) remains suspect. LEVEL OF EVIDENCE: 2 Laryngoscope, 134:1792-1801, 2024.
Subjects
Dysphonia , Optic Flow , Humans , Dysphonia/diagnosis , Phonation/physiology , Laryngoscopy , Muscles
ABSTRACT
In cardiac cine magnetic resonance imaging (MRI), the heart is repeatedly imaged at numerous time points during the cardiac cycle. Frequently, the temporal evolution of a certain region of interest such as the ventricles or the atria is highly relevant for clinical diagnosis. In this paper, we devise a novel approach that allows for an automatized propagation of an arbitrary region of interest (ROI) along the cardiac cycle from respective annotated ROIs provided by medical experts at two different points in time, most frequently at the end-systolic (ES) and the end-diastolic (ED) cardiac phases. At its core, a 3D TV-L1-based optical flow algorithm computes the apparent motion of consecutive MRI images in forward and backward directions. Subsequently, the given terminal annotated masks are propagated by this bidirectional optical flow in 3D, which results, however, in improper initial estimates of the segmentation masks due to numerical inaccuracies. These initially propagated segmentation masks are then refined by a 3D U-Net-based convolutional neural network (CNN), which was trained to enforce consistency with the forward and backward warped masks using a novel loss function. Moreover, a penalization term in the loss function controls large deviations from the initial segmentation masks. This method is benchmarked both on a new dataset with annotated single ventricles containing patients with severe heart diseases and on a publicly available dataset with different annotated ROIs. We emphasize that our novel loss function enables fine-tuning the CNN on a single patient, thereby yielding state-of-the-art results along the complete cardiac cycle.
Subjects
Cine Magnetic Resonance Imaging , Optic Flow , Humans , Cine Magnetic Resonance Imaging/methods , Computer-Assisted Image Processing/methods , Heart/diagnostic imaging , Heart Ventricles , Magnetic Resonance Imaging/methods , Heart Atria
ABSTRACT
Dynamic occlusion, such as the accretion and deletion of texture near a boundary, is a major factor in determining relative depth of surfaces. However, the shape of the contour bounding the dynamic texture can significantly influence what kind of 3D shape, and what relative depth, are conveyed by the optic flow. This can lead to percepts that are inconsistent with traditional accounts of shape and depth from motion, where accreting/deleting texture can indicate the figural region, and/or 3D rotation can be perceived despite the constant speed of the optic flow. This suggests that the speed profile of the dynamic texture and the shape of its bounding contours combine to determine relative depth in a way that is not explained by existing models. Here, we investigated how traditional structure-from-motion principles and contour geometry interact to determine the relative-depth interpretation of dynamic textures. We manipulated the consistency of the dynamic texture with rotational or translational motion by varying the speed profile of the texture. In Experiment 1, we used a multi-region figure-ground display consisting of regions with dots moving horizontally in opposite directions in adjacent regions. In Experiment 2, we used stimuli including two regions separated by a common border, with dot textures moving horizontally in opposite directions. Both contour geometry (convexity) and the speed profile of the dynamic dot texture influenced relative-depth judgments, but contour geometry was the stronger factor. The results underscore the importance of contour geometry, which most current models disregard, in determining depth from motion.
Subjects
Form Perception , Motion Perception , Optic Flow , Humans , Rotation , Depth Perception
ABSTRACT
Multiple sclerosis is a neurodegenerative disease that causes balance deficits, even in early stages. Evidence suggests that people with multiple sclerosis (PwMS) rely more on vision to maintain balance, and challenging balance with optical flow perturbations may be a practical screening for balance deficits. Whether these perturbations affect standing balance in PwMS is unknown. Therefore, the purpose of this study was to examine how optical flow perturbations affect standing balance in PwMS. We hypothesized that perturbations would cause higher variability in PwMS compared with matched controls during standing and that standing balance would be more susceptible to anterior-posterior (A-P) perturbations than medial-lateral (M-L) perturbations. Thirteen PwMS and 13 controls stood under 3 conditions: unperturbed, M-L perturbation, and A-P perturbations. A-P perturbations caused significantly higher A-P trunk sway variability in PwMS than controls, although both groups had similar center-of-pressure variability. Both perturbations increased variability in A-P trunk sway and center of pressure. Trunk variability data supported the hypothesis that PwMS were more susceptible to optical flow perturbations than controls. However, the hypothesis that A-P perturbations would affect balance more than M-L perturbations was partially supported. These results suggest potential for optical flow perturbations to identify balance deficits in PwMS.
Subjects
Multiple Sclerosis , Neurodegenerative Diseases , Optic Flow , Humans , Postural Balance , Standing Position
ABSTRACT
The short frames of low-count positron emission tomography (PET) images generally cause high levels of statistical noise. Thus, improving the quality of low-count images by using image postprocessing algorithms to achieve better clinical diagnoses has attracted widespread attention in the medical imaging community. Most existing deep learning-based low-count PET image enhancement methods have achieved satisfying results; however, few of them focus on denoising low-count PET images with the magnetic resonance (MR) image modality as guidance. The prior context features contained in MR images can provide abundant and complementary information for single low-count PET image denoising, especially in ultralow-count (2.5%) cases. To this end, we propose a novel two-stream dual PET/MR cross-modal interactive fusion network with an optical flow pre-alignment module, namely, OIF-Net. Specifically, the learnable optical flow registration module enables the spatial manipulation of MR imaging inputs within the network without any extra training supervision. Registered MR images fundamentally solve the problem of feature misalignment in the multimodal fusion stage, which greatly benefits the subsequent denoising process. In addition, we design a spatial-channel feature enhancement module (SC-FEM) that considers the interactive impacts of multiple modalities and provides additional information flexibility in both the spatial and channel dimensions. Furthermore, instead of simply concatenating two extracted features from these two modalities as an intermediate fusion method, the proposed cross-modal feature fusion module (CM-FFM) adopts cross-attention at multiple feature levels and greatly improves the two modalities' feature fusion procedure. Extensive experimental assessments conducted on real clinical datasets, as well as an independent clinical testing dataset, demonstrate that the proposed OIF-Net outperforms the state-of-the-art methods.
Subjects
Computer-Assisted Image Processing , Optic Flow , Computer-Assisted Image Processing/methods , Positron-Emission Tomography/methods , Magnetic Resonance Imaging/methods , Brain/diagnostic imaging
ABSTRACT
Balance perturbations are used to study locomotor instability. However, these perturbations are designed to provoke a specific context of instability that may or may not generalize to a broader understanding of fall risk. The purpose of this study was to determine if the effect of balance perturbations on instability generalizes across contexts. Twenty-nine younger adults and 28 older adults completed four experimental trials, including unperturbed walking and walking while responding to three perturbation contexts: mediolateral optical flow, treadmill-induced slips, and lateral waist-pulls. We quantified the effect of perturbations as an absolute change in margin of stability from unperturbed walking. We found significant changes in mediolateral and anteroposterior margin of stability for all perturbations compared to unperturbed walking in both cohorts (p-values ≤ 0.042). In older adults, the mediolateral effects of lateral waist-pulls significantly correlated with those of optical flow perturbations and treadmill-induced slips (r ≥ 0.398, p-values ≤ 0.036). In younger adults but not in older adults, we found positive and significant correlations between the anteroposterior effect of waist-pull perturbations and optical flow perturbations, and the anteroposterior and mediolateral effect of treadmill-induced slips (r ≥ 0.428, p-values ≤ 0.021). We found no "goldilocks" perturbation paradigm to endorse that would support universal interpretations about locomotor instability. Building the most accurate patient profiles of instability likely requires a series of perturbation paradigms designed to emulate the variety of environmental contexts in which falls may occur.
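The outcome measure can be sketched as follows. The abstract does not state which margin-of-stability formulation was used, so this example assumes the common inverted-pendulum definition, in which the extrapolated centre of mass is the centre-of-mass position plus its velocity divided by sqrt(g / l). The function names, the pendulum length, and all input values are hypothetical.

```python
import math

GRAVITY = 9.81  # m/s^2


def extrapolated_com(com_position, com_velocity, pendulum_length):
    """Extrapolated centre of mass for an inverted pendulum of the given length."""
    omega0 = math.sqrt(GRAVITY / pendulum_length)
    return com_position + com_velocity / omega0


def margin_of_stability(bos_edge, com_position, com_velocity, pendulum_length):
    """Distance from the base-of-support edge to the extrapolated centre of mass."""
    return bos_edge - extrapolated_com(com_position, com_velocity, pendulum_length)


def perturbation_effect(mos_perturbed, mos_unperturbed):
    """Absolute change in margin of stability relative to unperturbed walking."""
    return abs(mos_perturbed - mos_unperturbed)


# Hypothetical mediolateral values in metres and metres per second.
mos_normal = margin_of_stability(0.10, 0.0, 0.10, pendulum_length=0.981)
mos_perturbed = margin_of_stability(0.10, 0.0, 0.20, pendulum_length=0.981)
effect = perturbation_effect(mos_perturbed, mos_normal)
```

Taking the absolute change treats a perturbation that shrinks the margin and one that enlarges it as equally large departures from unperturbed walking.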
Subjects
Optic Flow , Postural Balance , Humans , Aged , Walking , Accidental Falls/prevention & control , Exercise Test , Gait , Biomechanical Phenomena
ABSTRACT
Estimating depth, ego-motion, and optical flow from consecutive frames is a critical task in robot navigation and has received significant attention in recent years. In this study, we propose PDF-Former, an unsupervised joint estimation network comprising a full transformer-based framework, as well as a competition and cooperation mechanism. The transformer framework captures global feature dependencies and is customized for different task types, thereby improving the performance of sequential tasks. The competition and cooperation mechanisms enable the network to obtain additional supervisory information at different training stages. Specifically, the competition mechanism is implemented early in training to achieve iterative optimization of 6 DOF poses (rotation and translation information from the target image to the two reference images), the depth of target image, and optical flow (from the target image to the two reference images) estimation in a competitive manner. In contrast, the cooperation mechanism is implemented later in training to facilitate the transmission of results among the three networks and mutually optimize the estimation results. We conducted experiments on the KITTI dataset, and the results indicate that PDF-Former has significant potential to enhance the accuracy and robustness of sequential tasks in robot navigation.