Results 1 - 20 of 28
1.
Sensors (Basel) ; 23(20)2023 Oct 20.
Article in English | MEDLINE | ID: mdl-37896686

ABSTRACT

The precise detection of stratum interfaces is of significant importance for geological discontinuity recognition and roadway support optimization. In this study, a model for locating rock interfaces through change-point detection was proposed, and a drilling test on composite-strength mortar specimens was conducted. Using a logistic function and the particle swarm optimization algorithm, the drilling specific energy was modulated to detect the stratum interface. The results indicate that, after modulation by the logistic function, the drilling specific energy showed good interference resistance during stable drilling and high sensitivity during interface drilling; its average recognition error was 2.83 mm, lower than the 6.56 mm error before modulation. The particle swarm optimization algorithm enabled adaptive matching of drive parameters to the features of the drilling data, yielding a substantial 50.88% decrease in the recognition error rate. This study contributes to improving the perception accuracy of stratum interfaces and eliminating the potential danger of roof collapse.
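The pipeline described above — logistic modulation of the drilling specific energy followed by change-point detection — can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the gain `k`, the normalization by mean and standard deviation, and the single-change-point detector are all assumptions.

```python
import numpy as np

def logistic_modulate(dse, k=10.0):
    """Squash drilling specific energy (DSE) through a logistic function.

    Values near the mean are flattened (noise suppression under stable
    drilling), while large excursions saturate toward 0 or 1, sharpening
    the step that a change-point detector must locate.
    """
    center = np.mean(dse)
    scale = np.std(dse) + 1e-12
    return 1.0 / (1.0 + np.exp(-k * (dse - center) / scale))

def detect_interface(signal, depths):
    """Return the depth of the largest mean shift (single change point)."""
    best_i, best_gap = 1, -np.inf
    for i in range(1, len(signal)):
        gap = abs(signal[i:].mean() - signal[:i].mean())
        if gap > best_gap:
            best_gap, best_i = gap, i
    return depths[best_i]
```

On a synthetic two-layer signal the modulated curve steps sharply at the interface, so the detector recovers the boundary depth; the paper additionally tunes the modulation parameters with particle swarm optimization, which is omitted here.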

2.
Article in English | MEDLINE | ID: mdl-38512732

ABSTRACT

Self-supervised learning aims to learn representations that generalize effectively to downstream tasks. Many self-supervised approaches regard two views of an image as both the input and the self-supervised signal, assuming that either view contains the same task-relevant information and that the shared information is (approximately) sufficient for predicting downstream tasks. Recent studies show that discarding superfluous information not shared between the views can improve generalization. Hence, the ideal representation is sufficient for downstream tasks and contains minimal superfluous information, termed the minimal sufficient representation. One can learn this representation by maximizing the mutual information between the representation and the supervised view while eliminating superfluous information. Nevertheless, the computation of mutual information is notoriously intractable. In this work, we propose an objective termed the multi-view entropy bottleneck (MVEB) to learn the minimal sufficient representation effectively. MVEB simplifies minimal sufficient learning to maximizing both the agreement between the embeddings of two views and the differential entropy of the embedding distribution. Our experiments confirm that MVEB significantly improves performance. For example, it achieves a top-1 accuracy of 76.9% on ImageNet with a vanilla ResNet-50 backbone under linear evaluation. To the best of our knowledge, this is a new state-of-the-art result with ResNet-50.

3.
Article in English | MEDLINE | ID: mdl-38819971

ABSTRACT

Vision-Language Navigation (VLN) requires an agent to follow language instructions to reach a target position. A key factor for successful navigation is aligning the landmarks implied in the instruction with diverse visual observations. However, previous VLN agents fail to perform accurate modality alignment, especially in unexplored scenes, since they learn from limited navigation data and lack sufficient open-world alignment knowledge. In this work, we propose a new VLN paradigm, called COrrectable LaNdmark DiScOvery via Large ModEls (CONSOLE). In CONSOLE, we cast VLN as an open-world sequential landmark discovery problem by introducing a novel correctable landmark discovery scheme based on two large models, ChatGPT and CLIP. Specifically, we use ChatGPT to provide rich open-world landmark co-occurrence commonsense and conduct CLIP-driven landmark discovery based on these commonsense priors. To mitigate the noise in the priors caused by the lack of visual constraints, we introduce a learnable co-occurrence scoring module, which corrects the importance of each co-occurrence according to actual observations for accurate landmark discovery. We further design an observation enhancement strategy for elegantly combining our framework with different VLN agents, where we utilize the corrected landmark features to obtain enhanced observation features for action decisions. Extensive experimental results on multiple popular VLN benchmarks (R2R, REVERIE, R4R, RxR) show the significant superiority of CONSOLE over strong baselines. In particular, CONSOLE establishes new state-of-the-art results on R2R and R4R in unseen scenarios.

4.
Sci Rep ; 13(1): 20667, 2023 Nov 24.
Article in English | MEDLINE | ID: mdl-38001131

ABSTRACT

To address the engineering problem of roadway deformation and instability in the swelling soft rock widespread in the Kailuan mining area, the mineral composition and microstructure of this soft rock were characterized by scanning electron microscopy and X-ray diffraction, uniaxial and conventional triaxial tests were conducted, and the softening and swelling behavior of the rock and the failure mechanism of the surrounding rock were identified. A combined support scheme of multi-level anchor bolts, bottom-corner pressure relief, and fractional grouting is proposed; its parameters were adjusted and optimized by FLAC3D numerical simulation. The results show the following: the total clay mineral content is 53-75%, and well-developed pores, fissures, and nanoscale and micron-scale interlayer gaps provide penetrating channels for water infiltration to soften the surrounding rock; the three-level anchor, pressure-relief, and grouting support technology can limit roof subsidence to within 170 mm and floor heave to within 210 mm, with the bolts at each level evenly loaded in tension and the maximum stress and floor-heave displacement in the pressure-relief area significantly reduced; and the pressure-relief groove promotes the development of bottom-corner cracks, accelerates the redistribution of peripheral stress, and weakens the effect of high stress on the shallow area. Optimizing the grouting time using time or displacement as the index, filling primary and excavation-induced cracks, and blocking the swelling and softening effect of water on the rock mass achieve a dynamic unity of structural yielding and surrounding-rock modification, which offers guidance for the support control of soft-rock roadways.

5.
IEEE Trans Pattern Anal Mach Intell ; 45(8): 10027-10043, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37022275

ABSTRACT

Super-Resolution from a single motion Blurred image (SRB) is a severely ill-posed problem due to the joint degradation of motion blurs and low spatial resolution. In this article, we employ events to alleviate the burden of SRB and propose an Event-enhanced SRB (E-SRB) algorithm, which can generate a sequence of sharp and clear images with High Resolution (HR) from a single blurry image with Low Resolution (LR). To this end, we formulate an event-enhanced degeneration model that considers low spatial resolution, motion blurs, and event noises simultaneously. We then build an event-enhanced Sparse Learning Network (eSL-Net++) upon a dual sparse learning scheme where both events and intensity frames are modeled with sparse representations. Furthermore, we propose an event shuffle-and-merge scheme to extend the single-frame SRB to the sequence-frame SRB without any additional training process. Experimental results on synthetic and real-world datasets show that the proposed eSL-Net++ outperforms state-of-the-art methods by a large margin. Datasets, codes, and more results are available at https://github.com/ShinyWang33/eSL-Net-Plusplus.

6.
IEEE Trans Pattern Anal Mach Intell ; 45(10): 12133-12147, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37200122

ABSTRACT

Despite the substantial progress of active learning for image recognition, instance-level active learning for object detection still lacks a systematic investigation. In this paper, we propose to unify instance uncertainty calculation with image uncertainty estimation for informative image selection, creating a multiple instance differentiation learning (MIDL) method for instance-level active learning. MIDL consists of a classifier prediction differentiation module and a multiple instance differentiation module. The former leverages two adversarial instance classifiers trained on the labeled and unlabeled sets to estimate the instance uncertainty of the unlabeled set. The latter treats unlabeled images as instance bags and re-estimates image-instance uncertainty using the instance classification model in a multiple instance learning fashion. By weighting the instance uncertainty with the instance class probability and instance objectness probability under the total probability formula, MIDL unifies image uncertainty with instance uncertainty in a Bayesian framework. Extensive experiments validate that MIDL sets a solid baseline for instance-level active learning. On commonly used object detection datasets, it outperforms other state-of-the-art methods by significant margins, particularly when the labeled sets are small.
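The total-probability weighting sketched in the abstract can be illustrated with a toy aggregation function. This is a hedged sketch, not the paper's equation: the weight normalization and the linear combination below are assumptions.

```python
import numpy as np

def image_uncertainty(instance_uncertainty, class_prob, objectness):
    """Aggregate per-instance uncertainties into a single image score.

    Each instance's uncertainty is weighted by the product of its class
    probability and objectness probability, and the weighted values are
    combined into one score that can rank images for annotation.
    """
    w = np.asarray(class_prob) * np.asarray(objectness)
    w = w / (w.sum() + 1e-12)  # normalize weights over instances
    return float(np.sum(w * np.asarray(instance_uncertainty)))
```

With uniform weights the score reduces to the mean instance uncertainty; a single dominant instance pulls the image score toward its own uncertainty.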

7.
IEEE Trans Image Process ; 31: 4023-4038, 2022.
Article in English | MEDLINE | ID: mdl-35679376

ABSTRACT

In recent years, image denoising has benefited greatly from deep neural networks. However, these models need large amounts of noisy-clean image pairs for supervision. Although there have been attempts to train denoising networks with only noisy images, existing self-supervised algorithms suffer from inefficient network training, heavy computational burden, or dependence on noise modeling. In this paper, we propose a self-supervised framework named Neighbor2Neighbor for deep image denoising. We develop a theoretical motivation and prove that, by designing specific samplers that generate training image pairs from only noisy images, we can train a self-supervised denoising network similar to a network trained with clean-image supervision. Besides, we propose a regularizer, from an optimization perspective, to narrow the gap between the self-supervised denoiser and the supervised denoiser. We present a very simple yet effective self-supervised training scheme based on these theoretical understandings: training image pairs are generated by random neighbor sub-samplers, and denoising networks are trained with a regularized loss. Moreover, we propose a training strategy named BayerEnsemble to adapt the Neighbor2Neighbor framework to raw image denoising. The proposed Neighbor2Neighbor framework can benefit from progress in state-of-the-art supervised denoising network architectures. It also avoids heavy dependence on assumptions about the noise distribution. We evaluate the Neighbor2Neighbor framework through extensive experiments, including synthetic experiments with different noise distributions and real-world experiments under various scenarios. The code is available online: https://github.com/TaoHuang2018/Neighbor2Neighbor.
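The random neighbor sub-sampler at the heart of the training scheme can be sketched directly from the description: tile the noisy image into 2x2 cells and draw two different pixels from each cell, one into each half-resolution sub-image. The single-channel version below is illustrative only; the denoising network and regularized loss are omitted.

```python
import numpy as np

def neighbor_subsample(noisy, rng):
    """Generate an (input, target) pair from one noisy image.

    The image is tiled into 2x2 cells; within each cell two different
    pixels are drawn at random, one into each sub-image. The two
    half-resolution images share scene content but carry independent
    noise, so one can serve as the supervision target for the other.
    """
    h, w = noisy.shape[0] // 2 * 2, noisy.shape[1] // 2 * 2
    g1 = np.empty((h // 2, w // 2), dtype=noisy.dtype)
    g2 = np.empty_like(g1)
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            cell = [noisy[i, j], noisy[i, j + 1],
                    noisy[i + 1, j], noisy[i + 1, j + 1]]
            idx = rng.choice(4, size=2, replace=False)  # two distinct pixels
            g1[i // 2, j // 2] = cell[idx[0]]
            g2[i // 2, j // 2] = cell[idx[1]]
    return g1, g2
```

A denoiser fed `g1` and supervised by `g2` never sees a clean image, which is the point of the scheme.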

8.
IEEE Trans Neural Netw Learn Syst ; 33(12): 7091-7100, 2022 Dec.
Article in English | MEDLINE | ID: mdl-34125685

ABSTRACT

We propose a novel network pruning approach that preserves the information of pretrained network weights (filters). Network pruning with information preserving is formulated as a matrix sketch problem, which is efficiently solved by the off-the-shelf frequent-directions method. Our approach, referred to as FilterSketch, encodes the second-order information of pretrained weights, which enables the representation capacity of pruned networks to be recovered with a simple fine-tuning procedure. FilterSketch requires neither training from scratch nor data-driven iterative optimization, leading to a several-orders-of-magnitude reduction in the time cost of pruning optimization. Experiments on CIFAR-10 show that FilterSketch reduces 63.3% of floating-point operations (FLOPs) and prunes 59.9% of network parameters with negligible accuracy cost for ResNet-110. On ILSVRC-2012, it reduces 45.5% of FLOPs and removes 43.0% of parameters with only a 0.69% accuracy drop for ResNet-50. Our code and pruned models can be found at https://github.com/lmbxmu/FilterSketch.
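The off-the-shelf frequent-directions method mentioned above is a standard streaming matrix-sketching algorithm; a compact single-buffer version is sketched below. How FilterSketch arranges pretrained filters into the matrix `A` is not reproduced here, so treat that mapping as an assumption.

```python
import numpy as np

def frequent_directions(A, ell):
    """Sketch the rows of A (n x d) into B (ell x d) with A.T @ A ~= B.T @ B.

    When the buffer fills, an SVD shrinks every squared singular value by
    the smallest one, freeing at least one row for the next input row.
    Assumes ell <= d.
    """
    n, d = A.shape
    B = np.zeros((ell, d))
    free = list(range(ell))  # indices of all-zero rows in B
    for row in A:
        if not free:
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            s = np.sqrt(np.maximum(s ** 2 - s[-1] ** 2, 0.0))
            B = s[:, None] * Vt
            free = [i for i in range(ell) if s[i] == 0.0]
        B[free.pop()] = row
    return B
```

The sketch satisfies the classical guarantee ||A.T A - B.T B||_2 <= 2 ||A||_F^2 / ell, which is what lets second-order information survive aggressive compression.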

9.
IEEE Trans Image Process ; 30: 2538-2548, 2021.
Article in English | MEDLINE | ID: mdl-33481714

ABSTRACT

Natural language moment localization aims at localizing video clips according to a natural language description. The key to this challenging task lies in modeling the relationship between verbal descriptions and visual contents. Existing approaches often sample a number of clips from the video and individually determine how each of them is related to the query sentence. However, this strategy can fail dramatically, in particular when the query sentence refers to visual elements that appear outside of, or even far from, the target clip. In this paper, we address this issue by designing an Interaction-Integrated Network (I2N), which contains a few Interaction-Integrated Cells (I2Cs). The idea lies in the observation that the query sentence not only provides a description of the video clip but also contains semantic cues about the structure of the entire video. Based on this, I2Cs go one step beyond modeling short-term contexts in the time domain by encoding long-term video content into every frame feature. By stacking a few I2Cs, the resulting network, I2N, enjoys an improved ability of inference, brought by both (I) multi-level correspondence between vision and language and (II) more accurate cross-modal alignment. When evaluated on a challenging video moment localization dataset named DiDeMo, I2N outperforms the state-of-the-art approach by a clear margin of 1.98%. On two other challenging datasets, Charades-STA and TACoS, I2N also reports competitive performance.

10.
IEEE Trans Image Process ; 30: 3908-3921, 2021.
Article in English | MEDLINE | ID: mdl-33750690

ABSTRACT

This paper presents a learning-based approach to synthesizing the view from an arbitrary camera position given a sparse set of images. A key challenge for this novel view synthesis arises from the reconstruction process, when the views from different input images may not be consistent due to obstruction in the light path. We overcome this by jointly modeling the epipolar property and occlusion in designing a convolutional neural network. We start by defining and computing the aperture disparity map, which approximates the parallax and measures the pixel-wise shift between two views. While this relates to free-space rendering and can fail near object boundaries, we further develop a warping confidence map to address pixel occlusion in these challenging regions. The proposed method is evaluated on diverse real-world and synthetic light field scenes, and it shows better performance than several state-of-the-art techniques.

11.
IEEE Trans Image Process ; 30: 2060-2071, 2021.
Article in English | MEDLINE | ID: mdl-33460378

ABSTRACT

Person re-identification is the crucial task of identifying pedestrians of interest across multiple surveillance camera views. For person re-identification, a pedestrian is usually represented with features extracted from a rectangular image region that inevitably contains the scene background, which introduces ambiguity in distinguishing different pedestrians and degrades accuracy. Thus, we propose an end-to-end foreground-aware network that discriminates the foreground from the background by learning a soft mask for person re-identification. In our method, in addition to the pedestrian ID as supervision for the foreground, we introduce the camera ID of each pedestrian image for background modeling. The foreground branch and the background branch are optimized collaboratively. By introducing a target attention loss, the pedestrian features extracted from the foreground branch become insensitive to backgrounds, which greatly reduces the negative impact of changing backgrounds on pedestrian matching across different camera views. Notably, in contrast to existing methods, our approach does not require an additional dataset to train a human landmark detector or a segmentation model for locating the background regions. Experimental results on three challenging datasets, i.e., Market-1501, DukeMTMC-reID, and MSMT17, demonstrate the effectiveness of our approach.


Subjects
Biometric Identification/methods; Image Processing, Computer-Assisted/methods; Machine Learning; Algorithms; Humans; Pedestrians; Video Recording
12.
IEEE Trans Pattern Anal Mach Intell ; 31(1): 180-6, 2009 Jan.
Article in English | MEDLINE | ID: mdl-19029556

ABSTRACT

In computer vision, shape matching is a challenging problem, especially when articulation and deformation of parts occur. These variations may be insignificant in terms of human recognition, but often cause a matching algorithm to give results that are inconsistent with our perception. In this paper, we propose a novel shape descriptor of planar contours, called contour flexibility, which represents the deformable potential at each point along a contour. With this descriptor, local and global features can be obtained from the contour. We then present a shape matching scheme based on the obtained features. Experiments with comparisons to recently published algorithms show that our algorithm performs best.


Subjects
Algorithms; Artificial Intelligence; Image Interpretation, Computer-Assisted/methods; Models, Theoretical; Pattern Recognition, Automated/methods; Subtraction Technique; Computer Simulation
13.
IEEE Trans Pattern Anal Mach Intell ; 31(8): 1535-6, 2009 Aug.
Article in English | MEDLINE | ID: mdl-19542587

ABSTRACT

Varley [1] commented section by section on our paper [2]. We answer those comments in this response.

14.
IEEE Trans Image Process ; 18(1): 140-50, 2009 Jan.
Article in English | MEDLINE | ID: mdl-19095525

ABSTRACT

For the problem of image registration, the top few reliable correspondences are often relatively easy to obtain, while the overall matching accuracy may fall drastically as the desired number of correspondences increases. In this paper, we present an efficient feature matching algorithm that employs sparse reliable correspondence priors to guide the feature matching process. First, the feature geometric relationship within each image is encoded as a spatial graph, and the pairwise feature similarity is expressed as a bipartite similarity graph between the two feature sets; then the geometric neighborhood of the pairwise assignment is represented by a categorical product graph, along which the reliable correspondences are propagated; and finally a closed-form solution for feature matching is deduced by ensuring feature geometric coherency as well as pairwise feature agreement. Furthermore, our algorithm is naturally applicable to incorporating manual correspondence priors for semi-supervised feature matching. Extensive experiments on both toy examples and real-world applications demonstrate the superiority of our algorithm over state-of-the-art feature matching techniques.


Subjects
Algorithms; Artificial Intelligence; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Subtraction Technique; Reproducibility of Results; Sensitivity and Specificity
15.
IEEE Trans Pattern Anal Mach Intell ; 30(3): 507-17, 2008 Mar.
Article in English | MEDLINE | ID: mdl-18195443

ABSTRACT

The human vision system can interpret a single 2D line drawing as a 3D object without much difficulty even if the hidden lines of the object are invisible. Many reconstruction methods have been proposed to emulate this ability, but they cannot recover the complete object if the hidden lines of the object are not shown. This paper proposes a novel approach to reconstructing a complete 3D object, including the shape of the back of the object, from a line drawing without hidden lines. First, we develop theoretical constraints and an algorithm for the inference of the topology of the invisible edges and vertices of an object. Then we present a reconstruction method based on perceptual symmetry and planarity of the object. We show a number of examples to demonstrate the success of our approach.


Subjects
Algorithms; Artificial Intelligence; Computer-Aided Design; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Pattern Recognition, Automated/methods; Reproducibility of Results; Sensitivity and Specificity
16.
IEEE Trans Pattern Anal Mach Intell ; 30(2): 315-27, 2008 Feb.
Article in English | MEDLINE | ID: mdl-18084061

ABSTRACT

In previous optimization-based methods for reconstructing 3D planar-faced objects from single 2D line drawings, the missing depths of the vertices of a line drawing (and other parameters in some methods) are used as the variables of the objective functions. A 3D object with planar faces is derived by finding values for these variables that minimize the objective functions. These methods work well for simple objects with a small number N of variables. As N grows, however, it becomes very difficult for them to find the expected objects. This is because, with nonlinear objective functions in a space of large dimension N, the search for optimal solutions can easily get trapped in local minima. In this paper, we use the parameters of the planes that pass through the planar faces of an object as the variables of the objective function. This leads to a set of linear constraints on the planes of the object, resulting in a much lower-dimensional nullspace where optimization is easier to achieve. We prove that the dimension of this nullspace is exactly equal to the minimum number of vertex depths that define the 3D object. Since a practical line drawing is usually not an exact projection of a 3D object, we expand the nullspace to a larger space based on the singular value decomposition of the projection matrix of the line drawing. In this space, robust 3D reconstruction can be achieved. Compared with the two most closely related methods, our method not only reconstructs more complex 3D objects from 2D line drawings but is also computationally more efficient.
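The SVD-based nullspace construction described above can be illustrated generically: right singular vectors with (near-)zero singular values span the solutions of the linear constraints, and the space can be expanded with the next-smallest singular directions when the line drawing is an inexact projection. The `tol` and `expand` parameters below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def approximate_nullspace(C, tol=1e-8, expand=0):
    """Basis (as columns) of the nullspace of constraint matrix C,
    optionally expanded by the `expand` next-smallest singular
    directions to tolerate inexact constraints.
    """
    U, s, Vt = np.linalg.svd(C)          # Vt is d x d (full matrices)
    d = C.shape[1]
    rank = int(np.sum(s > tol * s.max()))
    keep = d - rank + expand             # nullspace dimension plus expansion
    return Vt[d - keep:].T               # last rows of Vt span the small directions
```

Optimizing over the few coordinates of this basis, instead of over all N depths, is what makes the low-dimensional search tractable.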

17.
IEEE Trans Image Process ; 27(9): 4357-4366, 2018 Sep.
Article in English | MEDLINE | ID: mdl-29870353

ABSTRACT

In steerable filters, a filter of arbitrary orientation can be generated by a linear combination of a set of "basis filters." Steerable properties dominate the design of traditional filters, e.g., Gabor filters, and endow features with the capability of handling spatial transformations. However, such properties have not yet been well explored in deep convolutional neural networks (DCNNs). In this paper, we develop a new deep model, namely Gabor convolutional networks (GCNs or Gabor CNNs), with Gabor filters incorporated into DCNNs so that the robustness of learned features against orientation and scale changes can be reinforced. By manipulating the basic element of DCNNs, i.e., the convolution operator, based on Gabor filters, GCNs can be easily implemented and are readily compatible with any popular deep learning architecture. We carry out extensive experiments to demonstrate the promising performance of our GCN framework, and the results show its superiority in recognizing objects, especially when scale and rotation changes take place frequently. Moreover, the proposed GCNs have much fewer network parameters to be learned and can effectively reduce the training complexity of the network, leading to a more compact deep learning model while still maintaining a high feature representation capacity. The source code can be found at https://github.com/bczhangbczhang.
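The basic operation the abstract describes — modulating a learned convolution kernel with a bank of Gabor filters at several orientations — can be sketched as below. This is a single-channel simplification with assumed Gabor parameters (`sigma`, `lam`, `gamma`), not the paper's exact implementation.

```python
import numpy as np

def gabor_kernel(size, theta, sigma=2.0, lam=4.0, gamma=0.5, psi=0.0):
    """Real part of a Gabor filter at orientation theta (odd size assumed)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)   # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr ** 2 + gamma ** 2 * yr ** 2) / (2 * sigma ** 2)) \
        * np.cos(2 * np.pi * xr / lam + psi)

def gabor_modulate(weight, n_orientations=4):
    """Turn one learned kernel into an orientation bank by element-wise
    multiplication with Gabor filters at evenly spaced orientations."""
    size = weight.shape[0]
    thetas = np.pi * np.arange(n_orientations) / n_orientations
    return np.stack([weight * gabor_kernel(size, t) for t in thetas])
```

Only the base kernel carries learnable parameters; the Gabor bank is fixed, which is why such a layer needs fewer parameters than learning every orientation separately.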

18.
IEEE Trans Image Process ; 16(11): 2802-10, 2007 Nov.
Article in English | MEDLINE | ID: mdl-17990756

ABSTRACT

This paper presents a unified solution to three unsolved problems in face verification with subspace learning techniques: selection of the verification threshold, automatic determination of the subspace dimension, and deduction of feature fusing weights. In contrast to previous algorithms, which search for the projection matrix directly, our new algorithm investigates a similarity metric matrix (SMM). For a given verification threshold, this matrix is learned by a semidefinite programming approach, subject to the constraints that kindred pairs have similarity larger than the threshold and inhomogeneous pairs have similarity smaller than the threshold. Then, the subspace dimension and the feature fusing weights are simultaneously inferred from the singular value decomposition of the derived SMM. In addition, weighted and tensor extensions are proposed to further improve the algorithmic effectiveness and efficiency, respectively. Essentially, the verification is conducted within an affine subspace in this new algorithm and is, hence, called the affine subspace for verification (ASV). Extensive experiments show that the ASV can achieve encouraging face verification accuracy in comparison to other subspace algorithms, even without the need to explore any parameters.


Subjects
Algorithms; Artificial Intelligence; Biometry/methods; Face/anatomy & histology; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Humans; Information Storage and Retrieval/methods; Reproducibility of Results; Sensitivity and Specificity
19.
IEEE Trans Pattern Anal Mach Intell ; 27(6): 861-72, 2005 Jun.
Article in English | MEDLINE | ID: mdl-15943419

ABSTRACT

A single 2D line drawing is a straightforward way to illustrate 3D objects. The faces of an object depicted by a line drawing give very useful information for the reconstruction of its 3D geometry. Two recently proposed methods for face identification from line drawings are based on two steps: finding a set of circuits that may be faces and searching for real faces from the set according to some criteria. The two steps, however, involve two combinatorial problems. The number of circuits generated in the first step grows exponentially with the number of edges of a line drawing. These circuits are then used as the input to the second combinatorial search step. When dealing with objects having more faces, the combinatorial explosion prevents these methods from finding solutions within feasible time. This paper proposes a new method that tackles the face identification problem with a variable-length genetic algorithm, incorporating a novel heuristic and geometric constraints for local search. The hybrid GA solves the two combinatorial problems simultaneously. Experimental results show that our algorithm can find the faces of a line drawing with more than 30 faces much more efficiently. In addition, simulated annealing for solving the face identification problem is implemented for comparison.


Subjects
Algorithms; Artificial Intelligence; Computer Graphics; Image Interpretation, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Information Storage and Retrieval/methods; Pattern Recognition, Automated/methods; Image Enhancement/methods
20.
IEEE Trans Neural Netw ; 13(4): 961-71, 2002.
Article in English | MEDLINE | ID: mdl-18244491

ABSTRACT

We present a video caption detection and recognition system based on a fuzzy-clustering neural network (FCNN) classifier. Using a novel caption-transition detection scheme, we locate both the spatial and temporal positions of video captions with high precision and efficiency. Then, employing several new character segmentation and binarization techniques, we improve Chinese video-caption recognition accuracy from 13% to 86% on a set of news video captions. As the first attempt at Chinese video-caption recognition, our experimental results are very encouraging.
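The fuzzy-clustering component such a classifier builds on is classic fuzzy c-means, sketched below. The neural-network part of the FCNN and the caption-specific features are not reproduced, and the `init` argument is an assumption added for illustration.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iters=50, init=None, seed=0):
    """Classic fuzzy c-means: every sample receives a soft membership in
    every cluster (rows of U sum to 1).  `init` optionally fixes the
    starting centers; otherwise c samples are drawn at random.
    Returns (centers, memberships)."""
    X = np.asarray(X, float)
    rng = np.random.default_rng(seed)
    if init is None:
        centers = X[rng.choice(len(X), size=c, replace=False)]
    else:
        centers = np.asarray(init, float)
    for _ in range(iters):
        # membership update: inverse-distance weighting with fuzzifier m
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / d ** (2.0 / (m - 1.0))
        U /= U.sum(axis=1, keepdims=True)
        # center update: membership-weighted means
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
    return centers, U
```

Soft memberships, rather than hard assignments, are what let such a classifier express uncertainty for ambiguous caption pixels.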
