Results 1 - 20 of 37
1.
Mach Vis Appl ; 34(4): 68, 2023.
Article in English | MEDLINE | ID: mdl-37457592

ABSTRACT

Our objective is to locate and provide a unique identifier for each mouse in a cluttered home-cage environment through time, as a precursor to automated behaviour recognition for biological research. This is a very challenging problem due to (i) the lack of distinguishing visual features for each mouse, and (ii) the close confines of the scene with constant occlusion, which make standard visual tracking approaches unusable. However, a coarse estimate of each mouse's location is available from a unique RFID implant, so there is the potential to optimally combine information from (weak) tracking with coarse information on identity. To achieve our objective, we make the following key contributions: (a) the formulation of the object identification problem as an assignment problem (solved using Integer Linear Programming), (b) a novel probabilistic model of the affinity between tracklets and RFID data, and (c) a curated dataset with per-frame bounding-box annotations and regularly spaced ground-truth identity annotations for evaluating the models. The probabilistic model is a crucial part of the approach, as it provides a principled probabilistic treatment of object detections given coarse localisation. Our approach achieves 77% accuracy on this animal identification problem, and is able to reject spurious detections when the animals are hidden.
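The assignment formulation can be illustrated with a small sketch. The paper solves it with Integer Linear Programming; the snippet below instead uses the Hungarian algorithm from SciPy, which gives the same one-to-one assignment in the simple case. The affinity values are invented placeholders, not quantities from the paper's probabilistic model.

```python
# Hedged sketch: assigning tracklets to RFID identities by maximising total affinity.
# Affinities are made up; the paper derives them from a probabilistic model of
# tracklet-to-RFID agreement and solves the assignment with an ILP.
import numpy as np
from scipy.optimize import linear_sum_assignment

# affinity[i, j] = how well tracklet i matches the coarse RFID location of mouse j
affinity = np.array([
    [0.90, 0.05, 0.05],
    [0.10, 0.75, 0.15],
    [0.20, 0.20, 0.60],
])

# linear_sum_assignment minimises cost, so negate the affinities to maximise them
tracklet_idx, mouse_idx = linear_sum_assignment(-affinity)
for t, m in zip(tracklet_idx, mouse_idx):
    print(f"tracklet {t} -> mouse {m} (affinity {affinity[t, m]:.2f})")
```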

2.
Bone ; 172: 116775, 2023 07.
Article in English | MEDLINE | ID: mdl-37080371

ABSTRACT

BACKGROUND: Scoliosis is a spinal curvature that may progress to require surgical stabilisation. Risk factors for progression are little understood owing to a lack of population-based research, since radiographs cannot be performed on entire populations because of the high levels of radiation involved. To help address this, we have previously developed and validated a method for quantifying spinal curvature from total body dual energy X-ray absorptiometry (DXA) scans. The purpose of this study was to automate this quantification of spinal curve size from DXA scans using machine learning techniques. METHODS: To develop the automation of curve size, we used manually annotated scans from 7298 participants from the Avon Longitudinal Study of Parents and Children (ALSPAC) at age 9 and 5122 at age 15. To validate the automation we assessed (1) agreement between manual and automated readings using the Bland-Altman limits of agreement, (2) reliability, by calculating the coefficient of variation, and (3) clinical validity, by running the automation on 4969 non-annotated scans at age 18 to assess the associations with physical activity, body composition, adipocyte function and back pain in comparison with previous literature. RESULTS: The mean difference between manual and automated readings was less than one degree, and 90.4% of manual versus automated readings fell within 10°. The coefficient of variation was 25.4%. Clinical validation showed the expected relationships between curve size and physical activity, adipocyte function, height and weight. CONCLUSION: We have developed a reasonably accurate and valid automated method for quantifying spinal curvature from DXA scans for research purposes.
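For readers wanting to reproduce the agreement statistics, the sketch below shows how Bland-Altman limits of agreement and a coefficient of variation can be computed for paired manual and automated curve readings. The arrays are illustrative values, not study data, and the CV definition shown is one common convention for duplicate measurements.

```python
# Hedged sketch: Bland-Altman limits of agreement and coefficient of variation
# for paired manual vs automated spinal-curve readings (values are made up).
import numpy as np

manual    = np.array([4.0, 7.5, 12.0, 3.0, 9.0])    # degrees
automated = np.array([5.0, 6.5, 13.5, 2.5, 10.0])   # degrees

diff = automated - manual
bias = diff.mean()                         # mean difference
half_width = 1.96 * diff.std(ddof=1)       # 95% limits of agreement half-width
print(f"bias = {bias:.2f} deg, LoA = [{bias - half_width:.2f}, {bias + half_width:.2f}] deg")

# One common CV definition for duplicate measurements: within-pair SD / grand mean
sd_within = np.sqrt(np.mean(diff ** 2) / 2)
cv = sd_within / np.mean((manual + automated) / 2) * 100
print(f"coefficient of variation = {cv:.1f}%")
```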


Subjects
Spinal Curvatures, Spine, Child, Humans, Adolescent, Absorptiometry, Photon/methods, Longitudinal Studies, Reproducibility of Results, Body Composition
3.
IEEE Trans Pattern Anal Mach Intell ; 44(12): 8717-8727, 2022 12.
Article in English | MEDLINE | ID: mdl-30582526

ABSTRACT

The goal of this work is to recognise phrases and sentences being spoken by a talking face, with or without the audio. Unlike previous works that have focussed on recognising a limited number of words or phrases, we tackle lip reading as an open-world problem: unconstrained natural language sentences, and in-the-wild videos. Our key contributions are: (1) we compare two models for lip reading, one using a CTC loss and the other a sequence-to-sequence loss, both built on top of the transformer self-attention architecture; (2) we investigate to what extent lip reading is complementary to audio speech recognition, especially when the audio signal is noisy; (3) we introduce and publicly release a new dataset for audio-visual speech recognition, LRS2-BBC, consisting of thousands of natural sentences from British television. The models that we train surpass the performance of all previous work on a lip reading benchmark dataset by a significant margin.
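As an illustration of contribution (1), the sketch below shows how a CTC loss is applied to per-frame character log-probabilities in PyTorch. The vocabulary size, sequence lengths and random tensors are placeholders and do not reflect the paper's actual architecture or data.

```python
# Hedged sketch: CTC loss over per-frame character logits, as used in one of the
# two lip-reading models described above (all dimensions and data are placeholders).
import torch
import torch.nn as nn

T, N, C = 75, 2, 40          # frames, batch size, character classes (blank at index 0)
log_probs = torch.randn(T, N, C).log_softmax(dim=2)       # stand-in for network output
targets = torch.randint(1, C, (N, 20), dtype=torch.long)  # target character indices
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 20, dtype=torch.long)

ctc = nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
print(loss.item())
```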


Subjects
Speech Perception, Humans, Algorithms, Lipreading, Speech
4.
IEEE Trans Pattern Anal Mach Intell ; 44(10): 6767-6781, 2022 10.
Article in English | MEDLINE | ID: mdl-34166184

ABSTRACT

We tackle the problem of discovering novel classes in an image collection given labelled examples of other classes. We present a new approach called AutoNovel to address this problem by combining three ideas: (1) we suggest that the common approach of bootstrapping an image representation using only the labelled data introduces an unwanted bias, and that this can be avoided by using self-supervised learning to train the representation from scratch on the union of labelled and unlabelled data; (2) we use ranking statistics to transfer the model's knowledge of the labelled classes to the problem of clustering the unlabelled images; and (3) we train the data representation by optimizing a joint objective function on the labelled and unlabelled subsets of the data, improving both the supervised classification of the labelled data and the clustering of the unlabelled data. Moreover, we propose a method to estimate the number of classes for the case where the number of new categories is not known a priori. We evaluate AutoNovel on standard classification benchmarks and substantially outperform current methods for novel category discovery. In addition, we show that AutoNovel can be used for fully unsupervised image clustering, achieving promising results.
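The ranking-statistics idea in contribution (2) can be sketched as follows: two unlabelled images receive a positive pairwise pseudo-label when the top-k most activated dimensions of their feature representations coincide. The feature vectors, value of k, and equality test below are illustrative assumptions rather than the paper's exact configuration.

```python
# Hedged sketch: pairwise pseudo-labels from ranking statistics.
# Two samples are declared "same (novel) class" if the sets of their top-k most
# activated feature dimensions agree; features here are random stand-ins.
import numpy as np

def topk_set(feat, k=5):
    return set(np.argsort(-feat)[:k])

rng = np.random.default_rng(0)
features = rng.random((4, 128))       # stand-in representations of unlabelled images

k = 5
pseudo = np.zeros((4, 4), dtype=int)
for i in range(4):
    for j in range(4):
        pseudo[i, j] = int(topk_set(features[i], k) == topk_set(features[j], k))
print(pseudo)   # binary pairwise pseudo-labels used to train the clustering head
```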


Subjects
Algorithms, Movement, Cluster Analysis
5.
IEEE Trans Pattern Anal Mach Intell ; 44(6): 3069-3081, 2022 06.
Article in English | MEDLINE | ID: mdl-33382648

ABSTRACT

Capturing the 'mutual gaze' of people is essential for understanding and interpreting the social interactions between them. To this end, this paper addresses the problem of detecting people Looking At Each Other (LAEO) in video sequences. For this purpose, we propose LAEO-Net++, a new deep CNN for determining LAEO in videos. In contrast to previous works, LAEO-Net++ takes spatio-temporal tracks as input and reasons about the whole track. It consists of three branches: one for each of the two characters' tracked heads and one for their relative position. Moreover, we introduce two new LAEO datasets: UCO-LAEO and AVA-LAEO. A thorough experimental evaluation demonstrates the ability of LAEO-Net++ to successfully determine whether two people are LAEO and the temporal window where it happens. Our model achieves state-of-the-art results on the existing TVHID-LAEO video dataset, significantly outperforming previous approaches. Finally, we apply LAEO-Net++ to a social network, where we automatically infer the social relationship between pairs of people based on the frequency and duration with which they LAEO, and show that LAEO can be a useful tool for guided search of human interactions in videos.
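A minimal sketch of the three-branch idea: two branches embed the tracked head crops over a short temporal window, a third embeds their relative position, and the concatenated features feed a binary LAEO classifier. All layer sizes and the toy model name are invented for illustration and do not reproduce LAEO-Net++.

```python
# Hedged sketch of a three-branch LAEO classifier (toy sizes, not LAEO-Net++ itself).
import torch
import torch.nn as nn

class ToyLAEO(nn.Module):
    def __init__(self):
        super().__init__()
        # shared encoder applied to each head track: input (B, C=3, T=10, H=64, W=64)
        self.head_enc = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
        )
        self.pos_enc = nn.Sequential(nn.Linear(2, 16), nn.ReLU())  # relative (dx, dy)
        self.classifier = nn.Linear(16 * 2 + 16, 2)                # LAEO / not LAEO

    def forward(self, head_a, head_b, rel_pos):
        f = torch.cat([self.head_enc(head_a), self.head_enc(head_b),
                       self.pos_enc(rel_pos)], dim=1)
        return self.classifier(f)

model = ToyLAEO()
logits = model(torch.randn(2, 3, 10, 64, 64), torch.randn(2, 3, 10, 64, 64),
               torch.randn(2, 2))
print(logits.shape)   # (2, 2)
```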


Subjects
Algorithms, Humans
6.
Sci Adv ; 7(46): eabi4883, 2021 Nov 12.
Article in English | MEDLINE | ID: mdl-34767448

ABSTRACT

Large video datasets of wild animal behavior are crucial to produce longitudinal research and accelerate conservation efforts; however, large-scale behavior analyses continue to be severely constrained by time and resources. We present a deep convolutional neural network approach and fully automated pipeline to detect and track two audiovisually distinctive actions in wild chimpanzees: buttress drumming and nut cracking. Using camera trap and direct video recordings, we train action recognition models using audio and visual signatures of both behaviors, attaining high average precision (buttress drumming: 0.87 and nut cracking: 0.85), and demonstrate the potential for behavioral analysis using the automatically parsed video. Our approach produces the first automated audiovisual action recognition of wild primate behavior, setting a milestone for exploiting large datasets in ethology and conservation.
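The reported average precision of each action detector can be computed from per-clip confidence scores and ground-truth labels as in the sketch below; the scores and labels are illustrative, not the chimpanzee data.

```python
# Hedged sketch: average precision for a binary action detector (toy data).
from sklearn.metrics import average_precision_score

y_true  = [1, 0, 1, 1, 0, 0, 1]                   # ground truth: action present/absent
y_score = [0.9, 0.2, 0.8, 0.4, 0.3, 0.6, 0.7]     # detector confidence per clip
print(average_precision_score(y_true, y_score))
```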

7.
Calcif Tissue Int ; 107(2): 201, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32306058

ABSTRACT

In the original version of the article, a co-author wished to add to the acknowledgements section to highlight their funding stream (EPSRC). The revised acknowledgements are given below.

8.
Sci Data ; 7(1): 102, 2020 03 26.
Article in English | MEDLINE | ID: mdl-32218449

ABSTRACT

Time-lapse cameras facilitate remote and high-resolution monitoring of wild animal and plant communities, but the image data produced require further processing to be useful. Here we publish pipelines to process raw time-lapse imagery, resulting in count data (number of penguins per image) and 'nearest neighbour distance' measurements. The latter provide useful summaries of colony spatial structure (which can indicate phenological stage) and can be used to detect movement - metrics which could be valuable for a number of different monitoring scenarios, including image capture during aerial surveys. We present two alternative pathways for producing counts: (1) via the Zooniverse citizen science project Penguin Watch and (2) via a computer vision algorithm (Pengbot), and share a comparison of citizen science-, machine learning-, and expert-derived counts. We provide example files for 14 Penguin Watch cameras, generated from 63,070 raw images annotated by 50,445 volunteers. We encourage the use of this large open-source dataset, and the associated processing methodologies, for both ecological studies and continued machine learning and computer vision development.
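The 'nearest neighbour distance' metric can be computed from per-image penguin coordinates with a k-d tree, as in the sketch below; the coordinates are invented for illustration rather than taken from the published pipelines.

```python
# Hedged sketch: nearest-neighbour distances between penguin locations in one image
# (coordinates are made up; a real pipeline would use the annotated positions).
import numpy as np
from scipy.spatial import cKDTree

points = np.array([[10.0, 12.0], [11.5, 14.0], [40.0, 41.0], [42.0, 39.5]])
tree = cKDTree(points)
dist, _ = tree.query(points, k=2)     # k=2: nearest neighbour excluding the point itself
nnd = dist[:, 1]
print(nnd.mean(), nnd)                # summary of colony spatial structure
```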


Subjects
Citizen Science, Image Processing, Computer-Assisted, Machine Learning, Time-Lapse Imaging, Algorithms, Animals, Spheniscidae
9.
Calcif Tissue Int ; 106(4): 378-385, 2020 04.
Article in English | MEDLINE | ID: mdl-31919556

ABSTRACT

Scoliosis is a 3D-torsional rotation of the spine, but risk factors for initiation and progression are little understood. Research is hampered by a lack of population-based studies, since radiographs cannot be performed on entire populations due to the relatively high levels of ionising radiation. Hence we have developed and validated a manual method for identifying scoliosis from total body dual energy X-ray absorptiometry (DXA) scans for research purposes. However, to allow full utilisation of population-based research cohorts, this needs to be automated. The purpose of this study was therefore to automate the identification of spinal curvature from total body DXA scans using machine learning techniques. To validate the automation, we assessed: (1) sensitivity, specificity and area under the receiver operating characteristic curve (AUC) by comparison with 12,000 manually annotated images; (2) reliability, by rerunning the automation on a subset of DXA scans repeated 2-6 weeks apart and calculating the kappa statistic; (3) validity, by applying the automation to 5000 non-annotated images to assess associations with epidemiological variables. The final automated model had a sensitivity of 86.5%, specificity of 96.9% and an AUC of 0.80 (95% CI 0.74-0.87). There was almost perfect agreement in identifying those with scoliosis (kappa 0.90). Those with scoliosis identified by the automated model showed similar associations with gender, ethnicity, socioeconomic status, BMI and lean mass to previous literature. In conclusion, we have developed an accurate and valid automated method for identifying and quantifying spinal curvature from total body DXA scans.
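The validation statistics reported here (sensitivity, specificity, AUC and the kappa statistic) can be computed as in the sketch below; the labels and scores are placeholders rather than the study data, and the kappa here compares thresholded predictions against labels purely for illustration.

```python
# Hedged sketch: validation metrics for a binary scoliosis classifier (toy data).
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score, cohen_kappa_score

y_true  = np.array([1, 0, 0, 1, 1, 0, 0, 1, 0, 0])
y_score = np.array([0.8, 0.2, 0.4, 0.9, 0.6, 0.1, 0.3, 0.7, 0.2, 0.5])
y_pred  = (y_score >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("sensitivity", tp / (tp + fn))
print("specificity", tn / (tn + fp))
print("AUC", roc_auc_score(y_true, y_score))
print("kappa", cohen_kappa_score(y_true, y_pred))   # in the study: repeat-run agreement
```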


Subjects
Automation, Radiography, Scoliosis/diagnostic imaging, Spine/diagnostic imaging, Absorptiometry, Photon/methods, Automation/methods, Female, Humans, Male, Radiography/methods, Reproducibility of Results
10.
IEEE Trans Pattern Anal Mach Intell ; 42(4): 780-792, 2020 04.
Article in English | MEDLINE | ID: mdl-30596569

ABSTRACT

The objective of this work is automatic labelling of characters in TV video and movies, given weak supervisory information provided by an aligned transcript. We make five contributions: (i) a new strategy for obtaining stronger supervisory information from aligned transcripts; (ii) an explicit model for classifying background characters, based on their face-tracks; (iii) new ConvNet-based face features; and (iv) a novel approach for labelling all face tracks jointly using linear programming. Each of these contributions delivers a boost in performance, and we demonstrate this on standard benchmarks using tracks provided by authors of prior work. As a fifth contribution, we also investigate the generalisation and strength of the features and classifiers by applying them "in the raw" on new video material where no supervisory information is used. In particular, to provide high quality tracks on that material, we propose efficient track classifiers to remove false-positive tracks produced by the face tracker. Overall we achieve a dramatic improvement over the state of the art on both TV series and film datasets, and almost saturate performance on some benchmarks.

11.
NPJ Digit Med ; 2: 128, 2019.
Article in English | MEDLINE | ID: mdl-31872068

ABSTRACT

The implementation of video-based non-contact technologies to monitor the vital signs of preterm infants in the hospital presents several challenges, such as detecting the presence or absence of a patient in the video frame, robustness to changes in lighting conditions, and automated identification of suitable time periods and regions of interest from which vital signs can be estimated. We carried out a clinical study to evaluate the accuracy and the proportion of time that heart rate and respiratory rate can be estimated from preterm infants using only a video camera in a clinical environment, without interfering with regular patient care. A total of 426.6 h of video and reference vital signs were recorded for 90 sessions from 30 preterm infants in the Neonatal Intensive Care Unit (NICU) of the John Radcliffe Hospital in Oxford. Each preterm infant was recorded under regular ambient light during daytime for up to four consecutive days. We developed multi-task deep learning algorithms to automatically segment skin areas and to estimate vital signs only when the infant was present in the field of view of the video camera and no clinical interventions were undertaken. We propose signal quality assessment algorithms for both heart rate and respiratory rate to discriminate between clinically acceptable and noisy signals. The mean absolute error between the reference and camera-derived heart rates was 2.3 beats/min for over 76% of the time for which the reference and camera data were valid. The mean absolute error between the reference and camera-derived respiratory rate was 3.5 breaths/min for over 82% of the time. Accurate estimates of heart rate and respiratory rate could be derived for at least 90% of the time, if gaps of up to 30 seconds with no estimates were allowed.

12.
Physiol Meas ; 40(11): 115001, 2019 12 02.
Article in English | MEDLINE | ID: mdl-31661680

ABSTRACT

Non-contact vital sign monitoring enables the estimation of vital signs, such as heart rate, respiratory rate and oxygen saturation (SpO2), by measuring subtle color changes on the skin surface using a video camera. For patients in a hospital ward, the main challenges in the development of continuous and robust non-contact monitoring techniques are the identification of time periods and the segmentation of skin regions of interest (ROIs) from which vital signs can be estimated. We propose a deep learning framework to tackle these challenges. APPROACH: This paper presents two convolutional neural network (CNN) models. The first network was designed for detecting the presence of a patient and segmenting the patient's skin area. The second network combined the output from the first network with optical flow for identifying time periods of clinical intervention so that these periods can be excluded from the estimation of vital signs. Both networks were trained using video recordings from a clinical study involving 15 pre-term infants conducted in the high dependency area of the neonatal intensive care unit (NICU) of the John Radcliffe Hospital in Oxford, UK. MAIN RESULTS: Our proposed methods achieved an accuracy of 98.8% for patient detection, a mean intersection-over-union (IOU) score of 88.6% for skin segmentation and an accuracy of 94.5% for clinical intervention detection using two-fold cross validation. Our deep learning models produced accurate results and were robust to different skin tones, changes in light conditions, pose variations and different clinical interventions by medical staff and family visitors. SIGNIFICANCE: Our approach allows cardio-respiratory signals to be continuously derived from the patient's skin during periods when the patient is present and no clinical intervention is undertaken.
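Mean intersection-over-union for the skin-segmentation output can be computed as in the sketch below; the binary masks are random placeholders rather than network predictions.

```python
# Hedged sketch: intersection-over-union between a predicted and reference skin mask.
import numpy as np

rng = np.random.default_rng(1)
pred = rng.random((64, 64)) > 0.5      # stand-in predicted binary mask
ref  = rng.random((64, 64)) > 0.5      # stand-in ground-truth mask

intersection = np.logical_and(pred, ref).sum()
union = np.logical_or(pred, ref).sum()
iou = intersection / union if union else 1.0
print(f"IoU = {iou:.3f}")
```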


Subjects
Deep Learning, Heart/physiology, Physiologic Monitoring, Respiration, Signal Processing, Computer-Assisted, Video Recording, Vital Signs/physiology, Automation, Female, Humans, Image Processing, Computer-Assisted, Infant, Newborn, Infant, Premature, Male, Neural Networks, Computer, Skin
13.
Sci Adv ; 5(9): eaaw0736, 2019 09.
Article in English | MEDLINE | ID: mdl-31517043

ABSTRACT

Video recording is now ubiquitous in the study of animal behavior, but its analysis on a large scale is prohibited by the time and resources needed to manually process large volumes of data. We present a deep convolutional neural network (CNN) approach that provides a fully automated pipeline for face detection, tracking, and recognition of wild chimpanzees from long-term video records. In a 14-year dataset yielding 10 million face images from 23 individuals over 50 hours of footage, we obtained an overall accuracy of 92.5% for identity recognition and 96.2% for sex recognition. Using the identified faces, we generated co-occurrence matrices to trace changes in the social network structure of an aging population. The tools we developed enable easy processing and annotation of video datasets, including those from other species. Such automated analysis unveils the future potential of large-scale longitudinal video archives to address fundamental questions in behavior and conservation.
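The co-occurrence matrices used to trace the social network can be built from per-frame lists of identified individuals, as in the sketch below; the names and detections are invented for illustration.

```python
# Hedged sketch: building a co-occurrence matrix from per-frame identity detections
# (individual names and detections are invented, not the chimpanzee data).
import numpy as np

individuals = ["A", "B", "C"]
index = {name: i for i, name in enumerate(individuals)}
frames = [["A", "B"], ["A"], ["A", "B", "C"], ["B", "C"]]   # who appears in each frame

cooc = np.zeros((3, 3), dtype=int)
for present in frames:
    for a in present:
        for b in present:
            if a != b:
                cooc[index[a], index[b]] += 1
print(cooc)   # symmetric counts of joint appearances
```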


Subjects
Facial Recognition/physiology, Pan troglodytes/physiology, Video Recording, Animals, Female, Male
15.
IEEE Trans Pattern Anal Mach Intell ; 41(1): 93-106, 2019 01.
Article in English | MEDLINE | ID: mdl-29990013

ABSTRACT

Our goal in this paper is to investigate properties of 3D shape that can be determined from a single image. We define 3D shape attributes - generic properties of the shape that capture curvature, contact and occupied space. Our first objective is to infer these 3D shape attributes from a single image. A second objective is to infer a 3D shape embedding - a low-dimensional vector representing the 3D shape. We study how the 3D shape attributes and embedding can be obtained from a single image by training a Convolutional Neural Network (CNN) for this task. We start with synthetic images so that the contribution of various cues and nuisance parameters can be controlled. We then turn to real images and introduce a large-scale image dataset of sculptures containing 143K images covering 2197 works from 242 artists. For the CNN trained on the sculpture dataset we show the following: (i) which regions of the imaged sculpture are used by the CNN to infer the 3D shape attributes; (ii) that the shape embedding can be used to match previously unseen sculptures largely independent of viewpoint; and (iii) that the 3D attributes generalize to images of other (non-sculpture) object classes.

16.
Sci Data ; 5: 180124, 2018 06 26.
Article in English | MEDLINE | ID: mdl-29944146

ABSTRACT

Automated time-lapse cameras can facilitate reliable and consistent monitoring of wild animal populations. In this report, data from 73,802 images taken by 15 different Penguin Watch cameras are presented, capturing the dynamics of penguin (Spheniscidae; Pygoscelis spp.) breeding colonies across the Antarctic Peninsula, South Shetland Islands and South Georgia (03/2012 to 01/2014). Citizen science provides a means by which large and otherwise intractable photographic data sets can be processed, and here we describe the methodology associated with the Zooniverse project Penguin Watch, and provide validation of the method. We present anonymised volunteer classifications for the 73,802 images, alongside the associated metadata (including date/time and temperature information). In addition to the benefits for ecological monitoring, such as easy detection of animal attendance patterns, this type of annotated time-lapse imagery can be employed as a training tool for machine learning algorithms to automate data extraction, and we encourage the use of this data set for computer vision development.


Subjects
Spheniscidae, Time-Lapse Imaging/methods, Animals, Antarctic Regions, Ecological Parameter Monitoring/methods, Population Dynamics
17.
Sci Adv ; 4(4): eaar4004, 2018 04.
Article in English | MEDLINE | ID: mdl-29662954

ABSTRACT

The crystallization of solidifying Al-Cu alloys over a wide range of conditions was studied in situ by synchrotron x-ray radiography, and the data were analyzed using a computer vision algorithm trained using machine learning. The effect of cooling rate and solute concentration on nucleation undercooling, crystal formation rate, and crystal growth rate was measured automatically for thousands of separate crystals, which was impossible to achieve manually. Nucleation undercooling distributions confirmed the efficiency of extrinsic grain refiners and gave support to the widely assumed free growth model of heterogeneous nucleation. We show that crystallization occurred in temporal and spatial bursts associated with a solute-suppressed nucleation zone.

18.
Med Image Anal ; 46: 1-14, 2018 05.
Article in English | MEDLINE | ID: mdl-29499436

ABSTRACT

Methods for aligning 3D fetal neurosonography images must be robust to (i) intensity variations, (ii) anatomical and age-specific differences within the fetal population, and (iii) the variations in fetal position. To this end, we propose a multi-task fully convolutional neural network (FCN) architecture to address the problem of 3D fetal brain localization, structural segmentation, and alignment to a referential coordinate system. Instead of treating these tasks as independent problems, we optimize the network by simultaneously learning features shared within the input data pertaining to the correlated tasks, and later branching out into task-specific output streams. Brain alignment is achieved by defining a parametric coordinate system based on skull boundaries, location of the eye sockets, and head pose, as predicted from intracranial structures. This information is used to estimate an affine transformation to align a volumetric image to the skull-based coordinate system. Co-alignment of 140 fetal ultrasound volumes (age range: 26.0 ± 4.4 weeks) was achieved with high brain overlap and low eye localization error, regardless of gestational age or head size. The automatically co-aligned volumes show good structural correspondence between fetal anatomies.
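The final alignment step, estimating an affine transform that maps a volume into the skull-based coordinate system, can be sketched as a least-squares fit between corresponding landmarks; the landmark coordinates below are placeholders, not the predicted skull or eye-socket points from the paper.

```python
# Hedged sketch: least-squares 3D affine transform from predicted landmarks
# (e.g. eye sockets, skull points) to a reference coordinate system.
# Landmark coordinates are invented for illustration.
import numpy as np

src = np.array([[30., 40., 25.], [60., 42., 26.], [45., 70., 30.], [45., 55., 60.]])
dst = np.array([[0., 10., 0.],   [30., 10., 0.],  [15., 40., 5.],  [15., 25., 35.]])

# Solve dst ≈ src_h @ M for a 4x3 affine matrix M (homogeneous source coordinates)
src_h = np.hstack([src, np.ones((len(src), 1))])
M, *_ = np.linalg.lstsq(src_h, dst, rcond=None)

aligned = src_h @ M
print(np.abs(aligned - dst).max())    # residual alignment error
```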


Subjects
Brain/diagnostic imaging, Brain/embryology, Imaging, Three-Dimensional/methods, Neural Networks, Computer, Neuroimaging/methods, Ultrasonography, Prenatal/methods, Adult, Algorithms, Female, Gestational Age, Humans, Image Processing, Computer-Assisted/methods, Pregnancy
19.
Med Image Anal ; 41: 63-73, 2017 Oct.
Article in English | MEDLINE | ID: mdl-28756059

ABSTRACT

The objective of this work is to automatically produce radiological gradings of lumbar spinal MRIs and also localize the predicted pathologies. We show that this can be achieved via a Convolutional Neural Network (CNN) framework that takes intervertebral disc volumes as inputs and is trained only on disc-specific class labels. Our contributions are: (i) a CNN architecture that predicts multiple gradings at once, together with variants of the architecture including the use of 3D convolutions; (ii) showing that this architecture can be trained using a multi-task loss function without requiring segmentation-level annotation; and (iii) a localization method that clearly shows pathological regions in the disc volumes. We compare three visualization methods for the localization. The network is applied to a large corpus of T2 sagittal spinal MRIs (using a standard clinical scan protocol) acquired from multiple machines, and is used to automatically compute disc and vertebra gradings for each MRI. These are: Pfirrmann grading, disc narrowing, upper/lower endplate defects, upper/lower marrow changes, spondylolisthesis, and central canal stenosis. We report near-human performance across the eight gradings, and also visualize the evidence for these gradings localized on the original scans.
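The multi-task training objective, one classification head per radiological grading sharing a common disc representation, can be sketched as below; the encoder, grading names and class counts are illustrative assumptions rather than the paper's exact configuration.

```python
# Hedged sketch: one shared disc encoder with several grading heads, trained with a
# summed cross-entropy loss (grading names and class counts are illustrative).
import torch
import torch.nn as nn

gradings = {"pfirrmann": 5, "narrowing": 4, "upper_endplate": 2, "spondylolisthesis": 2}

class MultiTaskGrader(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, feat_dim), nn.ReLU())
        self.heads = nn.ModuleDict({k: nn.Linear(feat_dim, c) for k, c in gradings.items()})

    def forward(self, x):
        f = self.encoder(x)
        return {k: head(f) for k, head in self.heads.items()}

model = MultiTaskGrader()
x = torch.randn(8, 1, 32, 32)                        # stand-in disc inputs
targets = {k: torch.randint(0, c, (8,)) for k, c in gradings.items()}
outputs = model(x)
loss = sum(nn.functional.cross_entropy(outputs[k], targets[k]) for k in gradings)
loss.backward()
print(float(loss))
```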


Subjects
Electronic Data Processing, Lumbar Vertebrae/diagnostic imaging, Magnetic Resonance Imaging/methods, Humans, Intervertebral Disc/diagnostic imaging, Neural Networks, Computer, Reproducibility of Results, Sensitivity and Specificity
20.
Eur Spine J ; 26(5): 1374-1383, 2017 05.
Article in English | MEDLINE | ID: mdl-28168339

ABSTRACT

STUDY DESIGN: Investigation of the automation of radiological features from magnetic resonance images (MRIs) of the lumbar spine. OBJECTIVE: To automate the process of grading lumbar intervertebral discs and vertebral bodies from MRIs. MR imaging is the most common imaging technique used in investigating low back pain (LBP). Various features of degradation, based on MRIs, are commonly recorded and graded, e.g., Modic change and Pfirrmann grading of intervertebral discs. Consistent scoring and grading is important for developing robust clinical systems and research. Automation facilitates this consistency and considerably reduces the time, and hence the expense, of radiological analysis. METHODS: 12,018 intervertebral discs, from 2009 patients, were graded by a radiologist and were then used to train: (1) a system to detect and label vertebrae and discs in a given scan, and (2) a convolutional neural network (CNN) model that predicts several radiological gradings. The performance of the model, in terms of class average accuracy, was compared with the intra-observer class average accuracy of the radiologist. RESULTS: The detection system achieved 95.6% accuracy in terms of disc detection and labeling. The model is able to produce predictions of multiple pathological gradings that consistently matched those of the radiologist. The model identifies 'Evidence Hotspots', the voxels that contribute most to the degradation scores. CONCLUSIONS: Automation of radiological grading is now on par with human performance. The system can be beneficial in aiding clinical diagnoses in terms of objectivity of gradings and the speed of analysis. It can also draw the attention of a radiologist to regions of degradation. This objectivity and speed are important stepping stones in the investigation of the relationship between MRIs and clinical diagnoses of back pain in large cohorts. LEVEL OF EVIDENCE: Level 3.


Subjects
Intervertebral Disc/diagnostic imaging, Lumbar Vertebrae/diagnostic imaging, Magnetic Resonance Imaging, Neural Networks, Computer, Radiologists, Bone Marrow/diagnostic imaging, Humans, Intervertebral Disc Degeneration/diagnostic imaging, Male, Middle Aged, Spinal Stenosis/diagnostic imaging, Spondylolisthesis/diagnostic imaging