Results 1 - 20 of 8,451
1.
J Acoust Soc Am ; 156(1): 326-340, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38990035

ABSTRACT

Humans are adept at identifying spectral patterns, such as vowels, in different rooms, at different sound levels, or produced by different talkers. How this feat is achieved remains poorly understood. Two psychoacoustic analogs of spectral pattern recognition are spectral profile analysis and spectrotemporal ripple direction discrimination. This study tested whether pattern-recognition abilities observed previously at low frequencies are also observed at extended high frequencies. At low frequencies (center frequency ∼500 Hz), listeners were able to achieve accurate profile-analysis thresholds, consistent with prior literature. However, at extended high frequencies (center frequency ∼10 kHz), listeners' profile-analysis thresholds were either unmeasurable or could not be distinguished from performance based on overall loudness cues. A similar pattern of results was observed with spectral ripple discrimination, where performance was again considerably better at low than at high frequencies. Collectively, these results suggest a severe deficit in listeners' ability to analyze patterns of intensity across frequency in the extended high-frequency region that cannot be accounted for by cochlear frequency selectivity. One interpretation is that the auditory system is not optimized to analyze such fine-grained across-frequency profiles at extended high frequencies, as they are not typically informative for everyday sounds.
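
For readers unfamiliar with ripple stimuli, the sketch below synthesizes a static spectral-ripple tone complex centered in the extended high-frequency band. It is purely illustrative and not the stimulus code used in the study; the component count, ripple density, depth, and band edges are arbitrary placeholders.

```python
import numpy as np

def spectral_ripple(fs=44100, dur=0.5, f_lo=8000, f_hi=12000,
                    ripples_per_octave=2.0, phase=0.0, depth_db=20.0, n_comp=200):
    """Rippled-spectrum stimulus: many tone components whose levels vary
    sinusoidally along log frequency (one cycle per 1/ripples_per_octave octaves)."""
    t = np.arange(int(fs * dur)) / fs
    freqs = np.logspace(np.log10(f_lo), np.log10(f_hi), n_comp)
    x = np.zeros_like(t)
    for f in freqs:
        # sinusoidal level variation across log-frequency defines the ripple
        level_db = (depth_db / 2) * np.sin(
            2 * np.pi * ripples_per_octave * np.log2(f / f_lo) + phase)
        amp = 10 ** (level_db / 20)
        x += amp * np.sin(2 * np.pi * f * t + 2 * np.pi * np.random.rand())
    return x / np.max(np.abs(x))
```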


Subject(s)
Acoustic Stimulation , Auditory Threshold , Psychoacoustics , Humans , Young Adult , Female , Male , Adult , Cues , Speech Perception/physiology , Sound Spectrography , Loudness Perception , Pattern Recognition, Physiological
2.
Sensors (Basel) ; 24(14)2024 Jul 17.
Article in English | MEDLINE | ID: mdl-39066023

ABSTRACT

Patients with Parkinson's disease commonly exhibit voice impairment. In this study, we introduce models to classify normal and Parkinson's patients using their speech. We used an AST (audio spectrogram transformer), a transformer-based speech classification model that has recently outperformed CNN-based models in many fields, and a CNN-based PSLA (pretraining, sampling, labeling, and aggregation), a high-performance model in the existing speech classification field. This study compares and analyzes the models from both quantitative and qualitative perspectives. First, quantitatively, PSLA outperformed AST by more than 4% in accuracy, and its AUC was also higher: 94.16% for AST versus 97.43% for PSLA. Furthermore, we qualitatively evaluated the ability of the models to capture the acoustic features of Parkinson's speech through various CAM (class activation map)-based XAI (eXplainable AI) methods such as GradCAM and EigenCAM. Based on PSLA, we found that the model focuses well on the muffled frequency band of Parkinson's speech, and heatmap analysis of false positives and false negatives shows that the relevant speech features are also visually represented when the model makes incorrect predictions. The contribution of this paper is that we not only identified a suitable model for diagnosing Parkinson's from speech using two different types of models but also validated the model's predictions in practice.


Subject(s)
Parkinson Disease , Speech , Humans , Parkinson Disease/diagnosis , Parkinson Disease/classification , Parkinson Disease/physiopathology , Speech/physiology , Male , Female , Sound Spectrography/methods , Reproducibility of Results , Neural Networks, Computer , Aged , Middle Aged
3.
J Acoust Soc Am ; 156(1): 16-28, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38949290

ABSTRACT

Echolocating bats are known to vary their waveforms across the searching, approaching, and capturing phases of prey pursuit. Estimating call parameters is valuable both for bat species identification and for technological improvements to synthetic systems such as radar and sonar. Call type is species-related, and many calls can be modeled as hyperbolic frequency-modulated (HFM) signals. To obtain the parameters of HFM-modeled bat calls, a reversible integral transform, the hyperbolic scale transform (HST), is proposed to map a call to two-dimensional peaks in the "delay-scale" domain, from which harmonic separation and parameter estimation are realized. Compared with methods based on time-frequency analysis, the HST-based method does not need to extract the instantaneous frequency of the calls; it only requires searching for peaks. Verification results show that the HST is suitable for analyzing HFM-modeled bat calls containing multiple harmonics with large energy differences. The estimated parameters imply that using waveforms from the searching phase through the capturing phase helps reduce ranging bias, and the observed parameter trends may be useful for bat species identification.
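
As a point of reference for the HFM signal model named in this abstract, here is a minimal Python sketch of a hyperbolic frequency-modulated sweep. It illustrates only the signal model (instantaneous frequency following a hyperbola), not the authors' HST implementation; the start/end frequencies and duration are placeholders.

```python
import numpy as np

def hfm_signal(f0, f1, duration, fs):
    """Hyperbolic frequency-modulated sweep from f0 to f1.

    Instantaneous frequency: f(t) = 1 / (1/f0 + k*t), i.e., a hyperbola,
    which is the standard HFM model for sonar- and bat-call-like waveforms."""
    t = np.arange(0, duration, 1 / fs)
    k = (1 / f1 - 1 / f0) / duration                  # hyperbolic sweep rate
    phase = 2 * np.pi / k * np.log(1 + k * f0 * t)    # integral of f(t)
    return np.cos(phase)

# Example (hypothetical values): a 2 ms downsweep from 60 kHz to 20 kHz at fs = 250 kHz
call = hfm_signal(f0=60e3, f1=20e3, duration=0.002, fs=250e3)
```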


Subject(s)
Acoustics , Chiroptera , Echolocation , Signal Processing, Computer-Assisted , Vocalization, Animal , Chiroptera/physiology , Chiroptera/classification , Animals , Vocalization, Animal/classification , Sound Spectrography , Time Factors , Models, Theoretical
4.
J Acoust Soc Am ; 156(1): 524-533, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-39024385

ABSTRACT

Advertisement vocalizations that function in mate acquisition and resource defense within species may also mediate behavioral interactions among species. While olfactory signals play an important role in mate choice and territoriality in rodents, less is known about the function of acoustic signals in influencing interspecific interactions. In this study, we used playback experiments in the laboratory to assess the function of long-distance vocalizations within and among three sympatric species of grasshopper mice. We found that, within each species, individuals of both sexes varied widely in spontaneous vocal behavior and response to playback. The largest species (Onychomys leucogaster) was most responsive to conspecifics, but smaller O. arenicola and O. torridus exhibited no clear pattern in their vocal behavior and were even responsive to the white noise controls. Our results indicate that grasshopper mice are broadly responsive to a range of sounds that resemble calls and that long-distance vocalizations function primarily as signals that facilitate localization for subsequent close-distance assessment by both sexes in various social contexts. Variation in vocal responses among species may depend on competitive dominance, degree of interaction, acoustic similarity, or behavioral changes resulting from captivity. Replicating playback experiments in the field will help validate whether the observed variation in the laboratory reflects ecologically relevant patterns in nature.


Subject(s)
Species Specificity , Vocalization, Animal , Animals , Male , Female , Sympatry , Sound Spectrography , Acoustic Stimulation , Acoustics
5.
J Acoust Soc Am ; 155(6): 3822-3832, 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38874464

ABSTRACT

This study proposes the use of vocal resonators to enhance cardiac auscultation signals and evaluates their performance for voice-noise suppression. Data were collected using two electronic stethoscopes while each study subject was talking: one collected auscultation signals from the chest, while the other collected voice signals from one of three vocal resonators (cheek, back of the neck, or shoulder). The spectral subtraction method was applied to the signals. Both objective and subjective metrics were used to evaluate the quality of the enhanced signals and to identify the most effective vocal resonator for noise suppression. Our preliminary findings showed significant improvement after enhancement and demonstrated the efficacy of vocal resonators. In a listening survey with thirteen physicians, the enhanced signals received significantly better sound-quality scores than the original signals. The shoulder-resonator group demonstrated significantly better sound quality than the cheek group when reducing voice sound in cardiac auscultation signals. The proposed method has the potential to support the development of an electronic stethoscope with robust noise removal, and significant clinical benefits are expected from an expedited preliminary diagnostic procedure.
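
The spectral subtraction step mentioned above follows a standard recipe; the sketch below shows one common variant (magnitude subtraction with a spectral floor), assuming the chest and resonator channels are available as NumPy arrays. It is not the authors' implementation, and the over-subtraction factor and floor are illustrative parameters.

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(mixed, noise_estimate, fs, over_sub=1.0, floor=0.02):
    """Suppress voice noise in an auscultation recording by spectral subtraction.

    `mixed` is the chest-microphone signal; `noise_estimate` is the simultaneously
    recorded vocal-resonator signal used to estimate the noise magnitude spectrum."""
    f, t, M = stft(mixed, fs=fs, nperseg=1024)
    _, _, N = stft(noise_estimate, fs=fs, nperseg=1024)
    noise_mag = np.mean(np.abs(N), axis=1, keepdims=True)   # average noise spectrum
    clean_mag = np.abs(M) - over_sub * noise_mag             # subtract noise magnitude
    clean_mag = np.maximum(clean_mag, floor * np.abs(M))     # floor limits "musical noise"
    _, enhanced = istft(clean_mag * np.exp(1j * np.angle(M)), fs=fs, nperseg=1024)
    return enhanced
```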


Subject(s)
Heart Auscultation , Signal Processing, Computer-Assisted , Stethoscopes , Humans , Heart Auscultation/instrumentation , Heart Auscultation/methods , Heart Auscultation/standards , Male , Female , Adult , Heart Sounds/physiology , Sound Spectrography , Equipment Design , Voice/physiology , Middle Aged , Voice Quality , Vibration , Noise
6.
Nat Commun ; 15(1): 4835, 2024 Jun 06.
Article in English | MEDLINE | ID: mdl-38844457

ABSTRACT

Humans produce two forms of cognitively complex vocalizations: speech and song. It is debated whether these differ based primarily on culturally specific, learned features, or if acoustical features can reliably distinguish them. We study the spectro-temporal modulation patterns of vocalizations produced by 369 people living in 21 urban, rural, and small-scale societies across six continents. Specific ranges of spectral and temporal modulations, overlapping within categories and across societies, significantly differentiate speech from song. Machine-learning classification shows that this effect is cross-culturally robust, vocalizations being reliably classified solely from their spectro-temporal features across all 21 societies. Listeners unfamiliar with the cultures classify these vocalizations using similar spectro-temporal cues as the machine learning algorithm. Finally, spectro-temporal features are better able to discriminate song from speech than a broad range of other acoustical variables, suggesting that spectro-temporal modulation-a key feature of auditory neuronal tuning-accounts for a fundamental difference between these categories.
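
A common way to compute the kind of spectro-temporal modulation features this abstract refers to is to take the two-dimensional Fourier transform of a log spectrogram, whose axes are then spectral and temporal modulation. The sketch below illustrates that idea only; it is not the authors' pipeline, and the window settings are arbitrary.

```python
import numpy as np
from scipy.signal import spectrogram

def modulation_power_spectrum(x, fs):
    """Spectro-temporal modulation power spectrum of a signal:
    the magnitude of the 2D FFT of its (mean-removed) log spectrogram."""
    f, t, S = spectrogram(x, fs=fs, nperseg=512, noverlap=384)
    log_S = np.log(S + 1e-10)
    mps = np.abs(np.fft.fftshift(np.fft.fft2(log_S - log_S.mean())))
    return mps
```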


Subject(s)
Machine Learning , Speech , Humans , Speech/physiology , Male , Female , Adult , Acoustics , Cross-Cultural Comparison , Auditory Perception/physiology , Sound Spectrography , Singing/physiology , Music , Middle Aged , Young Adult
7.
Physiol Behav ; 281: 114581, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38734358

ABSTRACT

Bird song is a crucial feature for mate choice and reproduction. Song can potentially communicate information related to the quality of the mate, through song complexity, structure or finer changes in syllable characteristics. It has been shown in zebra finches that those characteristics can be affected by various factors including motivation, hormone levels or extreme temperature. However, although the literature on zebra finch song is substantial, some factors have been neglected. In this paper, we recorded male zebra finches in two breeding contexts (before and after pairing) and in two ambient temperature conditions (stable and variable) to see how those factors could influence song production. We found strong differences between the two breeding contexts: compared to their song before pairing, males that were paired had lower song rate, syllable consistency, frequency and entropy, while surprisingly the amplitude of their syllables increased. Temperature variability had an impact on the extent of these differences, but did not directly affect the song parameters that we measured. Our results describe for the first time how breeding status and temperature variability can affect zebra finch song, and give some new insights into the subtleties of the acoustic communication of this model species.


Subject(s)
Finches , Sexual Behavior, Animal , Temperature , Vocalization, Animal , Animals , Male , Finches/physiology , Vocalization, Animal/physiology , Sexual Behavior, Animal/physiology , Sound Spectrography , Female
8.
J Acoust Soc Am ; 155(5): 3071-3089, 2024 May 01.
Article in English | MEDLINE | ID: mdl-38717213

ABSTRACT

This study investigated how 40 Chinese learners of English as a foreign language (EFL learners) differed from 40 native English speakers in the production of four English tense-lax contrasts, /i-ɪ/, /u-ʊ/, /ɑ-ʌ/, and /æ-ε/, by examining the acoustic measurements of duration, the first three formant frequencies, and the slope of the first formant movement (F1 slope). The dynamic formant trajectory was modeled using discrete cosine transform coefficients to demonstrate the time-varying properties of formant trajectories. A discriminant analysis was employed to illustrate the extent to which Chinese EFL learners relied on different acoustic parameters. This study found that: (1) Chinese EFL learners overemphasized durational differences and weakened spectral differences for the /i-ɪ/, /u-ʊ/, and /ɑ-ʌ/ pairs, although they maintained sufficient spectral differences for /æ-ε/. In contrast, native English speakers predominantly used spectral differences across all four pairs; (2) in non-low tense-lax contrasts, unlike native English speakers, Chinese EFL learners failed to exhibit different F1 slope values, indicating a non-nativelike tongue-root placement during the articulatory process. The findings underscore the contribution of dynamic spectral patterns to the differentiation between English tense and lax vowels, and reveal the influence of precise articulatory gestures on the realization of the tense-lax contrast.
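
The discrete cosine transform modeling of formant trajectories mentioned above can be sketched as follows; this illustrates only the general technique (low-order DCT coefficients summarizing the mean, slope, and curvature of a track), not the authors' exact parameterization.

```python
import numpy as np
from scipy.fft import dct

def dct_trajectory_coefficients(formant_track_hz, n_coeffs=3):
    """Summarize a time-varying formant trajectory with its first DCT coefficients.

    Coefficient 0 reflects the mean formant frequency, coefficient 1 the overall
    slope (e.g., an F1 rise or fall), and coefficient 2 the curvature."""
    track = np.asarray(formant_track_hz, dtype=float)
    coeffs = dct(track, type=2, norm="ortho")
    return coeffs[:n_coeffs]

# Example with a hypothetical F1 track sampled at equally spaced points across a vowel:
print(dct_trajectory_coefficients([310, 330, 360, 400, 430, 455, 470, 480, 485, 488]))
```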


Subject(s)
Multilingualism , Phonetics , Speech Acoustics , Humans , Male , Female , Young Adult , Speech Production Measurement , Adult , Language , Acoustics , Learning , Voice Quality , Sound Spectrography , East Asian People
9.
Am J Speech Lang Pathol ; 33(4): 1952-1964, 2024 Jul 03.
Article in English | MEDLINE | ID: mdl-38809826

ABSTRACT

PURPOSE: The current study compared temporal and spectral acoustic contrast between vowel segments produced by speakers with dysarthria across three speech tasks-interactive, solo habitual, and solo clear. METHOD: Nine speakers with dysarthria secondary to amyotrophic lateral sclerosis participated in the study. Each speaker was paired with a typical interlocutor over videoconferencing software. The speakers produced the vowels /i, ɪ, ɛ, æ/ in /h/-vowel-/d/ words. For the solo tasks, speakers read the stimuli aloud in both their habitual and clear speaking styles. For the interactive task, speakers produced a target stimulus for their interlocutor to select among the four possibilities. We measured the duration difference between long and short vowels, as well as the F1/F2 Euclidean distance between adjacent vowels, and also determined how well the vowels could be classified based on their acoustic characteristics. RESULTS: Temporal contrast between long and short vowels was higher in the interactive task than in both solo tasks. Spectral distance between adjacent vowel pairs was also higher for some pairs in the interactive task than the habitual speech task. Finally, vowel classification accuracy was highest in the interactive task. CONCLUSIONS: Overall, we found evidence that individuals with dysarthria produced vowels with greater acoustic contrast in structured interactions than they did in solo tasks. Furthermore, the speech adjustments they made to the vowel segments differed from those observed in solo speech.


Subject(s)
Amyotrophic Lateral Sclerosis , Dysarthria , Phonetics , Speech Acoustics , Speech Production Measurement , Humans , Dysarthria/etiology , Dysarthria/physiopathology , Dysarthria/diagnosis , Male , Female , Middle Aged , Aged , Amyotrophic Lateral Sclerosis/complications , Amyotrophic Lateral Sclerosis/physiopathology , Speech Intelligibility , Voice Quality , Preliminary Data , Sound Spectrography , Time Factors , Aged, 80 and over , Acoustics
10.
PeerJ ; 12: e17320, 2024.
Article in English | MEDLINE | ID: mdl-38766489

ABSTRACT

Vocal complexity is central to many evolutionary hypotheses about animal communication. Yet, quantifying and comparing complexity remains a challenge, particularly when vocal types are highly graded. Male Bornean orangutans (Pongo pygmaeus wurmbii) produce complex and variable "long call" vocalizations comprising multiple sound types that vary within and among individuals. Previous studies described six distinct call (or pulse) types within these complex vocalizations, but none quantified their discreteness or the ability of human observers to reliably classify them. We studied the long calls of 13 individuals to: (1) evaluate and quantify the reliability of audio-visual classification by three well-trained observers, (2) distinguish among call types using supervised classification and unsupervised clustering, and (3) compare the performance of different feature sets. Using 46 acoustic features, we used machine learning (i.e., support vector machines, affinity propagation, and fuzzy c-means) to identify call types and assess their discreteness. We additionally used Uniform Manifold Approximation and Projection (UMAP) to visualize the separation of pulses using both extracted features and spectrogram representations. Supervised approaches showed low inter-observer reliability and poor classification accuracy, indicating that pulse types were not discrete. We propose an updated pulse classification approach that is highly reproducible across observers and exhibits strong classification accuracy using support vector machines. Although the low number of call types suggests long calls are fairly simple, the continuous gradation of sounds seems to greatly boost the complexity of this system. This work responds to calls for more quantitative research to define call types and quantify gradedness in animal vocal systems and highlights the need for a more comprehensive framework for studying vocal complexity vis-à-vis graded repertoires.
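
A minimal version of the supervised step described above, a support vector machine over a 46-dimensional acoustic feature set with cross-validated accuracy, might look like the sketch below. The feature matrix and labels here are random stand-ins for the study's data, so the printed accuracy is meaningless; only the workflow is illustrated.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# X: (n_pulses, 46) acoustic features; y: observer-assigned pulse-type labels
# (hypothetical arrays standing in for the study's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 46))
y = rng.integers(0, 4, size=200)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, cv=5)
print("mean cross-validated classification accuracy:", scores.mean())
```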


Subject(s)
Vocalization, Animal , Animals , Vocalization, Animal/physiology , Male , Pongo pygmaeus/physiology , Reproducibility of Results , Machine Learning , Acoustics , Sound Spectrography , Borneo
11.
PLoS One ; 19(5): e0300607, 2024.
Article in English | MEDLINE | ID: mdl-38787824

ABSTRACT

Listening to music is a crucial tool for relieving stress and promoting relaxation. However, the limited options available for stress-relief music do not cater to individual preferences, compromising its effectiveness. Traditional methods of curating stress-relief music rely heavily on measuring biological responses, which is time-consuming, expensive, and requires specialized measurement devices. In this paper, a deep learning approach based on convolutional neural networks is introduced that provides a more efficient and economical method for generating large datasets of stress-relief music. These datasets are composed of Mel-scaled spectrograms, which capture essential sound attributes (such as frequency content, amplitude, and waveform) directly from the music. The trained model achieved a test accuracy of 98.7%, and a clinical study indicated that the model-selected music was as effective as researcher-verified music in terms of stress-relieving capacity. This paper underlines the transformative potential of deep learning in addressing the challenge of limited music options for stress relief. More importantly, the proposed method has profound implications for music therapy because it enables a more personalized approach to stress-relief music selection, offering the potential for enhanced emotional well-being.
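
To make the pipeline concrete, the sketch below converts an audio clip to a log-Mel spectrogram and feeds it to a small CNN classifier. The architecture is a placeholder chosen for brevity and does not reproduce the paper's model; the file path and class count are hypothetical.

```python
import librosa
import torch
import torch.nn as nn

def mel_input(path, sr=22050, n_mels=128):
    """Load an audio clip and convert it to a (1, 1, n_mels, frames) log-Mel tensor."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel)
    return torch.tensor(log_mel, dtype=torch.float32).unsqueeze(0).unsqueeze(0)

# Small stand-in CNN over the spectrogram (binary: stress-relieving or not).
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d((8, 8)), nn.Flatten(),
    nn.Linear(16 * 8 * 8, 2),
)

logits = model(mel_input("example_clip.wav"))  # hypothetical file
```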


Subject(s)
Music Therapy , Music , Neural Networks, Computer , Stress, Psychological , Humans , Music/psychology , Stress, Psychological/therapy , Music Therapy/methods , Deep Learning , Male , Female , Adult , Sound Spectrography/methods , Young Adult
12.
Stud Health Technol Inform ; 314: 151-152, 2024 May 23.
Article in English | MEDLINE | ID: mdl-38785022

ABSTRACT

This study proposes an innovative application of the Goertzel Algorithm (GA) for the processing of vocal signals in dysphonia evaluation. Compared to the Fast Fourier Transform (FFT) representing the gold standard analysis technique in this context, GA demonstrates higher efficiency in terms of processing time and memory usage, also showing an improved discrimination between healthy and pathological conditions. This suggests that GA-based approaches could enhance the reliability and efficiency of vocal signal analysis, thus supporting physicians in dysphonia research and clinical monitoring.
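
For context, the Goertzel Algorithm evaluates the power of a single DFT bin with a two-state recursion, which is why it can beat a full FFT when only a few frequencies are needed. A minimal sketch (not the authors' implementation) is shown below.

```python
import numpy as np

def goertzel_power(x, fs, target_hz):
    """Power at one frequency bin via the Goertzel recursion."""
    n = len(x)
    k = int(round(n * target_hz / fs))   # nearest DFT bin to the target frequency
    coeff = 2 * np.cos(2 * np.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for sample in x:
        s = sample + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # squared magnitude of the k-th DFT bin
    return s_prev**2 + s_prev2**2 - coeff * s_prev * s_prev2
```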


Subject(s)
Algorithms , Dysphonia , Humans , Dysphonia/diagnosis , Signal Processing, Computer-Assisted , Sound Spectrography/methods , Reproducibility of Results , Fourier Analysis , Female , Male
13.
Ter Arkh ; 96(3): 228-232, 2024 Apr 16.
Article in Russian | MEDLINE | ID: mdl-38713036

ABSTRACT

AIM: To evaluate the feasibility of using spectral analysis of cough sounds in the diagnosis of the novel coronavirus infection COVID-19. MATERIALS AND METHODS: Spectral toussophonobarography was performed in 218 patients with COVID-19 [48.56% men, 51.44% women, average age 40.2 (32.4; 51.0) years] and in 60 healthy individuals [50% men, 50% women, average age 41.7 (32.2; 53.0) years] with induced cough (inhalation of a 20 g/l citric acid solution through a nebulizer). Recordings were made with a contact microphone mounted on a tripod 15-20 cm from the subject's face and processed in a computer program, after which spectral analysis of the cough sounds was performed using Fourier transform algorithms. The following parameters were evaluated: the duration of the cough act (ms), the ratio of low-frequency energy (60-600 Hz) to high-frequency energy (600-6000 Hz), and the frequency of maximum energy of the cough sound (Hz). RESULTS: Statistical processing showed that the cough-sound parameters of COVID-19 patients differ from those of healthy individuals. The obtained data were substituted into the developed regression equation; rounded to an integer, the result was interpreted as "0" (no COVID-19) or "1" (COVID-19). CONCLUSION: The technique showed high sensitivity and specificity. It is also easy to use and does not require expensive equipment, so it can be applied in practice for the timely diagnosis of COVID-19.
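
The spectral parameters described in the METHODS section can be computed directly from an FFT of a cough recording; the sketch below illustrates the 60-600 Hz / 600-6000 Hz energy ratio and peak-frequency measures and is not the authors' software.

```python
import numpy as np

def cough_spectral_features(x, fs):
    """Spectral parameters of a single cough sound, as described above."""
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    low = spectrum[(freqs >= 60) & (freqs < 600)].sum()      # low-band energy
    high = spectrum[(freqs >= 600) & (freqs <= 6000)].sum()  # high-band energy
    band = (freqs >= 60) & (freqs <= 6000)
    f_peak = freqs[band][np.argmax(spectrum[band])]          # frequency of max energy
    return {"duration_ms": 1000 * len(x) / fs,
            "low_high_energy_ratio": low / high,
            "peak_frequency_hz": f_peak}
```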


Subject(s)
COVID-19 , Cough , SARS-CoV-2 , Humans , Cough/diagnosis , Cough/etiology , Cough/physiopathology , COVID-19/diagnosis , Female , Male , Adult , Middle Aged , Sound Spectrography/methods
14.
Nat Commun ; 15(1): 3617, 2024 May 07.
Article in English | MEDLINE | ID: mdl-38714699

ABSTRACT

Sperm whales (Physeter macrocephalus) are highly social mammals that communicate using sequences of clicks called codas. While a subset of codas have been shown to encode information about caller identity, almost everything else about the sperm whale communication system, including its structure and information-carrying capacity, remains unknown. We show that codas exhibit contextual and combinatorial structure. First, we report previously undescribed features of codas that are sensitive to the conversational context in which they occur, and systematically controlled and imitated across whales. We call these rubato and ornamentation. Second, we show that codas form a combinatorial coding system in which rubato and ornamentation combine with two context-independent features we call rhythm and tempo to produce a large inventory of distinguishable codas. Sperm whale vocalisations are more expressive and structured than previously believed, and built from a repertoire comprising nearly an order of magnitude more distinguishable codas. These results show context-sensitive and combinatorial vocalisation can appear in organisms with divergent evolutionary lineage and vocal apparatus.


Subject(s)
Sperm Whale , Vocalization, Animal , Animals , Vocalization, Animal/physiology , Sperm Whale/physiology , Sperm Whale/anatomy & histology , Male , Female , Sound Spectrography
15.
J Acoust Soc Am ; 155(5): 3037-3050, 2024 May 01.
Article in English | MEDLINE | ID: mdl-38717209

ABSTRACT

Progress in fin whale research has been hindered by debate over whether the two typical call types, type A and type B (characterized by central source frequencies of 17-20 Hz and 20-30 Hz, respectively), originate from a single fin whale or from two individuals. Here, hydroacoustic data are used to study the type, vocal behavior, and temporal evolution of fin whale calls around Southern Wake Island from 2010 to 2022. High-precision determination of source location shows that (1) type-A and type-B calls come from two individuals, based on the large spatial separation of the two call sources; (2) type-A fin whales exert vocal influence on type-B fin whales: type-B calls become paired with type-A calls and occur regularly when type-A fin whales appear, and type-A fin whales always lead the call sequences; and (3) some type-A fin whales stop calling when another type-A fin whale approaches within about 1.6 km. During 2010-2022, type-A calls occurred every year, whereas type-B calls became prevalent only after November 2018. We propose cultural transmission from type-A to type-B fin whales and/or a population increase of type-B fin whales in the region after November 2018.


Subject(s)
Acoustics , Fin Whale , Vocalization, Animal , Animals , Fin Whale/physiology , Sound Spectrography , Time Factors , Islands
16.
J Acoust Soc Am ; 155(4): 2724-2727, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38656337

ABSTRACT

The auditory sensitivity of a small songbird, the red-cheeked cordon bleu, was measured using standard methods of animal psychophysics. Hearing in cordon bleus is similar to that of other small passerines, with best hearing in the region from 2 to 4 kHz and sensitivity declining at a rate of about 10 dB/octave below 2 kHz and about 35 dB/octave as frequency increases from 4 to 9 kHz. While critical ratios are similar to those of other songbirds, the long-term average power spectrum of cordon bleu song falls above this species' frequency region of best hearing.


Subject(s)
Acoustic Stimulation , Auditory Threshold , Hearing , Songbirds , Vocalization, Animal , Animals , Vocalization, Animal/physiology , Hearing/physiology , Songbirds/physiology , Male , Psychoacoustics , Sound Spectrography , Female
17.
J Acoust Soc Am ; 155(4): 2627-2635, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38629884

ABSTRACT

Passive acoustic monitoring (PAM) is an optimal method for detecting and monitoring cetaceans as they frequently produce sound while underwater. Cue counting, counting acoustic cues of deep-diving cetaceans instead of animals, is an alternative method for density estimation, but requires an average cue production rate to convert cue density to animal density. Limited information about click rates exists for sperm whales in the central North Pacific Ocean. In the absence of acoustic tag data, we used towed hydrophone array data to calculate the first sperm whale click rates from this region and examined their variability based on click type, location, distance of whales from the array, and group size estimated by visual observers. Our findings show click type to be the most important variable, with groups that include codas yielding the highest click rates. We also found a positive relationship between group size and click detection rates that may be useful for acoustic predictions of group size in future studies. Echolocation clicks detected using PAM methods are often the only indicator of deep-diving cetacean presence. Understanding the factors affecting their click rates provides important information for acoustic density estimation.
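
The conversion at the heart of cue counting is simple division: animal density equals cue (click) density divided by the average per-animal cue production rate. A toy calculation with made-up numbers, not values from this study, is shown below.

```python
def animal_density(cue_density_per_km2_per_hr, clicks_per_whale_per_hr):
    """Convert an acoustically estimated click density into an animal density
    by dividing by the average per-whale click production rate."""
    return cue_density_per_km2_per_hr / clicks_per_whale_per_hr

# Illustrative numbers only: 3600 clicks/km^2/hr and 1.2 clicks/s per whale
print(animal_density(3600.0, 1.2 * 3600))  # ~0.83 whales per km^2
```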


Subject(s)
Echolocation , Sperm Whale , Animals , Vocalization, Animal , Acoustics , Whales , Sound Spectrography
18.
PLoS One ; 19(4): e0299250, 2024.
Article in English | MEDLINE | ID: mdl-38635752

ABSTRACT

Passive acoustic monitoring has improved our understanding of vocalizing organisms in remote habitats and during all weather conditions. Many vocally active species are highly mobile, and their populations overlap; however, distinct vocalizations allow the tracking and discrimination of individuals or populations. Using signature whistles, the individually distinct calls of bottlenose dolphins, we calculated a minimum abundance of individuals, characterized and compared signature whistles from five locations, and determined reoccurrences of individuals throughout the Mid-Atlantic Bight and Chesapeake Bay, USA. We identified 1,888 signature whistles whose duration, number of extrema, and start, end, and minimum frequencies varied significantly by site. All of these characteristics were important for determining the site from which a whistle originated. Given the distinct signature whistle characteristics and the lack of spatial mixing of the dolphins detected at the Offshore site, we suspect that these dolphins belong to a different population than those at the Coastal and Bay sites. Signature whistles were also shorter when ambient sound levels were higher. Using only the passively recorded vocalizations of this marine top predator, we obtained information about its population and how it is affected by ambient sound levels, which will increase as offshore wind energy is developed. In this rapidly developing area, these calls offer critical management insights for this protected species.
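
The whistle measurements reported above (duration, number of extrema, start/end/minimum frequencies) can be derived from an extracted frequency contour. A minimal sketch, assuming the contour is already available as an array of frame-wise frequencies, is shown below; it is illustrative only, not the study's measurement code.

```python
import numpy as np

def whistle_contour_features(contour_hz, frame_dt_s):
    """Basic signature-whistle measurements from a frequency contour."""
    c = np.asarray(contour_hz, dtype=float)
    d = np.diff(c)
    # an extremum occurs wherever the contour slope changes sign
    n_extrema = int(np.sum(np.diff(np.sign(d)) != 0))
    return {"duration_s": len(c) * frame_dt_s,
            "n_extrema": n_extrema,
            "start_hz": c[0],
            "end_hz": c[-1],
            "min_hz": c.min()}
```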


Subject(s)
Bottle-Nosed Dolphin , Vocalization, Animal , Animals , Sound Spectrography , Ecosystem
19.
J Acoust Soc Am ; 155(4): 2803-2816, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38662608

ABSTRACT

Urban expansion has increased pollution, including both physical (e.g., exhaust, litter) and sensory (e.g., anthropogenic noise) components. Urban avian species tend to increase the frequency and/or amplitude of songs to reduce masking by low-frequency noise. Nevertheless, song propagation to the receiver can also be constrained by the environment. We know relatively little about how this propagation may be altered across species that (1) vary in song complexity and (2) inhabit areas along an urbanization gradient. We investigated differences in song amplitude, attenuation, and active space, or the maximum distance a receiver can detect a signal, in two human-commensal species: the house sparrow (Passer domesticus) and house finch (Haemorhous mexicanus). We described urbanization both discretely and quantitatively to investigate the habitat characteristics most responsible for propagation changes. We found mixed support for our hypothesis of urban-specific degradation of songs. Urban songs propagated with higher amplitude; however, urban song fidelity was species-specific and showed lowered active space for urban house finch songs. Taken together, our results suggest that urban environments may constrain the propagation of vocal signals in species-specific manners. Ultimately, this has implications for the ability of urban birds to communicate with potential mates or kin.
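
As background on "active space": under a simple propagation model (spherical spreading plus a linear excess-attenuation term), the active space is the distance at which the received level falls to the detection threshold above background noise. The sketch below solves that balance numerically; the model form and all parameter values are illustrative assumptions, not the authors' measurements.

```python
import numpy as np
from scipy.optimize import brentq

def active_space_m(source_level_db, noise_level_db, detection_threshold_db,
                   excess_attenuation_db_per_m=0.005, d_max=5000.0):
    """Distance at which received level equals noise + detection threshold,
    assuming 20*log10(d) spreading loss plus linear excess attenuation."""
    def margin(d):
        received = (source_level_db - 20 * np.log10(d)
                    - excess_attenuation_db_per_m * d)
        return received - (noise_level_db + detection_threshold_db)
    return brentq(margin, 1.0, d_max)

# Hypothetical numbers: 85 dB source level at 1 m, 45 dB noise, 3 dB threshold
print(active_space_m(85, 45, 3))  # ~68 m under these made-up assumptions
```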


Subject(s)
Finches , Species Specificity , Urbanization , Vocalization, Animal , Animals , Vocalization, Animal/physiology , Finches/physiology , Sparrows/physiology , Noise , Sound Spectrography , Ecosystem , Humans , Perceptual Masking/physiology , Male
20.
J Acoust Soc Am ; 155(3): 2050-2064, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38477612

ABSTRACT

The study of humpback whale song using passive acoustic monitoring devices requires bioacousticians to manually review hours of audio recordings to annotate the signals. To vastly reduce manual annotation time through automation, a machine learning model was developed. Convolutional neural networks have made major advances in the previous decade, leading to a wide range of applications, including the detection of frequency-modulated vocalizations by cetaceans. A large dataset of over 60 000 audio segments of 4 s length was collected from the North Atlantic and used to fine-tune an existing model for humpback whale song detection in the North Pacific (see Allen, Harvey, Harrell, Jansen, Merkens, Wall, Cattiau, and Oleson (2021). Front. Mar. Sci. 8, 607321). Furthermore, different data augmentation techniques (time-shift, noise augmentation, and masking) were used to artificially increase the variability within the training set. Retraining and augmentation yielded F-scores of 0.88 on a context-window basis and 0.89 on an hourly basis, with false positive rates of 0.05 and 0.01, respectively. If necessary, use and retraining of the existing model are made convenient by a framework (AcoDet, acoustic detector) built during this project. Combining the tools provided by this framework could save researchers hours of manual annotation time and thus accelerate their research.
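
The three augmentation techniques named above (time-shift, noise augmentation, and masking) are easy to express on spectrogram segments. The sketch below shows one plausible form of each; parameter values are chosen arbitrarily rather than taken from the paper.

```python
import numpy as np

def augment_spectrogram(spec, rng, max_shift=20, noise_std=0.05, mask_width=8):
    """Apply a circular time-shift, additive Gaussian noise, and a frequency mask
    to one spectrogram segment (freq x time array)."""
    out = np.roll(spec, rng.integers(-max_shift, max_shift + 1), axis=1)  # time-shift
    out = out + rng.normal(0.0, noise_std, size=out.shape)               # noise
    f0 = rng.integers(0, max(1, out.shape[0] - mask_width))
    out[f0:f0 + mask_width, :] = 0.0                                     # frequency mask
    return out

rng = np.random.default_rng(0)
augmented = augment_spectrogram(np.random.rand(64, 173), rng)
```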


Subject(s)
Humpback Whale , Animals , Vocalization, Animal , Sound Spectrography , Time Factors , Seasons , Acoustics