ABSTRACT
We reflect on 30 years of the journal Evolutionary Computation. Taking the papers published in the first volume in 1993 as a springboard, we, the founding and current Editors-in-Chief, comment on the beginnings of the field, evaluate the extent to which the field has both grown and itself evolved, and provide our own perspectives on where the future lies.
Subject(s)
Biological Evolution, Editorial Policies
ABSTRACT
English [ɫ] exhibits retracted tongue dorsum and low F2 frequencies compared to Korean [l], but is frequently asserted to be perceptually similar to Korean [l] and therefore difficult for Korean learners to acquire due to articulatory transfer. This study examines the articulatory and acoustic characteristics of Korean and English word-final laterals produced by Korean learners. Korean learners' productions of English [ɫ] were systematically different from Korean [l], with retracted tongue dorsum and low F2 similar to L1 English [ɫ]. The findings suggest Korean learners form a distinct phonetic category for English [ɫ] rather than modifying an existing Korean category.
ABSTRACT
Many real-world problems involve massive amounts of data. Under these circumstances learning algorithms often become prohibitively expensive, making scalability a pressing issue to be addressed. A common approach is to perform sampling to reduce the size of the dataset and enable efficient learning. Alternatively, one customizes learning algorithms to achieve scalability. In either case, the key challenge is to obtain algorithmic efficiency without compromising the quality of the results. In this article we discuss a meta-learning algorithm (PSBML) that combines concepts from spatially structured evolutionary algorithms (SSEAs) with concepts from ensemble and boosting methodologies to achieve the desired scalability property. We present both theoretical and empirical analyses which show that PSBML preserves a critical property of boosting, specifically, convergence to a distribution centered around the margin. We then present additional empirical analyses showing that this meta-level algorithm provides a general and effective framework that can be used in combination with a variety of learning classifiers. We perform extensive experiments to investigate the trade-off achieved between scalability and accuracy, and robustness to noise, on both synthetic and real-world data. These empirical results corroborate our theoretical analysis, and demonstrate the potential of PSBML in achieving scalability without sacrificing accuracy.
Subject(s)
Algorithms, Artificial Intelligence, Computer Simulation, Theoretical Models, Factual Databases, Humans
ABSTRACT
BACKGROUND: Structural excursions of a protein at equilibrium are key to biomolecular recognition and function modulation. Protein modeling research is driven by the need to aid wet laboratories in characterizing equilibrium protein dynamics. In principle, structural excursions of a protein can be directly observed via simulation of its dynamics, but the disparate temporal scales involved in such excursions make this approach computationally impractical. On the other hand, an informative representation of the structure space available to a protein at equilibrium can be obtained efficiently via stochastic optimization, but this approach does not directly yield information on equilibrium dynamics. METHODS: We present here a novel methodology that first builds a multi-dimensional map of the energy landscape that underlies the structure space of a given protein and then queries the computed map for energetically-feasible excursions between structures of interest. An evolutionary algorithm builds such maps with a practical computational budget. Graphical techniques analyze a computed multi-dimensional map and expose interesting features of an energy landscape, such as basins and barriers. A path searching algorithm then queries a nearest-neighbor graph representation of a computed map for energetically-feasible basin-to-basin excursions. RESULTS: Evaluation is conducted on intrinsically-dynamic proteins of importance in human biology and disease. Visual statistical analysis of the maps of energy landscapes computed by the proposed methodology reveals features already captured in the wet laboratory, as well as new features indicative of interesting, unknown thermodynamically-stable and semi-stable regions of the equilibrium structure space. Comparison of maps and structural excursions computed by the proposed methodology on sequence variants of a protein sheds light on the role of equilibrium structure and dynamics in the sequence-function relationship. 
CONCLUSIONS: Applications show that the proposed methodology is effective at locating basins in complex energy landscapes and computing basin-basin excursions of a protein with a practical computational budget. While the actual temporal scales spanned by a structural excursion cannot be directly obtained due to the foregoing of simulation of dynamics, hypotheses can be formulated regarding the impact of sequence mutations on protein function. These hypotheses are valuable in instigating further research in wet laboratories.
Subject(s)
Computational Biology/methods, Protein Conformation, Proteins/chemistry, Algorithms, Cluster Analysis, Humans, Molecular Models, Thermodynamics
ABSTRACT
Phonological feature structure is inherently multidimensional, and decades' worth of research in acoustic phonetics has documented both the complex mappings between features and associated acoustic cues as well as the prosodic modulation of these mappings. Most previous studies have focused on how the mean values of acoustic cues vary in complex ways across multiple phonological dimensions, relying on strong assumptions of statistical independence and/or homogeneity of variance across acoustic measures. The present study probes these assumptions by exploring the mapping between phonological voicing, place, and manner features and 8 acoustic cues from tokens of 14 English consonants produced in onset and coda position. Multivariate linear models exhibiting a variety of feature-cue mappings and between-cue statistical relationships were fit to this corpus of acoustic data. Model comparisons indicate that the best statistical description of the data requires pervasive interactions between features with respect to both the locations and the shapes of phonological categories. The implications of these results for work on the production and perception of phonological contrasts are discussed.
ABSTRACT
Radiological incidental findings (IFs) are previously undetected abnormalities which are unrelated to the original indication for imaging and are unexpectedly discovered. In brain magnetic resonance imaging (MRI), the prevalence of IFs is increasing. By reviewing the literature on IFs in brain MRI performed for research purposes and discussing ethical considerations of IFs, this paper provides an overview of brain IF research results and factors contributing to inconsistencies and considers how the consent process can be improved from an ethical perspective. We found that despite extensive literature regarding IFs in research MRI of the brain, there are major inconsistencies in the reported prevalence, ranging from 1.3% to 99%. Many factors appear to contribute to this broad range: lack of a standardised definition, variance in participant demographics, heterogeneous MRI scanner strength and sequences, reporter variation and results classification. We also found significant discrepancies in the review, consent and clinical communication processes pertaining to the ethical nature of these studies. These findings have implications for future studies, particularly those involving artificial intelligence. Further research, particularly in relation to MRI brain IFs, would be useful to explore the generalisability of study results.
ABSTRACT
We have long known that characterizing protein structure is key to understanding protein function. Computational approaches have largely addressed a narrow formulation of the problem, seeking to compute one native structure from an amino-acid sequence. Now AlphaFold2 is shown to be able to reveal a high-quality native structure for many proteins. However, researchers over the years have argued for broadening our view to account for the multiplicity of native structures. We now know that many protein molecules switch between different structures to regulate interactions with molecular partners in the cell. Elucidating such structures de novo is exceptionally difficult, as it requires exploration of a possibly very large structure space in search of competing, near-optimal structures. Here we report on a novel stochastic optimization method capable of revealing very different structures for a given protein from knowledge of its amino-acid sequence. The method leverages evolutionary search techniques and adapts its exploration of the search space to balance between exploration and exploitation in the presence of a computational budget. In addition to demonstrating the utility of this method for identifying multiple native structures, we provide a benchmark dataset for researchers to continue work on this problem.
ABSTRACT
In hearing children, reading skills have been found to be closely related to phonological awareness. We used several standardized tests to investigate the reading and phonological awareness skills of 27 deaf school-age children who were experienced cochlear implant users. Approximately two-thirds of the children performed at or above the level of their hearing peers on the phonological awareness and reading tasks. Reading scores were found to be strongly correlated with measures of phonological awareness. These correlations remained the same when we statistically controlled for potentially confounding demographic variables such as age at testing and speech perception skills. However, these correlations decreased once we statistically controlled for vocabulary size. This finding suggests that lexicon size is a mediating factor in the relationship between the children's phonological awareness and reading skills, a finding that has also been reported for typically developing hearing children.
Subject(s)
Cochlear Implants, Cognition, Linguistics, Reading, Vocabulary, Adolescent, Child, Educational Status, Humans, Male, Persons with Hearing Impairments/rehabilitation
ABSTRACT
This paper explores the relationship between speaker normalization and dialectal identity in sociolinguistic data, examining a database of vowel formants collected from 88 monolingual American English speakers in Michigan's Upper Peninsula. Audio recordings of Finnish- and Italian-heritage American English speakers reading a passage and a word list were normalized using two normalization procedures. These algorithms are based on different concepts of normalization: Lobanov, which models normalization as based on experience with individual talkers, and Labov ANAE, which models normalization as based on experience with scale-factors inherent in acoustic resonators of all kinds. The two procedures yielded different results; while the Labov ANAE method reveals a cluster shifting of low and back vowels that correlated with heritage, the Lobanov procedure seems to eliminate this sociolinguistic variation. The difference between the two procedures lies in how they treat relations between formant changes, suggesting that dimensions of variation in the vowel space may be treated differently by different normalization procedures, raising the question of how anatomical variation and dialectal variation interact in the real world. The structure of the sociolinguistic effects found in the Labov ANAE normalized data, but not in the Lobanov normalized data, suggests that the Lobanov normalization does over-normalize formant measures and remove sociolinguistically relevant information.
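For orientation, the Lobanov procedure is commonly described as a per-speaker z-score: each formant value is centred on that speaker's mean for the formant and divided by that speaker's standard deviation. The sketch below illustrates only this general idea; the function name, the toy F1 values, and the choice of the sample (rather than population) standard deviation are assumptions, and the code is not the implementation used in the study.

```python
from statistics import mean, stdev


def lobanov_normalize(formant_values_hz):
    """Z-score one speaker's measurements of a single formant (e.g. all F1 values).

    Assumption: the sample standard deviation is used; some implementations
    use the population standard deviation instead.
    """
    mu = mean(formant_values_hz)
    sigma = stdev(formant_values_hz)
    return [(value - mu) / sigma for value in formant_values_hz]


# Hypothetical F1 measurements (Hz) for one speaker, for illustration only.
speaker_f1 = [300.0, 500.0, 700.0]
normalized_f1 = lobanov_normalize(speaker_f1)
print(normalized_f1)
```

Because every speaker is rescaled to the same mean and spread on each formant separately, any between-speaker difference that shows up as a shift in one formant's overall distribution is removed along with anatomical differences, which is one way to read the over-normalization concern raised in the abstract.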
Subject(s)
Language, Phonetics, Social Behavior, Speech/physiology, Verbal Behavior, Algorithms, Humans, Psycholinguistics, Reading, Speech Acoustics
ABSTRACT
BACKGROUND: Post-operative imaging aims to assess fracture reduction and fixation with better resolution than intraoperative fluoroscopy (IF). However, this routine practice may increase costs and delay the discharge of patients. The aim of this study is to assess the role of post-operative imaging in identifying patients that require a return to theatre following the use of IF. METHODS: A retrospective cohort study was conducted in a single health network comprising two hospitals over 1 year. All fracture fixations that required IF were included. Patients who had post-operative imaging were identified and complications requiring a return to theatre were obtained. Non-trauma patients and those who did not have IF were excluded. RESULTS: A total of 1319 patients had IF. Of these patients, 1131 patients had post-operative radiographs within 7 days of their operation. In total, 12 patients (1.1%) returned to theatre as a result of a finding identified in their post-operative imaging. The calculated number of X-rays required to be taken to identify a complication was 94. The main reasons identified for these cases to require a return to theatre despite having had IF included: (i) insufficient quality/views of IF, (ii) loss of position/new injury occurring in post-operative period and (iii) poor reduction/fixation demonstrated intraoperatively that was missed/accepted. CONCLUSION: The use of post-operative radiographs can identify significant complications despite the use of IF in trauma patients. However, further consideration needs to be made regarding the benefits and costs of this practice in evaluating its clinical effectiveness.
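The two headline figures in the RESULTS can be reproduced from the reported counts. The short sketch below assumes that the "number of X-rays required" is simply the number of imaged patients divided by the number who returned to theatre; the abstract does not state the formula, and the variable names are illustrative.

```python
# Counts reported in the abstract.
patients_imaged = 1131       # post-operative radiographs within 7 days
returned_to_theatre = 12     # returned because of a post-operative imaging finding

# Proportion of imaged patients who returned to theatre, as a percentage.
return_rate_percent = 100 * returned_to_theatre / patients_imaged

# Assumed definition: imaged patients per complication identified.
xrays_per_complication = patients_imaged / returned_to_theatre

print(f"Return rate: {return_rate_percent:.1f}%")
print(f"X-rays per identified complication: {xrays_per_complication:.0f}")
```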
Subject(s)
Internal Fracture Fixation, Fracture Fixation, Fluoroscopy, Humans, Retrospective Studies, X-Rays
ABSTRACT
Because they consist, in large part, of random turbulent noise, fricatives present a challenge to attempts to specify the phonetic correlates of phonological features. Previous research has focused on temporal properties, acoustic power, and a variety of spectral properties of fricatives in a number of contexts [Jongman et al., J. Acoust. Soc. Am. 108, 1252-1263 (2000); Jesus and Shadle, J. Phonet. 30, 437-467 (2002); Crystal and House, J. Acoust. Soc. Am. 83, 1553-1573 (1988a)]. However, no systematic investigation of the effects of focus and prosodic context on fricative production has been carried out. Manipulation of explicit focus can serve to selectively exaggerate linguistically relevant properties of speech in much the same manner as stress [de Jong, J. Acoust. Soc. Am. 97, 491-504 (1995); de Jong, J. Phonet. 32, 493-516 (2004); de Jong and Zawaydeh, J. Phonet. 30, 53-75 (2002)]. This experimental technique was exploited to investigate acoustic power along with temporal and spectral characteristics of American English fricatives in two prosodic contexts, to probe whether native speakers selectively attend to subsegmental features, and to consider variability in fricative production across speakers. While focus in general increased noise power and duration, speakers did not selectively enhance spectral features of the target fricatives.
Subject(s)
Phonetics, Speech Acoustics, Speech, Humans, Language, Motor Activity, Sound Spectrography, Speech Articulation Tests, Speech Perception
ABSTRACT
Stochastic search is often the only viable option to address complex optimization problems. Recently, evolutionary algorithms have been shown to handle challenging continuous optimization problems related to protein structure modeling. Building on recent work in our laboratories, we propose an evolutionary algorithm for efficiently mapping the multi-basin energy landscapes of dynamic proteins that switch between thermodynamically stable or semi-stable structural states to regulate their biological activity in the cell. The proposed algorithm balances computational resources between exploration and exploitation of the nonlinear, multimodal landscapes that characterize multi-state proteins via a novel combination of global and local search to generate a dynamically-updated, information-rich map of a protein's energy landscape. This new mapping-oriented EA is applied to several dynamic proteins and their disease-implicated variants to illustrate its ability to map complex energy landscapes in a computationally feasible manner. We further show that, given the availability of such maps, comparison between the maps of wildtype and variants of a protein allows for the formulation of a structural and thermodynamic basis for the impact of sequence mutations on dysfunction that may prove useful in guiding further wet-laboratory investigations of dysfunction and molecular interventions.
Subject(s)
Algorithms, Computational Biology/methods, Protein Conformation, Proteins/chemistry, Proteins/genetics, Humans, Molecular Models, Thermodynamics
ABSTRACT
The diversity of intrinsic dynamics observed in neurons may enhance the computations implemented in the circuit by enriching network-level emergent properties such as synchronization and phase locking. Large-scale spiking network models of entire brain regions offer a platform to test theories of neural computation and cognitive function, providing useful insights on information processing in the nervous system. However, a systematic in-depth investigation requires network simulations to capture the biological intrinsic diversity of individual neurons at a sufficient level of accuracy. The computationally efficient Izhikevich model can reproduce a wide range of neuronal behaviors qualitatively. Previous studies using optimization techniques, however, were less successful in quantitatively matching experimentally recorded voltage traces. In this article, we present an automated pipeline based on evolutionary algorithms to quantitatively reproduce features of various classes of neuronal spike patterns using the Izhikevich model. Employing experimental data from Hippocampome.org, a comprehensive knowledgebase of neuron types in the rodent hippocampus, we demonstrate that our approach reliably fits Izhikevich models to nine distinct classes of experimentally recorded spike patterns, including delayed spiking, spiking with adaptation, stuttering, and bursting. Importantly, by leveraging the parameter-exploration capabilities of evolutionary algorithms, and by representing qualitative spike pattern class definitions in the error landscape, our approach creates several suitable models for each neuron type, exhibiting appropriate feature variabilities among neurons. Moreover, we demonstrate the flexibility of our methodology by creating multi-compartment Izhikevich models for each neuron type in addition to single-point versions. Although the results presented here focus on hippocampal neuron types, the same strategy is broadly applicable to any neural system.
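To make concrete what is being fitted, the sketch below simulates the standard single-point Izhikevich model with forward-Euler integration and records spike times. It is a minimal illustration, not the pipeline described in the abstract: the function name, the time step, the input current, and the default parameters (the widely cited "regular spiking" set a=0.02, b=0.2, c=-65, d=8) are assumptions chosen for the example. In an evolutionary fitting setting, a, b, c and d would be the quantities under search, with spike-pattern features derived from the returned spike times feeding the error function.

```python
def simulate_izhikevich(a=0.02, b=0.2, c=-65.0, d=8.0,
                        current=10.0, t_max_ms=1000.0, dt_ms=0.5):
    """Simulate a single-point Izhikevich neuron and return spike times in ms.

    Model equations (v in mV, t in ms):
        dv/dt = 0.04*v^2 + 5*v + 140 - u + I
        du/dt = a*(b*v - u)
        if v >= 30: v <- c, u <- u + d
    """
    v = c          # membrane potential starts at the reset value
    u = b * v      # recovery variable starts on its nullcline
    spike_times = []
    for step in range(int(t_max_ms / dt_ms)):
        # Forward-Euler update (an assumption; other integrators are possible).
        v += dt_ms * (0.04 * v * v + 5.0 * v + 140.0 - u + current)
        u += dt_ms * a * (b * v - u)
        if v >= 30.0:                      # spike cutoff
            spike_times.append(step * dt_ms)
            v = c                          # reset membrane potential
            u += d                         # spike-triggered adaptation
    return spike_times


spikes = simulate_izhikevich()
silent = simulate_izhikevich(current=0.0)
print(len(spikes), "spikes with input;", len(silent), "spikes without input")
```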
ABSTRACT
Examining phonetic categorization in multidimensional stimulus spaces poses a number of practical problems. The traditional method of forced identification becomes prohibitive when the number and size of stimulus dimensions become increasingly large. In response, Evans and Iverson [J. Acoust. Soc. Am. 115, 352-361 (2004)] proposed an adaptive tracking algorithm for finding vowel best exemplars in a multidimensional space. This algorithm converged on best exemplars in a small number of trials; however, the search method was designed explicitly for vowel stimuli. In this paper, a more general multidimensional search algorithm is described, and results from simulations and experiments using the proposed algorithm are presented.
ABSTRACT
Stetson (1951) noted that repeating singleton coda consonants at fast speech rates causes them to be perceived as onset consonants affiliated with a following vowel. The current study documents the perception of rate-induced resyllabification, as well as what temporal properties give rise to the perception of syllable affiliation. Stimuli were extracted from a previous study of repeated stop + vowel and vowel + stop syllables (de Jong, 2001a, 2001b). Forced-choice identification tasks show that slow repetitions are clearly distinguished. As speakers increase rate, they reach a point after which listeners disagree as to the affiliation of the stop. This pattern is found for voiced and voiceless consonants using different stimulus extraction techniques. Acoustic models of the identifications indicate that the sudden shift in syllabification occurs with the loss of an acoustic hiatus between successive syllables. Acoustic models of the fast rate identifications indicate that various other qualities, such as consonant voicing, affect the probability that the consonants will be perceived as onsets. These results indicate a model of syllabic affiliation where specific juncture-marking aspects of the signal dominate parsing, and in their absence other differences provide additional, weaker cues to syllabic affiliation.
Subject(s)
Speech Perception/physiology, Speech/physiology, Adult, Female, Humans, Logistic Models, Male, Phonetics, Speech Acoustics, Speech Discrimination Tests, Time Factors
ABSTRACT
BACKGROUND: Many open problems in bioinformatics involve elucidating underlying functional signals in biological sequences. DNA sequences, in particular, are characterized by rich architectures in which functional signals are increasingly found to combine local and distal interactions at the nucleotide level. Problems of interest include detection of regulatory regions, splice sites, exons, hypersensitive sites, and more. These problems naturally lend themselves to formulation as classification problems in machine learning. When classification is based on features extracted from the sequences under investigation, success is critically dependent on the chosen set of features. METHODOLOGY: We present an algorithmic framework (EFFECT) for automated detection of functional signals in biological sequences. We focus here on classification problems involving DNA sequences, which state-of-the-art work in machine learning shows to be challenging and to involve complex combinations of local and distal features. EFFECT uses a two-stage process to first construct a set of candidate sequence-based features and then select the most effective subset for the classification task at hand. Both stages make heavy use of evolutionary algorithms to efficiently guide the search towards informative features capable of discriminating between sequences that contain a particular functional signal and those that do not. RESULTS: To demonstrate its generality, EFFECT is applied to three separate problems of importance in DNA research: the recognition of hypersensitive sites, splice sites, and ALU sites. Comparisons with state-of-the-art algorithms show that the framework is both general and powerful.
In addition, a detailed analysis of the constructed features shows that they contain valuable biological information about DNA architecture, allowing biologists and other researchers to directly inspect the features and potentially use the insights obtained to assist wet-laboratory studies on retainment or modification of a specific signal. Code, documentation, and all data for the applications presented here are provided for the community at http://www.cs.gmu.edu/~ashehu/?q=OurTools.
Subject(s)
Alu Elements/genetics, Computational Biology/methods, Automated Pattern Recognition/methods, RNA Splice Sites/genetics, Algorithms, Base Sequence, DNA/genetics, Nucleic Acid Regulatory Sequences, DNA Sequence Analysis, Signal Transduction/genetics
ABSTRACT
Associating functional information with biological sequences remains a challenge for machine learning methods. The performance of these methods often depends on deriving predictive features from the sequences sought to be classified. Feature generation is a difficult problem, as the connection between the sequence features and the sought property is not known a priori. It is often the task of domain experts or exhaustive feature enumeration techniques to generate a few features whose predictive power is then tested in the context of classification. This paper proposes an evolutionary algorithm to effectively explore a large feature space and generate predictive features from sequence data. The effectiveness of the algorithm is demonstrated on an important component of the gene-finding problem, DNA splice site prediction. This application is chosen due to the complexity of the features needed to obtain high classification accuracy and precision. Our results test the effectiveness of the obtained features in the context of classification by Support Vector Machines and show significant improvement in accuracy and precision over state-of-the-art approaches.
Subject(s)
Algorithms, Computational Biology/methods, DNA/chemistry, DNA Sequence Analysis/methods, Automated Pattern Recognition/methods, RNA Splicing
ABSTRACT
Hypersensitive (HS) sites in genomic sequences are reliable markers of DNA regulatory regions that control gene expression. Annotation of regulatory regions is important in understanding phenotypical differences among cells and diseases linked to pathologies in protein expression. Several computational techniques are devoted to mapping out regulatory regions in DNA by initially identifying HS sequences. Statistical learning techniques like Support Vector Machines (SVM), for instance, are employed to classify DNA sequences as HS or non-HS. This paper proposes a method to automate the basic steps in designing an SVM that improves the accuracy of such classification. The method proceeds in two stages and makes use of evolutionary algorithms. An evolutionary algorithm first designs optimal sequence motifs to associate explicit discriminating feature vectors with input DNA sequences. A second evolutionary algorithm then designs SVM kernel functions and parameters that optimally separate the HS and non-HS classes. Results show that this two-stage method significantly improves SVM classification accuracy. The method promises to be generally useful in automating the analysis of biological sequences, and we post its source code on our website.
Subject(s)
Algorithms, DNA/genetics, DNA Sequence Analysis/statistics & numerical data, Artificial Intelligence, Computational Biology, DNA/chemistry, DNA/classification, Deoxyribonuclease I, Molecular Evolution, Genetic Models, Transcriptional Regulatory Elements, Software
ABSTRACT
Kingston, Diehl, Kirk, and Castleman (Journal of Phonetics, 2008) present a sophisticated experimental design and detection theoretic analysis of the internal auditory structure of phonological contrasts. However, a potentially important aspect of multidimensional detection theory - the covariance structure of assumed underlying multivariate Gaussian perceptual densities - was left unexplored. We discuss Kingston, et al.'s approach in the context of a general definition of multidimensional d' and present a description of two distinct configurations of perceptual densities requiring fundamentally different interpretations that account equally well for the "mean-shift integrality" results reported by Kingston, et al. We end with a brief discussion of approaches to distinguishing these underlying configurations empirically.
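As background for the covariance point, one common textbook form of multidimensional d' for two Gaussian perceptual densities with a shared covariance matrix is the Mahalanobis distance between their means. The expression below is offered under that equal-covariance assumption and with illustrative symbols; it is not necessarily the exact definition the authors adopt.

```latex
% Assumed setup: stimuli A and B give rise to perceptual densities
% N(\mu_A, \Sigma) and N(\mu_B, \Sigma) sharing one covariance matrix \Sigma.
\[
  d' \;=\; \sqrt{(\boldsymbol{\mu}_A - \boldsymbol{\mu}_B)^{\mathsf{T}}
                 \,\boldsymbol{\Sigma}^{-1}\,
                 (\boldsymbol{\mu}_A - \boldsymbol{\mu}_B)}
\]
% In one dimension this reduces to the familiar d' = |\mu_A - \mu_B| / \sigma.
```

Under this form, the same d' value can arise either from a large separation of means with large or correlated variances or from a smaller separation with tighter densities, which is why the covariance structure matters for interpretation.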
ABSTRACT
The perception of voicing categories is affected by speaking rate, so that listeners' category boundaries on a VOT continuum shift to a lower value when syllable duration decreases [Miller and Volaitis, Percept. Psychophys. 46, 505-512 (1989); Volaitis and Miller, J. Acoust. Soc. Am. 92, 723-735 (1992)]. Previous rate normalization effects have been found using artificially varied stimuli. This study examines the effect of speech rate on voicing categorization in naturally produced rate-varied speech. The stimuli contained natural decreases in VOT with faster speech rates so that VOT values for /b/ and /p/ overlapped at the fastest rates. Consonant identification results showed that the rate effects on the perceptual boundary between /p/ and /b/ very closely matched the effects of rate on the productions, though there was a small mismatch with fast rate productions whereby voiced stops were systematically miscategorized as voiceless. Another group of listeners judged the goodness of the consonant, indicating that best exemplars were rate-varied and shifted away from the /p/-/b/ boundary. These results are discussed in light of exemplar-based and abstractionist models of speech perception.