ABSTRACT
The current study aims to test whether faster recognition memory errors tend to result from stronger misleading retrieval, making them harder to correct in subsequent decisions than slower errors, and whether this pattern holds for both miss and false-alarm errors. We used a paradigm in which each single-item Old/New recognition decision was followed by a two-alternative forced-choice (2AFC) test between a target and a lure. Each 2AFC trial had one item that had just been tested for an Old/New judgment and one item that had not been previously tested. Across 183 participants, the RTs for single-item recognition errors were used to predict accuracy in the 2AFC test using a hierarchical logistic regression model. The results showed a relationship between error RT and subsequent 2AFC accuracy that was qualified by an interaction with error type. Slower miss responses were more likely to be corrected than faster misses, but no accuracy differences were observed between slower and faster false alarms. The implications of these findings are discussed as they relate to assumptions about memory processes underlying inaccurate retrieval, using the diffusion model and the two-high-threshold model as examples of accounts that explain errors in terms of misleading retrieval and failed retrieval, respectively.
Subject(s)
Memory; Recognition, Psychology; Humans; Recognition, Psychology/physiology; Judgment
ABSTRACT
People often express high confidence for misremembered sources. Starns and Ksander ([2016]. Item strength influences source confidence and alters source memory zROC slopes. Journal of Experimental Psychology: Learning, Memory, and Cognition, 42(3), 351-365; hereafter SK16) found that this happens more often when a person is highly confident in memory for the item itself, and that simply increasing item memory can increase high-confidence source errors. Under the decision heuristic account, this pattern emerges because strong item memories contaminate source judgments by promoting high confidence responses even when source evidence is relatively weak. Consequently, strengthening item memory is predicted to increase confidence for both correct and incorrect source responses; however, SK16 could not assess this key prediction because their item-strength manipulation also impaired source memory. We report two experiments with new item-strengthening manipulations designed to minimise source memory impairments. Results replicated the evidence for the decision heuristic account reported by SK16 and provided additional support by showing a boost in source confidence for both correct and error responses when item memory was strengthened without accompanying source impairments.
Subject(s)
Illusions; Recognition, Psychology; Humans; Judgment/physiology; Learning; Recognition, Psychology/physiology
ABSTRACT
In a standard eyewitness lineup scenario, a witness observes a culprit commit a crime and is later asked to identify the culprit from a set of faces, the lineup. Signal detection theory (SDT), a powerful modeling framework for analyzing data, has recently become a common way to analyze lineup data. The goal of this paper is to introduce a new R package, sdtlu (Signal Detection Theory - LineUp), that streamlines and automates the SDT analysis of lineup data. sdtlu provides functions to process lineup data, determine the best-fitting SDT parameters, compute model-based performance measures such as area under the curve (AUC) and diagnosticity, use bootstrapping to determine uncertainty intervals around these parameters and measures, and compare parameters across two different data sets. The package incorporates closed-form solutions for both simultaneous and sequential lineups that allow for model-based analyses without Monte Carlo simulation. Show-ups are also supported. The package can estimate the base-rate of lineups that include a guilty suspect when the guilt or innocence of each suspect in the data set is unknown, as in "real-world" lineups. The package can also produce a full set of graphs, including data and model-based ROC curves and the underlying SDT model.
Subject(s)
Recognition, Psychology; Signal Detection, Psychological; Crime; Humans; Mental Recall; ROC Curve
ABSTRACT
In this report, we evaluate single-item and forced-choice recognition memory for the same items and use the resulting accuracy and reaction time data to test the predictions of discrete-state and continuous models. For the single-item trials, participants saw a word and indicated whether or not it was studied on a previous list. The forced-choice trials had one studied and one non-studied word that both appeared in the earlier single-item trials and both received the same response. Thus, forced-choice trials always had one word with a previous correct response and one with a previous error. Participants were asked to select the studied word regardless of whether they previously called both words "studied" or "not studied." The diffusion model predicts that forced-choice accuracy should be lower when the word with a previous error had a fast versus a slow single-item RT, because fast errors are associated with more compelling misleading memory retrieval. The two-high-threshold (2HT) model does not share this prediction because all errors are guesses, so error RT is not related to memory strength. A low-threshold version of the discrete state approach predicts an effect similar to the diffusion model, because errors are a mixture of responses based on misleading retrieval and guesses, and the guesses should tend to be slower. Results showed that faster single-trial errors were associated with lower forced-choice accuracy, as predicted by the diffusion and low-threshold models.
Subject(s)
Choice Behavior/physiology; Models, Psychological; Psychomotor Performance/physiology; Reaction Time/physiology; Recognition, Psychology/physiology; Adult; Humans; Young Adult
ABSTRACT
In three experiments we explored cross-dimensional cuing effects in a multidimensional source encoding and retrieval paradigm. We employed a bias-controlled experimental method of source cuing at retrieval (Starns & Hicks, 2013) in an attempt to improve retrieval of location information indirectly by cuing gender information. Encoded words were situated on the left or right side of a computer monitor and associated with either a male or a female face. When multiple faces were used across the set of encoded words, reinstating the correct face at retrieval alongside an incorrect, opposite-gender face cue improved male/female source decisions for test words. However, this powerful test cue did not improve memory for the encoded location of the words, suggesting that within-dimension cuing does not produce cross-dimensional cuing. This null outcome was found when gender decisions were required (Experiments 1A and 2) or not required (Experiment 1B) prior to location decisions. Nor was cross-dimension cuing found when subjects were told to expect a source test of both gender and location information at retrieval (Experiment 2). Our findings reinforce prior work demonstrating that multiple context dimensions can be bound to item information without any direct binding between the contexts.
Subject(s)
Cues; Facial Recognition/physiology; Mental Recall/physiology; Spatial Memory/physiology; Adult; Humans; Sex Factors; Young Adult
ABSTRACT
In recognition memory, participants often fail to shift their response criterion within a test even when they see cues signaling whether they should expect weak or strong memory (e.g., Stretch & Wixted Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 1397-1410, 1998b). We contrasted two possible accounts for this failure to shift. The first assumes that shifting the criterion requires effortful processing, so participants are unwilling to make shifts even when they expect different levels of strength. The second assumes that participants are unwilling to decide which strength category is indicated by the cue for each trial, so their expectations for memory strength do not change across trials. Targets appeared in different test formats ("cues") depending on whether they were studied once (weak) or five times (strong), and lures were evenly divided between the two formats. Some participants had two response keys for "old" and "new" (2-key), and others had to use different keys to respond "old" for the two strength cues (3-key). The goal of the 3-key condition was to force participants to decide which strength cue was presented on each trial. The 3-key participants had a lower false alarm rate for lures shown with the strong than with the weak cue, but the 2-key participants showed no evidence of a criterion shift. Response times were unaffected by trial-by-trial criterion shifts. We conclude that participants willingly shift their response criterion on the basis of changes in expected strength, but they are unwilling to decide which strength to expect unless they are compelled to do this by other aspects of the task.
Subject(s)
Cues; Psychomotor Performance/physiology; Recognition, Psychology/physiology; Adult; Female; Humans; Male; Young Adult
ABSTRACT
Receiver operating characteristic (ROC) functions are often used to make inferences about memory processes, such as claiming that memory strength is more variable for studied versus nonstudied items. However, decision processes can produce the ROC patterns that are usually attributed to memory, so independent forms of data are needed to support strong conclusions. The present experiments tested ROC-based claims about the variability of memory evidence by modeling response time (RT) data with the diffusion model. To ensure that the model can correctly discriminate equal- and unequal-variance distributions, Experiment 1 used a numerosity discrimination task that had a direct manipulation of evidence variability. Fits of the model produced correct conclusions about evidence variability in all cases. Experiments 2 and 3 explored the effect of repeated learning trials on evidence variability in recognition and source memory tasks, respectively. Fits of the diffusion model supported the same conclusions about variability as the ROC literature. For recognition, evidence variability was higher for targets than for lures, but it did not differ on the basis of the number of learning trials for target items. For source memory, evidence variability was roughly equal for source 1 and source 2 items, and variability increased for items with additional learning attempts. These results demonstrate that RT modeling can help resolve ambiguities regarding the processes that produce different patterns in ROC data. The results strengthen the evidence that memory strength distributions have unequal variability across item types in recognition and source memory tasks.
Subject(s)
Decision Making/physiology; Memory/physiology; Reaction Time/physiology; Recognition, Psychology/physiology; Adult; Humans; Models, Psychological; Young Adult
ABSTRACT
In seven experiments, we explored the potential for strength-based, within-list criterion shifts in recognition memory. People studied a mix of target words, some presented four times (strong) and others studied once (weak). In Experiments 1, 2, 4A, and 4B, the test was organized into alternating blocks of 10, 20, or 40 trials. Each block contained lures intermixed with strong targets only or weak targets only. In strength-cued conditions, test probes appeared in a unique font color for strong and weak blocks. In the uncued conditions of Experiments 1 and 2, similar strength blocks were tested, but strength was not cued with font color. False alarms to lures were lower in blocks containing strong target words, as compared with lures in blocks containing weak targets, but only when strength was cued with font color. Providing test feedback in Experiment 2 did not alter these results. In Experiments 3A-3C, test items were presented in a random order (i.e., not blocked by strength). Of these three experiments, only one demonstrated a significant shift even though strength cues were provided. Overall, the criterion shift was larger and more reliable as block size increased, and the shift occurred only when strength was cued with font color. These results clarify the factors that affect participants' willingness to change their response criterion within a test list.
Subject(s)
Cues; Psychomotor Performance/physiology; Recognition, Psychology/physiology; Adult; Female; Humans; Male; Young Adult
ABSTRACT
Reinstating source details at test often has no impact on source memory. We tested the proposition that participants internally reinstate source cues when such cues are not provided by the experimenter, thus making the external cues redundant. Participants studied words paired with either a male or a female face and were later asked to specify the gender of the face studied with each word. To disrupt the ability to internally reinstate sources, some participants saw eight male faces and eight female faces throughout the study list (multiple-face condition), making it difficult to determine which face should be internally reinstated for uncued test trials. Other participants saw only a single face for each gender (single-face condition), which should facilitate internal reinstatement. Across three experiments, participants in the multiple-face condition showed improved source discrimination when the studied faces were reinstated at test, as compared to uncued trials. In contrast, participants in the single-face condition showed no effect of the face cues. Moreover, the cuing effect for the multiple-face condition disappeared when the test structure facilitated internal reinstatement. Overall, the experiments support the contention that internal reinstatement is a natural part of source retrieval that can mask the effects of external cues.
Subject(s)
Cues; Memory, Episodic; Mental Recall/physiology; Adult; Humans; Random Allocation; Young Adult
ABSTRACT
We present a method for measuring the efficacy of eyewitness identification procedures by applying fundamental principles of information theory. The resulting measure evaluates the expected information gain (EIG) for an identification attempt, a single value that summarizes an identification procedure's overall potential for reducing uncertainty about guilt or innocence across all possible witness responses. In a series of demonstrations, we show that EIG often disagrees with existing measures (e.g., diagnosticity ratios or area under the receiver operating characteristic) about the relative effectiveness of different identification procedures. Each demonstration is designed to highlight key distinctions between existing measures and EIG. An overarching theme is that EIG provides a complete measure of evidentiary value, in the sense that it factors in all aspects of identification performance. Collectively, these demonstrations show that EIG has substantial potential to inspire new discoveries in eyewitness research and provide a new perspective on policy recommendations for the use of identifications in real investigations. (PsycInfo Database Record (c) 2023 APA, all rights reserved).
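As a rough illustration of the idea of expected information gain, the sketch below treats it as the prior entropy about guilt minus the expected posterior entropy, averaged over all witness responses. This is the standard information-theoretic definition and is an assumption here; the abstract does not give the authors' exact formula. The function names and the response probabilities in the usage note are hypothetical.

```python
import math


def entropy_bits(p_guilty):
    """Shannon entropy (in bits) of a guilty/innocent variable."""
    if p_guilty <= 0.0 or p_guilty >= 1.0:
        return 0.0
    q = 1.0 - p_guilty
    return -(p_guilty * math.log2(p_guilty) + q * math.log2(q))


def expected_information_gain(prior_guilty, p_resp_given_guilty, p_resp_given_innocent):
    """Prior entropy minus expected posterior entropy across responses.

    The two lists give the probability of each possible witness response
    (e.g., suspect ID, filler ID, rejection) for guilty and innocent suspects.
    """
    expected_posterior_entropy = 0.0
    for p_g, p_i in zip(p_resp_given_guilty, p_resp_given_innocent):
        p_resp = prior_guilty * p_g + (1.0 - prior_guilty) * p_i
        if p_resp == 0.0:
            continue  # a response that never occurs contributes nothing
        posterior_guilty = prior_guilty * p_g / p_resp  # Bayes' rule
        expected_posterior_entropy += p_resp * entropy_bits(posterior_guilty)
    return entropy_bits(prior_guilty) - expected_posterior_entropy
```

With hypothetical inputs, a procedure whose responses are equally likely for guilty and innocent suspects yields zero gain, and a perfectly diagnostic procedure recovers the full prior entropy (1 bit at a prior of 0.5).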
Subject(s)
Criminal Law; Mental Recall; Humans; Criminal Law/methods; Uncertainty; Information Theory; Outcome Assessment, Health Care
ABSTRACT
We tested two explanations for why the slope of the z-transformed receiver operating characteristic (zROC) is less than 1 in recognition memory: the unequal-variance account (target evidence is more variable than lure evidence) and the dual-process account (responding reflects both a continuous familiarity process and a threshold recollection process). These accounts are typically implemented in signal detection models that do not make predictions for response time (RT) data. We tested them using RT data and the diffusion model. Participants completed multiple study/test blocks of an "old"/"new" recognition task with the proportion of targets and the test varying from block to block (.21, .32, .50, .68, or .79 targets). The same participants completed sessions with both speed-emphasis and accuracy-emphasis instructions. zROC slopes were below one for both speed and accuracy sessions, and they were slightly lower for speed. The extremely fast pace of the speed sessions (mean RT=526) should have severely limited the role of the slower recollection process relative to the fast familiarity process. Thus, the slope results are not consistent with the idea that recollection is responsible for slopes below 1. The diffusion model was able to match the empirical zROC slopes and RT distributions when between-trial variability in memory evidence was greater for targets than for lures, but missed the zROC slopes when target and lure variability were constrained to be equal. Therefore, unequal variability in continuous evidence is supported by RT modeling in addition to signal detection modeling. Finally, we found that a two-choice version of the RTCON model could not accommodate the RT distributions as successfully as the diffusion model.
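The unequal-variance account can be illustrated with a small signal detection sketch. Assuming Gaussian evidence distributions (a standard assumption in these models, not a detail taken from the abstract), the zROC is a straight line whose slope equals the ratio of lure to target standard deviations, so more variable target evidence gives a slope below 1. The parameter values and function names are hypothetical.

```python
from statistics import NormalDist


def zroc_points(mu_target, sigma_target, criteria, mu_lure=0.0, sigma_lure=1.0):
    """Return (z(false-alarm rate), z(hit rate)) for each response criterion."""
    z = NormalDist().inv_cdf
    target = NormalDist(mu_target, sigma_target)
    lure = NormalDist(mu_lure, sigma_lure)
    points = []
    for c in criteria:
        hit_rate = 1.0 - target.cdf(c)        # P("old" | target)
        false_alarm_rate = 1.0 - lure.cdf(c)  # P("old" | lure)
        points.append((z(false_alarm_rate), z(hit_rate)))
    return points


def zroc_slope(points):
    """Slope between the first and last zROC points (the line is straight here)."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    return (y1 - y0) / (x1 - x0)


# Hypothetical values: target evidence 25% more variable than lure evidence.
criteria = [-0.5, 0.0, 0.5, 1.0, 1.5]
slope_unequal = zroc_slope(zroc_points(1.0, 1.25, criteria))
slope_equal = zroc_slope(zroc_points(1.0, 1.0, criteria))
```

Under these assumed values the unequal-variance slope is 1/1.25 = 0.8, while the equal-variance slope is 1.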
Subject(s)
Models, Psychological; ROC Curve; Reaction Time; Adult; Humans; Psychomotor Performance; Recognition, Psychology; Signal Detection, Psychological
ABSTRACT
Criss (Cognitive Psychology 59:297-319, 2009) reported that subjective ratings of memory strength showed a mirror effect pattern in which strengthening the studied words increased ratings for targets and decreased ratings for lures. She interpreted the effect on lure items as evidence for differentiation, a process whereby lures produce a poorer match to strong than to weak memory traces. However, she also noted that participants might use different mappings between memory evidence and levels of the rating scale when they expected strong versus weak targets; that is, the effect might be produced by decision processes rather than differentiation. We report two experiments designed to distinguish these accounts. Some participants studied pure lists of weak or strong items (presented once or five times, respectively), while others studied mixed lists of half weak and half strong items. The participants from both groups had pure-strength tests: Only strong or only weak items were tested, and the participants were informed of which it would be before the test. The results showed that strength ratings for lures were lower when strong versus weak targets were tested, regardless of whether the study list was pure or mixed. In the mixed-study condition, the effect was produced even after identical study lists, and thus the same degree of differentiation in the studied traces. Therefore, our results suggest that the strength-rating mirror effect is produced by changes in decision processes.
Subject(s)
Mental Recall/physiology; Recognition, Psychology/physiology; Adult; Decision Making/physiology; Female; Humans; Male; Models, Psychological; Random Allocation; Young Adult
ABSTRACT
We explored a two-stage recognition memory paradigm in which people first make single-item "studied"/"not studied" decisions and then have a chance to correct their errors in forced-choice trials. Each forced-choice trial included one studied word ("target") and one nonstudied word ("lure") that received the same previous single-item response. For example, a studied-studied trial would have a target that was correctly called "studied" and a lure that was incorrectly called "studied." The two-high-threshold (2HT) model and the unequal-variance signal detection (UVSD) model predict opposite effects of biasing the initial single-item responses on subsequent forced-choice accuracy. Results from two experiments showed that the bias effect is actually near zero and well out of the range of effects predicted by either model. Follow-up analyses suggested that the model failures were not a function of experiment artifacts like changing memory states between the two types of recognition trials. Follow-up analyses also showed that the dual process signal detection model made better predictions for the forced-choice data than 2HT and UVSD models. (PsycInfo Database Record (c) 2022 APA, all rights reserved).
Subject(s)
Models, Psychological; Recognition, Psychology; Humans; Recognition, Psychology/physiology; Probability; Bias; Databases, Factual
ABSTRACT
The present study tested diffusion models of processing in the flanker task, in which participants identify a target that is flanked by items that indicate the same (congruent) or opposite response (incongruent). Single- and dual-process flanker models were implemented in a diffusion-model framework and tested against data from experiments that manipulated response bias, speed/accuracy tradeoffs, attentional focus, and stimulus configuration. There was strong mimicry among the models, and each captured the main trends in the data for the standard conditions. However, when more complex conditions were used, a single-process spotlight model captured qualitative and quantitative patterns that the dual-process models could not. Since the single-process model provided the best balance of fit quality and parsimony, the results indicate that processing in the simple versions of the flanker task is better described by gradual rather than discrete narrowing of attention.
Subject(s)
Attention/physiology; Visual Perception/physiology; Adult; Humans; Neuropsychological Tests; Reaction Time/physiology
ABSTRACT
The author compared high- and low-threshold discrete-state models of recognition memory in terms of their ability to account for confidence and response time (RT) data. The 2-high threshold (2HT), 1-low threshold (1LT), and 2-low threshold (2LT) models were clearly distinguished by the commonly observed inverted-U pattern whereby RTs are longer for low-confidence than high-confidence responses on both sides of the confidence scale (correct responses and errors). The 2HT model was able to match the RT-confidence relationship for correct responses, but it was unable to match the same relationship for errors. The 1LT model could not match the RT-confidence relationship for either correct responses or errors. Only the 2LT model was able to match the full pattern. The differences between models were driven by their fundamental assumptions about memory retrieval: only the 2-threshold models could produce an RT-confidence relationship by mixing relatively fast responses from a detection state with relatively slow responses from an uncertain ("guess") state, and only the 2LT model could do so for both correct and error responses because it allows misleading detection. Quantitative fits also showed that the 1LT model could not account for changes in confidence-rating distributions across memory-strength conditions, and thus this model performed substantially worse than the other two models even when RT data were not considered. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
Subject(s)
Memory; Models, Psychological; Reaction Time; Recognition, Psychology; Uncertainty; Humans
ABSTRACT
Does the speed of single-item recognition errors predict performance in subsequent two-alternative forced-choice (2AFC) trials that include an item with a previous error response? Starns, Dubé, and Frelinger found effects of this kind in two experiments and accounted for them in terms of a continuous memory-strength signal guiding recognition decisions. However, the effects of error speed might just as well reflect only an artefact of an error-correction strategy that uses response latency as a heuristic cue to guide 2AFC responses, elicited through confounding factors in their experimental design such as error-correction instructions and feedback. Using two conditions (each n = 130), a replication condition that followed the procedure of Starns et al. and an extension condition that controlled for the named shortcomings, we replicated the error speed effect. In both conditions, the speed of errors in a single-item recognition task was predictive of subsequent 2AFC performance on trials that included the respective error item. To be more precise, fast errors were associated with decreased 2AFC performance. As there was no interaction with the factor condition, the results support the idea that speed of single-item recognition responses reflects the amount of memory information underlying the respective response rather than being used for a simple error-correction strategy to improve 2AFC performance.
Subject(s)
Recognition, Psychology; Choice Behavior; Humans; Memory; Reaction Time
ABSTRACT
Purpose: The purpose of this study was to develop and pilot a novel treatment framework called BEARS (Balancing Effort, Accuracy, and Response Speed). People with aphasia (PWA) have been shown to maladaptively balance speed and accuracy during language tasks. BEARS is designed to train PWA to balance speed-accuracy trade-offs and improve system calibration (i.e., to adaptively match system use with its current capability), which was hypothesized to improve treatment outcomes by maximizing retrieval practice and minimizing error learning. In this study, BEARS was applied in the context of a semantically oriented anomia treatment based on semantic feature verification (SFV). Method: Nine PWA received 25 hr of treatment in a multiple-baseline single-case series design. BEARS + SFV combined computer-based SFV with clinician-provided BEARS metacognitive training. Naming probe accuracy, efficiency, and proportion of "pass" responses on inaccurate trials were analyzed using Bayesian generalized linear mixed-effects models. Generalization to discourse and correlations between practice efficiency and treatment outcomes were also assessed. Results: Participants improved on naming probe accuracy and efficiency of treated and untreated items, although untreated item gains could not be distinguished from the effects of repeated exposure. There were no improvements on discourse performance, but participants demonstrated improved system calibration based on their performance on inaccurate treatment trials, with an increasing proportion of "pass" responses compared to paraphasia or timeout nonresponses. In addition, levels of practice efficiency during treatment were positively correlated with treatment outcomes, suggesting that improved practice efficiency promoted greater treatment generalization and improved naming efficiency.
Conclusions: BEARS is a promising, theoretically motivated treatment framework for addressing the interplay between effort, accuracy, and processing speed in aphasia. This study establishes the feasibility of BEARS + SFV and provides preliminary evidence for its efficacy. This study highlights the importance of considering processing efficiency in anomia treatment, in addition to performance accuracy. Supplemental Material: https://doi.org/10.23641/asha.14935812.
Subject(s)
Ursidae; Animals; Anomia/therapy; Bayes Theorem; Humans; Language Therapy; Reaction Time; Semantics; Treatment Outcome
ABSTRACT
BACKGROUND: The majority of eyewitness lineup studies are laboratory-based. How well the conclusions of these studies, including the relationship between confidence and accuracy, generalize to real-world police lineups is an open question. Signal detection theory (SDT) has emerged as a powerful framework for analyzing lineups that allows comparison of witnesses' memory accuracy under different types of identification procedures. Because the guilt or innocence of a real-world suspect is generally not known, however, it is further unknown precisely how the identification of a suspect should change our belief in their guilt. The probability of guilt after the suspect has been identified, the posterior probability of guilt (PPG), can only be meaningfully estimated if we know the proportion of lineups that include a guilty suspect, P(guilty). Recent work used SDT to estimate P(guilty) on a single empirical data set that shared an important property with real-world data; that is, no information about the guilt or innocence of the suspects was provided. Here we test the ability of the SDT model to recover P(guilty) on a wide range of pre-existing empirical data from more than 10,000 identification decisions. We then use simulations of the SDT model to determine the conditions under which the model succeeds and, where applicable, why it fails. RESULTS: For both empirical and simulated studies, the model was able to accurately estimate P(guilty) when the lineups were fair (the guilty and innocent suspects did not stand out) and identifications of both suspects and fillers occurred with a range of confidence levels. Simulations showed that the model can accurately recover P(guilty) given data that matches the model assumptions. 
The model failed to accurately estimate P(guilty) under conditions that violated its assumptions; for example, when the effective size of the lineup was reduced, either because the fillers were selected to be poor matches to the suspect or because the innocent suspect was more familiar than the guilty suspect. The model also underestimated P(guilty) when a weapon was shown. CONCLUSIONS: Depending on lineup quality, estimation of P(guilty) and, relatedly, PPG, from the SDT model can range from poor to excellent. These results highlight the need to carefully consider how the similarity relations between fillers and suspects influence identifications.
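The dependence of the posterior probability of guilt (PPG) on P(guilty) follows directly from Bayes' rule. The sketch below is a minimal illustration, not the SDT model described in the abstract: the function name is hypothetical, and the suspect-identification rates in the usage lines are made-up values chosen only to show how the same identification evidence yields different PPG values under different base rates.

```python
def posterior_probability_of_guilt(p_guilty, p_id_given_guilty, p_id_given_innocent):
    """Bayes' rule: probability the suspect is guilty given that they were identified.

    p_guilty            -- base rate of lineups containing a guilty suspect
    p_id_given_guilty   -- probability a guilty suspect is identified
    p_id_given_innocent -- probability an innocent suspect is identified
    """
    guilty_and_identified = p_guilty * p_id_given_guilty
    innocent_and_identified = (1.0 - p_guilty) * p_id_given_innocent
    return guilty_and_identified / (guilty_and_identified + innocent_and_identified)


# Hypothetical identification rates, evaluated at two different base rates.
ppg_even_base_rate = posterior_probability_of_guilt(0.5, 0.6, 0.1)
ppg_low_base_rate = posterior_probability_of_guilt(0.2, 0.6, 0.1)
```

With these assumed rates, lowering P(guilty) from 0.5 to 0.2 lowers the PPG, which is why PPG cannot be meaningfully estimated without an estimate of the base rate.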
Subject(s)
Criminals; Facial Recognition; Judgment; Models, Theoretical; Recognition, Psychology; Signal Detection, Psychological; Adult; Datasets as Topic; Guilt; Humans; Probability
ABSTRACT
Purpose: Aphasia is a language disorder caused by acquired brain injury, which generally involves difficulty naming objects. Naming ability is assessed by measuring picture naming, and models of naming performance have mostly focused on accuracy and excluded valuable response time (RT) information. Previous approaches have therefore ignored the issue of processing efficiency, defined here in terms of optimal RT cutoff, that is, the shortest deadline at which individual people with aphasia produce their best possible naming accuracy performance. The goals of this study were therefore to (a) develop a novel model of aphasia picture naming that could accurately account for RT distributions across response types; (b) use this model to estimate the optimal RT cutoff for individual people with aphasia; and (c) explore the relationships between optimal RT cutoff, accuracy, naming ability, and aphasia severity. Method: A total of 4,021 naming trials across 10 people with aphasia were scored for accuracy and RT onset. Data were fit using a novel ex-Gaussian multinomial RT model, which was then used to characterize individual optimal RT cutoffs. Results: Overall, the model fitted the empirical data well and provided reliable individual estimates of optimal RT cutoff in picture naming. Optimal cutoffs ranged between approximately 5 and 10 s, which has important implications for assessment and treatment. There was no direct relationship between aphasia severity, naming RT, and optimal RT cutoff. Conclusion: The multinomial ex-Gaussian modeling approach appears to be a promising and straightforward way to estimate optimal RT cutoffs in picture naming in aphasia. Limitations and future directions are discussed.
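For readers unfamiliar with the ex-Gaussian distribution, the sketch below shows only its basic construction: an ex-Gaussian RT is the sum of a normal component and an independent exponential component, so its mean is mu + tau. This is not the authors' multinomial RT model; the parameter values (in seconds) and function names are hypothetical and chosen purely for illustration.

```python
import random


def sample_ex_gaussian(mu, sigma, tau, rng):
    """Draw one ex-Gaussian value: normal(mu, sigma) plus exponential with mean tau."""
    return rng.gauss(mu, sigma) + rng.expovariate(1.0 / tau)


def proportion_within_deadline(rts, deadline):
    """Share of simulated responses that arrive at or before the deadline."""
    return sum(rt <= deadline for rt in rts) / len(rts)


# Hypothetical parameters in seconds; the seeded generator keeps the run repeatable.
rng = random.Random(0)
rts = [sample_ex_gaussian(1.5, 0.3, 2.0, rng) for _ in range(20000)]
mean_rt = sum(rts) / len(rts)
```

Comparing `proportion_within_deadline(rts, 5)` with `proportion_within_deadline(rts, 10)` shows how a long exponential tail leaves a share of responses beyond a short deadline, which is the kind of trade-off an RT cutoff has to balance.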
Subject(s)
Aphasia/psychology; Language Tests/standards; Models, Statistical; Reaction Time; Aged; Anomia/psychology; Female; Humans; Male; Middle Aged; Normal Distribution; Reference Standards
ABSTRACT
A new model for confidence judgments in recognition memory is presented. In the model, the match between a single test item and memory produces a distribution of evidence, with better matches corresponding to distributions with higher means. On this match dimension, confidence criteria are placed, and the areas between the criteria under the distribution are used as drift rates to drive racing Ornstein-Uhlenbeck diffusion processes. The model is fit to confidence judgments and quantile response times from two recognition memory experiments that manipulated word frequency and speed versus accuracy emphasis. The model and data show that the standard signal detection interpretation of z-transformed receiver operating characteristic (z-ROC) functions is wrong. The model also explains sequential effects in which the slope of the z-ROC function changes by about 10% as a function of the prior response in the test list.