The annual symposium “Credibility Assessment and Information Quality in Government and Business” was this year held on the 5th and 6th of January as part of the “Hawaii International Conference on System Sciences” (HICSS). The symposium on technology assisted deception detection was organised by Matthew Jensen, Thomas Meservy, Judee Burgoon and Jay Nunamaker. During this symposium, we presented our paper “to freeze or not to freeze” that was posted on this blog last week, together with a second paper on “mining bodily cues to deception” by Dr. Ronald Poppe. The talks were of very high quality and researchers described a wide variety of techniques and methods to detect deceit, including mouse clicks to detect online fraud, language use on social media and in fraudulent academic papers and the very impressive avatar that can screen passengers when going through airport border control. I have summarized the presentations for you; enjoy!
Monday 05-01-2015, 09.00-09.05
Introduction Symposium by Judee Burgoon
This symposium is organized annually during the HICSS conference and functions as a platform for presenting research on the use of technology to detect deceit. Burgoon started off by describing the different types of research conducted within the Center for the Management of Information (CMI) that she directs, and within the National Center for Border Security and Immigration. Within these centers, members aim to detect deception on a multi-modal scale using different types of technology and sensors. Their deception research includes physiological measures such as respiration and heart rate, kinesics (i.e., bodily movement), eye-movements such as pupil dilation, saccades, fixation, gaze and blinking, and research on timing, which is of particular interest for online deception. Burgoon’s team is currently working on the development of an Avatar (DHS sponsored): a system with different types of sensors that work together for screening purposes (e.g., border control; see abstracts below for more information). The Avatar is currently being tested at Reagan Airport. Sensors include a force platform, Kinect, HD and thermal cameras, oculometric cameras for eye-tracking, and a microphone for Natural Language Processing (NLP) purposes. Burgoon works together with the European border management organization Frontex.
Monday 09.05-09.40; Michael Byrd, Mark Grimes, Jim Marquardson, Judee Burgoon
The Misclassification of Deceptive Behavior
Usually deception researchers aim to identify differences between truth tellers and liars to increase the accuracy of their judgments. In the current study, the authors did not focus on detection rates, but instead looked at the misidentifications in prior research: why do we incorrectly classify someone as lying or telling the truth? Misidentifications can be false positives (i.e., truth tellers are identified as liars) and misses (liars are identified as truth tellers), and both types are important for different reasons. The authors re-analyzed prior data sets to investigate what caused the misidentifications in these studies. They used signal detection theory, and the objective was a classification test with no false positives. However, real signals can overlap, and achieving a perfect classification is therefore hard. The authors used machine-learning algorithms to automatically choose cutoff scores and thereby manipulate how many false positives and misses occurred. The study analyzed was a culture-sensitive study in which participants were instructed to lie. Data were collected with cameras to measure movement, and with equipment to measure vocalic and linguistic cues. The authors compared true positives (liars) with false positives (truth tellers identified as liars), and true negatives (truth tellers) with false negatives/misses (liars who were identified as truth tellers). Results showed that false positives were predominantly caused by cognitive effort, vocal pleasantness and average shrug duration, while misses were caused by a wide range of factors. This knowledge about the causes of misidentifications can be used to increase the accuracy of detection rates, and can serve as a safeguard against innocent people being identified as liars, with all the potential consequences.
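The trade-off between false positives and misses described in this talk can be illustrated with a minimal Python sketch. It is not the authors' method (they used machine-learning algorithms to choose cutoffs); the function name and the deception scores below are made up for illustration. The sketch picks the lowest cutoff that flags no truth teller and then counts how many liars are missed as a result.

```python
def strictest_cutoff(truth_scores, lie_scores):
    """Return the lowest cutoff with zero false positives, plus the resulting misses.

    Assumption: a higher score means "more likely deceptive", and a person is
    flagged as a liar only when their score is strictly above the cutoff.
    """
    cutoff = max(truth_scores)  # no truth teller scores above this value
    misses = sum(1 for score in lie_scores if score <= cutoff)
    return cutoff, misses


# Hypothetical scores; the two groups overlap, as real signals do.
truth_scores = [0.10, 0.25, 0.40, 0.55]
lie_scores = [0.35, 0.50, 0.70, 0.90]

cutoff, misses = strictest_cutoff(truth_scores, lie_scores)
print(f"cutoff={cutoff}, liars missed={misses} of {len(lie_scores)}")
```

Because the two score distributions overlap, insisting on zero false positives means some liars go undetected; lowering the cutoff would catch more liars at the cost of flagging truth tellers.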
To freeze or not to freeze: A motion-capture approach to detecting deceit
The authors presented a new robust signal for detecting deception: full body motion. They used an automatic analysis of bodily movements to determine how people behave when telling truths and lies. Previous research in this area was conducted using manual coding, which is not only time consuming but also open to issues of reliability and subjectivity. The results from these manually coded studies were often contradictory and typically had small effect sizes (i.e., if there was a difference, the difference was small). To circumvent these reliability issues, the authors measured movement automatically using two Xsens full-body motion-capture suits, which register movement 120 times per second in 17 sensors located across the body. These data allow for creating a 3D representation of the human body and the movements it displays over time, and were used to measure behavioral differences between truth tellers and liars. Suspects (n = 90) either completed (truth condition) or did not complete (lie condition) two tasks, and were subsequently interviewed about these two tasks by another participant (n = 90) of their own or a different culture. The authors discovered that full body motion (the sum of joint displacements) was indicative of lying approximately 75% of the time. Furthermore, movement was guilt-related, and occurred independently of anxiety, cognitive load and cultural background. Further analyses indicated that including individual limb data, in combination with appropriate questioning strategies, can increase its discriminatory power to 82.2%. This culture-sensitive study provides an objective and inclusive view of how people actually behave when lying. It appears that full body motion can be a robust nonverbal indicator of deceit, and the results suggest that lying does not cause people to freeze. However, should full body motion capture become a routine investigative technique, liars might freeze in order not to give themselves away; but this in itself should be a telltale.
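As a rough illustration of what "the sum of joint displacements" could look like computationally, here is a minimal Python sketch. It assumes each motion-capture frame is a list of (x, y, z) joint positions; the function name and the toy data are hypothetical and do not reproduce the Xsens data format or the authors' actual pipeline.

```python
import math


def full_body_motion(frames):
    """Sum the Euclidean displacement of every joint between consecutive frames.

    frames: list of frames, each a list of (x, y, z) joint positions.
    Assumption: joints appear in the same order in every frame.
    """
    total = 0.0
    for previous, current in zip(frames, frames[1:]):
        for before, after in zip(previous, current):
            total += math.dist(before, after)
    return total


# Toy recording: two joints, three frames (a real suit would give 17 sensors at 120 Hz).
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(3.0, 4.0, 0.0), (1.0, 0.0, 0.0)],
    [(3.0, 4.0, 0.0), (1.0, 0.0, 2.0)],
]

motion = full_body_motion(frames)
print(motion)
```

A single number per interview like this could then be compared between the truth and lie conditions; how the authors normalized for interview length is not described in the talk summary.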
Van Der Zee finished her talk by promoting Decepticon: The International Conference on Deceptive Behavior (24-26 August 2015, Cambridge UK), where everyone interested in the detection and prevention of deception will come together to discuss current work, future directions and how to connect theory to practice. People from both academia and industry are very welcome to attend and/or give a talk. The abstract submission deadline is April 1st.
Monday 10.45-11.20; Joseph Y. Thomas, David P. Biros
Theoretical validation of Interpersonal Deception Theory (IDT) in real-world, high-stakes deceptive speech
The authors used the real-world case of Mr. Perry to automatically evaluate the Interpersonal Deception Theory in a high-stakes situation. The results indicate that the sender (liar) changes their behavior based on interviewer suspicion, highlighting the importance of interactive processes when detecting deceit, as proposed in the Interpersonal Deception Theory by Buller and Burgoon (1994, 1996). The authors aim to create a database of real case studies for future research on behavioral changes during ecologically valid, high-stakes lies.
Monday 11.20-11.55; Victoria L. Rubin, Niall J. Conroy, Yimin Chen
Towards News Verification: Deception detection methods for news discourse
The authors used two different types of analysis to decide on the veracity of news messages. Although researchers have previously looked at written messages such as fake reviews and fake dating profiles, news messages have not yet been studied. However, news messages can have a large effect on people’s decision making and their worldview, which is why identifying false, fake or deliberately misleading news messages is of high importance. Previous research in the area of written messages has achieved detection rates of 74% using Natural Language Processing (NLP) and 70% with data mining. For their study, the authors identified news messages that were clearly “fake” based on items in the humorous “Bluff the listener” NPR radio show and contacted the journalists who wrote the articles. These articles were analyzed using Rhetorical Structure Theory (RST), developed by Mann & Thompson (1988), and vector space modeling. RST identifies how elements in a text (Elementary Discourse Units, EDUs) cohere as a discourse. A logistic regression between truthful and deceptive stories (normalized for length) yielded a 63% accuracy rate in detecting deceptive news messages using RST clustering (a total of 4 clusters appeared). However, the news messages used in this study were all humorous, so a next step would be to investigate differences between genuine and false news messages with a more neutral tone.
Face and head movement analysis using automated feature extraction software
Some researchers, for example Ekman, claim that micro (facial) expressions, often caused by different emotions experienced when lying and when telling the truth, work well to detect deceit. The authors wanted to test whether micro expressions can really be used to detect deceit, especially because a lot of money and effort has been spent on training law enforcement agents to spot micro expressions. For this purpose the authors used a Face Analysis System. Some existing systems are: CERT (developed by University of California, San Diego), IntraFace (developed at Carnegie Mellon University) and Rutgers. These programs usually give information on both specific muscle movements (e.g., brow lower, jaw drop) and the emotions derived from those movements (e.g., happiness, anxiety). The authors used the CERT system on an existing data set (neutral, charged and target questions about a mock crime, and additionally some Concealed Information Questions; participants were interviewed by a professional polygraph interviewer) to determine whether micro expressions can be used to distinguish between truth tellers and liars. They compared facial expressions during neutral, charged and target questions (within-subjects design) and between truth tellers and liars (between-subjects design). The data were recorded with several devices, such as video, audio, thermal cameras and a laser Doppler vibrometer, and facial expressions were recorded specifically. There were few results in the between-subjects analysis (i.e., between truth tellers and liars), but the within-subjects analysis did reveal differences in micro-expressions. In other words, people did change their behavior over time depending on whether they were telling the truth or lying. Several muscle movements were dependent on veracity condition, such as lip stretch, eye widen and lip corner pull.
The authors also discovered that several emotions that are thought to be related to lying, such as fear and anxiety, were not actually related in this sample. On the other hand, intensity was related to deceit, with liars being less expressive than truth tellers.
The discriminant validity of deception indicators: An analysis of data from multiple experiments
The authors reviewed three previous data sets (i.e., the soft biometrics study, the cultural benchmarks study and the stolen ring mock crime study) and identified a set of promising signals. They used a discriminant analysis to identify the cues that are most useful for classifying deception. Within these three studies, all participants were incentivized to deceive, and were subsequently interviewed while different types of data were gathered (e.g., audio for linguistic analysis and vocal analysis, body movements and eye-tracking). Logistic regressions were performed; 156 predictive models were created and summarized. Subsequently, non-helpful parameters were systematically dropped. The most promising preliminary results were blinks (duration, occurrence and blinks per frame), followed by head movement and some vocal features such as AVJ, FFlic and LieProb, and the variance in the triangle area. The linguistic cues that were predictive of deception were self-references and sensory ratio. In the soft biometrics study, an additional set of measures was recorded, and classification based on some of these measures worked very well, especially when decisions were based on mouse movement and keyboard dynamics, achieving accuracy rates of 95% for truths and 86% for lies.
Building the Human Firewall: Lessons learned from organizational anti-phishing initiatives (Research in progress)
Phishing is an often targeted, unique and costly type of deception. Phishing involves an attack message being sent to individuals; if the individual responds, this can have a range of negative consequences, including the installation of malware and the retrieval of confidential credentials. These attacks occur through legitimate channels such as email, so simply closing the channel is not an option. Individual anti-phishing strategies so far include automatic filtering, warning users and training users. However, phishing messages are still being let through. The authors aimed to investigate what organizations are currently doing, and what they can potentially do, against phishing attacks. In other words, the research is aimed at an organizational level rather than an individual level. To investigate what organizations are currently doing, structured interviews with IT security managers and directors are being conducted. Organizations are being classified based on the presence or absence of security documents, and are matched across industries and on security readiness levels. So far a small set of interviews has been gathered and analyzed, and more will follow.
Lessons learned from successive deception experiments
The authors realized that deception research is difficult; deception research often consists of long and complicated designs that need to resemble real life as much as possible, without being too similar to sanctioning. The authors wondered what they had learned from years of conducting deception experiments, so that they and other researchers can learn from previous “mistakes”. A previous border control study (Detecting Impostership through Soft Biometrics and Cognitive Metrics, 2013) involved creating over a hundred fake IDs. They started with 107 participants, but approx. 20% of the participants had to be removed due to design and study errors such as technical failures, difficulties that participants had with understanding the experiment, and participants refusing or confessing. This experiment led to an understanding of the problems that can occur during deception research, such as the design being too realistic or too difficult, causing people to refuse participation or to confess during the interview. The authors also realized there were several awkward and unrealistic moments during the experiment, and that they could do better, so they changed the study protocol. In the new study they tested 186 participants, and after equipment failure and the removal of confessions, they were left with 152 good data points. The biggest change they made was taking the human (experimenter) element out of the experiment and using a chat bot instead. The bot consisted of a picture, and the interaction was based on typed responses. Participants were instructed by one bot and interviewed by a different bot. This method led to a more consistent research scenario and more controlled circumstances. One piece of evidence for the improvement in the second study is the number of messages that participants left on the post-session questionnaire.
Participants also rated the second study as more realistic, reported being more dishonest in the second study (which is a good thing in a deception study), and fewer people confessed in the second study (from 11.8% in the first study to 6.4% in the second study). This second study shows how simplifying the research design and taking the human element out of the experiment can help to create an ecologically valid research design and to decrease the amount of data that has to be removed. Deception research will need to balance realism, precision and generalizability, and reducing contact with experimenters will help increase consistency and reduce participant confusion.
Tuesday 08.00-8.45; Joseph Valacich
Investigating the effect of fraud on mouse usage in human-computer interactions
Fraud is the second largest white-collar crime in the US and is costing society large sums of money. Unfortunately, identifying fraud can be expensive and slow, for example because fraud can range from making small changes to completely fabricating accounts. Previous fMRI research has shown that when someone’s state changes, their motor skills change as well. The authors were inspired by this research and built software to investigate whether there is a relationship between motor skills and cognitive processes in a fraud context. The software registers mouse movements, and the authors found that people behave differently when they are deceiving (e.g., when committing fraud) compared to when they are being honest. Importantly, committing fraud can be accompanied by feelings of doubt and uncertainty. People can feel a struggle between the desire to be good and the desire to benefit from committing fraud; a claim also made by Dan Ariely. The authors conducted a mock insurance claim experiment and found that this fraud-related indecisiveness shows in the fraudster’s mouse movements. Results revealed that the people who committed fraud made more left clicks, were slower and covered more distance in their mouse movements. Based on all features measured by the software, fraud was correctly detected in 92.8% of cases from mouse movements alone. Interestingly, the authors also created a plugin that talks to their web service, so it can register similar mouse movements when people are submitting online forms. This can potentially be used when people file job applications and insurance claims online. This line of research can be further developed by extending it to measuring swipe movements on phones and tablets. A live demonstration of the software followed.
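To make the three reported cues (left clicks, speed and distance covered) concrete, here is a minimal Python sketch of how such features could be derived from a log of mouse events. The event format, the function name and the sample data are assumptions for illustration only; the authors' software and its full feature set were not described in detail.

```python
import math


def mouse_features(events):
    """Summarize a mouse log into click count, path length, duration and speed.

    events: time-ordered list of (timestamp_seconds, x, y, left_click) tuples.
    This event format is hypothetical, not the format used by the authors.
    """
    distance = sum(
        math.dist((x0, y0), (x1, y1))
        for (_, x0, y0, _), (_, x1, y1, _) in zip(events, events[1:])
    )
    duration = events[-1][0] - events[0][0]
    return {
        "left_clicks": sum(1 for event in events if event[3]),
        "distance": distance,
        "duration": duration,
        "speed": distance / duration if duration > 0 else 0.0,
    }


# Toy log: three samples, two of them with a left click.
events = [(0.0, 0, 0, False), (1.0, 3, 4, True), (2.0, 3, 10, True)]
features = mouse_features(events)
print(features)
```

Features like these would be computed per form submission and then fed to a classifier; which classifier the authors used is not stated in this summary.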
Mining cues to deception
A significant body of literature has reported research on the potential correlates of deception and bodily behavior. The vast majority of these studies consider discrete bodily movements such as specific hand or head gestures. While differences in the number of such movements could be an indication of a subject’s veracity, they account for only a small proportion of all performed behavior. Such studies also fail to consider quantitative aspects of body movement: the precise movement direction, magnitude and timing are not taken into account. In this paper, we present and discuss the results of a systematic, bottom-up study of bodily correlates of deception. In other words, are there any behavioral differences between truth tellers and liars? The authors conducted a user experiment where subjects either were deceptive or spoke the truth. Their body movement was measured using motion capture suits yielding a large number of global and local movement descriptors. The results showed that the larger the time window was (varied from 1 second to the full 2.5 minutes), the better deception could be predicted. Therefore, longer samples will be more indicative of deceit than short statements. The authors also identified that arm movements were more indicative of deceit than leg movements, and features that involved 3 or more body parts were also more indicative of deceit than features based on 1 or 2 body parts. Importantly, no feature category by itself performed better than the total feature set. That means that movement patterns involving different limbs might be a viable way to detect deceit.
Tuesday 10.00-10.30; Lisa Crossley, Michael Woodworth, Pamela Black, Robert Hare
The dark triad of personality and computer-mediated communication: Differential outcomes for dark personalities in face-to-face versus online negotiations
The dark triad consists of (subclinical) psychopathy, (subclinical) narcissism and Machiavellianism. All three personality traits have been linked to enhanced manipulation skills and to increased criminal and antisocial behavior. The prevalence of psychopathy is about 1% in the normal population, about 4% in businesses (i.e., snakes in suits) and even higher (approx. 15-20%) in prison populations. The authors ran a study using people who scored low and high on assertiveness, and had other participants rate the vulnerability of those people, followed by questionnaires on dark triad traits. Participants who scored high on dark triad personality traits viewed other people overall (so regardless of their actual assertiveness) as more vulnerable and easier to take advantage of than people who scored low on the dark triad did. Interestingly, in the interview part of the study, psychopaths displayed very different behavior from non-psychopathic people; psychopaths moved more, and it looked like they were leading the interview even though they were the ones being interviewed. The authors hypothesize that these nonverbal behaviors can be used as a distraction, or to display confidence. The authors ran a second study (n = 205; females = 130) on the effect of dark triad personalities on negotiation outcome. Participants were instructed to negotiate about the purchase of a pair of concert tickets. Negotiations of at most 20 minutes were held either face-to-face (taking on average 9-10 minutes) or in computer-mediated interactions (the majority took the maximum of 20 minutes). Subsequently, participants filled out the SRP-4, NPI and Mach-IV questionnaires, which were averaged into a single dark triad score (based on z-scores). In general, sellers get more money in computer-mediated negotiations. So when buying something, do it face-to-face, and when selling something, use the computer for an optimal outcome.
Overall, people high on the dark triad were less successful in negotiations than people who scored low on the dark triad. People who scored high on the Dark Triad (DT) did worst in computer-mediated negotiations; the superficial charm associated with the dark triad might not translate well to computer-mediated interactions. Also, dark triad people can have problems with language use, which might reduce their charm online, where language is “all you’ve got” to make a good impression. This information is important for police interviews, and suggests that interviews with people who score high on the Dark Triad might be better conducted online. Future research will hopefully be conducted with actual psychopaths in a prison sample, rather than students who score relatively high on the dark triad questionnaires.
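The single dark triad score "based on z-scores" mentioned in this talk can be sketched as follows in Python. This is only an illustration of the general idea of standardizing each questionnaire and averaging per participant; the function names and the scores are invented, and the use of the population standard deviation is an assumption, since the talk did not specify how the z-scores were computed.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1 (population SD assumed)."""
    center, spread = mean(values), pstdev(values)
    return [(value - center) / spread for value in values]


def dark_triad_composite(srp, npi, mach):
    """Average the three z-scored questionnaire totals for each participant."""
    standardized = zip(z_scores(srp), z_scores(npi), z_scores(mach))
    return [mean(triple) for triple in standardized]


# Hypothetical questionnaire totals for three participants.
srp = [1.0, 2.0, 3.0]
npi = [10.0, 20.0, 30.0]
mach = [50.0, 60.0, 70.0]

composite = dark_triad_composite(srp, npi, mach)
print(composite)
```

Standardizing first keeps a questionnaire with a wide scoring range from dominating the average.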
Linguistic evidence of obfuscation in fraudulent research paper
Inspired by the uncovering of falsified papers by the Dutch psychologist Diederik Stapel, the authors wondered whether they would be able to identify false academic papers. There has been a steady increase in the number of papers that are being retracted. When scientists fake data, do they leave traces in their writing style? Previous research has looked at falsified financial reports, and found that fraudulent reports are less readable and less positive (to avoid overselling) than genuine reports. The authors used the software program LIWC (developed by James Pennebaker) to analyze language use in 253 fraudulent and 253 control papers (i.e., papers that were not retracted) that were matched on journal, key words, structure and year of publication. LIWC measures how often specific categories of words occur, and the authors made a distinction between the different sections of an academic paper (i.e., the introduction, methods, results and discussion sections). The authors found that fraudulent text is more obfuscated across all paper sections. Fraudulent papers were also written with more causal terms than genuine papers, but only in the results and discussion sections. Fraudulent papers used more positive emotion terms, but only in the results section. Interestingly, fraudulent papers were less readable than genuine papers in the introduction and discussion sections. In general, academic texts have low readability compared to other types of text, and this is especially the case when the paper is fraudulent. Fraudulent text also contains more jargon than genuine text in the introduction, methods and discussion sections. The last measure, abstraction, was higher in fraudulent papers across all sections. Based on the aforementioned features, the authors could predict fraudulent papers with an accuracy of 57.2% (obfuscation model).
In addition, the authors looked at the number of references used, and found that 3.5 more references were used in the fraudulent papers compared to genuine papers, and also that the more obfuscated a text was, the more references were used. In previous research the authors had also looked specifically at Diederik Stapel’s writing style in genuine (n = 25) and fabricated (n = 24) papers. They found similar differences in word use and also demonstrated that Stapel had on average listed fewer co-authors on fraudulent papers than on genuine papers.
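The kind of word-category counting described for this study can be illustrated with a small Python sketch. It is not LIWC itself (which uses its own proprietary dictionaries); the tiny list of causal terms, the function name and the example sentence are all hypothetical.

```python
import re

# Hypothetical toy category; real dictionaries are far larger.
CAUSAL_TERMS = {"because", "therefore", "hence", "cause", "effect"}


def category_rate(text, category):
    """Return the percentage of words in text that belong to the given category."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in category)
    return 100.0 * hits / len(words)


rate = category_rate("The effect occurred because the prime worked", CAUSAL_TERMS)
print(round(rate, 1))
```

Computing such rates separately for the introduction, methods, results and discussion sections would give one feature per category per section, which could then be compared between retracted and control papers.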
Understanding psychopathy and social media use with a linguistic lens
Psychopathy consists of callous affect (lack of certain emotions and empathy), erratic lifestyle (recklessness, for example shown by a high number of sexual partners), criminal tendency (prison prevalence 15-20%) and interpersonal manipulation (superficial charm and frequently manipulating others to get more resources). Although psychopaths are extremely skilled at manipulating, they are not particularly organized or cohesive, which can cause language difficulties. The authors interviewed psychopathic and non-psychopathic murderers and analyzed their language use. Psychopathic murderers were more disfluent, talked about the crime more in the past tense, more often mentioned cause and effect (“because”), and more often mentioned information about their lower-level needs (such as what they ate on the day of the murder), while other murderers talked more about higher-level needs. A second study (2013), using students who scored high and low on psychopathy, showed that psychopaths used more first person singular pronouns but fewer second person pronouns, so they talked more about themselves than about others. The authors wondered, with all the written information now available on social media, what this can tell us about psychopathic language use. Language use and word choice turn out to be good psychological markers of personality traits. In a third study, students wrote a positive/negative story and gave the experimenters access to their last 20 emails, texts and Facebook messages, followed by filling out a psychopathy questionnaire. In general, people on social media use fewer first person singular pronouns and more second person pronouns, and use more present tense and less past tense. On social media, the more psychopathic people are, the fewer references they make to others and the less readable their messages are, so these findings are in line with the previously discovered language difficulties and the focus on the self instead of others.
Also, the more psychopathic people are, the more they swear, display anger and create psychological distance. Interestingly, in the written stories (which are not interactive), the text written by more psychopathic participants was only less readable; the other effects only occurred when interacting with others (i.e., they were visible in social media but not in the written stories). This research shows that social media communication can give an interesting insight into psychopathic personality traits, and that this information is much richer than general written accounts. Lastly, a logistic regression indicated that whether people belonged to the lower or higher psychopathy quartile within this sample could be detected with an accuracy of 78.9%.
Hancock also promoted the Credibility Assessment and Intent Research Network (CAIRN), which will meet in Miami in February 2015. This meeting will include researchers such as James Pennebaker, Paul J. Taylor, Judee Burgoon, David Markowitz and Sophie Van Der Zee, and the attendees aim to re-analyze existing deception data files in a consistent manner to identify language indicators of deceit. This will allow for writing a meta-analysis in which the same set of language cues can be examined across a large set of studies, providing a reliable insight into the type of words people use when lying.
Tuesday 14.00-14.45; Elyse Golob
Avatar for automated passport control field trial at the Bucharest international airport
With this study, the authors aimed to close the gap between researchers and practitioners in the area of passport control. The kiosk avatar (also described in the introduction of this symposium) measures what is said, how it is said (vocal changes), eye and body movement, gaze and facial expressions. In addition, the avatar comprises an E-passport reader, fingerprint scanner and credit card reader. When the avatar interacts with a passenger, the passenger’s data are summarized and a risk assessment is made, which is shown on a tablet that the customs officer can check to decide whether or not to talk to the person a bit more. There is a line of Frontex studies on the use of technology to promote border security. The first study was the bomb study in 2010, in which the authors aimed to detect passengers carrying a bomb, with a 50%-50% base rate (the proportion of target people was reduced in later studies to resemble real life better, since far fewer than half of normal passengers will carry a bomb). Overall, the authors achieved an accuracy rate of 87%, with bomb carriers being correctly detected with an accuracy of 100%, and voice and eye-movements turned out to be the most important cues. Afterwards, feedback from the border guards was gathered and implemented in future designs. The second study (2011), conducted in Warsaw, Poland, involved travelling with false documents. Interviews were conducted by the avatar in 5 different languages, and it correctly detected imposters with a 94.7% accuracy rate. The feedback on this experiment was a call for higher stakes; customs officers also noted that the information on the tablet was too detailed, and preferred a “suspicious/not suspicious” division. The researchers later incorporated this feedback by using a color-coded decision: green, yellow and red. The next study was a football hooligan study in the Netherlands (2012), in which participants who strongly supported one of the involved teams received mock tickets for a Feyenoord-Ajax football match.
Some participants (the imposters) had to pretend to support the rival team to be “allowed to attend the game”. The avatar outperformed border guards and identified 85% of the imposters. The last Frontex study (2013; n = 140) involved a 3-day experiment in a passport control setting, and took place at Bucharest airport. The avatar (4th generation) now had an automatically adjustable height, and biometric and document scanners were added at the bottom. Because “real passengers” rather than participants were tested in this study, ground truth could not be established and accuracy rates could not be calculated. Also, there was a low likelihood of criminal activity or mal-intent, but it was interesting to see how real travellers interacted with the Avatar. Incidentally, within this real-world sample, 1 person was denied access to the country, and this person was flagged by the Avatar as well. In total, 2% of passengers were flagged as high risk (i.e., red) and 15% were flagged as low/medium risk (i.e., yellow). Because time is of the essence during border control screening (no one wants to miss their flight), interview length was an important feature. Interviews conducted by the Avatar were between 30 and 90 seconds long, which was not longer than an average passport control interview. Importantly, after the experiment passengers were asked how they experienced their interaction with the avatar: 90% liked it, 75% would use the avatar in the future and 96% found the avatar easy to use. In the future, voice recognition will need to be improved to avoid false positives, for example by having the avatar speak several languages so passengers do not have to respond in their second language, and the noise in the environment will somehow have to be reduced to better understand what the traveller is saying. Other future research will be on integrating the avatar with border guard decision-making, and on harmonizing the avatar with border control processes such as Smart Borders and ABCs.
The authors are aware that as soon as a new technology is implemented, people will try to use countermeasures, so research is needed on the type of countermeasures that can be expected. Finally, more, improved and new sensors can be added, and more contextual and tailored questions can be asked.