Published July 9, 2024 | Version v1
Dataset Open

Separable processes for live "in-person" and "zoom-like" faces

Authors/Creators

  • Yale School of Medicine

Description

Increased reliance on Zoom-like (webcam) platforms for interpersonal communications has raised the question of how this new virtual format compares to real face-to-face interactions. This question is also relevant to current models of face processing. Neural coding of simulated faces engages feature-selective processes in the ventral visual stream, and two-person live face-to-face interactions engage additional face processes in the lateral and dorsal visual streams. However, it is not known whether or how live in-person face processing differs from live virtual face processing when the faces and tasks are essentially the same. Current views of functional specificity predict no neural difference between the virtual and live conditions. Here we compare the same live faces viewed both over a video format and in person with measures of functional near-infrared spectroscopy (fNIRS), eye tracking, pupillometry, and electroencephalography (EEG). Neural activity was increased in dorsal regions for in-person face gaze and was increased in ventral regions for virtual face gaze. Longer dwell times on the face, increased arousal indexed by pupil diameter, increased neural oscillation power in the theta band, and increased cross-brain coherence were also observed for the in-person face condition. These findings highlight the fundamental importance of real faces and natural interactions for models of face processing.

Notes

Funding provided by: National Institute of Mental Health
ROR ID: https://ror.org/04xeg9z08
Award Number: R01MH107513

Funding provided by: National Institute of Mental Health
ROR ID: https://ror.org/04xeg9z08
Award Number: R01MH119430

Funding provided by: National Institute of Mental Health
ROR ID: https://ror.org/04xeg9z08
Award Number: R01MH111629

Funding provided by: China Scholarship Council
ROR ID: https://ror.org/04atp4p48
Award Number: 201906140133

Funding provided by: National Institute on Deafness and Other Communication Disorders
ROR ID: https://ror.org/04mhx6838
Award Number: R37HD090153

Methods

Participants

Sample size was determined by a power analysis based on prior face gaze experiments (Noah et al., 2020) in which peak brain activations between task and rest in the rTPJ were 0.00055 ± 0.0003 and the effect size (signal difference/standard deviation) was 0.534. Using the "pwr" package of R statistical software (Champely, 2020) at a significance level of p < 0.05, a sample of 23 participants is required to achieve the conventional power of 0.80. Our sample size of 28 exceeds that standard.
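
For reference, the calculation can be reproduced with the minimal sketch below. The original analysis used the R "pwr" package; here the statsmodels power module is used as an equivalent, and the one-sample, one-tailed test type is an assumption chosen because it reproduces the reported requirement of 23 participants.

```python
# Minimal sketch of the sample-size calculation; test type is an assumption.
from statsmodels.stats.power import TTestPower

effect_size = 0.534   # signal difference / standard deviation (Noah et al., 2020)
required_n = TTestPower().solve_power(effect_size=effect_size, alpha=0.05,
                                      power=0.80, alternative="larger")
print(f"required sample size: {required_n:.1f}")   # ~23 participants
```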

All participants provided written informed consent in accordance with guidelines approved by the Yale University Human Investigation Committee (HIC # 1501015178). Dyads were assigned in order of recruitment, and participants were either strangers before the experiment or casually acquainted. Participants were not stratified further by affiliation or dyad gender mix. Six pairs were mixed gender, six pairs were female-female, and two pairs were male-male. 

Paradigm

Each dyad participated in two tasks in which they were seated 140 cm across a table from each other. In both tasks, dyads were instructed to gaze at the eyes of their partner (Figure 1). In the In-person condition, dyads had a direct face-to-face view of each other. A panel of smart glass (glass that switches between opaque and transparent when a voltage is applied) was positioned in the middle of the table, 70 cm away from each participant (Figure 1A). In the Virtual Face condition, each participant watched their partner's face projected in real time on a separate 24-inch 16:9 computer monitor placed in front of the glass (Figure 1B). The In-person and Virtual conditions were performed in the same location by the same dyads (see illustration in Figures 1A and 1B) to avoid questions regarding whether the virtual partner was real. Participants were instructed to minimize head movements, remain as still as possible during the task by avoiding large motions, and keep their facial expressions as neutral as possible. The time series (Figure 1C) and experimental details are similar to previous studies (Hirsch et al., 2017; Noah et al., 2020). At the start of a block, prompted by an auditory beep, dyads fixated on a crosshair located at the center of the monitor in the Virtual Face condition or at the center of the opaque smart glass in the In-person condition. The face of the Virtual partner was corrected for visual angle to the same size as the In-person face (Figure 1B). The auditory tone also cued viewing the crosshair during the rest/baseline condition according to the protocol time series (Figure 1C).

Six 15-second (s) active task periods alternated with 15 s rest/baseline periods for a total of 3 minutes per run. Each task period consisted of three 6 s cycles in which face presentation alternated "on" for 3 s and "off" for 3 s for each of the three events (Figure 1C). The smart glass became transparent during the "on" periods and opaque during the "off" and rest periods. The time series was the same for all conditions. During the 15 s rest/baseline period, participants focused on the fixation crosshair, as in the 3 s "off" periods that separated the gaze events, and were instructed to "clear their minds" during this break. The 3 s "on" period was selected because maintaining eye contact with a live partner becomes increasingly uncomfortable over longer periods (Hirsch et al., 2017; Noah et al., 2020). Each 3-minute run was repeated twice. The whole paradigm lasted 18 minutes. Stimulus presentation, eye-tracking data acquisition, fNIRS signal acquisition, and EEG signal acquisition were synchronized using TTL and UDP triggers (details below) that were sent to all machines simultaneously.
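
As an illustration of the synchronization step, a minimal sketch of a UDP trigger broadcast from the paradigm computer is shown below; the machine addresses, port, and trigger codes are hypothetical and not the values used in the experiment.

```python
# Hedged sketch of broadcasting a synchronization trigger over UDP.
import socket

# Hypothetical addresses of the eye-tracking, fNIRS, and EEG acquisition machines.
ACQUISITION_MACHINES = [("192.168.1.11", 5005),
                        ("192.168.1.12", 5005),
                        ("192.168.1.13", 5005)]

def send_trigger(code: bytes) -> None:
    """Send the same trigger code to every acquisition machine."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for address in ACQUISITION_MACHINES:
            sock.sendto(code, address)

send_trigger(b"TASK_ON")   # e.g., at the start of a face-viewing block
```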

Data Acquisition

Eye Tracking. Eye tracking data were acquired using two Tobii Pro x3-120 eye trackers (Tobii Pro, Stockholm, Sweden), one per participant, at a sampling rate of 120 Hz. In the In-person condition, eye trackers were mounted on the smart glass facing each participant. Calibration was performed using three points on their partner's face prior to the start of the experiment. The partner was instructed to stay still and look straight ahead while the participant was told to look first at the partner's right eye, then left eye, then the tip of the chin. In the Virtual Face condition, eye trackers were mounted on the lower edge of the computer monitor facing each participant, and the same three-point calibration approach was applied using the partner's face displayed on the computer monitor via webcam.

Tobii Pro Lab software (Tobii Pro, Stockholm, Sweden) and OpenFace (Baltrušaitis et al., 2016) were used to create areas of interest for subsequent eye-tracking analyses performed in MATLAB 2019a (Mathworks, Natick, MA). UDP signals synchronized triggers from the stimulus presentation program with the Tobii Pro Lab software via a custom virtual keyboard interpretation tool written in Python. When a face-watching trial started and ended, UDP triggers were sent via Ethernet from the paradigm computer to the eye-tracking computers, and the virtual keyboard "typed" a letter that marked the events in the eye-tracking data recorded in Tobii Pro Lab; these markers were subsequently used to delimit face-watching intervals.
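
The virtual keyboard tool itself is not distributed with this record. The sketch below shows one way such a UDP-to-keystroke bridge could be implemented; the port, the trigger-to-letter mapping, and the use of the pynput library are illustrative assumptions.

```python
# Hedged sketch of a UDP-to-keystroke bridge; not the actual tool used here.
import socket
from pynput.keyboard import Controller   # keyboard emulation; an assumed choice

PORT = 5005                                    # must match the sender
KEYMAP = {b"TASK_ON": "s", b"TASK_OFF": "e"}   # hypothetical trigger-to-letter mapping

keyboard = Controller()
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
    sock.bind(("", PORT))
    while True:
        code, _ = sock.recvfrom(1024)
        letter = KEYMAP.get(code)
        if letter:
            keyboard.type(letter)   # the letter appears as an event marker in Tobii Pro Lab
```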

Pupillometry. Pupil diameter measures were acquired using the Tobii Pro Lab software, with post-processing triggers used to partition the time series into face-watching intervals. Left and right pupil diameters were averaged for each frame and interpolated to 120 Hz to match the gaze position sampling rate.

Electroencephalography (EEG). A g.USBamp (g.tec medical engineering GmbH, Austria) system with two bio-amplifiers and 32 electrodes per subject was used to collect EEG data at a sampling rate of 256 Hz. Electrodes were arranged in a layout similar to the 10-10 system; however, exact positioning was limited by the location of the electrode holders, which were held rigid between the optode holders. Electrodes were placed as closely as possible to the following positions: Fp1, Fp2, AF3, AF4, F7, F3, Fz, F4, F8, FC5, FC1, FC2, FC6, T7, C3, Cz, C4, T8, CP5, CP1, CP2, CP6, P7, P3, Pz, P4, P8, PO3, PO4, O1, Oz, and O2. Conductive gel was applied to each electrode to reduce resistance by ensuring contact between the electrodes and the scalp. As gel was applied, data were visualized using a bandpass filter passing frequencies between 1 and 60 Hz. The ground electrode was placed on the forehead between AF3 and AF4, and an ear clip was used for reference.

Functional Near-Infrared Spectroscopy (fNIRS). A Shimadzu LABNIRS system (Shimadzu Corp., Kyoto, Japan) was used to collect fNIRS data with a sampling period of 123 ms (approximately 8 Hz). Each emitter transmitted three wavelengths of light (780, 805, and 830 nm), and each detector measured the amount of light that was not absorbed. The amount of light absorbed by the blood was converted to concentrations of OxyHb and deOxyHb using the Beer-Lambert equation. Custom-made caps with interspersed optodes and electrode holders were used to acquire concurrent fNIRS and EEG signals (Shimadzu Corp., Kyoto, Japan). The distance between optodes was 2.75 cm or 3 cm, respectively, for participants with head circumferences less than 56.5 cm or greater than 56.5 cm. Caps were placed such that the most anterior midline optode holder was ≈2.0 cm above nasion, and the most posterior and inferior midline optode holder was on or below inion. Optodes consisting of 40 emitters and 40 detectors were placed on each participant to cover bilateral frontal, temporal, and parietal areas (Figure 1D), providing a total of 60 acquisition channels per participant. A lighted fiber-optic probe (Daiso, Hiroshima, Japan) was used to remove hair from the optode channel before optodes were placed. To ensure acceptable signal-to-noise ratios, resistance was measured for each channel before recording. Adjustments were made until all optodes were calibrated and able to sense known quantities of light from each laser wavelength (Noah et al., 2015; Ono et al., 2014; Tachibana et al., 2011).
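
As an illustration of this conversion step, the sketch below applies a Beer-Lambert-style least-squares solution across the three wavelengths; the extinction coefficients, source-detector distance, and pathlength factor are placeholder values, not the constants used by the LABNIRS software.

```python
# Illustrative Beer-Lambert conversion from optical-density changes to HbO2/HbR.
import numpy as np

# Placeholder extinction coefficients [eps_HbO2, eps_HbR] at 780, 805, and 830 nm.
EPSILON = np.array([[0.74, 1.10],
                    [0.90, 0.80],
                    [1.05, 0.70]])
DISTANCE = 3.0    # cm, source-detector separation
DPF = 6.0         # differential pathlength factor (placeholder)

def mbll(delta_od):
    """delta_od: (3, n_samples) optical-density changes -> (2, n_samples) [dHbO2, dHbR]."""
    design = EPSILON * DISTANCE * DPF
    return np.linalg.lstsq(design, delta_od, rcond=None)[0]

delta_od = np.random.randn(3, 100) * 0.01   # toy attenuation data
d_oxy, d_deoxy = mbll(delta_od)
hb_diff = d_oxy - d_deoxy                   # combined signal (OxyHb + inverted deOxyHb)
```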

After the experiment, a Polhemus Patriot digitizer (Polhemus, Colchester, Vermont) was used to record the position of EEG electrodes and fNIRS optodes, as well as five anatomical locations (nasion, inion, Cz, left tragus, and right tragus) for each participant (Eggebrecht et al., 2014; Eggebrecht et al., 2012; Ferradal et al., 2014; Okamoto & Dan, 2005; Singh et al., 2005). Montreal Neurological Institute (MNI) coordinates (Mazziotta et al., 2001) for each channel were obtained using NIRS-SPM software (Ye et al., 2009). Anatomical correlates were estimated with the TD-ICBM152 atlas using WFU PickAtlas (Maldjian et al., 2004; Maldjian et al., 2003).

Data Analysis

Signal processing of eye tracking data and calculation of duration of gaze on faces. Eye tracking data were exported from the Tobii Pro Lab software to the data processing pipeline, and custom MATLAB scripts were used to calculate the duration of gaze on faces, variability of gaze, and pupil diameter. OpenFace (Baltrušaitis et al., 2016) was used to generate the convex hull of an "average face" from 16 individual OpenFace results (8 pairs) derived from the Tobii videos; this hull was used to classify gaze samples as directed at the face or not.
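
A minimal sketch of this face-region classification is shown below, assuming 2-D OpenFace landmarks and gaze coordinates expressed in the same reference frame; the arrays are toy placeholders.

```python
# Point-in-convex-hull test for gaze samples against an "average face" outline.
import numpy as np
from scipy.spatial import ConvexHull, Delaunay

landmarks = np.random.rand(68, 2)               # toy 2-D averaged OpenFace landmarks
hull = ConvexHull(landmarks)
hull_tri = Delaunay(landmarks[hull.vertices])   # triangulate the hull for point-in-hull tests

gaze = np.random.rand(1000, 2)                  # toy gaze coordinates, same reference frame
on_face = hull_tri.find_simplex(gaze) >= 0      # True where a gaze sample falls on the face
```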

Statistical analysis of eye contact. The eye gaze task alternated between eye gaze (participants were expected to fixate on either the eyes of their partner's virtual face or the real eyes of their live partner) and rest (participants were expected to fixate on either the crosshair on the computer monitor in the Virtual Face condition or a red dot on the smart glass in the In-person condition). The eye gaze portions of the task were 3 s in length, with 6 per trial, for 18 s of expected eye contact over the trial duration (Figure 1C). Usable eye-tracking data were acquired for 18 participants (9 dyads). To avoid possible transition effects caused by shifting eye gaze between stimuli (partner's eyes) and fixation, the initial 1000 ms of each eye gaze trial were excluded from analysis. Samples marked by Tobii as "invalid" and samples outside the polygon defined by the OpenFace "average face" were also discarded. Measures derived for each trial included Dwell Time (DT), computed as the number of retained samples over the gaze interval divided by the sampling rate (in seconds), representing the duration of gaze on either the virtual face or the face of the live partner. To measure the variability of gaze on the partner's face, log horizontal (HSD) and vertical (VSD) standard deviations were calculated from the mean-centered samples of each gaze interval, normalized by the number of retained samples. Pupil diameter over face-watching intervals was z-scored by participant (PDZ). Linear mixed-effects models (Bates et al., 2007) were fitted in R (R Core Team, 2018) on DT, HSD, VSD, and PDZ separately.
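
A hedged sketch of the per-interval measures and an analogous mixed-effects model is given below. The published models were fitted in R with lme4; the statsmodels call and the data-frame columns (condition, participant, dyad, and the file name) are illustrative assumptions.

```python
# Sketch of DT/HSD/VSD/PDZ computation and a mixed-effects model (statsmodels stand-in for lme4).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

FS = 120.0   # Hz, eye-tracker sampling rate

def interval_measures(x, y, pupil):
    """x, y, pupil: retained (valid, on-face) samples from a single gaze interval."""
    dwell_time = len(x) / FS                  # DT: retained samples / sampling rate (s)
    hsd = np.log(np.std(x - np.mean(x)))      # log horizontal deviation
    vsd = np.log(np.std(y - np.mean(y)))      # log vertical deviation
    return dwell_time, hsd, vsd, np.mean(pupil)

# Hypothetical per-interval table with columns DT, HSD, VSD, PD, condition, participant, dyad.
df = pd.read_csv("eye_measures.csv")          # placeholder file name
df["PDZ"] = df.groupby("participant")["PD"].transform(lambda p: (p - p.mean()) / p.std())
for dv in ["DT", "HSD", "VSD", "PDZ"]:
    fit = smf.mixedlm(f"{dv} ~ condition", df, groups=df["dyad"]).fit()
    print(fit.summary())
```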

Electroencephalography (EEG). EEG signals were preprocessed using EEGLAB v13.5.4b in MATLAB 2014a (Mathworks, Natick, Massachusetts). EEG was digitized at a sampling rate of 256 Hz, and the data were bandpass filtered between 1 and 50 Hz in MATLAB for each participant. Two types of channels exhibiting noise characteristic of poor contact with the scalp were rejected based on visual inspection: (1) signals with amplitude exceeding 100 μV, and (2) signals that were completely flat with low-frequency drift. With these criteria, an average of 3 channels per person were removed, and signals from the surrounding channels were interpolated. A common average reference was computed using the 32 data channels, and the data were epoched from -100 to 3000 ms relative to face presentation (In-person Face vs. Virtual Face), producing one epoch data file per condition. The 100 ms before task onset served as a baseline. These files were manually inspected, and epochs containing eye movements or blinks were discarded from further analysis. Wavelet decomposition was applied to the EEG signals within the first 250 ms to calculate power in the following frequency bands: theta (4-8 Hz), alpha (8-13 Hz), and beta (13-30 Hz). T-tests (Virtual Face vs. In-person Face) were conducted on each frequency band.
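
The sketch below illustrates wavelet-based band power over the first 250 ms of an epoch using PyWavelets; the published decomposition was run in MATLAB, so the mother wavelet and frequency grid here are assumptions for illustration.

```python
# Hedged sketch of wavelet band power (theta/alpha/beta) in the first 250 ms of an epoch.
import numpy as np
import pywt

FS = 256.0                                   # EEG sampling rate (Hz)
BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def band_power_first_250ms(epoch, onset_idx, wavelet="cmor1.5-1.0"):
    """epoch: 1-D EEG trace; onset_idx: sample index of face presentation (0 ms)."""
    freqs = np.arange(4.0, 31.0)                              # 4-30 Hz analysis grid
    scales = pywt.central_frequency(wavelet) * FS / freqs     # scales for the target frequencies
    coefs, out_freqs = pywt.cwt(epoch, scales, wavelet, sampling_period=1 / FS)
    power = np.abs(coefs) ** 2                                # (n_freqs, n_samples)
    window = slice(onset_idx, onset_idx + int(0.250 * FS))    # first 250 ms after onset
    return {band: power[(out_freqs >= lo) & (out_freqs < hi), window].mean()
            for band, (lo, hi) in BANDS.items()}

epoch = np.random.randn(int(3.1 * FS))                        # toy -100 to 3000 ms epoch
print(band_power_first_250ms(epoch, onset_idx=int(0.100 * FS)))
```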

Functional Near-Infrared Spectroscopy (fNIRS). The analysis methods used here have been described previously (Dravida et al., 2018; Hirsch et al., 2018; Noah et al., 2017; Noah et al., 2015; Piva et al., 2017; Zhang et al., 2017; Zhang et al., 2016) and are briefly summarized below. First, wavelet detrending was applied to the combined hemoglobin signal (HbDiff, the sum of the oxyhemoglobin and inverted deoxyhemoglobin signals; Tachtsidis et al., 2009) to remove baseline drift using the algorithm provided by NIRS-SPM (Ye et al., 2009). The combined OxyHb and deOxyHb signals are reported here as the most comprehensive measurement. However, consistent with best practices for fNIRS data (Yücel et al., 2021), results from the separate signals are included in Supplementary Figures S1-S4 and Tables S3-S6. Those results are generally comparable to the ones reported here, although reduced activity is apparent in the deOxyHb analyses due to expected factors such as noise and the relative difficulty of signal detection. Second, noisy channels were removed automatically if the root mean square of the signal was more than 10 times the average for that participant. A principal component analysis spatial filter was used to remove global components caused by systemic effects assumed to be non-neural in origin (Zhang et al., 2017; Zhang et al., 2020; Zhang et al., 2016). For each run, a general linear model (GLM), computed by convolving the eye gaze task paradigm (Figure 1C) with a canonical hemodynamic response function, was used to generate beta values for each channel. Group results based on these beta values were rendered on a standard MNI brain template (Figure 5). Second-level analyses were performed using t-tests in SPM8. Anatomical correlates were estimated with the TD-ICBM152 T1 brain atlas using WFU PickAtlas (Maldjian et al., 2004; Maldjian et al., 2003).
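
A minimal sketch of this channel-wise GLM step is given below, assuming an SPM-style double-gamma HRF sampled at the fNIRS rate; the HRF parameters and regressor construction are illustrative rather than the exact implementation.

```python
# Hedged sketch of the channel-wise GLM: boxcar convolved with a canonical HRF,
# betas estimated by ordinary least squares.
import numpy as np
from scipy.stats import gamma

DT_FNIRS = 0.123                              # fNIRS sampling period (s)

def canonical_hrf(dt, duration=30.0):
    """SPM-style double-gamma HRF sampled at the fNIRS rate (parameters illustrative)."""
    t = np.arange(0.0, duration, dt)
    hrf = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0
    return hrf / hrf.sum()

def task_betas(hbdiff, boxcar, dt=DT_FNIRS):
    """hbdiff: (n_samples, n_channels) signals; boxcar: (n_samples,) task indicator."""
    regressor = np.convolve(boxcar, canonical_hrf(dt))[: len(boxcar)]
    design = np.column_stack([regressor, np.ones_like(regressor)])  # task + constant
    betas, *_ = np.linalg.lstsq(design, hbdiff, rcond=None)
    return betas[0]                            # one task beta per channel
```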

Wavelet Coherence. Coherence analyses were performed on the combined signals. Details on this method have been validated (Zhang et al., 2020) and applied to prior two-person interactive investigations (Hirsch et al., 2018; Hirsch et al., 2017; Piva et al., 2017). Briefly, channels were grouped into 12 anatomical regions, and wavelet coherence was evaluated exhaustively between all region groups across the two participants in a pair. Wavelet coherence analysis decomposes time-varying signals into their frequency components. Here, the wavelet kernel was the complex Gaussian ("cgau2") provided in MATLAB. The residual signal from the entire data trace was used, with the activity due to the task removed, similar to traditional PPI analysis (Friston et al., 1997). Sixteen scales were used, and the range of frequencies was 0.1 to 0.025 Hz. Based on prior work, we restricted the wavelengths used to those that reflect fluctuations in the range of the hemodynamic response function; coherence at frequencies above 0.1 Hz is attributable to non-neural physiological components (Nozawa et al., 2016; Zhang et al., 2020). Therefore, 11 wavelengths were used for the analysis. Complex coherence values were averaged following previously established methods (Zhang et al., 2020).
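
A simplified sketch of wavelet coherence between two residual traces is shown below, using the complex Gaussian ("cgau2") wavelet named above via PyWavelets; it computes time-averaged coherence per scale and omits additional steps of the validated method (Zhang et al., 2020).

```python
# Simplified cross-brain wavelet coherence between two residual fNIRS traces.
import numpy as np
import pywt

DT_FNIRS = 0.123                                            # fNIRS sampling period (s)

def wavelet_coherence(x, y, scales, wavelet="cgau2"):
    """Time-averaged coherence per scale between two residual traces x and y."""
    cx, freqs = pywt.cwt(x, scales, wavelet, sampling_period=DT_FNIRS)
    cy, _ = pywt.cwt(y, scales, wavelet, sampling_period=DT_FNIRS)
    cross = (cx * np.conj(cy)).mean(axis=1)                 # time-averaged cross-spectrum
    coherence = np.abs(cross) / np.sqrt((np.abs(cx) ** 2).mean(axis=1) *
                                        (np.abs(cy) ** 2).mean(axis=1))
    return freqs, coherence

# Sixteen scales spanning roughly 0.1 to 0.025 Hz, as described above; of these,
# 11 wavelengths were retained in the published analysis.
target_freqs = np.linspace(0.1, 0.025, 16)
scales = pywt.central_frequency("cgau2") / (target_freqs * DT_FNIRS)

x, y = np.random.randn(2, 1464)                             # two toy ~3-minute residual traces
freqs, coherence = wavelet_coherence(x, y, scales)
```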

Cross-brain coherence is the correlation between corresponding frequency components across interacting partners, averaged across all time points and represented as a function of the wavelength of the frequency components (Hirsch et al., 2018; Hirsch et al., 2017; Noah et al., 2020; Zhang et al., 2020). The difference in coherence between the In-person Face and Virtual Face conditions for dyads was measured using t-tests for each frequency component. Only wavelengths shorter than 30 s were considered because the experimental cycle between task and rest was 30 s. An analysis of shuffled pairs of participants was conducted to confirm that the reported coherence was specific to the pair interaction and not due to engagement in a similar task. The coherence analysis was a region-of-interest analysis targeting somatosensory and somatosensory association cortices in the dorsal visual stream (Figures 5A and 5B).
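
The shuffled-pairs control could be sketched as follows, reusing the wavelet_coherence() function from the previous sketch; the data layout and number of shuffles are illustrative.

```python
# Hedged sketch of the shuffled-pairs null distribution for cross-brain coherence.
import numpy as np

rng = np.random.default_rng(0)

def shuffled_pair_coherence(signals_a, signals_b, scales, n_shuffles=100):
    """signals_a / signals_b: (n_dyads, n_samples) residual traces for partners A and B."""
    n_dyads = signals_a.shape[0]
    null = []
    for _ in range(n_shuffles):
        perm = rng.permutation(n_dyads)       # re-pair A[i] with B[perm[i]]
        null.append([wavelet_coherence(signals_a[i], signals_b[perm[i]], scales)[1]
                     for i in range(n_dyads)])
    return np.asarray(null)                   # (n_shuffles, n_dyads, n_scales)
```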

Files

README.md

Files (2.7 GB)

md5:ef858088172d30af4201cafff67c874a (104.9 MB)
md5:346c378810029055c876545580d24d8c (104.9 MB)
md5:1d2066b2532e31d13565a808d6ab2672 (104.9 MB)
md5:d9e2b1fbbbd6b65cdbfa559fb2c6b366 (104.9 MB)
md5:b33ab1e3a50fc50985080ff61c1573f6 (104.9 MB)
md5:4d39faa2d6b04d7f83c516cf43ed020b (104.9 MB)
md5:22d498c0a818001cb33007e465e050b2 (104.9 MB)
md5:eaddd4fc9c2d8e3620842eb4d6bb49cc (104.9 MB)
md5:11d63e9c8ad06079702f089a254b1105 (104.9 MB)
md5:9fea3121aaa37021326f8d096f722184 (104.9 MB)
md5:fbc215b64f9cfa0fa10ff835a2421d27 (104.9 MB)
md5:a24a8fb30e251abd9156c025dc1a09c1 (104.9 MB)
md5:c991deef43917516d023e8a250b6464b (104.9 MB)
md5:20f7369a8a998041cfd9a76c2af57438 (104.9 MB)
md5:64b998fae1c535d34a6198ffef6abe50 (104.9 MB)
md5:82b20b78574a3e7e82b663ac8f08ca9c (104.9 MB)
md5:edb088f9aa5d2e575695c9a14438da0a (104.9 MB)
md5:ce6f321d3efc27a5045b797033ddfded (104.9 MB)
md5:e0f127bfd49213c5a2b5368385aea41f (104.9 MB)
md5:e9198ec9eb4db5c746832a2b05d6ab25 (104.9 MB)
md5:2bceaef83a668816754ab7f9b82c50e4 (104.9 MB)
md5:f299f9a5ff9d0915ca90bfc45c639564 (104.9 MB)
md5:ba5c5add053e07e4dfc69cb53a1a68e8 (104.9 MB)
md5:6c8365e294c3b9a6b9d9c9c0549a3117 (104.9 MB)
md5:61a9870012b6cd04af976299301a384c (104.9 MB)
md5:23874e5e17c0409b3af9e9b75e16c0c6 (32.4 MB)
md5:a762ccf3b56272cebf9c258a094f87cd (3.0 kB)

Additional details

Related works

Is cited by
10.1162/imag_a_00027 (DOI)