Architectural Form and Affect: A Spatiotemporal Study of Arousal

How does the form of our surroundings impact the ways we feel? This paper extends the body of research on the effects that space and light have on emotion by focusing on critical features of architectural form and illumination colors and their spatiotemporal impact on arousal. For that purpose, we solicited a corpus of spatial transitions in video form, lasting over 60 minutes, annotated by three participants in terms of arousal in a time-continuous and unbounded fashion. We process the annotation traces of that corpus in a relative fashion, focusing on the direction of arousal changes (increasing or decreasing) as affected by changes between consecutive rooms. Results show that properties of the form such as curved or complex spaces align highly with increased arousal. The analysis presented in this paper sheds some initial light on the relationship between arousal and core spatiotemporal features of form, which is of particular importance for the affect-driven design of architectural spaces.


I. INTRODUCTION
Architects are required to consider three core Vitruvian criteria during the design process: the utility (i.e. function), the solidity (i.e. material) and the beauty (i.e. aesthetics) of the built outcome [1]. The contemporary approach to architecture largely follows two views on this topic: one defining function as the main form-giver [2], the other placing the human at the center of design [3]. This latter view gave birth to approaches aimed at understanding the human psyche and delivering affective experiences that promote well-being and quality of living [4]-[7]. However, quantifying the impact of architectural design on the human subject and its potential to deliver satisfying experiences is a great challenge, as the affective response to architectural design is subject to personal taste, values, cultural influences, aesthetics, and expectations.
To understand how design elements impact our affective response, a common practice in architectural design research is the identification, definition and adaptation of the parameters that comprise the built environment. By doing so, the relations between the features involved and the emotions they elicit could arise and be quantified. The capability of form to elicit emotions, however, still lacks sufficient evidence compared for instance to lighting and color [8].
The research leading to these results has received funding from the European Union H2020 Horizon Programme (2014-2020) under grant agreement 952002, project PrismArch: Virtual reality aided design blending crossdisciplinary aspects of architecture in a multi-simulation environment.
Motivated by the lack of a comprehensive study on the impact of architectural form on emotions, this paper leverages methods and theories taken from environmental psychology, and best practices for affect annotation and modeling [9] to shed light on the spatiotemporal relationship between affect and form. In particular, the objective of this study is three-fold. First, to identify the most prevalent design features and spatial properties in the literature in terms of their impact on affect. Second, to create an affective corpus of spatial transitions and corresponding time-continuous annotations of arousal. Third, to analyze the impact of different design features on affect by treating both the design features and the annotations in a relative manner [10], [11]. This paper contributes to an extensive body of work on the impact of space, form, and color on affect, and is the first instance of time-continuous affect annotation on a spatiotemporal navigation task. The paper also proposes and tests two ways of treating arousal in a relative fashion: one based on comparisons between mean arousal values in consecutive spaces, and one based on the degree of change of arousal during the arrival into a new space. Both treatments are able to capture strong effects of a room's curvature and perceptual complexity on arousal.

II. RELATED WORK
Studying the affective response of humans to the built environment is undeniably challenging, as it is arguably difficult to compartmentalize which elements are present or which elements draw our attention (and to what extent). Identifying the momentary emotional response that might be at play during the process of spatial evaluation is a field of study combining disciplines and theories coming from psychology, architecture, visual perception and cognition. Interestingly, the emotional effect of spaces has been well-studied in the domain of digital games [12], [13]. This is likely because it is easy to build many variations of the same space inexpensively, and because games can elicit strong emotions not only through their gameplay and in-game events (e.g. enemies or sound effects) but also through their spaces.
The curvature of spatial elements has been investigated extensively in several studies, not only in terms of affective response but also as a parameter of objects' shape [14]-[16]. As a property of any shape, designers' preoccupation with curvature is linked in many ways to our potential dislike of angularity and a sense of perceptual threat that it conveys [3]. Curvature as a parameter of form has been investigated as a decorative feature in interior openings or in façade design [3], [17]. In [3], the authors sought to study the potential link of artistic expertise to preference for curvature; 24 female participants were introduced to three different tasks (pairwise preference, multiple psychological variables ranking and preference ranking) in comparing four different types of façades (curved, mixed, angular and rectilinear) projected on a wall. Results from that experiment confirmed a preference towards curved and mixed properties of façade design. Rather than using self-reports, Shemesh et al. [18] coupled electroencephalogram (EEG) data of 42 participants with virtual spatial stimuli from within a VR headset. Results showed that there were consistent differences in terms of EEG responses between experienced and inexperienced designers to properties of curvature, irregularity and rectilinearity. A supplementary test based on rankings showed a preference for curved spaces by participants inexperienced in design; experienced participants, however, tended to prefer rectilinear spaces. Banaei et al. [19] compared two methods for measuring arousal in interior scenes, via the self-assessment manikin (SAM) [20] and via EEG. SAM tests confirmed positive correlations between arousal and curvature and negative correlations with rectilinear and angular spatial elements. In another study using biofeedback, functional magnetic resonance imaging (fMRI) was used to study the effect of interior scenes with spatial properties related to curvature and rectilinearity, varying ceiling height and either open or enclosed arrangement [21]. In that study, 18 participants indicated their preferred scenes and in a second test chose which scene to approach or avoid during an fMRI scanning procedure. The study indicated a tendency towards curved over linear arrangements.
978-1-6654-0019-0/21/$31.00 ©2021 IEEE
Ceiling height and volume of an interior space can convey feelings of freedom or confinement, and have been primarily investigated in studies regarding memory and attention. Meyers-Levy and Zhu studied the effects of ceiling heights on participants in delivering a series of tasks [22] within physical environments. In a first experiment, 32 participants responded to six Likert-scale questions (reflecting on freedom-related feelings and confinement-related body states) and performed tasks of solving 12 anagrams. In a second experiment, 100 participants performed a categorization task and a product evaluation task. Results validated the authors' hypothesis that higher ceilings result in a higher feeling of freedom; moreover, low ceilings prompt relational (item-specific) processing, while high ceilings prompt abstract ideation. The capacity for larger volumes to trigger a higher perceived arousal is explained by Niedenthal [12] where the experience of awe is conveyed by the Gothic setting of churches and castles in the game Resident Evil 4 (Capcom, 2005). The change of volume either from low-ceiling rooms to higher or the opposite is a tool that in many cases is employed by video game designers and architects in order to accentuate a change in spatial relationship, usually from a transitional "no-space" to a space; this process is identified as an arrival [23].
Lighting and its effect on human experiences have been frequently tested within virtual environments [17], [24] or through rendered static images [25]. Illumination parameters such as brightness, color and luminance distribution are studied frequently [13] in terms of affective responses, preference and attention tasks [24], [26]. The effects of illumination are also well-studied in video game research [27], [28], primarily regarding player performance and gameplay. In multiple studies [24], [29] the impact of ambient light color was studied in terms of completion time for a maze navigation task. In both studies, blue ambient color resulted in longer completion times compared to red ambient color, although the completion times under neutral light (compared to the other two treatments) differed between the two studies. Regarding the impact of color on affect, Joosten et al. [13] investigated the effect of ambient lighting color on the player's self-reported pleasure, arousal and dominance [30] levels during the completion of a task for a custom-made module in the game Neverwinter Nights (Bioware, 2002). Responses from 60 participants, each conducting three playthroughs of four rooms per playthrough, showed that red ambient color scored highest in terms of arousal and yellow scored highest in terms of valence among experienced players. This study highlighted the importance of color for perceived affect, and how it can be influenced by the players' level of experience.
Several studies explore more than one design feature in tandem. Ergan et al. [31] examined how certain spatial features could stand out from their surroundings and affect the human experience. Through dual image comparison of generated scenes, features were put in polar opposites and compared. Features explored in this study include daylight, openness of a space, ceiling height, level of artificial lighting, symmetry of interior elements and contour curvature. Based on participants' preferences, the most impactful features were the access to windows, space openness, ease of access, flexibility in isolation and color of surfaces. It is worth noting that two of the top features relate to how open or cramped a space is perceived to be. Banaei et al. [32] sought to cluster different interior elements according to their form and appearance. The authors collected a series of living room arrangements that followed different trends and defined 25 clusters that included features such as curvature, scale, location and angle. Results demonstrated a stronger presence of rectilinearity over curvature in the selected sample but also demonstrated the inability of some elements to be used for a categorization task.
The aforementioned studies highlight the current state of affective computing in relation to form and spatial affect. The reviewed literature shows a variety of perspectives, hypotheses, and ways of evaluating affect, and considers a broad range of design parameters. This rich body of work, however, appears to lack the temporal dimension both in the treatment of data and in how spatial stimuli are presented for annotation. The importance of the reactive nature of architecture and space appraisal is highlighted within this study and the method for eliciting and capturing such time-sensitive data is discussed below.

III. EXPERIMENTAL SETUP
To explore the affective responses that different properties of architectural form and illumination may elicit, we conducted a user study with a broad range of spaces and spatial transitions. This section describes the elicitors (i.e. the video dataset of spatial navigation) and the annotation methods used for building our affective corpus. The video dataset and annotations (raw and processed) are available in a public repository.

A. Features Explored
Four design features were selected based on their contribution to the appearance of the space, either in relation to the room geometry or ambient lighting.
In terms of room geometry (form), we explore the effect of room contour curvature, interior complexity, and the room's size. Each of these properties was either present or absent in each room. In the case of room size, low ceilings represent small size (absence) and high ceilings represent large size (presence); Figure 1 shows all possible room sizes. Contour curvature is explored as a potential contributor to arousal, compared to rooms that are defined by rectilinear contours. Different combinations of curvature and size affect the room layout differently: small size and high curvature result in cylindrical rooms (Fig. 1c) while large size and high curvature result in a dome-like structure (Fig. 1d). Interior complexity introduces boundaries and obstructions during navigation: complex spaces have symmetrically placed columns and two walls in the middle of the room that obstruct both visibility and the path of the user navigating through the room.
Following the paradigm of previous studies on the effect of color in navigable spaces [12], we explore illumination with red, blue and white as the possible colors of each room. Light sources are distributed within each room so as to provide general illumination from multiple sources and remain the same for all rooms. In our analysis, we use color warmth (1 for red, 0 for white, -1 for blue) to track changes in illumination.
With three features of room geometry with two possible states each, and a color illumination feature with three states, the possible combinations of rooms are 24. Each room has the same exterior dimensions, consisting of 20 meters width by 20 meters depth. Depending on the size and curvature parameter combinations, heights range from 3 to 12 meters. The final rooms, from a first-person view, are shown in Figure 2.
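The room count follows directly from the feature states described above. As a minimal sketch (the identifier names are illustrative assumptions and are not taken from the study's materials), the combinations could be enumerated as follows:

```python
from itertools import product

# Hypothetical names for the features described above (not from the AffRooms repository).
GEOMETRY_FEATURES = ("curvature", "complexity", "size")  # each absent (0) or present (1)
COLOR_WARMTH = {"blue": -1, "white": 0, "red": 1}        # warmth encoding used in the analysis


def enumerate_rooms():
    """Return every combination of the three binary form features and the light color."""
    rooms = []
    for states in product((0, 1), repeat=len(GEOMETRY_FEATURES)):
        for color, warmth in COLOR_WARMTH.items():
            room = dict(zip(GEOMETRY_FEATURES, states))
            room["color"] = color
            room["warmth"] = warmth
            rooms.append(room)
    return rooms


rooms = enumerate_rooms()
print(len(rooms))  # 2 * 2 * 2 * 3 = 24
```

Encoding each form feature as 0 or 1 and color as a signed warmth value keeps every feature comparable as a simple difference between consecutive rooms.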

B. The AffRooms Corpus
The goal of the experiment is to capture affect annotations during navigation and examine how each design feature impacts our perceptual arousal in a temporal manner. To increase the consistency of the annotators' experience, all annotators received the same elicitors in the form of pre-recorded footage of the first author navigating through a sequence of rooms. To provide a diverse set of spatial transitions from one room to the next, multiple sequences of rooms were produced and navigated through to provide multiple videos.
The sequence of rooms was generated randomly within Unreal Engine (see Fig. 3). In each generated sequence, all 24 possible rooms appear once. All 24 rooms are placed in a navigable sequential manner along a straight path, each separated with a sliding door. The player starts in a small entrance room, before they pass through the first sliding door to view the first room. For each recorded playthrough we ensured that while navigating and capturing, the camera's field of view would capture each room's distinct features. During the navigation tasks, the player's position and angle of the camera viewport were periodically logged, as was the time stamp when the player entered a new room and the type of features of each room. This allows us to align the arousal annotations and match them to the design features, as discussed in Section III-D.
Twenty random sequences of rooms were generated and navigated, resulting in 20 playthrough videos that capture a diverse set of spatial transitions. The average duration of each video is 186 seconds (ranging from 164 to 240 seconds).

C. Annotation of Arousal
This study uses a continuous and unbounded method to capture annotators' reactions while viewing a pre-recorded video of a spatial navigation task. Arousal data was collected through the PAGAN video annotation tool [33], which allows users to report changes in a single affect dimension while they watch a video. Annotation traces collected by PAGAN are aligned to the video frames. In this experiment, we use the RankTrace [34] annotation protocol, which allows users to define the degree of change of the affective dimension in an unbounded fashion. Figure 4 shows an instance of PAGAN during annotation. Users can control the degree of change through their mouse wheel.
Three participants with experience in the PAGAN and RankTrace annotation protocols were recruited to annotate the 20 videos of the AffRooms dataset. All participants are research staff of the University of Malta with expertise in artificial intelligence, affective computing and digital games. All participants are male, aged between 22 and 36. All appropriate consent was provided by participants; data was collected online, stored in the PAGAN database, and no personal data was retained. Annotation was done by participants remotely, with no interaction with the authors and outside of a controlled lab environment. The study follows the view of arousal as affect intensity rather than physiological activation. Each participant was given the following definition of arousal: "Arousal in the context of spatial appraisal is defined here as the momentary amplitude of emotions elicited during this process. An environment characterized as having positive amplitude is an environment described as exciting, tense, stimulating, wakened and/or intriguing. An environment assessed with a negative amplitude in arousal is an environment evoking feelings of boredom, fatigue, flatness, tiresomeness, calmness and/or relaxation."
After the participant had read the arousal definition and some guidelines for interacting with RankTrace, annotations were performed in a single session. Each participant was presented

D. Processing Arousal
Since each annotation of a video is unbounded, as a first step the arousal traces were normalized to [0, 1] via min-max normalization independently (i.e. on a per-user, per-video basis). Since PAGAN does not have a specified sampling rate, the trace is resampled at 1000 Hz. Figure 5 shows the normalized arousal annotations of the three participants on the same video. The annotation traces were processed in a relative fashion, in two different ways described below.
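The normalization and resampling step can be sketched as below. This is an illustration under stated assumptions: the function names are hypothetical, the trace is assumed to be a list of (timestamp, value) samples, and the hold-last-value resampling is an assumption, since the text does not specify an interpolation scheme. The toy example uses 10 Hz rather than 1000 Hz for brevity.

```python
from bisect import bisect_right


def minmax_normalize(values):
    """Rescale one annotator's trace for one video to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat trace carries no relative information; map it to zeros (assumption).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def resample_hold(times, values, rate_hz, duration):
    """Resample an irregular trace on a fixed grid, holding the last reported value.

    The hold (step) interpolation is an assumption; the paper does not specify one.
    """
    n_samples = int(round(duration * rate_hz))
    out = []
    for i in range(n_samples):
        t = i / rate_hz
        idx = bisect_right(times, t) - 1  # last annotation at or before t
        out.append(values[max(idx, 0)])
    return out


# Toy trace: timestamps in seconds and unbounded RankTrace-style values.
times = [0.0, 0.4, 1.1, 1.5]
raw = [0.0, 3.0, -1.0, 7.0]
norm = minmax_normalize(raw)
trace = resample_hold(times, norm, rate_hz=10, duration=2.0)
```

Normalizing per user and per video removes differences in how far each annotator scrolls, so that only the shape of the trace is compared.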
1) Mean Room Arousal: A straightforward way to assess the impact of each spatial feature is to split the video playthrough based on when the player enters a new room, and average the annotated arousal per room. We identify the room time window as the interval from the moment the room is entered until the moment the next room in the sequence is entered. Figure 6 shows the different room windows for one annotator's trace. We average the arousal values within that window to calculate the room's mean arousal value. Comparing how changes in terms of each design feature match changes between rooms' mean arousal is a straightforward way to process the data as discussed in Section IV-A.
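A minimal sketch of the per-room averaging, assuming a uniformly resampled trace and a list of logged room entry times in seconds (the function and parameter names are hypothetical, and closing the last room's window at the end of the video is an assumption):

```python
def mean_room_arousal(trace, rate_hz, entry_times, video_end):
    """Average the normalized trace over each room's time window.

    A room's window runs from its entry time to the next room's entry time
    (or to the end of the video for the last room, which is an assumption).
    Each window is expected to contain at least one sample.
    """
    bounds = list(entry_times) + [video_end]
    means = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        first = int(round(start * rate_hz))
        last = int(round(end * rate_hz))
        window = trace[first:last]
        means.append(sum(window) / len(window))
    return means


# Toy example: a 6-second trace sampled at 1 Hz, with rooms entered at 0 s, 2 s and 4 s.
toy_trace = [0.0, 0.5, 0.25, 0.75, 1.0, 1.0]
room_means = mean_room_arousal(toy_trace, rate_hz=1, entry_times=[0, 2, 4], video_end=6)
print(room_means)  # [0.25, 0.5, 1.0]
```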
2) Arrival Windows: We recognize that the main affective response in spatial navigation is during "the experience of an arrival, the way in which you come into a space for the first time" [23]. We define arrival as the transition from one space to the next, and form an arrival time window around the time that a new room is entered in the video, including prior moments when the sliding door in the previous room opens and the new room is revealed. Therefore, considering $t_e$ as the timestamp (in seconds) that the player enters the new room's area, the arrival time window is $[t_e - 1, t_e + 2]$ seconds (see Fig. 6). For this analysis, we are interested in the gradient of the annotation trace during this 3-second arrival time window. The gradient is calculated as the change between consecutive time frames [10], [11], [35]. The gradient values of the entire trace are normalized to have an amplitude of 1 but the sign of the gradient is retained. Averaging the normalized gradient over the 3 seconds of the arrival window, we derive the arrival's mean gradient of arousal.
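The arrival computation can be sketched as follows. The function name and parameters are hypothetical, and one detail is an assumption: "normalized to have an amplitude of 1" is read here as dividing every gradient value by the largest absolute gradient of the whole trace, which preserves sign. The trace is assumed to be uniformly sampled with at least two samples.

```python
def arrival_mean_gradient(trace, rate_hz, t_enter, before=1.0, after=2.0):
    """Mean normalized gradient of a trace within [t_enter - before, t_enter + after]."""
    # Gradient as the change between consecutive samples.
    grad = [b - a for a, b in zip(trace[:-1], trace[1:])]
    peak = max(abs(g) for g in grad)
    if peak > 0:
        # Assumption: "amplitude of 1" is read as dividing by the largest absolute gradient.
        grad = [g / peak for g in grad]
    first = max(int(round((t_enter - before) * rate_hz)), 0)
    last = min(int(round((t_enter + after) * rate_hz)), len(grad))
    window = grad[first:last]
    return sum(window) / len(window)


# Toy example at 1 Hz: arousal rises around the entry at t = 2 s.
toy_trace = [0.0, 0.0, 0.5, 1.0, 1.0, 0.75]
g = arrival_mean_gradient(toy_trace, rate_hz=1, t_enter=2)
```

In the toy trace the normalized gradients are 0, 1, 1, 0 and -0.5; the window covers the second to fourth of these, so the mean is positive and the arrival would count as an arousal increase.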

IV. RESULTS
This section explores the impact of a change in spatial features on the annotators' arousal levels. Following [10], [35], the analysis treats the annotators' traces and the design features in a relative fashion, observing how changes in the design features of consecutive rooms impact the mean arousal of the next room compared to the previous one (Section IV-A) and how the moment-to-moment arousal levels change during the arrival at a new room (Section IV-C).

A. Changes in Mean Arousal
As a first experiment, we compare consecutive rooms in terms of change in each of the design features and in terms of change of the mean room arousal. Since room features are categorical, determining whether the feature changes is straightforward. In the 20 videos recorded (i.e. 23 transitions per video or 460 transitions in total), there are 219 changes in curvature, 238 changes in size, 325 changes in light color, and 243 changes in complexity between consecutive rooms. On the other hand, mean room arousal is a real value within [0, 1] and thus what constitutes a change between the mean arousal values of two consecutive rooms needs to be defined. Following the literature on treating affect as rankings [10], [35], [36], we determine an uncertainty threshold $\epsilon$ and if the absolute difference between mean arousal values of consecutive rooms is below this threshold then we consider that there is no change in arousal. For mean arousal, $\epsilon = 0.05$, i.e. 5% of the value range of each arousal trace. With $m(r)$ denoting the mean arousal of room $r$, we consider that arousal increases from the previous room ($r_1$) to the next ($r_2$) if $m(r_2) - m(r_1) > \epsilon$, and that arousal decreases if $m(r_2) - m(r_1) < -\epsilon$. We then check those transitions from one room to the next where a specific design feature changes, and if there is an arousal increase or decrease then we mark it as an arousal shift. If the design feature is absent in the previous room but is present in the next room (or the color warmth increases by at least one step) and arousal increases, then we mark it as agreement; if arousal decreases we mark it as disagreement. Similarly if the design feature is present in the previous room but is absent in the next room (or the color warmth decreases by at least one step) and arousal decreases, we mark agreement, and if arousal increases we mark disagreement. Through this simple process we enumerate the instances where the change in a design feature has a corresponding change in arousal.
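The counting procedure described above can be sketched as follows, for one design feature and one annotator. The function names and the dictionary layout are illustrative assumptions; feature values are assumed to be numeric (0 or 1 for form features, -1, 0 or 1 for color warmth).

```python
EPSILON = 0.05  # uncertainty threshold for mean arousal, as stated above


def arousal_direction(m_prev, m_next, epsilon=EPSILON):
    """Return +1 (increase), -1 (decrease) or 0 (no change) between two rooms."""
    diff = m_next - m_prev
    if diff > epsilon:
        return 1
    if diff < -epsilon:
        return -1
    return 0


def count_agreements(feature_values, mean_arousal, epsilon=EPSILON):
    """Count agreements and disagreements over consecutive rooms."""
    counts = {"feature_changes": 0, "arousal_shifts": 0, "agree": 0, "disagree": 0}
    for i in range(1, len(feature_values)):
        f_diff = feature_values[i] - feature_values[i - 1]
        if f_diff == 0:
            continue  # the feature did not change in this transition
        counts["feature_changes"] += 1
        direction = arousal_direction(mean_arousal[i - 1], mean_arousal[i], epsilon)
        if direction == 0:
            continue  # the arousal difference is within the uncertainty threshold
        counts["arousal_shifts"] += 1
        f_sign = 1 if f_diff > 0 else -1
        if f_sign == direction:
            counts["agree"] += 1
        else:
            counts["disagree"] += 1
    return counts


# Toy sequence of six rooms: curvature flags and per-room mean arousal.
curvature = [0, 1, 1, 0, 1, 0]
means = [0.25, 0.5, 0.75, 0.5, 0.5, 0.75]
result = count_agreements(curvature, means)
print(result)  # {'feature_changes': 4, 'arousal_shifts': 3, 'agree': 2, 'disagree': 1}
```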
A high agreement ratio means that the presence of a design feature leads to higher arousal, while a high ratio of arousal shifts over the total number of feature changes in transitions means that annotators have a reaction when this particular feature changes. Table I shows the agreements between mean arousal changes and design feature changes, per annotator. It is evident that some annotators were less prone to shift their arousal annotations between rooms (e.g. observing the overall arousal shifts of Annotator C). On the other hand, Annotator B is fairly consistent and the presence of every spatial feature is more often associated with increased arousal than not. Significance of the ratio of agreements versus disagreements is calculated based on the binomial distribution of all arousal changes when the spatial feature changes, assuming a 50% probability that the changes may be in agreement. Significance is established at 95% confidence. It is evident that different annotators provide traces with different degrees of granularity, while some extremes (e.g. 97% agreement in terms of complexity for Annotator A) raise some concerns discussed in Section V.
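As an illustration of the significance test, the sketch below computes an exact binomial p-value under a 50% chance of agreement. The choice of a two-sided test is an assumption, since the text does not state whether a one- or two-sided test was used, and the function name is hypothetical.

```python
from math import comb


def binomial_p_value(agreements, disagreements):
    """Two-sided exact binomial p-value under a 50% chance of agreement."""
    n = agreements + disagreements
    k = max(agreements, disagreements)
    # Probability of observing at least k of the majority outcome out of n fair trials.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # With p = 0.5 the distribution is symmetric, so the two-sided value doubles the tail.
    return min(1.0, 2 * tail)


# Toy counts (not taken from Table I): 15 agreements against 5 disagreements.
p = binomial_p_value(15, 5)
significant = p < 0.05  # 95% confidence
```

For these toy counts the p-value is about 0.041, so the agreement ratio would be marked as significant at the 95% level.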

B. Inter-rater Agreement on the Changes in Mean Arousal
Based on their raw arousal traces, annotators are often in agreement (Cronbach's α = 0.717). However, we observe that patterns are less clear when we analyze how each annotator's trace seems to be impacted by changes in design features. We focus on those mean arousal changes where at least two annotators are in agreement (i.e. mean room arousal increases for at least two annotators, or decreases for at least two annotators). Calculating the instances where at least two annotators are in agreement, and matching them with changes between consecutive rooms, we retain 254 arousal shifts out of a total of 460 room transitions, which is a good sample for data analysis. Table I includes the agreements, disagreements and arousal shifts with each spatial feature change in consecutive rooms for instances where at least two annotators agree. It is evident that higher complexity and higher curvature lead to higher arousal, with warmer colors also coinciding with high arousal to a significant degree. Moreover, changes in curvature are more likely to result in a non-trivial change in arousal. This is surprising, as we expected that features of color would have a more noticeable effect than all features of form. Looking into the impact of different color transitions on mean room arousal change in this corpus, we found significant influences when the player entered a blue room from any other color room (resulting in decreased arousal in 85% of instances).
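The filtering on inter-rater agreement can be sketched as below, assuming that each annotator's mean arousal change for a transition has already been reduced to a direction of +1, -1 or 0. The function name and the `min_agree` parameter are hypothetical.

```python
def consensus_direction(directions, min_agree=2):
    """Return the shared arousal direction if enough annotators agree, else 0.

    `directions` holds one value per annotator: +1, -1 or 0 (no change).
    """
    for sign in (1, -1):
        if sum(1 for d in directions if d == sign) >= min_agree:
            return sign
    return 0


# Toy transitions with three annotators each.
transitions = [(1, 1, 0), (1, -1, 0), (-1, -1, 1), (1, 1, 1)]
two_agree = [consensus_direction(t) for t in transitions]
all_agree = [consensus_direction(t, min_agree=3) for t in transitions]
print(two_agree)  # [1, 0, -1, 1]
print(all_agree)  # [0, 0, 0, 1]
```

Raising `min_agree` to 3 reproduces the stricter setting in which all annotators must agree, at the cost of discarding more transitions.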
For completeness, we performed an analysis with those instances where all three annotators agreed in terms of mean room arousal changes. The results are included in Table I. As expected, agreements are even more pronounced; however, the number of arousal shifts where all three annotators have a non-trivial increase or decrease is 49 out of 460. This means that the findings are rather circumstantial, and we lose most of the data for the sake of inter-rater agreement.

C. Impact of Arrivals
As noted in Section III-D2, the point when the player enters a room (i.e. "arrival") is expected to have a strong impact on the affective response. Calculating the average gradient within the 3-second time window during this transition between rooms, we check whether the sign of the mean arousal gradient (ignoring values between $-10^{-4}$ and $10^{-4}$ as ambiguous, a range determined empirically) matches the change in each design feature between the previous room (that the player exits) and the next room that they enter. Table II shows each annotator's agreements between arrival gradients and changes in design features, as well as on the filtered data where gradients had the same sign for at least two or all three annotators. Unsurprisingly, results follow a similar pattern as with changes in mean arousal, with annotator B showing significant impact of all design features on arousal change during the arrival time window and annotator C being more ambiguous in their annotations. Notably, however, annotator C has clearer patterns than with mean arousal changes. Moreover, the number of instances where the arousal gradient was nonzero when a feature changes is higher than the number of times mean arousal changes (see "arousal shifts" entries in Table I). This may indicate that focusing on the arrival windows and their arousal gradient could provide more concise data, although we cannot discount other effects on data processing such as the different thresholding procedures for the two signals. Focusing on the instances where two annotators agree as the best compromise between sufficient data and inter-rater agreement, we observe that complexity and curvature of the space have a strong impact on arousal during the moments of arrival, with color warmth having a significant but less pronounced effect.

V. DISCUSSION
This first study explored how both the transitions between spaces and arousal can be treated in a time-continuous fashion. Results of Section IV indicate that the presence of curved forms or occlusion from complex structures leads to increased arousal. Moreover, by exploring the impact of blue, red, and neutral light we notice a more complex relationship between color and arousal. Treating warmth of the color as a linear variable, we see that warmer colors tend to lead to higher arousal; this is in line with extensive research on both cognitive psychology [37], [38] and digital spaces [24], [39]. Finally, we have explored two different ways of processing the continuous annotation traces in a relative fashion, either looking at the mean arousal value per room and comparing consecutive rooms' mean arousal values, or focusing on a shorter time window of "arrival" and tracking the relative change of arousal within that time window. Both methods capture the change in arousal and juxtapose it with the changes in terms of each design feature to find matches and mismatches. As expected, both methods of processing the continuous annotation trace lead to similar conclusions, indicating that they are both valid for processing similar elicitors where the shift from one stimulus (in this case, room) to the next is gradual.
Along with experimental findings, the paper's contribution is the AffRooms dataset which consists of 62 minutes of footage of 3D spatial navigation and encompasses a variety of design features of architectural form and light color. The fact that each video includes all 24 possible combinations of features allows us to give annotators an inclusive view of the possible stimuli. The fact that the videos are pre-recorded ensures that navigational style is consistent and allows for the inter-rater agreement analysis in Section IV-B. On the other hand, the long videos (each 3 to 4 minutes long) and the fact that annotators were not the ones in control of the navigation may have introduced some bias in the annotation traces. Due to the fact that (a) the spaces were often similar to each other (especially after the first few videos), (b) there was no goal or opposition to the player's navigation and (c) the annotators were watching another player navigate, it is possible that the annotations were based on cognitive rather than affective evaluation of the material. This seems to be true in the case of complex spaces for annotator A (see Table I) where a change in arousal matched a change in complexity 97% of the time: this indicates that early on the annotator developed a cognitive priming that complex spaces should be more arousing. The lack of purpose or context is also important: earlier studies [24] tasked players to navigate through mazes as quickly as possible, and thus the purpose of speed was explicit. In this case, the recorded navigation task aimed to provide a good view of each room while still keeping the duration manageable. In future work, more game-specific goals could be introduced when recording the videos, e.g. finding a key inside the room in order to open the door to the next room, or avoiding patrolling enemies [40].
The reported study was exploratory in nature, recruiting few annotators to perform thorough and homogeneous annotation which allowed us to calculate inter-rater agreements on the entire dataset. The annotation task was time-consuming, running slightly over an hour, and thus would likely not scale well for many annotators and modalities. A follow-up study will produce shorter navigation videos (with 12 room transitions instead of 24) and recruit many participants via crowdsourcing platforms. Each participant would annotate a random subset of a video database. We also intend to explore other affect dimensions, such as valence, as time-continuous variables.
In terms of the elicitors, there is still a broad variety of features that can be explored; examples include sounds playing in each room, similar to [24], [40], walls' textures, more variants of complex rooms (e.g. with a central column or with clutter only along the edges of the room), etc. Among these, perhaps the easiest and most critical adaptation is the use of a more natural level of interior illumination, matching e.g. the Correlated Color Temperature (CCT) of modern lightbulbs. The extreme saturation of blue and red light sources in the current experiment may explain why annotators were less consistent in terms of how light impacted arousal than for other features. While the colors used in this study are consistent with previous studies [24], [39], it is worth exploring whether the difference between e.g. candlelight ($10^3$ Kelvin in CCT) versus daylight ($10^4$ Kelvin) has a clearer impact on arousal levels.

VI. CONCLUSION
Inspired by the many studies on the impact of architectural form and light on affect, this paper introduced the AffRooms corpus of 3D spatial navigation videos and explored how annotators reacted to changes across four design features. Unlike previous work, this experiment used an unbounded continuous annotation method which provided granular information on the moment-to-moment arousal shifts. Through this protocol, we were able to extract the relative changes in arousal from one room to the next, as well as the change in arousal at the moment of arrival at a new room. This initial study assumes only short-term memory on the part of the annotator and only compares the features of the previous room with the features of the next room. Results show that, although arousal traces diverge across annotators as expected, certain features of the 3D spaces strongly influence the arousal levels of the viewers overall. Future work should explore more complex ways of treating the signals (e.g. assuming a longer memory window and comparing all rooms against all rooms) and expand the dataset with more architectural features and real-life illumination.