Different kinds of embodied language: A comparison between Italian and Persian languages

It is debated whether not only concrete but also abstract, figurative sentences (e.g., “She grasps the cup” vs. “She grasps the concept”) are grounded in the sensorimotor system. Importantly, studies on sentences with action verbs and motor system activation have so far been conducted only with WEIRD samples (Western cultures, in North American and European countries). The aim of our work is to investigate the relationship between language and motor responses using both concrete and abstract sentences in Italian and Persian. In the present study, Italian and Persian participants were asked to read sentences on a screen. The sentences referred either literally or metaphorically to motor actions. They were accompanied by a video displaying a movement that could be congruent or incongruent with the one described in the sentence. Participants were asked to re-execute the observed movement and then to judge whether the sentence made sense or not. In the Italian sample a strong effect of concreteness was present, especially in the congruent but also in the incongruent condition. In the Persian sample, instead, there was an inhibition effect in congruent trials, particularly with concrete sentences, and in incongruent trials there was no difference in RTs between abstract and concrete sentences. Results indicate that cross-cultural differences have to be taken into account when investigating the relationship between language and action.


Introduction
Do people of different cultures comprehend action related language in a different way?
The fact that action words and sentences recruit sensorimotor information is quite an established finding. Many consolidated research lines address this issue. Here we will focus on three of them, i.e. the studies on the relationship between action verbs and involvement of effectors, the studies on the Action Sentence Compatibility (ACE) effect, and the studies on the spatial interference effect. Importantly, most studies within these research lines were conducted with Western participants. We will now briefly describe these research lines, and propose a study in which action-language integration is investigated with a cross-cultural approach.

Action verbs and effectors
When we process action verbs and sentences containing action verbs, we implicitly activate the effector to which the words refer. Seminal EEG and fMRI studies have demonstrated that different areas of the brain are activated when reading verbs referring to different effectors, such as 'kick', 'lick', 'pick' (Hauk, Johnsrude, & Pulvermüller, 2004; Pulvermüller, Härle, & Hummel, 2001; Tettamanti et al., 2005), and that this activation is somatotopically organized. Results obtained with transcranial magnetic stimulation (TMS) are contrasting. In a combined TMS and behavioral study, Buccino et al. (2005) found a decrease in the amplitude of MEPs recorded from hand muscles while participants listened to hand-action-related sentences (e.g. s/he sewed the skirt), and from foot muscles while they listened to foot-related sentences (e.g. s/he kicked the ball). This finding is quite robust, even if not always replicated (Gianelli & Dalla Volta, 2015). Other TMS studies (Oliveri et al., 2004; Pulvermüller, 2005) and behavioral studies show a facilitation: for example, Scorolli and Borghi (2007) found a facilitation when participants read sentences related to the foot and to the mouth and concurrently produced a response with the implied effector. Behaviorally, reading the sentence "he/she sewed the skirt" yielded longer response times with hand than with foot responses, while the opposite was true for sentences such as "He/she kicked the ball"; interference was also found by Sato, Mengarelli, Riggio, Gallese, and Buccino (2008), but only with tasks implying deep semantic processing and not with shallower lexical decision tasks. Overall, the most consistent pattern of results seems to show an early interference (but Gianelli and Dalla Volta (2015) and Pulvermüller (2005) found an early facilitation) in case of congruency between the effector implied by the verb and the one used to respond, and a late facilitation (but Sato et al., 2008, did not find it).
In any case, all results converge in showing the action-language cross-talk. In recent work, Miller, Brookie, Wales, Wallace, and Kaup (2018) performed 8 experiments in which they combined behavioral tasks with event-related potentials (ERPs). They found a facilitation in RTs in case of congruency between the effector implied by the sentence and the one used to respond, but the ERP analyses showed that ERPs differed for hand versus foot movements, and not for hand- versus foot-associated words. This invites caution, as it suggests that language-related compatibility effects on RTs might emerge before action processing, and hence might not be determinant for language comprehension.

Action Sentence Compatibility effect
Glenberg and Kaschak (2002) were the first to demonstrate the Action Sentence Compatibility (ACE) effect: when participants process sentences referring to a movement away from or toward their body (e.g. "open/close the drawer"), responses are facilitated in case of congruency between the action implied by the sentence and the real movement, away from or toward the body, performed to respond. The typical ACE task is a sentence sensibility evaluation, in which participants are required to decide whether the sentence makes sense or not. The effect holds, however, with a variety of slightly different tasks, such as evaluation of words (part vs. no part) and sentence sensibility evaluations (Borghi, Glenberg, & Kaschak, 2004; Kaup, Lüdtke, & Maienborn, 2010), and for a variety of linguistic phenomena (e.g. sentences, words, sentences describing a state; Kaup et al., 2010). The ACE effect is not exempt from criticism: recently Papesh (2015) questioned its strength, showing through Bayesian analyses that only a minority of the published results offer strong evidence for the ACE; one of the problems pointed out by the author is that some studies lead to a facilitation and some to an interference effect.

Spatial interference effect
When visual stimuli instead of only words are presented, a reversed effect is obtained, i.e. no facilitation but an interference is reported. Estes and Barsalou (2018) found the spatial interference effect, an effect of attention on language, showing that processing words with spatial associations (e.g., "bird", "hat") can reduce the capability to identify an unrelated visual target (e.g., X) at the implied location (i.e., at the top of a display). Even if the reliability of the effect was criticized, Estes and Barsalou (2018) reported a meta-analysis of 37 studies indicating that the effect was reliable. They collapsed studies using single words or sentences, in which the task was to detect the visual target and the target was unrelated to the words.
Overall, results from the three lines of research we have outlined reveal that the meaning of language is intimately tied to the actions it refers to. However, the results obtained are not always coherent. There is certainly a cross-talk between language and action, but some studies report that language processing interferes with action execution while others find a facilitation (notice that similar facilitation and interference effects are also found in the action observation literature; Brass, Bekkering, & Prinz, 2001). Different explanations of these contrasting results have been advanced. The most common is based on timing (e.g. Borreggine & Kaschak, 2006; de Vega, Moreno, & Castillo, 2013), and it has been formalized in a model (Chersi, Thill, Ziemke, & Borghi, 2010) that predicts interference when action and language are processed simultaneously, and facilitation in case of delayed processing. Another explanation, which is not mutually exclusive with the first, points to the ease of integration between the sentence meaning and the motor action/visual stimuli: when integration is difficult, interference occurs (Kaschak et al., 2005).

Action verbs in literal and figurative sense
While numerous studies have demonstrated a tight relationship between language and the motor system, only a small subset of studies have investigated whether action sentences still recruit sensorimotor information when used in an abstract sense. For example, if we hear or read the sentence "He does not grasp the concept", do we still activate the grasping movement? Most relevant to our aims are studies that investigate the grounding of figurative language, and particularly those that concern non-idiomatic, novel metaphors/abstract usage. Most results show that figurative sentences also involve areas engaged in processing concrete sentences, and thus are grounded in the sensorimotor system (e.g. Boulenger, Hauk, & Pulvermüller, 2009; Boulenger, Shtyrov, & Pulvermüller, 2012; Desai, Binder, Conant, Mano, & Seidenberg, 2011; Saygin, McCullough, Alac, & Emmorey, 2010; Wallentin, Østergaard, Lund, Østergaard, & Roepstorff, 2005). There are, however, inconsistencies: some neuroimaging studies found motor/premotor activation for literal but not for figurative sentences (Aziz-Zadeh, Wilson, Rizzolatti, & Iacoboni, 2006; Raposo, Moss, Stamatakis, & Tyler, 2009). Specifically, Desai, Conant, Binder, Park, and Seidenberg (2013) showed that motor activation increased as the abstractness and conventionality of the sentences decreased. At a behavioral level, Scorolli et al. (2011) investigated response times using simple sentences resulting from combinations of concrete and abstract verbs/nouns (e.g. think of/caress the dog/idea). Mixed combinations led to longer response times, especially when the concrete word preceded the abstract one: this shifting cost can be ascribed to the hypothesis that concrete and abstract words are processed in parallel systems, one more linguistic and the other more sensorimotor, in line with the WAT (Words As social Tools) theory (Borghi, Barca, Binkofski, & Tummolini, 2018); this hypothesis is supported by two further fMRI and TMS studies with the same stimuli (Sakreida et al., 2013; Scorolli et al., 2012).

Action-language cross-talk in different cultures
The aim of our work is to investigate the relationship between language and motor responses using both concrete and abstract sentences in two different languages, Italian and Persian. To the best of our knowledge, studies on the relationship between sentences with action verbs and the motor system have so far been conducted only with WEIRD participants (Henrich, Heine, & Norenzayan, 2010a, 2010b), i.e. participants of Western cultures, in North American and European countries. The only exception we are aware of is a study by Dennison and Bergen (2010) showing that social and cultural practices influence the way people represent action language. They focused on a social practice common in Korean culture, in which people tend to use both hands to give objects to people of higher social status. The authors manipulated the status of the recipient in the presented sentences, using sentences such as "You are now giving a letter to (your) professor / to (your) younger sibling". The acquired cultural practice is reflected in the ACE effect found: consistently with the acquired behavior, when presented with sentences referring to the transfer of an object to high-status recipients, Korean participants were slower in bimanual responses, while unimanual responses were slower with sentences referring to low-status recipients.
Even if this study shows the tight relation between language and action in an Eastern culture, it does not compare Eastern and Western cultures using the same task. This is instead what we did in the present work.
Here, we used sentences that referred either literally or metaphorically to motor actions, with no directional information. They were accompanied by a video displaying a movement that could be congruent or incongruent with the one described in the sentence.
Participants were asked to reproduce the movement observed in the video and then to judge whether the sentence made sense or not. We wanted to make the task as implicit as possible, hence we asked participants to perform a sentence sensibility judgment rather than to evaluate the congruency between the action and the sentence. Since the displayed actions could involve the hands, we asked participants to respond by pressing a pedal when the sentence made sense and to refrain from responding when it did not: it was thus a go/no-go paradigm. Response times and accuracy were recorded. We decided to show participants a video and ask them to perform the observed movement because we intended to boost motor activation, in order to verify its relationship with language processing. To investigate whether the relationship between language and movement was the same across the two languages/cultures, we administered the same task to Italian and Iranian participants. Notice that these two cultures are likely not extreme in their Western/Eastern characteristics: for example, Italian culture is less individualistic than US culture (Henrich, Heine, & Norenzayan, 2010b), and Iranian culture likely differs from East-Asian cultures on a variety of dimensions. Evidence has shown that belonging to a Western or an Eastern culture influences a variety of cognitive processes, from perception to decision making (review in Henrich et al., 2010b): Asian people tend to perceive the environment in a more holistic way, while Western cultures are generally more analytic (Nisbett & Masuda, 2003); sense of agency differs as well, since Western people tend to perceive events as the outcome of a choice more often than Eastern people (Savani, Markus, & Conner, 2008).
Similarly to other Asian cultures, Iran can be considered a collectivist culture (Hofstede, 1980), as testified by the presence of extended families, by the important role played by the ingroup, by the strong sense of national belonging, and also by some linguistic habits such as the frequent use of the pronoun "we" instead of "I". Because of the strong differences between typical WEIRD cultures and Iranian culture, we intended to test whether the compatibility effect, found in WEIRD cultures, also extended to an Iranian sample.

Italian concrete sentences
No. | Sentence | Literal meaning | Meaning
2 | Lei/Lui coglie un fiore | S/he picks a flower | S/he picks a flower
3 | Lei/Lui afferra la tazza | S/he grasps the cup | S/he grasps the cup
4 | Lei/Lui tira la corda | S/he pulls the rope | S/he pulls the rope
5 | Lei/Lui impugna un'arma | S/he holds a weapon | S/he holds a weapon
6 | Lei/Lui devia la palla | S/he diverts the ball | S/he diverts the ball
7 | Lei/Lui nutre il figlio | S/he feeds the son | S/he feeds the son
8 | Lei/Lui scardina la porta | S/he unhinges the door | S/he unhinges the door
9 | Lei/Lui stringe una spugna | S/he squeezes the sponge | S/he squeezes the sponge

Italian abstract sentences
No. | Sentence | Literal meaning | Meaning
1 | Lei/Lui accarezza un'idea | S/he caresses an idea | S/he has an idea
2 | Lei/Lui coglie un'occasione | S/he picks an occasion | S/he takes an occasion
3 | Lei/Lui afferra un concetto | S/he grasps a concept | S/he grasps a concept
4 | Lei/Lui tira le conseguenze | S/he pulls the consequences | S/he draws the consequences
5 | Lei/Lui impugna una sentenza | S/he holds a judgment | S/he takes the issue
6 | Lei/Lui devia il discorso | S/he diverts a speech | S/he meanders in a topic
7 | Lei/Lui nutre un dubbio | S/he feeds a doubt | S/he has the doubts
8 | Lei/Lui scardina un'accusa | S/he unhinges an accusation | S/he thwarts an accusation
9 | Lei/Lui stringe un'amicizia | S/he squeezes a friendship | S/he holds a friend

Hypotheses
Based on the reviewed literature, we advanced the following hypotheses: First, we predicted that there would be a difference between congruent and incongruent trials, likely leading to a facilitation of congruent over incongruent trials since the timer started after the sentence was presented.
Second, we predicted that concrete action sentences would elicit faster responses than abstract ones, in line with the previously discussed results on figurative sentences and with the well-established concreteness effect, which shows that concrete words/sentences are processed faster and recalled better than abstract ones (Paivio, 1990).
Third, and more crucially, we advanced the (directional) hypothesis that the congruency between the action video and the sentence would be perceived as stronger in the case of concrete sentences, leading to a greater difference in response times between concrete congruent and incongruent trials than between abstract congruent and incongruent trials. As to the cultural and linguistic difference, we advanced two hypotheses, hypotheses four and five. Fourth, we intended to investigate whether the congruency effect we expected to replicate in the Italian sample would extend to a different culture. Finally, we intended to investigate whether using a different language has a different effect on responses to concrete and abstract sentences. We expected the difference between the two languages to be more marked with abstract than with concrete sentences, in line with the idea that linguistic experience is more influential on the processing of abstract than of concrete sentences (Borghi et al., 2018).

Sample
Thirty-five Iranian and thirty-eight Italian participants took part in this experiment. All Iranian participants were native Persian speakers (20 females, all but 2 right-handed, mean age 30.5, SD 3.71, range 25-40), had left Iran less than 8 years earlier, and were living in Rome (mean length of stay in Italy: 4 years). They were recruited in dorms for Iranian students or at Iranian meeting points in Rome. Out of the 35 Iranian participants, only 5 spoke fluent Italian. All thirty-eight Italians were native Italian speakers (19 females, all but 2 right-handed, mean age 26.6, SD 4.15, range 20-38). All participants had normal or corrected-to-normal vision. The study was conducted in accordance with the Declaration of Helsinki and approved by the local ethical committee.

Stimuli and procedure
For the Persian sample, stimuli consisted of fourteen sensible sentences in Persian (see Table 1 Panel A). Each sentence was composed of a third-person subject, a transitive verb and a concept noun. In seven sentences, the verbs were used with a concrete meaning; in the other seven, the same verbs were used in a metaphorical sense. For example, the verb "to pull" was used for the concrete sentence "Kif ra roye zamin ke/id/ She pulled the bag on the floor" and for the abstract sentence "Xane ra be atae/ ke/id/She set the house on fire". Additionally, we combined the verbs with the nouns to create seven meaningless sentences, e.g. "t/aekko/ ra ruye daerya ke/id/ She pulled the sea". The word order of the Persian sentences is Subject + Object + Verb, which is the unmarked word order in Persian. The tense is simple past, since this is the simplest form for a third-person subject, with no extra affix added to the verb. Moreover, as both Persian and Italian allow meaningful sentences with no overt subject (the pro-drop parameter in linguistics), the subject is null on the surface. The sentences were selected by asking 14 Persian speakers (age range 25-43) to rate them on seven-point Likert scales according to a list of parameters useful to determine the degree of abstractness of the sentence (Villani, Lugli, Liuzza, & Borghi, 2019): concreteness vs abstractness, imageability (Paivio, 1986), emotionality (Ponari, Norbury, & Vigliocco, 2017), age of acquisition (Barca, Burani, & Arduino, 2002), modality of acquisition (Marschark & Wauters, 2003), body-object interaction (Tillotson, Siakaluk, & Pexman, 2008), social metacognition (Borghi et al., 2018), quantity of motion, and perceptual strength (Connell & Lynott, 2012; Lynott & Connell, 2013). The abstract and concrete sentences were compared on these parameters (see Table 2).
Three-second video clips were recorded showing the right hand of an actor performing the action on an object located on a table (e.g. "to pull the cup on table/to pour the tea/to throw trash"). In total there were seven video clips, one for each action verb. The action verbs were "to screw", "to take", "to throw", "to pull", "to pick up", "to pour" and "to hit". The sentences could be congruent or incongruent with respect to the action observed in the video, and the verb could have either a concrete or an abstract metaphorical meaning. When a sensible sentence, either abstract or concrete, contained the action verb referring to the action observed in the video, the trial was categorized as congruent. In contrast, when the sentence, either abstract or concrete, contained a verb referring to an action that differed from the displayed one, the trial was categorized as incongruent. Non-sensible sentences were included as catch trials, to keep participants focused on the task. The fourteen sensible sentences were combined with the seven videos; we thus obtained seven congruent_abstract trials, seven congruent_concrete trials, seven incongruent_abstract trials, seven incongruent_concrete trials, and seven trials with non-sensible sentences. In total seventy trials were administered in random order in two sessions of thirty-five trials each.
For the Italian sample, stimuli consisted of eighteen sensible Italian sentences (see Table 1 Panel B) composed of a third-person subject, a transitive verb and a concept noun. Nine of them referred to concrete contexts, e.g. "Lei afferra la tazza/She grasps the cup"; the other nine were metaphorical sentences, e.g. "Lei afferra il concetto/She grasps the concept". In addition, there were nine meaningless sentences, e.g. "Lei afferra il vulcano/She grasps the volcano".
The sentences were selected by asking 17 Italian speakers (age range 18-51) to rate them on the same seven-point Likert scales used for the Iranian group. The abstract and concrete sentences were compared on these parameters (see Table 2). The nine action verbs in the video clips were "to take", "to grasp", "to pet", "to feed", "to unhinge", "to hold", "to pull", "to divert", "to squeeze". The eighteen sensible sentences were combined with the nine videos in order to have nine congruent_abstract trials, nine congruent_concrete trials, nine incongruent_abstract trials, nine incongruent_concrete trials and nine trials with non-sensible sentences. In total ninety trials were administered in random order in two sessions of forty-five trials each.
The experimental task was administered on a PC using E-Prime software (Version 3). Participants sat 60 cm from a 15-inch computer monitor in a dimly lit room. They were asked to maintain a comfortable position and to keep their feet on a pedal connected to the laptop through a multifunctional response box (Chronos, PST100430 model). Participants were instructed to look at a fixation cross that remained on the screen for 1000 ms; then the video started, followed by a black screen lasting 5000 ms. During this time window, participants were asked to perform the pantomime of the observed action three or four consecutive times. Immediately after, a sentence appeared on the screen for 3000 ms and participants were instructed to press the pedal only when the sentence was sensible; otherwise they were asked to refrain from responding. All participants were informed that their response times (RTs) would be recorded and were invited to respond as quickly as possible while still maintaining accuracy (Fig. 1).
Fig. 1. Experimental procedure: Participants were instructed to look at a fixation cross for 1000 ms. During a time window of 5000 ms participants were asked to perform the pantomime of the observed action. Immediately after, a sentence appeared on the screen and participants were instructed to press a pedal on the ground only when the sentence was sensible, otherwise they were asked to refrain from responding.
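The trial counts reported for the two samples follow directly from the number of action verbs. The following minimal Python sketch reproduces that arithmetic; the function and variable names are illustrative and not part of the original materials, and it assumes that each session contained one full pass through the five trial types for every verb, which is what the reported totals imply.

```python
# Four sensible conditions per verb, plus one non-sensible catch trial per verb.
SENSIBLE_CONDITIONS = [
    ("congruent", "abstract"),
    ("congruent", "concrete"),
    ("incongruent", "abstract"),
    ("incongruent", "concrete"),
]
N_SESSIONS = 2  # assumption: each session repeats the full trial list


def trial_counts(n_verbs):
    """Return trial counts for a sample whose stimuli are built on n_verbs action verbs."""
    sensible_per_session = n_verbs * len(SENSIBLE_CONDITIONS)
    catch_per_session = n_verbs
    per_session = sensible_per_session + catch_per_session
    return {
        "sensible_per_session": sensible_per_session,
        "catch_per_session": catch_per_session,
        "per_session": per_session,
        "total": per_session * N_SESSIONS,
    }


persian = trial_counts(7)  # seven verbs/videos in the Persian material
italian = trial_counts(9)  # nine verbs/videos in the Italian material
print(persian["per_session"], persian["total"])  # 35 70
print(italian["per_session"], italian["total"])  # 45 90
```

Under this reading, only the sensible trials (28 per session for the Persian sample, 36 for the Italian sample) can contribute response times, since catch trials required withholding the response.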

Results
The analysis was restricted to 4034 reaction times. The analyses of the sentence sensibility task were restricted to the sensible sentences; the non-sensible sentences were not included because they were not informative for the experimental question, their role being only to keep attention on the task and to make the task implicit. Hence, all the errors we report are false negatives (judging a sensible sentence as not making sense). From all possible responses, we discarded 662 trials in which no response was given. Of these errors, 493 (74.5%) involved abstract sentences and the remaining 169 (25.5%) involved concrete sentences. This discrepancy indicates that in the sensibility task the abstract sentences were perceived as more complex than the concrete sentences. Following a stepwise procedure, in the first model we included fixed effects for the variables Congruency, Category and Group, and we added random intercepts for Participants and for Sentences nested within Participants. In the second, third and fourth models, we excluded, in this order, the variables Congruency (congruent, incongruent), Category (abstract, concrete) and Group (Italians, Persians), in order to investigate the main effect of each factor. In the fifth, sixth and seventh models, we investigated the two-way interactions in addition to all the fixed factors. Specifically, in the fifth model we entered the interaction Category × Congruency, in the sixth model the interaction Congruency × Group, and in the seventh model the interaction Category × Group. In all the models we kept the random intercepts for Participants and for Sentences nested within Participants. Finally, in the eighth model we entered the three-way interaction Group × Congruency × Category. We compared the eighth model with all the other models.
Model eight was the one with the best fit (see Table 3), suggesting that the three-way interaction better explained the data. This model yielded a significant main effect of Category (F(1,3215) = 14.6683, p = .0001), due to the fact that overall abstract sentences showed slower RTs (1481, SE = 38.3) than concrete sentences (1456, SE = 38.2) (concreteness effect, Hypothesis 2). The main effect of Congruency approached significance (F(1,3216) = 3.3403, p = .0677): congruent conditions (1486, SE = 38.2) were slower than incongruent conditions (1451, SE = 38.2). Both main effects are, however, qualified by the interactions. The three-way interaction Group × Congruency × Category was significant (F(1,3215) = 7.1218, p = .0077; Fig. 2). In the Italian group, Tukey post hoc comparisons indicate that in the congruent condition concrete sentences (1372, SE = 53.7) were processed faster than abstract sentences (1534, SE = 54.4), t(3216) = 7.26, p < .0001. The same difference was present in the incongruent condition, even if the effect was less marked: concrete sentences (1433, SE = 53.8) were processed faster than abstract sentences (1505, SE = 54.2), t(3216) = 3.23, p = .0273. In the Italian sample our hypothesis that a concreteness effect would exist was thus confirmed (Hypothesis 2).
Fig. 2. Group × Congruency × Category interaction. Shaded bands represent the confidence intervals (95%). Congruent condition: in the Italian sample concrete sentences were processed significantly faster than abstract sentences; in the Persian sample abstract sentences were processed significantly faster than concrete sentences. Incongruent condition: in the Italian sample concrete sentences were processed significantly faster than abstract sentences; in the Persian sample RTs of concrete and abstract sentences did not differ.
Moreover, Tukey post hoc comparisons show that, in keeping with our predictions, concrete sentences were processed faster in the congruent (1372, SE = 53.7) than in the incongruent condition (1433, SE = 53.8), although this difference only approached significance (t(3216) = −2.923, p = .0685), suggesting a tendency towards a facilitation effect (Hypothesis 3).
All the other Tukey post hoc comparisons between the Italian and Persian groups were not significant (p > .2461).
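The response bookkeeping and the sequence of eight models described above can be summarized in a short Python sketch. It only builds formula strings and checks counts; it does not fit any model. The lme4-style notation, the variable names (RT, Participant, Sentence) and the reading of model eight as a full factorial are assumptions made for illustration, since the text does not report the software or the exact model specification.

```python
FACTORS = ["Congruency", "Category", "Group"]
# Assumed lme4-style notation: random intercepts for participants and for
# sentences nested within participants.
RANDOM_PART = "(1 | Participant/Sentence)"


def formula(fixed_terms):
    """Join fixed-effect terms and the random part into one formula string."""
    return "RT ~ " + " + ".join(list(fixed_terms) + [RANDOM_PART])


def build_models():
    """Return the eight model formulas in the order given in the text."""
    models = {1: formula(FACTORS)}
    # Models 2-4: drop Congruency, Category and Group in turn.
    for number, dropped in enumerate(FACTORS, start=2):
        models[number] = formula([f for f in FACTORS if f != dropped])
    # Models 5-7: all main effects plus one two-way interaction each.
    two_way = ["Category:Congruency", "Congruency:Group", "Category:Group"]
    for number, term in enumerate(two_way, start=5):
        models[number] = formula(FACTORS + [term])
    # Model 8: assumed full factorial including the three-way interaction.
    models[8] = formula(["Group * Congruency * Category"])
    return models


def expected_sensible_responses(n_participants, n_verbs, n_sessions=2):
    """Sensible trials: participants x verbs x 4 conditions x sessions."""
    return n_participants * n_verbs * 4 * n_sessions


for number, model in build_models().items():
    print(number, model)

# 35 Iranian participants with 7 verbs, 38 Italian participants with 9 verbs.
possible = expected_sensible_responses(35, 7) + expected_sensible_responses(38, 9)
print(possible, possible - 662)  # 4696 4034
```

The last line shows that the 4034 analysed reaction times are consistent with the design: 4696 possible responses to sensible sentences minus the 662 discarded trials.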

Discussion
The results show that action sentences are grounded in action, even if the extent of their grounding differs as a function of the abstractness level. More crucially, the marked differences between Iranians and Italians invite reflection. In the following, we will first summarize the main results and the main differences between the two groups, and then offer some possible explanations.
Before summarizing the data, a caveat. The fact that the Iranians who participated in the study had been exposed to the Italian language and culture might constitute a potential limitation of this work. However, all our participants were late bilinguals, only 5 of 35 spoke fluent Italian, and they socialized mostly with other Iranians. Furthermore, research on bilinguals shows that they have a flexible linguistic organization (e.g. Athanasopoulos & Aveledo, 2012). When tested in their own language, participants tend to revert to the structures used in that language. For example, Chinese bilinguals tend to use more taxonomic conceptual relations when they speak in English, and more thematic ones when they speak in Chinese. In addition, age has a marked effect on the degree of cognitive shift toward the L2, as revealed by studies on time (Boroditsky, 2001), and all our participants arrived in Italy as adults. Based on these considerations, we think that the exposure to the Italian language and culture might not have markedly influenced the results. However, future research should shed more light on possible fine-grained differences between the performance of Iranians living in Iran and that of Iranians living elsewhere.
Italians: The predicted congruency effect (Hypothesis 1) was not significant per se, but it was modulated by Category in the three-way interaction. In line with our expectations, evaluating the match between the action and the sentence was easier for concrete than for abstract sentences (Hypothesis 2, concreteness effect); this advantage of concrete over abstract sentences was present in both the congruent and the incongruent condition, but was slightly more marked in the congruent condition, in line with our predictions (Hypothesis 3).
[Figure: Group × Category interaction. In the Italian sample concrete sentences were processed significantly faster than abstract sentences; in the Persian sample abstract sentences were processed significantly faster than concrete sentences.]
Iranians: The pattern of results of the Iranian participants was, surprisingly, quite different from that of the Italians. First of all, a reverse congruency effect was significant, indicating that the congruent condition was slower than the incongruent one; furthermore, the advantage of concrete congruent trials found in the Italian sample did not extend to the Persian sample. Hence, the hypothesis that the congruency and concreteness effects would extend to the Persian sample was not confirmed (Hypothesis 4).
Furthermore, in the congruent condition abstract sentences were processed faster than concrete ones. This differs profoundly from what happens in the Italian sample: as Fig. 2 shows, in the Persian sample the condition with the slowest response times is that of concrete congruent trials, i.e. the fastest condition in the Italian sample. Apparently, when reading a sentence, Iranians simulate the mentioned action; this simulation interferes with the actions they observe and perform, generating an inhibition in the congruent condition. This inhibition is particularly marked with concrete sentences, consistently with the fact that they are grounded to a greater extent in the sensorimotor system, but it is also present with abstract sentences. Even if in the incongruent condition abstract and concrete sentences did not differ, visual inspection of the data shows that the fastest RTs are elicited by incongruent abstract sentences, i.e. by the cases in which the sensorimotor system is less activated.
As to Hypothesis 5, our expectation that the difference between the two languages would be more marked for abstract than for concrete sentences was not confirmed.
Overall, the comparison between the two samples allows us to advance some considerations: (1) For Italians, it seems that language and action are quite integrated, while for Iranians they are processed separately. The inhibition found in the Iranian sample might reflect an overload due to the fact that the motor system is activated by the video/movement and that this exerts a priming effect on the motor activation evoked by the sentence. (2) The distinction between concrete and abstract sentences seems to be more marked in Italian than in Iranian participants. In the Italian sample we found a facilitation with concrete sentences in both the congruent and the incongruent condition, and an opposite pattern with abstract sentences; in contrast, in the Persian sample the three-way interaction shows that in the congruent condition abstract sentences were processed faster than concrete ones, while in the incongruent condition no difference was present between concrete and abstract sentences. The higher continuity between concrete and abstract sentences may be due to the fact that Persian is a more literary/metaphoric language (poems, literature, etc.), and also to the fact that abstract sentences in Persian seem to be more embodied than in Italian, i.e. concrete and abstract sentences seem to rely on a common scheme to a larger extent than in Italian.
Seyed-Gohrab (2012), referring to the role and importance of metaphor and figurative speech in the Persian language since ancient times, declares that "metaphors are the heart of Persian poetry. They are used for a wide range of purposes in different genres". He argues that the poet's professional survival rests on their ability to contrive original metaphors within the established literary conventions. Because metaphor is so important in Persian poetry, it has entered the ordinary life of people, since poetry is considered one of the most significant artistic achievements of Persia (e.g. Yarshater, 1962). One can easily trace variations of everyday talk in the poems of famous classical poets such as Khayam (1048-1131), Rumi (1207-1273) and Hafez (1325-1389), who are famous not only in Iran but are also internationally appreciated. In sum, some metaphoric expressions have entered the life of Iranian people and have become independent from the concrete meaning from which they were originally derived. One can even trace the Persian sentences used in the experiments of this paper in the poems of Hafez or Rumi. For instance, upon hearing the sentence "Xane ra be atattke/id/" (literally "to pull the house on the fire"), which means to set the house on fire, Persians immediately think of the action of burning, not of pulling.
A further reason for the differences in the factor Category may lie in differences in Perceptual Strength across the two languages. Notice that the two sets differed in Perceptual Strength: abstract and concrete sentences differed only for touch in the Persian set, while they differed also for vision in the Italian set. To verify whether differences in Perceptual Strength across the two languages might affect the results, we ran mixed model analyses. The analyses are reported in the Appendix. Overall, our results indicate an important role of perceptual strength, in line with the hypothesis of Connell and Lynott (2012). Across the two groups we found that an increase in perceptual strength led to faster response times. However, perceptual strength influenced the two languages differently: it loaded more on concrete items in the Persian sample, and more on abstract items in the Italian sample. This different role of perceptual strength might thus contribute to explaining the differences in the concreteness effect, which was present in the Italian sample but not in the Persian one (see Appendix).
It remains to be explained why with concrete congruent sentences we find an inhibition in the Persian sample but a tendency towards facilitation in the Italian sample. The reason underlying such contradictory results (interference and facilitation) has been extensively debated in the literature on action-language integration. One possible reason lies in the timing: interference seems to occur between 160 and 500 ms after stimulus presentation, whereas facilitation becomes evident between 550 and 800 ms after sentence appearance (Borreggine & Kaschak, 2006; Boulenger et al., 2006; De Vega, Robertson, Glenberg, Kaschak, & Rinck, 2004). The model by Chersi et al. (2010), based on a network with a chain organization, explains the divergent results on language and effectors in terms of the timing of motor chain activations. The early interference is due to the fact that the motor system is activated concurrently by the sentence and by the movement; the late facilitation is another manifestation of the same process, i.e. an aftereffect of this early overload of the motor system (Chersi et al., 2010). The results of Liepelt, Dolk, and Prinz (2012) converge with this model: participants were required to execute a hand opening or closing action in response to the color (blue or red) of the words "open" and "close". Seeing the word automatically evoked a gesture, leading to an advantage of congruent trials (word "open", opening gesture). Further experiments required participants to say "open" or "close" in response to a green or a red cue displayed above a human hand performing either an opening or a closing action. The results matched those of the first experiment: congruent trials were faster than incongruent ones, consistently demonstrating that there is a bidirectional crosstalk between language and the motor system. A neutral condition was then added, in which a hand remained stationary during the task. The results confirmed this pattern, showing that semantically non-corresponding action-word pairings lead to interference, consistent with the model by Chersi et al. (2010), which predicts interference when action and language are processed simultaneously, and facilitation in the case of delayed processing.
In our study, the contradictory results cannot be ascribed to overall timing, since the exposure time is the same for Italians and Iranians.
Two explanations of the contradictory results are possible. The first reflects a linguistic difference between Italian and Persian. In the Persian sentences the verb is placed at the end of the sentence, whereas in Italian it is located immediately after the subject. This grammatical difference could have contributed to creating, in the Persian sample, a shorter temporal interval between the simulation of the action triggered by the sentence and the motor response than in the Italian sample. Dennison and Bergen (2010) provided this explanation for the interference result they found with an ACE paradigm in Korean. We are not inclined to favour this explanation, because we considered the congruency between the simulation induced by the video and that elicited by the sentence; in this case, the distance is longer in the Persian sample. Furthermore, Dennison and Bergen used an ACE paradigm in which sentence and motor responses were concurrent, while we did not.
A plausible explanation of the inhibition we found in the Iranian sample is that information derived from language and from vision/action is scarcely related and needs to be integrated online, generating an overload. In this respect, Estes, Verges, and Adelman (2015) provide an explanation of spatial interference that can be useful for us. They asked participants to identify related or unrelated visual targets above or below a cue word that had high or low spatial associations (e.g. cloud vs. puddle). They found a facilitation when the target was related to the cue, and interference when they were not related. A similar explanation has been offered by Kaschak et al. (2005), who ascribed the interference effect found when participants listened to sentences and simultaneously observed black-and-white displays moving in the same/opposite direction to a different degree of "integratibility", i.e. to the extent to which the perceptual stimuli can be integrated with the content of the sentence. For example, a picture of a car can be easily integrated with the content of the sentence "The car approached you," whereas the black-and-white stimuli used could not, and thus generated interference. Hence, when the visual/motor stimulus matches the verbal one there is a facilitation; when they do not match, or cannot be easily integrated, there is an interference. Following this interpretation, it seems that in Persian action and language are coded separately and need to be integrated online, requiring resources and time. Conversely, in Italian action and language are easily integrated online, and thus we find a tendency towards facilitation with concrete sentences. But what is the reason for this difference between Italians and Iranians?
One possibility is that it depends on factors related to the specificities of the two languages; for example, the writing direction is opposite (left-to-right in Italian and right-to-left in Persian), and as a consequence, there might be a different hemispheric action-language lateralization in the two languages. Furthermore, the orthography is more transparent in Italian than in Persian. However, it seems unlikely that such inhibition/facilitation depends on merely linguistic factors. The most plausible explanation we could identify relies on the gesture-language system. The Italian gesture system is more structured and more integrated with language than the Persian one. A wide literature shows that, compared to other populations, Italians have a rich gesture vocabulary and produce many gestures in everyday communication (Colletta et al., 2015); notably, the Italian gesture system is particularly structured and includes many conventional gestures (De Jorio, 2000; Diadori, 1990; Efron, 1941; Kendon, 2004; Munari & Saglietti, 1994; but see Pettenati, Sekine, Congestrì, & Volterra, 2012, showing that this Italian specificity does not characterize children). As for Persian, the use of gestures was long discouraged. Nowadays in Iran, moving the hands is accepted and even common, especially in academic settings. However, until recent decades moving the hands while talking was considered impolite, as it was a sign of taking control and authority in a discourse. Elderly people would get annoyed to see a younger person move their hands in a conversation with them, and they would hardly do it themselves. Even nowadays, when people, and especially men, intend to show their respect for authoritative others, they place one hand over the other and hold them in front of their body.
The tendency to avoid moving the hands when talking with others or sitting beside people with higher authority could have roots in religious beliefs, considering quotes from the Prophets and Imams that advise people to keep control of their body out of respect for others. For instance, Imam Sadegh says "Do not look at your parents' eye unless with kindness; do not let your voice get higher than theirs; do not let your hands go upper than their hand; and never walk before them" (Majlisi, 1986:79). A governmental religious website, which promulgates Islamic ways of communication, declares that according to Islamic etiquette one should not signal anything with the eyes, eyebrows or hands when talking (Al-Mutlaq, 2017). Around two decades ago children were told to sit with their hands on their knees without moving in family gatherings, and in the classroom, when teachers asked them to go to the blackboard to answer questions, the polite position was standing with their hands at their sides. In addition, when sitting on their bench or chair the polite position was to keep their hands on their knees or their arms folded to prevent any movement. The habit of repressing gestures and movements while talking may be the cause of the inhibition we found: the linguistic and action systems would be difficult to integrate, and hence require more time when simultaneously activated.

Conclusion
We performed two experiments aimed at investigating the cross-talk between concrete and abstract action sentences and videos representing them in two different cultures/languages, the Italian and the Iranian one. The results confirm the tight relationship between language and action, which is stronger for concrete than for abstract sentences. Strikingly, the results reveal a marked cultural/linguistic difference: while observing a video and imitating an action inhibits concrete congruent sentences in the Iranian sample, it leads to a trend towards facilitation of the same sentences in the Italian sample. The results can be ascribed to the higher integration between gestures and language in Italians, and to the tendency, widespread among Iranians, to avoid movement when talking. Overall, our results help to shed light on the possible causes of facilitation and interference/inhibition, and suggest that taking into account the culture to which participants belong is of paramount importance for understanding in depth the mechanisms responsible for contrasting results.

Open practices statement
The raw data for all analyses are available at: https://osf.io/w3m7t/.

Appendix
To understand the possible role Perceptual Strength exerts on our results, we computed a normalized index for each sentence weighted on the range of values attributed to all the abstract/concrete sentences within each perceptual component (vision, touch, smell, taste, hearing) in the entire sample. We averaged the weighted indexes in order to obtain a single value of Perceptual Strength ranging from 0 to 1 for each Italian and Persian sentence. We adopted a stepwise procedure, like those already described in the manuscript. We kept participants and sentences as random intercepts, and we combined in 51 models all the possible combinations of the fixed factors and their two-way/three-way interactions with the covariate. The high number of models was obtained because we considered the two-way and the three-way interactions among all the factors, and we eliminated each factor from the model following a stepwise procedure. We compared all these models with a full model expressed by the interaction of Group, Congruency, Category and Perceptual Strength, with the addition of all the fixed factors and with participants and sentences as random effects. This model proved to be the best among those compared. The model yielded a significant four-way interaction Group × Category × Congruency × Perceptual Strength (F(1,3210) = 4.52, p = 0.033). In order to explore the four-way interaction, we performed two separate analyses, one for each sample.
In the Italian sample, following a stepwise procedure, we compared 15 models with Congruency, Category and Perceptual Strength as fixed factors and their interaction; again, participants and sentences were included as random factors. The best model was the full model expressed by the interaction Congruency × Category × Perceptual Strength. This model yielded a significant main effect of Category (F(1,1961) = 60.87, p < .0001). Both the two-way interaction Congruency × Category (F(1,1961) = 8.857, p = 0.003) and the two-way interaction Perceptual Strength × Category (Fig. A1) were significant (F(1,1961) = 17.61, p = 0.006). Crucially, the three-way interaction Congruency × Category × Perceptual Strength was also significant (F(1,1961) = 7.39, p = 0.006) (Fig. B1).
In the Persian sample, we followed the same stepwise procedure adopted in the Italian sample, and we again compared 15 models. The best model was again the full model expressed by the fixed factors and their interaction Congruency × Category × Perceptual Strength, in which participants and sentences were included as random intercepts. This model yielded a significant main effect of Category (F(1,433) = 9.21, p = .0025) and of Congruency (F(1,1249) = 16.71, p < 0.0001). The two-way interaction Perceptual Strength × Category was significant (F(1,433) = 11.69, p = 0.0007) (Fig. C1), and the two-way interaction Congruency × Perceptual Strength was significant as well (F(1,1249) = 28.20, p < .0001) (Fig. D1). Crucially, the three-way interaction Congruency × Category × Perceptual Strength was not significant (F(1,1249) = 0.235, p = .6278).
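As an illustration only, the Perceptual Strength index described in this appendix could be computed along the following lines. This is a minimal sketch and not the script used for the reported analyses: it assumes that "weighted on the range of values" means min-max rescaling of each perceptual component over all sentences, followed by an unweighted average of the five rescaled components. The function name, the data layout and the toy ratings are hypothetical.

```python
from statistics import mean

# The five perceptual components named in the text.
COMPONENTS = ("vision", "touch", "smell", "taste", "hearing")


def perceptual_strength(ratings):
    """Return one index in [0, 1] per sentence.

    ratings: dict mapping a sentence id to a dict of mean ratings,
             one per perceptual component (hypothetical layout).
    Assumption: each component is min-max rescaled over all sentences,
    then the five rescaled values are averaged.
    """
    # Range (min, max) of each component over the entire sentence set.
    bounds = {}
    for component in COMPONENTS:
        values = [r[component] for r in ratings.values()]
        bounds[component] = (min(values), max(values))

    def rescale(value, low, high):
        # A component with no variation carries no information: map it to 0.
        if high == low:
            return 0.0
        return (value - low) / (high - low)

    return {
        sentence: mean(rescale(r[c], *bounds[c]) for c in COMPONENTS)
        for sentence, r in ratings.items()
    }


# Toy ratings, invented for illustration only (not data from the study).
toy_ratings = {
    "s1": {"vision": 5, "touch": 5, "smell": 1, "taste": 1, "hearing": 3},
    "s2": {"vision": 1, "touch": 1, "smell": 1, "taste": 1, "hearing": 1},
    "s3": {"vision": 3, "touch": 5, "smell": 1, "taste": 1, "hearing": 5},
}
toy_index = perceptual_strength(toy_ratings)
```

Rescaling each component before averaging keeps a modality with a wider rating range from dominating the index; whether the original computation weighted the components in exactly this way cannot be determined from the text.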