Aspect-Invariant Sentiment Feature Learning: Adversarial Multi-task Learning for Aspect-Based Sentiment Analysis

Neural models with attention mechanisms have achieved remarkable performance in Aspect-based Sentiment Analysis (ABSA). In most previous studies, the information about aspects in sentences is considered important for the ABSA task and therefore various attention mechanisms have been explored to leverage interactions between aspects and context. However, some sentiment expressions carry the same polarity regardless of the aspects they are associated with. In such cases, it is not necessary to incorporate aspect information for ABSA. In fact, in our experiments, we find that blindly leveraging interactions between aspects and context as features may introduce noise when analyzing those aspect-invariant sentiment expressions, especially when faced with limited aspect-related annotated data. Hence, in this paper, we propose an Adversarial Multi-task Learning Framework to identify the aspect-invariant/dependent sentiment expressions automatically without requiring extra annotations. In addition, we use a gating mechanism to control the contribution of representations derived from aspect-invariant and aspect-dependent hidden states when generating the final contextual sentiment representations for the given aspect. This essentially allows the exploitation of aspect-invariant sentiment features for better ABSA results. Experimental results on two benchmark datasets show that extending existing neural models using our proposed framework achieves superior performance. In addition, the aspect-invariant data extracted by our framework can be considered as pivot features for better transfer learning of the ABSA models on unseen aspects.


INTRODUCTION
Aspect-based sentiment analysis (ABSA) aims at inferring the sentiment polarity of a specific aspect expressed in a sentence [20,21]. For example, in the sentence 'The food is good, but the service is terrible', there are two aspects mentioned: 'FOOD#QUALITY' (corresponding to the aspect entity 'food') and 'SERVICE#GENERAL' (corresponding to the aspect entity 'service'). Here, each aspect consists of an entity and an attribute. The sentiment polarity of the aspect 'FOOD#QUALITY' is positive while the sentiment polarity of the aspect 'SERVICE#GENERAL' is negative.
Thanks in part to the availability of copious annotated resources for some aspects, recent attention-based models can effectively distinguish the sentiment polarities of different aspects in the same sentence [9,18,41]. Despite the remarkable progress made in ABSA, most existing work has focused only on how to extract contextual sentiment information for specific aspects [8,23,27,32,45], or on learning aspect-dependent features for sentiment classification [14,16,28,40]. However, in some cases, the polarities carried by sentiment expressions are aspect-independent, and blindly incorporating the associated aspect information may confuse the sentiment classifier, especially when faced with limited aspect-related annotated data. Hence, we argue that separating aspect-invariant sentiment expressions from aspect-dependent ones could potentially lead to improved ABSA results. To the best of our knowledge, there is no prior work focusing on identifying aspect-invariant and aspect-dependent sentiment expressions without supervised information.
To separate aspect-invariant sentiment expressions from aspect-dependent ones, we make the following observations:
• If a sentiment expression is aspect-invariant, we can simply replace its associated aspect entity with another one without inverting the polarity. In this way, it is possible to automatically generate synthetic training instances to augment the training data.
• If a sentiment expression is aspect-dependent, then simply replacing its aspect may change its polarity or produce a noisy sample. We could train a discriminator by adversarial training to distinguish those synthetic training instances from the original training examples.
To illustrate our idea, we give examples in Figure 1, where sentences are paired with their corresponding aspects and polarity labels. In Figure 1(a), suppose there are no training examples for the aspect 'LOCATION#GENERAL' in the training set. In such cases, many existing sentiment analysis systems may fail to detect the aspect-specific sentiment polarity of 'Oh yeah the view was good too.' enclosed in the green box. However, there are training examples for other aspects such as 'SERVICE#GENERAL' which share a similar sentiment expression, as shown in the sentence enclosed by the red box. We could generate a synthetic training example, enclosed in the blue box, by replacing the word 'service' with the word 'location'. With the augmented training data, models could better capture such aspect-invariant sentiment features and thus generate better ABSA results. However, for sentences containing aspect-dependent sentiment expressions, we cannot generate synthetic training examples by simply replacing the aspect entity or attribute with other aspect entities or attributes, as the resulting polarity may be inverted. In Figure 1(b), the sentiment word 'high' is positive for the aspect 'FOOD#QUALITY' but negative for 'FOOD#PRICES'. In some other instances, simply adopting aspect-dependent sentiment expressions to generate synthetic training examples may introduce noise, such as the sentence enclosed by the dotted red box in Figure 1(c). Hence, separating the aspect-invariant sentiment features from aspect-dependent ones should be considered for improving the performance of ABSA.
In this paper, we propose an adversarial multi-task learning framework to extract aspect-invariant sentiment features, and distinguish aspect-invariant sentiment expressions from aspect-dependent ones by adversarial training. We first generate fake cross-aspect samples by replacing the aspect entities in the original training samples with other aspect entities while keeping the original polarity labels unchanged. If the original training sample contains aspect-invariant sentiment expressions, then the polarity label of the synthetic example should be correct and the discriminator would not be able to distinguish between the real training example and the fake example. If the original sample contains aspect-dependent sentiment expressions, then it is likely that the polarity label of the fake example would be wrong and the discriminator can easily identify the fake example. In this way, we can use adversarial training to obtain a large number of aspect-invariant and aspect-dependent training instances. For different types of sentiment expressions, the representation composition is controlled by a discriminator-based gate, which captures sentiment features better by judging whether the aspect-based representation is needed. Hence, the ABSA performance can be improved by augmenting training examples containing aspect-invariant sentiment features, especially for those aspects with limited training examples in the training set. In addition, aspect-dependent features are not blindly incorporated during representation learning and will only be added if necessary. The main contributions of our work can be summarized as follows:
• The ABSA task is approached from a new perspective in which aspect-invariant sentiment features are leveraged for sentiment analysis, especially for those aspects with limited training examples.
• A novel multi-task discriminator is proposed for learning aspect-invariant/dependent sentiment features in an adversarial way.
• As a general adversarial multi-task learning framework, our proposed method can be easily combined with any ABSA model or other neural network to generate improved results. To the best of our knowledge, our work represents the first study of extracting aspect-invariant and aspect-dependent sentiment features by adversarial learning for ABSA.

RELATED WORK

Aspect-based Sentiment Analysis
The task of aspect-based sentiment analysis (ABSA) can be regarded as a fine-grained sentiment analysis task, which needs to leverage information from both context and aspects [3, 5-7, 11, 17, 35-37, 44]. Xue and Li [44] proposed a gated CNN model to selectively output the sentiment features according to the given aspect based on gating units. Chen et al. [1] proposed a memory network-based framework with a multiple-attention mechanism to capture sentiment features separated by a long distance, so that it is more robust against irrelevant information. Tang et al. [37] utilized two LSTMs to extract the contextual sentiment dependencies for a given aspect. Notably, thanks to the strength of attention mechanisms, the majority of current approaches encourage models to pay more attention to the given aspect during training [1,9,10,12,18,25,38,41]. For example, an attention-based LSTM that uses an individual aspect embedding is able to focus on the aspect and capture the intra-sentence relations [41]. There is also an interactive attention network in which the context representation and the aspect representation are learned separately and interactively to better represent the context and its corresponding target [25]. In addition, with the development of graph neural networks, an aspect-specific graph convolutional network (GCN) based on dependency trees was proposed to learn contextual dependencies for a given aspect [46]. Recently, building on the remarkable success of BERT [4], Song et al. [34] proposed an attentional encoder network to draw hidden states and semantic interactions between target and context words. Sun et al. [35] constructed an auxiliary sentence from an aspect and converted ABSA to a sentence-pair classification task by fine-tuning the pre-trained BERT model. In addition, Li et al. [19] exploited coarse-to-fine task transfer and proposed a multi-granularity alignment network for simultaneously aligning aspect granularity and aspect-specific feature representations across domains. This work transferred knowledge from an abundant source domain of the coarse-grained aspect category task to a small-scale target domain of the fine-grained aspect term task. However, it did not address the problem of limited annotated data for some aspects in the training data.
Almost all the aforementioned models assume that contextual sentiment information towards a specific aspect is essential for ABSA, which is however not always the case. Blindly incorporating aspect-related sentiment features may lead to degraded sentiment classification performance, as will be shown in our experiments. To address this issue, a better network for leveraging aspect-invariant sentiment information should be considered.

Adversarial Multi-task Learning
By extracting shared and transferable information from related tasks, multi-task learning can leverage latent correlated features from data [24,33,42]. In the task of ABSA, He et al. [9] considered ABSA as two subtasks, document-level and aspect-level classification, and transferred knowledge from document-level data to improve the performance of aspect-level sentiment classification. Further, He et al. [10] utilized an iterative message passing scheme to explicitly model the interactions between tasks. Multi-task learning can be combined with adversarial training, which has achieved promising performance in various natural language processing (NLP) tasks [2,15,22,39,43]. Among them, Wu et al. [43]  In sentiment analysis, Wang et al. [39] proposed a user-attention-based CNN model with an adversarial cross-lingual learning framework to enrich the user post representation in personalized microblog sentiment classification, and Chen et al. [2] proposed an adversarial deep averaging network to transfer the knowledge learned from labeled data in a resource-rich source language to low-resource languages in the task of cross-lingual sentiment classification. Li et al. [18] considered ABSA as an end-to-end task, and proposed a selective adversarial learning method to learn an alignment weight for each word, where more important words receive higher alignment weights, so as to achieve a local semantic alignment and capture domain-invariant word representations by adversarial training. Inspired by the recent successful work [2,22], we propose a novel framework to extract aspect-invariant sentiment features for improving aspect-based sentiment classification.

METHODOLOGY
In this section, we describe in detail our adversarial multi-task learning framework, which is illustrated in Figure 2.

Fake Cross-Aspect Sample Generation
Ideally, we would expect a sufficient number of training examples annotated for each aspect in the task of ABSA. However, this is rarely the case in reality. In existing ABSA datasets, we often observe imbalanced training examples, with some aspects associated with abundant annotated data while others have only limited annotated instances. The goal of fake cross-aspect sample generation is to automatically generate high-quality training samples so as to enrich the training dataset.
There are many possible ways to generate synthetic fake training examples. One simple approach we consider is to replace an aspect mention or attribute in a sentence with another aspect term which shares some similarity with the original aspect mention/attribute. The rationale behind this is that we expect the sentiment expressions of these related aspects to be similar as well. For example, as shown in Figure 1(a), to generate the fake training example for the aspect 'LOCATION#GENERAL', we select the training instances which are annotated with the attribute '#GENERAL' and replace their corresponding entity mentions with the aspect entity 'location'.
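The replacement step described above can be illustrated with a minimal Python sketch. The `Sample` record, the `make_fake_samples` helper, and the plain string replacement are hypothetical simplifications introduced here for illustration; the paper does not specify how entity mentions are located or substituted.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Sample:
    sentence: str
    entity_mention: str  # surface form of the aspect entity in the sentence
    aspect: str          # "ENTITY#ATTRIBUTE", e.g. "SERVICE#GENERAL"
    polarity: str        # "positive", "negative" or "neutral"
    is_fake: bool = False


def make_fake_samples(samples: List[Sample], target_aspect: str,
                      target_mention: str) -> List[Sample]:
    """Create fake cross-aspect samples for `target_aspect`.

    Real samples that share the target attribute but have a different entity
    are copied with their entity mention swapped; the polarity label is kept.
    """
    target_entity, target_attribute = target_aspect.split("#")
    fakes = []
    for s in samples:
        entity, attribute = s.aspect.split("#")
        if s.is_fake or attribute != target_attribute or entity == target_entity:
            continue
        if s.entity_mention not in s.sentence:
            continue  # nothing to replace in this sentence
        # Assumption: replacing the first occurrence of the mention is enough.
        new_sentence = s.sentence.replace(s.entity_mention, target_mention, 1)
        fakes.append(Sample(new_sentence, target_mention, target_aspect,
                            s.polarity, is_fake=True))
    return fakes


real = [
    Sample("The service was good too.", "service", "SERVICE#GENERAL", "positive"),
    Sample("The prices are high.", "prices", "FOOD#PRICES", "negative"),
]
fakes = make_fake_samples(real, "LOCATION#GENERAL", "location")
```

Only the '#GENERAL' sample is reused here; whether the copied polarity is actually valid for the new aspect is exactly what the discriminator is later asked to judge.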

Adversarial Multi-Task Learning Framework
As illustrated in Figure 2, our adversarial multi-task learning framework consists of two branches: aspect-based sentiment prediction and aspect-invariant feature extraction. As a universally applicable framework, it can be directly combined with existing ABSA models.
We first assume a sentence is represented by a sequence of words $\{w_1, w_2, \ldots, w_n\}$, and it may contain one or more aspects associated with different sentiment polarities, i.e. positive, negative, and neutral. Each word in the sentence can be represented by an $m$-dimensional embedding $\mathbf{x} \in \mathbb{R}^{m}$, where $m$ is the dimension of word embeddings. For each input sentence, the embedding layer looks up the embedding vector of each word in the full word embedding matrix $\mathbf{V} \in \mathbb{R}^{m \times |\mathcal{V}|}$ to get the input embedding matrix $\mathbf{X} \in \mathbb{R}^{n \times m}$, where $n$ is the length of the sentence and $|\mathcal{V}|$ is the vocabulary size of the dataset. The embedding layer is usually initialized with pre-trained embeddings such as GloVe [29] and BERT [4], which are then fine-tuned during the training process. Each aspect can also be represented as an $m$-dimensional embedding $\mathbf{a} \in \mathbb{R}^{m}$, which is the average of its entity and attribute word embeddings.
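As a small illustration of the aspect representation just described, the sketch below averages the word embeddings of an aspect's entity and attribute. The `aspect_embedding` function and the toy three-dimensional lookup table are hypothetical; in practice the vectors would come from the pre-trained embedding layer.

```python
from typing import Dict, List


def aspect_embedding(aspect: str, table: Dict[str, List[float]]) -> List[float]:
    """Average the embeddings of the entity and attribute words of an aspect."""
    words = aspect.lower().split("#")          # "FOOD#QUALITY" -> ["food", "quality"]
    vectors = [table[w] for w in words]
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


# Toy embedding table (values are illustrative only).
toy_table = {"food": [1.0, 0.0, 2.0], "quality": [0.0, 1.0, 4.0]}
a = aspect_embedding("FOOD#QUALITY", toy_table)
```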
After that, a feature extractor F is utilized to extract the aspect-invariant sentiment features, and meanwhile separate them from aspect-dependent sentiment features and noisy features. Here, the feature extractor can be any neural model, such as a bi-directional LSTM (Bi-LSTM):

$$\mathbf{h}^{f}_{t} = \text{Bi-LSTM}(\mathbf{X}_{t}, \mathbf{h}^{f}_{t-1}; \theta_{f})$$

where Bi-LSTM(·) is the shorthand computation of Bi-LSTM, $\mathbf{X}_{t} \in \mathbb{R}^{m}$ is the word embedding input at the current time step, $\mathbf{h}^{f}_{t-1}$ is the hidden output at the last time step, and $\theta_{f}$ denotes all the parameters of the feature extractor F. Then the hidden output of the feature extractor, $F(\mathbf{X}) = \mathbf{h}^{f}$, can be fed into the discriminator D to identify the aspect label and the sample label (fake or real). For sentiment prediction of fake cross-aspect samples, we feed $F(\mathbf{X})$ into the sentiment classifier C to get the output distribution. Here, we utilize the softmax function to obtain the output distributions of the discriminator and the classifier.
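For the branch of sentiment prediction for a specific aspect, the input embedding matrix $\mathbf{X} \in \mathbb{R}^{n \times m}$ and the aspect embedding $\mathbf{a} \in \mathbb{R}^{m}$ are fed into the ABSA model M, which outputs the hidden feature representation $\mathbf{h}^{m}$:

$$\mathbf{h}^{m} = M(\mathbf{X}, \mathbf{a}; \theta_{m})$$

where $\theta_{m}$ represents the parameters of the ABSA model. To capture the aspect-invariant sentiment features, we feed the hidden outputs from the ABSA model M and the feature extractor F into the sentiment classifier C to obtain the sentiment polarity of a specific aspect:

$$\hat{\mathbf{y}}^{c} = \mathrm{softmax}(C([\mathbf{h}^{m}, \mathbf{h}^{f}]))$$

where $\hat{\mathbf{y}}^{c}$ is the predicted sentiment distribution and $\mathbf{h}^{f}$ is the hidden output of the feature extractor F. Here, we also utilize the softmax function for sentiment prediction. $[\cdot, \cdot]$ represents a concatenation of two feature representations.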

Multi-Task Learning for ABSA
The key purpose of multi-task learning is to share latent features extracted from related tasks [2,10].Here, the feature extractor F aims at reconciling the features learned from different tasks with the help of the multi-task discriminator D.
Aspect Discrimination. The sentiment expressed in a sentence might depend on a specific aspect. Here, a discriminator D is adopted to predict the aspect category given a sentence input by minimizing the cross-entropy loss of predicted and true aspect distributions for all $N$ training samples:

$$J_{a}(\theta_{d}) = -\sum_{i=1}^{N} \sum_{j=1}^{K} y^{a}_{ij} \log \hat{y}^{a}_{ij}$$

where $K$ is the number of aspect classes, $y^{a}$ is the ground-truth distribution of aspect, $\hat{y}^{a}$ is the predicted distribution, and $\theta_{d}$ represents all the parameters of D.

Algorithm 1: The training procedure of our adversarial framework
Require: embedding matrix sets of samples X; aspect embedding sets A; the sizes of the datasets $N$ and $N_r$; three weight hyperparameters.
Within the training loop: ⊲ Aspect and fake sample discrimination; ⊲ Sentiment discrimination; ⊲ Aspect-based sentiment prediction.
After the loop: update all the parameters to minimize the loss.

Fake Sample Discrimination. To capture aspect-invariant sentiment features from the generated cross-aspect samples, the discriminator D is also adopted to classify whether a given input sample is real or fake by minimizing the cross-entropy loss of predicted and true distributions over all $N$ training samples:

$$J_{f}(\theta_{d}) = -\sum_{i=1}^{N} \sum_{j=1}^{2} y^{f}_{ij} \log \hat{y}^{f}_{ij}$$

where $y^{f}$ is the ground-truth distribution of fake or real sample, $\hat{y}^{f}$ is the predicted distribution, and $\theta_{d}$ represents all the parameters of D.

Sentiment Discrimination. For sentiment discrimination of cross-aspect samples, the sentiment features extracted by F are fed into classifier C to extract aspect-invariant sentiment information and discriminate noisy and aspect-dependent sentiment features. The objective to train the classifier C is defined as minimizing the cross-entropy loss of predicted and true distributions:

$$J_{s}(\theta_{s}) = -\sum_{i=1}^{N} \sum_{j=1}^{P} y^{s}_{ij} \log \hat{y}^{s}_{ij}$$

where $P$ is the number of sentiment classes, $y^{s}$ is the ground-truth distribution of sentence-level sentiment polarity, $\hat{y}^{s}$ is the predicted distribution, and $\theta_{s}$ represents all the parameters of sentence-level sentiment discrimination.
Aspect-based Sentiment Prediction. For aspect-based sentiment prediction, the aspect-invariant sentiment features extracted by F are fed into existing ABSA models. With the help of the aspect-invariant features, the ABSA model M can learn better sentiment features for those aspects with limited training samples. The objective function is defined as:

$$J_{c}(\theta_{c}) = -\sum_{i=1}^{N_r} \sum_{j=1}^{P} y^{c}_{ij} \log \hat{y}^{c}_{ij}$$

where $N_r$ is the number of real samples, $P$ is the number of sentiment classes, $y^{c}$ is the ground-truth distribution of aspect-based sentiment polarity, $\hat{y}^{c}$ is the predicted distribution, and $\theta_{c}$ represents all the parameters of aspect-based sentiment prediction.
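The discrimination and prediction objectives in this subsection are all cross-entropy losses over softmax outputs. The following is a minimal, dependency-free sketch of that computation; the function names are hypothetical, and summing (rather than averaging) over samples is an assumption, since the normalization is not stated in the text.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one vector of logits."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(true_dists: List[List[float]],
                  pred_dists: List[List[float]]) -> float:
    """Sum over samples of -sum_j y_j * log(y_hat_j).

    Assumption: losses are summed over samples, not averaged.
    """
    loss = 0.0
    for y, y_hat in zip(true_dists, pred_dists):
        loss -= sum(t * math.log(p) for t, p in zip(y, y_hat) if t > 0)
    return loss


# Two toy samples with three sentiment classes (positive, negative, neutral).
logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
gold = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
preds = [softmax(z) for z in logits]
loss = cross_entropy(gold, preds)
```

The same routine applies to the aspect labels, the real/fake labels, and both sentiment objectives; only the label set and the module producing the logits change.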

Adversarial Training
The training procedure of our adversarial framework is depicted in Algorithm 1. The feature extractor F aims to extract aspect-invariant sentiment features that help an ABSA model M predict the sentiment polarity for any aspect and that, through adversarial training, confuse the discriminator D when it tries to detect the aspect label and the fake samples. In other words, if a well-trained discriminator D cannot detect the aspect label and the fake sample based on the features learned by F, then those features are aspect-invariant and can be exploited to improve aspect-based sentiment prediction.
In pursuit of the adversarial goal, the discriminator D is designed to impede the feature extractor F from learning aspect-invariant sentiment features from input sentences. In addition, the sentiment classifier C needs to predict the sentiment polarity of the input sentence. Here, adversarial training performs min-max optimization that can be divided into two parts: minimizing the cross-entropy loss of sentence-level sentiment prediction and maximizing the cross-entropy loss of the discriminator. Hence, the adversarial loss combines these two terms, with a weight that controls the interaction of the loss terms; the discriminator term is $J(\theta_{d})$, the overall loss of the discriminator D. Finally, we simultaneously minimize the cross-entropy loss of sentence-level sentiment prediction and aspect-based sentiment prediction to exploit the aspect-invariant sentiment features, where $\Theta$ represents all the parameters of the adversarial multi-task learning framework and a further weight parameter controls the relative influence of the two terms.
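The sign structure of the min-max objective can be made concrete with a small sketch. Everything here is an assumption made for illustration: the overall discriminator loss is taken as an unweighted sum of the aspect and fake-sample losses, the feature extractor's objective as the sentiment loss minus a weighted discriminator loss, and the weight value and loss numbers are invented. The exact formulas of the framework may differ.

```python
def discriminator_loss(aspect_loss: float, fake_loss: float) -> float:
    """Overall loss of D. Assumption: unweighted sum of its two task losses."""
    return aspect_loss + fake_loss


def extractor_objective(sentiment_loss: float, disc_loss: float,
                        adv_weight: float) -> float:
    """Quantity the feature extractor tries to minimize (assumed form).

    A low sentence-level sentiment loss and a HIGH discriminator loss
    (a confused discriminator) both lower this objective.
    """
    return sentiment_loss - adv_weight * disc_loss


adv_weight = 0.5  # illustrative value only

# Same sentiment loss, but the discriminator is either confident or confused.
confident = extractor_objective(0.4, discriminator_loss(0.1, 0.1), adv_weight)
confused = extractor_objective(0.4, discriminator_loss(1.0, 1.0), adv_weight)
```

Under these assumptions, features that leave the discriminator confused yield the lower objective, which is the behavior the adversarial training is meant to reward.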

Three Schemes of Our Framework
To demonstrate different scenarios in deploying our proposed framework, we explore three different structures to extract and transfer aspect-invariant sentiment features for improving the performance of aspect-based sentiment prediction, as shown in Figure 3.
Integration structure (ISF). In the first scheme, we integrate the feature extractor F with an ABSA model M, i.e. M is also used as F. As illustrated in Figure 3(a), the aspect-invariant sentiment information and aspect-based sentiment information are both learned by the ABSA model M.

Concatenation structure (CSF).
As illustrated in Figure 3(b), a recurrent Bi-LSTM model is used as the feature extractor F to extract aspect-invariant sentiment features $\mathbf{h}^{f}$ from data, which are subsequently concatenated with the hidden output $\mathbf{h}^{m}$ of the ABSA model M:

$$\mathbf{h} = [\mathbf{h}^{m}, \mathbf{h}^{f}]$$

and fed into the sentiment classifier C.

Gate fusion structure (GSF).
The last scheme is a gated fusion structure, as illustrated in Figure 3(c). A Bi-LSTM is again used to learn aspect-invariant sentiment features. Different from CSF, feature combination is performed by a fusion gate that uses the sigmoid function.
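The difference between the concatenation and gated fusion schemes can be sketched as follows. The gate formula used here, a sigmoid over a linear map of the concatenated features followed by an element-wise convex combination, is a common formulation and is an assumption; the exact gate of the framework is not recoverable from the text. The names `concatenate`, `gated_fusion`, `W` and `b` are illustrative.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def concatenate(h_m: List[float], h_f: List[float]) -> List[float]:
    """CSF-style combination: [h_m, h_f]."""
    return list(h_m) + list(h_f)


def gated_fusion(h_m: List[float], h_f: List[float],
                 W: List[List[float]], b: List[float]) -> List[float]:
    """GSF-style combination (assumed form).

    gate = sigmoid(W [h_m, h_f] + b);  h = gate * h_m + (1 - gate) * h_f
    W has one row per output dimension and len(h_m) + len(h_f) columns.
    """
    z = concatenate(h_m, h_f)
    gate = [sigmoid(sum(w * v for w, v in zip(row, z)) + bias)
            for row, bias in zip(W, b)]
    return [g * m + (1.0 - g) * f for g, m, f in zip(gate, h_m, h_f)]


h_m = [1.0, 3.0]   # aspect-dependent features from the ABSA model M
h_f = [3.0, 1.0]   # aspect-invariant features from the extractor F

zero_W = [[0.0] * 4 for _ in range(2)]
balanced = gated_fusion(h_m, h_f, zero_W, [0.0, 0.0])      # gate = 0.5 everywhere
mostly_m = gated_fusion(h_m, h_f, zero_W, [50.0, 50.0])    # gate saturates near 1
```

With a gate near 1 the fused vector follows the aspect-dependent features, and with a gate near 0 it follows the aspect-invariant ones, which is the kind of per-dimension control a concatenation alone does not provide.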

EXPERIMENTS

Datasets and Experimental Setting
We conduct experiments on two benchmark datasets of the restaurant domain from SemEval 2015 Task 12 [31] and SemEval 2016 Task 5 [30] (Sem15 and Sem16). The task of ABSA aims at predicting the sentiment polarity (i.e. positive, negative or neutral) for each aspect mentioned in the sample. Each training and test sample in the datasets consists of the review sentence, aspect entity, aspect, and the sentiment polarity towards the aspect. We use GloVe [29] to initialize word embeddings for all non-BERT models, and use the pre-trained uncased BERT-base [4] model for all BERT-based models. The three weight coefficients of the loss terms are set to 0.5, 0.1 and 0.5, respectively. We randomly initialize W and b for all experiments. To reconcile the training of the multiple tasks, we pre-train the discriminator D for 20 epochs.

Comparison Models
We compare our proposed adversarial multi-task learning framework (including three schemes, i.e. ISF, CSF and GSF) with 10 baseline models, including BERT:
• TD-LSTM [37]: A target-dependent LSTM model, which incorporates aspect information into LSTM.
• ATAE-LSTM [41]: An attention-based LSTM model which better takes advantage of aspect information.
• MemNet [38]: An attention-based memory network, which applies attention multiple times on the word embeddings for sentence representation to capture the importance of each context word.
• RAM [1]: A recurrent attention-based memory network which captures sentiment features separated by a long distance.
• IAN [25]: An interactive attention network with context representation and aspect representation learned separately but interactively.
• AOA [13]: An attention-based network that models aspects and sentences jointly to capture the relations between aspects and context.
• MGAN [7]: A multi-grained attention network which captures the word-level interaction between aspects and context.
• GCAE [44]: A gated CNN model which effectively controls the flow of sentiment according to the given aspect information.
• AEN [34]: An attention encoder network, which eschews recurrence and employs attention-based encoders for the modeling between context and aspect.
• BERT [4]: The vanilla pre-trained uncased BERT-base model, which adopts "[CLS] sentence [SEP] aspect [SEP]" as input.
• ISF+models: The models based on our integration structure framework.
• CSF+models: The models based on our concatenation structure framework.
• GSF+models: The models based on our gated fusion structure framework.

Main Experiment Results
The main experimental results on the two benchmark datasets are reported in Table 1. The ABSA models based on our adversarial multi-task learning framework (all three schemes, i.e. ISF+models, CSF+models and GSF+models) achieve better performance than the competitor models in both accuracy and Macro-F1. Among them, the best improvements in accuracy and Macro-F1 on Sem15 are 5.25% (CSF+AOA) and 6.32% (GSF+MemNet) respectively. For the Sem16 dataset, our framework improves accuracy by 4.84% (GSF+RAM) and Macro-F1 by 8.75% (GSF+MGAN). Among the baselines, BERT gives the best results on both Sem15 and Sem16 datasets. Nevertheless, when integrated with our proposed framework, significant improvements of 2.57-3.13% in accuracy and 2.50-3.63% in F1 are observed compared to the vanilla BERT model. The results verify that our proposed adversarial multi-task learning framework can be easily combined with existing ABSA models and achieve state-of-the-art performance for predicting aspect-based sentiment. One main reason is that the proposed framework can leverage aspect-invariant sentiment features from cross-aspect samples to better learn the sentiment features of those aspects with limited annotated samples.
In this paper, we explore three schemes of the proposed framework to demonstrate the versatility of our method. Experimental results show that all three structures can improve the performance of aspect-based sentiment prediction over baseline models. It can also be observed that the concatenation structure (CSF) and the gated fusion structure (GSF) perform better than the integration structure (ISF) on both Sem15 and Sem16 datasets. This indicates that employing an independent feature extractor F for the extraction of aspect-invariant features is more effective than simply using an existing ABSA model. Overall, the gated fusion structure (GSF) performs better than the other two schemes, showing that aspect-invariant sentiment features are better captured by the gating mechanism.
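For reference, the two metrics reported above, accuracy and Macro-F1, can be computed as in the sketch below. This is a generic implementation of the standard definitions (Macro-F1 as the unweighted mean of per-class F1), not the evaluation script used in the experiments; the label names and toy predictions are illustrative.

```python
from typing import List, Sequence


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold: Sequence[str], pred: Sequence[str],
             labels: List[str]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append(f1)
    return sum(scores) / len(scores)


gold = ["positive", "positive", "negative", "neutral"]
pred = ["positive", "negative", "negative", "neutral"]
labels = ["positive", "negative", "neutral"]
acc = accuracy(gold, pred)
f1 = macro_f1(gold, pred, labels)
```

Because Macro-F1 weights every class equally, it is more sensitive than accuracy to errors on the rarer polarity classes.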

Ablation Study
To investigate the impact of different components of our proposed adversarial multi-task learning framework, we conduct experiments based on the RAM model and report the results of different structures. As shown in Table 2, simply incorporating fake samples (w/ fake) leads to only marginal improvements, since some of the fake samples may introduce noise to the training process. When integrating multi-task learning (w/ multi) into our framework, the performance is markedly improved for all three submodule settings (w/ $J_a$ for aspect discrimination, w/ $J_f$ for fake sample discrimination, and w/ $J_a$+$J_f$ for both). Among them, models with fake sample discrimination (w/ $J_f$) achieve more noticeable improvement than those without (w/ $J_a$), which demonstrates that incorporating fake sample discrimination can reduce the negative impact of noisy fake samples and better extract aspect-invariant sentiment features. In addition, compared with models with single-task discrimination, the multi-task learning models with both aspect discrimination and fake sample discrimination achieve better results in general for all three structures, with more significant gains for both CSF and GSF, which demonstrates the effectiveness of the proposed multi-task learning for ABSA. The reason why ISF does not achieve a significant improvement with multi-task learning may be that it is difficult for a single model to learn various discrepant features concurrently.
It is also worth noting that significant improvement can be achieved by multi-task learning integrated with adversarial training (w/ adv.+multi+fake w/ $J_a$+$J_f$). This verifies that, compared with pure multi-task learning, adversarial multi-task learning can better disentangle aspect-dependent sentiment features and noisy features from aspect-invariant sentiment features, which eventually leads to superior performance in ABSA.

Impact of the Training Data Size
To further demonstrate that our adversarial multi-task learning framework can extract aspect-invariant sentiment features to improve ABSA performance, we vary the amount of training data; the results are shown in Figure 4. According to Figure 4(a) and (b), all three proposed structures of our framework (ISF, CSF and GSF) achieve better performance than the baseline on both the Sem15 and Sem16 datasets. For a small proportion of annotated data (< 40%), our three structures still achieve remarkable performance (CSF and GSF in particular); that is, when the annotated data is critically insufficient, our framework can still achieve appreciable performance. This implies that extracting aspect-invariant sentiment features can significantly improve the performance of predicting aspect-based sentiment, especially for those aspects with limited annotated data. In addition, Figure 4(c) and (d) show the performance of using different proportions of fake cross-aspect samples (here, the total number of synthetic cross-aspect samples is 4,873 on Sem15 and 6,310 on Sem16). We observe that the performance of the baseline (the original ABSA model) fluctuates as the number of fake cross-aspect samples increases. This shows that simply adding fake training samples may introduce noise to the model, as some of the generated synthetic training instances may contain inconsistent sentiment features, which can impair the learning ability of the model. Our three structures, on the contrary, achieve better results as the number of fake samples increases. The performance improvement is more noticeable when the proportion of fake samples is small (< 40%). This indicates that the proposed adversarial multi-task learning framework can extract aspect-invariant sentiment features from fake cross-aspect samples more effectively, and essentially filters out noisy signals automatically.
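An experiment of this kind requires drawing a fixed proportion of the annotated or fake samples. A minimal, reproducible way to do so is sketched below; the `subsample` helper, the uniform random sampling, and the seed are assumptions, as the text does not say how the subsets were drawn.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def subsample(samples: Sequence[T], proportion: float, seed: int = 0) -> List[T]:
    """Return a reproducible random subset containing `proportion` of the samples."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be between 0 and 1")
    k = round(len(samples) * proportion)
    # A dedicated Random instance keeps the draw independent of global state.
    return random.Random(seed).sample(list(samples), k)


training_ids = list(range(10))
subset_40 = subsample(training_ids, 0.4)
```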

Detailed Results for Aspect-Invariant Sentiment Extraction
In this subsection, we conduct detailed experiments to verify that the fake cross-aspect samples can be fed into the proposed framework to learn aspect-invariant sentiment features for improving aspect-based sentiment prediction, particularly for aspects with limited annotated data. More concretely, we manually annotate the sentiment expressions (i.e. aspect-invariant or aspect-dependent) of samples identified by the multi-task discriminator D. As demonstrated in Figure 5

Visualizations and Qualitative Analysis
To qualitatively demonstrate how the proposed framework improves the performance of ABSA, we visualize the intermediate vectors extracted by the feature extractor F via t-SNE [26] and analyze what sentiment features are learned from the real testing instances which are indistinguishable by the discriminator. The results are reported in Figure 6. We can observe from Figure 6(a) and (b) that the intermediate vectors from indistinguishable and distinguishable samples are clearly separated by the discriminator D. In addition, Figure 6(c) and (d) demonstrate that the proportion of indistinguishable instances containing aspect-invariant sentiment expressions is near 90% in both Sem15 and Sem16, and about 90% of the aspect-dependent sentiment features are discriminated by the discriminator D on both Sem15 and Sem16. This verifies that the proposed framework can effectively distinguish and extract aspect-invariant sentiment features by adversarial multi-task learning.

Analysis of Gated Fusion Structure (GSF)
As reported earlier, the Gated Fusion Structure (GSF) achieves the best performance in all comparison experiments. To further analyze how the fusion gate better incorporates aspect-invariant sentiment features for improving aspect-based sentiment prediction, we show in Figure 7 the visualization of intermediate vectors learned by the fusion gate and the distribution of vector values. From Figure 7(a) and (b), we can observe that aspect-invariant and aspect-dependent sentiment features are better separated for both Sem15 and Sem16, which verifies the effectiveness of using the proposed GSF scheme for better incorporating aspect-invariant sentiment features. In addition, Figure 7

Effect of Off-the-shelf Aspect-Invariant Data
To explore whether the extracted aspect-invariant sentiment features can be used off the shelf to improve aspect-based sentiment prediction, we output the aspect-invariant samples learned by our feature extractor F and feed them in different proportions into all baseline models. Here, we adopt different baseline models as M to train the framework, and select the top 300 aspect-invariant samples for the Sem15 dataset and the top 400 aspect-invariant samples for the Sem16 dataset from the generated aspect-invariant data. The experimental results are shown in Figure 8. We can observe that as the number of aspect-invariant samples increases, the accuracy of aspect-based sentiment prediction increases for all of the baseline models on both Sem15 and Sem16. This indicates that the proposed framework can indeed extract aspect-invariant samples, which can be fed as additional annotated data into existing ABSA models to improve ABSA performance. One possible reason is that aspect-invariant sentiment features benefit the learning of a better sentiment classifier for those aspects with limited annotated data.
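The selection procedure described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the text does not say how the "top" samples are ranked, so the per-sample `scores` argument is a hypothetical ranking score (higher meaning more confidently aspect-invariant), and the proportion grid is likewise an assumed choice.

```python
def top_k_subsets(samples, scores, k, proportions=(0.25, 0.5, 0.75, 1.0)):
    """Rank samples by score, keep the top k, and return nested subsets.

    `scores` is a hypothetical ranking signal; the source does not specify
    the criterion used to pick the top aspect-invariant samples.
    """
    if len(samples) != len(scores):
        raise ValueError("samples and scores must have the same length")
    # Sort by descending score; Python's sort is stable, so ties keep input order.
    ranked = [s for s, _ in sorted(zip(samples, scores),
                                   key=lambda pair: pair[1], reverse=True)]
    top = ranked[:k]
    # Each proportion maps to a prefix of the top-k list, so subsets are nested.
    return {p: top[: int(p * len(top))] for p in proportions}


# Toy example with made-up sample names and scores.
samples = ["a", "b", "c", "d", "e", "f"]
scores = [0.2, 0.9, 0.5, 0.7, 0.1, 0.8]
subsets = top_k_subsets(samples, scores, k=4)
```

Because the subsets are nested prefixes of one ranking, each larger proportion only adds samples to the previous one, which keeps a curve such as the one in Figure 8 comparable across proportions. With k set to 300 or 400, the same call would reproduce the sample budgets stated for Sem15 and Sem16.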

CONCLUSION
In this paper, we have proposed a novel adversarial multi-task learning framework to extract aspect-invariant sentiment features from cross-aspect data via adversarial training. The generated fake training instances containing aspect-invariant sentiment features can effectively boost ABSA performance, especially for aspects with limited annotated data. Experimental results on two benchmark datasets show that the proposed framework can be easily combined with existing neural network-based ABSA models and capture aspect-invariant sentiment features effectively for improving ABSA performance without requiring additional annotated data, and thereby achieve state-of-the-art performance in the ABSA task.

Figure 1: Sentence examples paired with their aspects and polarity labels. (a) All three sentences share the same sentiment expression. The sentence enclosed in the red box is seen in the training set, while the sentence enclosed in the blue box is the synthetic training example generated by replacing the aspect entity 'service' with 'location'. The created synthetic example would allow the detection of the polarity of the unseen test example enclosed in the green box. (b) Examples of aspect-dependent sentiment expressions. Although these two sentences share the same sentiment expression, they express opposing polarities. (c) Examples of another type of aspect-dependent sentiment expression, in which the word 'slow' can be used to modify 'service' but not 'restaurant'.
applied adversarial training in relation extraction within the multi-instance multi-label learning framework, which revealed the effectiveness of adversarial training for relation extraction. Liu et al. [22] proposed an adversarial multi-task learning framework for text classification, in which the shared and private feature spaces are inherently disjoint by introducing orthogonality constraints.

Figure 2: The architecture of the proposed adversarial multi-task learning framework. There are five main components: the embedding layer, the feature extractor F, the discriminator D, the ABSA model M and the sentiment classifier C. Lines in different colors show the propagation of information from different components.

Figure 3: Three schemes of our adversarial multi-task learning framework.

Figure 4: Performance of using different proportions of annotated data on Sem15 and Sem16. (a) and (b) show different percentages of annotated data. (c) and (d) show different percentages of fake cross-aspect samples.
(a), among the cross-aspect samples that are identified wrongly by the discriminator D (indistinguishable fake/real), over 90% contain aspect-invariant sentiment expressions on both Sem15 and Sem16. Examples of such cross-aspect training samples are shown in the first row of Figure 5(b). On the contrary, among those cross-aspect samples that are identified correctly by our discriminator (distinguishable fake/real), only about half contain aspect-invariant sentiment expressions. Some aspect-dependent fake examples are shown in the second row of Figure 5(b). Clearly, these fake samples need to be filtered out; otherwise, they will confuse the learning of the ABSA model.
[Figure 5 panels: (a) Proportion of aspect-invariant samples. (b) Discrimination results of typical fake samples; example sentences: 'service was slow', 'restaurant was slow', 'ambience was slow', 'I love this place', 'I love this ambience', 'I love this food'.]
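The filtering step described above can be sketched as follows: fake cross-aspect samples that the discriminator fails to flag as fake are kept, and those it flags correctly are dropped. This is a minimal illustration, not the paper's implementation. The function name, the 0.5 decision threshold, and the discriminator probabilities are all assumptions; only the example sentences come from Figure 5(b).

```python
def keep_indistinguishable_fakes(fake_samples, p_fake, threshold=0.5):
    """Split fake samples by whether the discriminator flags them as fake.

    `p_fake[i]` is an assumed discriminator probability that sample i is fake.
    Samples scored below `threshold` were mistaken for real ones
    (indistinguishable) and are kept; the rest are dropped.
    """
    if len(fake_samples) != len(p_fake):
        raise ValueError("fake_samples and p_fake must have the same length")
    kept, dropped = [], []
    for sample, prob in zip(fake_samples, p_fake):
        (dropped if prob >= threshold else kept).append(sample)
    return kept, dropped


# Sentences taken from Figure 5(b); the probabilities are made up.
fakes = ["I love this ambience", "I love this food",
         "restaurant was slow", "ambience was slow"]
scores = [0.12, 0.30, 0.91, 0.85]
kept, dropped = keep_indistinguishable_fakes(fakes, scores)
```

In this toy run the 'I love this ...' variants pass the filter, while the 'slow' variants, whose sentiment expression depends on the aspect, are discarded.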

Figure 5: Results of aspect-invariant sentiment features extracted by our framework. ✓ represents that samples contain aspect-invariant sentiment expressions, which could be extracted to improve aspect-based sentiment prediction for aspects with limited annotated data. ✗ represents that the generated samples contain aspect-dependent sentiment expressions, which introduce noise and should be filtered by our framework.
(c) and (d) show that the peak values

Figure 6: Visualizations of intermediate vectors. (a) and (b) demonstrate intermediate vectors output by the feature extractor F on Sem15 and Sem16 respectively. Red dots represent those 'indistinguishable' real samples that cannot be discriminated by a well-trained discriminator D; cyan dots represent real samples that are 'distinguishable' by the discriminator. (c) and (d) show the distribution of real testing instances containing aspect-invariant and aspect-dependent sentiment expressions on Sem15 and Sem16 respectively. Blue bars denote the proportion of aspect-invariant instances in the 'indistinguishable' and 'distinguishable' categories. Red bars denote the proportion of aspect-dependent instances of these two categories in all aspect-dependent instances.

Figure 7: Visualizations and value distributions of intermediate vectors learned by the fusion gate.

Figure 8: Performance of different proportions of aspect-invariant samples on Sem15 and Sem16.

Table 1: Main experimental results on Sem15 and Sem16. Acc. represents accuracy, F1 represents Macro-F1 score, † denotes the model based on our framework. Average results over 10 runs are reported; best scores for each baseline are in bold.

Table 2: Experimental results of different variants based on ISF, CSF and GSF structures. and  represent aspect discrimination and fake sample discrimination respectively. "fake", "multi" and "adv." represent fake samples, multi-task learning and adversarial training respectively. * denotes the complete framework.