Identifying Future Brand Ambassadors in the National Basketball Association (NBA): Predicting Future NBA Superstars for Superior Marketing

This paper seeks to determine which factors/variables are useful to predict the salary of a National Basketball Association (NBA) player. Our goal is motivated by the presumed connection between a player’s salary and how well he is known nationally and perhaps internationally, and the connection between how well a player is known and his value as a brand ambassador/spokesperson for selected brands. We use stepwise regression as the key statistical tool for our analyses.


INTRODUCTION
Since the advent of cable television, popular sports have occupied a major part of the entertainment spectrum. Sportspersons command a fan following second only to that of major Hollywood stars. Many of these sportspersons serve as brand ambassadors/spokespersons for major brands or franchises. When a sportsperson becomes the face of a brand, the image of the brand often directly benefits or "takes a hit" based on the performance of the player in the sport. As such, brands do well to perform due diligence before selecting their ambassadors.
The most popular sports stars are often also the most highly compensated, and many of them represent brands and are compensated handsomely for doing so. As such, the compensation a player receives from his league is a good indicator of player success. If player success is related to certain performance parameters, brands would benefit from treating these parameters as precursors of success when judging who might be an effective ambassador/spokesperson. This may help brands identify superstars (presumably, superior ambassadors/spokespersons) early in their careers.
In this paper, we use the National Basketball Association (NBA) as a model for such predictive modeling. We use NBA data to identify the performance parameters that are the best indicators of future player success, as measured by salary.
Our core objective is to study the performance statistics of players in the National Basketball Association and to determine the performance parameters that are most likely to affect their salaries. Tying these parameters to salary presumes, to an extent, that NBA general managers and owners are capable of determining which parameters deserve a larger or smaller role in salary determination. NBA athletes are the most highly paid sportsmen in the world, as measured by average annual salary per player (Gaines, 2015). While overall performance is generally credited for a player's high salary, we attempt to break down overall performance into individual components, to see whether there are specific, identifiable components of performance that make certain players more attractive to clubs and to what degree, again as measured by salary.
Our approach is aimed at helping brands identify players who are more likely to become superstars by looking at certain variables in their performance that have a linear relationship with their compensation. Brands can then use these variables to identify and tap talent early in players' careers. In theory, those who perform better on the court are more likely to be effective ambassadors/spokespersons for the brand. We do not believe that this is always true; after all, other aspects of a player, such as personality and the city played in, likely also bear on effectiveness as a brand ambassador/spokesperson. Nevertheless, we believe that, everything else being equal, superior play on the court leads to being a superior ambassador/spokesperson.

LITERATURE REVIEW:
The NBA has been of interest to researchers from various fields, ranging from labor research to economics. Hajime Katayama and Hudan Nuch of the University of Sydney, in their paper, "A Game-Level Analysis of Salary Dispersion and Team Performance in the National Basketball Association," studied the impact of "within team" salary dispersion on team performance (Katayama, 2009).
Michael Wallace's study, "Labor Market Structure and Salary Determination among Professional Basketball Players," concluded that structural variables, such as team and race, play an important role in earnings determination in the NBA and that player position is not significant (Wallace, 1988). Of all the performance indicators, he found only scoring and rebounding to be statistically significant. For the purpose of this study, we do not consider the specific team as a variable.
Andrew Fleenor of the University of Tennessee created a prediction model for the salaries of players in the NBA (Fleenor, 1999). However, his study, "Predicting National Basketball Association (NBA) Player salaries," considers data only for the 1997-98 season. Since the scope of our paper is to identify superstars based on performance characteristics displayed consistently, we have considered the total career statistics of players.
Jerry Hausman (MIT) and Gregory Leonard (Cambridge Econometric Inc.), in their paper, "Superstars in NBA: Economic Value and Policy," conducted an econometric analysis demonstrating that TV ratings are higher for games in which certain superstars are playing. They concluded that the presence of superstars leads to an inefficient distribution of talent (Hausman & Leonard, 1997).
There are also a large number of other studies which explored a wide array of topics such as racial differences in salaries and the impact of reputation and status on salaries in the NBA.

RESEARCH METHODOLOGY:
For the purpose of this study, we used 4 datasets available in the public domain (retrieved from Kaggle datasets) and consolidated them into a single dataset using "player name" as the common key. The 4 datasets available to us were:
1) Performance statistics of 3922 players who have played in the NBA since 1950. This dataset gave us the individual statistics for every player for every season on 31 parameters.
2) Player data for 3922 players, obtained from an NBA database. This included the name, height, weight, college, year of birth, birth city and birth state.
3) Player data for 4550 players, obtained from an independent dataset. This included the name, start year, end year, position, height, weight, birth date and college.
4) Player salaries for the 2017-2018 season, obtained from NBA records. These data covered the 573 players who played in that season.
A master file for analysis was created by consolidating the data from these 4 sources into a single file. For this analysis, only the 573 players who played in the 2017-2018 season (file 4) were considered. The data for these players were looked up in files 1, 2 and 3 by player name, and a pivot was used to consolidate the statistics for each player. Out of 573 players, complete data were obtained for only 479 players. Thus, we used a dataset with a total of 479 data points (i.e., n = 479). The final master file for analysis contained:
- All player-related data such as team, salary, height, weight, college, birth year, birth city, state, start year, end year, number of years playing and position.
- Parameters of player performance (refer to
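The consolidation described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the records, field names and the choice of summing per-season statistics into career totals are hypothetical stand-ins, not the actual files or procedure. One practical caveat of joining on player name is that two different players who share a name would be merged into one record.

```python
from collections import defaultdict

# Hypothetical miniature versions of the source files (all values are made up).
season_stats = [
    {"player": "Player A", "season": 2016, "G": 80, "PTS": 1200},
    {"player": "Player A", "season": 2017, "G": 75, "PTS": 1100},
    {"player": "Player B", "season": 2017, "G": 60, "PTS": 500},
]
player_info = {
    "Player A": {"height": 201, "weight": 100, "position": "F"},
    "Player B": {"height": 190, "weight": 88, "position": None},  # incomplete record
}
salaries = {"Player A": 5_000_000, "Player B": 1_500_000, "Player C": 2_000_000}


def career_totals(rows, stat_columns):
    """Sum each statistic over all seasons of a player (assumed 'pivot' step)."""
    totals = defaultdict(lambda: dict.fromkeys(stat_columns, 0))
    for row in rows:
        for col in stat_columns:
            totals[row["player"]][col] += row[col]
    return dict(totals)


def build_master(salaries, totals, info):
    """Look up statistics and attributes by player name; keep complete records only."""
    master = []
    for name, salary in salaries.items():
        if name not in totals or name not in info:
            continue  # player not found in one of the other files
        record = {"player": name, "salary": salary, **totals[name], **info[name]}
        if all(value is not None for value in record.values()):
            master.append(record)
    return master


totals = career_totals(season_stats, ["G", "PTS"])
master = build_master(salaries, totals, player_info)
```

In this toy input, only "Player A" survives: "Player B" has a missing position and "Player C" has no statistics, which mirrors how 573 salaried players reduce to 479 complete records.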

Qualitative Variables and Recoding
We had 47 independent variables, of which 46 were quantitative and 1 was qualitative (categorical). The categorical variable was player position. There are 7 player positions (C, C-F, F, F-C, F-G, G, G-F). Guard-Forward (G-F) was taken as the base (reference) category, and 6 (i.e., n - 1, with n = 7) dummy variables were created. (C = Center, F = Forward, G = Guard; many players can play two positions, which is why we have C-F, F-C, F-G and G-F, where the first letter denotes the player's majority position.)
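The recoding of position into n - 1 dummy variables can be illustrated with a minimal sketch. The variable names of the form `pos_*` are hypothetical; only the 7 position labels and the G-F base category come from the text.

```python
POSITIONS = ["C", "C-F", "F", "F-C", "F-G", "G", "G-F"]
BASE = "G-F"  # base (reference) category named in the text


def dummy_code(position, categories=POSITIONS, base=BASE):
    """Return n-1 indicator variables; the base category is coded as all zeros."""
    if position not in categories:
        raise ValueError(f"unknown position: {position!r}")
    return {f"pos_{cat}": int(position == cat) for cat in categories if cat != base}


center = dummy_code("C")           # pos_C = 1, all other dummies 0
guard_forward = dummy_code("G-F")  # every dummy 0 (base category)
```

Because the base category is represented by all zeros, each dummy coefficient in the regression is read as a difference relative to Guard-Forward players.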

Dependent and Independent Variables:
Our dependent variable (Y) is Salaries of Players (2017-2018). We have 52 quantitative independent variables, of which 6 are dummy variables, as noted in  We expect to reject H0 overwhelmingly.
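The hypotheses themselves are not spelled out in the text. Assuming that H0 refers to the usual overall significance test of a multiple regression (an assumption on our part, with k denoting the number of independent variables in the model and \beta_j their coefficients), the test would read:

```latex
H_0:\ \beta_1 = \beta_2 = \dots = \beta_k = 0
\qquad \text{versus} \qquad
H_1:\ \beta_j \neq 0 \ \text{for at least one } j \in \{1, \dots, k\}
```

Rejecting H0 under this reading means only that at least one variable is linearly related to salary; it says nothing about which variables matter, which is what the stepwise procedure is meant to address.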

REGRESSION ANALYSIS & DISCUSSION OF RESULTS:
We conducted a stepwise multiple linear regression analysis using 2017-2018 salary as the dependent variable to determine how it is related to the independent variables. The variables in our final stepwise model are Value over Replacement Player (VORP), Games Started (GS), Defensive Win Shares (DWS), Offensive Rebounds (ORB), Offensive Rebound Percentage (ORB%), Player Efficiency Rating (PER), Center Position, and Two-Point Field Goal Attempts (2PA). The R-square of this model is 0.47; that is, an estimated 47% of the variability in salary is explained by the variables included in this model. The model summary is shown below in Figure 1 (in essence, the last step of the stepwise regression analysis).
It should be noted that Figure 1 contains 8 independent variables even though there are 10 steps. One variable, Assists (AST), entered the model early on but was later deleted: as other variables entered the model and added predictive ability, they eventually rendered AST redundant, so that it no longer added any incremental value. This phenomenon (a variable entering the model and subsequently being deleted from it) is not infrequent in the stepwise regression process.
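To make the enter-then-delete behavior concrete, the following is a minimal sketch of a stepwise procedure based on partial F statistics. It is not the software routine used for this study: the entry and removal thresholds (4.0 and 3.9) are illustrative assumptions, and the data are simulated rather than taken from the NBA master file.

```python
import numpy as np


def rss(X, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on the given columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def partial_f(X, y, cols, c):
    """Partial F statistic of column c, given the other columns in cols."""
    full = sorted(set(cols) | {c})
    reduced = [k for k in full if k != c]
    rss_full = rss(X, y, full)
    df_resid = len(y) - len(full) - 1
    return (rss(X, y, reduced) - rss_full) / (rss_full / df_resid)


def stepwise(X, y, f_enter=4.0, f_remove=3.9):
    """Forward entry with a backward removal check (thresholds are assumptions)."""
    selected = []
    for _ in range(2 * X.shape[1]):  # hard cap on the number of steps
        candidates = [c for c in range(X.shape[1]) if c not in selected]
        if not candidates:
            break
        # Entry: add the candidate with the largest partial F, if large enough.
        f_in = {c: partial_f(X, y, selected, c) for c in candidates}
        best = max(f_in, key=f_in.get)
        if f_in[best] < f_enter:
            break
        selected.append(best)
        # Removal: drop the weakest variable if it has become redundant.
        f_out = {c: partial_f(X, y, selected, c) for c in selected}
        worst = min(f_out, key=f_out.get)
        if f_out[worst] < f_remove:
            selected.remove(worst)
    return selected


# Simulated data: only columns 0 and 2 truly drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] + 2.0 * X[:, 2] + rng.normal(size=200)
selected = stepwise(X, y)
r_squared = 1.0 - rss(X, y, selected) / rss(X, y, [])
```

The removal check after each entry is what allows a variable such as AST to enter at an early step and be deleted later, once the variables that entered after it carry the same information.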

The coefficients in the model are depicted in Figure 2:

Multicollinearity Analysis
With this output, the VIF value was found to be higher than 10 for 3 variables: Games Started (GS), DWS and 2PA. As such, these variables have a possible multicollinearity issue. To address this issue, we retained just one of these 3 variables (2PA) and ran the regression again; that is, we conducted the stepwise multiple linear regression excluding DWS and Games Started (GS).
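For reference, the VIF of a variable is 1 / (1 - R^2), where R^2 comes from regressing that variable on all the other independent variables. A minimal sketch of the computation follows; the data are simulated for illustration, and only the cutoff of 10 is taken from the text.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column of X: 1 / (1 - R^2_j)."""
    n, p = X.shape
    factors = []
    for j in range(p):
        # Regress column j on an intercept plus all remaining columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ beta
        tss = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        r2 = 1.0 - (resid @ resid) / tss
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)


# Simulated example: column c is nearly a copy of column a, column b is unrelated.
rng = np.random.default_rng(0)
a = rng.normal(size=300)
b = rng.normal(size=300)
c = a + 0.1 * rng.normal(size=300)
vifs = vif(np.column_stack([a, b, c]))
flagged = [j for j, v in enumerate(vifs) if v > 10]  # cutoff of 10 as in the text
```

Here the two nearly duplicate columns are flagged while the unrelated one is not, which is the pattern that motivates keeping only one variable from a group of collinear ones.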
The new output of our stepwise regression is shown below in Figure 3:
Player compensation increases with an increase in Value over Replacement Player (VORP). This makes sense because VORP measures the marginal utility of a player to his team compared to a replacement-level player (Woolner, 1997). Player compensation increases with an increase in point field goals. This was expected, since the field goal is one of the most popular metrics of player success. Player compensation also increases with Offensive Box Plus/Minus and Offensive Rebounds. This also makes sense, because Offensive Box Plus/Minus is a metric for evaluating a basketball player's quality and contribution to the team, while an offensive rebound gives the offensive team another opportunity to score right away. (The majority of rebounds are defensive, since the defending team is generally in a better position to recover the ball; the shooter is generally not well positioned to rebound his own missed shot, so that, in effect, 5 defensive players but only 4 offensive players are positioned to get the rebound.) As such, a player with the ability to get offensive rebounds is valuable.

Player compensation decreases with an increase in assists, point attempt rate, offensive rebound percentage, Win Shares per 48 minutes, and blocks; these are the variables with a negative coefficient in the final model. None of these negative coefficients makes very clear sense. In some cases, one might make a "partial argument" (e.g., assists are credited quite liberally and are perhaps less reflective of true contribution), but overall, we admit that we are not prepared to support a logical explanation for these negative coefficients. Since this is a stepwise regression with no "very high" VIF values, none of the variables should be highly correlated with the other variables. If they were, not all the variables would "survive" in the final model; redundancy would suggest/mandate some deletions. Thus, we cannot "blame" multicollinearity for the negative values.
The qualitative variable Center Position (C) is also statistically significant: we estimate that a player who plays the center position receives approximately $2.5 million more in annual salary than a player in the Guard-Forward position (the base category), when all other variables in the model are held constant.
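The reading of the dummy coefficient can be written out explicitly. In the sketch below, the symbols are our own notation rather than output from the fitted model: D_C is the Center dummy, x_j are the remaining variables in the model, and b_0, b_C, b_j are the estimated coefficients.

```latex
\widehat{\text{Salary}} = b_0 + b_C D_C + \sum_{j} b_j x_j
\quad\Longrightarrow\quad
\widehat{\text{Salary}}\,\big|_{D_C = 1} - \widehat{\text{Salary}}\,\big|_{D_C = 0}
  = b_C \approx \$2.5\ \text{million}
```

The difference is taken at identical values of every x_j, which is what "held constant" means here. If no other position dummy is in the model, D_C = 0 describes every non-center player rather than Guard-Forwards alone.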

CONCLUSION
Our study concludes that brands can effectively identify future superstars by looking at certain performance parameters. A player who has a high VORP, scores a larger number of field goals, has a higher Offensive Box Plus/Minus and has more offensive rebounds, while having fewer assists, a lower point attempt rate, fewer Win Shares per 48 minutes and fewer blocks, is most likely to be successful salary-wise. As mentioned earlier in this paper, this study can help advertising agencies and brands identify promising prospects from a pipeline of future players.

LIMITATIONS AND SCOPE OF FUTURE RESEARCH:
In an ideal situation, the VIF values of the variables would be in the range of 1 to 2. However, the nature of the data is such that many variables are related to each other. As such, we relaxed this requirement and excluded only those variables that are very strongly correlated with each other (r > 0.8). We also excluded certain qualitative variables, such as the college/school of a player and the city and/or country of birth of each player. The salary cap, i.e., the maximum allowable expenditure on player salaries by a team, is another factor that may distort the market forces at work. Adjustments made for these factors could change the model output.
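A screen for strongly correlated pairs of the kind described above can be sketched as follows. The variable names and values are hypothetical, and using the absolute value of r (so that strong negative correlations are also flagged) is our assumption; only the 0.8 threshold is taken from the text.

```python
import numpy as np


def highly_correlated_pairs(X, names, threshold=0.8):
    """List variable pairs whose absolute Pearson correlation exceeds the threshold."""
    corr = np.corrcoef(X, rowvar=False)  # columns of X are the variables
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs


# Hypothetical toy data: var_2 tracks var_1 closely, var_3 does not.
X = np.array([
    [1.0, 2.0, 5.0],
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 4.0],
    [4.0, 8.0, 2.0],
    [5.0, 10.5, 3.0],
])
pairs = highly_correlated_pairs(X, ["var_1", "var_2", "var_3"])
```

A pairwise screen of this kind is weaker than a VIF check, since a variable can be well predicted by a combination of several others without being strongly correlated with any single one of them.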