There is a newer version of the record available.

Published December 14, 2020 | Version v1
Report Open

Trends in Minority Business Ownership in the United States

Description

Foundational Questions: 
1. How is minority representation in business ownership changing across industries?
2. Are there any states that have higher than average minority business ownership across all classes?
3. What other ways can businesses be clustered besides sector, and what is the minority breakdown across these clusters?

For this project I will be examining trends in business ownership in the United States. The data for this project was acquired through the United States Census Bureau's Survey of Business Owners, which is conducted every 5 years. The Census Bureau publishes summary statistics for each survey based on traits like race, ethnicity, education level, and whether a person was born in the United States. I will begin the project by looking at trends in this data for the survey years 2002, 2007, and 2012. This data can be found here.

The Census Bureau has also released a microdata sample for the year 2007. This sample contains data on the characteristics of individual businesses and their respective owners for about 1.5 million businesses. To protect the identity of the businesses and their owners, artificial noise is inserted into the dataset. Notably, this dataset also contains geographic information about each business as well as estimates for their employees, payroll, and receivables. This dataset can be found here.

Clustering
In this section, I wanted to explore creating clusters for different types of businesses based on traits like employment, payroll, receivables, whether the business was family-owned, and so on. I then wanted to determine the breakdown of different characteristics of business owners by cluster.

For this problem I decided to use K-Prototypes clustering. This algorithm is a mixture of K-Means and K-Modes, and fits this problem well because it can cluster based on both numerical and categorical data. A downside of this algorithm is that its computational cost grows quickly as the number of data points increases. Without a GPU and hours or days to train the model, I was not able to cluster based on all 1.5 million data points. Instead, I had to take random samples of the data. The most I could train with given the limitations of my hardware was 10,000 data points, even after I split the work between all 6 of the processor cores on my laptop.
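To illustrate why K-Prototypes suits mixed data, here is a minimal sketch of the dissimilarity measure the algorithm is generally described as using: squared Euclidean distance on the numeric features plus a weight `gamma` times the number of mismatched categorical features. This is not the project's code. The function names, the toy records, and the value of `gamma` are all assumptions made for illustration.

```python
import random

def mixed_dissimilarity(num_a, num_b, cat_a, cat_b, gamma=1.0):
    """Squared Euclidean distance on numeric features plus gamma times
    the count of categorical features that differ."""
    numeric_part = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    categorical_part = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric_part + gamma * categorical_part

def assign_to_nearest(record, prototypes, gamma=1.0):
    """Return the index of the prototype closest to the record."""
    nums, cats = record
    distances = [
        mixed_dissimilarity(nums, p_nums, cats, p_cats, gamma)
        for p_nums, p_cats in prototypes
    ]
    return distances.index(min(distances))

# Hypothetical toy records, not taken from the survey data:
# (standardized employees, payroll, receivables),
# (family-owned, home-based, franchise)
records = [
    ((-0.9, -1.1, -0.8), ("N", "Y", "N")),
    ((-0.2, -0.1, -0.3), ("N", "N", "N")),
    ((2.1, 1.9, 2.4), ("N", "N", "N")),
    ((-1.0, -0.9, -1.2), ("N", "Y", "N")),
]

# Hypothetical prototypes: numeric means and categorical modes per cluster.
prototypes = [
    ((-1.0, -1.0, -1.0), ("N", "Y", "N")),
    ((2.0, 2.0, 2.0), ("N", "N", "N")),
]

# Draw a random sample, as was necessary when the full dataset was too
# large to cluster, then assign each sampled record to its nearest prototype.
random.seed(0)
sample = random.sample(records, k=3)
labels = [assign_to_nearest(record, prototypes, gamma=0.5) for record in sample]
print(labels)
```

Each assignment step compares every record against every prototype, and the algorithm repeats that step until the clusters stabilize, which is one reason the cost becomes hard to manage on a laptop as the sample grows.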

Cluster Breakdown

Cluster 0:
Number of businesses: 449927, Traits: Far below average employees, payroll, and receivables; Most often is home-based, but not family-owned or a franchise.

Cluster 1:
Number of businesses: 572374, Traits: Slightly below average employees, payroll, and receivables; More often is not family-owned, home-based, or a franchise.

Cluster 2:
Number of businesses: 20747, Traits: Slightly above average employees, payroll, and receivables; More often is not family-owned, home-based, or a franchise.

Cluster 3:
Number of businesses: 80, Traits: Far above average employees and receivables with slightly above average payroll; Most often is not family-owned, home-based, or a franchise.

Cluster 4:
Number of businesses: 1683, Traits: Well above average employees, payroll, and receivables; More often is not family-owned, home-based, or a franchise.
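The cluster sizes above are very uneven, and a short calculation makes that easier to see. The sketch below uses only the counts listed in the breakdown. The variable names are illustrative and not from the project code.

```python
# Cluster sizes as listed in the breakdown above.
cluster_sizes = {0: 449927, 1: 572374, 2: 20747, 3: 80, 4: 1683}

total = sum(cluster_sizes.values())

# Share of all clustered businesses that falls in each cluster.
shares = {cluster: size / total for cluster, size in cluster_sizes.items()}

for cluster, share in shares.items():
    print(f"Cluster {cluster}: {cluster_sizes[cluster]:>7,} businesses ({share:.3%})")

# Combined share of the two below-average clusters.
small_business_share = shares[0] + shares[1]
print(f"Clusters 0 and 1 combined: {small_business_share:.1%}")
```

Clusters 0 and 1 together account for about 98 percent of the clustered businesses, so the clusters of larger businesses (2, 3, and 4) describe a small minority of firms.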

Conclusion

The purpose of this project was to paint in broad strokes the patterns that arise in business ownership in the United States. While we were interested in general trends, like how the number of business owners was changing in certain sectors, the main goal was to identify patterns in the demographics of business owners.

In terms of the trend from 2002 to 2012, we observed many different combinations of outcomes. For some sectors, like information and technology, both minority and non-minority ownership were increasing. In others, like real estate, both minority and non-minority ownership were increasing, but non-minority ownership was increasing much faster. Yet another outcome was minority ownership increasing while non-minority ownership decreased.

We also observed geographic trends in the data. One takeaway was that even in states with high minority populations, one minority group would tend to have a much larger presence in business ownership than the others. This suggests that while the United States is anecdotally considered a "melting pot", business ownership tends to stratify rather than mix together.

Another important finding was the demographic distribution across the clusters of different types of businesses. While minorities make up about 18 percent of all business owners, they make up only about 5.6 percent of the owners of businesses belonging to cluster 4 (generally larger companies). Whether this discrepancy is a result of societal barriers to minorities owning these more "successful" companies cannot be answered by this dataset, but the discrepancy is certainly there.
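One way to express the size of this discrepancy is a representation ratio: the minority share within a cluster divided by the minority share among all business owners. The sketch below uses the two approximate percentages quoted above. The function name is hypothetical and the ratio is my own framing, not a statistic reported by the survey.

```python
def representation_ratio(group_share_in_cluster, group_share_overall):
    """Ratio of a group's share within a cluster to its overall share.
    A value of 1.0 means proportional representation; below 1.0 means
    the group is underrepresented in that cluster."""
    if group_share_overall <= 0:
        raise ValueError("overall share must be positive")
    return group_share_in_cluster / group_share_overall

# Approximate shares quoted in the text above.
overall_minority_share = 0.18
cluster4_minority_share = 0.056

ratio = representation_ratio(cluster4_minority_share, overall_minority_share)
print(f"Representation ratio for cluster 4: {ratio:.2f}")
```

With these figures the ratio comes out a little under one third, meaning minority owners appear in cluster 4 at less than a third of the rate that proportional representation would imply.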

While I consider the research into this topic successful, there are definitely ways to improve. For one, I didn't have the hardware to take full advantage of the K-Prototypes clustering algorithm. My ability to cluster was limited to random samples of data, and all feature selection was done on a trial-and-error basis.
Additionally, data on this topic is not as readily available as I believe it should be. As a result, I was limited to using data from 2007 to create clusters, as this was the only sizeable dataset available to the public.

This project is published on GitHub.io and the code is on GitHub.

Files

pums.csv.zip

Files (68.0 MB)
