Trends in Majority Business Ownership in the United States
Creators
Description
Foundational Questions:
1. How is the minority representation in business ownership throughout industries changing?
2. Are there any states that have higher than average minority business ownership across all classes?
3. What other ways can businesses be clustered besides sector, and what is the minority breakdown across these clusters?
For this project I will be examining the trends in business ownership in the United States. The data for this project was acquired through the United States Census Bureau's Survey of Business Owners. The Survey of Business Owners is conducted every 5 years. The Census Bureau publishes the summary statistics for each survey based on traits like race, ethnicity, education level, and whether a person was born in the United States. I will begin the project looking at trends in this data for the survey years 2002, 2007, and 2012. This data can be found here.
The Census Bureau has also released a microdata sample for the year 2007. This sample contains data on the characteristics of individual businesses and their respective owners for about 1.5 million businesses. To protect the identity of the businesses and their owners, artificial noise is inserted into the dataset. Notably, this dataset also contains geographic information about each business as well as estimates for their employees, payroll, and receivables. This dataset can be found here.
Clustering
In this section, I wanted to explore creating clusters for different types of businesses based on traits like employment, payroll, receivables, whether the business was family-owned, and so on. I then wanted to determine the breakdown of different characteristics of business owners by cluster.
For this problem I decided to use K-Prototypes clustering. This algorithm is a mixture of K-Means and K-Modes, and fits this problem well because it can cluster based on both numerical and categorical data. A downside of this algorithm is that its computational cost grows quickly as the number of data points increases. Without a GPU and hours or days to train the model, I was not able to cluster based on all 1.5 million data points. Instead, I had to take random samples of the data. The largest sample I could train on given the limitations of my hardware was 10,000 data points, even after I split the work between all 6 of the processor cores on my laptop.
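To make the idea concrete, here is a minimal pure-Python sketch of K-Prototypes. It is not the code used in this project. The function names, the toy rows, and the weight `gamma` are all assumptions made for illustration. The distance between a row and a prototype is the squared Euclidean distance on the numeric columns plus `gamma` times the number of mismatched categorical columns. Prototypes are updated with means for numeric columns and modes for categorical ones.

```python
import random
from collections import Counter


def mixed_distance(a_num, a_cat, b_num, b_cat, gamma):
    """Squared Euclidean distance on numeric parts plus gamma * categorical mismatches."""
    numeric = sum((x - y) ** 2 for x, y in zip(a_num, b_num))
    mismatches = sum(1 for x, y in zip(a_cat, b_cat) if x != y)
    return numeric + gamma * mismatches


def k_prototypes(num_rows, cat_rows, k, gamma=1.0, n_iter=20, seed=0):
    """Cluster rows that have a numeric part and a categorical part."""
    rng = random.Random(seed)
    # Start from k distinct rows chosen at random.
    start = rng.sample(range(len(num_rows)), k)
    protos = [(list(num_rows[i]), list(cat_rows[i])) for i in start]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each row joins its nearest prototype.
        new_labels = [
            min(range(k),
                key=lambda c: mixed_distance(n, ct, protos[c][0], protos[c][1], gamma))
            for n, ct in zip(num_rows, cat_rows)
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: numeric mean and categorical mode per cluster.
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                continue  # keep the old prototype for an empty cluster
            means = [sum(num_rows[i][j] for i in members) / len(members)
                     for j in range(len(num_rows[0]))]
            modes = [Counter(cat_rows[i][j] for i in members).most_common(1)[0][0]
                     for j in range(len(cat_rows[0]))]
            protos[c] = (means, modes)
    return labels, protos


# Hypothetical toy data: two standardized numeric columns and two categorical columns.
num_rows = [[-1.0, -1.1], [-0.9, -1.0], [-1.1, -0.9],
            [2.0, 2.1], [2.1, 1.9], [1.9, 2.0]]
cat_rows = [["home", "no"], ["home", "no"], ["home", "no"],
            ["office", "yes"], ["office", "yes"], ["office", "yes"]]

labels, protos = k_prototypes(num_rows, cat_rows, k=2)
print(labels)
```

Every pass compares each row with each prototype, which is part of why fitting on a random sample is so much cheaper than fitting on the full dataset. One plausible follow-up, again an assumption rather than a description of this project, is to label rows outside the sample by assigning each to its nearest fitted prototype with `mixed_distance`.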
Cluster Breakdown
Cluster 0:
Number of businesses: 449927, Traits: Far below average employees, payroll and receivables; Most often is home-based, but not family owned or a franchise
Cluster 1:
Number of businesses: 572374, Traits: Slightly below average employees, payroll and receivables; More often is not family-owned, home-based, or a franchise.
Cluster 2:
Number of businesses: 20747, Traits: Slightly above average employees, payroll and receivables; More often is not family-owned, home-based, or a franchise
Cluster 3:
Number of businesses: 80, Traits: Far above average employees and receivables with slightly above average payroll; Most often is not family-owned, home-based, or a franchise.
Cluster 4:
Number of businesses: 1683, Traits: High above average employees, payroll and receivables; More often is not family-owned, home-based, or a franchise.
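As a quick sanity check on the relative sizes of these clusters, the short snippet below turns the counts listed above into shares of the clustered total. The counts come from the breakdown. The variable names are only illustrative.

```python
# Cluster sizes as listed in the breakdown above.
cluster_sizes = {0: 449927, 1: 572374, 2: 20747, 3: 80, 4: 1683}

total = sum(cluster_sizes.values())
shares = {cluster: count / total for cluster, count in cluster_sizes.items()}

for cluster, share in shares.items():
    print(f"Cluster {cluster}: {cluster_sizes[cluster]:>7,} businesses ({share:.2%})")
```

Clusters 0 and 1 together account for roughly 98 percent of the businesses counted here, so the larger-business clusters (2, 3, and 4) describe a very small slice of the data.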
Conclusion
The purpose of this project was to paint in broad strokes the patterns that arise in business ownership in the United States. While we were interested in general trends, like how the number of business owners was changing in certain sectors, the main goal was to identify patterns in the demographics of business owners.
In terms of the trend from 2002 to 2012, we observed several different combinations of outcomes. For some sectors, like information and technology, both minority and non-minority ownership were increasing. In others, like real estate, both minority and non-minority ownership were increasing, but non-minority ownership was increasing much faster. Yet another outcome was minority ownership increasing while non-minority ownership decreased.
We also observed geographic trends in the data. One takeaway was that even in states with high minority populations, one minority would tend to have a much larger presence in business ownership than the others. This suggests that while the United States is anecdotally considered a "melting pot", business ownership tends to stratify rather than mix together.
Another important finding was the demographic distribution across the clusters of different types of businesses. While minorities make up about 18 percent of all business owners, they make up only about 5.6 percent of the owners of businesses belonging to cluster 4 (generally larger companies). Whether this discrepancy is a result of societal barriers to minorities owning these more "successful" companies cannot be answered by this dataset, but the discrepancy is certainly there.
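One way to put a single number on this gap is a simple ratio of the two shares. The snippet below uses the rounded figures quoted above. "Representation ratio" is a label introduced here for illustration and is not a measure reported by the project.

```python
# Rounded shares quoted in the text above.
overall_minority_share = 0.18     # minority share of all business owners
cluster4_minority_share = 0.056   # minority share of owners in cluster 4

# A ratio of 1.0 would mean cluster 4 matches the overall share.
representation_ratio = cluster4_minority_share / overall_minority_share
print(f"Representation ratio for cluster 4: {representation_ratio:.2f}")
```

A value well below 1.0 means minority owners appear in cluster 4 far less often than their overall share of business ownership would suggest.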
While I consider the research into this topic successful, there are definitely ways to improve. For one, I didn't have the hardware to take full advantage of the K-prototypes clustering algorithm. My ability to cluster was limited to random samples of data and all feature selection was done on a trial and error basis.
Additionally, data on this topic is not as readily available as I believe it should be. As a result, I was limited to using data from 2007 to create clusters, as this was the only sizeable dataset available to the public.
This project is published on GitHub.io and the code is on GitHub.
Files
pums.csv.zip
Total size: 68.0 MB