Monday, October 24, 2016

Back again to update with another project. This time we are exploring a data set of liquor sales in the state of Iowa. It has 18 columns, including sale dates, associated costs, and locations, and most weren't needed. The first thing I had to do was look through the data and check that each column had an appropriate type and sensible values.

Surprisingly, most of the columns are appropriate types. A few things needed to be changed, though.
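The post doesn't say which columns needed changing, so the following is only a sketch of what that kind of tidying might look like in pandas. The column names (`Date`, `Sale (Dollars)`), the dollar-sign strings, and the sample rows are all assumptions for illustration, not taken from the actual data.

```python
import pandas as pd

# Tiny made-up sample standing in for the real file; column names are assumed.
raw = pd.DataFrame({
    "Date": ["11/04/2015", "03/02/2016"],
    "Store Number": [3717, 2614],
    "Sale (Dollars)": ["$81.00", "$41.26"],
    "Bottles Sold": [12, 2],
})

def tidy(df):
    """Return a copy with dates parsed and dollar strings converted to floats."""
    out = df.copy()
    # Parse date strings into real datetimes so they can be filtered by year
    out["Date"] = pd.to_datetime(out["Date"], format="%m/%d/%Y")
    # Strip the leading "$" and convert the remaining text to a number
    out["Sale (Dollars)"] = (
        out["Sale (Dollars)"].str.replace("$", "", regex=False).astype(float)
    )
    return out

clean = tidy(raw)
print(clean.dtypes)
```

Checking `dtypes` again after the conversion is a quick way to confirm every column ended up as the type you expected.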


Voila! With the data tidied up, it's ready to be worked with. A good next step is to explore it and get a rough picture of what we're dealing with. I decided to see which counties were performing the best in sales, which would give us a quick overview of which stores to maybe focus on.

Well, that doesn't give us much. Looking at the code:


This returns a list of the highest and lowest 10 by total sales in dollars and by number of transactions. Using this treemap, we can see that a couple of stores represent a large portion of the liquor sales in Iowa for this time period, and many smaller stores make up the rest.
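As a rough sketch of the ranking described above (not the original code), something like this groups transactions by store and pulls out the top and bottom performers. The column names `Store Number` and `Sale (Dollars)`, the helper name `rank_stores`, and the sample rows are assumptions for illustration.

```python
import pandas as pd

def rank_stores(df, n=10):
    """Return (top_n, bottom_n) stores ranked by total sales in dollars."""
    summary = (
        df.groupby("Store Number")["Sale (Dollars)"]
        .agg(["sum", "count"])
        .rename(columns={"sum": "total_sales", "count": "transactions"})
    )
    top = summary.nlargest(n, "total_sales")
    bottom = summary.nsmallest(n, "total_sales")
    return top, bottom

# Made-up transactions: one row per sale
sales = pd.DataFrame({
    "Store Number": [1, 1, 2, 3, 3, 3],
    "Sale (Dollars)": [100.0, 50.0, 20.0, 10.0, 10.0, 5.0],
})

top, bottom = rank_stores(sales, n=2)
print(top)
print(bottom)
```

Keeping the transaction count next to the dollar total helps tell a store with a few very large orders apart from one with many small ones.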

This could lead to more questions, such as why certain stores perform better, what role the neighborhood plays, and so on. We would also need to consider what kind of store it is. Grocery stores that sell liquor will probably sell higher volumes in total, because people are already there and just have to add it to their cart. Small gas stations on a remote highway would not fare as well. However, many of these questions are past the scope of this project, so moving on.



Using linear regression, I set out to determine whether the sales for 2015 would be a good predictor of the sales for 2016. I made a new dataframe with the columns I wanted to use as features for the prediction (average of bottles sold, average of volume sold, average of bottle_volume, total of sales), and another one for the target I wanted to predict (sales). I did not remove outliers.
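To make the setup concrete, here is a minimal sketch of fitting a linear model on four per-store features. It uses plain NumPy least squares rather than whatever library the original analysis used, and the data is synthetic. The helper names `fit_linear` and `predict` and all the numbers are made up for illustration.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns (intercept, coefficients)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first fitted weight is the intercept
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[0], beta[1:]

def predict(intercept, coef, X):
    """Apply the fitted line to new feature rows."""
    return intercept + np.asarray(X, dtype=float) @ coef

# Synthetic per-store features standing in for: mean bottles sold,
# mean volume sold, mean bottle volume, total sales (all values invented)
rng = np.random.default_rng(0)
X_2015 = rng.uniform(1, 100, size=(20, 4))
true_coef = np.array([2.0, -1.0, 0.5, 1.1])
y = 3.0 + X_2015 @ true_coef  # noise-free target, so the fit is exact

intercept, coef = fit_linear(X_2015, y)
print(intercept, coef)
```

Because the synthetic target has no noise, the fit recovers the coefficients exactly. Real sales data would not be that clean, especially with outliers left in.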


The model had a pretty high R^2 score, so chances were good it would predict 2016 well.
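For reference, R^2 compares the model's squared errors against those of always guessing the mean. A value near 1 means the model explains most of the variation in the target. Here is a small sketch of the calculation in plain Python; the function name and sample numbers are made up for illustration.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    # Squared error of the model's predictions
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Squared error of always predicting the mean
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Invented values: predictions close to the actuals give a score near 1
score = r_squared([10, 20, 30], [12, 18, 31])
print(score)
```

A high score on the data the model was trained on does not guarantee it will predict a new year well, so it is worth checking against held-out data too.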


So it looks like 2016 will be a better selling year than 2015, and 2015 was a pretty good representation of what we could expect to get in 2016. The difference is about $100k.
