Mo Chen’s Coffee - A Journey Through Data

Author

Teslim Uzomah

Published

October 24, 2023

1 The Data:

Mo Chen has provided access to the data. Click here to access the data download link. The data originally covered the years 2019 to 2022. I just changed it to span from 2020 to 2023 to make it more enjoyable. If you view this project in 2024 you can change the span to 2021 up to 2024 hope you get the idea.

1.1 Project Background:

Mo Chen is my favorite YouTuber specializing in data analytics. He is a data and analytical analyst working in the financial industry. He initiated an Excel-based project centered around a coffee shop. Recently, he established a Discord group in response to subscriber requests to facilitate direct interaction.

I saw this as an opportunity to immerse myself in the data, turning it into a personal challenge and seizing the chance to include this project in my portfolio website. My goal is to enhance the project beyond what Mo Chen has achieved. My approach involves transforming data science projects into easily understandable narratives.

1.2 The Why:

This is my way of giving back to Mo Chen’s YouTube community, Discord community, and the broader data science community. I aim to make my work accessible to everyone, demonstrating that any data project can be explored in greater depth.

1.3 Setting the Scene:

The data comprises tables for orders, products, and customers, all contained within a single Excel workbook. Initially, I used Excel to create three separate workbooks for each table.

Next, I imported the data into Power BI, establishing connections between tables using keys to ensure that any future data entered will automatically update the dashboard.

I employed Power Query in Power BI to refine the data, addressing issues such as missing values and duplicates (though in this data, there were no missing values or duplicates). I also performed column renaming and created new columns, such as ‘total revenue,’ calculated as the unit price multiplied by the quantity sold.

Moving forward, I developed measures using DAX (Data Analysis Expressions), drawing insights from the knowledge I gained from a YouTuber named ‘Sweatpant BI.’ If you’re interested in learning more about Power BI, I recommend checking out his channel.

Lastly, I ensured that my Power BI file is accessible to others, providing them with the opportunity to explore and contribute to the project’s improvement.

2 Dashboard

To view and interact with the dashboard I created, Click here to view and here to download.

3 The project in a Nutshell

Mo Chen had a dream: to reach a revenue target of $14,500 by the end of 2023 in the USA, Ireland, and the United Kingdom. To understand how the business was performing and the loyalty of their customers, Mo Chen decided to bring in a data analyst named Teslim. Teslim’s task was to analyze data from January 2020 to August 2023.

3.1 Loyal vs. Non-Loyal Customers

Teslim started by sorting customers into two groups: loyal and non-loyal. Currently, they had 479 loyal customers and 521 non-loyal customers. The big question was, why do some people keep coming back, while others don’t?

3.2 Revenue and Products

Next, they looked at the money they’ve made so far, which was $45,134. They also checked which products sold the most. It turned out that Excelsa was the top seller. Most of their sales came from the United States, with $35,639, while Ireland and the United Kingdom contributed $6,697 and $2,799, respectively.

3.3 Top Cities and Customers

Together, Mo Chen and Teslim figured out the top 5 cities where their products sold the most:

  1. Washington, D.C., USA - $1,064
  2. Houston, USA - $820
  3. Toledo, USA - $774
  4. New York City, USA - $773
  5. Sacramento, USA - $628

They also identified the top 5 customers based on spending:

  1. Allis Wilmore - $317
  2. Brenn Dundredge - $307
  3. Terri Farra - $289
  4. Nealson Cuttler - $282
  5. Don Flintiff - $278

Mo Chen planned to send gifts to these top customers in September 2023.

3.5 Dashboard Showing All Time Sales Performance

4 Phase Two Analysis

We will investigate the year 2022 and our current year, 2023.

In August 2022, on the exact day from the previous year (2023), our revenue was $7,958. This figure represents the Previous Year to Date (Previous YTD).

Currently, in August 2023, our revenue stands at $7,063, representing Year to Date (YTD).

Our Year-over-Year (YoY) growth percentage is -11.2%, indicating a decline in revenue.

To visualize the trend, we’ve compared 2022 and 2023 in a line chart, clearly showing the drop in August 2023 compared to August 2022.

4.1 Dashboard Showing 2022 vs 2023 Sales Performance

Now, considering our revenue goal for 2023, Mo Chen’s target is to reach $14,500. The question arises: Are we on track to achieve this goal? As Mo Chen reviews the dashboard, he’s visibly concerned but patiently awaits the full story.

Our progress towards the goal is currently at 48.7%. Can we attain it? Mo Chen persists with his questions, and Teslim, our data analyst, reassures him to wait for the complete analysis.

Moving on, in 2023, Liberica is leading in sales, which doesn’t come as a surprise based on historical data. Liberica consistently ranks among the top two coffee types in terms of revenue.

Robusta shows a 10.6% increase, Liberica has grown by 3.7% from 2022, but Excelsa and Arabica have experienced decreases of -18.9% and -34.0%, respectively. Mo Chen has left to confer with the product team, probing for potential issues with quality, quantity, or packaging.

Shifting our focus to countries, it appears that the USA has seen a -5.6% decrease in purchases. This is significant as the USA houses the majority of our customers.

Ireland has experienced a -22.6% decline, and the UK has plummeted by -61.1%. This drastic drop in the UK market raises concerns, especially since one of our top five customers is a UK citizen. Mo Chen questions if there might be issues related to quality, shipping delays, or other factors.

The logistics (shipping) team needs to present their data, and we must analyze the behavior of customers from the UK, including their reviews and time spent on our site. This additional data will be crucial for further investigation.

Resolving these issues is essential for our future prospects.

Based on the data we’ve examined, we can identify some significant issues, which we will communicate directly to Mo Chen.

4.2 Uncovering the Path to Success

Mo Chen, the owner of Mo Chen’s Coffee, was faced with the million-dollar question: would they hit their revenue goal of $14,500? The journey had hit a few bumps, and things weren’t going as planned.

4.3 Hitting Goal?

4.3.1 Using the Power of Python

# Import necessary libraries
import numpy as np        # NumPy for numerical operations
import pandas as pd       # Pandas for data manipulation
import seaborn as sns     # Seaborn for data visualization
import matplotlib.pyplot as plt   # Matplotlib for plotting


# Import Prophet, a time series forecasting library
from prophet import Prophet

# Ignore any warnings to keep the output clean
import warnings
warnings.filterwarnings("ignore")

# Set the style of the plots to 'ggplot' and 'fivethirtyeight'
plt.style.use('ggplot')
plt.style.use('fivethirtyeight')

# Read an Excel file
df = pd.read_excel('ana.xlsx')
# Convert the 'Order_Date' column to datetime format
df['Order_Date'] = pd.to_datetime(df['Order_Date'])

4.3.2 Sales by Month by Year

Code
# Extract the year and month from the 'Order_Date' column
df['Year'] = df['Order_Date'].dt.year
df['Month'] = df['Order_Date'].dt.month

# Group the data by year and month and calculate the total revenue for each group
monthly_revenue = df.groupby(['Year', 'Month'])['Total_Revenue'].sum().unstack()

# Add a 'Total' column
monthly_revenue['Total'] = monthly_revenue.sum(axis=1)

# Rename the columns to use month names
month_names = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 
'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
monthly_revenue.columns = month_names + ['Total']

# Display the monthly revenue as a table
monthly_revenue.T
Year 2020 2021 2022 2023
Jan 828.985 566.950 837.685 1269.415
Feb 987.405 1798.340 958.830 393.630
Mar 1021.140 914.790 1544.640 1315.205
Apr 1680.750 761.810 1005.585 776.450
May 398.565 939.355 907.690 1002.370
Jun 1384.680 1438.440 864.530 1155.390
Jul 1004.135 1308.945 763.105 906.730
Aug 706.345 300.400 1075.910 244.245
Sep 1277.020 713.050 1643.575 NaN
Oct 884.970 1514.705 1400.400 NaN
Nov 823.385 1108.865 1616.180 NaN
Dec 1189.785 751.895 1147.980 NaN
Total 12187.165 12117.545 13766.110 7063.435

4.3.3 In both 2021 and 2023, August experienced a dip. However, we finished 2021 with $12,117. The question is, can we reach Mo Chen’s goal of $14,500?

Code
plt.rcParams['font.family'] = 'Arial'
plt.rcParams['font.size'] = 10

# Filter the DataFrame to include only 2021 and 2023
filtered_df = df[df['Year'].isin([2021, 2023])]

# Group by Year and Month, and sum the Total_Revenue
grouped_data = filtered_df.groupby(['Year', 'Month_Short'])['Total_Revenue'].sum().unstack()

# Define the order of months starting from January
months_order = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

# Plotting the data
plt.figure(figsize=(8.8, 5))
for year, data in grouped_data.iterrows():
    # Reorder the data based on the defined months_order
    data = data[months_order]
    plt.plot(data.index, data.values, label=year)

plt.xlabel('')
plt.ylabel('')
plt.title('Total Revenue Over Months by Year (2021 and 2023)')
plt.legend(title='Year')
plt.grid(True)
plt.show()

4.3.4 What is our Year-to-Date Sales?

Code
# Filter the data for January to August 2023
print("-" * 80)  

ytd_sales_jan_to_aug_2023 = df[(df['Order_Date'].dt.year == 2023) & \
   (df['Order_Date'].dt.month >= 1)\
    & (df['Order_Date'].dt.month <= 8)]['Total_Revenue'].sum()

print(f"Year-to-Date Sales from January to August 2023: ${ytd_sales_jan_to_aug_2023:.2f}")

print("-" * 80)  
--------------------------------------------------------------------------------
Year-to-Date Sales from January to August 2023: $7063.43
--------------------------------------------------------------------------------

Currently, we have $7,063 in sales from January 2023 to August 2023. We want to forecast how much we will make for the remaining months of the year, which are September, October, November, and December.

4.3.5 Financial Forecasting through Growth Rate Analysis

Projections are educated guesses about what you expect to earn in the future.

Let’s break down the mathematical calculations step by step for forecasting total revenue for the year 2023 based on data from 2020, 2021, and 2022.

  1. Calculate Total Revenue for Each Year:

    1. For the year 2020:
      • Sum up all the ‘Total_Revenue’ values in your data for 2020.
    2. For the year 2021:
      • Sum up all the ‘Total_Revenue’ values in your data for 2021.
    3. For the year 2022:
      • Sum up all the ‘Total_Revenue’ values in your data for 2022.
  2. Calculate the Growth Rate:

    The growth rate represents how much the revenue has changed from 2020 to 2022, as a percentage of the revenue in 2020.

    • Subtract the total revenue for 2020 from the total revenue for 2022: Growth = (Total Revenue in 2022 - Total Revenue in 2020)

    • Divide the growth by the total revenue for 2020 and multiply by 100 to express it as a percentage: Growth Rate = [(Total Revenue in 2022 - Total Revenue in 2020) / Total Revenue in 2020] * 100

  3. Forecast 2023 Total Revenue:

    To estimate the total revenue for 2023, we will use the growth rate calculated in step 2.

    • Multiply the total revenue for 2022 by (1 + Growth Rate / 100): Forecasted Total Revenue in 2023 = Total Revenue in 2022 * (1 + Growth Rate / 100)

This forecasted total revenue for 2023 is an estimate based on the observed growth rate from 2020 to 2022. It assumes that the same percentage increase will apply to 2022’s revenue to predict 2023’s revenue.

Code
# Filter data for the years 2020, 2021, and 2022
start_date = pd.to_datetime('2020-01-01')
end_date = pd.to_datetime('2022-12-31')
filtered_df = df[(df['Order_Date'] >= start_date) & (df['Order_Date'] <= end_date)]

# Calculate total revenue for each year
revenue_2020 = filtered_df[filtered_df['Order_Date'].dt.year == 2020]['Total_Revenue'].sum()
revenue_2021 = filtered_df[filtered_df['Order_Date'].dt.year == 2021]['Total_Revenue'].sum()
revenue_2022 = filtered_df[filtered_df['Order_Date'].dt.year == 2022]['Total_Revenue'].sum()

# Calculate the growth rate from 2020 to 2022
growth_rate = (revenue_2022 - revenue_2020) / revenue_2020

# Use the growth rate to forecast 2023 total revenue
forecasted_revenue_2023 = revenue_2022 + (growth_rate * revenue_2022)

print("-" * 80)  
print(f"Total Revenue in 2020: ${revenue_2020:.2f}")
print(f"Total Revenue in 2021: ${revenue_2021:.2f}")
print(f"Total Revenue in 2022: ${revenue_2022:.2f}")
print(f"Forecasted Total Revenue in 2023: ${forecasted_revenue_2023:.2f}")
print("-" * 80)  
--------------------------------------------------------------------------------
Total Revenue in 2020: $12187.17
Total Revenue in 2021: $12117.55
Total Revenue in 2022: $13766.11
Forecasted Total Revenue in 2023: $15549.62
--------------------------------------------------------------------------------

Based on our growth rate calculation, it appears that by the end of 2023, we are projected to achieve a revenue of $15,549.62. However, it’s important to understand what this growth rate represents.

  • “Projected to achieve” is used to clarify that the revenue of $15,549.62 is an estimation or projection based on the growth rate calculation.

-“it’s important to understand what this growth rate represents” emphasizes the need to provide context and explanation regarding the significance and implications of the calculated growth rate.

4.3.6 “How Our Sales Went on a Roller Coaster Ride: From 2021’s $12k to 2023’s $15.5k Mystery!”

The sales numbers have got my curiosity piqued, and I couldn’t help but dig deeper into the data. Here’s the deal: in August 2021, we were rocking it with sales at $8,029 , and by the end of that year, we reached a cool $12k. But fast forward to August 2023, and we’re at $7,063. Now, hold onto your hat because our calculations suggest we’re on track to finish 2023 with a whopping $15,549.62! I don’t know about you, but explaining this to Mo Chen is going to be a real brain-teaser! 🤯💰📈

Code
# Extract the year and month from the 'Order_Date' column
df['Year'] = df['Order_Date'].dt.year
df['Month'] = df['Order_Date'].dt.month

# Filter the data for months from January (1) to August (8) and years 2021 and 2023
df_filtered = df[(df['Year'].isin([2021, 2023])) & (df['Month'] >= 1) & (df['Month'] <= 8)]

# Group the filtered data by year and month and calculate the total revenue for each group
monthly_revenue = df_filtered.groupby(['Year', 'Month'])['Total_Revenue'].sum().unstack()

# Add a 'Total' column
monthly_revenue['Total'] = monthly_revenue.sum(axis=1)

# Rename the columns to use month names (January to August)
month_names = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug']
monthly_revenue.columns = month_names + ['Total']

# Display the monthly revenue as a table
monthly_revenue.T
Year 2021 2023
Jan 566.950 1269.415
Feb 1798.340 393.630
Mar 914.790 1315.205
Apr 761.810 776.450
May 939.355 1002.370
Jun 1438.440 1155.390
Jul 1308.945 906.730
Aug 300.400 244.245
Total 8029.030 7063.435

5 Using Machine Learning : Simple Facebook Prophet Model

Code
# Create a time series from the Total_Revenue column
time_series = df.groupby('Order_Date')['Total_Revenue'].sum().resample('M').sum()

# Create a DataFrame for Prophet with 'ds' and 'y' columns
prophet_df = pd.DataFrame({'ds': time_series.index, 'y': time_series.values})

# Create and fit the Prophet model
prophet_model = Prophet()
prophet_model.fit(prophet_df)

# Forecast horizon for Prophet (monthly data for 2023)
forecast_horizon = 12

# Create future dates for Prophet
future_dates = prophet_model.make_future_dataframe(periods=forecast_horizon, freq='M')

# Forecast with Prophet
prophet_forecast = prophet_model.predict(future_dates)

# Plot the historical data and Prophet forecast
plt.figure(figsize=(8.8, 5))
plt.plot(time_series, label='Historical Revenue')
plt.plot(prophet_forecast['ds'], prophet_forecast['yhat'], label='Prophet Forecast', linestyle='--')
plt.legend()
plt.xlabel('')
plt.ylabel('')
plt.title('Revenue Forecast with Prophet')
plt.grid(True)
plt.show()
19:43:51 - cmdstanpy - INFO - Chain [1] start processing
19:43:51 - cmdstanpy - INFO - Chain [1] done processing

Code
# Create a time series from the Total_Revenue column
time_series = df.groupby('Order_Date')['Total_Revenue'].sum().resample('M').sum()

# Create a DataFrame for Prophet with 'ds' and 'y' columns
prophet_df = pd.DataFrame({'ds': time_series.index, 'y': time_series.values})

# Create and fit the Prophet model
prophet_model = Prophet(yearly_seasonality=True)
prophet_model.fit(prophet_df)

# Forecast horizon for Prophet (assuming monthly data for 2023)
forecast_horizon = 12

# Create future dates for Prophet
future_dates = prophet_model.make_future_dataframe(periods=forecast_horizon, freq='M')

# Forecast with Prophet
prophet_forecast = prophet_model.predict(future_dates)

print("-" * 80)
# Print the Prophet forecast for 2023
print(f"Prophet Forecast for 2023:\n{prophet_forecast[['ds', 'yhat']].tail(forecast_horizon)}")
print("-" * 80)
19:43:52 - cmdstanpy - INFO - Chain [1] start processing
19:43:52 - cmdstanpy - INFO - Chain [1] done processing
--------------------------------------------------------------------------------
Prophet Forecast for 2023:
           ds         yhat
44 2023-09-30  1313.329582
45 2023-10-31  1395.020153
46 2023-11-30  1569.759191
47 2023-12-31   727.870164
48 2024-01-31  1107.930917
49 2024-02-29  1451.851904
50 2024-03-31   939.740924
51 2024-04-30  1358.201314
52 2024-05-31   484.294435
53 2024-06-30  1377.070654
54 2024-07-31  1163.915293
55 2024-08-31   787.089105
--------------------------------------------------------------------------------

From January to August 2023, we generated $7,603 in revenue. Now, using the Facebook Prophet model, here are our revenue predictions for the upcoming months:

  • September: $1,313
  • October: $1,395
  • November: $1,570
  • December: $729

When we add all these numbers together:

$7,603 + $1,313 + $1,395 + $1,570 + $729 = $12,610

According to Facebook Prophet, it seems unlikely that we will achieve our revenue goal. This metric is a better one to report to Mo Chen than the latter.

5.1 Tuning

Note: We can further improve the model by performing cross-validation and tuning.

6 Beyond our Data: What Next?

We couldn’t predict which customers would stay loyal using machine learning. In this case, we should conduct a survey and ask our non-returning customers (those who aren’t loyal) why they haven’t come back.

We noticed a drop in sales from 2022 to 2023:

  • In the United Kingdom, sales decreased by 61%. It’s worth noting that one of our top 5 customers is from the UK.
  • The USA, which accounts for the majority of our customers, saw a decrease of 5.6%.
  • Ireland experienced a 22.6% decrease in sales.

When it comes to our products Excelsa and Arabica, we saw decreases of 18.9% and 34%, respectively. These figures could provide valuable insights for our survey.

To address these challenges, the marketing team should examine our competition, and the website development team should gather more data on customer behavior to make data-driven improvements to our site.

7 A Brighter Future

With all these insights in hand, Mo Chen felt a renewed sense of clarity about his business. He now knows where to focus his efforts to solve problems and ensure he reached his revenue goal. Moreover, he has a dashboard to track crucial metrics, helping him steer the ship toward success.

The journey continued, and Mo Chen was determined to overcome any obstacles on the path to achieving his ambitious goal. With Teslim’s guidance and the power of data analysis, the future looked brighter than ever for Mo Chen’s Coffee.