NSS Partner Project, Python for BI

Overview

Project Purpose

The purpose of this project was to develop an application that assists employees in recommending additional shoes and accessories to customers during the try-on process. The current workflow involves employees manually selecting three additional pairs of shoes to sell alongside the requested shoe. However, this process lacks data-driven insights and may not effectively maximize sales opportunities.

To address this issue, we aimed to create a recommendation system that uses historical sales data to suggest the top three shoe and accessory recommendations for each unique shoe that has been sold together with another shoe or accessory. With this information, employees can upsell more effectively and give customers personalized suggestions.

My Contribution

My contribution to the project was a Python script that generates the recommended shoe and accessory options from the products sold together in the past. I have since cut the computation time by 96%, reducing the time to find recommendations for all shoes and accessories from 81 days to 3 days.

Technical Process

Tools

Python (pandas)

Code Summary

The provided Python code performs several operations to generate shoe and accessory recommendations based on sales data. It imports the necessary libraries and reads multiple CSV files into data frames using pandas. It merges the sales data with the product data to add the department information. It drops unnecessary columns from the resulting data frame and creates new columns to differentiate between shoes and accessories. The code then merges the sales data with the stores data to add location hierarchy information. It creates separate data frames for shoe-to-shoe sales and shoe-to-accessory sales based on specific conditions.
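The preparation steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the small in-memory frames stand in for the CSV files (which would normally be loaded with pd.read_csv), and every column name other than PRIMARY_STOCKNO and SECONDARY_STOCKNO, along with the "ACCESSORIES" department label and the 'shoe_acces' name, is an assumption made for the example.

```python
import pandas as pd

# Hypothetical stand-ins for the CSV inputs (column names are assumptions).
sales = pd.DataFrame({
    "STORE": [10, 10, 20, 20, 10],
    "PRIMARY_STOCKNO": ["S1", "S1", "S2", "S2", "S1"],
    "SECONDARY_STOCKNO": ["S2", "A1", "S1", "A1", "S2"],
    "REGISTER": [1, 1, 2, 2, 3],  # example of a column that is not needed later
})
products = pd.DataFrame({
    "STOCKNO": ["S1", "S2", "A1"],
    "DEPARTMENT": ["MENS SHOES", "WOMENS SHOES", "ACCESSORIES"],
})
stores = pd.DataFrame({
    "STORE": [10, 20],
    "STATE": ["TN", "KY"],
    "DISTRICT": ["D1", "D2"],
    "REGION": ["R1", "R1"],
})

# Add the department of the secondary item, then drop the unneeded column.
secondary_info = products.rename(
    columns={"STOCKNO": "SECONDARY_STOCKNO", "DEPARTMENT": "SECONDARY_DEPT"}
)
sales = sales.merge(secondary_info, on="SECONDARY_STOCKNO", how="left")
sales = sales.drop(columns=["REGISTER"])

# New column that separates accessories from shoes.
sales["SECONDARY_IS_ACCESSORY"] = sales["SECONDARY_DEPT"].eq("ACCESSORIES")

# Add the location hierarchy (state, district, region) for each store.
sales = sales.merge(stores, on="STORE", how="left")

# Split into shoe-to-shoe and shoe-to-accessory sales.
shoe_shoe = sales[~sales["SECONDARY_IS_ACCESSORY"]]
shoe_acces = sales[sales["SECONDARY_IS_ACCESSORY"]]
```

Using how="left" in both merges keeps every sale row even when a product or store lookup is missing, which seems the safer default for a sketch like this.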

Next, the code defines four functions, 'shoe_recomendations', 'shoe_recomendations2', 'acces_recomendations' and 'acces_recomendations2', which find the top three shoe or accessory recommendations for a given shoe's stock number. The 'shoe_recomendations2' and 'acces_recomendations2' functions are modified versions of the 'shoe_recomendations' and 'acces_recomendations' functions. They take the additional parameters region, district, state, and store, which allow for more specific filtering of the sales data. The shoe and accessory versions work the same way; the only difference is that one reads the shoe-to-shoe sales data frame and the other reads the shoe-to-accessory sales data frame. Ultimately, the functions return the recommendations for a given stock number, the count of combo purchases, and the hierarchy level that the recommendation comes from (store, state, district, region, or division).

To explain further, here's a breakdown of the 'shoe_recomendations2' function:

The function takes the following parameters:
- stock_num: The primary stock number for which you want to find recommendations.
- region (optional): The region for which you want to find recommendations.
- district (optional): The district for which you want to find recommendations.
- state (optional): The state for which you want to find recommendations.
- store (optional): The store for which you want to find recommendations.
The function first filters the shoe_shoe_sale DataFrame based on the specified filters (region, district, state, store). If no location filter is specified, it provides a recommendation based on the sales of all stores.
Next, it determines the hierarchy level based on the lengths of the input lists (store, state, district, region). The hierarchy level represents the granularity of the recommendations: 'store' if only one store is specified, 'state' if only one state is specified, 'district' if only one district is specified, 'region' if only one region is specified, and 'division' for all other cases.
The function creates a new DataFrame called recommendation_df by filtering the shoe_shoe DataFrame for rows where the PRIMARY_STOCKNO matches the input stock_num. It selects the columns PRIMARY_STOCKNO, SECONDARY_STOCKNO, and performs value counts to get the number of times each combination occurs. The resulting DataFrame is sorted in descending order.
The column names of recommendation_df are updated to 'PRIMARY_STOCKNO', 'SECONDARY_STOCKNO', and 'COUNT' for clarity.
The function initializes two empty lists, recommendations and combo_purchase_count.
A for loop iterates over the first three rows of recommendation_df. For each row, it checks whether the SECONDARY_STOCKNO matches the input stock_num. If it does, the PRIMARY_STOCKNO is added to the recommendations list; otherwise, the SECONDARY_STOCKNO is added. In either case, the row's COUNT is added to the combo_purchase_count list.
Finally, the function returns three values: recommendations, combo_purchase_count, and hierarchy (the hierarchy level determined earlier).
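The breakdown above can be sketched as a single function. This is a hedged reconstruction from the description, not the original code: the name 'shoe_recomendations2_sketch', the explicit 'sales' argument (the original appears to read a module-level data frame), the location column names, and the sample data are all assumptions.

```python
import pandas as pd

# Hypothetical shoe-to-shoe sales with location columns (names are assumptions).
shoe_shoe = pd.DataFrame({
    "PRIMARY_STOCKNO":   ["S1", "S1", "S1", "S1", "S1", "S1", "S2"],
    "SECONDARY_STOCKNO": ["S2", "S2", "S2", "S3", "S3", "S4", "S1"],
    "STORE":             [10,   10,   20,   10,   20,   20,   10],
    "STATE":             ["TN", "TN", "KY", "TN", "KY", "KY", "TN"],
    "DISTRICT":          ["D1", "D1", "D2", "D1", "D2", "D2", "D1"],
    "REGION":            ["R1", "R1", "R1", "R1", "R1", "R1", "R1"],
})


def shoe_recomendations2_sketch(sales, stock_num, region=None, district=None,
                                state=None, store=None):
    """Return (recommendations, combo_purchase_count, hierarchy) for stock_num."""
    filters = {
        "REGION": region or [],
        "DISTRICT": district or [],
        "STATE": state or [],
        "STORE": store or [],
    }
    # Apply only the filters that were supplied; no filters means all stores.
    for column, values in filters.items():
        if values:
            sales = sales[sales[column].isin(values)]

    # The most specific single-valued filter decides the hierarchy level.
    if len(filters["STORE"]) == 1:
        hierarchy = "store"
    elif len(filters["STATE"]) == 1:
        hierarchy = "state"
    elif len(filters["DISTRICT"]) == 1:
        hierarchy = "district"
    elif len(filters["REGION"]) == 1:
        hierarchy = "region"
    else:
        hierarchy = "division"

    # Count each (primary, secondary) combination, most frequent first.
    recommendation_df = (
        sales.loc[sales["PRIMARY_STOCKNO"] == stock_num,
                  ["PRIMARY_STOCKNO", "SECONDARY_STOCKNO"]]
        .value_counts()
        .reset_index()
    )
    recommendation_df.columns = ["PRIMARY_STOCKNO", "SECONDARY_STOCKNO", "COUNT"]

    recommendations = []
    combo_purchase_count = []
    for row in recommendation_df.head(3).itertuples(index=False):
        # Recommend whichever item in the pair is not the requested shoe.
        if row.SECONDARY_STOCKNO == stock_num:
            recommendations.append(row.PRIMARY_STOCKNO)
        else:
            recommendations.append(row.SECONDARY_STOCKNO)
        combo_purchase_count.append(int(row.COUNT))

    return recommendations, combo_purchase_count, hierarchy


all_stores = shoe_recomendations2_sketch(shoe_shoe, "S1")
one_store = shoe_recomendations2_sketch(shoe_shoe, "S1", store=[10])
```

With the sample data, calling the function without filters falls back to the 'division' level, while passing store=[10] restricts the counts to that store and reports the 'store' level.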

After the functions, the code creates two data frames. The first data frame contains each distinct shoe sold per region while the second does the same for each distinct accessory. It then splits the data frames into smaller chunks to process them separately. Lastly, a for loop iterates over a specific data frame chunk, calling the shoe_recomendations2 function for each primary stock number. It populates the resulting recommendations and their counts into the corresponding columns in the DataFrame. A separate loop is then used to find the accessory recommendations.
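As a rough illustration of the "distinct shoe per region" frame and the chunking step, the sketch below uses drop_duplicates and plain positional slicing. The sample data, the REGION column name, and the chunk size of 2 are assumptions; the original script may split its data frames differently.

```python
import pandas as pd

# Hypothetical shoe-to-shoe sales (only the columns needed here).
shoe_shoe = pd.DataFrame({
    "REGION":          ["R1", "R1", "R1", "R2", "R2", "R2"],
    "PRIMARY_STOCKNO": ["S1", "S1", "S2", "S1", "S3", "S2"],
})

# One row per distinct (region, shoe) combination.
distinct_shoes = (
    shoe_shoe[["REGION", "PRIMARY_STOCKNO"]]
    .drop_duplicates()
    .reset_index(drop=True)
)

# Split into smaller chunks that can be processed separately.
chunk_size = 2  # assumed value for the example
chunks = [
    distinct_shoes.iloc[start:start + chunk_size]
    for start in range(0, len(distinct_shoes), chunk_size)
]
```

Each chunk can then be handed to its own loop or run, so a failure partway through does not require starting over from the first stock number.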

Efficiency Improvement

The efficiency improvement was achieved by optimizing the last two for loops, which collect the shoe and accessory recommendations. Initially, these loops used pandas data frame operations such as 'iterrows()' and 'loc' to access and modify the data. In the enhanced version, the data is stored in a dictionary, which enables direct indexing, and the loops iterate over row positions with the more efficient 'range()' function.
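The difference between the two loop styles can be sketched as below. This is an assumed reconstruction of the pattern, not the original loops: 'fake_recommend' is a hypothetical stand-in for the recommendation function, and the REC_1 and COUNT_1 column names are made up for the example.

```python
import pandas as pd


def fake_recommend(stock_num):
    """Hypothetical stand-in for the real recommendation function."""
    return [f"{stock_num}-A", f"{stock_num}-B", f"{stock_num}-C"], [3, 2, 1], "division"


chunk = pd.DataFrame({"PRIMARY_STOCKNO": ["S1", "S2", "S3"]})

# Original style: iterate with iterrows() and write each cell back with loc.
slow = chunk.copy()
slow["REC_1"] = None
slow["COUNT_1"] = None
for idx, row in slow.iterrows():
    recs, counts, _ = fake_recommend(row["PRIMARY_STOCKNO"])
    slow.loc[idx, "REC_1"] = recs[0]
    slow.loc[idx, "COUNT_1"] = counts[0]

# Enhanced style: work on a plain dictionary of lists and index by position.
data = chunk.to_dict("list")
n_rows = len(data["PRIMARY_STOCKNO"])
data["REC_1"] = [None] * n_rows
data["COUNT_1"] = [None] * n_rows
for i in range(n_rows):
    recs, counts, _ = fake_recommend(data["PRIMARY_STOCKNO"][i])
    data["REC_1"][i] = recs[0]
    data["COUNT_1"][i] = counts[0]
fast = pd.DataFrame(data)
```

Both versions produce the same columns. The dictionary version avoids building a Series for every row and avoids the label-based lookup that each loc assignment performs, which is the general reason plain Python containers tend to be faster for row-by-row writes.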
