Deep Learning to Predict Customer Delivery Times (CDT)- a Practical Approach
E-commerce has greatly impacted the way we shop for products and services. This can be attributed to various factors like — convenient home delivery of the order in a desired time window (e.g 10:00–10:30 AM), ability to browse and search through a ginormous catalog of products with a few clicks, discounts and sales, etc. In this article, we give a broad overview of the delivery process and how we are using machine learning to better estimate Customer Delivery Time (CDT). CDT affects the final delivery plan that our associates need to execute and is tightly coupled to customer experience and our business.
Delivery time prediction has long been a part of city logistics, but refining accuracy has recently become very important for services such as Deliveroo, Foodpanda and Uber Eats which deliver food on-demand.
These services and similar ones must receive an order and have it delivered within ~30 minutes to appease their users. In these situations +/- 5 minutes can make a big difference so it’s very important for customer satisfaction that the initial prediction is highly accurate and that any delays are communicated effectively.
Last mile delivery is the last step in a typical supply chain where the package (order, parcel, etc.) gets shipped to desired customer’s address from a store, warehouse or any facility. Last mile delivery systems are highly complex for a retailer as we need to serve different customers scattered in the areas where fulfillment (home delivery) is available. Adding to that, we need to be able to fulfill various orders within specified time-slots.
It is also critical to the business, as there is a good chance of incurring significant losses if the last mile delivery execution is sub-optimal. Under-utilizing or overloading physically constrained parts of the system (e.g. accepting new orders that won’t fit in the delivery vehicle because it is almost full, or, reserving space and time capacities conservatively leading to a poorly occupied delivery vehicle) are two simple scenarios of incurring loss in Last Mile logistics. This might result in an increase in the price of products or delivery fee for customers. Such a customer experience may prove detrimental to the revenue and customer retention rate. In addition we also want to capture the trend where the routes might get affected by scenarios like COVID which is critical for running the operations.
To avoid these issues, many delivery operators have a last mile optimization system that solves a version of Constrained Vehicle Routing Problem with Time Windows (CVRP-TW) which is a well-researched topic in the Operations Research and Computer Science community. This is a NP-hard combinatorial optimization problem which means that finding an exact solution to it is at least as hard as in the case of a nondeterministic polynomial time problem. A schematic of the problem is presented below
There are multiple methods to solve CVRP-TW close to optimality. However all these methods need some sort of distance/time matrix that tells the total estimated delivery time between each point (origin) to every other point (destination) specified in the problem. Lets say there are 100 orders you want to fulfill from a store,the distance/time matrix is of size 101 * 101. We can reduce this to half of the size obtaining a triangular matrix if we assume that the road network is symmetrical, a simplifying assumption, though. Given this distance matrix along with other input data, the optimization algorithm tries to maximize an objective function that can be as simple as just minimizing the total driving time or distance or highly complex one incorporating other aspects of business metrics like slot availability, slot promise, customer satisfaction, etc. while adhering to constraints. In industrial setting, there are a lot of constraints and buffers that we need to account for in order to ensure smooth operations. The optimization algorithm tries to optimize the solution to the CVRP-TW problem iteratively and outputs the solution in a human readable format known as a delivery plan. The delivery plan consists of a lot of details for operations executives and delivery associates to be able to execute the fulfillment process sequentially for the day in the most optimal way.
Now that we have understood the fundamentals, we discuss an important aspect, we call Customer Delivery Time (CDT), which is tied to the Total Delivery Time used to create the distance/time matrix.
What is Customer Delivery Time (CDT)?
It may be defined as the time required by a delivery associate to deliver an order to the customer without accounting for driving time or scheduled breaks. Simply put, for an order delivery, we can define the Total Delivery Time (TDT) is the sum total of Driving Time (DT) and Customer Delivery Time (CDT) and any Break Time (BT).
TDT = DT + CDT + BT
Since the CDT is a significant portion of TDT — which is an input to the optimization algorithm, the resulting solution and delivery plan is heavily dependent on it. We need to account for CDT to ensure smooth operations, good customer satisfaction and on-time delivery especially for grocery orders that contain perishable items. For instance, the delivery associate may not find parking near the customer’s location easily — no street parking in crowded urban areas, inability to locate customer’s house (house might be occluded), etc. In other scenarios, a customer might have ordered a lot of items requiring the delivery associate to fulfill the order in multiple steps. Underestimation of CDT might strain the delivery associate as they have less time assigned to fulfill an order and subsequent ones. Overestimation of CDT will lead to inefficient use of the working time and poor slot metrics i.e. we might have been able to deliver an additional order. Typical GPS and navigation systems/services do not provide this information. In this context, we talk about how we are predicting CDT (as accurately as possible) using Machine Learning.
The hypothesis is that CDT depends on some variables :
- Location : address, type of address (apartment, house, condo)
- Order Related : order size (how big and heavy), time of delivery (evening, morning, afternoon)
- External : weather, events on that particular day, new customer (finding address might be harder)
CDT = g(X) ... hypothesis that CDT depends on variables X
To test this hypothesis we collected historical data for more than a year.
X = [ address, address_type, order_size, delivery_time_of_day, weather, events, new_customer ]
We transform these set of input features to be used by our machine learning algorithm. For instance, we parse the address to determine the floor a customer is located on (e.g. 1st floor vs 5th floor, will affect CDT drastically especially if the associate has to make multiple trips from the vehicle to customer). Lets say that post transformation we get
X_transformed = f(X) ... f represents various transformations
Notice that we also know historically how much time was actually spent on CDT for each of the fulfilled orders which is the target variable.
y = CDT ... to be predicted
We therefore can use Supervised Learning to learn the relationship between our features (X_transformed) to the target (y). Notice that this is a regression problem as we need to predict an estimate for CDT. We trained various models based on different algorithms like Linear Regression, Random Forest, Nearest Neighbor, Neural Networks, etc. and also perform hyperparameter tuning to get the best performance in terms of accuracy without underfitting or overfitting. We also took into account the latency of predictions while deciding on which model is deployed which is discussed in the next section. We choose L2 loss function as our objective and report various error metrics like RMSE, MAE, MAPE. L2 is chosen as we can tolerate +/- errors in predictions that are near the actual value but we cannot tolerate large errors as it has real-life implications
# Notice y = CDT is y_actual
# y0 = CDT predicted by our modelimport numpy as np
N = len(y) # number of samples being evaluatedMAE = (1/N) * np.sum(np.abs(y - y0))
RMSE = (1/N) * np.sum(np.square(y - y0))
MAPE = (1/N) * np.mean(np.abs((y - y0) / (y + 1e-6))) * 100
Since the predictions of the model affect our supply chain capacities (especially time constraints), we need access to this prediction pipeline very frequently as it will be invoked at a customer-order level. Also we need it to be dependable (no downtime) and scalable (able to handle high request throughput). As briefly mentioned, this is tied to slot visibility for our customers (if we have no capacity, we cannot fulfill more orders). One such example of why there might be multiple calls is shown below.
In the above scenario, a customer is shopping for items X,Y,Z,F which are added to the cart. Based on the inputs required for the CDT model, we obtain a prediction CDT1 and the slot availability is shown. If for some reason the customer decides to change their mind and remove items Z, F and add N and M it would change the CDT as we can see here that the updated CDT value predicted by the model (taking the new set of inputs) is CDT2. There are many reasons why CDT1 > CDT 2 or CDT 1 < CDT 2. In the next section, we devote our attention to understanding how the system is made scalable and robust.
We use Docker© for containerizing the various pieces in our system. We wrap all the dependencies required to perform input data collection, featurization, call to the model and run it as an independent service. This is done to ensure that the prediction service is portable. In addition it gives the benefit of being cloud agnostic. We use Python to create the API and the framework to serve the prediction pipeline. To meet high request throughput rates, we leverage an asynchronous web framework FastAPI. We then deploy this FastAPI application on a production ready Uvicorn server which is also ASGI (Asynchronous Server Gateway Interface). Multiple Uvicorn workers are spawned and due to its async nature, we see the gain in performance is significant especially with high throughput. FastAPI+ Uvicorn is one of the fastest frameworks out there using Python and gives a performance very close to NodeJS based serving. Here is a comparison of various options for serving applications in python.
We load a pre-compiled model from object storage and load it into memory during runtime which is a one time setup cost. Our containerized docker service takes care of model versioning so we always use the latest model for making predictions. This makes the service less dependable on external factors and dependencies while being highly performant and scalable. The below diagram illustrates the flow
Architecture of CDT service
Caching and Multithreading
Our service makes predictions in real time and we may need to use it via multiple upstream services. This would mean that every time a request is sent, we run the model even for the same set of input features. To avoid this, we implement a distributed caching mechanism. This enables saving time in running the model once again because the API call gets reduced to a cache retrieval operation.
Cache failures are treated as misses and in such a case, we run the model. We run cache operations on separate threads to decrease any latency and obtain fast response times. A sample code using python’s native threading looks to set cache key and value looks like
import threadingthread = threading.Thread(target=_setcache, args=(cdtrequest,feature,cdtvalue))
This has significantly improved the response time and makes the service more resilient.
Route planning is needed to cut transportation costs and provide efficiency in Logistics and Operations. Online orders for delivery are ~20M -25M per year for ASDA and correct estimation of CDT can increase that count by around 15–20 % in addition to the planning the routes efficiently which improvise the last mile planning by ~10%. This gives us cost savings and additional revenue by more than $4–5 million per year .