In this project, we train a deep learning model to recognize four types of sports racquets - badminton, tennis, squash and pickleball. They may look similar, but telling them apart can actually be quite tricky! This blog post is based on a hands-on Jupyter Notebook, so you can follow along or try it out yourself. The link to the Jupyter Notebook is shared at the bottom of the page.
This end-to-end notebook walks through:
- Data collection (including automated image scraping),
- Model training and tuning using ResNet18,
- Interpreting loss and error metrics the fastai way,
- Testing on unseen images with manual labeling,
- And finally, saving the model for future use.
The goal isn't just to build a working classifier — it's to understand the process, reason through decisions and lay the foundation for more complex computer vision projects down the line.
Setup Instructions¶
Run the following commands in your terminal to install the necessary dependencies:
pip install fastai duckduckgo_search
- fastai is a high-level deep learning library built on top of PyTorch, which we will use to train and evaluate our racquet image classifier.
# Check for fastai dependencies
import sys

try:
    from fastai.vision.all import *
    print("fastai is installed.")
except ImportError:
    print("fastai not found. Please install it via: pip install fastai")
    sys.exit(1)
fastai is installed.
# Define racquet categories
# These names will be passed in the image search request and used to create folders for the images
sports = ["tennis racquet", "badminton racquet", "squash racquet", "pickleball racquet"]
# Base image directory
base_dir = Path("images")
Create folders where images will be downloaded¶
Note: Run the cell below only once; otherwise you will delete all the downloaded images and have to download them again.
from pathlib import Path
import shutil

# Clean up and create folders
for item in sports:
    sport_folder = item.split()[0].lower()  # Get 'tennis' from 'tennis racquet'
    folder = base_dir / sport_folder
    if folder.exists():
        shutil.rmtree(folder)
        print(f"Removed existing folder: {folder.resolve()}")
    folder.mkdir(parents=True, exist_ok=True)
    print(f"Created folder: {folder.resolve()}")
Downloading Racquet Images¶
In the next step, we will download 300 images for each racquet category using the duckduckgo_search library. Although the images are ultimately sourced from Bing, we're using the DuckDuckGo interface to bypass the bot protection, rate limiting, and API key restrictions that come with direct access to Bing or Google.
DuckDuckGo itself leverages Bing under the hood for image search results (reference: Hacker News). When you search for images on DuckDuckGo in a browser, the content is silently fetched from Bing, often proxied through DDG to enhance user privacy.
This approach is fine, at least for our small-scale, non-commercial project.
import time, requests
from pathlib import Path
from io import BytesIO

from duckduckgo_search import DDGS
from PIL import Image, ImageOps

MAX_SIZE = 400  # pixel size
DELAY = 0.5     # seconds

# Validate image bytes (not corrupted)
def is_valid_image(img_bytes):
    try:
        img = Image.open(BytesIO(img_bytes))
        img.verify()  # Check corruption
        # Re-open the image to access dimensions and file type
        # This is necessary because verify() doesn't load the image data
        # Refer: https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.verify
        img = Image.open(BytesIO(img_bytes))
        if img.width < 200 or img.height < 200:
            print(f"Image too small: {img.width}x{img.height}")
            return False
        return True
    except Exception:
        return False

# Main download function
def download_images_bing(query, folder_path, max_images):
    print(f"\n~~~ Searching for: {query}")
    count = 0
    with DDGS() as ddgs:
        results = ddgs.images(query, max_results=max_images)
        for result in results:
            url = result.get("image")
            if not url:
                continue
            try:
                response = requests.get(url, timeout=10)
                response.raise_for_status()
                img_bytes = response.content

                if not is_valid_image(img_bytes):
                    continue

                img = Image.open(BytesIO(img_bytes))
                file_ext = img.format.lower()
                if file_ext not in ["jpeg", "jpg", "png"]:
                    print(f"Unsupported image format: {file_ext}")
                    continue

                # Resize images larger than MAX_SIZE down to MAX_SIZE
                if img.width > MAX_SIZE or img.height > MAX_SIZE:
                    print(f"Resizing image: {img.width}x{img.height}")
                    img = ImageOps.contain(img, (MAX_SIZE, MAX_SIZE))  # Maintains aspect ratio
                    img_bytes_io = BytesIO()
                    img.save(img_bytes_io, format=file_ext)
                    img_bytes = img_bytes_io.getvalue()

                filename = f"{query.replace(' ', '_')}_{count:03d}.{file_ext}"
                filepath = Path(folder_path) / filename
                with open(filepath, "wb") as f:
                    f.write(img_bytes)
                count += 1
                print(f"Saved: {filepath}")
                time.sleep(DELAY)  # preventive measure for possible rate limiting on ddg
            except Exception as e:
                print(f"Error downloading {url}: {e}")
    print(f"Finished downloading {count} images for '{query}'")
In the above code, along with downloading and validating images, we also resize any image larger than 400 pixels (on its longer side) down to 400 pixels, while maintaining the aspect ratio. This is done because:
- Large images consume significantly more GPU, RAM, and disk space.
- They slow down data loading and training.
- For image classification tasks, a 400px resolution is typically sufficient and strikes a good balance between accuracy and efficiency.
Note: In production datasets, we should also apply data augmentation techniques, such as random cropping, flipping, rotation and brightness adjustments, to a random subset of images. This helps the model generalize better and become robust to variations in real-world inputs.
However, for this small-scale project, we’re intentionally skipping augmentations to keep the workflow focused and easy to understand.
# Download images for each racquet category
MAX_IMAGES = 300
for sport in sports:
    folder = Path("images") / sport.split()[0].lower()
    download_images_bing(sport, folder, MAX_IMAGES)
# Verify image downloads
for sport in sports:
    folder = Path("images") / sport.split()[0].lower()
    # Count all image files in the folder
    file_count = len(list(folder.glob("*.*")))
    if file_count == 0:
        print(f"No images found in {folder}.")
    else:
        print(f"{file_count} images found in {folder}.")
We requested 300 images per category, but the actual counts are:
- Tennis: 260 images
- Badminton: 191 images
- Squash: 176 images
- Pickleball: 272 images
Some images were skipped due to:
- Unsupported formats like .webp
- Dimensions smaller than 200×200 pixels
- Access issues (e.g., 403 Forbidden)
- Corrupted or invalid files
Our data is downloaded and validated!
Note: In production systems, image scraping, downloading, validation and preprocessing are never done within a training notebook or script. These tasks are treated as part of the data ingestion pipeline and are typically handled as separate jobs or services.
Let's proceed to train our model.¶
We'll first define a DataBlock.
A DataBlock is a high-level API in fastai for building datasets and DataLoaders. It allows us to define how to get our input items, how to label them, how to split the data and what transforms to apply.
You can refer to the official documentation here: https://docs.fast.ai/data.block.html#DataBlock
Below is the configuration we'll use for this project. Each line will be explained after the code block:
db = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=[Resize(192, method='pad')]
)
Explanation of the parameters in the code above:
- blocks: The inputs to our model are images, and the outputs are categories (in our case, "badminton", "squash", etc.).
- get_items: A function to retrieve raw items — in our case, image file paths.
- splitter: A function that returns training and validation indices. Here we use a random 80/20 split with a fixed seed for reproducibility.
- get_y: A function to extract the label from each item — we’re using the parent folder name as the label.
- item_tfms: Transformations applied to each item — here, resizing images to 192×192 by padding. Padding maintains the aspect ratio.
OK, so now we have defined the blueprint:
- What type of inputs/targets to expect
- How to get them
- How to split the dataset
- What transforms to apply
But no actual data is loaded or processed at this point.
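As an aside: earlier we said we're intentionally skipping data augmentation. For reference, adding fastai's built-in augmentations would only require one extra argument in the blueprint. This is a sketch and is not used in this project:

# Sketch only (not used in this project): the same DataBlock with batch-level augmentations
db_aug = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=[Resize(192, method='pad')],
    batch_tfms=aug_transforms()   # random flips, rotations, zooms, lighting changes
)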
Create the dataloader
dls = db.dataloaders(base_dir, bs=64)
What happened in the above cell?
The blueprint that we created in the previous step is applied to the actual data present under base_dir (in our case, the images/ folder).
The dataloaders() method scans the images/ directory, retrieves all image files that match the blueprint, splits them into training and validation sets, applies any item transforms (like resizing), and finally batches the data.
Batch Size (bs=64): This specifies that the DataLoaders object should create batches of 64 images. Each batch is a single, collective unit that’s fed into the model during training. Using a moderate batch size like 64 helps balance memory usage and training speed.
The returned dls object is an instance of the DataLoaders class. It encapsulates:
- dls.train: A DataLoader for the training set.
- dls.valid: A DataLoader for the validation set.
These sub-DataLoaders are built on top of PyTorch's DataLoader but with additional fastai functionalities. Read the official doc here: https://docs.fast.ai/data.load.html
This step has finalized our data ingestion pipeline. The created DataLoaders object (dls) serves as the interface between our dataset and our model during training.
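Before training, it's worth sanity-checking what the DataLoaders actually produced. For example:

# Quick sanity checks on the DataLoaders
dls.show_batch(max_n=9)   # display a grid of sample images with their labels
print(dls.vocab)          # the category names the model will learn to predict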
Training Our Model¶
We will begin by fine-tuning an existing, well-established computer vision model. Fine-tuning is the process of adapting a model that has already been trained on a large, general-purpose dataset to a new, more specific task. This saves time and cost for us.
For this project, we will be using ResNet18, a lightweight yet powerful convolutional neural network pretrained on the ImageNet dataset. It offers a great balance between speed and accuracy, making it ideal for rapid experimentation and limited compute environments. By fine-tuning ResNet18, we can adapt its learned representations to accurately classify images across our four racquet categories.
Here we go.
To train our model, we will use the vision_learner API from the fastai vision module, which is specifically designed to streamline transfer learning for computer vision tasks. This API encapsulates everything needed to set up a Learner - a core abstraction in fastai that binds together a model, the dataloaders (which we created above) and a loss function. You can read more about the Learner class here.
vision_learner simplifies the process of leveraging pretrained models (such as ResNet18) for classification. It automatically handles the necessary setup for transfer learning, including proper initialization of the model's final layers to suit our dataset.
Once the learner is configured, we invoke fine_tune(3), which trains the model for 3 epochs. This method first freezes the pretrained base to train only the new classification head, and then gradually unfreezes and fine-tunes the entire model. The official doc of fine_tune is available here.
learn = vision_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(3)
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 1.721788 | 0.769392 | 0.284916 | 00:26 |
/home/commando/python_envs/learn_ml/lib/python3.12/site-packages/PIL/Image.py:1045: UserWarning: Palette images with Transparency expressed in bytes should be converted to RGBA images warnings.warn(
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.716825 | 0.562591 | 0.212290 | 00:51 |
1 | 0.523383 | 0.567538 | 0.206704 | 00:47 |
2 | 0.400680 | 0.560361 | 0.195531 | 00:57 |
Training Results¶
The model was trained locally on my laptop, a standard machine with no dedicated GPU:
- CPU: Intel i7 13th Gen
- Memory: 16 GB
- GPU: Integrated Intel Graphics
Despite lacking a dedicated GPU, training was completed in a reasonable time of around 3 minutes for 3 epochs.
Here are the training results:
Initial Phase (frozen base layer)¶
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 1.823223 | 0.750006 | 0.240223 | 00:20 |
Fine-Tuning Phase (unfrozen model)¶
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.663238 | 0.580100 | 0.201117 | 00:53 |
1 | 0.510640 | 0.544243 | 0.173184 | 00:51 |
2 | 0.372500 | 0.517990 | 0.150838 | 00:52 |
As seen, the error rate steadily decreased with each epoch, indicating the model is learning to distinguish racquet types better over time. The final error rate of ~15% is quite decent for a first pass, especially considering our small dataset.
Before we start analyzing our results more deeply, it’s important to understand the two key components commonly found in modern deep learning workflows: the base model and the custom classifier.
The base model (also called the pretrained model, feature extractor or sometimes the backbone) is typically a convolutional neural network (eg ResNet18) that has already been trained on a large and diverse dataset (eg ImageNet). Its job is to extract general visual features (like edges, textures, and patterns) that are useful across many tasks.
The custom classifier (also referred to as the head or task-specific classifier) is a set of new layers added on top of the base model. These layers are trained specifically for our problem - in this case, classifying images of racquets into categories.
With this structure in mind, let's break down how our model was trained in two distinct phases:
Initial Phase (Frozen Base Model)¶
- Pretrained Base Model (ResNet18): We began with ResNet18, a convolutional neural network pretrained on ImageNet. This model has already learned to identify general visual patterns—such as edges, textures, and shapes—that are widely applicable across image classification tasks.
- Frozen Weights: In this stage, the base model's weights are frozen, meaning they are not updated during training. This preserves the valuable feature extraction capabilities the model has learned from the large and diverse ImageNet dataset.
- Custom Classifier (Task-Specific Layers): On top of this frozen base model, a custom classifier was added by fastai’s vision_learner function. This classifier is a set of fully connected layers specifically designed to map the extracted features to our target classes—different types of racquets in our case.

In this phase, only the custom classifier is trained. It learns how to interpret the high-level features produced by the base model and map them to the correct racquet class (tennis, badminton, squash or pickleball).

Why This Matters:
- It allows for quick adaptation to our domain-specific data with minimal training effort.
- It ensures that the general visual understanding built into the base model is not disturbed, giving us a solid foundation without requiring retraining from scratch.
Inspecting the Model¶
To view the details of the model, you can use:
learn.model #shows the complete architecture.
learn.model[0] #shows the base layer (ResNet18).
learn.model[1] #shows the head (custom classification layers).
Once our custom classifier (head) has learned to interpret the features extracted by the base model, we move to the next phase.
Fine-Tuning Phase¶
Unfreezing the Base Model¶
In this phase, we unfreeze the layers of the base model (ResNet18). This means all the convolutional layers—which were previously frozen to preserve their pretrained knowledge—are now allowed to update their weights during training.
Why unfreeze? Because the base model was originally trained on a generic dataset (ImageNet), and now it’s time to adapt those generic features to our specific task: recognizing different types of racquets.

For instance, ImageNet may have taught the model to recognize general features like curves, grips or mesh, but our racquet dataset may need a bit more tuning to differentiate between, say, a badminton racquet and a tennis racquet.
Now, both the base model and the custom classifier are trained together:
- The classifier continues to improve its ability to make task-specific predictions.
- The base model begins to adjust its filters to extract slightly more specialized features tailored to racquet identification.
Key Benefits¶
- This phase allows for deeper adaptation to our racquet dataset. This turns out to be helpful if our dataset is visually different from ImageNet.
- It helps the model refine feature extraction for better accuracy especially for edge cases.
This two-phase approach is the essence of transfer learning: it leverages the existing knowledge of ResNet18 and quickly adapts it to our specific task (classifying racquet images).
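For the curious, fine_tune bundles these two phases into a single call. A simplified sketch of what it roughly corresponds to (the real implementation also picks sensible learning rates for each phase):

# Roughly what learn.fine_tune(3) does, in simplified form
learn.freeze()            # Phase 1: only the new classification head is trainable
learn.fit_one_cycle(1)    # train the head for one epoch
learn.unfreeze()          # Phase 2: make the whole network trainable
learn.fit_one_cycle(3)    # fine-tune base model + head together for 3 epochs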
Detailed Explanation of Result Table Headers¶
Now let’s understand the meaning of each column in our training results and interpret the data.
Let's revisit the result of Phase 1:
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 1.823223 | 0.750006 | 0.240223 | 00:20 |
Here’s a detailed analysis of each column and what the numbers mean:
1. epoch¶
An epoch is one full pass over the entire training dataset. Multiple epochs allow the model to iteratively adjust its internal weights to better fit the data. So, if you train for 3 epochs, the model will see each training sample 3 times — potentially improving its predictions each time.
If you remember, a couple of cells above, while creating the dataloaders we configured the batch size to 64:
dls = db.dataloaders(base_dir, bs=64)
Let’s say we have a total of 800 images for all racquets combined. We split this dataset into 80% for training and 20% for validation. That gives us 640 training images.
Given a batch size of 64, the 640 training images are divided into 10 batches. The model is trained on each of these batches sequentially during one epoch. For each batch, it:
- performs a forward pass to make predictions,
- compares predictions with actual labels to compute training loss,
- performs backpropagation and weight updates.
After all 10 batches are processed, that completes 1 epoch. The training loss for the epoch is the average of the individual batch losses.
Once training for that epoch is complete, the model is evaluated on the validation dataset (160 images):
- It makes predictions on the validation set without updating the weights.
- From this, we get the validation loss and error rate.
In short:
- Epoch = One complete pass over the training data only
- Validation = Happens after each epoch, using the held-out validation set to assess generalization performance.
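Note that 800 images is just a round number used for illustration; our actual downloaded dataset is a bit smaller. If you want to verify this arithmetic on the real DataLoaders, something like this should work:

# Check the actual split and batch counts on our dls object
print(f"Training items:   {len(dls.train_ds)}")
print(f"Validation items: {len(dls.valid_ds)}")
print(f"Batch size:       {dls.train.bs}")
print(f"Batches per training epoch: {len(dls.train)}")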
2. train_loss (Training Loss)¶
During the training phase, the model learns by adjusting its internal parameters — the weights of the neural network — to minimize errors in its predictions. Training loss is the key metric that tells us how far off the model’s predictions were from the actual labels on the training data.
For each batch, the model does the following:
- Forward Pass – It makes predictions for all 64 images in the batch.
- Loss Calculation – It compares these predictions with the actual labels using a loss function (cross-entropy in our case).
- Backpropagation – Based on this loss, it computes gradients and updates the weights to reduce the error.
This gives us the training loss for this particular batch.
Once all 10 batches (1 epoch) are processed:
- The training losses from each batch are averaged to produce the epoch-level training loss.
- This value gives us a sense of how well the model is doing on the data it is actively learning from.
A lower training loss over time typically indicates that the model is learning.
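Conceptually, one batch step looks roughly like the following in plain PyTorch. This is only an illustrative sketch; fastai's Learner performs this loop (plus a lot of extras) for us, so we never write it ourselves in this project:

import torch.nn.functional as F

def train_one_batch(model, xb, yb, optimizer):
    preds = model(xb)                   # 1. forward pass: predictions for the batch
    loss = F.cross_entropy(preds, yb)   # 2. loss calculation against the true labels
    loss.backward()                     # 3. backpropagation: compute gradients
    optimizer.step()                    #    update the weights
    optimizer.zero_grad()               #    reset gradients for the next batch
    return loss.item()                  # this batch's training loss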
In our racquet classification project, the vision_learner function from fastai automatically selects an appropriate loss function based on the task. Since we are solving a multi-class classification problem, fastai uses CrossEntropyLoss.
You can confirm this by:
learn.loss_func
FlattenedLoss of CrossEntropyLoss()
3. valid_loss (Validation Loss)¶
While the training loss tells us how well the model is performing on the training data, the validation loss is our best indicator of how well the model is likely to perform on unseen data (like in a production environment).
Once the model completes one epoch, it’s time to evaluate how well it generalizes. This is where the validation set comes in — the 20% portion of the data we had held back and not used during training.
Here’s what happens step-by-step:
- The model makes predictions on the entire validation set.
- It compares those predictions with the actual labels (just like it does with the training data).
- Then it calculates the average loss over all validation samples using the same loss function (in our case, CrossEntropyLoss).
- This average is the validation loss.
4. error_rate¶
Error Rate tells us what percentage of predictions made by the model were incorrect. This gives us a more human-readable performance measure.
Here is how it is calculated:
After an epoch completes:
- The model is run on the validation dataset.
- For each image, the model outputs predicted probabilities for all classes (racquet types).
- The class with the highest probability is selected as the model's prediction.
- This predicted label is compared against the actual label.
If they match : Correct.
If they don't : Incorrect.
The Error Rate is then calculated as:
Error Rate = (Number of Incorrect Predictions)/(Total Number of Validation Samples)
You will often see some models reporting accuracy — which is simply:
Accuracy = 1 - Error Rate
So if, in our example, the Error Rate is 0.24 (~24%), the Accuracy will be 1 - 0.24 = 0.76 (76%).
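If you want to recompute these numbers yourself after training, fastai's learn.validate() runs the model over the validation set and returns the validation loss followed by the metrics we configured (here, just error_rate):

# Re-evaluate on the validation set outside of a training run
valid_loss, err = learn.validate()
print(f"valid_loss: {valid_loss:.4f}, error_rate: {err:.4f}, accuracy: {1 - err:.4f}")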
5. time¶
The elapsed time to complete one epoch. This metric helps gauge how quickly the model is training. It can be influenced by factors like dataset size, model complexity and hardware (e.g., CPU vs. GPU).
Summary¶
Header | Data Used | Purpose | Is Lower Better? |
---|---|---|---|
epoch | — | Indicates how many full passes the model has made over the training dataset. | — |
train_loss | Training set | Measures how well the model fits the training data; used to guide weight updates during training. | Yes |
valid_loss | Validation set | Evaluates how well the model generalizes to unseen data; key for detecting overfitting. | Yes |
error_rate | Validation set | Proportion of incorrect predictions on validation data (1 - accuracy). | Yes |
time | Whole epoch | Duration taken to complete one full epoch (training + validation). | — |
Validation Loss vs. Error Rate¶
A common confusion that some people may have (I certainly had in the beginning) is: what’s the difference between validation loss and error rate?
Though both are evaluated on the validation set, they measure very different things. Let’s break down the differences in the table below:
Aspect | Validation Loss | Error Rate |
---|---|---|
Definition | A numerical value from the loss function (e.g., cross-entropy) | The fraction of incorrect predictions |
What it Measures | How well the model's probability distribution matches the true labels | Whether the model's top predicted class is correct or not |
Type | Continuous — can range across real values (e.g., 0.543) | Discrete — typically between 0 and 1 (e.g., 0.17 means 17% wrong) |
Sensitivity | Sensitive to confidence in correct predictions | Only considers right vs wrong, regardless of confidence |
Output Basis | Calculated from all predicted probabilities | Calculated from final class labels after argmax |
Interpretability | More nuanced but harder to interpret directly | Very interpretable — “X% predictions were wrong” |
Goal | Minimize it to improve model confidence and accuracy | Minimize it to reduce outright classification errors |
Use Case | Guides training and optimization of the model | Helps judge real-world prediction performance |
Example:¶
Suppose for an image of a tennis racquet, our model predicts the following probabilities:
- Tennis: 0.55
- Badminton: 0.40
- Squash: 0.03
- Pickleball: 0.02
Here is how to interpret it:
- Prediction is correct, because tennis has the highest probability and matches the true label.
- However, the confidence (55%) is not very high—this means the model wasn’t very sure.
- Validation loss (e.g., cross-entropy loss) will still be moderately high, because it penalizes low confidence even when the prediction is correct.
- Error rate = 0, because the top predicted class is correct.
To Summarize:
- Use validation loss for model optimization and fine-tuning.
- Use error rate for a high-level, human-readable performance metric.
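To make the numbers in that example concrete, here's a tiny sketch that computes the cross-entropy loss and the error contribution for that single prediction (the probabilities are the hypothetical ones above):

import torch

# Hypothetical predicted probabilities: tennis, badminton, squash, pickleball
probs = torch.tensor([0.55, 0.40, 0.03, 0.02])
true_class = 0  # tennis

# Cross-entropy for one sample is -log(probability assigned to the true class)
loss = -torch.log(probs[true_class])
print(loss.item())  # ~0.60: a noticeable loss even though the prediction is correct

# Error rate contribution: 0, because the argmax class matches the true class
print(int(probs.argmax().item() != true_class))  # 0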
Interpreting Train Loss, Valid Loss and Error Rate¶
Jeremy Howard, co-founder of fastai and a respected authority in AI, emphasizes that overfitting in deep learning is widely misunderstood. Contrary to popular belief, a lower training loss compared to validation loss is not a sign of failure. It is expected and reflects a properly trained model. He says that overfitting is rare in modern deep learning and requires deliberate effort to induce, such as disabling safeguards like data augmentation, dropout or weight decay.
The true indicator of overfitting isn’t the loss gap but a rising validation error rate - the point where the model’s prediction accuracy on unseen data deteriorates despite improving training performance. He clarifies that we should focus on error rates (e.g., misclassifications) rather than obsessing over loss values. As long as error rate improves, longer training is beneficial, even if validation loss increases.
Summary:
- Error rate (not loss) determines overfitting.
- Modern models generalize well unless stripped of regularization tools.
- Continue training until error rate plateaus or starts increasing.
Testing the Trained Model on Unseen Images¶
Our model is now fully trained, and it's time to evaluate its performance on unseen images of racquets.
To do this, I downloaded a handful of racquet images from the internet and placed them in a separate folder named test_images, which resides at the same level as this Jupyter notebook file.
To make evaluation easier, I manually renamed each file by prefixing the filename with the first letter of the correct category:
e.g.:
- t_1.jpg → Tennis racquet
- b_3.png → Badminton racquet
- s_4.jpeg → Squash racquet
- p_2.jpg → Pickleball racquet
This convention allows us to automatically extract the correct label for each image and compare it with the model’s prediction.
# Set the test images path
test_images_folder = Path("test_images")

prefix_to_label = {
    'b': 'badminton',
    't': 'tennis',
    's': 'squash',
    'p': 'pickleball'
}
With the setup in place, we are now ready to test our model against the unseen racquet images.
We iterate over each image in the test_images folder and use our trained model to make predictions. At the same time, we infer the correct label from the filename prefix (b, t, s, or p) using a simple dictionary mapping.
total = 0
correct = 0

for img_file in test_images_folder.ls():
    if img_file.suffix.lower() in ['.jpg', '.png', '.jpeg']:
        first_letter = img_file.name[0].lower()  # p/b/t/s
        if first_letter not in prefix_to_label:
            print(f" -- Skipping :: Unknown prefix '{first_letter}' in filename '{img_file.name}'.")
            continue
        correct_label = prefix_to_label.get(first_letter)

        pred_label, pred_idx, probs = learn.predict(img_file)

        # Get the probabilities for each category
        prob_str = ""
        for i in range(len(probs)):
            prob_percentage = round(probs[i].item() * 100, 2)
            prob_category = learn.dls.vocab[i]
            prob_str = f"{prob_str} , {prob_category[0]}: {prob_percentage}"
        prob_str = prob_str[3:]

        result = "xxx Incorrect xxx"
        if pred_label == correct_label:
            result = "Correct"
            correct += 1
        total += 1
        print(f" * {img_file.name} | Actual: {correct_label} | Predicted: {pred_label} | Probabilities: [{prob_str}] | {result}")
    else:
        print(f"\n -- Skipping :: Unsupported file ext: {img_file.name}")

print(f"\n\nTotal images: {total}. Correct: {correct}.")
accuracy = correct / total * 100
print(f"Accuracy: {accuracy:.2f}%")
-- Skipping :: Unknown prefix 'r' in filename 'r_3.png'.
* p_2.jpeg | Actual: pickleball | Predicted: pickleball | Probabilities: [b: 0.0 , p: 99.91 , s: 0.01 , t: 0.08] | Correct
-- Skipping :: Unsupported file ext: p_football.webp
* b_broken_1.jpg | Actual: badminton | Predicted: badminton | Probabilities: [b: 80.58 , p: 0.04 , s: 1.09 , t: 18.3] | Correct
* t_2.jpeg | Actual: tennis | Predicted: tennis | Probabilities: [b: 0.36 , p: 7.9 , s: 0.12 , t: 91.63] | Correct
* p_1.png | Actual: pickleball | Predicted: pickleball | Probabilities: [b: 0.0 , p: 100.0 , s: 0.0 , t: 0.0] | Correct
* t_1.png | Actual: tennis | Predicted: squash | Probabilities: [b: 2.24 , p: 0.0 , s: 58.06 , t: 39.7] | xxx Incorrect xxx
* s_2.png | Actual: squash | Predicted: squash | Probabilities: [b: 33.58 , p: 0.0 , s: 63.36 , t: 3.06] | Correct
* b_1.png | Actual: badminton | Predicted: badminton | Probabilities: [b: 99.75 , p: 0.01 , s: 0.03 , t: 0.2] | Correct
* p_22.png | Actual: pickleball | Predicted: pickleball | Probabilities: [b: 0.0 , p: 99.81 , s: 0.01 , t: 0.18] | Correct
* b_broken_ch_2.png | Actual: badminton | Predicted: badminton | Probabilities: [b: 88.69 , p: 0.2 , s: 7.98 , t: 3.14] | Correct
* t_3.jpeg | Actual: tennis | Predicted: badminton | Probabilities: [b: 73.21 , p: 0.03 , s: 0.89 , t: 25.87] | xxx Incorrect xxx
* s_1.png | Actual: squash | Predicted: squash | Probabilities: [b: 0.01 , p: 0.0 , s: 99.96 , t: 0.03] | Correct

Total images: 11. Correct: 9.
Accuracy: 81.82%
Result Evaluation¶
In the above code I tested the performance of our model on a set of 13 images.
We skipped 2 files:
- Unknown prefix (r_3.png)
- Unsupported file format (p_football.webp)
The model made predictions on 11 valid images. For each one, we displayed:
- Actual category (inferred from filename)
- Predicted category
- Prediction confidence scores for each category
- Whether the prediction was correct or incorrect
Here’s a snapshot of how our model performed:
- Correct predictions: 9
- Incorrect predictions: 2
- Accuracy: 81.82%
The 2 incorrect predictions were:
- t_1.png (actual: tennis) → predicted as squash.
- t_3.jpeg (actual: tennis) → predicted as badminton.
Conclusion¶
Despite a small training dataset (176-272 images per category), our model showed promising generalization. It classified images with pretty decent accuracy and minimal tuning. With an accuracy of 81.82%, the model is doing well, but there's scope for improvement. Here’s how to push it further:
- Add More and Better Training Data¶
- Increase dataset size
- Add variety: different angles, lighting, backgrounds etc. This helps the model generalize better to real-world cases.
- Use Data Augmentation¶
- Apply transforms like rotation, zoom, lighting changes and flipping.
- Simulates real-world distortions and improves robustness.
- Fastai provides several APIs to make this easy.
- Fine-Tune the Whole Model¶
The model can be further improved using advanced techniques like unfreezing the backbone, discriminative learning rates and progressive resizing. These approaches allow deeper layers to adapt to task-specific patterns more effectively. However, these are broader topics and go beyond the scope of this project. We may explore them more deeply in future projects.
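For reference, here is a rough sketch of what unfreezing with discriminative learning rates could look like in fastai. The learning-rate values are illustrative only and were not used in this project:

# Sketch only: deeper fine-tuning with discriminative learning rates
learn.unfreeze()                                   # make the ResNet18 backbone trainable
learn.lr_find()                                    # optionally inspect the loss vs. learning-rate curve
learn.fit_one_cycle(3, lr_max=slice(1e-6, 1e-4))   # smaller LR for early layers, larger for the head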
This wraps up the training and evaluation phase. Our model is now ready to be integrated into real-world applications or further optimized for production use.
Last but not least, let's see how we can save our trained model.
Saving the Model¶
Saving the model allows us to:
- Preserve all the learned weights and architecture.
- Avoid retraining from scratch every time.
- Quickly reload the model for inference or further fine-tuning.
In fastai, it’s straightforward:
learn.export('racquet_classifier.pkl')
This creates a file named racquet_classifier.pkl, which contains everything needed to make predictions later — including the model architecture, trained weights and class mappings.
By default, the model will be saved in the same directory as your notebook. You can also specify a different path if needed:
learn.export('/path/to/folder/racquet_classifier.pkl')
# Save the trained model
learn.export('racquet_classifier.pkl')
To load the model later:
learn_reborn = load_learner('racquet_classifier.pkl')
Now learn_reborn is ready to classify racquets — no retraining required!
# Let's try to predict the category of a badminton image using the reborn model
pred_label, pred_idx, probs = learn_reborn.predict('./test_images/b_1.png')
pred_label  # it successfully predicts the category as badminton
'badminton'
Thank You!¶
I hope this small project helped you understand not just how to train a deep learning model, but also how to interpret its performance and put it to practical use. If you found even a small part of it useful, my effort has been worthwhile.
Till next time, happy learning!
You can find the source code of this blog post (which is actually a Jupyter notebook) here.