Exploring Object Detection Models

Rishiraj Acharya
3 min read · Jan 6

Object detection is a core task in computer vision: identifying which objects appear in an image or video and locating each one, usually with a bounding box. Many object detection models have been developed over the years, each with its own strengths and trade-offs. I’ll use my experience of participating in the TensorFlow - Help Protect the Great Barrier Reef competition on Kaggle to look into some of them.


One of the most popular object detection models is YOLO (You Only Look Once). YOLO is a real-time object detection model that identifies objects in an image or video quickly and accurately. It uses a single convolutional neural network (CNN) to predict the bounding boxes and class probabilities of all objects directly from the full image.

The key to YOLO’s efficiency is its single-shot detection approach: it processes the entire image in one forward pass and predicts all objects simultaneously, with no separate region-proposal step. This lets it process images and video at high frame rates, making it a popular choice for real-time applications such as autonomous vehicles and security systems.
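To make the single-pass idea concrete, here is a minimal Python sketch of how a grid-shaped network output could be decoded into boxes. It assumes a simplified YOLOv1-style layout in which each grid cell predicts one box as [x, y, w, h, objectness, class probabilities...]. The function name `decode_grid`, the `score_threshold` parameter, and the exact layout are assumptions made for illustration; later YOLO versions use anchors and a different encoding.

```python
from typing import List, Tuple

# One detection: (x_min, y_min, x_max, y_max, score, class_id), in pixels.
Detection = Tuple[float, float, float, float, float, int]


def decode_grid(
    grid: List[List[List[float]]],
    image_size: int,
    score_threshold: float = 0.5,
) -> List[Detection]:
    """Turn an S x S grid of per-cell predictions into pixel-space boxes.

    Each cell holds [x, y, w, h, objectness, p_class0, p_class1, ...], where
    x, y are offsets inside the cell (0..1) and w, h are fractions of the
    whole image. This layout is an assumption made for the sketch.
    """
    s = len(grid)
    cell = image_size / s  # side length of one grid cell in pixels
    detections: List[Detection] = []
    for row in range(s):
        for col in range(s):
            x, y, w, h, objectness, *class_probs = grid[row][col]
            # Pick the most likely class and combine it with objectness.
            class_id = max(range(len(class_probs)), key=class_probs.__getitem__)
            score = objectness * class_probs[class_id]
            if score < score_threshold:
                continue
            # Box centre and size in pixels.
            cx, cy = (col + x) * cell, (row + y) * cell
            bw, bh = w * image_size, h * image_size
            detections.append(
                (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2, score, class_id)
            )
    return detections
```

Every cell is decoded independently from the output of the same forward pass, which is why no second stage is needed to obtain all boxes at once.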

YOLO is also accurate on complex images that contain multiple objects at different scales. However, it can produce false positives and may struggle with small or low-contrast objects.
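A common way to cut down spurious and duplicate boxes from a detector is to drop low-confidence predictions and then apply non-maximum suppression (NMS). The sketch below is a plain-Python, class-agnostic version. The function names and the threshold values are illustrative assumptions rather than tuned settings.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(
    boxes: List[Box],
    scores: List[float],
    score_threshold: float = 0.3,  # illustrative value, not tuned
    iou_threshold: float = 0.5,    # illustrative value, not tuned
) -> List[int]:
    """Return indices of boxes kept after score thresholding and greedy NMS."""
    # Drop low-confidence boxes, then visit the rest from highest score down.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    kept: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap an already-kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept
```

Raising `score_threshold` trades recall for fewer false positives, so for small or low-contrast objects the threshold usually needs to be chosen on a validation set rather than fixed in advance.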


Another popular object detection model is Faster R-CNN, where R-CNN stands for Region-based Convolutional Neural Network. Like YOLO, Faster R-CNN identifies objects in an image or video with high accuracy. Unlike YOLO, it uses a two-stage approach to object detection: the first stage generates a set of region proposals, and the second stage classifies the objects within those proposals.

The first stage of Faster R-CNN uses a CNN-based Region Proposal Network (RPN) to generate region proposals, which are then passed to the second stage for classification and bounding-box refinement. This two-stage approach allows for more precise object detection, because the second stage concentrates on a small set of candidate regions rather than predicting every box in a single pass over the image.
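In Faster R-CNN, the first stage (the Region Proposal Network) scores a fixed set of reference boxes called anchors, placed at every location of the feature map, and the best-scoring ones become the region proposals. The sketch below only generates those anchors and does not score them. The defaults of three scales (128, 256, 512), three aspect ratios, and a stride of 16 follow the settings described in the original Faster R-CNN paper, while the function name `generate_anchors` and the cell-centre convention are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def generate_anchors(
    feature_h: int,
    feature_w: int,
    stride: int = 16,
    scales: Sequence[float] = (128, 256, 512),
    ratios: Sequence[float] = (0.5, 1.0, 2.0),
) -> List[Box]:
    """Place len(scales) * len(ratios) anchors at every feature-map cell.

    `scale` is the square root of the anchor area and `ratio` is height / width.
    """
    anchors: List[Box] = []
    for fy in range(feature_h):
        for fx in range(feature_w):
            # Centre of this feature-map cell in input-image pixels
            # (the +0.5 centre convention is an assumption of this sketch).
            cx, cy = (fx + 0.5) * stride, (fy + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    # Keep the area at scale**2 while changing the shape.
                    w = scale / ratio ** 0.5
                    h = scale * ratio ** 0.5
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors
```

With the defaults, a feature map of 2 x 3 cells yields 2 * 3 * 9 = 54 anchors. Scoring and refining this many candidates before the second stage runs is part of why a two-stage detector costs more per image than a single-pass one.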

Faster R-CNN is known for its high accuracy and has been used in a variety of applications, including object tracking, image captioning, and object recognition. However, it is slower than YOLO because of its two-stage approach.


Rishiraj Acharya

GSoC '22 at TensorFlow 👨🏻‍🔬 | TFUG Kolkata Organizer 🎙️ | Microsoft, DeepLearning.AI Ambassador ✨ | Kaggle Master, BIPOC Mentor 🧠 | Dynopii MLE 👨🏻‍💻