Accelerating Real-Time Object Detection with YOLOv8m and Intel’s Optimization Tools

Rishiraj Acharya
Jun 9, 2023

Efficient, accurate object detection plays a crucial role in domains such as autonomous driving, surveillance systems, and smart cities. To address this need, I have developed a prototype for custom object detection built on the YOLOv8m model and optimized with Intel Distribution for Python, Intel Optimization for TensorFlow, and Intel Optimization for PyTorch from the Intel AI Analytics Toolkit (AI Kit). Post-training quantization is also applied so that the model runs efficiently on Coral hardware. This blog post gives a technical overview of the use case, the implementation details, and the future plans for this prototype.

Use Case:

The custom object detection prototype is designed for real-world scenarios that span a wide range of weather, lighting, and road conditions. It is trained to detect pedestrians, vehicles, traffic signs, and traffic signals, which makes it a good fit for applications such as autonomous vehicles, traffic management systems, and surveillance cameras.
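To make the detection targets above concrete, here is a minimal sketch of running YOLOv8m and keeping only road-related objects. The post does not show its code, so everything here is an assumption: the use of the `ultralytics` package, the pretrained `yolov8m.pt` checkpoint, the helper names `detect` and `keep_road_objects`, and the COCO class names that stand in for the prototype's custom labels.

```python
# Assumed label set: COCO class names that roughly match the post's targets
# (pedestrians, vehicles, traffic signs, traffic signals). The prototype's
# actual custom labels are not given in the post.
ROAD_CLASSES = {
    "person", "bicycle", "car", "motorcycle", "bus", "truck",
    "traffic light", "stop sign",
}


def keep_road_objects(detections, wanted=ROAD_CLASSES, min_conf=0.25):
    """Filter (name, confidence, box) tuples down to the classes of interest."""
    return [d for d in detections if d[0] in wanted and d[1] >= min_conf]


def detect(source, weights="yolov8m.pt"):
    """Run YOLOv8m on an image path and return (name, confidence, xyxy) tuples."""
    # Imported lazily so the filtering helper works without the dependency;
    # requires `pip install ultralytics`.
    from ultralytics import YOLO

    model = YOLO(weights)  # fetches the pretrained checkpoint on first use
    detections = []
    for result in model.predict(source):
        for box in result.boxes:
            name = result.names[int(box.cls)]  # class id -> class name
            detections.append((name, float(box.conf), box.xyxy[0].tolist()))
    return detections
```

A call such as `keep_road_objects(detect("street.jpg"))` would then return only the road-related detections, where `street.jpg` is a hypothetical input image.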

Implementation Details:

1. YOLOv8m Model: The YOLOv8m model is the core of the custom object detection pipeline. It was chosen for its balance of accuracy and speed; it is the medium-sized variant of the YOLOv8 family. YOLOv8m is trained on a diverse dataset, enabling it to handle various real-world scenarios effectively.

2. Intel Distribution for Python: Intel Distribution for Python provides optimized versions of popular Python packages for scientific computing and machine learning. By using this distribution, the prototype runs faster thanks to optimizations designed specifically for Intel architectures.

3. Intel Optimization for TensorFlow and PyTorch: Intel Optimization for TensorFlow and Intel Optimization for PyTorch are builds of these deep learning frameworks tuned for Intel processors. By leveraging them, the prototype gains further acceleration during both training and inference.

4. Post-Training Quantization: To maximize resource utilization on Coral hardware, the prototype employs post-training quantization. This process converts the model's 32-bit floating-point parameters into 8-bit fixed-point (integer) representations. The resulting TensorFlow Lite (.tflite) model is compatible with the Edge TPU and is smaller and faster to execute than the floating-point original.
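The float-to-8-bit conversion described in item 4 can be illustrated with a small sketch. The first part shows the affine (scale and zero-point) arithmetic that maps floats to int8 values. The second part shows how full-integer post-training quantization is typically configured with the TensorFlow Lite converter. The function names, `saved_model_dir`, and `representative_data` are hypothetical, and the post does not say which converter settings were actually used, so treat this as one plausible setup rather than the author's pipeline.

```python
def quant_params(x_min, x_max, q_min=-128, q_max=127):
    """Return (scale, zero_point) for an affine int8 mapping of [x_min, x_max]."""
    # The range must include 0.0 so that zero maps to an exact integer.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (q_max - q_min) or 1.0  # avoid a zero scale
    zero_point = int(round(q_min - x_min / scale))
    return scale, zero_point


def quantize(x, scale, zero_point, q_min=-128, q_max=127):
    """Map a float to the nearest representable int8 value, with clamping."""
    q = int(round(x / scale)) + zero_point
    return max(q_min, min(q_max, q))


def dequantize(q, scale, zero_point):
    """Recover the approximate float value from its int8 code."""
    return scale * (q - zero_point)


def convert_to_int8_tflite(saved_model_dir, representative_data):
    """Full-integer post-training quantization with the TensorFlow Lite converter.

    `representative_data` is a callable that yields lists of sample input
    arrays; the converter uses them to calibrate activation ranges.
    """
    import tensorflow as tf  # lazy import: the arithmetic above runs without TF

    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_data
    # Restrict the model to int8 ops, which the Edge TPU requires.
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.uint8
    converter.inference_output_type = tf.uint8
    return converter.convert()  # bytes of the quantized .tflite flatbuffer
```

Each quantized value differs from the original float by at most about one `scale` step, which is the price paid for the 4x reduction in parameter size. A quantized `.tflite` file still has to be compiled with the Edge TPU Compiler (`edgetpu_compiler`) before it can run on the Edge TPU.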

Future Plans:

1. Model Refinement: The prototype can be further improved by fine-tuning the YOLOv8m model on specific datasets related to the targeted use case. This refinement process will enhance the model’s accuracy and enable it to handle specialized scenarios more effectively.

2. Hardware Optimization: In addition to Coral hardware, the prototype can be optimized for other Edge AI platforms and accelerators. This will extend its compatibility and enable deployment on a broader range of devices, catering to diverse application requirements.

3. Real-Time Deployment: The current implementation focuses on inference performance. Future plans involve developing a real-time deployment pipeline that integrates with the underlying hardware and provides continuous object detection.
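As a rough illustration of the model refinement step in item 1, fine-tuning YOLOv8m on a custom dataset might look like the sketch below. It assumes the `ultralytics` package and its dataset YAML layout; the helper names, directory layout, class ordering, and the `epochs` and `imgsz` values are all hypothetical choices, not settings reported in this post.

```python
# Class names taken from the use case described in the post; the ordering and
# exact spelling are assumptions.
CLASS_NAMES = ["pedestrian", "vehicle", "traffic sign", "traffic signal"]


def dataset_yaml(root, names=CLASS_NAMES):
    """Build an Ultralytics-style dataset description as YAML text."""
    lines = [
        f"path: {root}",        # dataset root directory
        "train: images/train",  # training images, relative to `path`
        "val: images/val",      # validation images, relative to `path`
        "names:",
    ]
    lines += [f"  {i}: {name}" for i, name in enumerate(names)]
    return "\n".join(lines) + "\n"


def fine_tune(data_yaml, weights="yolov8m.pt", epochs=50, imgsz=640):
    """Fine-tune pretrained YOLOv8m weights on a custom dataset."""
    # Imported lazily so the YAML helper works without the dependency;
    # requires `pip install ultralytics`.
    from ultralytics import YOLO

    model = YOLO(weights)  # start from the pretrained checkpoint
    return model.train(data=str(data_yaml), epochs=epochs, imgsz=imgsz)
```

Writing the output of `dataset_yaml("datasets/road")` to a file such as `road.yaml` and passing that path to `fine_tune` would start training, provided the images and labels exist under the hypothetical `datasets/road` directory.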

Conclusion:

The custom object detection prototype presented in this blog post combines the YOLOv8m model with Intel's optimization tools and post-training quantization to deliver fast, accurate object detection in real-world scenarios. Intel Distribution for Python, Intel Optimization for TensorFlow, and Intel Optimization for PyTorch accelerate the prototype on Intel architectures, while post-training quantization makes the model compatible with the Edge TPU and makes efficient use of Coral hardware. With future plans focusing on model refinement, hardware optimization, and real-time deployment, this prototype is a step toward efficient and effective object detection in various domains.

Rishiraj Acharya

GDE in ML (Gen AI, Keras) | GSoC '22 at TensorFlow | TFUG Kolkata Organizer | Hugging Face Fellow | Kaggle Master | MLE at Tensorlake, Past - Dynopii, Celebal