My research focuses on deep learning algorithms and their integration into robotic systems to make robots more intelligent, physically consistent, and capable of seamless interaction with their human counterparts. As a Senior Software Engineer specializing in Machine Learning and Robotics at Locus Robotics, I actively contribute to the advancement of autonomous mobile robots, enabling them to perceive their surroundings and make intelligent decisions.
I recently completed my Master of Science degree in Computer Science from the University of British Columbia, under the supervision of Ian M. Mitchell. Before enrolling at UBC, I was a research fellow at TCS Research and Innovation Labs, where I contributed to the automation of warehouse robotics under the guidance of Swagat Kumar and Rajesh Sinha. Additionally, I hold a Bachelor’s degree in Computer Science from IIIT Delhi, where I worked under the supervision of Rahul Purandare and closely collaborated with P.B. Sujit.
My experience and expertise lie in pushing the boundaries of what is possible with machine learning in the field of robotics, continuously striving to create systems that are not only autonomous but also capable of sophisticated interaction and collaboration with humans.
MSc in Computer Science, 2022
University of British Columbia, Vancouver
BTech in Computer Science Engineering, 2017
Indraprastha Institute of Information Technology, Delhi
Object Detection Developed an advanced object detection system, Locus Learning, that uses transfer learning with YOLOX to detect LocusBots, persons, and carts in real time within indoor warehouse environments. Optimized model inference by porting it from Python to C++, achieving a 15% reduction in inference time and a 35% decrease in CPU load. Single-handedly integrated the object detector into the existing Locus framework, converted PyTorch weights into ONNX format for faster Intel iGPU inference, and introduced a lightweight inference visualizer for inspecting detection performance. Currently integrating ByteTrack, a state-of-the-art Kalman-filter-based multi-object tracker, with the YOLOX detector; this tracker will be used to track and avoid forklifts in warehouses.
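As a rough illustration of the export-and-deploy step described above, the sketch below converts a PyTorch model to ONNX and runs it through ONNX Runtime's OpenVINO execution provider, which targets Intel iGPUs. The stand-in model, file name, and input resolution are placeholders, not the actual Locus pipeline.

```python
import torch
import torch.nn as nn
import onnxruntime as ort

# Stand-in for a trained detector; the real pipeline would load YOLOX weights.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 6, 1))
model.eval()

dummy = torch.randn(1, 3, 640, 640)  # NCHW input; resolution is illustrative
torch.onnx.export(
    model, dummy, "detector.onnx",
    input_names=["images"], output_names=["predictions"],
    opset_version=11,
)

# The OpenVINO execution provider targets Intel iGPUs; ONNX Runtime falls
# back to CPU when OpenVINO is not available.
session = ort.InferenceSession(
    "detector.onnx",
    providers=["OpenVINOExecutionProvider", "CPUExecutionProvider"],
)
(preds,) = session.run(None, {"images": dummy.numpy()})
print(preds.shape)
```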
Fiducial Marker Detection Upgraded the fiducial marker detection system to AprilTag3, resulting in a 22% increase in frame processing speed and a 28% improvement in recall. Replaced image undistortion with Region of Interest (RoI) rectification for tag detectors, reducing NUC load by approximately 5%. Additionally, integrated Locus’s fiducial markers with the state-of-the-art deep-learning tag detector, DeepTag, to enhance overall detection accuracy and efficiency.
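One way to sidestep full-frame undistortion, in the spirit of the RoI rectification mentioned above, is to detect tags on the raw image and rectify only the detected corner points. The sketch below uses OpenCV's aruco module with an AprilTag dictionary; the intrinsics and frame are placeholders, and the production pipeline may rectify regions rather than points.

```python
import cv2
import numpy as np

# Placeholder intrinsics/distortion from calibration.
K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
dist = np.array([-0.3, 0.1, 0.0, 0.0, 0.0])

# AprilTag detection through OpenCV's aruco module (OpenCV >= 4.7 API).
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_APRILTAG_36h11)
detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())

frame = np.full((480, 640), 255, np.uint8)       # stand-in for a raw camera frame
corners, ids, _ = detector.detectMarkers(frame)  # detect on the *distorted* image

# Rectify only the detected corner points instead of undistorting every pixel.
for tag in corners:
    pts = cv2.undistortPoints(tag.reshape(-1, 1, 2), K, dist, P=K)
    print(pts.reshape(-1, 2))  # corners as if the image had been undistorted
```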
Camera Calibration Replaced individual camera calibrations with a standard calibration matrix for all cameras mounted on the robot, ensuring calibration errors remained within 1% of use-case-specific tolerance limits. This approach streamlined the calibration process, reducing robot deployment time by 6% and eliminating the need for per-camera calibrations for each robot.
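A minimal sketch of the kind of check this implies: pose-fit a calibration target with the shared intrinsics and confirm the reprojection error stays inside the tolerance. The matrices, board, and threshold below are illustrative assumptions, not the Locus procedure.

```python
import cv2
import numpy as np

# Shared "factory" intrinsics assumed for every camera (values illustrative).
K_std = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
dist_std = np.zeros(5)
TOLERANCE_PX = 2.0  # stand-in for the use-case-specific tolerance limit

def passes_with_standard_intrinsics(obj_pts, img_pts):
    """Pose-fit a board with K_std and report mean reprojection error (px)."""
    _, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K_std, dist_std)
    proj, _ = cv2.projectPoints(obj_pts, rvec, tvec, K_std, dist_std)
    err = np.linalg.norm(proj.reshape(-1, 2) - img_pts.reshape(-1, 2), axis=1).mean()
    return err <= TOLERANCE_PX, err

# Synthetic test: a 6x4 board seen by a camera that deviates slightly from K_std.
obj = np.array([[x, y, 0.0] for y in range(4) for x in range(6)], np.float32) * 0.05
K_true = K_std.copy()
K_true[0, 0], K_true[1, 1] = 603.0, 597.0
rvec, tvec = np.array([0.1, -0.2, 0.05]), np.array([0.0, 0.0, 0.5])
img, _ = cv2.projectPoints(obj, rvec, tvec, K_true, dist_std)
print(passes_with_standard_intrinsics(obj, img))
```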
We propose to augment smart wheelchair perception with the capability to identify potential docking locations in indoor scenes. ApproachFinder-CV is a computer vision pipeline that detects safe docking poses and estimates their desirability weight based on hand-selected geometric relationships and visibility. Although robust, this pipeline is computationally intensive. We leverage this vision pipeline to generate ground truth labels used to train an end-to-end differentiable neural network that is 15 times faster.
ApproachFinder-NN is a point-based method that draws inspiration from Hough voting and uses deep point cloud features to vote for potential docking locations. Both approaches rely solely on geometric information, making them invariant to image distortions. A large-scale indoor object detection dataset, SUN RGB-D, is used to design, train, and evaluate the two pipelines.
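For intuition, the sketch below shows a VoteNet-style voting module: an MLP regresses a per-seed offset so that votes cluster near target locations. The layer sizes and shapes are illustrative, not necessarily ApproachFinder-NN's.

```python
import torch
import torch.nn as nn

class VotingModule(nn.Module):
    """VoteNet-style voting: each seed point regresses an offset toward a
    candidate docking location. Sizes are illustrative, not the paper's."""

    def __init__(self, feat_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, 3 + feat_dim),  # xyz offset + feature residual
        )

    def forward(self, seed_xyz, seed_feat):
        # seed_xyz: (B, N, 3) coordinates; seed_feat: (B, N, C) point features
        out = self.mlp(seed_feat)
        vote_xyz = seed_xyz + out[..., :3]    # shift seeds toward docking targets
        vote_feat = seed_feat + out[..., 3:]  # residual feature update
        return vote_xyz, vote_feat

# Toy usage: 1024 random seeds per cloud; votes are later clustered into proposals.
xyz, feat = torch.randn(2, 1024, 3), torch.randn(2, 1024, 256)
votes, vote_feat = VotingModule()(xyz, feat)
print(votes.shape)  # (2, 1024, 3)
```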
Potential docking locations are encoded as a 3D temporal desirability cost map that can be integrated into any real-time path planner. As a proof of concept, we use a model predictive controller that consumes this 3D costmap through task-driven cost functions to incorporate human intent in a shared-control fashion. The controller outputs a nominal path that is safe, goal-oriented, and jerk-free.
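As a hedged sketch of how such a costmap could enter a sampling-based controller, the snippet below scores one MPPI rollout with desirability, goal-distance, and jerk terms. The grid indexing, weights, and state layout are assumptions for illustration only.

```python
import numpy as np

# Illustrative rollout scoring against a temporal desirability costmap.
H, W, T = 64, 64, 20                    # grid size and planning horizon
desirability = np.random.rand(T, H, W)  # 1.0 = highly desirable docking pose

def rollout_cost(states, goal, w_desire=1.0, w_goal=0.5, w_jerk=0.1):
    """states: (T, 3) rollout of (row_idx, col_idx, heading) on the grid."""
    t = np.arange(len(states))
    desire = desirability[t, states[:, 0].astype(int), states[:, 1].astype(int)]
    cost_desire = (1.0 - desire).sum()                 # prefer desirable poses
    cost_goal = np.linalg.norm(states[-1, :2] - goal)  # terminal goal distance
    jerk = np.diff(states[:, :2], n=3, axis=0)         # third finite difference
    return w_desire * cost_desire + w_goal * cost_goal + w_jerk * np.abs(jerk).sum()

# MPPI weights each sampled rollout by exp(-cost / temperature) and blends controls.
sample = np.random.rand(T, 3) * [H - 1, W - 1, np.pi]
print(rollout_cost(sample, goal=np.array([32.0, 32.0])))
```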
Designed and graded homework assignments, quizzes, and examinations for the following courses:
I participated in several research projects focused on warehouse automation using industrial manipulators. My work included 3D pose estimation of heterogeneous-sized boxes using point clouds and motion planning for Universal Robots with ROS.
Here are some selected projects I worked on:
For a detailed description of these projects, please refer to my Curriculum Vitae.
Multiple tutorials covering how to implement vision-focused deep learning architectures in PyTorch with torchvision.
Developed an end-to-end docking location detection network based on a synergy of deep point set networks and Hough voting.
Developed a real-time computer vision pipeline that finds potential docking locations in indoor environments for wheelchairs using point cloud data.
Real-time wheelchair navigation with shared control using a model predictive path integral (MPPI) controller.
Indoor object detection using VoteNet on point clouds captured from RGB-D cameras in a ROS simulation.
Image-based visual servoing in an eye-in-hand configuration for a Universal Robots UR5 using a Microsoft Kinect v2 camera.
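For context, the classic image-based visual servoing law (Chaumette and Hutchinson) computes a camera velocity v = -λ L⁺ e from the stacked point-feature interaction matrix L and the feature error e. The sketch below implements that textbook law with placeholder features, depths, and gain; it is not the project's exact controller.

```python
import numpy as np

# Textbook IBVS law: v = -lambda * pinv(L) @ e for normalized point features.
def interaction_matrix(x, y, Z):
    """2x6 interaction matrix of one normalized image point at depth Z."""
    return np.array([
        [-1 / Z, 0.0, x / Z, x * y, -(1 + x ** 2), y],
        [0.0, -1 / Z, y / Z, 1 + y ** 2, -x * y, -x],
    ])

# Current features (x, y) with depths Z from the Kinect, and desired features.
feats = [(0.10, 0.20, 1.5), (-0.20, 0.10, 1.4), (0.05, -0.15, 1.6)]
desired = [(0.00, 0.00), (0.00, 0.30), (0.10, -0.10)]

L = np.vstack([interaction_matrix(x, y, Z) for x, y, Z in feats])
e = np.concatenate([[x - xd, y - yd] for (x, y, _), (xd, yd) in zip(feats, desired)])
v = -0.5 * np.linalg.pinv(L) @ e  # 6-DoF camera twist sent to the UR5
print(v)
```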
Developed a predictive model that plays chess like humans, with a special focus on modelling amateur play.
Summarised 10 state-of-the-art approaches to DNN verification and developed a framework to test networks (e.g., ACAS Xu) on safety cases using SMT solvers.
Developed a CNN capable of obtaining a temporally consistent, full 3D skeletal human pose from a single RGB camera.
Converted Sudoku into a Boolean satisfiability problem and solved it with a SAT solver.
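A minimal sketch of the standard encoding, here using Z3's Python API rather than a bare DIMACS solver: one Boolean variable per (row, column, digit), with exactly-one constraints over cells, rows, columns, and boxes. The single clue added is an arbitrary example.

```python
from itertools import combinations
from z3 import Bool, Solver, Or, Not, sat, is_true

# X[r][c][d] is true iff cell (r, c) holds digit d + 1.
X = [[[Bool(f"x_{r}_{c}_{d}") for d in range(9)] for c in range(9)] for r in range(9)]
s = Solver()

def exactly_one(lits):
    s.add(Or(lits))                         # at least one literal is true
    for a, b in combinations(lits, 2):      # no two literals are true together
        s.add(Or(Not(a), Not(b)))

for r in range(9):
    for c in range(9):
        exactly_one([X[r][c][d] for d in range(9)])      # one digit per cell
for d in range(9):
    for i in range(9):
        exactly_one([X[i][c][d] for c in range(9)])      # once per row
        exactly_one([X[r][i][d] for r in range(9)])      # once per column
    for br in range(0, 9, 3):
        for bc in range(0, 9, 3):                        # once per 3x3 box
            exactly_one([X[br + i][bc + j][d] for i in range(3) for j in range(3)])

s.add(X[0][0][4])  # example clue: cell (0, 0) holds digit 5
if s.check() == sat:
    m = s.model()
    grid = [[next(d + 1 for d in range(9) if is_true(m[X[r][c][d]]))
             for c in range(9)] for r in range(9)]
    print(grid)
```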
Studied and summarised major approaches to perform text detection and recognition using deep learning techniques.
Developed BugFlood, an optimal path-planning algorithm for obstacle-rich environments. Unlike its Bug-family predecessors, BugFlood uses a split-and-kill approach to advance through the environment. Its performance was compared against planners from the Open Motion Planning Library (OMPL) and against visibility-graph methods.
Developed a Clang-based static analysis tool for ROS that reduces network latency and message drop rate by optimizing message sizes.
A Clang-based tool to find different types of statements in C/C++ code, used to generate metadata for a ROS package.
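As a rough illustration, the sketch below uses libclang's Python bindings to tally AST cursor kinds in a single C++ file, the sort of per-statement metadata such a tool could emit. The real tool was built on Clang's C++ tooling APIs, and the file path here is a placeholder.

```python
from collections import Counter
import clang.cindex

def count_cursor_kinds(path):
    """Tally AST cursor kinds (statements, expressions, ...) in one file."""
    index = clang.cindex.Index.create()             # requires libclang installed
    tu = index.parse(path, args=["-std=c++14"])
    counts = Counter()
    for node in tu.cursor.walk_preorder():
        if node.location.file and node.location.file.name == path:
            counts[str(node.kind)] += 1             # e.g. CursorKind.CALL_EXPR
    return counts

# Placeholder path; point this at a ROS node source file.
print(count_cursor_kinds("talker.cpp").most_common(10))
```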
Papers, Workshops and Patents