My research focuses on deep learning algorithms and their integration into robotic systems to make robots more intelligent, physically consistent, and capable of seamless interaction with their human counterparts. As a Senior Software Engineer specializing in Machine Learning and Robotics at Locus Robotics, I contribute to the advancement of autonomous mobile robots, enabling them to perceive their surroundings and make intelligent decisions.
I recently completed my Master of Science in Computer Science at the University of British Columbia, under the supervision of Ian M. Mitchell. Before enrolling at UBC, I was a research fellow at TCS Research and Innovation Labs, where I contributed to the automation of warehouse robotics under the guidance of Swagat Kumar and Rajesh Sinha. Additionally, I hold a Bachelor’s degree in Computer Science from IIIT Delhi, where I worked under the supervision of Rahul Purandare and closely collaborated with P.B. Sujit.
My experience and expertise lie in pushing the boundaries of what is possible with machine learning in the field of robotics, continuously striving to create systems that are not only autonomous but also capable of sophisticated interaction and collaboration with humans.
MSc in Computer Science, 2022
University of British Columbia, Vancouver
BTech in Computer Science Engineering, 2017
Indraprastha Institute of Information Technology, Delhi
Object Detection Developed an advanced object detection system, Locus Learning, that uses transfer learning to detect LocusBots, persons, carts, and other objects in real time within indoor warehouse environments.
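For illustration, here is a minimal sketch of the transfer-learning recipe, using a torchvision Faster R-CNN as a stand-in; the model choice and class names are assumptions, not the actual Locus Learning stack:

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Illustrative label set; the production classes are assumed, not confirmed.
CLASSES = ["background", "locusbot", "person", "cart"]

# Start from a detector pre-trained on COCO and swap in a new box-prediction head.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, len(CLASSES))

# Transfer learning: freeze the backbone and train only the new head
# on warehouse images.
for p in model.backbone.parameters():
    p.requires_grad = False
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=5e-3, momentum=0.9
)
```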
Cart Filter for Vectors Designed and implemented a per-sensor filtering system that excludes an attached cart from sensor data once a robot docks with it, significantly improving navigation performance and reducing mission time by 14% (Video). This prevents the robot from misinterpreting the attached cart as an external obstacle, allowing smoother trajectory planning and avoiding unnecessary detours.
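A simplified sketch of the per-sensor idea: once a cart is attached, laser returns that fall inside the cart's footprint in the robot frame are dropped before the scan reaches the costmap. The footprint values and frame conventions below are illustrative, not the deployed parameters:

```python
import numpy as np

# Illustrative cart footprint in the robot frame (metres):
# a rectangle trailing behind the robot once the cart is attached.
CART_X = (-1.2, -0.2)
CART_Y = (-0.4, 0.4)

def filter_cart_points(ranges, angles, cart_attached):
    """Drop laser returns that fall inside the attached cart's footprint."""
    if not cart_attached:
        return ranges
    x = ranges * np.cos(angles)
    y = ranges * np.sin(angles)
    inside = (CART_X[0] <= x) & (x <= CART_X[1]) & \
             (CART_Y[0] <= y) & (y <= CART_Y[1])
    filtered = ranges.copy()
    filtered[inside] = np.inf  # treated as "no return" downstream
    return filtered
```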
Forklift Detection via AprilTags Deployed a new AprilTag 3 tag family on forklifts to avoid robot-forklift collisions. Detected fiducial markers define dynamic ‘danger zones’ in the robot’s costmap, which are treated as obstacles during navigation. This approach significantly reduced collision incidents, lowering repair costs and boosting operational efficiency.
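A rough sketch of the detection-to-danger-zone step, using the pupil_apriltags bindings for AprilTag 3; the tag family, tag size, and zone radius are assumptions:

```python
import numpy as np
from pupil_apriltags import Detector  # AprilTag 3 Python bindings

# Illustrative parameters, not the production values.
detector = Detector(families="tag36h11")
DANGER_RADIUS_M = 1.5

def forklift_danger_zones(gray_image, camera_params, tag_size=0.2):
    """Detect forklift tags and return (x, y, radius) danger zones
    to be painted into the costmap as obstacles."""
    zones = []
    for det in detector.detect(
        gray_image, estimate_tag_pose=True,
        camera_params=camera_params, tag_size=tag_size,
    ):
        x, y, _ = det.pose_t.flatten()  # tag position in the camera frame
        zones.append((x, y, DANGER_RADIUS_M))
    return zones
```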
Fiducial Marker Detection Upgraded the fiducial marker detection system to AprilTag3, resulting in a 22% increase in frame processing speed and a 28% improvement in recall. Replaced image undistortion with Region of Interest (RoI) rectification for tag detectors, reducing NUC load by approximately 5%. Additionally, integrated Locus’s fiducial markers with the state-of-the-art deep-learning tag detector, DeepTag, to enhance overall detection accuracy and efficiency.
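The gist of the RoI idea, sketched with OpenCV: rather than undistorting every full frame, only the region around each detection is rectified. The stand-in below goes one step further and undistorts just the detected corner points, which illustrates the same cost saving; it is not the production implementation:

```python
import cv2
import numpy as np

def rectify_corners(corners, K, dist_coeffs):
    """Undistort only the detected tag corners instead of the full frame.

    corners: (N, 2) pixel coordinates from the tag detector.
    K, dist_coeffs: camera intrinsics and distortion from calibration.
    """
    pts = corners.reshape(-1, 1, 2).astype(np.float64)
    # P=K maps the normalized, undistorted points back to pixel coordinates.
    return cv2.undistortPoints(pts, K, dist_coeffs, P=K).reshape(-1, 2)
```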
Camera Calibration Replaced individual camera calibrations with a standard calibration matrix for all cameras mounted on the robot, ensuring calibration errors remained within 1% of use-case-specific tolerance limits. This approach streamlined the calibration process, reducing robot deployment time by 6% and eliminating the need for per-camera calibrations for each robot.
We propose to augment smart wheelchair perception with the capability to identify potential docking locations in indoor scenes. ApproachFinder-CV is a computer vision pipeline that detects safe docking poses and estimates their desirability weight based on hand-selected geometric relationships and visibility. Although robust, this pipeline is computationally intensive. We leverage this vision pipeline to generate ground truth labels used to train an end-to-end differentiable neural network that is 15 times faster.
ApproachFinder-NN is a point-based method that draws motivation from Hough voting and uses deep point cloud features to vote for potential docking locations. Both approaches rely solely on geometric information, making them invariant to image distortions. A large-scale indoor object detection dataset, SUN RGB-D, is used to design, train, and evaluate the two pipelines.
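To make the voting idea concrete, here is a minimal sketch of a Hough-voting head: each seed point regresses an offset to a nearby docking-location centre plus a desirability score. The layer sizes and names are illustrative, not the ApproachFinder-NN architecture:

```python
import torch
import torch.nn as nn

class VotingHead(nn.Module):
    """Minimal Hough-voting head over deep point cloud features."""

    def __init__(self, feat_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, 3 + 1),  # (dx, dy, dz) offset + score
        )

    def forward(self, seed_xyz, seed_feats):
        # seed_xyz: (B, N, 3), seed_feats: (B, N, feat_dim)
        out = self.mlp(seed_feats)
        votes = seed_xyz + out[..., :3]      # voted docking centres
        scores = torch.sigmoid(out[..., 3])  # desirability per vote
        return votes, scores
```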
Potential docking locations are encoded as a 3D temporal desirability cost map that can be integrated into any real-time path planner. As a proof of concept, we use a model predictive controller that consumes this 3D costmap through efficiently designed task-driven cost functions to incorporate human intent. The controller outputs a nominal path that is safe, goal-oriented, and jerk-free for wheelchair navigation.
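A sketch of how such a desirability map can fold into a per-step MPC cost; the `world_to_index` helper, the weights, and the cost terms are hypothetical, shown only to illustrate the interface:

```python
def stage_cost(state, costmap, t, w_desire=1.0, w_jerk=0.1, jerk=0.0):
    """Illustrative per-step cost for a sampled trajectory rollout.

    costmap.grid[t] is the temporal desirability map at horizon step t;
    lower desirability means higher cost. Weights are assumptions.
    """
    ix, iy = costmap.world_to_index(state[0], state[1])  # hypothetical helper
    desirability = costmap.grid[t, ix, iy]               # in [0, 1]
    return w_desire * (1.0 - desirability) + w_jerk * jerk**2
```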
Designed and graded homework assignments, quizzes, and examinations for the following courses:
I participated in several research projects focused on warehouse automation using industrial manipulators. My work included 3D pose estimation of heterogeneously sized boxes from point clouds and motion planning for Universal Robots with ROS.
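As a minimal stand-in for the box pose estimation step, the sketch below segments a box's top face with RANSAC and fits an oriented bounding box using Open3D; the thresholds are illustrative, not the values used in the project:

```python
import open3d as o3d

def estimate_box_pose(pcd):
    """Segment the box's dominant (top) face, then fit an oriented
    bounding box whose centre and rotation give a 6-DoF pose."""
    plane, inliers = pcd.segment_plane(
        distance_threshold=0.005, ransac_n=3, num_iterations=500
    )
    top_face = pcd.select_by_index(inliers)
    obb = top_face.get_oriented_bounding_box()
    return obb.center, obb.R, obb.extent  # position, rotation, dimensions
```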
Here are some selected projects I worked on:
For a detailed description of these projects, please refer to my Curriculum Vitae.
Denoising Diffusion Probabilistic Models (DDPMs) are generative models that learn to transform Gaussian noise into data samples by iteratively denoising through a Markovian diffusion process. The model is trained on CIFAR-10.
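The core of training is the standard noise-prediction objective: corrupt a clean image to a random diffusion step and ask the network to recover the added noise. A minimal sketch, assuming a precomputed `alphas_cumprod` schedule tensor:

```python
import torch

def ddpm_loss(model, x0, alphas_cumprod):
    """Standard DDPM objective: predict the noise added at a random step t."""
    b = x0.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (b,), device=x0.device)
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise  # forward diffusion
    return torch.nn.functional.mse_loss(model(x_t, t), noise)
```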
Generative Adversarial Networks (GANs) consist of a generator and a discriminator trained in a competitive framework. This implementation is trained on the CelebA dataset and conditioned on attributes such as ‘Male’ and ‘Blond Hair’ to generate realistic face images.
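Attribute conditioning typically enters the generator by concatenating the binary attribute vector with the latent code; a truncated sketch of that pattern (layer sizes are illustrative, and the stack is cut short for brevity):

```python
import torch
import torch.nn as nn

class CondGenerator(nn.Module):
    """Sketch of attribute conditioning: the attribute vector
    (e.g. Male, Blond_Hair) is concatenated with the latent code."""

    def __init__(self, z_dim=100, n_attrs=2):
        super().__init__()
        self.net = nn.Sequential(  # truncated; real generators go deeper
            nn.ConvTranspose2d(z_dim + n_attrs, 256, 4, 1, 0), nn.ReLU(),
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(128, 3, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, z, attrs):
        # (B, z_dim + n_attrs) -> (B, C, 1, 1) for the transposed convs
        x = torch.cat([z, attrs], dim=1).unsqueeze(-1).unsqueeze(-1)
        return self.net(x)
```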
Demonstration of Vision Transformers applied to the CIFAR-10 dataset for classification.
A YOLO model trained on the Pascal VOC dataset via transfer learning.
A multimodal model by OpenAI that learns visual concepts from natural language supervision, enabling it to understand and relate images and text efficiently.
Training of the UNet architecture on the Oxford-IIIT Pet dataset for accurate image segmentation.
Implementation of the AlexNet architecture, the landmark model that won the 2012 ImageNet challenge. It is trained on the CIFAR-10 dataset and integrates TensorBoard for real-time visualization of training metrics.
ResNet18 trained from scratch on the CIFAR-10 dataset.
Spatial Transformer Networks allow a neural network to learn how to perform spatial transformations on the input image in order to enhance the geometric invariance of the model.
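The canonical STN block has three parts: a localization network predicts a 2x3 affine matrix, an affine grid is built from it, and the input is resampled before classification. A minimal sketch assuming CIFAR-sized 3x32x32 inputs:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialTransformer(nn.Module):
    """Core STN block: localization net -> affine grid -> resample."""

    def __init__(self):
        super().__init__()
        self.loc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 32),
                                 nn.ReLU(), nn.Linear(32, 6))
        # Initialize to the identity transform so training starts stable.
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, x):
        theta = self.loc(x).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)
```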
Built on the 2014 ImageNet Challenge winner, this project explores transfer learning for efficient adaptation.
A comparison of MoCo and MAE embeddings generated from the CIFAR-10 dataset.
Developed an end-to-end docking location detection network based on a synergy of deep point set networks and Hough voting.
Developed a real-time computer vision pipeline that finds potential docking locations in indoor environments for wheelchairs using point cloud data.
Real-time wheelchair navigation with shared control using model predictive path integral (MPPI) controller.
Indoor object detection using VoteNet on point clouds captured from RGB-D cameras in a ROS simulation.
Image-based visual servoing in eye-in-hand configuration for Universal Robot 5 using Microsoft Kinect V2 camera.
Developed a predictive model that can play chess like humans, with a special focus on modelling amateur play.
Summarised 10 state-of-the-art approaches for verifying DNNs and developed a framework to test networks (e.g., ACAS Xu) on safety cases using SMT solvers.
Developed a CNN capable of obtaining a temporally consistent, full 3D skeletal human pose from a single RGB camera.
Developed an optimal path planning algorithm for obstacle-rich environments. Unlike its predecessors, BugFlood uses a split-and-kill approach to advance through the environment. Its performance was compared against planners from the Open Motion Planning Library (OMPL) and visibility graph methods.
Papers, Workshops and Patents