My research focuses on deep learning algorithms and their integration into robotic systems to make robots more intelligent, physically consistent, and capable of seamless interaction with their human counterparts. As a Senior Software Engineer specializing in Machine Learning and Robotics at Locus Robotics, I actively contribute to the advancement of autonomous mobile robots, enabling them to perceive their surroundings and make intelligent decisions.
I recently completed my Master of Science degree in Computer Science from the University of British Columbia, under the supervision of Ian M. Mitchell. Before enrolling at UBC, I was a research fellow at TCS Research and Innovation Labs, where I contributed to the automation of warehouse robotics under the guidance of Swagat Kumar and Rajesh Sinha. Additionally, I hold a Bachelor’s degree in Computer Science from IIIT Delhi, where I worked under the supervision of Rahul Purandare and closely collaborated with P.B. Sujit.
My experience and expertise lie in pushing the boundaries of what is possible with machine learning in the field of robotics, continuously striving to create systems that are not only autonomous but also capable of sophisticated interaction and collaboration with humans.
MSc in Computer Science, 2022
University of British Columbia, Vancouver
BTech in Computer Science Engineering, 2017
Indraprastha Institute of Information Technology, Delhi
Object Detection Developed an advanced object detection system, Locus Learning, that uses transfer learning with YOLOX to detect LocusBots, persons, and carts in real time within indoor warehouse environments. Optimized model inference by porting it from Python to C++, achieving a 15% reduction in inference time and a 35% decrease in CPU load. Single-handedly integrated the object detector into the existing Locus framework, converted PyTorch weights into ONNX format for faster Intel iGPU inference, and introduced a lightweight inference visualizer for inspecting detection performance. Currently integrating ByteTrack, a state-of-the-art Kalman-filter-based multi-object tracker, with the YOLOX detector; this tracker will be used to track and avoid forklifts in warehouses.
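As a rough illustration of the export-and-deploy step described above, the sketch below converts a PyTorch model to ONNX and runs it through ONNX Runtime's OpenVINO execution provider, which targets Intel iGPUs. The stand-in model, file name, and input resolution are placeholders, not the actual Locus pipeline.

```python
import torch
import torch.nn as nn
import onnxruntime as ort

# Stand-in for a trained detector; the real pipeline would load YOLOX weights.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 6, 1))
model.eval()

dummy = torch.randn(1, 3, 640, 640)  # NCHW input; resolution is illustrative
torch.onnx.export(
    model, dummy, "detector.onnx",
    input_names=["images"], output_names=["predictions"],
    opset_version=11,
)

# The OpenVINO execution provider targets Intel iGPUs; ONNX Runtime falls
# back to CPU when OpenVINO is not available.
session = ort.InferenceSession(
    "detector.onnx",
    providers=["OpenVINOExecutionProvider", "CPUExecutionProvider"],
)
(preds,) = session.run(None, {"images": dummy.numpy()})
print(preds.shape)
```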
Fiducial Marker Detection Upgraded the fiducial marker detection system to AprilTag3, resulting in a 22% increase in frame processing speed and a 28% improvement in recall. Replaced image undistortion with Region of Interest (RoI) rectification for tag detectors, reducing NUC load by approximately 5%. Additionally, integrated Locus’s fiducial markers with the state-of-the-art deep-learning tag detector, DeepTag, to enhance overall detection accuracy and efficiency.
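One way to sidestep full-frame undistortion, in the spirit of the RoI rectification mentioned above, is to detect tags on the raw image and rectify only the detected corner points. The sketch below uses OpenCV's aruco module with an AprilTag dictionary; the intrinsics and frame are placeholders, and the production pipeline may rectify regions rather than points.

```python
import cv2
import numpy as np

# Placeholder intrinsics/distortion from calibration.
K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
dist = np.array([-0.3, 0.1, 0.0, 0.0, 0.0])

# AprilTag detection through OpenCV's aruco module (OpenCV >= 4.7 API).
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_APRILTAG_36h11)
detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())

frame = np.full((480, 640), 255, np.uint8)       # stand-in for a raw camera frame
corners, ids, _ = detector.detectMarkers(frame)  # detect on the *distorted* image

# Rectify only the detected corner points instead of undistorting every pixel.
for tag in corners:
    pts = cv2.undistortPoints(tag.reshape(-1, 1, 2), K, dist, P=K)
    print(pts.reshape(-1, 2))  # corners as if the image had been undistorted
```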
Camera Calibration Replaced individual camera calibrations with a standard calibration matrix for all cameras mounted on the robot, ensuring calibration errors remained within 1% of use-case-specific tolerance limits. This approach streamlined the calibration process, reducing robot deployment time by 6% and eliminating the need for per-camera calibrations for each robot.
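A minimal sketch of the kind of check this implies: pose-fit a calibration target with the shared intrinsics and confirm the reprojection error stays inside the tolerance. The matrices, board, and threshold below are illustrative assumptions, not the Locus procedure.

```python
import cv2
import numpy as np

# Shared "factory" intrinsics assumed for every camera (values illustrative).
K_std = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
dist_std = np.zeros(5)
TOLERANCE_PX = 2.0  # stand-in for the use-case-specific tolerance limit

def passes_with_standard_intrinsics(obj_pts, img_pts):
    """Pose-fit a board with K_std and report mean reprojection error (px)."""
    _, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K_std, dist_std)
    proj, _ = cv2.projectPoints(obj_pts, rvec, tvec, K_std, dist_std)
    err = np.linalg.norm(proj.reshape(-1, 2) - img_pts.reshape(-1, 2), axis=1).mean()
    return err <= TOLERANCE_PX, err

# Synthetic test: a 6x4 board seen by a camera that deviates slightly from K_std.
obj = np.array([[x, y, 0.0] for y in range(4) for x in range(6)], np.float32) * 0.05
K_true = K_std.copy()
K_true[0, 0], K_true[1, 1] = 603.0, 597.0
rvec, tvec = np.array([0.1, -0.2, 0.05]), np.array([0.0, 0.0, 0.5])
img, _ = cv2.projectPoints(obj, rvec, tvec, K_true, dist_std)
print(passes_with_standard_intrinsics(obj, img))
```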
We propose to augment smart wheelchair perception with the capability to identify potential docking locations in indoor scenes. ApproachFinder-CV is a computer vision pipeline that detects safe docking poses and estimates their desirability weight based on hand-selected geometric relationships and visibility. Although robust, this pipeline is computationally intensive. We leverage this vision pipeline to generate ground truth labels used to train an end-to-end differentiable neural network that is 15 times faster.
ApproachFinder-NN is a point-based method that draws inspiration from Hough voting and uses deep point cloud features to vote for potential docking locations. Both approaches rely solely on geometric information, making them invariant to image distortions. A large-scale indoor object detection dataset, SUN RGB-D, is used to design, train, and evaluate the two pipelines.
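For intuition, the sketch below shows a VoteNet-style voting module: an MLP regresses a per-seed offset so that votes cluster near target locations. The layer sizes and shapes are illustrative, not necessarily ApproachFinder-NN's.

```python
import torch
import torch.nn as nn

class VotingModule(nn.Module):
    """VoteNet-style voting: each seed point regresses an offset toward a
    candidate docking location. Sizes are illustrative, not the paper's."""

    def __init__(self, feat_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, 3 + feat_dim),  # xyz offset + feature residual
        )

    def forward(self, seed_xyz, seed_feat):
        # seed_xyz: (B, N, 3) coordinates; seed_feat: (B, N, C) point features
        out = self.mlp(seed_feat)
        vote_xyz = seed_xyz + out[..., :3]    # shift seeds toward docking targets
        vote_feat = seed_feat + out[..., 3:]  # residual feature update
        return vote_xyz, vote_feat

# Toy usage: 1024 random seeds per cloud; votes are later clustered into proposals.
xyz, feat = torch.randn(2, 1024, 3), torch.randn(2, 1024, 256)
votes, vote_feat = VotingModule()(xyz, feat)
print(votes.shape)  # (2, 1024, 3)
```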
Potential docking locations are encoded as a 3D temporal desirability cost map that can be integrated into any real-time path planner. As a proof of concept, we use a model predictive controller that consumes this 3D costmap through task-driven cost functions to incorporate human intent in a shared-control fashion. The controller outputs a nominal path that is safe, goal-oriented, and jerk-free.
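As a hedged sketch of how such a costmap could enter a sampling-based controller, the snippet below scores one MPPI rollout with desirability, goal-distance, and jerk terms. The grid indexing, weights, and state layout are assumptions for illustration only.

```python
import numpy as np

# Illustrative rollout scoring against a temporal desirability costmap.
H, W, T = 64, 64, 20                    # grid size and planning horizon
desirability = np.random.rand(T, H, W)  # 1.0 = highly desirable docking pose

def rollout_cost(states, goal, w_desire=1.0, w_goal=0.5, w_jerk=0.1):
    """states: (T, 3) rollout of (row_idx, col_idx, heading) on the grid."""
    t = np.arange(len(states))
    desire = desirability[t, states[:, 0].astype(int), states[:, 1].astype(int)]
    cost_desire = (1.0 - desire).sum()                 # prefer desirable poses
    cost_goal = np.linalg.norm(states[-1, :2] - goal)  # terminal goal distance
    jerk = np.diff(states[:, :2], n=3, axis=0)         # third finite difference
    return w_desire * cost_desire + w_goal * cost_goal + w_jerk * np.abs(jerk).sum()

# MPPI weights each sampled rollout by exp(-cost / temperature) and blends controls.
sample = np.random.rand(T, 3) * [H - 1, W - 1, np.pi]
print(rollout_cost(sample, goal=np.array([32.0, 32.0])))
```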
Designed and graded homework assignments, quizzes, and examinations for the following courses:
I participated in several research projects focused on warehouse automation using industrial manipulators. My work included 3D pose estimation of heterogeneous-sized boxes using point clouds and motion planning for Universal Robots with ROS.
Here are some selected projects I worked on:
For a detailed description of these projects, please refer to my Curriculum Vitae.
Multiple tutorials covering how to implement vision-focused deep learning architectures in PyTorch with torchvision.
Developed an end-to-end docking location detection network based on a synergy of deep point set networks and Hough voting.
Developed a real-time computer vision pipeline that finds potential docking locations in indoor environments for wheelchairs using point cloud data.
Real-time wheelchair navigation with shared control using a model predictive path integral (MPPI) controller.
Indoor object detection using VoteNet on point clouds captured from RGB-D cameras in a ROS simulation.
Image-based visual servoing in an eye-in-hand configuration for a Universal Robots UR5 using a Microsoft Kinect v2 camera.
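For context, the classic image-based visual servoing law (Chaumette and Hutchinson) computes a camera velocity v = -λ L⁺ e from the stacked point-feature interaction matrix L and the feature error e. The sketch below implements that textbook law with placeholder features, depths, and gain; it is not the project's exact controller.

```python
import numpy as np

# Textbook IBVS law: v = -lambda * pinv(L) @ e for normalized point features.
def interaction_matrix(x, y, Z):
    """2x6 interaction matrix of one normalized image point at depth Z."""
    return np.array([
        [-1 / Z, 0.0, x / Z, x * y, -(1 + x ** 2), y],
        [0.0, -1 / Z, y / Z, 1 + y ** 2, -x * y, -x],
    ])

# Current features (x, y) with depths Z from the Kinect, and desired features.
feats = [(0.10, 0.20, 1.5), (-0.20, 0.10, 1.4), (0.05, -0.15, 1.6)]
desired = [(0.00, 0.00), (0.00, 0.30), (0.10, -0.10)]

L = np.vstack([interaction_matrix(x, y, Z) for x, y, Z in feats])
e = np.concatenate([[x - xd, y - yd] for (x, y, _), (xd, yd) in zip(feats, desired)])
v = -0.5 * np.linalg.pinv(L) @ e  # 6-DoF camera twist sent to the UR5
print(v)
```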
Developed a predictive model that plays chess like humans, with a special focus on modelling amateur play.
Summarised 10 state-of-the-art approaches to DNN verification and developed a framework to test networks (e.g., ACAS Xu) on safety cases using SMT solvers.
Developed a CNN capable of obtaining a temporally consistent, full 3D skeletal human pose from a single RGB camera.
Converted Sudoku into a Boolean satisfiability problem and solved it with a SAT solver.
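A minimal sketch of the standard encoding, here using Z3's Python API rather than a bare DIMACS solver: one Boolean variable per (row, column, digit), with exactly-one constraints over cells, rows, columns, and boxes. The single clue added is an arbitrary example.

```python
from itertools import combinations
from z3 import Bool, Solver, Or, Not, sat, is_true

# X[r][c][d] is true iff cell (r, c) holds digit d + 1.
X = [[[Bool(f"x_{r}_{c}_{d}") for d in range(9)] for c in range(9)] for r in range(9)]
s = Solver()

def exactly_one(lits):
    s.add(Or(lits))                         # at least one literal is true
    for a, b in combinations(lits, 2):      # no two literals are true together
        s.add(Or(Not(a), Not(b)))

for r in range(9):
    for c in range(9):
        exactly_one([X[r][c][d] for d in range(9)])      # one digit per cell
for d in range(9):
    for i in range(9):
        exactly_one([X[i][c][d] for c in range(9)])      # once per row
        exactly_one([X[r][i][d] for r in range(9)])      # once per column
    for br in range(0, 9, 3):
        for bc in range(0, 9, 3):                        # once per 3x3 box
            exactly_one([X[br + i][bc + j][d] for i in range(3) for j in range(3)])

s.add(X[0][0][4])  # example clue: cell (0, 0) holds digit 5
if s.check() == sat:
    m = s.model()
    grid = [[next(d + 1 for d in range(9) if is_true(m[X[r][c][d]]))
             for c in range(9)] for r in range(9)]
    print(grid)
```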
Studied and summarised major approaches to perform text detection and recognition using deep learning techniques.
Developed BugFlood, an optimal path-planning algorithm for obstacle-rich environments. Unlike its Bug-family predecessors, BugFlood uses a split-and-kill approach to advance through the environment. Its performance was compared against planners from the Open Motion Planning Library (OMPL) and against visibility-graph methods.
Developed a Clang-based static analysis tool for ROS that reduces network latency and message drop rate by optimizing message sizes.
A Clang-based tool to find different types of statements in C/C++ code, used to generate metadata for a ROS package.
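As a rough illustration, the sketch below uses libclang's Python bindings to tally AST cursor kinds in a single C++ file, the sort of per-statement metadata such a tool could emit. The real tool was built on Clang's C++ tooling APIs, and the file path here is a placeholder.

```python
from collections import Counter
import clang.cindex

def count_cursor_kinds(path):
    """Tally AST cursor kinds (statements, expressions, ...) in one file."""
    index = clang.cindex.Index.create()             # requires libclang installed
    tu = index.parse(path, args=["-std=c++14"])
    counts = Counter()
    for node in tu.cursor.walk_preorder():
        if node.location.file and node.location.file.name == path:
            counts[str(node.kind)] += 1             # e.g. CursorKind.CALL_EXPR
    return counts

# Placeholder path; point this at a ROS node source file.
print(count_cursor_kinds("talker.cpp").most_common(10))
```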
Papers, Workshops and Patents