Vision

Diffusion Model

Denoising Diffusion Probabilistic Models (DDPMs) are generative models that learn to transform Gaussian noise into data samples by iteratively denoising through a Markovian diffusion process. The model is trained on CIFAR-10.
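The forward (noising) half of the diffusion process has a closed form. A minimal sketch, assuming the linear beta schedule from the original DDPM paper (schedule values and variable names are illustrative, not necessarily this implementation's):

```python
import torch

# Forward diffusion q(x_t | x_0): x_t = sqrt(ᾱ_t)·x_0 + sqrt(1-ᾱ_t)·ε
T = 1000
betas = torch.linspace(1e-4, 0.02, T)       # linear noise schedule (assumed)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)   # cumulative product ᾱ_t

def q_sample(x0, t, noise):
    """Sample x_t from q(x_t | x_0) in closed form."""
    ab = alpha_bars[t]
    return ab.sqrt() * x0 + (1 - ab).sqrt() * noise

x0 = torch.randn(4, 3, 32, 32)              # a batch of CIFAR-10-sized tensors
noise = torch.randn_like(x0)
xt = q_sample(x0, torch.tensor(500), noise) # halfway through the chain
```

Training then amounts to teaching a network to predict `noise` from `xt` and `t`; sampling runs the learned reverse process from pure noise.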

Conditional GAN

Generative Adversarial Networks (GANs) consist of a generator and a discriminator trained in a competitive framework. This implementation is trained on the CelebA dataset and conditioned on attributes such as 'Male' and 'Blond Hair' to generate realistic face images.
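The conditioning mechanism can be sketched as follows: the generator receives the attribute vector alongside the noise vector, so sampling with different attributes yields different faces. This is an illustrative stand-in (layer sizes and names are assumptions, not the repo's exact code):

```python
import torch
import torch.nn as nn

# Conditional generator sketch: binary attributes (e.g. Male, Blond Hair)
# are concatenated with the latent noise vector z.
class CondGenerator(nn.Module):
    def __init__(self, z_dim=100, n_attrs=2, img_dim=3 * 64 * 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + n_attrs, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, img_dim),
            nn.Tanh(),                          # pixel values in [-1, 1]
        )

    def forward(self, z, attrs):
        return self.net(torch.cat([z, attrs], dim=1)).view(-1, 3, 64, 64)

g = CondGenerator()
z = torch.randn(8, 100)
attrs = torch.tensor([[1.0, 0.0]] * 8)          # e.g. Male=1, Blond Hair=0
fake = g(z, attrs)                              # (8, 3, 64, 64) fake images
```

The discriminator is conditioned the same way, so it judges whether an image is both realistic and consistent with its attributes.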

Classification with Vision Transformers

Demonstration of Vision Transformers applied to the CIFAR-10 dataset for classification.
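The key step is turning an image into a token sequence. A sketch of the patch-embedding stage for 32x32 CIFAR-10 images, assuming 4x4 patches and a 192-dim embedding (hyperparameters are illustrative):

```python
import torch
import torch.nn as nn

# ViT patch embedding: split the image into patches with a strided conv,
# prepend a learnable [CLS] token, and add positional embeddings.
class PatchEmbed(nn.Module):
    def __init__(self, img_size=32, patch=4, in_ch=3, dim=192):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        n_patches = (img_size // patch) ** 2        # 64 patches for 32/4
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, n_patches + 1, dim))

    def forward(self, x):
        x = self.proj(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        cls = self.cls.expand(x.size(0), -1, -1)
        return torch.cat([cls, x], dim=1) + self.pos

tokens = PatchEmbed()(torch.randn(2, 3, 32, 32))     # (2, 65, 192)
```

The token sequence then passes through standard Transformer encoder blocks, and the final [CLS] token feeds the classification head.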

YOLO Object Detector

A YOLO model trained on the Pascal VOC dataset via transfer learning.
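The transfer-learning recipe can be sketched as: freeze a pretrained backbone and train only a new detection head that predicts YOLO's S×S×(B·5+C) output grid. The tiny backbone below is a self-contained stand-in (the real project would load pretrained weights):

```python
import torch
import torch.nn as nn

S, B, C = 7, 2, 20                          # 7x7 grid, 2 boxes/cell, 20 VOC classes

backbone = nn.Sequential(                   # stand-in for a pretrained backbone
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(7),
)
for p in backbone.parameters():
    p.requires_grad = False                 # frozen during fine-tuning

head = nn.Sequential(                       # new, trainable detection head
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, S * S * (B * 5 + C)),
)

x = torch.randn(1, 3, 224, 224)
out = head(backbone(x)).view(1, S, S, B * 5 + C)   # (1, 7, 7, 30)
```

Each grid cell's 30 numbers encode B box predictions (x, y, w, h, confidence) plus C class probabilities.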

CLIP Classification

A multimodal model by OpenAI that learns visual concepts from natural language supervision, enabling it to understand and relate images and text efficiently.
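Zero-shot classification with CLIP reduces to a cosine-similarity lookup: embed the image and one text prompt per class, then pick the closest prompt. A sketch with random vectors standing in for the real encoders (the 512-dim size and prompts are illustrative):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
classes = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

# Stand-ins for CLIP's image and text encoder outputs, L2-normalized
# as CLIP does before computing similarities.
image_emb = F.normalize(torch.randn(1, 512), dim=-1)
text_embs = F.normalize(torch.randn(len(classes), 512), dim=-1)

logits = 100.0 * image_emb @ text_embs.T    # scaled cosine similarities
pred = classes[logits.argmax(dim=-1).item()]
```

Because the class set lives entirely in the text prompts, new classes can be added at inference time without retraining.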

Semantic Segmentation with UNet

Training of the UNet architecture on the Oxford-IIIT Pet dataset for accurate image segmentation.
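The defining feature of UNet is the skip connection: an encoder feature map is concatenated with the decoder's upsampled map before per-pixel classification. A minimal sketch, assuming the 3-class trimap labels of the Oxford-IIIT Pet dataset (layer sizes are illustrative):

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self, n_classes=3):
        super().__init__()
        self.enc = nn.Conv2d(3, 16, 3, padding=1)
        self.down = nn.MaxPool2d(2)
        self.mid = nn.Conv2d(16, 32, 3, padding=1)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.out = nn.Conv2d(32, n_classes, 1)   # 32 = 16 (skip) + 16 (upsampled)

    def forward(self, x):
        e = torch.relu(self.enc(x))              # encoder feature map
        m = torch.relu(self.mid(self.down(e)))   # bottleneck
        u = self.up(m)                           # upsample back
        return self.out(torch.cat([u, e], dim=1))  # skip connection

mask_logits = TinyUNet()(torch.randn(1, 3, 128, 128))  # (1, 3, 128, 128)
```

The skip connections let the decoder recover the fine spatial detail that pooling discards, which is what makes the segmentation boundaries sharp.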

AlexNet

Implementation of the AlexNet architecture, the landmark model that won the 2012 ImageNet challenge. It is trained on the CIFAR-10 dataset and integrates TensorBoard for real-time visualization of training metrics.
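AlexNet expects 224x224 inputs, so training on 32x32 CIFAR-10 images requires shrinking the kernels and strides. One possible adaptation, sketched below (this exact configuration is an assumption, not necessarily the repo's):

```python
import torch
import torch.nn as nn

# AlexNet-style network scaled down for 32x32 CIFAR-10 inputs:
# conv/ReLU stacks with pooling, then dropout-regularized FC layers.
class AlexNetCIFAR(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                        # 32 -> 16
            nn.Conv2d(64, 192, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                        # 16 -> 8
            nn.Conv2d(192, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                        # 8 -> 4
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(0.5),
            nn.Linear(256 * 4 * 4, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

logits = AlexNetCIFAR()(torch.randn(2, 3, 32, 32))  # (2, 10)
```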

ResNet-18

ResNet-18 trained from scratch on the CIFAR-10 dataset.
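The building block of ResNet-18 is the residual block: the input is added back to the output of the convolutional branch, so each block learns a correction rather than a full mapping. A sketch of the identity-shortcut case:

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(ch)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(ch)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)          # residual (skip) connection

y = BasicBlock(64)(torch.randn(1, 64, 32, 32))  # shape preserved
```

When a block changes the channel count or stride, the shortcut uses a 1x1 convolution instead of the identity; ResNet-18 stacks two such blocks at each of four widths (64, 128, 256, 512).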

Spatial Transformer Networks

Spatial Transformer Networks allow a neural network to learn to perform spatial transformations on the input image, enhancing the model's geometric invariance.
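The core mechanism is a differentiable warp: a small localization network predicts a 2x3 affine matrix theta, which is converted into a sampling grid and applied to the input. A sketch of just that warp step, using a hard-coded identity theta in place of the learned one:

```python
import torch
import torch.nn.functional as F

# With the identity affine transform, the warped output equals the input;
# in an STN, theta would instead be predicted by a localization network.
x = torch.arange(16.0).view(1, 1, 4, 4)
theta = torch.tensor([[[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]]])       # identity 2x3 affine matrix
grid = F.affine_grid(theta, x.size(), align_corners=False)
warped = F.grid_sample(x, grid, align_corners=False)
```

Because `affine_grid` and `grid_sample` are differentiable, gradients flow back through the warp into the localization network, so the transformation is learned end to end with the main task loss.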

Transfer learning with VGG16

Built on VGG16, which won the localization task and placed second in classification at the 2014 ImageNet Challenge, this project explores transfer learning for efficient adaptation to new tasks.
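The recipe is the standard one: freeze the pretrained feature extractor and train only a new classifier head. The snippet below uses a tiny stand-in network so it stays self-contained; in the project, torchvision's pretrained VGG16 weights would be loaded instead:

```python
import torch
import torch.nn as nn

features = nn.Sequential(                   # stand-in for vgg16.features
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(7),
)
for p in features.parameters():
    p.requires_grad = False                 # freeze "pretrained" weights

classifier = nn.Sequential(                 # new task-specific head
    nn.Flatten(),
    nn.Linear(8 * 7 * 7, 10),               # 10 classes assumed for illustration
)

# Only the head's parameters are handed to the optimizer.
trainable = [p for p in classifier.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)

out = classifier(features(torch.randn(2, 3, 64, 64)))  # (2, 10)
```

Freezing the backbone cuts training cost drastically and works well when the target dataset is small, since the pretrained features already generalize.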