Understanding Transformer Architecture from First Principles: A Detailed Exploration
Introduction The transformer architecture lies at the heart of modern AI models. Nearly all state-of-the-art large language models (LLMs), such as ChatGPT, LLaMA, and Gemini, are built upon it. The transformer was introduced in the paper Attention Is All You Need [^1]. Although it is used everywhere these days, it was originally proposed for language translation and was quickly generalized to other tasks. Since then, thanks to its scalability and its self-attention mechanism, it has found its way not just into natural language processing (NLP) but also into computer vision, speech, and multimodal AI systems. ...
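The self-attention mechanism mentioned above can be sketched in a few lines. This is a minimal NumPy illustration (not code from the article): each query is compared against every key, the scaled scores are passed through a softmax, and the result is a weighted sum of the values.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # similarity of each query to each key
    scores -= scores.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V                            # weighted sum of value vectors

# Toy example: 3 tokens, each with a 4-dimensional representation
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4): one output vector per token
```

In the full architecture this operation runs in parallel across multiple heads, with Q, K, and V produced by learned linear projections of the token embeddings.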
ResNets Explained - Solving Deep Network Degradation with Residual Learning
Introduction Deep convolutional neural networks have been around for a while and have completely revolutionized how we tackle image recognition tasks in computer vision. When AlexNet came out in 2012, it changed how we use CNNs: it was the first time an architecture with consecutive convolutional layers delivered significant improvements in both training speed and performance. This was achieved by leveraging a deeper architecture and GPU acceleration. This 8-layer-deep CNN was one of the first to reach that level of performance on large-scale image classification. ...
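The residual learning idea in the title can be sketched quickly. Below is a minimal NumPy illustration (an assumption of mine, not code from the article): the block learns a residual function F(x) and adds the input back through a skip connection, so an identity-like mapping is trivial to represent and gradients can flow through the shortcut.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """y = relu(F(x) + x), where F(x) = W2 @ relu(W1 @ x).

    The skip connection means the block only has to learn the
    *difference* from the identity mapping, not the mapping itself.
    """
    out = relu(W1 @ x)   # first transformation
    out = W2 @ out       # second transformation (activation applied after the add)
    return relu(out + x) # skip connection adds the input back

# With zero weights, F(x) = 0 and the block collapses to relu(x):
x = np.array([1.0, -2.0, 3.0])
W = np.zeros((3, 3))
print(residual_block(x, W, W))  # [1. 0. 3.]
```

This is why very deep residual networks avoid the degradation problem: extra blocks can default to near-identity behavior instead of having to learn it from scratch.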