Hi,

I’m Suyog Ghimire, a second-year Computer Engineering student at the Institute of Engineering (IOE), Paschimanchal Campus. I write blogs as a way to learn and to document my journey so far. I’m especially curious about multimodal AI, vision-and-language models, and how machines can connect perception with understanding. Along the way, I also share thoughts on programming, machine learning, and anything else that sparks my interest.

Understanding Transformer Architecture from First Principles: A Detailed Exploration

Introduction The transformer architecture lies at the heart of modern AI models. Nearly all state-of-the-art large language models (LLMs), such as ChatGPT, LLaMA, and Gemini, are built on it. The architecture was introduced in the paper Attention Is All You Need [^1]. Although it is used everywhere these days, it was originally proposed for language translation and was quickly generalized to other tasks. Since then, thanks to its scalability and its self-attention mechanism, it has found its way not only into natural language processing (NLP) but also into computer vision, speech, and multimodal AI systems. ...
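The self-attention mechanism mentioned above can be sketched in a few lines of NumPy. This is a minimal single-head illustration under my own assumptions (function name, shapes, and random weights are illustrative, not from the post):

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence x of shape (seq_len, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v              # project tokens to queries, keys, values
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                  # pairwise token similarity, scaled
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: each row sums to 1
    return weights @ v                               # each output is a weighted mix of values

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                          # 4 tokens, model dimension 8
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)                                     # (4, 8): one mixed vector per token
```

The key point is that every token attends to every other token in one matrix operation, which is what makes the mechanism so parallelizable and scalable.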

April 28, 2025 · 10 min · Suyog Ghimire

ResNets Explained - Solving Deep Network Degradation with Residual Learning

Introduction Deep convolutional neural networks have been around for a while and have completely revolutionized how we tackle image recognition tasks in computer vision. When AlexNet came out in 2012, it transformed how we use CNNs: it was the first time we saw an architecture with consecutive convolutional layers deliver a significant improvement in training speed and performance, achieved by leveraging a deeper architecture and GPU acceleration. This 8-layer deep CNN was one of the first to reach this kind of performance on large-scale image classification. ...
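The residual learning idea in the title can be sketched with a tiny NumPy example. This is a simplified fully connected block of my own construction (the actual ResNet blocks are convolutional); the point is the skip connection, which lets the block learn a residual F(x) on top of the identity:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def residual_block(x, w1, w2):
    """A toy residual block: output = ReLU(F(x) + x), where F is two dense layers."""
    h = relu(x @ w1)        # first transformation
    f = h @ w2              # residual function F(x)
    return relu(f + x)      # skip connection adds the input back

rng = np.random.default_rng(0)
x = rng.normal(size=(8,))
w1 = rng.normal(size=(8, 8)) * 0.1
w2 = np.zeros((8, 8))       # F(x) = 0 here, so the block is a near-identity mapping
out = residual_block(x, w1, w2)
print(np.allclose(out, relu(x)))  # True: a zeroed residual just passes the input through
```

This is why residual connections help with degradation: a deep stack of such blocks can trivially represent an identity mapping, so adding layers should never make training accuracy worse in principle.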

February 3, 2025 · 5 min · Suyog Ghimire