
Large Language Models, GenAI, Transformers, Embeddings, Vectors, Inference, Fine-tuning, RAG, Neural Networks, Megaloboxing… Do I have your Attention?
Ever wanted to understand how Generative AI works? Not to build it, but at least to get the gist?
As someone with premium subscriptions to three different LLM services, so did I. For months, I’ve been safely keeping track of these resources by leaving them open in tabs across various devices and browsers. One day, I hope to actually finish at least one of them…
At Class Central, we generally recommend courses. But often, learning isn’t just about pre-recorded videos with an occasional quiz here and there, followed by a paid certificate for clicking the right things.
Why Learn About LLMs or Large Language Models?
Honestly, IDK. I’m just adding this question to please the algorithm gods at Google. Feel free to skip the rest of this section. The next paragraph is entirely generated by an LLM, which will remain anonymous to protect its privacy.
Learning about Large Language Models is important because they’re changing how we communicate and access information. By understanding how LLMs work, you can better use them to improve your writing, communication, and even your job skills!
GenAI vs GPT vs LLM
You know the drill. Feel free to skip the rest of the section or just paste the title into your friendly neighborhood LLM.
- GenAI (Generative AI): A broad term for AI systems that can create new content, such as text, images, audio, or code, rather than just analyzing existing data.
- GPT (Generative Pre-trained Transformer): A specific family of generative models, built on the transformer architecture and pre-trained on huge amounts of text, that can produce human-like text for things like chatbots and language translation.
- LLM (Large Language Model): A type of AI model that’s trained on a massive amount of text data to understand and generate human language, like writing and conversation.
What Is ChatGPT Doing … and Why Does It Work? By Stephen Wolfram

Yes, THE Stephen Wolfram. Don’t know who he is? Here is what ~~Wikipedia~~ ChatGPT has to say about him:
Stephen Wolfram is a British-American computer scientist, physicist, and entrepreneur, best known for his work in developing Mathematica, an advanced computational software, and for his development of the Wolfram Alpha computational knowledge engine.
This nearly 20,000-word article (book?) by Wolfram explains in detail how models like GPT-2 generate text, with illustrations and Wolfram Language code examples throughout. It’s in-depth but accessible (given the complexity of the topic).
tldr: It’s just adding one word at a time.
This is art. It would be great if we had more educators like Wolfram who can break down complex topics into something that a majority of people can understand, building it up bit by bit.
Of course, we’re not going to do that because it doesn’t “scale.” Instead, we’re going to flood the internet with random garbage generated by LLMs.
If you have to read just one article, this would be it.
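If you want to see the “one word at a time” idea in runnable form, here is a minimal Python sketch of the same loop Wolfram describes. The probability table is completely made up for illustration; a real model computes these probabilities from the entire context with a neural network rather than looking up just the last word.

```python
import random

# Toy next-word probabilities, completely made up for illustration
# (a real model computes these from the whole context, not a lookup table).
next_word_probs = {
    "the":  {"cat": 0.5, "best": 0.3, "end": 0.2},
    "cat":  {"sat": 0.6, "is": 0.4},
    "sat":  {"on": 0.7, "quietly": 0.3},
    "on":   {"the": 0.9, "top": 0.1},
    "best": {"cat": 0.5, "thing": 0.5},
    "is":   {"the": 0.5, "on": 0.5},
}

def generate(start, n_words=8):
    """Repeatedly sample the next word given the current one, then append it."""
    words = [start]
    for _ in range(n_words):
        probs = next_word_probs.get(words[-1])
        if not probs:                       # no known continuation: stop
            break
        choices, weights = zip(*probs.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate("the"))   # e.g. "the cat sat on the best thing"
```

Swap the lookup table for a transformer that scores every word in the vocabulary given everything generated so far, and you have the essence of ChatGPT’s generation loop.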
3Blue1Brown’s Visual Intro to Transformers and Attention
3Blue1Brown (3b1b) by Grant Sanderson is a popular YouTube channel with over 6 million subscribers. He creates stunning animated videos of complex mathematical concepts, making them accessible and visually engaging for viewers of all levels.
Sanderson created his own mathematical animation engine, Manim, and open-sourced it on GitHub. Similar to Wolfram, I feel his videos are a work of art. I don’t think he’s worried about GenAI taking his job.
So far, he has published two videos on this topic: “But what is a GPT? Visual intro to transformers” and “Attention in transformers, visually explained.” I even spied a third video on his Patreon.
He explains visually what goes on in a transformer step-by-step. And by step-by-step, I mean he uses a real-world example and shows us the actual matrices in those steps as data flows through them. We’re talking tokens, vectors, attention blocks, and feed-forward layers – all brought to life through Sanderson’s magical animations.
It’s mind-boggling that this even exists. I haven’t been this impressed with a video since I watched Jurassic Park in theaters for the first time (I know I’m dating myself).
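As a rough, code-shaped companion to Sanderson’s animations, here is a tiny NumPy sketch of the data flow he visualizes: token ids become vectors, one self-attention step mixes information between positions, and a feed-forward layer transforms each vector. The sizes and random weights are placeholders, and real GPTs stack many such blocks with multiple attention heads, positional information, and layer normalization, all of which are omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d_model, d_ff = 50, 16, 64          # toy sizes, nothing like a real GPT
tokens = np.array([3, 17, 42, 7])          # pretend token ids for a short prompt

# 1. Embedding: each token id is looked up as a d_model-dimensional vector.
embedding = rng.normal(size=(vocab, d_model))
x = embedding[tokens]                      # shape (4, d_model)

# 2. Self-attention: each position builds query, key, and value vectors, then
#    takes a weighted average of the values based on query-key similarity.
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
Q, K, V = x @ Wq, x @ Wk, x @ Wv
scores = Q @ K.T / np.sqrt(d_model)        # (4, 4) position-to-position scores
scores[np.triu(np.ones_like(scores), k=1).astype(bool)] = -1e9   # causal mask
scores -= scores.max(axis=-1, keepdims=True)                     # stability
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
x = x + weights @ V                        # attention output + residual

# 3. Feed-forward layer applied to each position independently.
W1, W2 = rng.normal(size=(d_model, d_ff)), rng.normal(size=(d_ff, d_model))
x = x + np.maximum(x @ W1, 0) @ W2         # ReLU MLP + residual

print(x.shape)                             # (4, 16): one updated vector per token
```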
LLM University And Serrano.Academy
I am combining LLM University by Cohere and Serrano.Academy (YouTube Channel) because they have a common instructor: Luis Serrano.
If this name sounds familiar, you might be a Udacitian from its heyday. Luis was a popular instructor teaching the Machine Learning Nanodegree. Long ago, Udacity launched something called Udacity Connect Intensive. Basically, you’d meet in person once a week in a physical classroom while taking the Nanodegree.
I was part of the first cohort/test in San Jose, and Luis Serrano once dropped in to give a lecture. His greatest strength is breaking down complicated concepts into simple analogies and examples.
For me, Luis provides the intuition behind the concepts. His passion for teaching is obvious and infectious.
The previous two resources jump straight into real-world examples and are quite information-dense. If you’re having trouble following them, start with these Serrano.Academy videos instead:
- The Attention Mechanism in Large Language Models
- The math behind Attention: Keys, Queries, and Values matrices
- What are Transformer Models and how do they work?
LLM University consists of 7 modules in total and contains a mix of text and videos. The first module, taught by Luis, covers Large Language Models and some of the theory behind them, such as Attention, Transformers, and Embeddings. I believe this has significant overlap with the Serrano.Academy videos mentioned above.
The next 6 modules are very practical in nature and deal with the real-world applications of LLMs: Text Generation, Semantic Search, and Retrieval-Augmented Generation (RAG). The code examples are in Python and use the Cohere SDK.
There is also a section on Prompt Engineering.
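To give a flavour of what those practical modules cover, here is a deliberately simplified sketch of semantic search feeding into a RAG-style prompt. This is not the course’s code: the bag-of-words `embed` function is a stand-in for a real embedding model (the course uses the Cohere SDK), and the final LLM call is omitted.

```python
import numpy as np

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The office is closed on public holidays.",
    "Support is available around the clock via chat.",
]

# Placeholder embeddings: simple bag-of-words vectors. In the course, this step
# would call a real embedding model instead.
vocab = sorted({w for text in documents for w in text.lower().split()})

def embed(text):
    words = text.lower().split()
    return np.array([words.count(w) for w in vocab], dtype=float)

doc_vecs = np.array([embed(d) for d in documents])

def retrieve(question, k=1):
    """Semantic search: rank documents by cosine similarity to the question."""
    q = embed(question)
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q) + 1e-9)
    return [documents[i] for i in np.argsort(-sims)[:k]]

question = "Can I get a refund two weeks after my purchase?"
context = retrieve(question)[0]

# Retrieval-Augmented Generation: put the retrieved text into the prompt that
# would be sent to the LLM (the actual chat API call is omitted here).
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```

The real pipeline has the same shape: embed the documents once, embed the question, retrieve the closest documents, and hand them to the model inside the prompt.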
Jay Alammar’s Visual Journey Through Language Models

Jay Alammar, another former Udacity instructor, now works at Cohere alongside Luis Serrano. He is also an instructor for certain modules at Cohere’s LLM University.
Alammar has created a series of tutorials in which he explains the inner workings of large language models through illustrations, animations, and visualizations. These tutorials offer a visual approach to understanding complex AI concepts, making them accessible to a wider audience.
Let’s build GPT: from scratch, in code, spelled out by Andrej Karpathy
In this two-hour video, you build a simplified version of ChatGPT from scratch with Andrej Karpathy, one of the co-founders of OpenAI. Karpathy was previously the Director of AI at Tesla, where he led the development of the company’s Autopilot system.
Honestly, the title of the video is pretty self-explanatory. You will build a transformer model from scratch in Python. The video focuses on training a character-level language model on a Shakespeare dataset to generate text that resembles Shakespeare’s writing.
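For a sense of what “character-level language model” means before you commit two hours, here is a toy bigram version in a few lines of Python. It is vastly simpler than the transformer Karpathy builds (no neural network, no training loop), and the text string is just a stand-in for the Shakespeare file used in the video.

```python
import random
from collections import Counter, defaultdict

# Stand-in for the Shakespeare text file used in the video.
text = "to be, or not to be, that is the question: whether tis nobler in the mind"

# "Training": count how often each character follows each character.
counts = defaultdict(Counter)
for prev, nxt in zip(text, text[1:]):
    counts[prev][nxt] += 1

def generate(start="t", length=60):
    """Generate text one character at a time by sampling from the bigram counts."""
    out = [start]
    for _ in range(length):
        followers = counts[out[-1]]
        if not followers:                   # character never seen with a successor
            break
        chars, weights = zip(*followers.items())
        out.append(random.choices(chars, weights=weights)[0])
    return "".join(out)

print(generate())   # vaguely Shakespeare-flavoured gibberish
```

Karpathy’s video replaces that bigram lookup table with a transformer trained by gradient descent, but the generation loop (predict the next character, sample it, append, repeat) stays the same.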
This is part of his “Neural Networks: Zero to Hero” series. Since this video, he has published a couple more: “Let’s build the GPT Tokenizer” and most recently a 4-hour “Let’s reproduce GPT-2 (124M)”. Apparently it takes 90 minutes and $