<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>llm on David An</title>
    <link>https://davidan.dev/tags/llm/</link>
    <description>Recent content in llm on David An</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Mon, 12 Jan 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://davidan.dev/tags/llm/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Fine Tuning Llama 3.2B with Unsloth</title>
      <link>https://davidan.dev/posts/ftsql/</link>
      <pubDate>Mon, 12 Jan 2026 00:00:00 +0000</pubDate>
      
      <guid>https://davidan.dev/posts/ftsql/</guid>
      <description>In this article, we will fine-tune the Llama 3.2B model with Unsloth on the Spider 1.0 SQL dataset. The goal is to improve the SQL capabilities of a general-purpose Llama 3.2B model.
Prerequisites: Before we get started, we assume that the reader has access to a GPU they can use for training and a working Python setup.</description>
    </item>
    
    <item>
      <title>Distributed Inference for Fun and Profit</title>
      <link>https://davidan.dev/posts/dif/</link>
      <pubDate>Sat, 01 Nov 2025 00:00:00 +0000</pubDate>
      
      <guid>https://davidan.dev/posts/dif/</guid>
      <description>You ever just wonder how large models are served at scale? Or how to actually go from query to answer? Over the course of this article, we will look at approaches to inference and explore their tradeoffs from a technical perspective.
We assume that the reader has basic knowledge of ML concepts and how Transformers work. Additionally, all of the work here is done on a single Nvidia RTX 3090 GPU with the respective drivers installed (nvidia-smi, nvidia-ctk, etc.).</description>
    </item>
    
    <item>
      <title>A Dive into GPU Math</title>
      <link>https://davidan.dev/posts/gpumath/</link>
      <pubDate>Wed, 15 Oct 2025 00:00:00 +0000</pubDate>
      
      <guid>https://davidan.dev/posts/gpumath/</guid>
      <description>You ever wonder what goes on when you ask ChatGPT a question and how that is served? Or what people mean when they talk about using an A100 to train a model and the time it takes? Or even about the levels of abstraction between the model and the hardware? This article aims to shed light on many of the concepts related to GPUs and the math behind them.
We assume that the reader has a basic understanding of how recent LLM technologies work.</description>
    </item>
    
    <item>
      <title>Tokenization and Embeddings: A Primer</title>
      <link>https://davidan.dev/posts/tokenization/</link>
      <pubDate>Wed, 01 Oct 2025 00:00:00 +0000</pubDate>
      
      <guid>https://davidan.dev/posts/tokenization/</guid>
      <description>Lately, all we have heard about is tokenization and embeddings and the role they play in the greater LLM and AI ecosystem. These two concepts are among the most fundamental in language modeling and remain the foundation of the technology we interact with on a daily basis. In this article, we will cover some of the basics of tokenizing and embedding sequences of text, along with their nuances.</description>
    </item>
    
  </channel>
</rss>
