<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>infrastructure on David An</title>
    <link>https://davidan.dev/tags/infrastructure/</link>
    <description>Recent content in infrastructure on David An</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Sun, 14 Dec 2025 00:00:00 +0000</lastBuildDate><atom:link href="https://davidan.dev/tags/infrastructure/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Local-first GPU Cluster with nvkind and Time Splitting</title>
      <link>https://davidan.dev/posts/nvkind/</link>
      <pubDate>Sun, 14 Dec 2025 00:00:00 +0000</pubDate>
      
      <guid>https://davidan.dev/posts/nvkind/</guid>
      <description>You have a shiny new GPU and want to start experimenting with it by running some sample experiments in Kubernetes, but where do you start? In this short tutorial, we go over how to use nvkind and the gpu-operator to start running some basic experiments on your new GPU. We assume that the reader already has Docker, golang, and the relevant drivers and tools (nvidia-ctk, nvidia-smi, etc.) installed.</description>
    </item>
    
    <item>
      <title>Distributed Inference for Fun and Profit</title>
      <link>https://davidan.dev/posts/dif/</link>
      <pubDate>Sat, 01 Nov 2025 00:00:00 +0000</pubDate>
      
      <guid>https://davidan.dev/posts/dif/</guid>
      <description>You ever just wonder how large models are served at scale? Or how to actually go from query to answer? Over the course of this article, we will look at several approaches to inference and explore their tradeoffs from a technical perspective.
We assume that the reader has basic knowledge of ML concepts and how Transformers work. Additionally, all of the work here is done on a single Nvidia RTX 3090 GPU with the respective drivers installed (nvidia-smi, nvidia-ctk, etc.</description>
    </item>
    
    <item>
      <title>A Dive into GPU Math</title>
      <link>https://davidan.dev/posts/gpumath/</link>
      <pubDate>Wed, 15 Oct 2025 00:00:00 +0000</pubDate>
      
      <guid>https://davidan.dev/posts/gpumath/</guid>
      <description>You ever wonder what goes on when you ask ChatGPT a question and how the answer is served? Or what people mean when they talk about using an A100 to train a model and the time it takes? Or even about the levels of abstraction between the model and the hardware? This article aims to shed light on many of the concepts related to GPUs and the math behind them.
We assume that the reader has a basic understanding of how recent LLM technologies work.</description>
    </item>
    
    <item>
      <title>An Intro to Kubernetes Security</title>
      <link>https://davidan.dev/posts/k8s/</link>
      <pubDate>Sat, 06 Jul 2024 00:00:00 +0000</pubDate>
      
      <guid>https://davidan.dev/posts/k8s/</guid>
      <description>Kubernetes is now widely used for managing containerized applications. As more organizations adopt it, understanding its security aspects becomes crucial. This paper examines the key security challenges in Kubernetes and suggests ways to address them.
Basic Concepts of Kubernetes Security: Kubernetes operates across many computers, often in different locations. This spread-out nature makes security more complex. Kubernetes also constantly creates and removes small units of work called pods. This constant change means that old security methods designed for unchanging systems don&amp;rsquo;t work well.</description>
    </item>
    
    <item>
      <title>Horizontal and Vertical Database Scaling</title>
      <link>https://davidan.dev/posts/db/</link>
      <pubDate>Wed, 06 Mar 2024 00:00:00 +0000</pubDate>
      
      <guid>https://davidan.dev/posts/db/</guid>
      <description>In today&amp;rsquo;s day and age, organizations have more data than ever. On a smaller scale, developers are building startups serving large amounts of data to customers. All of these use cases require both the efficient storage and transmission of data. If a backend database goes down, a service can&amp;rsquo;t be used, customers can&amp;rsquo;t reach their app, and a myriad of issues appear.
To accommodate this ever-growing thirst for data, many techniques have been developed to help address these issues.</description>
    </item>
    
  </channel>
</rss>
