<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Optimization on Sange Mehrab</title>
    <link>https://anwarshamim01.github.io/Sang_e_Mehrab/tags/optimization/</link>
    <description>Recent content in Optimization on Sange Mehrab</description>
    <generator>Hugo</generator>
    <language>en-us</language>
    <lastBuildDate>Fri, 01 May 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://anwarshamim01.github.io/Sang_e_Mehrab/tags/optimization/index.xml" rel="self" type="application/rss+xml"/>
    <item>
      <title>1.4 Mathematics of Large Language Models: Training, Inference, Attention, Scaling, and Alignment</title>
      <link>https://anwarshamim01.github.io/Sang_e_Mehrab/courses/course/chapter-01/section-04/</link>
      <pubDate>Fri, 01 May 2026 00:00:00 +0000</pubDate>
      <guid>https://anwarshamim01.github.io/Sang_e_Mehrab/courses/course/chapter-01/section-04/</guid>
      <description>A beginner-to-advanced mathematical introduction to LLMs, covering autoregressive language modeling, tokenization, vector embeddings, positional encodings, transformer blocks, attention, softmax, cross-entropy, maximum likelihood, backpropagation, AdamW, scaling laws, compute-optimal training, MoE, efficient attention, KV caching, speculative decoding, quantization, LoRA, RLHF, DPO, PPO, and inference-time reasoning.</description>
    </item>
  </channel>
</rss>