AI Inference Optimization Engineering

Name: AI Inference Optimization Engineering
Brand: Independently published
SKU: 9798199720021
Price: 3923 HUF
Availability: InStock

Quantization, Speculative Decoding, and Hardware-Specific LLM Deployment

Author: ChatVariety Team

Language: English

Binding: Paperback

Publisher: Independently published

Availability: Restock expected

Ships: 07. 06. 2026

3 923 Ft

About the book

Author

ChatVariety Team

Language

English

Binding

Book - Paperback

Published

2026

Pages

EAN

9798199720021

Enbook ID

52770465

Publisher

Independently published

Weight

142

Dimensions

152 x 229 x 5

Full description

Slash LLM Deployment Costs and Latency

Deploying Large Language Models (LLMs) in production is a massive economic and engineering hurdle. AI Inference Optimization Engineering is your comprehensive, hands-on guide to mastering the full stack of modern LLM optimization techniques. From memory-bandwidth solutions to hardware-specific compilation, this book bridges the gap between research-level models and enterprise-grade execution.

What you will master inside this book:

Hardware-Aware Optimization: Dive deep into KV cache mechanics, autoregressive decoding, and GPU memory hierarchies to eliminate latency bottlenecks.
State-of-the-Art Quantization: Apply GPTQ and AWQ quantization, along with quantized models in the GGUF file format, to shrink large neural networks with minimal loss of model accuracy.
Advanced Acceleration Methods: Implement speculative decoding (draft-model approaches and methods such as Medusa and EAGLE), PagedAttention, and FlashAttention to boost throughput by up to 2-3x.
Production-Grade Serving: Build ultra-low-latency deployment infrastructures using vLLM, Triton Inference Server, and continuous batching.
Cross-Platform Deployment: Optimize models for specific target hardware, including NVIDIA H100 (TensorRT-LLM), Apple Silicon (llama.cpp/Metal), and Qualcomm mobile/edge accelerators.

Whether you are an ML infrastructure engineer, an AI platform architect, or a technical leader looking to scale LLMs cost-effectively, this book provides the production-ready code, equations, and architectural patterns you need to build hyper-efficient AI pipelines.
