Efficiently Serving LLMs

DeepLearning.AI

$49

ENROLL NOW

Course Overview

What You'll Learn

Join our new short course, Efficiently Serving Large Language Models, to build a ground-up understanding of how to serve LLM applications from Travis Addair, CTO at Predibase.

Whether you’re ready to launch your own application or just getting started building it, the topics you’ll explore in this course will deepen your foundational knowledge of how LLMs work, and help you better understand the performance trade-offs you must consider when building LLM applications that will serve large numbers of users. You’ll walk through the most important optimizations that allow LLM vendors to efficiently serve models to many customers, including strategies for working with multiple fine-tuned models at once.

In this course, you will:

1. Learn how auto-regressive large language models generate text one token at a time.
2. Implement the foundational elements of a modern LLM inference stack in code, including KV caching, continuous batching, and model quantization, and benchmark their impacts on inference throughput and latency.
3. Explore the details of how LoRA adapters work, and learn how batching techniques allow different LoRA adapters to be served to multiple customers simultaneously.
4. Get hands-on with Predibase’s LoRAX framework inference server to see these optimization techniques implemented in a real-world LLM inference server.

Knowing more about how LLM servers operate under the hood will greatly enhance your understanding of the options you have to increase the performance and efficiency of your LLM-powered applications.
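As a rough illustration of the first point above (generating text one token at a time), here is a minimal sketch. It is not taken from the course: the lookup table NEXT_TOKEN_SCORES and the helpers next_token and generate are hypothetical stand-ins for a real model, which would score every vocabulary token from the full context rather than from the last token alone.

```python
# Toy illustration only: a hand-written table stands in for a real LLM.
# All names and scores here are made up for the example.
NEXT_TOKEN_SCORES = {
    "<s>": {"the": 0.9, "a": 0.1},
    "the": {"model": 0.7, "server": 0.3},
    "model": {"generates": 0.8, "</s>": 0.2},
    "generates": {"tokens": 0.9, "</s>": 0.1},
    "tokens": {"</s>": 1.0},
}


def next_token(context):
    """Greedy decoding step: return the highest-scoring next token."""
    # The toy "model" conditions only on the last token of the context.
    scores = NEXT_TOKEN_SCORES[context[-1]]
    return max(scores, key=scores.get)


def generate(prompt, max_new_tokens=10):
    """Extend the prompt one token at a time until </s> or the length limit."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tok = next_token(tokens)
        if tok == "</s>":  # end-of-sequence token stops generation
            break
        tokens.append(tok)  # the new token becomes part of the next step's input
    return tokens


print(generate(["<s>"]))  # ['<s>', 'the', 'model', 'generates', 'tokens']
```

Each loop iteration depends on the output of the previous one, which is why generation is sequential and why serving optimizations such as KV caching and batching matter for throughput and latency.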

Course FAQs

Is this an accredited online course?

Accreditation for 'Efficiently Serving LLMs' is determined by the provider, DeepLearning.AI. For online college courses or degree programs, we strongly recommend you verify the accreditation status directly on the provider's website to ensure it meets your requirements.

Can this course be used for continuing education credits?

Many of the courses listed on our platform are suitable for professional continuing education. However, acceptance for credit varies by state and licensing board. Please confirm with your board and DeepLearning.AI that this specific course qualifies.

How do I enroll in this online school program?

To enroll, click the 'ENROLL NOW' button on this page. You will be taken to the official page for 'Efficiently Serving LLMs' on the DeepLearning.AI online class platform, where you can complete your registration.
