Maximizing utility of GPUs for AI Inference
13 Nov 2025
Seminar Theatre 10
The objective of this seminar is to teach attendees how to host AI models in the cloud cost-effectively. nCompass has built a platform that makes the whole process simple, but this seminar will break down the details of how we do this, so that our product's utility is more transparent and audience members who are deep in the technical details can benefit as well. We will dive into all aspects of the inference stack: how to interpret the OpenAI API correctly so you can set up your own AI inference stack, and the steps of the inference process from the moment a request reaches the server to its execution on the GPUs. This will include a breakdown of what GPU kernels are, how a GPU works, and what we have done to optimize this process. Finally, I will focus on which performance metrics are important to monitor for various use cases and how to ensure performance is maximized.
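As a small taste of the API and metrics topics above, here is a minimal sketch, assuming an OpenAI-compatible streaming chat-completions endpoint, of how a request payload is shaped and how two common serving metrics, time-to-first-token (TTFT) and decode throughput, can be derived from token arrival timestamps. The model name and timestamp values are illustrative placeholders, not nCompass specifics.

```python
def build_chat_request(model: str, prompt: str, stream: bool = True) -> dict:
    # Minimal OpenAI-style /v1/chat/completions payload.
    # Streaming the response is what makes TTFT observable client-side.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

def compute_metrics(request_start: float, token_times: list[float]) -> dict:
    # Time-to-first-token: latency until the first streamed token arrives
    # (dominated by queueing plus the prompt-processing "prefill" phase).
    ttft = token_times[0] - request_start
    # Decode throughput: remaining tokens over the remaining wall time.
    decode_s = token_times[-1] - token_times[0]
    tps = (len(token_times) - 1) / decode_s if decode_s > 0 else float("inf")
    return {"ttft_s": ttft, "decode_tokens_per_s": tps}

# Synthetic timestamps: request sent at t=0, first token at 0.25 s,
# then one token every 0.05 s (i.e. 20 tokens/s decode throughput).
times = [0.25 + 0.05 * i for i in range(40)]
metrics = compute_metrics(0.0, times)
print(metrics["ttft_s"])                       # 0.25
print(round(metrics["decode_tokens_per_s"]))   # 20
```

In practice the timestamps would come from reading the server-sent-event stream of a real deployment; the point of the sketch is that different use cases weight these two numbers differently (interactive chat cares about TTFT, batch workloads about throughput).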