Unleashing AI for real-time, scalable applications

Sharon Zhou, co-founder and CEO of Lamini, introduces the idea of using AI in real-time, scalable applications. She outlines the challenges of deploying large language models (LLMs) in real-time settings, chiefly latency and cost, and walks through the levers available for managing them: effort, context window, streaming, caching, and chains of LLM calls. She also examines the trade-off between effort and output quality when reducing latency, and concludes by highlighting the potential of LLMs for building high-quality, low-latency, cost-effective applications.
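As a rough illustration of two of the latency levers mentioned above, the sketch below shows streaming (so users see tokens as they arrive rather than waiting for the full response) and in-memory caching of repeated prompts (a cache hit skips the LLM entirely, eliminating both latency and cost for that call). It assumes the OpenAI Python client purely for concreteness; the model name and prompts are illustrative placeholders, not details from the talk or Lamini's own API.

```python
# Sketch of two latency levers: streaming and caching of LLM calls.
# Assumes the OpenAI Python client; model and prompts are placeholders.
from functools import lru_cache

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def stream_completion(prompt: str, model: str = "gpt-4o-mini") -> str:
    """Stream tokens as they arrive so users see output immediately,
    lowering perceived latency versus waiting for the full response."""
    chunks = []
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)  # render tokens incrementally
        chunks.append(delta)
    print()
    return "".join(chunks)


@lru_cache(maxsize=1024)
def cached_completion(prompt: str, model: str = "gpt-4o-mini") -> str:
    """Cache exact-match prompts in memory so a repeated call is served
    instantly, with no added latency or API cost."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    stream_completion("Summarize the trade-offs of using LLMs in real-time apps.")
    cached_completion("What is a context window?")  # first call hits the API
    cached_completion("What is a context window?")  # second call served from cache
```

In a production system the exact-match cache would typically be replaced by a shared store (or a semantic cache keyed on embeddings), but the principle is the same: avoid paying the model's latency and cost for answers you already have.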