How We Power the Largest AI Deployments on the Planet: Running Virtual Clusters at Scale - Brandon Jacobs, CoreWeave & Lukas Gentele, Loft Labs
Running and managing a large number of Kubernetes clusters on bare metal poses significant challenges, from security to GPU provisioning to scalability. Specialized cloud provider CoreWeave experienced these firsthand, operating 3,000+ Kubernetes clusters on top of 5,000 bare-metal nodes with large numbers of GPUs to power modern AI applications at scale. In this session, we’ll dive into these challenges and how CoreWeave partnered with Loft Labs, the maintainers of vcluster, to create a serverless Kubernetes experience for numerous companies running AI workloads at scale. The session covers the pitfalls, design choices, and architectural challenges the teams have dealt with over 3 years of evolving CoreWeave’s serverless Kubernetes offering, including:
- Secure Isolation Of Tenants On A Shared Infrastructure
- Challenges In Achieving 10-Second Autoscaling
- On-Demand Cluster & Compute Provisioning For Tenants
- Day 2 Operations & Managing A Fleet Of Clusters At Scale
Nov 13, 2023