Hi Marcelo,
Been going down a rabbit hole on your vLLM paged-attention PRs and the scheduler in your agent orchestration repo. The way you handled KV cache fragmentation across variable-length sequences is genuinely clever, and the tool-routing pattern in that 4k-star repo is unlike anything I've seen.
Would love to hear the backstory on those design choices. Down for a quick chat this week?
- Referencing their specific vLLM PRs and KV cache handling.
- Connecting their agent orchestration architecture to our stack requirements.