I'm interested in AI, with a focus on model inference and post-training. Follow me on Twitter @sumo43_ for updates and discussions about the latest in AI research and development.
- Demo: Object Detection Demo on X
- Description:
A fast PaliGemma inference engine running on an RTX 4090. I built an object detection demo using a 224px model that runs in real time at 16fps.
- Description:
RobotArena is an ELO-based 🤖 robot-action model benchmark that lets you test and evaluate models directly in your browser. The project is a collaboration with SkunkworksAI.
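For context, an ELO-based benchmark scores models from pairwise head-to-head votes. The sketch below shows the standard ELO update rule; it is illustrative only and is not RobotArena's actual scoring code (the `k` factor and starting ratings are assumptions).

```python
# Minimal sketch of an ELO update for pairwise model comparisons, as used
# by arena-style benchmarks. Illustrative only; k=32 is a common default,
# not a RobotArena-specific value.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the ELO model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one head-to-head vote."""
    ea = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (score_a - ea)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - ea))
    return new_a, new_b
```

With two evenly rated models (both at 1000), a single win moves the winner up by k/2 and the loser down by the same amount, so ratings stay zero-sum across the pool.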
Role: LLM Inference Engineer
Overview:
At Brium AI, I worked on accelerating inference for large language models across diverse GPU architectures. My role focused on optimizing the inference stack—from runtime systems to compilers—for long-context LLM applications. This work led to significant improvements in throughput and latency, particularly on AMD’s MI210 and MI300 GPUs.
Read more: Brium AI Blog Post
Role: ML Engineer
Overview:
At RunPod, I built an in-house inference engine that serves low-latency workloads using speculative decoding. I also collaborated closely with customers to deploy AI models effectively on the RunPod stack.
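For readers unfamiliar with speculative decoding: a small draft model proposes several tokens cheaply, and the large target model verifies them, keeping the longest agreeing prefix. The toy sketch below illustrates the greedy variant of that idea only; it is not RunPod's engine, the function names are hypothetical, and real implementations verify all draft tokens in a single batched target forward pass (that batching is where the speedup comes from) and accept or reject probabilistically rather than by exact match.

```python
# Toy sketch of greedy speculative decoding (illustrative, not a real engine).
from typing import Callable, List

def speculative_step(
    prefix: List[int],
    draft_next: Callable[[List[int]], int],    # cheap draft model: next token
    target_next: Callable[[List[int]], int],   # expensive target model: next token
    k: int = 4,
) -> List[int]:
    """Extend `prefix` using k draft proposals verified by the target model."""
    # 1) Draft model proposes k tokens autoregressively.
    proposed = []
    ctx = list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        proposed.append(t)
        ctx.append(t)
    # 2) Target model checks each proposal; keep the agreeing prefix, then
    #    emit the target's own token at the first disagreement. (A real
    #    implementation runs these k checks as one batched forward pass.)
    out = list(prefix)
    for t in proposed:
        expected = target_next(out)
        out.append(expected)
        if expected != t:
            break
    return out
```

When the draft model agrees with the target, one step emits up to k tokens for a single (batched) target pass; on total disagreement it still emits one correct token, so output quality matches target-only decoding.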
- Twitter: @sumo43_
- Email: (Add your email here if you'd like to be contacted directly)
Feel free to reach out if you're interested in collaborating or just want to chat about AI, machine learning, and all things tech!