I'm interested in AI, with a focus on model inference and post-training. Follow me on Twitter @sumo43_ for updates and discussions about the latest in AI research and development.
- Demo: Object Detection Demo on X
- Description:
A fast PaliGemma inference engine running on an RTX 4090. I built an object detection demo using a 224px model that runs in real time at 16fps.
- Description:
RobotArena is an ELO-based 🤖 robot-action model benchmark that lets you test and evaluate models directly in your browser. The project is a collaboration with SkunkworksAI.
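For context, an ELO-based benchmark scores models from pairwise head-to-head votes. The sketch below shows the standard ELO update rule; it is illustrative only and is not RobotArena's actual scoring code (the `k` factor and starting ratings are assumptions).

```python
# Minimal sketch of an ELO update for pairwise model comparisons, as used
# by arena-style benchmarks. Illustrative only; k=32 is a common default,
# not a RobotArena-specific value.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the ELO model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one head-to-head vote."""
    ea = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (score_a - ea)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - ea))
    return new_a, new_b
```

With two evenly rated models (both at 1000), a single win moves the winner up by k/2 and the loser down by the same amount, so ratings stay zero-sum across the pool.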
Role: LLM Inference Engineer
Overview:
At Brium AI, I worked on accelerating inference for large language models across diverse GPU architectures. My role focused on optimizing the inference stack—from runtime systems to compilers—for long-context LLM applications. This work led to significant improvements in throughput and latency, particularly on AMD’s MI210 and MI300 GPUs.
Read more: Brium AI Blog Post
Role: ML Engineer
Overview:
At RunPod, I built an in-house inference engine that serves low-latency workloads using speculative decoding. I also collaborated closely with customers to deploy AI models effectively on the RunPod stack.
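For readers unfamiliar with speculative decoding: a small draft model proposes several tokens cheaply, and the large target model verifies them, keeping the longest agreeing prefix. The toy sketch below illustrates the greedy variant of that idea only; it is not RunPod's engine, the function names are hypothetical, and real implementations verify all draft tokens in a single batched target forward pass (that batching is where the speedup comes from) and accept or reject probabilistically rather than by exact match.

```python
# Toy sketch of greedy speculative decoding (illustrative, not a real engine).
from typing import Callable, List

def speculative_step(
    prefix: List[int],
    draft_next: Callable[[List[int]], int],    # cheap draft model: next token
    target_next: Callable[[List[int]], int],   # expensive target model: next token
    k: int = 4,
) -> List[int]:
    """Extend `prefix` using k draft proposals verified by the target model."""
    # 1) Draft model proposes k tokens autoregressively.
    proposed = []
    ctx = list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        proposed.append(t)
        ctx.append(t)
    # 2) Target model checks each proposal; keep the agreeing prefix, then
    #    emit the target's own token at the first disagreement. (A real
    #    implementation runs these k checks as one batched forward pass.)
    out = list(prefix)
    for t in proposed:
        expected = target_next(out)
        out.append(expected)
        if expected != t:
            break
    return out
```

When the draft model agrees with the target, one step emits up to k tokens for a single (batched) target pass; on total disagreement it still emits one correct token, so output quality matches target-only decoding.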
- Twitter: @sumo43_
- Email: (Add your email here if you'd like to be contacted directly)
Feel free to reach out if you're interested in collaborating or just want to chat about AI, machine learning, and all things tech!