Machine Learning Systems Engineer

Greg Diamos

Unified LLM Training & Inference Platform

Location: San Francisco Bay Area / Remote

Type: Full-time


About ScalarLM

ScalarLM unifies vLLM, Megatron-LM, and HuggingFace for fast LLM training, inference, and self-improving agents, all via an OpenAI-compatible interface. ScalarLM builds on top of the vLLM inference engine, the Megatron-LM training framework, and the HuggingFace model hub, unifying their capabilities into a single platform. Users can easily perform LLM inference and training, and build higher-level applications such as agents with a twist: they can teach themselves new abilities via backpropagation.
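As a rough sketch of what the OpenAI-compatible interface implies for clients, a chat completion request follows the standard schema regardless of whether it hits training or inference infrastructure. The base URL and model name below are hypothetical placeholders, not ScalarLM defaults.

```python
import json

# Hypothetical ScalarLM server address; a real deployment would expose
# its own OpenAI-compatible endpoint.
BASE_URL = "http://localhost:8000/v1"

def chat_completion_body(model: str, prompt: str, max_tokens: int = 64) -> str:
    """Build the JSON body for POST {BASE_URL}/chat/completions,
    following the standard OpenAI chat-completions schema."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    })

body = json.loads(chat_completion_body("my-finetuned-model", "Hello!"))
print(sorted(body))  # ['max_tokens', 'messages', 'model']
```

Because the wire format is the standard one, any existing OpenAI client library can be pointed at such an endpoint by overriding its base URL.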

ScalarLM is inspired by the work of Seymour Roger Cray (September 28, 1925 – October 5, 1996), an American electrical engineer and supercomputer architect who designed a series of computers that were the fastest in the world for decades, and founded Cray Research, which built many of these machines. Called "the father of supercomputing", Cray has been credited with creating the supercomputer industry.

We are a fully open source project (CC0-licensed) focused on democratizing access to cutting-edge LLM infrastructure that combines training and inference in a unified platform, enabling the development of self-improving AI agents similar to DeepSeek R1.

ScalarLM is supported and maintained by TensorWave and Relational AI.


The Role

We are seeking a passionate Machine Learning Engineer who will contribute directly to the ScalarLM open source codebase as well as build LLM applications on top of it. This role is perfect for someone who wants to work at the intersection of high-performance computing, distributed systems, and cutting-edge machine learning research. You'll be working on fundamental infrastructure that enables researchers and organizations worldwide to train and deploy large language models at scale.

Key Responsibilities:

  • Contribute code and improvements to the ScalarLM open source project
  • Develop and optimize distributed training algorithms for large language models
  • Implement high-performance inference engines and optimization techniques
  • Work on integration between vLLM, Megatron-LM, and HuggingFace ecosystems
  • Build tools for seamless model training, fine-tuning, and deployment
  • Optimize performance on advanced GPU architectures
  • Collaborate with the open source community on feature development and bug fixes
  • Research and implement new techniques for self-improving AI agents

Required Qualifications

Technical Skills

  • Programming Languages: Proficiency in both C/C++ and Python
  • High Performance Computing: Deep understanding of HPC concepts including:
    • MPI (Message Passing Interface) programming and optimization
    • Bulk Synchronous Parallel (BSP) computing models
    • Multi-GPU and multi-node distributed computing
    • CUDA/ROCm programming experience preferred
  • Machine Learning Foundations:
    • Solid understanding of gradient descent and backpropagation algorithms
    • Experience with transformer architectures and ability to explain their mechanics
    • Knowledge of deep learning training techniques and their applications
    • Understanding of distributed training techniques (data parallelism, model parallelism, pipeline parallelism, large batch training, optimization)
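The data parallelism named in the list above can be sketched in plain Python: each worker computes gradients on its own shard of the batch, and an all-reduce (here, a simple average over equal-sized shards) recovers exactly the gradient a single worker would compute on the full batch. The one-parameter linear model and the data below are illustrative assumptions, not part of any ScalarLM API.

```python
# Data parallelism in miniature: shard the batch across "workers",
# compute local gradients, then average them (the all-reduce step)
# so every worker applies the same update. Equal shard sizes make the
# average exactly equal to the full-batch gradient.

def grad(w, xs, ys):
    """Gradient of mean squared error for the 1-D linear model y = w*x."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

# Single-worker baseline on the full batch.
full = grad(w, xs, ys)

# Two workers, each on half the batch.
shards = [(xs[:2], ys[:2]), (xs[2:], ys[2:])]
local = [grad(w, sx, sy) for sx, sy in shards]
averaged = sum(local) / len(local)  # the all-reduce step

print(abs(averaged - full) < 1e-12)  # True
```

Model and pipeline parallelism instead split the parameters or layers across devices; in practice all three are combined, with frameworks like Megatron-LM handling the communication.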

Research & Development

  • Publications: Experience with machine learning research and publications preferred
  • Research Skills: Ability to read, understand, and implement techniques from recent ML research papers
  • Open Source: Demonstrated commitment to open source development and community collaboration

Experience

  • 3+ years of experience in machine learning engineering or research
  • Experience with large-scale distributed training frameworks (Megatron-LM, DeepSpeed, FairScale, etc.)
  • Familiarity with inference optimization frameworks (vLLM, TensorRT, etc.)
  • Experience with containerization (Docker, Kubernetes) and cluster management
  • Background in systems programming and performance optimization

Preferred Qualifications

  • PhD or MS in Computer Science, Computer Engineering, Machine Learning, or related field
  • Experience with SLURM, Kubernetes, or other cluster orchestration systems
  • Knowledge of mixed precision training, data parallel training, and scaling laws
  • Experience with transformer architectures, PyTorch, and decoding algorithms
  • Familiarity with the high-performance GPU programming ecosystem
  • Previous contributions to major open source ML projects
  • Experience with MLOps and model deployment at scale
  • Understanding of modern attention mechanisms (multi-head attention, grouped query attention, etc.)
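The grouped-query attention mentioned above can be illustrated by its core bookkeeping: several query heads share one key/value head, which shrinks the KV cache. The head counts below are illustrative and not tied to any particular model.

```python
# Grouped-query attention (GQA) head mapping: n_q query heads share
# n_kv key/value heads. Multi-head attention is the special case
# n_kv == n_q; multi-query attention is n_kv == 1.

def kv_head_for(q_head: int, n_q: int, n_kv: int) -> int:
    """Return the KV head index serving a given query head."""
    assert n_q % n_kv == 0, "query heads must divide evenly among KV heads"
    group_size = n_q // n_kv
    return q_head // group_size

# 8 query heads grouped onto 2 KV heads:
# query heads 0-3 -> KV head 0, query heads 4-7 -> KV head 1.
mapping = [kv_head_for(q, n_q=8, n_kv=2) for q in range(8)]
print(mapping)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Reducing the number of KV heads cuts KV-cache memory proportionally, which is a key lever in inference engines such as vLLM.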

What We Offer

  • Open Source Impact: Your contributions will directly benefit the global ML research community
  • Cutting-Edge Research: Work on the latest developments in LLM training and inference
  • Collaborative Environment: Work alongside leading researchers and engineers in the field
  • Flexible Work: Remote-friendly culture with optional in-person collaboration
  • Professional Growth: Opportunities to publish research and speak at conferences
  • Competitive Compensation

Technical Environment

You'll be working with:

  • Frameworks: Megatron-LM, vLLM, HuggingFace Transformers, PyTorch
  • Infrastructure: Multi-GPU clusters, CUDA, AMD ROCm
  • Languages: Python, C/C++, CUDA/HIP
  • Tools: Docker, Kubernetes, SLURM, Git
  • Platforms: Linux HPC environments, cloud computing platforms

Application Process

To Apply:

  1. Submit your resume highlighting relevant HPC and ML experience
  2. Provide links to your GitHub profile and any relevant open source contributions
  3. Share examples of your work with distributed computing or large-scale ML systems

Technical Interview Process:

  • Initial screening focusing on HPC and ML fundamentals
  • Technical deep-dive on distributed systems and parallel computing
  • Code review session examining ML algorithm implementation
  • System design discussion for large-scale training infrastructure

Join Us

Help us build the future of open source AI infrastructure. At ScalarLM, you'll contribute to technology that democratizes access to cutting-edge LLM capabilities while working with some of the brightest minds in high-performance computing and machine learning.

Ready to make an impact on the future of AI? We'd love to hear from you.


ScalarLM is an equal opportunity employer committed to diversity and inclusion. We welcome applications from all qualified candidates regardless of race, gender, age, religion, sexual orientation, or any other protected status.