Machine Learning Systems Engineer

Greg Diamos

Unified LLM Training & Inference Platform

Location: San Francisco Bay Area / Remote

Type: Full-time


About ScalarLM

ScalarLM unifies vLLM, Megatron-LM, and HuggingFace for fast LLM training, inference, and self-improving agents, all via an OpenAI-compatible interface. ScalarLM builds on top of the vLLM inference engine, the Megatron-LM training framework, and the HuggingFace model hub, unifying their capabilities into a single platform. Users can easily perform LLM inference and training, and build higher-level applications such as agents with a twist: they can teach themselves new abilities via backpropagation.
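As a rough sketch of what the OpenAI-compatible interface implies for clients, a chat completion request follows the standard schema regardless of whether it hits training or inference infrastructure. The base URL and model name below are hypothetical placeholders, not ScalarLM defaults.

```python
import json

# Hypothetical ScalarLM server address; a real deployment would expose
# its own OpenAI-compatible endpoint.
BASE_URL = "http://localhost:8000/v1"

def chat_completion_body(model: str, prompt: str, max_tokens: int = 64) -> str:
    """Build the JSON body for POST {BASE_URL}/chat/completions,
    following the standard OpenAI chat-completions schema."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    })

body = json.loads(chat_completion_body("my-finetuned-model", "Hello!"))
print(sorted(body))  # ['max_tokens', 'messages', 'model']
```

Because the wire format is the standard one, any existing OpenAI client library can be pointed at such an endpoint by overriding its base URL.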

ScalarLM is inspired by the work of Seymour Roger Cray (September 28, 1925 – October 5, 1996), an American electrical engineer and supercomputer architect who designed a series of computers that were the fastest in the world for decades, and founded Cray Research, which built many of these machines. Called "the father of supercomputing", Cray has been credited with creating the supercomputer industry.

We are a fully open source project (CC0-licensed) focused on democratizing access to cutting-edge LLM infrastructure that combines training and inference in a unified platform, enabling the development of self-improving AI agents similar to DeepSeek R1.

ScalarLM is supported and maintained by TensorWave and Relational AI.


The Role

We are seeking a passionate Machine Learning Engineer who will contribute directly to the ScalarLM open source codebase as well as build LLM applications on top of it. This role is perfect for someone who wants to work at the intersection of high-performance computing, distributed systems, and cutting-edge machine learning research. You'll be working on fundamental infrastructure that enables researchers and organizations worldwide to train and deploy large language models at scale.

Key Responsibilities:

  • Contribute code and improvements to the ScalarLM open source project
  • Develop and optimize distributed training algorithms for large language models
  • Implement high-performance inference engines and optimization techniques
  • Work on integration between vLLM, Megatron-LM, and HuggingFace ecosystems
  • Build tools for seamless model training, fine-tuning, and deployment
  • Optimize performance on advanced GPU architectures
  • Collaborate with the open source community on feature development and bug fixes
  • Research and implement new techniques for self-improving AI agents

Required Qualifications

Technical Skills

  • Programming Languages: Proficiency in both C/C++ and Python
  • High Performance Computing: Deep understanding of HPC concepts including:
    • MPI (Message Passing Interface) programming and optimization
    • Bulk Synchronous Parallel (BSP) computing models
    • Multi-GPU and multi-node distributed computing
    • CUDA/ROCm programming experience preferred
  • Machine Learning Foundations:
    • Solid understanding of gradient descent and backpropagation algorithms
    • Experience with transformer architectures and ability to explain their mechanics
    • Knowledge of deep learning training techniques and their applications
    • Understanding of distributed training techniques (data parallelism, model parallelism, pipeline parallelism, large batch training, optimization)
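The data parallelism named in the list above can be sketched in plain Python: each worker computes gradients on its own shard of the batch, and an all-reduce (here, a simple average over equal-sized shards) recovers exactly the gradient a single worker would compute on the full batch. The one-parameter linear model and the data below are illustrative assumptions, not part of any ScalarLM API.

```python
# Data parallelism in miniature: shard the batch across "workers",
# compute local gradients, then average them (the all-reduce step)
# so every worker applies the same update. Equal shard sizes make the
# average exactly equal to the full-batch gradient.

def grad(w, xs, ys):
    """Gradient of mean squared error for the 1-D linear model y = w*x."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

# Single-worker baseline on the full batch.
full = grad(w, xs, ys)

# Two workers, each on half the batch.
shards = [(xs[:2], ys[:2]), (xs[2:], ys[2:])]
local = [grad(w, sx, sy) for sx, sy in shards]
averaged = sum(local) / len(local)  # the all-reduce step

print(abs(averaged - full) < 1e-12)  # True
```

Model and pipeline parallelism instead split the parameters or layers across devices; in practice all three are combined, with frameworks like Megatron-LM handling the communication.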

Research & Development

  • Publications: Experience with machine learning research and publications preferred
  • Research Skills: Ability to read, understand, and implement techniques from recent ML research papers
  • Open Source: Demonstrated commitment to open source development and community collaboration

Experience

  • 3+ years of experience in machine learning engineering or research
  • Experience with large-scale distributed training frameworks (Megatron-LM, DeepSpeed, FairScale, etc.)
  • Familiarity with inference optimization frameworks (vLLM, TensorRT, etc.)
  • Experience with containerization (Docker, Kubernetes) and cluster management
  • Background in systems programming and performance optimization

Preferred Qualifications

  • PhD or MS in Computer Science, Computer Engineering, Machine Learning, or related field
  • Experience with SLURM, Kubernetes, or other cluster orchestration systems
  • Knowledge of mixed precision training, data parallel training, and scaling laws
  • Experience with transformer architectures, PyTorch, and decoding algorithms
  • Familiarity with the high-performance GPU programming ecosystem
  • Previous contributions to major open source ML projects
  • Experience with MLOps and model deployment at scale
  • Understanding of modern attention mechanisms (multi-head attention, grouped query attention, etc.)
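The grouped-query attention mentioned above can be illustrated by its core bookkeeping: several query heads share one key/value head, which shrinks the KV cache. The head counts below are illustrative and not tied to any particular model.

```python
# Grouped-query attention (GQA) head mapping: n_q query heads share
# n_kv key/value heads. Multi-head attention is the special case
# n_kv == n_q; multi-query attention is n_kv == 1.

def kv_head_for(q_head: int, n_q: int, n_kv: int) -> int:
    """Return the KV head index serving a given query head."""
    assert n_q % n_kv == 0, "query heads must divide evenly among KV heads"
    group_size = n_q // n_kv
    return q_head // group_size

# 8 query heads grouped onto 2 KV heads:
# query heads 0-3 -> KV head 0, query heads 4-7 -> KV head 1.
mapping = [kv_head_for(q, n_q=8, n_kv=2) for q in range(8)]
print(mapping)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Reducing the number of KV heads cuts KV-cache memory proportionally, which is a key lever in inference engines such as vLLM.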

What We Offer

  • Open Source Impact: Your contributions will directly benefit the global ML research community
  • Cutting-Edge Research: Work on the latest developments in LLM training and inference
  • Collaborative Environment: Work alongside leading researchers and engineers in the field
  • Flexible Work: Remote-friendly culture with optional in-person collaboration
  • Professional Growth: Opportunities to publish research and speak at conferences
  • Competitive Compensation

Technical Environment

You'll be working with:

  • Frameworks: Megatron-LM, vLLM, HuggingFace Transformers, PyTorch
  • Infrastructure: Multi-GPU clusters, CUDA, AMD ROCm
  • Languages: Python, C/C++, CUDA/HIP
  • Tools: Docker, Kubernetes, SLURM, Git
  • Platforms: Linux HPC environments, cloud computing platforms

Application Process

To Apply:

  1. Submit your resume highlighting relevant HPC and ML experience
  2. Provide links to your GitHub profile and any relevant open source contributions
  3. Share examples of your work with distributed computing or large-scale ML systems

Technical Interview Process:

  • Initial screening focusing on HPC and ML fundamentals
  • Technical deep-dive on distributed systems and parallel computing
  • Code review session examining ML algorithm implementation
  • System design discussion for large-scale training infrastructure

Join Us

Help us build the future of open source AI infrastructure. At ScalarLM, you'll contribute to technology that democratizes access to cutting-edge LLM capabilities while working with some of the brightest minds in high-performance computing and machine learning.

Ready to make an impact on the future of AI? We'd love to hear from you.


ScalarLM is an equal opportunity employer committed to diversity and inclusion. We welcome applications from all qualified candidates regardless of race, gender, age, religion, sexual orientation, or any other protected status.