Our Mission

Our mission is to solve the most important and fundamental challenges in AI and Robotics to enable future generations of intelligent machines that will help us all live better lives.

Machine Learning Operations (MLOps) Engineers build infrastructure that supports the entire lifecycle of Machine Learning (ML) projects, from development through scaling to deployment. If you have a passion for building the foundation that enables robotics research and engineering, you will want to join us!

What You Will Do

Design, develop, and maintain company-wide platforms and tooling that utilize Kubernetes infrastructure to enable machine learning and data processing applications

Enable self-service access to ML compute on our on-prem and cloud clusters, including support for job scheduling, workload scalability and workload fault tolerance

Enhance observability across ML applications through integrations with tools and services such as Fluentd, Prometheus, Grafana and Datadog

Integrate ML applications with experiment tracking and management services like Weights & Biases

Elevate code quality and champion best practices in our engineering processes

Collaborate with Machine Learning Engineers, Data Engineers, DevOps Engineers and researchers to build scalable solutions that improve engineering and research velocity

What You Will Bring

BS or MS in Computer Science, Engineering, or equivalent

3+ years of experience in an MLOps, DevOps, ML Engineering or software engineering role

Strong hands-on experience deploying and managing applications running on Kubernetes

Experience developing MLOps platforms to manage the lifecycle of ML experiments, including one or more of data and artifact management, reproducibility, fault tolerance, experiment tracking and model serving

Experience with Docker and Python environment management tools such as pip, poetry, uv or similar

Proficient in software practices such as version control (Git), CI/CD (GitHub Actions, Argo CD) and Infrastructure as Code (Terraform)

Extra Skills We Value

Experience with Kueue, or similar job scheduling mechanisms

Experience with workflow orchestration tools such as Airflow, Metaflow, Argo Workflows or similar

Hands-on experience deploying and managing cloud infrastructure on platforms like GCP and AWS

Experience with hybrid-cloud compute and data environments

Experience with Ray, PyTorch Lightning or similar scalable AI/ML platforms

Experience with application and system logging using tools and services such as Fluentd, Prometheus, Grafana, Datadog or similar

Experience with the Bazel build tool or similar

Experience with ML model serving frameworks such as TorchServe, ONNX Runtime or similar

Experience working with research teams in an academic or industrial environment

We provide equal employment opportunities to all employees and applicants for employment and prohibit discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.

