About Sesame

Sesame believes in a future where computers are lifelike, with the ability to see, hear, and collaborate with us in ways that feel natural and human. With this vision, we're designing a new kind of computer, focused on making voice companions part of our daily lives. Our team brings together founders from Oculus and Ubiquity6, alongside proven leaders from Meta, Google, and Apple, with deep expertise spanning hardware and software. Join us in shaping a future where computers truly come alive.

About the role

We are seeking an experienced Machine Learning Model Serving Engineer to join our team. This role focuses on optimizing and deploying scalable machine learning models in production, particularly large language models (LLMs), text-to-speech (TTS) models, and speaker recognition models.

Responsibilities:

Optimize and deploy real-time, scalable ML models in production. Leverage the latest techniques to squeeze as much throughput and speed as possible out of cutting-edge model architectures.
Work with Kubernetes, Ray, and Torch to improve model serving infrastructure.
Manage deployments on Google Cloud Platform (GCP) using NVIDIA H100 GPUs.
Collaborate with ML engineers and infrastructure teams to ensure performance and reliability.
Conduct bottleneck analysis and systems performance tuning for inference workloads.

Required qualifications:

Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
Deep experience in deploying and managing scalable machine learning models using modern model-serving approaches.
Strong knowledge of cloud platforms (GCP preferred).
Experience with performance analysis and optimization, including profiling latency, throughput, and memory usage in ML inference.

Preferred qualifications:

PyTorch experience (strongly preferred but not strictly required).
Deep experience with Kubernetes, Ray, and Torch.
Experience with building large-scale Kubernetes infrastructure.
Proficiency in infrastructure as code (IaC) for managing deployments.
Experience implementing CI/CD pipelines for ML model deployment.
Experience with real-time audio processing and media streaming.

Sesame is committed to a workplace where everyone feels valued, respected, and empowered. We welcome all qualified applicants, embracing diversity in race, gender, identity, orientation, ability, and more. We provide reasonable accommodations for applicants with disabilities; contact careers@sesame.com for assistance.

Full-time Employee Benefits:

401k matching
100% employer-paid health, vision, and dental benefits
Unlimited PTO and sick time
Flexible spending account matching (medical FSA)

Benefits do not apply to contingent/contract workers.
