Job Overview

The AI Inference team, part of Contextual AI’s platform organization, designs, builds, and operates generative AI and LLM inference systems at scale. The team pioneers system innovations that optimize latency, throughput, and cost for all of Contextual AI’s models, which are powered by RAG 2.0 technology.

What you’ll do:

As a Member of Technical Staff at Contextual AI, you will:
• Design, develop, test, and deploy high-performance inference solutions for state-of-the-art Gen AI model architectures, RAG 2.0, knowledge retrieval models, language encoders, and more.
• Own the optimization of end-to-end inference latency, throughput, and cost, ensuring the most efficient use of our inference cluster.
• Drive system architecture, spearhead best practices, and mentor junior engineers.
• Improve the reliability, scalability, and observability of our distributed inference infrastructure.
• Read papers and consult with scientists to gain insights into emerging techniques, integrating them into our roadmap.
• Design and experiment with new algorithms, benchmarking the latency and accuracy of your implementations.

What we’re seeking:

• M.Sc. or PhD in Computer Science, Engineering, Statistics, Mathematics, or a related field.
• 5+ years of non-internship professional software development experience, including experience leading the design or architecture of new and existing systems.
• Experience as a mentor, tech lead, or engineering team lead.
• Proficiency in Python, PyTorch, multi-threaded asynchronous C++/Go, and performance optimization.
• Experience with GPU programming and the GPU inference stack: TensorRT-LLM, Triton, CUDA, and CUPTI.
• Proficiency in the TensorFlow and/or PyTorch frameworks.
• Experience with Linux kernel system calls or the POSIX API (process control, communication, and device management).
• A problem-solving mindset, owning tasks end-to-end and acquiring the necessary knowledge to get the job done.
• A good intuition for when off-the-shelf solutions are sufficient and the ability to build tools to accelerate your workflow when they aren’t.
• The ability to move quickly in an environment where things are sometimes loosely defined and may have competing priorities or deadlines.

Location: Mountain View, CA

Salary Range for California-Based Applicants: $140,000 to $300,000 + equity + benefits (actual compensation will be determined based on experience, location, and other factors permitted by law).

Equal Opportunity

Contextual AI is an equal opportunity employer and complies with all applicable federal, state, and local fair employment practices laws. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, ancestry, sex, sexual orientation, gender, gender expression, gender identity, genetic information or characteristics, physical or mental disability, marital/domestic partner status, age, military/veteran status, medical condition, or any other characteristic protected by law.

Member of Technical Staff (AI Inference)