
Staff Cloud Software Engineer, Cloud Infrastructure

United States

Tenstorrent is leading the industry in cutting-edge AI technology, revolutionizing performance expectations, ease of use, and cost efficiency. With AI redefining the computing paradigm, solutions must evolve to unify innovations in software models, compilers, platforms, networking, and semiconductors. Our diverse team of technologists has developed a high-performance RISC-V CPU from scratch and shares a passion for AI and a deep desire to build the best AI platform possible. We value collaboration, curiosity, and a commitment to solving hard problems. We are growing our team and looking for contributors of all seniority levels.

In this Staff Cloud Software Engineer role, you will bring specialized expertise to the team in distributed high-performance and AI computing, especially in Kubernetes-based cloud-native environments. You will drive the design, implementation, and integration of systems that scale compute capabilities seamlessly from single-host systems to exaflop-scale clusters.

This role is hybrid, based out of Santa Clara, CA or Austin, TX.

We welcome candidates at various experience levels for this role. During the interview process, candidates will be assessed for the appropriate level, and offers will align with that level, which may differ from the one in this posting.


Responsibilities:

  • Design and drive implementation of distributed systems for AI computing applications in Cloud and novel supercomputing cluster environments
  • Hands-on software development, testing, integration, operations, and support
  • Collaborate closely with the team across the full stack and life cycle of AI data center applications, from data center design and rollout to MLOps
  • Operate within on-premises data centers and public cloud environments
  • Drive projects through the entire software development lifecycle, on both the technical and non-technical sides
  • Collaborate with highly technical and non-technical stakeholders from differing backgrounds, communicating complex topics to diverse audiences
  • Continuously improve engineering practices through code reviews and the adoption of relevant techniques and technologies


Experience & Qualifications:

  • 10+ years of hands-on software engineering experience working with distributed systems in Cloud and/or HPC environments
  • 5+ years of experience working with clustered (multi-host) AI hardware and applications for training and inference
  • 5+ years of experience with Kubernetes clusters, including cluster and application deployment (e.g., CNI, CSI, Helm), operations, and development of extensions (e.g., device plugins, Operators)
  • Strong working knowledge of Python and Go
  • Experience treating Infrastructure as Code as a first-class citizen (e.g., Ansible)
  • Strong Git, GitOps, and CI/CD experience
  • Familiarity with the performance implications of AI/ML workloads, for both inference and training
  • Familiarity with virtualization technologies and platforms
  • Hands-on experience with MLOps concepts and frameworks for end-to-end model training pipelines
  • Strong understanding of networking concepts – experience with network hardware configuration and management is a plus
  • Familiarity with the security implications of multi-tenant environments at the hardware, software, and networking levels
  • Familiarity with observability, monitoring, and alerting tools (e.g., Grafana, Prometheus, Loki)
  • Agile / lean software project management experience
  • Strong programming skills with years of experience in various programming languages; familiarity with both object-oriented and functional programming
  • REST API development and integration experience – full-stack web development experience is a plus


Compensation for all engineers at Tenstorrent ranges from $100k - $500k including base and variable compensation targets. Experience, skills, education, background and location all impact the actual offer made.

Tenstorrent offers a highly competitive compensation package and benefits, and we are an equal opportunity employer.

Due to U.S. Export Control laws and regulations, Tenstorrent is required to ensure compliance with licensing regulations when transferring technology to nationals of certain countries that are subject to licensing conditions set by the U.S. government.

Our engineering positions and certain engineering support positions require access to information, systems, or technologies that are subject to U.S. Export Control laws and regulations. Please note that citizenship/permanent residency, asylee, and refugee information and/or documentation will be required and considered as Tenstorrent moves through the employment process.

If a U.S. export license is required, employment will not begin until a license with acceptable conditions is granted by the U.S. government. If a U.S. export license with acceptable conditions is not granted by the U.S. government, then the offer of employment will be rescinded.
