Role: As a Technical Program Manager (TPM) at Together AI, you will be at the core of building, optimizing, and scaling the global GPU resources that power a pioneering AI infrastructure company. Your role is crucial to ensuring that the backbone of our AI models, thousands of GPUs distributed around the world, operates efficiently and reliably, enabling the cutting-edge advancements that democratize access to AI technology globally. You will also streamline our workflows, improve collaboration and communication between internal and external teams, and strengthen our overall ability to deliver high-quality products and services. You’ll have the opportunity to shape the future of AI infrastructure, working alongside top engineers, researchers, and innovators to power the next generation of AI-driven solutions.
Requirements
- 7+ years of experience in technical program or project management, with a focus on large-scale technology deployments.
- A technical background and a proven ability to engage on technical topics, typically evidenced by an engineering degree or equivalent technical experience.
- Experience with cloud computing platforms, decentralized cloud infrastructure, and/or similar large-scale technology deployments.
- Experience with cloud-based technologies, such as AWS, Google Cloud, or Azure, and distributed systems, including containerization and orchestration tools.
- Knowledge of data center operations, including power, cooling, and networking systems.
- Excellent communication and project management skills, with the ability to work effectively with external vendors and internal stakeholders.
- Strong process management skills, with experience in developing and implementing processes to ensure efficient and effective deployment of large-scale infrastructure.
- Experience working in a fast-paced, dynamic environment, with a proven track record of driving results and delivering complex programs and products.
- Ability to thrive in a collaborative environment involving different stakeholders and subject matter experts.
- Strong analytical and problem-solving skills, with the ability to identify and mitigate risks.
Responsibilities
- Develop and execute strategic plans for large-scale GPU cluster deployments, ensuring that all deployments are delivered on time, within budget, and to the required quality standards.
- Coordinate with external data center providers and hardware vendors on timelines, and ensure seamless integration with internal teams.
- Identify and mitigate risks, and develop and implement contingency plans to ensure business continuity.
- Communicate project progress, status, and plans to internal stakeholders and customer groups, ensuring transparency and alignment across the organization.
- Work with Engineering, DevOps and SRE teams to improve collaboration and communication, and enhance our overall ability to deliver high-quality products and services.
- Analyze current processes, identify bottlenecks and areas for improvement, and develop and implement new processes and procedures to increase efficiency and effectiveness.
- Continuously seek opportunities to improve processes and systems, including the implementation of automation and data analytics dashboards.
About Together AI
Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers on our journey to build the next generation of AI infrastructure.
Compensation
We offer competitive compensation, startup equity, health insurance and other competitive benefits. The US base salary range for this full-time position is: $227-266K + equity + benefits. Our salary ranges are determined by location, level and role. Individual compensation will be determined by experience, skills, and job-related knowledge. This is a hybrid role based in the Bay Area.
Equal Opportunity
Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.
Please see our privacy policy at https://www.together.ai/privacy