AMD

Customer Solutions Engineer, Clustered Systems

Austin, Texas
185 days ago

WHAT YOU DO AT AMD CHANGES EVERYTHING

We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences: the building blocks for the data center, artificial intelligence, PCs, gaming, and embedded. Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world’s most important challenges. We strive for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives.

AMD together we advance_

THE ROLE:

We are looking for a Solutions Architect with experience designing and building clustered systems. You will play a key role in supporting the design and deployment of state-of-the-art AI/ML training and inferencing systems, and you will provide insights on at-scale system design and tuning mechanisms for large-scale compute runs. You should be excited to work with the latest accelerated computing and deep learning platforms and to help customers craft improved workflows and develop new solutions. You should also work well cross-functionally, both with multiple organizations within AMD and with the customer, to ensure a successful and trouble-free deployment.

THE PERSON:

Provide solutions to deploy large-scale clustered systems, maintain technical relationships with internal and external engineering teams, and build creative solutions based on AMD technology. Develop essential collateral such as white papers, guides, presentations, and test data to facilitate effective communication with customers and internal teams regarding the deployment and scaling of clustered systems.

KEY RESPONSIBILITIES:

- Provide solutions to deploy large-scale clustered systems.
- Collaborate with multi-functional teams made up of customers, external partners, and internal teams, from concept to prototype to deployment.
- Solve complex problems involving multi-site deployments of AMD products.
- Partner with OEM partners, AMD Engineering, Product, and Sales teams to secure design wins for customers.
- Enable development and growth of AMD product features through customer feedback and deployment evaluations.

PREFERRED EXPERIENCE:

- 5+ years of experience in accelerated computing for datacenter/HPC solutions, or related experience.
- Strong background in performance analysis, system profiling, and high-performance computing.
- Deep understanding of dense data center design and architecture, including compute, storage, networking, cloud APIs, and IaaS.
- Experience conducting system profiling and performance analysis, using tools such as perftest and rccl_test, to ensure systems operate at peak efficiency.
- Solid understanding of accelerated computing scheduling and I/O stacks.
- Experience with modern automation, development, and resource management tooling such as Ansible, Git, containers (Docker), Kubernetes, etc.
- Knowledge of container networking, particularly Kubernetes, and experience with DevOps practices.
- Proficient in Linux-based networking technologies and protocols such as RDMA, RoCE, CNI-based container networking, InfiniBand, Ethernet, and NVLink, and familiar with various network topologies, routing protocols, and network security practices.
- Clear verbal and written communication skills; capable of effectively teaching others and contributing to a team's success through collaboration and open information sharing.
- (desired) A networker who collaborates with both intra-team and inter-team members, promotes knowledge sharing, and is able to turn that knowledge into standard operating procedures.
- (desired) Skilled in the development of SOPs and team knowledge base management.
- (desired) Experience working with an engineering or research community supporting high-performance computing or deep learning.

ACADEMIC CREDENTIALS:

- BS (or equivalent experience) in Computer Science, Engineering, Physics, or Mathematics.
- (desired) Professional credentials such as CISSP, CSP (AWS, GCP, Azure) SA-Pro, RHCA, CKA, CKS, or another industry-recognized certification.

LOCATION:

Austin, Texas, but open to the possibility of remote work. AMD makes extensive use of conferencing tools, but occasional travel is required for on-site visits to customers and for conferences.

#LI-RW1

At AMD, your base pay is one part of your total rewards package. Your base pay will depend on where your skills, qualifications, experience, and location fit into the hiring range for the position. You may be eligible for incentives based upon your role, such as either an annual bonus or sales incentive. Many AMD employees have the opportunity to own shares of AMD stock, as well as a discount when purchasing AMD stock if voluntarily participating in AMD’s Employee Stock Purchase Plan. You’ll also be eligible for competitive benefits described in more detail here.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.
