Fireworks
17 hours ago

AI Infrastructure Engineer

Redwood City



Job Duties:
Design core backend software components. Interface with other teams to incorporate their innovations into our systems, and ours into theirs. Conduct design and code reviews. Analyze and improve the efficiency, scalability, and stability of system resources.

Design and implement the hardware and software infrastructure required for AI projects. Procure, configure, and manage servers, GPUs, TPUs, and other hardware resources. Set up cloud-based environments (e.g., AWS, Azure, GCP) for AI workloads.

Deploy and manage distributed computing clusters (e.g., Kubernetes) for AI model training and inference. Optimize cluster performance and resource allocation for AI workloads. Monitor cluster health and troubleshoot issues as they arise.

Architect and maintain data storage solutions (e.g., data lakes, databases) for AI datasets. Ensure data security, access controls, and data versioning. Implement data pipelines for efficient data ingestion and preprocessing.

Develop and maintain automation scripts and tools for infrastructure provisioning and scaling. Implement continuous integration and continuous deployment (CI/CD) pipelines for AI models. Orchestrate workflows for training, evaluation, and deployment of AI models.

Optimize infrastructure to handle large-scale AI workloads efficiently. Monitor and analyze system performance, making adjustments as needed. Implement load balancing and scaling strategies to meet demand.

Implement security best practices to protect AI infrastructure and data. Stay up to date with security vulnerabilities and apply patches and updates. Ensure compliance with relevant data privacy and regulatory requirements.

Collaborate with data scientists and AI engineers to understand their infrastructure needs. Provide technical support and troubleshooting assistance for AI infrastructure issues. Train and educate team members on best practices for using AI infrastructure.

Minimum Education & Experience Required:
Must have a Bachelor’s degree or equivalent in Computer Science, Computer Engineering, or a related field, plus three (3) years of experience with ML infrastructure (PyTorch, Vertex AI, and SageMaker) or related experience.


Minimum Skills Required:
Must have experience with: one or more search, recommendation, natural language processing, personalization, or similar applied ML domains; building, scaling, and optimizing distributed, enterprise-grade machine learning systems; architectural patterns of large-scale software applications; publishing papers in machine learning and/or computer vision conferences and journals; large-scale machine learning techniques such as semi-supervised learning, weakly supervised learning, and online adaptation of ML models; and publishing in machine learning domains such as computer vision and natural language processing.

How to Apply:
Submit a resume and apply online at http://www.fireworks.ai/careers; search for the job by title.

