About Stability:

Stability AI is a community- and mission-driven open artificial intelligence company that cares deeply about real-world implications and applications. Our most significant advances grow from our diversity in working across multiple teams and disciplines. We are unafraid to go against established norms and explore creativity. We are motivated to generate breakthrough ideas and convert them into tangible solutions. Our vibrant communities consist of experts, leaders and partners across the globe who are developing cutting-edge open AI models for Image, Language, Audio, Video, 3D and Biology.

About the role:

We are looking for a talented Data Engineer with a focus on scaling efficient distributed workloads. You will work alongside a growing multidisciplinary team of research scientists and machine learning engineers to improve the efficiency of our models at scale. In this role, you will contribute to groundbreaking projects such as training the largest open language models and be responsible for ensuring data is collected, processed and utilized in the right way.

Responsibilities:

Clean, normalize, and preprocess data in a scalable, parallelizable way to prepare it for ingestion into our machine learning model training pipelines while ensuring data quality
Build and maintain highly scalable distributed workloads
Build data pipelines to ingest and process data (e.g. images and text) for feeding into ML models
Manage AWS resources
Keep up-to-date with methods regarding how to improve data quality and/or curate data for Image, Video, LLMs etc.

Qualifications:

Proven background in large-scale distributed workloads
Experience with large-scale data loading for machine learning training runs
Experience with cloud storage and file systems. AWS (S3) is strongly preferred, but we are open to other cloud platforms
Experience with Python and PyTorch
Experience with multiprocessing and multithreading in Python workloads
Excellent communication skills to effectively collaborate with users, solve issues, and provide guidance.
Attention to detail and the ability to document processes and solutions effectively.
Strong interest in Generative AI
Experience working on machine learning projects, ideally with some deep learning or computer vision knowledge
Experience with the data loading stack (WebDataset, torchdata, fsspec, AIStore) and parallel dataframe manipulation using PySpark or Ray is a plus

Equal Employment Opportunity:

We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or other legally protected statuses.

Data Engineer - Research
