Stability AI

Software Engineer, Datasets (Japanese fluency required)

Tokyo, Japan

About Stability: 

Stability AI is a community- and mission-driven, open artificial intelligence company that cares deeply about real-world implications and applications. Our most considerable advances grow from our diversity in working across multiple teams and disciplines. We are unafraid to go against established norms and explore creativity. We are motivated to generate breakthrough ideas and convert them into tangible solutions. Our vibrant communities consist of experts, leaders, and partners across the globe who are developing cutting-edge open AI models for image, language, audio, video, 3D, and biology.

About the role: 

We are looking for a versatile Software Engineer to build the best datasets for training generative AI models. The work will focus on projects involving image/video models, large language models, and chatbots, mostly related to Japan and the Japanese language.

You will need to adapt quickly as we try different approaches across various industries in a fast-changing environment. You will have access to state-of-the-art high-performance computing resources, and you will be able to work alongside top researchers and engineers to truly make an impact in the fast-growing world of generative AI.

Responsibilities:  

  • Lead efforts to develop and maintain robust, scalable web scraping systems for collecting text and image data from various websites and APIs.
  • Systematize web scraping processes to improve speed and efficiency.
  • Clean, normalize, and preprocess raw data in a scalable, parallelizable way to prepare it for ingestion into our machine learning model training pipelines while ensuring data quality.
  • Collaborate with our machine learning teams to understand their data requirements and build suitable datasets.
  • Implement data quality checks and troubleshooting mechanisms to maintain the integrity and reliability of our datasets.
  • Keep up to date with papers and methods for improving data quality and curating data for LLMs and other models.
  • Build pipelines to ingest and process data (e.g., images and text) for feeding into ML models.
  • Provide technical advice to partners and clients on integrating generative models into their products.

Qualifications: 

  • Good communication skills, with fluency in Japanese (including reading and typing) and business-level English proficiency
  • 5+ years of software development experience across a variety of projects, with high proficiency in two or more programming languages (Python required)
  • Experience with data engineering (data pipelines for ML projects)
  • Experience with Linux and command-line tools
  • Experience with cloud computing and APIs
  • Experience with web scraping/crawling

Equal Employment Opportunity:

We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or other legally protected statuses. 