Engineer, Supercomputing & Distributed Systems

Krea, San Francisco
On-site Full-time

Experience Level

Entry Level

Qualifications

  • Bachelor's degree in Computer Science, Engineering, or a related field.

  • Familiarity with AI systems and frameworks.

  • Experience with distributed systems and data pipelines.

  • Proficiency in programming languages such as Python, Go, or similar.

  • Strong analytical and problem-solving skills.

  • Ability to work collaboratively in a fast-paced environment.

About the job

Join Krea's Innovative Team

At Krea, we are at the forefront of developing next-generation AI creative tools. Our commitment lies in making AI an intuitive and controllable medium for creatives. We aspire to create tools that enhance human creativity rather than replace it.

We view AI as a transformative medium that enables expressions across diverse formats—text, images, video, sound, and even 3D. Our focus is on creating smarter, more adaptable tools that leverage this medium effectively.

The Role of Supercomputing and AI Infrastructure at Krea

Our team builds and manages the foundational infrastructure that supports Krea's research and inference. This includes distributed training systems, 1000+ GPU Kubernetes clusters, and petabyte-scale data pipelines. Much of our work involves bespoke solutions, such as custom distributed datastores, job orchestration systems, and streaming pipelines, designed to handle modern AI workloads efficiently.

Key Projects You Will Contribute To:

  • Distributed Data Systems: Design and implement multi-stage pipelines to transform petabytes of raw data into clean, annotated datasets; run classification models across billions of images; deploy and integrate large language models to caption extensive multimedia data.

  • GPU Infrastructure: Manage distributed training and inference across 1000+ GPU Kubernetes clusters; address orchestration and scaling challenges for large-scale GPU job processing; optimize research workflows across multiple datacenters.

  • Distributed Training: Profile and enhance dataloaders streaming thousands of images per second; troubleshoot InfiniBand networking during extensive training runs; develop fault tolerance systems for large-scale pretraining; collaborate with researchers to refine reinforcement learning infrastructure.

  • Applied ML Pipelines: Identify clean scenes in millions of videos utilizing distributed shot-boundary detection; tailor and train models to sift through billions of images for specific queries; construct systems that link raw cluster capacity with research outcomes.

About Krea

Krea is pioneering the development of next-generation AI creative tools aimed at empowering creatives. Our mission is to make AI a complementary force in the creative process, allowing individuals to express themselves through various mediums.
