Lead Data Scientist At Fieldguide San Francisco Ca jobs in San Francisco – Browse 10,795 openings on RoboApply Jobs

Lead Data Scientist At Fieldguide San Francisco Ca jobs in San Francisco

Open roles matching “Lead Data Scientist At Fieldguide San Francisco Ca” with location signals for San Francisco. 10,795 active listings on RoboApply Jobs.

Fieldguide
Full-time|Remote|San Francisco, CA

About Us: At Fieldguide, we are redefining trust in global commerce and capital markets by automating and enhancing the workflows of assurance and audit professionals, particularly in cybersecurity, privacy, and financial auditing. Our mission is simple: to create software that facilitates trust between businesses.
Although we are headquartered in San Francisco…

Jan 15, 2026
Fieldguide
Full-time|On-site|San Francisco, CA

About Us: At Fieldguide, we are revolutionizing global commerce and capital markets by automating the vital work of assurance and audit professionals, particularly in cybersecurity, privacy, and financial audits. Our mission is straightforward: we create innovative software solutions that foster trust among businesses.
Based in San Francisco, CA, we embrace a remote-first culture that empowers you to excel in your work from any location. Our company is proudly supported by prestigious investors such as Goldman Sachs Alternatives, Bessemer Venture Partners, 8VC, Floodgate, Y Combinator, DNX Ventures, Global Founders Capital, Justin Kan, Elad Gil, and others.
Diversity is at the heart of our values. We actively seek individuals from varied backgrounds and experiences to contribute to the evolution of audit and advisory services. Our team embodies inclusivity, ambition, humility, and supportiveness. We are intentional and reflective about the team culture we aim to cultivate, welcoming members who excel in their skills while genuinely caring about one another's growth.
Joining our early-stage start-up means playing a crucial role in shaping the future of business trust. We enhance the lives of audit practitioners by streamlining up to 50% of their workload, promoting a healthier work-life balance. If you resonate with our values and are excited about nurturing a remarkable culture and product, Fieldguide is your ideal place to thrive.
About the Role: Fieldguide is at the forefront of developing AI agents tailored for the intricate workflows of audit and advisory services. As a San Francisco-based Vertical AI company, we operate within a rapidly evolving $100B+ market. Our solutions are trusted by over 50 of the top 100 accounting and consulting firms to drive their most critical functions. Supported by renowned investors such as Bessemer Venture Partners, 8VC, Floodgate, Y Combinator, and Elad Gil, we are poised for growth.
As an AI Engineer, you will be instrumental in designing and building the intelligence layer of Fieldguide, focusing on the agentic workflows, architectures, and evaluation systems that fuel enterprise-grade agents. This role bridges product engineering, applied AI, and production systems.
We are hiring for all levels of experience and will assess seniority during the interview process based on your background and aspirations. This position is geared towards engineers who appreciate in-person collaboration at our San Francisco office.

Jan 7, 2026
Fieldguide
Full-time|Remote|San Francisco, CA or Remote (USA)

About Fieldguide: At Fieldguide, we are transforming the landscape of global commerce by revolutionizing the approaches to cybersecurity, privacy, and financial audits. Our innovative software solutions empower professionals to establish trust within businesses, enhancing their efficiency and delivering superior results.
As a remote-first company based in San Francisco, CA, we are supported by prestigious investors such as Bessemer Venture Partners, 8VC, Y Combinator, and Floodgate. We pride ourselves on cultivating a diverse workforce and are committed to fostering an inclusive, humble, and high-performing company culture.
About the Role: We are hiring Software Engineers at various levels to join our expanding team. Depending on your background and aspirations, you could work as a mid-level engineer, assume leadership responsibilities, or influence the technical direction across teams and systems. The specific level will be assessed during the interview process based on your experience and capabilities.
This position offers a unique opportunity to make a significant impact in a company that has achieved product-market fit and is poised for substantial growth, challenging outdated audit and accounting tools that have long been overlooked.
Your Responsibilities:
Craft, develop, and implement high-quality features that create value for customers and drive business success.
Work collaboratively with product and design teams to transform complex challenges into user-centered solutions.
Maintain a balance between rapid iteration and sustainable system health.
Continuously enhance our technology stack, developer workflows, and reliability standards.
Foster a nurturing, growth-focused engineering culture based on trust, continuous learning, and excellence.

Jan 7, 2026
Fieldguide
Full-time|Remote|San Francisco, CA

Join Our Team: At Fieldguide, we are revolutionizing the foundation of trust in global commerce and capital markets by enhancing the efficiency of assurance and audit practitioners, specifically in cybersecurity, privacy, and financial audits. Our mission is to create innovative software that empowers those who foster trust among businesses.
Based in San Francisco, CA, we embrace a remote-first culture, allowing you to work from anywhere while achieving your best performance. Our company is proudly supported by leading investors, including Growth Equity at Goldman Sachs Alternatives, Bessemer Venture Partners, 8VC, Floodgate, Y Combinator, DNX Ventures, Global Founders Capital, Justin Kan, Elad Gil, and many more.
Diversity is crucial to our success: we welcome individuals from all backgrounds and experiences to join us in shaping the future of audit and advisory. Our team at Fieldguide is inclusive, motivated, humble, and supportive. We are intentional and reflective about the culture we are building, seeking team members who excel in their skills and are dedicated to nurturing each other's development.
As a member of our early-stage startup, you will have the chance to help define the future of business trust. We simplify the work of audit practitioners by consolidating up to 50% of their tasks, promoting a healthier work-life balance. If you are passionate about fostering a vibrant culture and creating exceptional products, you will find your place at Fieldguide.
Your Responsibilities:
Agent Infrastructure and Tooling: Design and manage the internal platform that ensures reliable and easily adoptable agentic workflows for our team, including MCP integrations, prompt/skill libraries, shared configurations, knowledge management, and the tools that link agents to our codebase, documentation, and internal services.
Developer Experience for AI Workflows: Streamline AI usage for all team members by creating intuitive onboarding processes, comprehensive documentation, and clear pathways for common tasks (planning, testing, ticket creation, code review, production support), along with feedback mechanisms to assess effectiveness.
Measurement and Attribution: Develop systems to track efficiency metrics, agent performance, and overall team productivity.
Experimentation and Evaluation: Given the rapidly changing AI tooling environment, conduct structured trials of new tools, distill insights, and maintain an evolving perspective that guides the team.
Enablement and Culture: Provide training sessions, office hours, internal demonstrations, and coaching to uplift skills across the organization, ensuring that every engineer has the resources to succeed.

Feb 20, 2026
Fieldguide
Full-time|Remote|San Francisco, CA or Remote (USA)

Join Fieldguide as a Senior Site Reliability Engineer, where you will play a pivotal role in ensuring the reliability and performance of our systems. You will collaborate with a talented team to design, implement, and maintain infrastructure solutions that are robust and scalable.
Your expertise in both software development and systems engineering will be essential to enhancing our operational frameworks. This position allows for both on-site work in San Francisco and remote work across the United States.

Apr 30, 2026
Fieldguide
Full-time|Remote|San Francisco, CA or Remote (USA)

About Us: At Fieldguide, we are pioneering a new era of trust in global commerce and capital markets by automating and simplifying the processes for assurance and audit professionals, particularly in the domains of cybersecurity, privacy, and financial auditing. Our mission is straightforward: we develop innovative software for those who establish trust between businesses.
Headquartered in San Francisco, CA, we embrace a remote-first culture, empowering our team members to excel from any location across the United States. We are proud to be backed by esteemed investors such as Growth Equity at Goldman Sachs Alternatives, Bessemer Venture Partners, 8VC, Floodgate, Y Combinator, DNX Ventures, Global Founders Capital, Justin Kan, and Elad Gil, among others.
Diversity is at the core of our values: we believe that various backgrounds and experiences contribute to building the future of audit and advisory. Our inclusive, driven, humble, and supportive team is intentional about creating a culture that fosters collective growth. As an early-stage startup team member, you will have the chance to shape the future of business trust, making the lives of audit practitioners easier by consolidating up to 50% of their workload and enhancing their work-life balance. If you resonate with our values and are passionate about cultivating a fantastic culture and product, we welcome you to join Fieldguide.
About the Role: As an Implementation Analyst at Fieldguide, you will play a pivotal role within our go-to-market team, collaborating closely with Implementation Consultants and the broader Go-To-Market organization to facilitate successful customer adoption of our platform.
Reporting to the Senior Manager of Implementation, you will be hands-on in onboarding customers to Fieldguide: configuring their environments, migrating data, and ensuring every detail is meticulously addressed so that customers can quickly derive value from our solutions. You will leverage agent-powered tools that are revolutionizing how audit and advisory firms function, providing you with direct insight into the future of this industry.
This early-career role is designed to serve as a springboard into implementation consulting. As you develop deeper product knowledge and foster client relationships, you will transition into an Implementation Consultant position with greater responsibility for the complete customer journey.
What You’ll Do:
Set customers up for success. Configure customer environments, including engagement templates, user roles, and permissions, to ensure a seamless day-one experience...

Apr 8, 2026
Fieldguide
Full-time|Remote|San Francisco, CA or Remote (USA)

Fieldguide seeks a Director of Implementation (Audit) to lead the rollout of audit solutions. This position plays a key role in shaping how products reach clients, focusing on delivering each implementation with a high level of quality and effectiveness.
Key responsibilities:
Direct the deployment of audit solutions for clients, ensuring projects meet established standards.
Collaborate with cross-functional teams to refine and improve service delivery processes.
Establish and uphold rigorous standards for project quality and successful outcomes.
Location: This position is available to candidates based in San Francisco, CA or working remotely within the United States.

Apr 24, 2026
Fieldguide
Full-time|Remote|San Francisco, CA or Remote (USA)

Fieldguide develops software for assurance and audit professionals, with a focus on cybersecurity, privacy, and financial audits. The company’s mission centers on making it easier for those who help businesses build trust. Headquartered in San Francisco, Fieldguide supports a remote-first workforce across the United States. Investors include Goldman Sachs Alternatives, Bessemer Venture Partners, 8VC, Floodgate, Y Combinator, DNX Ventures, Global Founders Capital, Justin Kan, and Elad Gil.
Fieldguide values diversity and believes that a range of backgrounds and experiences strengthens both product and culture. The team fosters a supportive, humble, and growth-oriented workplace, emphasizing inclusion and collaboration. As an early-stage startup, Fieldguide gives employees the opportunity to shape the future of business trust. The platform is designed to streamline up to half of an audit practitioner’s workload, aiming to improve work-life balance. People who care about building strong teams and meaningful products will find a sense of purpose here.
Role overview: The Deployed Software Engineer role blends hands-on engineering with leadership and technical strategy. Fieldguide matches the level and scope of responsibility to each candidate’s experience during the interview process. This position offers the chance to help modernize audit and accounting tools in a company with strong product-market fit and ambitious growth plans. The team is committed to challenging outdated industry software and driving meaningful change.
What you will do:
Design, build, and deliver features that create value for customers and the business.
Work directly with customers to understand their needs and tackle important challenges.
Collaborate with product, design, and go-to-market teams to turn complex problems into intuitive solutions.
Balance quick iteration with long-term system reliability and maintainability.
Continuously improve the technology stack, developer workflows, and reliability practices.
Location: This position is open to candidates based in San Francisco, CA or working remotely within the United States.

Apr 20, 2026
Hilbert
Full-time|On-site|San Francisco

Join Hilbert, a pioneering data science-first growth engine that empowers B2C teams with predictive insights into user behavior, revenue drivers, and strategies for sustainable growth. Our innovative platform transforms lengthy decision cycles into actionable insights within minutes.
From Fortune 10 corporations to cherished brands such as FreshDirect, Blank Street, and Levain Bakery, teams leverage Hilbert for their growth strategies. We're also collaborating with leading AI organizations to push the boundaries of data science.
We are in search of a Lead Data Scientist who takes a systems-thinking approach, deeply understands B2C business challenges, and can develop the models and analyses that drive tangible growth outcomes for major consumer companies, all with the passion and urgency of a founder.
This role goes beyond the traditional boundaries of data science; you will oversee the entire data science function, from defining problems to model development and measuring business impact, while working with enterprise clients where feedback loops are rapid and outcomes are critical. If you can articulate the significance of a recommender system to a retailer's profit and loss statement, design adaptable machine learning solutions for various customers, and communicate causal impacts clearly to executives, we want to connect with you.
The Role: You will collaborate closely with the founding team, as well as the engineering, product, and go-to-market teams, to define, develop, and enhance the data science systems central to Hilbert's operations. Expect to be hands-on daily (building models, conducting analyses, and exploring data) while also establishing scientific direction, ensuring rigor, and expanding the team. Our focus is exclusively on B2C; the challenges we tackle (demand forecasting, customer lifecycle management, personalization, and activation) necessitate a deep understanding of these domains and the ability to translate business context into model architecture decisions. You will thrive in an environment characterized by high autonomy and ambiguity, where data can often be incomplete, messy, or limited.
What you'll do:
Build (hands-on, every day):
Design and construct machine learning models that drive essential product functionalities: recommendation engines, search relevance, customer segmentation, demand forecasting, and activation strategies.
Create configurable, multi-tenant model architectures that can adjust to varying customer contexts, data availability, and business needs without the need for complete redevelopment.
Develop effective models using available data, focusing on extracting insights from limited, noisy, or sparse datasets.

Feb 26, 2026
Databricks
Full-time|$192K/yr - $260K/yr|On-site|San Francisco, California

At Databricks, our passion lies in empowering data teams to tackle some of the most challenging problems globally, from detecting security threats to advancing cancer drug development. We achieve this by creating and operating the world's premier Data Intelligence Platform, enabling our customers to concentrate on high-value challenges central to their missions. Founded in 2013 by the original creators of Apache Spark, Databricks has evolved from a small office in Berkeley, California, into a global powerhouse with over 1000 employees. We are trusted by thousands of organizations, from small startups to Fortune 100 companies, with their mission-critical workloads, establishing us as one of the fastest-growing SaaS companies worldwide.
Our engineering teams are dedicated to developing highly technical products that address real-world needs. We continuously push the limits of data and AI technology while maintaining the resilience, security, and scalability crucial for our customers' success on our platform. Operating one of the largest-scale software platforms, our infrastructure comprises millions of virtual machines that generate terabytes of logs and process exabytes of data daily. At this scale, we routinely encounter cloud hardware, network, and operating system faults, and our software is designed to shield our customers from such issues effectively.
As a Data Scientist on the Data Team, you will play a pivotal role in fostering a data-driven culture within Databricks by addressing product and business challenges. The Data Team serves as an internal, production 'customer' that utilizes Databricks and influences the future trajectory of our products.
Your Impact:
Steer the direction of key data science initiatives including segmentation, recommendation systems, forecasting, product analytics, churn prediction, and insights.
Collaborate closely with Engineering, Product Management, Sales, and Customer Success to discern product usage patterns and trends, facilitating data-driven decisions, recommendations, and forecasts.
Manage stakeholder expectations in your focus area: gather evolving requirements, define project OKRs and milestones, and effectively communicate progress and results to non-technical audiences.
Mentor and support junior data scientists within the team, assisting with project planning, technical decisions, and conducting code and documentation reviews.
Advocate for the data science discipline across the organization, amplifying our commitment to becoming more data-driven.
Develop self-service internal data products to simplify data access within the company.
Represent Databricks at academic and industry conferences and events.

Jan 30, 2026
Semgrep Inc.
Full-time|$125K/yr - $147K/yr|On-site|San Francisco Office

About Semgrep: Semgrep stands at the forefront of code security, empowering developers to innovate seamlessly. Our platform enables teams to identify, flag, and resolve genuine security issues before deployment, supported by an adaptive security system that evolves with their development process. Semgrep not only safeguards code in real-time but also offers essential guardrails that facilitate rapid development without compromising security. Trusted by top organizations like Snowflake, Dropbox, and Figma, we are recognized by Gartner for excellence in Application Security Testing. To learn more about our mission, visit semgrep.dev.
Founded in San Francisco and backed by prominent investors including Menlo Ventures and Sequoia Capital, Semgrep is dedicated to continuous improvement, ensuring that our AI-driven system minimizes false positives and prioritizes actionable vulnerabilities. Experience the future of secure coding with us.

Feb 19, 2026
SoFi Technologies, Inc.
Full-time|On-site|CA - San Francisco

Join SoFi, a leading personal finance company, as a Data Scientist where you will leverage data to drive strategic decisions and enhance user experiences. In this role, you will analyze complex datasets, build predictive models, and collaborate with cross-functional teams to propel our innovative solutions forward.

Mar 27, 2026
Mercor
Full-time|On-site|San Francisco

Join Mercor as a Data Scientist
At Mercor, we stand at the forefront of labor markets and artificial intelligence research. We collaborate with top-tier AI laboratories and businesses to infuse the human intelligence crucial for the evolution of AI.
Our expansive talent network empowers frontier AI models, mirroring the way educators impart knowledge: sharing insights and experiences that transcend mere coding. Currently, our network boasts over 30,000 experts generating more than $2 million daily.
We are pioneering a new work paradigm where specialized expertise drives AI progress. To realize this vision, we seek a dynamic, fast-paced, and dedicated team. You will collaborate with leading researchers, operators, and AI companies, playing a pivotal role in the systems that are reshaping society.
As a profitable Series C company, Mercor is valued at $10 billion and operates from our new headquarters in San Francisco with an in-office work schedule five days a week.
Your Role: In your first year, you will implement analyses and experiments that enhance key product metrics, including match quality, time-to-hire, candidate experience, and revenue. Your responsibilities will include:
Establishing north-star and feature-specific metrics for our ranking systems, interview analytics, and payout frameworks.
Designing and executing A/B tests and quasi-experiments, translating results into product decisions within the same week.
Creating source-of-truth dashboards and streamlined data models to enable teams to self-serve answers.
Collaborating with engineers to instrument events, enhancing data quality and latency from ingestion to insights.
Rapidly prototyping models (from baseline models to gradient boosting) to optimize matching and scoring.
Assisting in the evaluation of LLM-powered agents through the design of rubrics, human-in-the-loop studies, and guardrail mechanisms.
What Makes You a Great Fit: You possess strong foundational skills in statistics, SQL, and Python, alongside projects you are eager to showcase. You adapt swiftly, frame inquiries, test hypotheses, and deliver results within a day, valuing clarity in communication as much as statistical significance. A keen interest in LLM evaluation, retrieval, and ranking is a plus; you will learn alongside professionals from renowned firms such as Jane Street, Citadel, Databricks, and Stripe.

Aug 30, 2025
Superhuman
Full-time|$225K/yr - $275K/yr|Hybrid|San Francisco, CA

Superhuman offers an engaging hybrid working model for this role. This flexible approach provides team members with a balance of focused time and in-person collaboration, fostering trust, innovation, and a vibrant team culture. Team members for this position must reside in the San Francisco Bay Area.
About Superhuman: Superhuman, now part of Grammarly, is an AI productivity platform dedicated to unlocking extraordinary potential in everyone. Our suite of apps and agents seamlessly integrates AI into over 1 million applications and websites, including Grammarly’s writing assistance, Coda’s collaborative workspaces, Mail’s inbox management, and Go, the proactive AI assistant that comprehends context and provides automatic assistance. Founded in 2009, Superhuman empowers more than 40 million individuals, 50,000 organizations, and 3,000 educational institutions worldwide to eliminate busywork and concentrate on what truly matters. Discover more at superhuman.com.
The Opportunity: To achieve our ambitious goals, we are seeking skilled Data Scientists to join our Product and Growth Data Science teams. At Superhuman, data teams are regarded as trusted experts who reveal new insights to shape marketing, product, and growth strategies that drive significant outcomes across the organization. We have access to large datasets and are looking for individuals with exceptional technical and analytical abilities who can dissect complex business challenges and offer solutions that deliver high visibility and impactful results for the company.
Growth Data Scientists collaborate closely with product, engineering, growth, and/or lifecycle marketing teams. They are tasked with designing and executing product experiments, as well as performing intricate analyses to guide product strategy through advanced analytics and machine learning. The ideal candidate will possess a proven record of delivering significant analytical projects within the Growth domain and working alongside cross-functional colleagues to influence decision-making.
Product Data Scientists are responsible for assessing the quality of the Superhuman products, ensuring that our offerings are not only innovative but also user-friendly and effective.

Mar 17, 2026
Mindlance
Full-time|On-site|San Francisco

Join Mindlance as a Data Scientist, where you will leverage data to drive insights and support decision-making processes. You will be responsible for analyzing complex datasets, developing predictive models, and collaborating with cross-functional teams to enhance business strategies.

Nov 14, 2016
Metriport
Full-time|Remote|San Francisco

Join Metriport as a Data Scientist and be at the forefront of data-driven decision making! In this role, you will leverage advanced analytics and machine learning techniques to derive insights from complex datasets, enhancing our product offerings and driving strategic initiatives.

Mar 20, 2026
Crusoe
Full-time|On-site|San Francisco, CA - US

Join Crusoe as a Senior Data Scientist, where you'll leverage data to drive impactful decisions and innovations. As a part of our dynamic team, you will play a crucial role in analyzing complex datasets, developing predictive models, and contributing to our mission of transforming data into actionable insights. You will collaborate with cross-functional teams to enhance our products and services, ensuring they align with our strategic goals.

Mar 11, 2026
Grindr LLC
Full-time|Hybrid|San Francisco

Join Grindr as a Staff Data Scientist in a hybrid role based out of our San Francisco, Los Angeles, or Chicago offices, requiring in-office presence on Tuesdays and Thursdays.
Why This Role Is Unique: Grindr (NYSE: GRND) is the world’s largest LGBTQ+ social application, boasting over 14 million monthly users globally. We are not just a platform but a vital part of the LGBTQ+ community and a cornerstone of gay culture.
As a Staff Data Scientist, you will collaborate closely with product managers, designers, and engineers to create insightful metrics that drive product development. You will design and implement innovative experiments, present data-driven insights for decision-making, and explore new growth strategies through comprehensive analysis. This role allows you to work on deployed models that enhance the user experience for millions, while becoming an informal ambassador for the Data Science team, educating others on effective data utilization.
You will be part of a dynamic data organization at Grindr that integrates data scientists, data engineers, and ML/AI engineers into a united and collaborative team. This is a unique opportunity to learn, share knowledge, and make a significant impact alongside industry leaders.
Your Responsibilities:
Extract actionable insights from complex, open-ended queries.
Design and assess experiments to evaluate the impact of product changes.
Analyze product data to identify root causes behind metric fluctuations.
Effectively communicate findings to cross-functional stakeholders to inform product strategies.
Develop tools to scale and automate analyses, enhancing company productivity.
Mentor and guide team members, recommending best practices.
Apply an engineering mindset to reduce complexity while maximizing utility and maintainability.
Contribute to the development of future ML solutions to enhance recommendations, detect spam, and better serve our users.

May 13, 2025
Aircall
Full-time|On-site|San Francisco Office

Join Aircall, the leading integrated customer communications and intelligence platform that empowers growing businesses worldwide. With a trusted network of over 20,000 companies, Aircall seamlessly integrates voice and digital channels into one powerful platform, featuring one-click connections with top CRMs and over 100 essential business tools. Our AI-driven insights and automation capabilities enable sales and support teams to optimize their time, discover new opportunities, and deliver outstanding customer experiences. With a diverse global team of over 600 in cities such as Paris, New York, San Francisco, and more, Aircall is revolutionizing the way businesses engage with their customers, fostering deeper relationships and driving measurable success.
Our Work Culture: At Aircall, we prioritize customer satisfaction, continuous learning, and extraordinary results. We encourage open collaboration, ownership, and swift, informed decision-making. If you thrive in a dynamic, team-oriented environment where curiosity, trust, and impact are key, you will fit right in.
About Our Team: The Data team at Aircall is integral to our decision-making process, driving innovation and growth through advanced data solutions, tools, and actionable insights.
Role Overview: As a Senior Data Scientist, you will play a crucial role in providing insights to our product and business teams, collaborating closely with product and engineering leaders. You will support key initiatives such as pricing strategies, packaging, multi-product approaches, and self-service solutions. Additionally, you will work cross-functionally with Engineering, Sales, Finance, Marketing, and Customer Relations to translate data requirements into impactful solutions. Your contributions will also include establishing best practices in analytics and mentoring team members to uphold high standards in data governance and insight generation.

Jul 19, 2024
Sciforium
Full-time|On-site|San Francisco

Sciforium is an innovative AI infrastructure company specializing in the development of state-of-the-art multimodal AI models, alongside a proprietary high-efficiency serving platform. With substantial financial backing and direct collaboration with AMD, including hands-on support from AMD engineers, our rapidly growing team is dedicated to building the comprehensive stack that powers cutting-edge AI models and real-time applications.
Role Overview: We are on the lookout for a highly skilled and visionary Data Scientist to spearhead the strategy and creation of vast datasets essential for our foundational models. In the realm of Large Language Models (LLMs), we recognize that data is the key competitive advantage. This role will encompass the entire data lifecycle, from extensive web-scale crawling to the meticulous creation of human-aligned datasets that dictate model behavior.
The ideal candidate will embrace data as both a large-scale engineering challenge and a complex analytical puzzle. Your responsibilities will extend beyond simply delivering data; you will design taxonomies, filtering heuristics, and post-training pipelines to ensure our models excel in reasoning, safety, and multimodal comprehension.
Key Responsibilities:
Foundation Dataset Strategy: Oversee the comprehensive creation of pre-training datasets for LLMs, defining the optimal mix of web data, code, literature, and technical documents to enhance downstream model performance.
Petabyte-Scale Curation: Innovate and implement advanced pipelines for data cleaning, deduplication (exact and fuzzy), and high-quality signal extraction from vast amounts of unstructured data.
Post-Training & Alignment Data: Direct the creation of high-quality post-training datasets, including Supervised Fine-Tuning (SFT) instructions, multi-turn dialogues, and preference modeling data (RLHF/DPO).
Multimodal Expansion: Lead the acquisition and processing of vision and video data, addressing the challenges of multimodal alignment, video compression, and temporal data consistency.
High-Performance Engineering: Create high-throughput data processing scripts utilizing Python, employing multiprocessing and multithreading to manage large-scale ingestion and transformation without performance bottlenecks.
Data Profiling & Analysis: Perform in-depth statistical analysis on training datasets to uncover biases, knowledge gaps, and quality regressions, ensuring a mathematically balanced model diet.
Synthetic Data Generation: (Added Value) Develop pipelines to generate high-quality synthetic datasets that enhance model training and capabilities.

Jan 7, 2026
