About the job
About DatologyAI
At DatologyAI, we understand that models are only as good as the data they are trained on. A significant portion of training compute is often wasted on data that is redundant, irrelevant, or even detrimental, leading to subpar models that incur higher training and deployment costs. Our innovative data curation suite automates the curation and optimization of petabytes of data, ensuring that your models receive the best training data possible.
Training on our curated data can substantially reduce training time and cost, with 7-40x faster training depending on the use case. It can also improve model performance as much as training on over 10x more raw data would, without increasing training costs. Our approach enables smaller models, with fewer than half the parameters, to outperform larger counterparts while using significantly less compute during inference, which lowers deployment costs. For further details, see our recent blog posts on text models and image-text models.
We have raised a total of $57.5 million across two funding rounds, Seed and Series A. Our investors include Felicis Ventures, Radical Ventures, Amplify Partners, Microsoft, Amazon, and AI pioneers like Geoff Hinton, Yann LeCun, and Jeff Dean. Our team stands at the forefront of this research domain, with deep expertise in both data research and engineering, and we simplify the data curation process for anyone aiming to train a model on their own data.
This position is located in Redwood City, CA, where we work in the office four days per week.

