About the job
At Upstage, we are dedicated to our vision of "Making AI Beneficial" and our mission of "Building Intelligence for the Future of Work." We are developing next-generation AI solutions based on Vision-Language Models (VLMs) that go beyond merely reading text to comprehensively understand visual information such as images, charts, and tables. By extracting hidden insights from vast document datasets, we empower our clients to realize new opportunities and added value. Our VLM team focuses its research and development on web-scale data collection and synthesis, large-scale pre-training and post-training, and evaluation methodologies.
Upstage aims to provide user-friendly AI solutions that enable anyone to leverage AI technology effortlessly. We already possess top-tier OCR technology and advanced Key-Value extraction techniques to automatically extract meaningful information from documents. Recently, we launched a Document Parsing model that analyzes various document layouts. Building on these technologies, Upstage strives to deliver customized AI solutions that maximize business efficiency and productivity, ensuring that AI creates significant value in real-world applications.
Additionally, we offer Private LLM services optimized for business environments, enhancing operational efficiency and productivity. We are committed to making AI beneficial by launching a series of APIs that allow easy access to world-class AI models across various fields, contributing to our clients' business success. Among these, Upstage Document AI stands out for its superior OCR and information extraction capabilities, aiming to automate and streamline tedious document processes.
We are looking for a new member to join us on this exciting and challenging journey. You will be a perfect fit for the Upstage VLM team if you are passionate about leading technology in the multimodal AI field, eager to connect research with real services in an end-to-end AI experience, and keen to grow rapidly by expanding technology through collaboration and productization.
Key Responsibilities
Design and build data collection pipelines
Collect and filter multimodal data (document images, field photos, charts, etc.)
Research and apply preprocessing and enhancement techniques to improve data quality
Model Training
Research and implement pre-training and post-training of large-scale vision encoders and vision-language models
Develop and apply data and learning strategies for various vision-language tasks
Research model architecture improvements and optimization techniques considering training and inference efficiency
Evaluation
Investigate and apply various evaluation techniques to assess the performance of document-centric VLMs
Develop and introduce new evaluation methods that align with real-world usage environments
Design and implement internal benchmark tools for continuous improvement and scalability
Other Responsibilities
Share research results through publications in top-tier international conferences or as open-source code
Lead preliminary research to reproduce and implement the latest techniques, and share knowledge within the team
Collaborate closely with product, MLOps, and other teams to apply models in real services and integrate them into production systems
