About the job
About Us:
TransPerfect is a leader in translation software with a lively start-up culture. We are looking for an innovative, enthusiastic Backend Developer to join our Artificial Intelligence (AI) team and help shape the future of AI at a global scale. With over a decade of experience, our AI team has become essential to advancing machine translation, generative AI, natural language processing, and automation.
We seek a skilled backend developer who is eager to push technological boundaries and make a significant impact in AI. You will collaborate with a diverse team of professionals across the USA, Spain, Portugal, and India. If you are passionate about building robust, scalable solutions that bring AI to life for users, this opportunity is for you.
About The Role:
In this position, you will tackle the complexities of document processing, specifically converting intricate, unstructured PDFs into well-formatted, editable .docx files. Your mission is not only to extract text but also to reproduce the visual and structural intent of the original documents faithfully, including nested tables, multi-column layouts, font hierarchies, and styling.
Your responsibilities will include:
Comparative Analysis: Conduct a thorough evaluation of commercial solutions (ABBYY, Adobe, AWS Textract) versus open-source/AI-native tools (Mistral OCR, Docling, Nougat, LlamaParse).
Benchmarking: Develop metrics for “format fidelity” to objectively assess how well a tool recreates headers, footers, tables, and styles.
Pipeline Development: Create a Python-based workflow that integrates OCR engines with document generation libraries, such as python-docx or Pandoc.
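To give a sense of what a "format fidelity" metric could look like in practice, here is a minimal Python sketch. It is an illustration only, not a description of TransPerfect's approach: the names Block, structure_score, and per_kind_recall are hypothetical, and it assumes both the source PDF and the generated .docx have already been reduced to an ordered list of structural blocks by some upstream parser.

```python
from collections import Counter
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Block:
    """Hypothetical structural unit extracted from a document."""
    kind: str        # e.g. "header", "footer", "heading", "table", "paragraph"
    style: str = ""  # e.g. "Heading 1", "bold"; empty if unknown


def structure_score(reference, candidate):
    """Similarity in [0, 1] between two ordered block sequences.

    Uses difflib's ratio (2 * matches / total elements), so missing,
    extra, or re-styled blocks all lower the score.
    """
    ref = [(b.kind, b.style) for b in reference]
    cand = [(b.kind, b.style) for b in candidate]
    return SequenceMatcher(None, ref, cand, autojunk=False).ratio()


def per_kind_recall(reference, candidate):
    """Fraction of reference blocks of each kind that the candidate kept."""
    ref_counts = Counter(b.kind for b in reference)
    cand_counts = Counter(b.kind for b in candidate)
    return {
        kind: min(count, cand_counts[kind]) / count
        for kind, count in ref_counts.items()
    }


# Assumed example: the converter dropped the table from the original layout.
reference = [
    Block("header"),
    Block("heading", "Heading 1"),
    Block("table"),
    Block("footer"),
]
candidate = [
    Block("header"),
    Block("heading", "Heading 1"),
    Block("footer"),
]

score = structure_score(reference, candidate)
recall = per_kind_recall(reference, candidate)
print(f"structure score: {score:.3f}")
print(f"recall by kind: {recall}")
```

A sequence-level score like this is easy to compare across tools, while the per-kind breakdown shows which element types (tables, headers, footers) a given tool tends to lose. A real benchmark would likely also need cell-level table comparison and style attributes beyond a single label.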

