Overview
Job ad description
The OpenEuroLLM project is a landmark pan-European initiative funded under the EU’s Digital Europe Programme with a €37.4M investment, uniting 20 leading research institutions, companies, and EuroHPC supercomputing centres to build the first comprehensive, open-source family of large language models that cover all official EU languages and reflect European values of transparency, openness and regulatory compliance.
We are seeking an experienced NLP/AI researcher for a Software Developer role, working on very large-scale data and preparing it for pre-training from a regulatory compliance point of view. The main focus is on identifying personal data and on marking and filtering large datasets accordingly, either by improving existing open-source solutions or by building new, more accurate and efficient tools.
Main Responsibilities
- Work with the leaders of Work Package 3 (WP3; University of Oslo, Norway, and Prompsit, Spain) to develop the necessary tools and integrate them into their existing data preprocessing pipeline
- Lead a small team, together with other project partners, working on the same topic of identifying personally identifiable information (PII)
- Adhere to the open-source approach
- Adhere to established evaluation methods when measuring the performance of the developed tools
- Maintain a production-level focus in development, including proper documentation
Collaboration and Reporting
- This role reports to Jan Hajič (project PI) and is employed by Charles University
- The role will be based in Prague or potentially at one of the consortium partner sites (preferably Oslo, Norway or Elche, Spain)
- The closest technical collaboration partners will be the partners of Task 3.4 within WP3, and the leaders of WP3
Ideal Candidate Profile
- At least 3-5 years of proven experience in natural language processing using machine learning and artificial neural networks
- Technical fluency to understand LLM development and other project dependencies, and communicate effectively with the team partners
- Excellent written and verbal communication skills in English
- Bonus: Experience with EU or other publicly funded research projects
- Bonus: Experience working on AI model development projects
What we offer
- A visible position in a large, closely watched, strategic European project in the AI and LLM area
- If based in Prague, a workspace in the historical center of Prague, within an inspiring language technology and AI group of 100+ researchers at the Institute of Formal and Applied Linguistics, School of Computer Science, Charles University
- Competitive, negotiable salary in the range of 60,000 to 120,000 CZK gross, based on education and previous experience
- University employment benefits (contribution to pension fund(s), private travel insurance, special mobile tariffs that also cover family members, lunch subsidies, etc.)
- The contract is for two years, ending at the end of February 2028 unless the project is extended.
Start: immediately or as negotiated
Application deadline: March 25
For informal questions, please contact hajic@ufal.mff.cuni.cz.
Please send your CV and a short cover letter to: ufal@ufal.mff.cuni.cz.