Automation QA Engineer

Ciklum

Spain · Full time · Automation QA

Ciklum is looking for an Automation QA Engineer to join our team full-time in Spain.

We are a custom product engineering company that supports both multinational organizations and scaling startups in solving their most complex business challenges. With a global team of over 4,000 highly skilled developers, consultants, analysts and product owners, we engineer technology that redefines industries and shapes the way people live.

About the role:

As an Automation QA Engineer, you will become part of a cross-functional development team engineering the experiences of tomorrow.

The Project: We are partnering with B&R Industrial Automation to significantly upgrade the Retrieval-Augmented Generation (RAG) architecture of their AS client's system. Our goal is to drastically reduce AI hallucinations in code generation and optimize retrieval latency without re-architecting their existing platform.

Responsibilities:

• Own the evaluation lifecycle, offline acceptance testing, and KPI measurement for the AS client's RAG pipeline

• Lead the co-creation and management of the project's "golden dataset" to consistently benchmark AI performance

• Implement and manage the RAGAS evaluation harness and automated CI/CD regression testing

• Track, classify, and build root-cause taxonomies for LLM hallucinations, with a specialized focus on code-generation correctness

• Golden Dataset & Baselines: Collaborate with client domain experts and technical leads to build a robust synthetic test set (roughly 90 or more queries across multiple categories) and establish baseline metrics for Faithfulness, Context Precision, and Answer Relevance

• Evaluation Harness: Build and automate evaluation pipelines using RAGAS and custom Python scripts, enabling A/B comparisons between the baseline, MVP, and full implementation

• Regression & CI/CD Guardrails: Implement automated CI/CD regression checks within Azure DevOps, ensuring that a >5% drop in core metrics automatically blocks pipeline deployments

• Hallucination Tracking: Develop a root-cause taxonomy for hallucinations and track code-generation queries separately to ensure the AI generates functionally correct and compilable output

• Performance Benchmarking: Measure and monitor pipeline latency, rigorously validating P95 latency targets (sub-4.5s) under representative concurrent load
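
To give a flavor of the guardrails described above, here is a minimal Python sketch of a deployment gate. It is an illustration only, not the project's actual implementation: the function names are hypothetical, the ">5% drop" is assumed to mean a relative drop against the baseline score, and P95 is computed with the nearest-rank method. Only the 5% and 4.5s thresholds come from this posting.

```python
# Illustrative sketch only; names and interpretation are assumptions.
MAX_RELATIVE_DROP = 0.05   # from the posting: a >5% drop blocks deployment
P95_TARGET_SECONDS = 4.5   # from the posting: P95 latency must stay below 4.5s


def regressed_metrics(baseline, current, max_drop=MAX_RELATIVE_DROP):
    """Return the names of metrics whose relative drop exceeds max_drop."""
    failures = []
    for name, base in baseline.items():
        if base == 0:
            continue  # a relative drop is undefined for a zero baseline
        drop = (base - current[name]) / base
        if drop > max_drop:
            failures.append(name)
    return failures


def p95(latencies):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def gate_passes(baseline, current, latencies):
    """True only if no metric regressed and P95 latency is under target."""
    no_regression = not regressed_metrics(baseline, current)
    return no_regression and p95(latencies) < P95_TARGET_SECONDS
```

In a CI step, a script built on this idea would typically exit with a non-zero status when `gate_passes` returns `False`, which is what causes the pipeline stage to fail.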

Requirements:

• Background: Mid-to-Senior level experience in Data Science, Machine Learning Evaluation, AI Quality Assurance, or Data Engineering

• Evaluation Frameworks: Deep, hands-on experience with LLM evaluation frameworks (e.g., RAGAS, DeepEval, TruLens) and establishing human-anchored or synthetic benchmarks

• Technical Stack: Strong proficiency in Python. Solid experience with CI/CD tools (especially Azure DevOps) and integrating complex test suites into automated deployment pipelines

• Data & Observability: Experience working with databases (PostgreSQL) and integrating custom telemetry or observability data (e.g., Azure App Insights) into evaluation reports

• Analytical Mindset: Strong attention to detail with the ability to perform rigorous error analysis, build structured taxonomies for failures, and identify embedding drift

Personal skills:

• Highly collaborative and data-driven; comfortable working directly with client SMEs to validate queries and presenting evaluation scorecards to guide engineering decisions

What’s in it for you?

• Care: your mental and physical health is our priority. We ensure comprehensive company-paid medical insurance and 4 additional undocumented sick leave days

• Tailored education path: boost your skills and knowledge with our regular internal events (meetups, conferences, workshops), Udemy license, language courses and company-paid certifications

• Growth environment: share your experience and level up your expertise with a community of skilled professionals, locally and globally

• Flexibility: Own your schedule – you are the one to decide when to start your working day. Just don’t miss your regular team stand-up. We are there to support your work-life balance and provide 23 vacation days & short Fridays

• Opportunities: we value our specialists and always find the best options for them. Our Internal Mobility Program helps you change projects when needed so you can grow, excel professionally and fulfill your potential

• Global impact: work on large-scale projects that redefine industries with international and fast-growing clients

• Welcoming environment: feel empowered with a friendly team, open-door policy, informal atmosphere within the company and regular team-building events

About us:

At Ciklum, we are always exploring innovations, empowering each other to achieve more, and engineering solutions that matter. With us, you’ll work with cutting-edge technologies, contribute to impactful projects, and be part of a One Team culture that values collaboration and progress.

Based in Málaga, our team thrives in one of Andalusia’s leading tech hubs. Enjoy a hybrid work setup, the sunny Mediterranean vibe, and endless opportunities to grow your skills on global-scale projects.

Want to learn more about us? Follow us on Instagram, Facebook, and LinkedIn.

Explore, empower, engineer with Ciklum!

Interested already? We would love to get to know you! Submit your application. We can’t wait to see you at Ciklum.

Skills

Python, CI/CD, Azure DevOps, Data Science, Machine Learning Evaluation, LLM Evaluation Frameworks, PostgreSQL, Analytical Mindset, Collaboration, Attention to Detail