Introduction
One of the fundamental challenges in software testing is ensuring that requirements are written clearly and completely. This is true for any project, even those without AI.
In a project that incorporates AI, however, the difference is how quickly and effectively that work can move. On the testing side, we have transitioned from writing test cases from scratch, filling in every field in a tedious, slow, step-by-step process, to generating them automatically, quickly, and efficiently. The same is true for writing user stories.
Nevertheless, there are obstacles to overcome. The requirements of an ongoing project keep evolving, which means the test artifacts must be kept up to date throughout the development life cycle. This is something testers still do personally, reviewing the test cases written so far from time to time and updating them whenever the requirements change.
Another equally important point is that the test cases written by AI need to be reviewed as soon as they are created: verifying that they are valid, complete, and not duplicated, correcting or removing errors, and adding anything that is missing.
Some advantages of using an AI-powered test management tool include its ability to identify, based on test runs, which tests are good candidates for automation. It also gives us instant metrics without having to apply filters, run queries, or add gadgets to a Jira dashboard. In this case, we are referring to QASE, a test management tool integrated with the project management tool Jira, which we use in Qubika’s QA studio.
For more information about QASE, official documentation is provided here: https://docs.qase.io/
The QAP project: “Qubika Agentic Platform”
Qubika Agentic Factory is one of the core pillars of Qubika, and enables Qubika’s customers to create and deploy applications and platforms that utilise AI: agents, automated bots, chatbots, and AI applications. Alongside this, our team has built the Qubika Agentic Platform, a maintainable, extensible, high-performance platform/accelerator for building and running enterprise AI agents for our customers.
The project’s general and technical challenges
In real life, and taking into account what I said earlier, here is a list of challenges I experienced and how they were overcome:
- Although the test cases were already prepared in the testing repository, the next step was to execute them. To ensure proper validation, I collaborated with the backend team, who recommended testing the services directly. With their support, I obtained the project’s API collection and successfully validated the service responses using Postman (a scripted version of this kind of check is sketched after this list).
- This project focuses on creating artificial intelligence agents and agent workflows that can be utilised and reused in various areas. This raises challenges such as understanding what each agent is for, how the architecture is laid out, and how the agents and services are orchestrated. It also means determining which database is used, which data is persisted (and when), which is not, and why, so that it can be tested appropriately. Therefore, at each stage of the project, demonstrations, talks, and explanations from experts were carefully considered when formulating the test cases. It was also necessary to study new and challenging topics related to artificial intelligence in advance, such as:
- What is RAG (retrieval-augmented generation)?
- What is a trace?
- How is a trace measured?
- What is the cost of using an AI agent?
- What is an LLM (large language model), and what is it used for?
- Why do we measure latency? Why is it important? What does this metric tell us?
- What is NLP (natural language processing)? How does it work in AI?
- What is the tokenisation process, and how is it used?
- What are a judge and an evaluator in artificial intelligence?
- How are these concepts applied in the different modules of the QAP system?
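To make this more concrete, below is a minimal sketch of the kind of service-level check mentioned in the first bullet, combined with the latency question from the list: call an agent endpoint, assert on the response, and record how long the call took. The base URL, route, payload fields, and response shape are hypothetical placeholders, not the project’s real API; the sketch simply mirrors in code the assertions that Postman was used for.

```python
import time
import requests

BASE_URL = "https://qap.example.com/api"  # hypothetical base URL, not the real project endpoint


def validate_agent_response(question: str, max_latency_s: float = 5.0) -> None:
    """Call a (hypothetical) RAG agent endpoint, check the response, and record latency."""
    start = time.perf_counter()
    resp = requests.post(
        f"{BASE_URL}/agents/rag/query",   # placeholder route
        json={"question": question},      # placeholder payload
        timeout=30,
    )
    latency = time.perf_counter() - start

    # Basic service-level checks, equivalent to the assertions run in Postman
    assert resp.status_code == 200, f"Unexpected status: {resp.status_code}"
    body = resp.json()
    assert "answer" in body and body["answer"].strip(), "Empty or missing answer"

    # Latency is one of the metrics listed above that we learned to care about for AI agents
    assert latency <= max_latency_s, f"Latency too high: {latency:.2f}s"
    print(f"OK in {latency:.2f}s: {body['answer'][:80]}...")


if __name__ == "__main__":
    validate_agent_response("What documents mention the onboarding process?")
```

Running a check like this across builds also gives a rough latency trend for the agent over time, which is exactly why that metric appears in the study list above.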
Specific challenges
Below are some examples of concrete, hands-on challenges during the project:
- Testing backend-only RAG Agent implementations by validating endpoints and verifying agent responses consumed by Evals on the frontend.
- Validating the Slack integration, where an agent listens and responds to a bot within a Slack app. This involved testing the “Slack Trigger” node, its connection to the workflow, and the “Slack Send Message” node (a scripted version of this kind of check is sketched at the end of this section).
- Understanding QASE and Jira’s automated behaviour after the AI upgrade.
- One of the possibilities Qase offers is to create defects when the tester fails a test case. Here, the following situations came up:
- You can fail just one step of a test case and raise a defect, or fail the entire test case and report the defect immediately from that point.
- You can report the bug from Qase or from Jira. They are different: a defect raised in Qase gets an ID that differs from its Jira counterpart. You can have both, but that is unnecessary and creates duplication.
- Regarding the issues on the board: when a test case fails, Qase automatically fails the Jira ticket by moving it across columns on the board, updating its status from “QA in progress” to “Dev in progress”. This was undesired behaviour of the tool, so for future test failures it was decided to keep that transition under the tester’s control.
Be careful and pay attention to the approach you choose the first time, so that you keep coherence and traceability later on.
To avoid confusion, from a testing perspective it was decided not to raise defects from Qase, as this would result in the same bugs having different IDs in Qase and Jira. Instead, defects are created and tracked in Jira, which maintains a consistent prefix for issues across the entire project.
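As a follow-up to the Slack integration challenge above, the sketch below shows how that kind of end-to-end check could be scripted against Slack’s Web API: post a message for the “Slack Trigger” node to pick up, then poll the channel history for the reply produced by the “Slack Send Message” node. The bot token, channel ID, and the assumption that the agent replies in the same channel are hypothetical; this is one possible automated check, not the procedure actually used in the project.

```python
import time
import requests

SLACK_TOKEN = "xoxb-..."    # hypothetical bot token
CHANNEL_ID = "C0123456789"  # hypothetical channel the workflow listens on
HEADERS = {"Authorization": f"Bearer {SLACK_TOKEN}"}


def trigger_and_wait_for_agent_reply(prompt: str, wait_s: int = 30) -> str:
    """Post a message that the workflow's Slack Trigger listens for,
    then poll the channel history for the agent's reply."""
    # 1. Post the message that should fire the "Slack Trigger" node
    sent = requests.post(
        "https://slack.com/api/chat.postMessage",
        headers=HEADERS,
        json={"channel": CHANNEL_ID, "text": prompt},
    ).json()
    assert sent["ok"], f"Slack API error: {sent.get('error')}"
    sent_ts = sent["ts"]

    # 2. Poll for any message posted after ours (the "Slack Send Message" node's output)
    deadline = time.time() + wait_s
    while time.time() < deadline:
        history = requests.get(
            "https://slack.com/api/conversations.history",
            headers=HEADERS,
            params={"channel": CHANNEL_ID, "oldest": sent_ts, "limit": 10},
        ).json()
        replies = [m for m in history.get("messages", []) if m.get("ts") != sent_ts]
        if replies:
            return replies[0]["text"]
        time.sleep(2)
    raise AssertionError("Agent did not reply within the expected time")
```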
Conclusion: Improving QA through human + AI collaboration
The integration of AI into software testing has transformed the way we design, execute, and manage quality assurance processes. While traditional challenges, such as maintaining accurate, evolving requirements, still remain, AI tools introduce new levels of efficiency, automation, and insight that allow testers to work smarter and faster.
As seen in the testing and development of the Qubika Agentic Platform, the speed of innovation in AI systems also demands significant adaptability: understanding complex AI concepts, keeping test artifacts aligned with constant updates, and validating AI-generated test cases with human expertise.



