
March 13, 2025

Devin AI: A real-life review of an autonomous AI coding agent

Devin AI is an autonomous AI agent designed to aid in coding tasks, but its performance can vary depending on task complexity. This post explores its advantages, limitations, and the testing process, providing insight into its strengths and weaknesses in real-world scenarios.


What is Devin AI? Exploring the features and workflow of an autonomous coding agent

Devin AI is an autonomous AI coding agent that works in its own environment. This environment is a virtual machine that has:

  • A browser interface – used to test the UI locally or look up tool documentation
  • Its own VSCode instance
  • A terminal – used to run all of its commands
  • A planner – where the main instructions are internally defined and executed

It also has integrations with Slack and GitHub for a more complete workflow.

Note: The Slack integration was not tested, and only the chat interface provided on the devin.ai site was used.

Unlike tools such as Cursor or Copilot, Devin is neither an IDE nor a VSCode extension, and it does not act like one. Devin does provide a VSCode extension, but it is only a chat interface and does not touch any code in the user’s local environment. Code changes still happen on Devin’s machine.

Testing Devin AI: A case study of implementing a “deals” feature in a CRM system

I tested Devin on one of Qubika’s client projects with a simple task: adding a deals feature to the site’s CRM.

Initial setup

Using Devin’s GitHub integration, the repositories for both the backend and frontend projects were connected to Devin, which was prompted to clone the repositories and set up its environment locally.

It managed to set up the local environment after asking for the environment variables and for specific values for some of them.

Task breakdown and planning for the deals page implementation in Devin AI

For the initial prompt, Devin was provided with:

  • The entire schema definition of the Deal-related entities
  • An image of the DB schema

During planning, it was pointed to related ongoing work on the company entities that had not yet been merged. Thanks to the GitHub integration, it only needed the PR numbers to access that work.

It found the task too large and recommended splitting it into separate sessions.

On one hand, this was good: it showed enough understanding of the task to break it down into reasonable smaller tasks. On the other hand, the separation became a problem because sessions in Devin do not share context, so parallelizing the work was not an option here.

In less than 10 minutes it had a PR opened for the first point of the breakdown plan.

The PR failed its CI checks, and Devin made several attempts to fix the linting and TypeScript errors.
It also recognized some TypeScript errors unrelated to its changes and asked for guidance on what to do with them, as they were outside the scope of the task.

Following its proposal to split the task, a second session was created for another part of the breakdown. In this case, Devin was prompted to open its PR with the earlier PR as its base, which it did without issues. Further sessions were created for the rest of the tasks.

The PRs created for these were acceptable but contained unnecessary extra changes. When asked whether a library (Joi) was used elsewhere in the code, it answered incorrectly: it told me Joi was used only where it had added it. That was not true, and it led to unnecessary changes later on.
It also assumed the project used NestJS and added libraries for it, even though the project does not use that framework.

Challenges faced by Devin in managing PRs and local environments

The main issue with the PRs Devin created was that it checked only the CI results, not whether the project built. When tasked with bringing up its local Docker environment, the environment started failing and Devin kept attempting unrelated fixes. In one attempt, it refactored the authentication methods unprompted because it thought the issue was on the login page.
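One way to close that gap is to run the build alongside the lint and type checks before a PR is opened. The following is a minimal sketch rather than the project's actual setup: the step list, the `npm run lint` and `npm run build` script names, and the `firstFailingStep` helper are all assumptions made for illustration.

```typescript
// A named check and the shell command that performs it.
type CheckStep = { name: string; command: string };

// Hypothetical commands; substitute the project's own scripts.
const PRE_PR_STEPS: CheckStep[] = [
  { name: "lint", command: "npm run lint" },
  { name: "typecheck", command: "npx tsc --noEmit" },
  { name: "build", command: "npm run build" },
];

// Runs the steps in order and returns the name of the first one that fails,
// or null when every step succeeds.
// `run` executes a command and returns its exit code (0 means success).
function firstFailingStep(
  steps: CheckStep[],
  run: (command: string) => number,
): string | null {
  for (const step of steps) {
    if (run(step.command) !== 0) {
      return step.name;
    }
  }
  return null;
}
```

The command runner is passed in so the logic can be exercised without spawning processes. In a real script, it could wrap Node's `child_process.spawnSync` and return the resulting exit status.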

The last session, which was tasked with making the project run locally, did not finish as it hit session usage limits.

After trying to fix it locally, I found that most of the Docker issues came from the changes Devin had introduced in the PRs. It made errors such as importing a .js file when the file was a TypeScript file, and it created migrations that did not follow the standard TypeORM naming pattern.
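For reference, migrations generated by the TypeORM CLI use a file name of the form `<timestamp>-<Name>.ts` and a class name of the form `<Name><timestamp>`. A naming mistake like the one above could be caught with a small check such as the sketch below. The helper names and the `AddDealsTable` example are hypothetical, and the 13-digit timestamp assumes the millisecond timestamps the CLI produces.

```typescript
// Assumed convention, based on what the TypeORM CLI generates:
//   file:  <timestamp>-<Name>.ts   e.g. 1710288000000-AddDealsTable.ts
//   class: <Name><timestamp>       e.g. AddDealsTable1710288000000
const MIGRATION_FILE = /^(\d{13})-([A-Za-z][A-Za-z0-9]*)\.ts$/;

// Returns the class name the convention implies for a migration file,
// or null when the file name itself does not follow the pattern.
function expectedMigrationClassName(fileName: string): string | null {
  const match = MIGRATION_FILE.exec(fileName);
  if (!match) {
    return null;
  }
  const [, timestamp, name] = match;
  return `${name}${timestamp}`;
}

// True only when the file name is well formed and the class name matches it.
function isMigrationConsistent(fileName: string, className: string): boolean {
  return expectedMigrationClassName(fileName) === className;
}
```

A check like this could run in CI over the migrations directory, so a misnamed migration fails fast instead of surfacing later as a Docker startup error.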

Key advantages of Devin in implementing the deals feature

Devin showed some key advantages while implementing the Deals feature:

  • It was able to quickly break down the complex task into more manageable parts, showing good project planning capabilities
  • It completed the first part of the implementation in under 10 minutes, demonstrating efficiency for straightforward tasks
  • It showed awareness of existing code context by recognizing the ongoing work related to company entities
  • When faced with CI failures, it made multiple attempts to fix linting and typescript errors
  • It demonstrated good judgment by identifying out-of-scope issues and requesting guidance rather than making assumptions
  • It frequently uses its Knowledge base, the equivalent of “team-wide knowledge” that it checks whenever doing a related task. This knowledge can include everything from which variables to use when running locally to how to format a PR. Devin may also suggest updates or additions to the knowledge base.

Limitations and resource constraints of using Devin

While I saw the advantages of using Devin, there were also several limitations and constraints:

  • Even the docs advise treating Devin as a junior developer. It is not ready to tackle advanced tasks without added context, modules, resources, and examples that it can use as a base. The docs also mention that Devin is not suited for heavily visual tasks (like implementing Figma designs)
  • It was not consistent in addressing code review comments. It ignored some comments without giving a reason. It marked others as addressed when the changes had not actually been made, and it had to be prodded to follow through
  • Apart from the initial subscription price, there is a cap on the ACUs that can be used (150 ACUs). More ACUs can be added at USD 2 per ACU.
    • ACU is Agent Compute Unit, a measure of the resources Devin uses.
    • In my testing, I used all 150 ACUs in less than a week
  • After a conversation consumes 10 ACUs, Devin’s performance can degrade, as per the docs.
    • This matched what I saw in many sessions: once usage went past 10 ACUs, the results were not quite what was expected.
    • If there’s too much back-and-forth debugging an issue or commenting on the generated PR, this value can be reached quickly.
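To make the ACU arithmetic concrete, here is a small sketch using only the figures quoted above: a 150 ACU allowance, USD 2 per additional ACU, and the 10 ACU per-session mark from the docs. The function names are hypothetical, and the subscription price itself is left out because it is not stated here.

```typescript
// Figures from this review: 150 ACUs included, USD 2 per additional ACU.
const INCLUDED_ACUS = 150;
const USD_PER_EXTRA_ACU = 2;

// Per-session usage past which the docs warn that performance can degrade.
const SESSION_SOFT_LIMIT_ACUS = 10;

// Cost in USD of the ACUs used beyond the included allowance.
function extraAcuCostUsd(acusUsed: number): number {
  if (acusUsed < 0) {
    throw new RangeError("acusUsed must be non-negative");
  }
  const extraAcus = Math.max(0, acusUsed - INCLUDED_ACUS);
  return extraAcus * USD_PER_EXTRA_ACU;
}

// Number of sessions that fit in a budget if each session runs
// up to the given number of ACUs.
function sessionsWithinBudget(
  budgetAcus: number,
  acusPerSession: number = SESSION_SOFT_LIMIT_ACUS,
): number {
  return Math.floor(budgetAcus / acusPerSession);
}
```

For example, `extraAcuCostUsd(200)` evaluates to 100, and `sessionsWithinBudget(150)` evaluates to 15. In other words, the allowance covers roughly fifteen sessions if each one is stopped at the 10 ACU mark.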

Conclusion: Challenges in Devin’s workflow and user experience

My biggest issue with Devin was the workflow. Not having direct access to the code while Devin works on it makes the back-and-forth much slower, and it makes it more likely that errors reach a PR when they could have been caught during development. The fact that it promotes itself as a “teammate” rather than an “assistant”, and that it pushes Slack as the way to interact with it, makes one wonder whether the tool’s target users are developers.


By Lesly Acuña

Software Engineer at Qubika

Lesly is a Software Engineer at Qubika with over 13 years of experience, specializing in Node.js, React.js, and modern web technologies. She has been instrumental in delivering innovative software solutions and her team is focused on exploring AI Agents and how they can enhance and streamline Qubika’s projects. Lesly combines a strong technical foundation with a collaborative mindset, making her a key contributor to the team’s success.
