PS Content Team: These days, it seems like you can’t open the news without hearing about someone using GPT-4 in some weird new way. People are starting to experiment with using AI not just to fix code on demand, but to automatically patch it and rerun it until it works. What do you think about this kind of “self-modifying” or “self-healing” code? Could it replace your QA process?
Matthias: Well, my immediate thought is that it will work until it explodes! I can imagine someone saying, “We don’t need you to debug this,” but that’s like putting a brick on the gas pedal: it only works until the car ends up in a ditch.
If you just keep changing the code until it runs, you’re only fixing the syntax errors that make it crash, not the actual bugs. And digging into root causes is the core of software development. A syntax error can help you pinpoint the real problem, but it’s just as likely that the mistake lives somewhere else entirely. After all, what if the AI decides the best way to solve the problem is to comment out the offending code?
Lars: I think it’s great for fixing genuine compilation errors, but it’s limited without context about your business use case. For example, an error might be thrown specifically to stop you from refunding a customer $1,000 instead of $5. From a business perspective, you want that error to fire, but the AI that’s fixing your code has no way of knowing that.
Matthias: Yeah, that’s a good example. The AI might decide the solution is to suppress the error and “fix” the code that way, but that’s not the right thing to do. That’s what I mean about going full speed into a brick wall.
Lars: Especially in software development, throwing an error is often exactly how you detect this kind of business-logic problem, and that’s a best practice. Say I get an error like “This number is too large” for the number of students allowed in a class: that error is a notification of a real issue, and just “fixing” it away is the problem. I’m sure the technology will get to a point where it can account for that, but we’re not there yet.
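To make that concrete, here is a minimal sketch of the kind of guard clause Lars is describing; the `enroll_student` function and `MAX_CLASS_SIZE` limit are hypothetical, not from any real codebase. An automated fixer that only optimizes for “no more exceptions” might be tempted to delete the check rather than track down why too many students are being enrolled:

```python
# Hypothetical enrollment guard illustrating an error that is raised on purpose.
MAX_CLASS_SIZE = 30  # assumed business rule for the sake of the example


class ClassFullError(Exception):
    """Raised when enrollment would exceed the allowed class size."""


def enroll_student(roster: list[str], student: str) -> None:
    # This error is the feature: it surfaces a business-logic problem upstream
    # (too many enrollments) instead of letting it pass silently.
    if len(roster) >= MAX_CLASS_SIZE:
        raise ClassFullError(f"Class is full ({MAX_CLASS_SIZE} students).")
    roster.append(student)


# An automated "fix" that deletes or comments out the size check would make the
# traceback disappear, but the real bug (over-enrollment) would then ship quietly.
```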
PS Content Team: What if the AI told you what errors it fixed and gave you the option to accept the changes? Say Wolverine, a recently released program that fixes Python programs, showed you exactly what it “fixed.” Does being able to review the changes make a difference? Would you use self-modifying code then?
Jeremy: Well, I think that might work.
Matthias: If I get something I can review, I might use it. That increases productivity, and it’s very different from letting changes go into production unreviewed. That would be stupid, like letting an intern work on your codebase without any review. The key is to have appropriate checks and balances.
Jeremy: Nothing should go into production in isolation. Humans shouldn’t edit code directly in production either, but sometimes they do. The same rule applies to AI; it’s nothing special.
Lars: I absolutely want to use it, but like anything else, I can’t let it roam free in production. I’d treat it like a code review from a team member. And everything you review helps inform and teach the model, so the review itself has value in improving the model’s ability to help.
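None of the panelists describe how Wolverine itself works, but the review-gated workflow they’re endorsing can be sketched roughly like this: rerun a script, ask a model for a revised version whenever it crashes, and require a human to approve the diff before anything is written back. The `propose_fixed_source` stub below is a placeholder for whatever model call you would actually use:

```python
import difflib
import subprocess
import sys
from pathlib import Path


def propose_fixed_source(source: str, error_output: str) -> str:
    """Placeholder: swap in a call to the LLM of your choice.

    It should return a revised version of `source` that tries to address
    the traceback in `error_output`.
    """
    return source  # no-op stub so the sketch runs end to end


def review_loop(script: Path, max_rounds: int = 3) -> None:
    for _ in range(max_rounds):
        result = subprocess.run(
            [sys.executable, str(script)], capture_output=True, text=True
        )
        if result.returncode == 0:
            print("Script ran cleanly; stopping.")
            return

        original = script.read_text()
        proposed = propose_fixed_source(original, result.stderr)
        diff = list(difflib.unified_diff(
            original.splitlines(keepends=True),
            proposed.splitlines(keepends=True),
            fromfile=str(script), tofile=f"{script} (proposed)",
        ))
        if not diff:
            print("No change proposed; a human needs to look at this.")
            return

        # The human stays in the loop: nothing is written without approval.
        print("".join(diff))
        if input("Apply this patch? [y/N] ").strip().lower() != "y":
            print("Patch rejected; leaving the file untouched.")
            return
        script.write_text(proposed)


if __name__ == "__main__":
    review_loop(Path(sys.argv[1]))
```

The important design choice is the approval prompt: the loop can suggest as many patches as it likes, but nothing lands without a person’s sign-off, which is the “checks and balances” Matthias mentions.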
PS Content Team: What do you think about “autonomous” AI agents like AutoGPT and BabyAGI that iterate on GPT-4 output to work through complex tasks? Do you see business risks or opportunities there? How well does the technology work, and do you think it’s mature?
Lars: Automation has always been the holy grail of software development, and autonomous AI agents are another step in that direction. As mentioned earlier, the risk is the lack of context. Unless you feed the model or agent enough information to understand the nuances and edge cases, autonomy can give you something very different from what you actually wanted. In my opinion, it’s not quite there yet, but like the rest of this technology, it could get there soon.
PS Content Team: What do you think about prompt engineers, people who write code entirely through ChatGPT? Do you think this will lead to cargo cult programming, where people ship code they don’t understand and then struggle to fix bugs in it?
Lars: I’m ambivalent. On the one hand, I’m all for technology that gets more people interested in programming and coding, and writing code with an LLM can draw in people who want to understand and learn more. On the other hand, you’re asking one machine to build another machine, so the bugs stop being syntactic and become semantic. The result can be heavily biased applications that nobody fully understands.
PS Content Team: Do you have any other thoughts about the impact of self-healing code or AI-assisted programming in general? Do you think there are other risks or opportunities worth talking about?
Jeremy: I think bias can creep in. I once had a boss who was very particular about code. After I’d written something and he’d reviewed it, people would recognize it and say, “Bob worked on this, right?” They could tell because it was elaborate and over-engineered. He influenced us and made things more complicated than they needed to be. AI can influence your code and introduce bias in the same way, whether you’re aware of it or not. It can push things in a particular direction that’s common but not necessarily correct.
It’s like the game of Telephone, right? If you ask, “What did George say?”, Google goes around the circle, writes down what everyone said, then looks through its notes and gives you an answer. ChatGPT, on the other hand, has written down what everyone said in every game of Telephone ever played, and then guesses what George said based on what George said the other 10,000 times he played. And that’s where things can go wrong.
Lars: AI isn’t going anywhere, and it’s enabling more people to do smarter things, faster. That comes with both risks and opportunities, but the most exciting part is that we don’t yet fully know where it will end up.