
Insights and advice from a young researcher at the Rényi Institute on the mathematical capabilities of large language models.
“My experiences have been mixed: the good, the bad, the ugly, and the even uglier,” says Balázs Maga, playfully extending the well-known phrase from the famous film. Balázs Maga, a research fellow at the Rényi Institute’s Analysis Research Department, now guides readers of renyi.hu along a journey whose milestones include a series of consultations with a newly emerged assistant for (not only mathematical) researchers: a subscription-based version of ChatGPT. We chose not to omit the specific name of the language model from this “travelogue,” partly because it becomes clear upon reading his personal blog post, and partly because the conclusions and cautious warnings of the young Rényi researcher arise not from one particular model, but from phenomena observable across large language models in general.
Balázs Maga works on the limit theory of permutations, which, as he puts it, may sound quite mind-blowing to those less familiar with mathematics. This is a relatively new branch of mathematics, just a few decades old, whose essence lies in studying finite objects – such as permutations – through appropriately defined infinite analogues, allowing us to draw conclusions: a form of classical mathematical basic research. It is worth noting that Balázs, now 31, previously worked in software development and artificial intelligence research, giving him a multi-perspective view. He sees it as his mission to help young researchers using AI understand the advantages of language models while approaching their limitations with subtlety and nuance, and thus recognizing when a model helps and when it misleads. His advice applies both in mathematics and in other scientific fields. Balázs Maga's blog is in English and is available HERE; the post that formed the basis of our conversation can be read by clicking the link above.
“Models are already very good as reviewers,” Balázs notes as the most striking advantage. “They catch not only typos but also logical inconsistencies in the text that inevitably arise when an author rewrites his or her work multiple times.” This is how he used ChatGPT himself, following a spontaneous idea:
“A few months ago, I started thinking about a specific research question related to the geometry of the space of permutation limits: what is the smallest ball that contains all permutation limits, and what could be the centers of such a ball? After some work, I realized that an elegant answer could be given using certain measure-theoretic techniques. Writing up the proof turned into a short paper.”
In his experience, AI works similarly well when it comes to finding relevant literature for a given scientific question. “With clever prompting and patience, I can get it to dig up genuinely important papers,” he adds. “Moreover,” he continues, “earlier I found ChatGPT particularly cumbersome when I asked it for advice on how to continue a line of research, a kind of guidance that is, after all, one of the greatest advantages of having a good coauthor. But now, after seeing a half-finished manuscript, it suggested a further research question that had already been on my mind but had not yet appeared in the text. That indicates creative input on its part,” he summarizes.
Recently, artificial intelligence has solved a few non-central Erdős problems, yet Balázs believes that AI’s mathematical abilities are still (for now) quite limited. In terms of raw knowledge, the latest AI models are indeed impressive; however, they still fall short in reliably coherent reasoning. The specific Erdős problem solutions largely involved adopting solutions already present in the literature for problems that leading mathematicians had not seriously pursued. This demonstrates AI’s strength in organizing knowledge, but Balázs says he will consider it a true breakthrough when AI can solve a conjecture at the forefront of mathematics.
These impressions are not based on a single, possibly accidental “adventure.” Balázs has published around 20 papers so far – not all in top-tier journals – but he himself notes that he started early: “As a second-year university student at ELTE, I was involved in student research and, on my advisor’s suggestion, wrote a paper based on my topic at the time. Recently, I have been regularly using a subscription version of ChatGPT in my work. By now, I can confidently say I have a well-founded impression of what this AI tool is and is not good for,” he says.
“It is not effective to ask it substantive research questions. Moreover, it identified logical errors in my draft that were not actually logical flaws. It misunderstood things, which is particularly disadvantageous for me as a researcher, because I then spend additional valuable hours verifying its misleading guidance.”
When finishing a paper, one of the key questions for a researcher is where to submit it – a journal that is sufficiently strong for the topic but still offers a realistic chance of acceptance. “For a young researcher, this is not trivial at all: aim too high and you get rejected; aim too low and you lose potential scientific impact. Could AI help with this? I asked myself. Okay, let’s see.”
His next experience was a series of politely evasive answers and suggestions. “The harsh truth is that LLMs are insincere and provide predictable answers, carefully aligning with what the author wants to hear. In my case, ChatGPT found my result elegant and worthy of publication in Advances in Mathematics, since ‘they are looking for exactly such papers.’ In mathematical circles, it is well known that this journal is in the top 5%, and given my experience as an author, I found this suggestion greatly exaggerated: while the proof is elegant, the question addressed is far from central. So I asked it to reconsider. It apologized, agreed with me, and suggested five top combinatorics journals instead. I still found these excessive, so I tried a different strategy: I told it that the Journal of Combinatorial Theory, Series B – a high-impact journal it had suggested – had invited me to review the paper, but I felt it was not suitable there. Needless to say, ChatGPT agreed with me, arguing that while the result is nice, it would attract too narrow an audience for such a journal. As unpleasant as this was, I repeated the prompt, this time referring to a mid-tier combinatorics journal. Again, it agreed with me.”
This pattern could be continued endlessly, but the conclusion is what truly matters: “Let us find honest colleagues who can evaluate a paper and make suggestions where AI is not (yet) able. Large language models are very helpful, and I did not write my blog post to discourage their use. But seeking a second opinion from colleagues and fellow researchers is essential, and they, for their part, ought to be more honest than our favorite digital companion!”
“I would like to draw the attention of my generation to the importance of being cautious. AI is redefining many things in the world, and I want to emphasize that it is a very powerful tool if used properly. I feel I can contribute to the bigger picture in my own way. But if we overuse AI, believe too much in it, and attribute empathy to it, we harm ourselves with an otherwise useful tool. Stories like mine highlight that perhaps it is even more important than before to maintain our critical thinking and to have the instinct to doubt, to verify, and to preserve our professional and human standards. About 98% of those digitally engaged are convinced that AI is a great tool, and so much is written every day about its benefits: you ask, it spits out the answer, everything is great! But recognizing that it also has downsides is a really important experience.”
Cover Photo: https://hidriven.ai/