Revolution in Grading by AI? Changes in University Transcripts, ChatGPT Alters the "Reliability of Evaluations"

What Does an "A" Mean at Universities After ChatGPT?

For a long time, an "A" on a university transcript was a testament to excellence. For students, it was an asset in graduate school applications and job hunting; for universities, it was a figure demonstrating educational outcomes; and for companies, it was a convenient metric for screening applicants.

However, now that generative AI has become a staple on students' desks, the meaning of that "A" is beginning to waver. Does a high grade truly reflect the student's own understanding and thinking skills? Or are teachers merely evaluating AI-polished essays, AI-written code, and AI-refined submissions?

A study published by UC Berkeley's Center for Studies in Higher Education, and introduced by the German tech outlet Blogspan, puts quite specific numbers on this issue. The study draws on grade data from courses at a large public research university in Texas. It analyzed more than 500,000 grade records, counted in student credit hours, from the fall semesters of 2018 through 2025, covering 319 courses in 84 departments.

The study examined how the grade distribution at the university changed after ChatGPT was released to the public in November 2022. Rather than stopping at the observation that "recent students get better grades," the researchers looked at the types of assignments in each course. They distinguished between courses weighted toward written assignments, reports, and programming, where generative AI excels, and courses built around oral presentations, practical skills, and in-person exams, where AI is hard to substitute for. They then compared how grades changed in each group before and after ChatGPT arrived.

The result: in courses whose assignments lend themselves to AI use, the share of A grades rose by 13 percentage points, roughly a 30% increase over the 2022 level. The average GPA also rose by 0.12 points, compressing the grade distribution toward the top. In other words, this was not a uniform improvement across the board; students who would previously have received an A-minus or a B-plus were pushed up to an A.
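The two figures above describe one change in two different units. The 13-point figure is an absolute change in the share of A grades, while "about 30%" is the same change measured relative to the 2022 share, so dividing one by the other recovers the baseline they jointly imply. The Python sketch below does that conversion; the function names are illustrative, and because "about 30%" is a rounded figure, the implied baseline of roughly 43% is only an approximation derived here, not a number reported by the study.

```python
# Sketch: converting between an absolute change (percentage points) and a
# relative change (fraction of the baseline). Function names are illustrative.


def relative_change(baseline_pct: float, change_points: float) -> float:
    """Relative change, as a fraction, produced by an absolute rise of change_points."""
    return change_points / baseline_pct


def implied_baseline(change_points: float, relative: float) -> float:
    """Baseline share (percent) implied by an absolute change and a relative change."""
    return change_points / relative


# Figures quoted in the article: +13 points, "about 30%" relative to 2022.
baseline_2022 = implied_baseline(13.0, 0.30)  # ~43.3, approximate and derived
after = baseline_2022 + 13.0                  # ~56.3

print(f"implied 2022 share of A grades: ~{baseline_2022:.0f}%")
print(f"implied later share of A grades: ~{after:.0f}%")
print(f"check: {relative_change(baseline_2022, 13.0):.0%} relative increase")
```

The same 13-point rise would be a much larger relative jump from a low baseline and a smaller one from a high baseline, which is why both units are worth reporting.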

It is important to note that the study does not conclude that "all students who used AI were cheating." Generative AI can also serve as a learning aid, helping students structure essays, develop ideas, find errors in code, and work through reference materials. The problem is that the line between support and substitution has become blurred.

For instance, if a student uses AI to organize their own ideas and then rethinks them, the AI is arguably supporting learning. If, on the other hand, a student pastes in the assignment prompt and submits the output more or less as is, what is being evaluated is closer to the quality of the AI's output than to the student's understanding. It is this second possibility that the study focused on.

The key evidence was the weight of homework and take-home assignments. If the rise in grades truly reflected better understanding, grades should have risen similarly not only in homework-centered courses but also in courses assessed through exams and face-to-face evaluation. In reality, the increase was heavily concentrated in courses where homework and take-home assignments carried a large share of the grade. This strongly suggests that AI is doing students' work in exactly the places where teachers cannot directly observe how that work is produced.

Furthermore, a check based on the share of oral presentations in each course, a format where AI is of little help, found no comparable rise in grades. This, too, makes it hard to explain the pattern as simply a sign of the times or a general improvement in student ability. The research points to the possibility that generative AI is producing a new kind of grade inflation, one that "raises grades but does not necessarily raise abilities."
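The reasoning in the paragraphs above has the shape of a difference-in-differences comparison: measure how much the share of A grades changed in courses where AI can easily stand in for student work, then subtract the change over the same period in courses assessed mainly through oral or in-person formats. Anything that shifts both groups equally, such as a general cohort trend, cancels out, which is why a rise confined to homework-heavy courses is hard to attribute to "better students." The Python sketch below uses made-up numbers and two hypothetical course groups purely for illustration; it is not the working paper's actual specification.

```python
from dataclasses import dataclass


@dataclass
class GroupShares:
    """Share of A grades (percent) for one group of courses, before and after ChatGPT."""
    before: float
    after: float

    def change(self) -> float:
        return self.after - self.before


def diff_in_diff(exposed: GroupShares, comparison: GroupShares) -> float:
    """Change in the AI-exposed group minus the change in the comparison group."""
    return exposed.change() - comparison.change()


# Hypothetical numbers, NOT taken from the study: both groups drift up by
# 2 points for reasons unrelated to AI, and the homework-heavy group rises
# by a further 10 points.
homework_heavy = GroupShares(before=40.0, after=52.0)
oral_or_in_person = GroupShares(before=40.0, after=42.0)

effect = diff_in_diff(homework_heavy, oral_or_in_person)
print(f"raw change, homework-heavy courses: {homework_heavy.change():+.1f} points")
print(f"difference-in-differences estimate: {effect:+.1f} points")
```

The subtraction only isolates the AI-related effect under the usual parallel-trends assumption: without AI, both groups would have moved by the same amount.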

Of course, grade inflation at universities is not new in itself. At elite universities in the United States, the share of A grades has been climbing for years. Pressures from student satisfaction, course evaluations, competition among universities, and concern for graduates' job prospects have long made it difficult for teachers and institutions to grade strictly.

However, AI-driven grade inflation differs in kind from the familiar problem. Traditional grade inflation stemmed mainly from teachers' grading standards or from institutional policies. Generative AI, by contrast, changes the submissions themselves before they are ever graded. Even if teachers keep their standards unchanged, the reports and code that students hand in become more polished. The work looks more impressive, and the grade goes up. Yet it is unclear whether the thinking and trial and error behind it belong to the student.

This point has drawn strong reactions on social media and in expert communities. On LinkedIn, a common response among educators and business people is that the finding is "not surprising": if generative AI is widely available, it is only natural that reports and programming assignments become more polished. At the same time, many argue that the real problem is not cheating as such but the fact that it is no longer clear what universities are actually evaluating.

One reaction stands out in particular: "AI fluency is important, but so is learning. We must not confuse the two." This is a crucial perspective for thinking about education in the AI era. The ability to use AI will undoubtedly be a necessary skill in tomorrow's society. But using AI to polish deliverables is not the same as understanding a subject deeply, explaining it in one's own words, and tackling unfamiliar problems.

Many on social media also argue that simply banning AI at universities is not enough. Measures such as policing students with AI detection tools, returning to handwritten reports, and making every exam proctored seem straightforward at first glance. In practice, however, it is difficult to eliminate AI use entirely. Moreover, an education that bans AI outright does not necessarily equip graduates with the practical skills they will need in the workforce.

The emerging view is that assessment design itself needs to change. For example, evaluate not only the submission but also the process behind it. Have students explain at what stage they used AI, how they weighed its suggestions, and what they adopted and what they revised. Follow a report with a short oral examination in which students explain their main points on the spot. For coding assignments, ask not only about the finished program but also about design decisions and how errors were tracked down and fixed. This shifts the focus from whether students used AI to whether they genuinely understand what they submitted.

What educators should avoid above all is the simplistic dichotomy of "using AI is cheating" versus "not using it is the right thing." Students are already using AI. Unless it is clearly defined what is permitted and what counts as substitution, neither students nor teachers can make sound judgments.

For instance, allow checking for typos but prohibit generating arguments. Allow AI for brainstorming but require students to build the final argument and structure themselves. In programming, permit help with debugging but not outsourcing the design of core algorithms. Such rules will differ from course to course, which is exactly why they need to be spelled out in syllabi and assignment instructions.

This issue is not confined to universities. It also affects corporate hiring. If GPAs and transcripts reflect assessments of AI-polished deliverables rather than students' own abilities, companies will find it harder to trust grades. As a result, interviews, practical tests, portfolios, and performance during internships are likely to carry more weight as alternative ways of evaluating candidates.

For students, this is not merely a matter of "getting high marks with less effort." If they grow too used to an environment where AI does the thinking for them, they will have fewer chances to struggle, fail, and correct themselves. Learning requires a certain amount of effort. Deciphering difficult texts, wrestling with code that does not work, and putting hard-to-explain ideas into words: these are the processes that make knowledge one's own.

Generative AI can shorten that effort, and shortening it is not bad in itself. But if all of the effort is removed, only the grades remain and the abilities never develop. This is precisely the danger the study highlights. University transcripts are becoming more polished, but that polish does not necessarily reflect deeper learning.

This discussion applies equally to Japanese universities, vocational schools, and high schools. Generative AI can already play a role in many kinds of work there, including report assignments, essays, programming exercises, inquiry-based learning, and preparing presentation materials. The quality of its Japanese output is also improving rapidly, making it ever harder to tell "text a student wrote" from "text AI polished."

Educational institutions therefore need to rethink evaluation sooner rather than later. Instead of grading only the finished product, they should assess the process, students' ability to explain and apply what they know, and their understanding as revealed through dialogue. Rather than letting students hide their AI use, they should have them record how they use it, and cultivate the ability to question, verify, and, when necessary, reject AI-generated answers. These, too, are the new academic skills of the AI era.

Ultimately, the question is not just whether "students used AI." It is a more fundamental issue of "what universities call academic ability," "what grades prove," and "what abilities society trusts."

At universities after ChatGPT, a rise in the number of A grades does not necessarily mean that education is succeeding. On the contrary, the more A's there are, the more rigorously we need to ask what those A's are measuring. The ability to use AI effectively is important, but we must not mistake AI-generated deliverables for the student's own understanding.

University evaluation now stands at a major crossroads. Will we try to return to the past by banning AI? Will we leave AI unchecked and let the meaning of grades be hollowed out? Or will we accept AI as a given and redesign evaluation so that students' own thinking becomes visible?

To ensure that an "A" continues to be a symbol of true excellence, the way grades are assigned must be redesigned to fit the AI era.


Source URLs

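Blogspan "Seit ChatGPT regnet es Einsen: Was eine Studie über die Noten-Inflation an der Uni herausfand" ("Since ChatGPT, top grades are raining down: what a study on grade inflation at universities found")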
Starting point of the article. Introduces grade inflation in universities after ChatGPT based on a study from UC Berkeley.
https://www.blogspan.net/ki-noten-inflation-studie-uni-chatgpt/

UC Berkeley Center for Studies in Higher Education "Artificial Intelligence and Grade Inflation"
Official introduction page of the study. Confirmed authors, publication date, research summary, and key points such as the 13-point increase in A grades.
https://cshe.berkeley.edu/publications/artificial-intelligence-and-grade-inflation-cshe-higher-education-working-paper-series

Igor Chirikov "Artificial Intelligence and Grade Inflation" PDF
Original working paper. Confirmed details on analysis subjects, research methods, grade distribution, relationship with homework weight, and verification using oral presentations.
https://escholarship.org/content/qt80x8d3qd/qt80x8d3qd.pdf

The Decoder "AI is inflating student grades, and the effect points to outsourced work, not better learning"
English-language article explaining the study results. Confirmed the argument that AI is substituting for assignment work rather than improving learning.
https://the-decoder.com/ai-is-inflating-student-grades-and-the-effect-points-to-outsourced-work-not-better-learning/

Axios "ChatGPT fuels boom of A grades in schools"
Confirmed reports on researcher comments, homework weight, and the need for AI-integrated assignments and usage records.
https://www.axios.com/local/colorado-springs/2026/06/18/ai-grade-inflation-college

LinkedIn Post: Igor Chirikov
Author's reaction to the Wall Street Journal's report on the study. Confirmed the argument on how grades as recruitment and evaluation signals might change.
https://www.linkedin.com/posts/igor-chirikov_a-grades-are-suddenly-everywhere-since-activity-7460733177150754816-IpAz

LinkedIn Post: Emma Cummings / William Garrity
Example of reactions on social media. Referred to discussions on not confusing AI utilization skills with learning and the need to rethink evaluation methods.
https://www.linkedin.com/posts/emma-g-c_a-grades-are-suddenly-everywhere-since-activity-7462192795160588290-3WXn

LinkedIn Post: Eric Menna
Referred to reactions suggesting that AI highlights the weaknesses of traditional assignments and promotes a shift to oral examinations, interactive evaluations, and project-based assessments.
https://www.linkedin.com/posts/eric-menna_ai-is-making-skepticism-about-higher-ed-even-activity-7458183195553857536-1eyK

Harvard Magazine "The True Cost of Grade Inflation at Harvard"
Confirmed the context of increasing A grades at Harvard as a background to grade inflation in U.S. universities that predates AI.
https://www.harvardmagazine.com/university-news/harvard-grade-inflation-faculty-marks

Yale "Report of the Committee on Trust in Higher Education"
Confirmed the context of declining trust in U.S. higher education, including the concern that grades are losing their function of conveying what students have learned.
https://president.yale.edu/sites/default/files/2026-04/Report-of-the-Committee-on-Trust-in-Higher-Education.pdf