Over 8 years ago, we developed a series of performance-based exams built on Microsoft technology. The idea was to put a candidate in a room with 4 servers that had predetermined problems.
Frontier AI models now match or surpass human expert performance on graduate-level science exams, competition mathematics, ...
Researchers graded the AI program alongside real students on four different law school final exams. ChatGPT's grades ranged from a B to C-. (Reuters) - ChatGPT cannot yet outscore most law students on ...