Welcome to the Datathon at IndoML 2025. As in previous years, the datathon will be held in conjunction with IndoML 2025. We invite participation from students as well as early-career professionals. Top-performing teams will be invited to attend IndoML 2025 and present their solutions to leading researchers and professionals from academia and industry. These teams will also share cash prizes from a total pool of INR 2,00,000.
Interested in participating? You can now register for the datathon using the button or QR code below:
The rapid development of Large Language Models (LLMs) has created new opportunities for scalable and personalized AI-driven education. With their growing integration into educational applications, AI tutors are increasingly supporting student learning in subjects such as mathematics (Macina et al., 2023; Wang et al., 2024). While these systems can generate fluent and context-aware responses, their pedagogical effectiveness—specifically the ability to correctly identify student mistakes and provide meaningful guidance—remains underexplored. Building on the work of Maurya et al. (2025) and the BEA Shared Task 2025 (Kochmar et al., 2025), this datathon invites the community to develop models that evaluate the meta-reasoning capabilities of AI tutor responses, focusing on mistake identification and pedagogically sound guidance.
Participants will receive annotated educational dialogues between students and tutors, primarily in mathematics. In each dialogue, the final student utterances exhibit a mistake or confusion, followed by a tutor response (human- or LLM-generated). The task is to determine whether the tutor's response is pedagogically appropriate along two dimensions: identifying the student's mistake and providing effective guidance, as defined by Maurya et al. (2025).
The task consists of two tracks. In each track, participants classify tutor responses into one of three labels: Yes, No, or To some extent.

- **Track 1: Mistake Identification.** Has the tutor identified/recognized a mistake in the student's response?
- **Track 2: Providing Guidance.** Does the tutor offer correct and relevant guidance, such as an explanation, elaboration, hint, or examples?
We constructed the train, dev-test, and test splits from the MRBench dataset, which was used in the BEA Shared Task 2025 (Kochmar et al., 2025). The dataset builds on mathematical dialogues from the MathDial (Macina et al., 2023) and Bridge (Wang et al., 2024) datasets, representing secondary- and primary-school-level mathematics, respectively. It comprises conversation histories between students and tutors, where the last few student utterances often contain confusion or mistakes. Tutor responses aimed at resolving the student's confusion or mistake are generated by seven LLMs acting as AI tutors (GPT-4, Sonnet, Llama3.1-8B, Llama3.1-405B, Gemini, Mistral, Phi-3) and two human tutors (Expert and Novice); Novice tutor responses are available only for the Bridge dialogues. Each response is annotated with one of three labels (Yes, No, or To some extent) to assess its pedagogical appropriateness for Mistake Identification and Providing Guidance. For more details on the annotation guidelines and the initial release of MRBench, please refer to Maurya et al. (2025).
| Split | #Dialogues | #Responses |
|---|---|---|
| Training Set | 300 | 2,476 |
| Dev-Test Set | 41 | 333 |
| Test Set | 150 | 1,214 |
A sample training instance, including annotations:

```json
[
  {
    "conversation_id": "258053434",
    "conversation_history": "Tutor: What is the value of 3^3? \nStudent: 9",
    "tutor_responses": {
      "Sonnet": {
        "response": "Let's double-check that: 3^3 means 3 multiplied by itself three times, so it's 3 x 3 x 3, which equals 27.",
        "annotation": {
          "Mistake_Identification": "Yes",
          "Providing_Guidance": "Yes"
        }
      },
      "GPT4": {
        "response": "That's a good effort, but actually, 3^3 means 3 multiplied by itself three times, which equals 27.",
        "annotation": {
          "Mistake_Identification": "Yes",
          "Providing_Guidance": "Yes"
        }
      },
      ...
    }
  }
]
```
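For convenience, the nested JSON can be flattened into one training example per tutor response. Below is a minimal sketch, assuming the released file is named `mrbench_train.json` (a hypothetical name; use whatever the actual release provides):

```python
import json

# Load the training split (filename is an assumption, not the official name).
with open("mrbench_train.json", encoding="utf-8") as f:
    dialogues = json.load(f)

# Flatten: one example per (dialogue, tutor response) pair.
examples = []
for dialogue in dialogues:
    history = dialogue["conversation_history"]
    for tutor, entry in dialogue["tutor_responses"].items():
        examples.append({
            "conversation_id": dialogue["conversation_id"],
            "tutor": tutor,
            "text": history + "\nTutor: " + entry["response"],
            "mistake_identification": entry["annotation"]["Mistake_Identification"],
            "providing_guidance": entry["annotation"]["Providing_Guidance"],
        })

print(len(examples), "flattened training examples")
```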
A sample test instance (without annotations):

```json
[
  {
    "conversation_id": "613640346",
    "conversation_history": "Tutor: What is the product of 12 and 6? \nStudent: 62",
    "tutor_responses": {
      "Novice": {
        "response": "It seems like your answer is incorrect."
      },
      "Gemini": {
        "response": "Remember, when we multiply, we're combining groups. Let's try that again: How many groups of 6 are there in 12?"
      }
    },
    ...
  }
]
```
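Since test instances lack the `annotation` field, systems must predict a label for every (conversation, tutor) pair. As an illustration only (the official submission format will be specified on the CodaBench page), a trivial constant-label baseline might look like this:

```python
import json

# Load the test split (filename is an assumption, not the official name).
with open("mrbench_test.json", encoding="utf-8") as f:
    test_dialogues = json.load(f)

# Produce one prediction per (conversation, tutor) pair.
predictions = []
for dialogue in test_dialogues:
    for tutor in dialogue["tutor_responses"]:
        predictions.append({
            "conversation_id": dialogue["conversation_id"],
            "tutor": tutor,
            "Mistake_Identification": "Yes",  # constant-label placeholder
            "Providing_Guidance": "Yes",      # replace with model output
        })

# Output format here is illustrative; follow the CodaBench instructions.
with open("predictions.json", "w", encoding="utf-8") as f:
    json.dump(predictions, f, indent=2)
```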
All datasets will be available for download closer to the start date of the competition.
All submissions will be evaluated using both Macro F1-Score and Accuracy, and a public leaderboard displaying both metrics will be hosted on the CodaBench platform. Final rankings will be determined by Macro F1-Score, the primary evaluation metric.
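For reference, both metrics are available in scikit-learn. A minimal sketch on the three-way label set (the toy gold/predicted lists below are illustrative, not from the dataset):

```python
from sklearn.metrics import accuracy_score, f1_score

# Toy aligned lists of gold and predicted labels (illustrative only).
gold = ["Yes", "No", "To some extent", "Yes", "To some extent"]
pred = ["Yes", "No", "Yes", "Yes", "To some extent"]

print("Accuracy:", accuracy_score(gold, pred))
# Macro F1 averages per-class F1, weighting each label equally
# regardless of how often it occurs.
print("Macro F1:", f1_score(gold, pred, average="macro"))
```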
A total of INR 40,000 is reserved for teams actively contributing during the development stage:
| Category | Reward per Team | # Teams | Track 1: Mistake Identification | Track 2: Providing Guidance | Total |
|---|---|---|---|---|---|
| Ranks 1–3 | INR 5,000 | 3 × 2 = 6 | INR 15,000 | INR 15,000 | INR 30,000 |
| Ranks 4–8 | INR 1,000 | 5 × 2 = 10 | INR 5,000 | INR 5,000 | INR 10,000 |
| Total Prize Pool | | | | | INR 40,000 |
The total prize pool of INR 1,50,000 will be distributed across two tracks as follows:
| Category | # Teams | Track 1: Mistake Identification | Track 2: Providing Guidance | Total |
|---|---|---|---|---|
| 1st Rank | 1 × 2 = 2 | INR 25,000 | INR 35,000 | INR 60,000 |
| 2nd Rank | 1 × 2 = 2 | INR 15,000 | INR 25,000 | INR 40,000 |
| 3rd Rank | 1 × 2 = 2 | INR 8,000 | INR 12,000 | INR 20,000 |
| Ranks 4–8 | 5 × 2 = 10 | INR 12,500 (INR 2,500 per team) | INR 17,500 (INR 3,500 per team) | INR 30,000 |
| Total Prize Pool | | | | INR 1,50,000 |
*Prizes will be awarded to top-performing teams based on final rankings and a comprehensive evaluation by the organizing committee. The committee reserves the right to make final decisions regarding prize distribution and any adjustments to the evaluation criteria. An amount of INR 10,000 from the total budget will be allocated to support the presentation logistics for top-performing teams.
| Event | Dates |
|---|---|
| Registration | 15th June – 15th August 2025 |
| Development Phase | 15th August – 26th September 2025 |
| Test Phase | 26th September – 12th October 2025 |
| Final Result Announcement | 12th – 19th October 2025 |
| Report Submission for Top Teams (2 pages) | 19th October – 9th November 2025 |
| Presentation at IndoML'25 | 19th – 21st December 2025 |
All deadlines are at 12:00 Noon IST (Indian Standard Time).
Jayesh Agarwal
BITS Pilani, Hyderabad Campus
Hemanth Karthikeya Ganti
BITS Pilani, Hyderabad Campus
If you have any questions or need assistance, feel free to reach out through the following channels:
We express our sincere gratitude to the organizers of the BEA Shared Task 2025 for providing access to their datasets and task structure, which we have used and adapted for this datathon. Their efforts in curating high-quality annotated data form the foundation of this datathon.
Any publications or derivative works resulting from this datathon should properly acknowledge the original sources associated with the BEA Shared Task 2025, including the shared task findings paper (Kochmar et al., 2025) and dataset papers (Maurya et al., 2025; Macina et al., 2023; Wang et al., 2024).