Welcome to the Datathon at IndoML 2025. As in previous years, the datathon will be held in conjunction with IndoML 2025. We invite participation from students as well as early-career professionals. Top-performing teams will be invited to attend IndoML 2025 and present their solutions to leading researchers and professionals from academia and industry. These teams will also receive cash prizes from a total pool of INR 2,00,000.
📅 19th August 2025
The registration deadline for the Datathon has been extended until 31st August 2025.
📝 Register Now

The FAQ section has also been updated!
📅 15th August 2025
The Datathon officially begins today! Get ready to showcase your skills, collaborate with your team, and submit your best solutions.
Download the datasets from: GitHub Repository.
📅 15th June 2025
We are excited to announce that registration for the Datathon is now open. Click the button below to register and secure your spot.
📝 Register Now

The rapid development of Large Language Models (LLMs) has created new opportunities for scalable and personalized AI-driven education. With their growing integration into educational applications, AI tutors are increasingly supporting student learning in subjects such as mathematics (Macina et al., 2023; Wang et al., 2024). While these systems can generate fluent and context-aware responses, their pedagogical effectiveness, specifically the ability to correctly identify student mistakes and provide meaningful guidance, remains underexplored. Building on the work of Maurya et al. (2025) and the BEA Shared Task 2025 (Kochmar et al., 2025), this datathon invites the community to develop models that evaluate the meta-reasoning capabilities of AI tutor responses, focusing on mistake identification and pedagogically sound guidance.
Participants will receive annotated educational dialogues between students and tutors, primarily in mathematics. In each dialogue, the last few student utterances contain a mistake or confusion, followed by a tutor response (human- or LLM-generated). The task is to determine whether the tutor's response is pedagogically appropriate in two respects: identifying the student's mistake and providing effective guidance, as defined by Maurya et al. (2025).
The task consists of two tracks. In each track, participants classify tutor responses into one of three labels: Yes, No, or To some extent.

Track 1 (Mistake Identification): Has the tutor identified/recognized a mistake in the student's response?

Track 2 (Providing Guidance): Does the tutor offer correct and relevant guidance, such as an explanation, elaboration, hint, or examples?
We constructed the train, dev-test, and test splits from the MRBench dataset, which was used in the BEA Shared Task 2025 (Kochmar et al., 2025). The dataset builds on mathematical dialogues from the MathDial (Macina et al., 2023) and Bridge (Wang et al., 2024) datasets, representing secondary and primary school-level mathematics, respectively. It comprises conversation histories between students and tutors, where the last few student utterances often contain confusion or mistakes. Tutor responses are generated by seven LLMs acting as AI tutors (GPT-4, Sonnet, Llama3.1-8B, Llama3.1-405B, Gemini, Mistral, Phi-3) and two human tutors (Expert and Novice), aiming to mitigate student confusion or mistakes. Novice tutor responses are available only for the Bridge dialogues. Each response is annotated with one of three labels — Yes, No, or To some extent — to assess its pedagogical appropriateness for Mistake Identification and Providing Guidance. For more details on the annotation guidelines and the initial release of MRBench, please refer to Maurya et al., 2025.
| Split | #Dialogues | #Responses | Intended Use |
|---|---|---|---|
| Training Set | 300 | 2,476 | Model training |
| Dev-Test Set | 41 | 333 | Test set for the development phase |
| Test Set | 150 | 1,214 | Test set for the evaluation phase |
```json
[
  {
    "conversation_id": "258053434",
    "conversation_history": "Tutor: What is the value of 3^3? \nStudent: 9",
    "tutor_responses": {
      "Sonnet": {
        "response": "Let's double-check that: 3^3 means 3 multiplied by itself three times, so it's 3 x 3 x 3, which equals 27.",
        "annotation": {
          "Mistake_Identification": "Yes",
          "Providing_Guidance": "Yes"
        }
      },
      "GPT4": {
        "response": "That's a good effort, but actually, 3^3 means 3 multiplied by itself three times, which equals 27.",
        "annotation": {
          "Mistake_Identification": "Yes",
          "Providing_Guidance": "Yes"
        }
      },
      ...
    }
  }
]
```
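As a starting point, the sketch below shows one way to flatten a training file in the format above into one record per tutor response, with both track labels attached. The file name `mrbench_train.json` is only a placeholder, not an official name; use the actual files from the GitHub repository.

```python
import json

# Load the training split (placeholder file name; use the file from the
# datathon's GitHub repository).
with open("mrbench_train.json", encoding="utf-8") as f:
    conversations = json.load(f)

# Flatten each conversation into one record per tutor response,
# keeping the labels for both tracks.
records = []
for conv in conversations:
    for tutor, resp in conv["tutor_responses"].items():
        records.append({
            "conversation_id": conv["conversation_id"],
            "conversation_history": conv["conversation_history"],
            "tutor": tutor,
            "response": resp["response"],
            "mistake_identification": resp["annotation"]["Mistake_Identification"],
            "providing_guidance": resp["annotation"]["Providing_Guidance"],
        })

print(f"{len(conversations)} conversations, {len(records)} labelled responses")
```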
```json
[
  {
    "conversation_id": "613640346",
    "conversation_history": "Tutor: What is the product of 12 and 6? \nStudent: 62",
    "tutor_responses": {
      "Novice": {
        "response": "It seems like your answer is incorrect."
      },
      "Gemini": {
        "response": "Remember, when we multiply, we're combining groups. Let's try that again: How many groups of 6 are there in 12?"
      }
    },
    ...
  }
]
```
Download the datasets from: GitHub Repository.
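As the second example above shows, the dev-test and test files withhold the annotation field, so the goal is to predict one label per tutor response and per track. The snippet below is only an illustrative sketch using a trivial placeholder classifier; the file name and the output structure are assumptions, so follow the exact submission format specified on the CodaBench competition page.

```python
import json

# Placeholder file name; use the actual dev-test/test file from the repository.
with open("mrbench_devtest.json", encoding="utf-8") as f:
    conversations = json.load(f)

predictions = []
for conv in conversations:
    for tutor in conv["tutor_responses"]:
        predictions.append({
            "conversation_id": conv["conversation_id"],
            "tutor": tutor,
            # Trivial placeholder labels; replace with your model's predictions.
            "Mistake_Identification": "Yes",
            "Providing_Guidance": "Yes",
        })

# Illustrative output only; the required submission format is defined on CodaBench.
with open("predictions.json", "w", encoding="utf-8") as f:
    json.dump(predictions, f, indent=2)
```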
All submissions will be evaluated using Macro F1-score and Accuracy. The public CodaBench leaderboard will display both metrics, and the final ranking in both phases is determined primarily by the Macro F1-score.
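For local validation, both metrics can be computed with scikit-learn. The snippet below is a minimal sketch; the toy gold/predicted lists are made up for illustration, and official scores come from the CodaBench evaluation.

```python
from sklearn.metrics import accuracy_score, f1_score

# Toy example with the three task labels; replace with your validation labels
# and your model's predictions.
gold = ["Yes", "No", "To some extent", "Yes", "No"]
pred = ["Yes", "No", "Yes", "Yes", "To some extent"]

print("Accuracy:", accuracy_score(gold, pred))
print("Macro F1:", f1_score(gold, pred, average="macro"))
```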
A total of INR 40,000 is reserved for teams actively contributing during the development stage:
| Category | Reward per Team | # Teams | Track 1: Mistake Identification | Track 2: Providing Guidance | Total |
|---|---|---|---|---|---|
| Ranks 1–3 | INR 5,000 | 3 × 2 = 6 | INR 15,000 | INR 15,000 | INR 30,000 |
| Ranks 4–8 | INR 1,000 | 5 × 2 = 10 | INR 5,000 | INR 5,000 | INR 10,000 |
| Total Prize Pool | | | | | INR 40,000 |
A total prize pool of INR 1,50,000 for the final evaluation stage will be distributed across the two tracks as follows:
| Category | # Teams | Track 1: Mistake Identification | Track 2: Providing Guidance | Total |
|---|---|---|---|---|
| 1st Rank | 1 × 2 = 2 | INR 25,000 | INR 35,000 | INR 60,000 |
| 2nd Rank | 1 × 2 = 2 | INR 15,000 | INR 25,000 | INR 40,000 |
| 3rd Rank | 1 × 2 = 2 | INR 8,000 | INR 12,000 | INR 20,000 |
| Ranks 4–8 | 5 × 2 = 10 | INR 12,500 (INR 2,500 per team) | INR 17,500 (INR 3,500 per team) | INR 30,000 |
| Total Prize Pool | | | | INR 1,50,000 |
*Prizes will be awarded to top-performing teams based on final rankings and a comprehensive evaluation by the organizing committee. The committee reserves the right to make final decisions regarding prize distribution and any adjustments to the evaluation criteria. An amount of INR 10,000 from the total budget will be allocated to support the presentation logistics for top-performing teams.
| Event | Dates |
|---|---|
| Registration | 15th June – 31st August 2025 |
| Development Phase | 15th August – 26th September 2025 |
| Test Phase | 26th September – 12th October 2025 |
| Final Result Announcement | 12th – 19th October 2025 |
| Report Submission for Top Teams (2 pages) | 19th October – 9th November 2025 |
| Presentation at IndoML'25 | 19th – 21st December 2025 |
All deadlines are at 12:00 Noon IST (Indian Standard Time).
Jayesh Agarwal
BITS Pilani, Hyderabad Campus
Hriday Bhuta
BITS Pilani, Hyderabad Campus
Q1: For each phase, there is a maximum of 2 submissions per day and a total of 5 submissions. Will this limit increase?
Answer: No. Each phase will have a maximum of 2 submissions per day and a total of 5 submissions. Teams are strongly advised to split the provided training data into a local training set and a local validation set, develop models on the local training set, and validate them on the local validation set. Only the best-performing models' predictions should be submitted during the Dev-Test and Test phases (the development and evaluation stages, respectively); a minimal split example is sketched below.
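As an illustration, the sketch below splits the training file at the conversation level (the file name is a placeholder), so that responses from the same dialogue never appear in both the local training and local validation sets.

```python
import json
import random

random.seed(42)

# Placeholder file name; use the actual training file from the repository.
with open("mrbench_train.json", encoding="utf-8") as f:
    conversations = json.load(f)

# Split at the conversation level to avoid leaking parts of the same
# dialogue into both local training and local validation.
random.shuffle(conversations)
cut = int(0.9 * len(conversations))
local_train, local_val = conversations[:cut], conversations[cut:]

print(len(local_train), "training conversations /", len(local_val), "validation conversations")
```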
Q2: Can I create an account with a different email address than the one I used for registration?
Answer: No. You must use the same email address for both the official registration and your Codabench account.
Q3: I am working alone, so I can only make 5 submissions on Codabench. But if another team has 3 members, each of them can make 5 submissions (totaling 15). Isn’t this unfair?
Answer: As per the official guidelines, all team members may register for the competition on Codabench, but only one designated member (team leader or single representative) should make the submissions. If more than one team member makes submissions, the team will be disqualified. Therefore, each team—regardless of size—has the same submission quota.
Q4: If my team member has also registered on Codabench, will we be disqualified?
Answer: No, your team will not be disqualified simply because multiple members registered. However, only one member should submit on behalf of the team. If more than one member makes submissions, the team will be disqualified.
Q5: Are there any restrictions on what type of models or data a team can use for model development?
Answer: We follow an open model and open data policy. Teams may use any publicly available, closed-source, or proprietary models, as well as additional data, augmentation techniques, or other strategies to improve their solutions.
Q6: What is the maximum team size?
Answer: There is no restriction on team size. The only requirement is that at least one member of the team must be affiliated with an Indian university or institution.
Q7: Why was my request to participate on CodaBench denied?
Answer: There are two possible reasons why your request to participate may have been denied:
(1) Direct request via Codabench without prior registration: If you requested to join the competition directly through the Codabench platform but did not complete registration via the Google Form, your request will not be accepted. In this case, please register through the Google Form first and then send an email to the organizers. Once verified, we will approve your participation in the competition.
(2) Mismatch in email addresses: If you are trying to participate on Codabench using a different email than the one you used during registration, the system will not recognize your request. Please make sure to use the same email address for both registration and Codabench participation.
Please follow the appropriate step based on your case, and feel free to reach out to the organizers if you face any further difficulties.
If you have any questions or need assistance, feel free to reach out through the following channels:
We express our sincere gratitude to the organizers of the BEA Shared Task 2025 for providing access to their datasets and task structure, which we have used and adapted for this datathon. Their efforts in curating high-quality annotated data form the foundation for organizing this datathon.
Any publications or derivative works resulting from this datathon should properly acknowledge the original sources associated with the BEA Shared Task 2025, including the shared task findings paper (Kochmar et al., 2025) and dataset papers (Maurya et al., 2025; Macina et al., 2023; Wang et al., 2024).