This project, titled "Empowering Education in the Digital Age," explores how Large Language Models (LLMs) can improve the learning experience in computer science education. Our goal is to develop an educational assistant that uses LLMs to personalize learning, with the aim of improving student engagement and critical thinking skills.
The primary objective of this project is to explore the potential of LLMs in transforming educational methodologies. By integrating these models into educational content, we aim to create an interactive, adaptive learning environment that caters to the unique needs and learning styles of each student.
The project proceeds in several steps. We begin by converting textbook content from PDF to XML, followed by OCR processing of the textbook datasets. We then transform Markdown content into XML to obtain a structured data representation. Next, we generate conversation trees that simulate educational dialogues; these are vital for training our LLM. The conversation outputs, produced in JSONL format, are converted to JSON for compatibility with other tools and platforms. Finally, we fine-tune the Mistral OpenOrca model in Google Colab, tailoring it to our educational context.
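The JSONL-to-JSON step is the easiest one to show on its own. The following is a minimal sketch using only the Python standard library, not the project's actual script. The function name `jsonl_to_json` and the file paths are hypothetical. It assumes each non-blank line of the input holds one complete JSON object.

```python
import json


def jsonl_to_json(jsonl_path, json_path):
    """Read one JSON value per line and write them all as a single JSON array.

    Hypothetical helper for illustration; the project's own converter may differ.
    """
    records = []
    with open(jsonl_path, encoding="utf-8") as src:
        for line_number, line in enumerate(src, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                # Report the offending line so a bad record is easy to locate.
                raise ValueError(
                    f"{jsonl_path}: invalid JSON on line {line_number}"
                ) from exc

    with open(json_path, "w", encoding="utf-8") as dst:
        # ensure_ascii=False keeps non-ASCII characters readable in the output.
        json.dump(records, dst, ensure_ascii=False, indent=2)
    return records
```

A call such as `jsonl_to_json("conversations.jsonl", "conversations.json")` (placeholder file names) would produce a single JSON array, which tools that cannot read line-delimited input can load directly.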
We anticipate that the integration of LLMs into educational content will result in a more engaging and effective learning experience. This project aims to demonstrate how AI can be used to create adaptive learning materials that respond to the needs of individual learners, thereby enhancing the overall quality of education.
- The detailed research paper associated with this project can be found at GitHub: elijah-kulpinski.
- Trained models and datasets utilized in this project are available at Hugging Face: ByteSized.
- The notebook, composed of the individual scripts used to run the pipeline, is available on Google Colab.