Digital technologies make it increasingly easy to collect large amounts of data from people in a variety of contexts (e.g., health, fitness, games, education). In education, such data sets typically concern the assessment of skills or knowledge (e.g., standardized tests), but they may also be informative about the learning process itself (e.g., log files from digital tutoring systems). Big Data presents a unique opportunity to advance our understanding of human learning and to improve education (e.g., via personalized training); yet much work remains to be done to exploit these data sets. Most current approaches require expert knowledge to label problem sets and to structure both assessments and learning interventions. However, such expert knowledge is rarely available and may be inaccurate. In addition, individual differences are commonly treated using a single ability parameter, which masks the role of individual differences in background knowledge and may lead to stigmatization. The present project is grounded in state-of-the-art machine learning and aims to develop new theoretical and practical approaches to assessment and learning that do not require expert knowledge and that allow the automatic discovery of student knowledge states, which both account for individual differences and can guide interventions.
This project will address three major problems of instructional assessment and facilitation.
Aim 1. Inspired by machine learning methods that can discover shared topics in corpora of scientific documents, we propose to discover a latent knowledge space spanned by tests and tutoring systems, identifying both each student's knowledge state and the natural topics in the domain. For example, there might be different subgroups of students (e.g., native German vs. native French speakers taking an exam in German), and the performance of these subgroups might differ on subclasses of items (e.g., some test items rely less on language than others).
These methods have been successfully applied to recommender systems, where they discover natural subgroups of people and items based on the structure of their preferences.
Aim 2. In contrast to the complex learning that occurs in education, most approaches equate learning with improvement on a one-dimensional performance variable (e.g., the number of errors as a function of practice). Instead, we will investigate learning as dynamic change in knowledge state by developing models inspired by dynamic topic modelling, a recent unsupervised machine learning framework that has been successfully applied to a large corpus of journal articles to discover scientific topics (in our case, sets of skills) and how these topics evolve over time (in our case, learning).
Aim 3. Personalized training requires matching instruction to each learner's knowledge state. Doing so effectively, however, requires not only knowledge that is typically unavailable but also meaningful measures of instructional efficacy. To model interactions between instructional order and knowledge state, we will introduce a novel measure of scaffolding efficiency grounded in information-theoretic causal power theory. This measure will use mutual information to assess the impact of an instructional unit on subsequent learning, and it will allow us to identify “instructional bottlenecks” that gate learning and require further scaffolding. Using the recommender item response theory framework, we can associate these bottlenecks with clusters of students, allowing individualized scaffolding based on subgroups defined by knowledge and learning type.
The ultimate goal of this research project is to develop efficient algorithms for designing optimal learning sequences at the single-subject level. In addition to developing a computational framework and extracting knowledge from data sets, we will use experimental approaches to test the efficacy of our automated instructional design algorithms in an education-relevant domain.
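To make the information-theoretic idea behind Aim 3 more concrete, here is a minimal sketch of its basic ingredient: plain mutual information I(X;Y) between whether a student completed an instructional unit (X) and whether that student later answered a target item correctly (Y). This is not the proposed causal-power measure itself. The binary 0/1 coding, the function names `mutual_information` and `rank_units`, and the toy observations are hypothetical and serve only as illustration. Ranking units by how much information they carry about later success is one possible way to surface candidate bottlenecks.

```python
# Hypothetical sketch: plain mutual information between completing an
# instructional unit (X) and later success on a target item (Y).
# Binary 0/1 coding and all names here are illustrative assumptions,
# not the proposed causal-power measure itself.
from collections import Counter
from math import log2


def mutual_information(pairs):
    """Return I(X;Y) in bits from a list of (x, y) observations.

    I(X;Y) = sum over (x, y) of p(x,y) * log2(p(x,y) / (p(x) * p(y))),
    with probabilities estimated as empirical frequencies.
    """
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one observation")
    joint = Counter(pairs)                  # counts of each (x, y) combination
    px = Counter(x for x, _ in pairs)       # marginal counts of X
    py = Counter(y for _, y in pairs)       # marginal counts of Y
    mi = 0.0
    for (x, y), count in joint.items():     # unobserved cells contribute 0
        p_xy = count / n
        mi += p_xy * log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


def rank_units(records):
    """Rank hypothetical instructional units by I(X;Y), highest first.

    records maps a unit name to its list of (x, y) observations.
    """
    scores = {unit: mutual_information(pairs) for unit, pairs in records.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy, made-up observations: x = completed unit, y = later item correct.
    toy = {
        "unit_A": [(0, 0), (0, 0), (1, 1), (1, 1)],  # later success tracks the unit
        "unit_B": [(0, 0), (0, 1), (1, 0), (1, 1)],  # later success independent of it
    }
    print(rank_units(toy))  # [('unit_A', 1.0), ('unit_B', 0.0)]
```

Note that mutual information is symmetric, I(X;Y) = I(Y;X), so on its own it quantifies the dependence between instruction and later performance rather than the direction of influence.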