A blog about Open Source, my work at the Gates Foundation and those I am fortunate enough to collaborate with


Why do we make learning analytics so @$^*%#$ hard!?

August 5, 2011

Bigger whiteboards would help!

Here we are, jamming away earlier this week at a workshop Gates co-hosted with rock-star grid-computing maven Ian Foster and the Computation Institute at the University of Chicago.  I’ve listed the attendees and their expertise at the end of this post so you have a sense of the mix in the room.

We gathered with 4 objectives in mind:

  1. Describe a reference implementation of a set of shared services capable of supporting cognitive analytics, adaptive learning and educational data mining across 10 million+ users
  2. Outline the data, analytical applications or infrastructural components we are currently missing in order to deliver such an implementation
  3. Identify the one or two demonstration projects that need to be built in the next 12–24 months in order to signal new market direction and the capabilities required
  4. Flag specific policy or IP enablers that a group should tackle in order to increase the likelihood of new market success

Here are some of the highlights of where we came out, and the good news is that after 6 intensive hours we DO have our marching orders to get cracking on a demonstration project (more to come on that topic in future posts):

Paradata matters – strictly speaking, paradata is a class of metadata that captures information about the process used to collect the data, or about each observation in the data.  Used thoughtfully, paradata expands the range of data you capture, thereby enriching the data types you have to work with and the inferences you can derive from their analysis.  This is one of the core principles behind Steve Midgley’s work at the Department of Education.  Put another way: paradata gives you richer context.
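To make the data/paradata distinction concrete, here’s a toy sketch of a single learning event.  The field names are purely illustrative – my own invention, not Steve Midgley’s or anyone’s official schema:

```python
# A toy learning event: the "data" is the student's answer itself; the
# "paradata" records how the observation was collected -- timing, device,
# attempts.  All field names here are illustrative, not a real schema.
event = {
    "data": {
        "student_id": "s-1024",
        "item_id": "math5-frac-07",
        "answer": "3/4",
        "correct": True,
    },
    "paradata": {
        "seconds_on_task": 42.5,
        "attempts_before_correct": 2,
        "hint_requested": True,
        "device": "classroom-tablet",
        "collected_via": "end-of-lesson quiz",
    },
}

# The paradata lets you ask richer questions than the raw answer alone:
# this student got the item right, but the process data says they struggled.
struggled = (event["paradata"]["attempts_before_correct"] > 1
             or event["paradata"]["hint_requested"])
print("correct but struggled:", event["data"]["correct"] and struggled)
```

The point is simply that “got it right” and “got it right easily” are different signals, and only the paradata can tell them apart.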

Enable analysis of dataflow rather than data – data is too static a concept. We need to start thinking in terms of mining dataflow – kids these days traverse informal and formal learning spaces with increasing speed and frequency, while educational researchers are still stuck struggling to get access to two-year-old end-of-year test data!  For personalized learning to be made actionable at the point of service, we need to be able to better track the flow of data for a struggling individual (subject to security and privacy constraints, of course).  If we can do it for medicine, why not education?  You think my sperm count is any less sensitive than how I am doing in 5th grade math!?  Wait a minute, that didn’t come out right….

There’s a whole new market for services waiting to emerge – from recommendation and predictive services to content aggregation and capability measurement.  It’s hard to predict what will actually succeed with teachers, kids and parents, but it’s clear there is a rich set of services that can save teachers time and actually help kids and parents get a handle on why and where they are struggling.  Socos, led by Vivienne Ming, is one exciting example of an early start-up in this space.

Trust is earned one recommendation at a time – as potential users, we need service providers to quickly establish some level of trust in their ability to support us before they can secure our repeat business.  That trust needs to be formed as early in the transaction process as possible.  Netflix, iTunes and Amazon all demonstrate the power of recommendations.  However, to really convert you need to provide context, and that’s where most of the current consumer services still fall short.  Why is this resource being recommended to me now?  What is the recommendation based on?  Are there alternatives I might want to consider?  Were they factored in before this choice was prioritized?  The nagging feeling I have is that the consumer engines actually have the ability to do all that now, but choose not to for fear of freaking us out completely in a Big Brother way.  This is why we desperately need Diaspora or a similar concept to gain traction soon, so we can all get our heads around what it means to own and manage a persona – and avoid becoming a gadget.

Current approaches to data privacy may be bass-ackwards – researchers at Microsoft Research are currently pursuing some hard-core work around the concept of Differential Privacy, which asserts that “achieving differential privacy revolves around hiding the presence or absence of a single individual.”  What’s cool about this (and I in no way profess to understand all the math behind it) is that “sharper upper and lower bounds on noise required for achieving differential privacy against a sequence of linear queries can be obtained by understanding the geometry of the query sequence.”  In other words, sufficient noise can be introduced into the answer to any given query to render the underlying individual records essentially private.  Match this with point-of-service permissioning based on access rights and you have a much more robust and scalable approach to enabling researcher access to data – one that does not require months or years of paper application processing.  For more on this, and the source of the above quotes, please check out Cynthia Dwork’s paper in the Communications of the Association for Computing Machinery.
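For the curious, here is a minimal sketch of the Laplace mechanism – the textbook way to answer a counting query with differential privacy.  This is my own illustration of the basic idea (noise calibrated to how much one person can change the answer), not the geometry-based bounds from Dwork’s paper:

```python
import math
import random

def private_count(records, predicate, epsilon=0.5):
    """Answer 'how many records satisfy predicate?' with epsilon-DP.

    A counting query has sensitivity 1: adding or removing one person
    changes the true count by at most 1.  Laplace noise with scale
    1/epsilon is therefore enough to hide any single individual's
    presence or absence in the dataset.
    """
    true_count = sum(1 for r in records if predicate(r))
    # Sample Laplace(0, 1/epsilon) noise via inverse-CDF sampling.
    u = random.random() - 0.5
    noise = -(1 / epsilon) * math.copysign(1, u) * math.log(1 - 2 * abs(u))
    return true_count + noise

# Example: count struggling 5th-grade math students without exposing anyone.
# (Synthetic data: every third student is flagged as struggling.)
students = [{"grade": 5, "struggling": g % 3 == 0} for g in range(100)]
noisy = private_count(students, lambda s: s["grade"] == 5 and s["struggling"])
print(round(noisy))  # close to the true count (34), give or take the noise
```

The researcher gets a usable aggregate answer; no single student’s record can be inferred from it.  That trade – a little noise for a lot of access – is the whole pitch.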

The current IRB process needs mending – that’s Institutional Review Board to you: the groups that exist to protect the rights and welfare of research subjects.  They have the power to reject or approve any and every aspect of a research request.  The result – rather like the horrific patent process we are subject to in the US – is a humongous backlog of requests and a byzantine review and approval process.  With the best of intentions, we have managed to create a system that is choking the life out of the very research it is meant to enable.  And heaven help you if your request cuts across more than one institution or IRB.

Looking forward to sharing more details as we progress on this area.  For now here is a list of the folks my colleagues and I were lucky enough to work with that day:

Ian Foster (Argonne National Laboratory and University of Chicago, Mathematics and Computer Science)

Paul Goren (University of Chicago Urban Education Institute, Education data research and policy)

Stacy Ehrlich (University of Chicago)

Connie Yowell (MacArthur Foundation, Public Education and Digital Media)

An-Me Chung (MacArthur Foundation)

Ken Koedinger (Carnegie Mellon, Computer Science, Learning Analytics and Cognitive Psychology)

Steve Midgley (Office of Education and Technology, Department of Education, Data interoperability and Online learning)

Helen Taylor Martin (UT Austin, College of Education, Linguistics, Psychology and Classroom Technologies)

Vivienne Ming (Socos, Cognitive Modelling and Predictive Analytics)

Roy Pea (Stanford School of Education, Learning Sciences and Education)

Armistead Sapp (SAS Institute, Software development, Data and Analytics)

Daniel Schwartz (Stanford School of Education, Instructional Methods, Teachable Agents, Cognitive Neuroscience)

John Palmer (Applied Minds, Computer Science and Mathematics)

Tony Hey (Microsoft Research, Technical Computing)

Gary West (CCSSO, Education Information Systems and Research)

Mark Luetzelschwab (Agilix, Education Technology & Systems Interoperability)

Alex Szalay (Johns Hopkins University)

