A blog about Open Source, my work at the Gates Foundation and those I am fortunate enough to collaborate with


Use the Metadata and set our children free!

August 6, 2011

In an earlier post I wrote about the emergence of a new UX for personalized learning called Learning Maps and how they will sit at the nexus of content, performance data and diagnostics for individual learners and study groups.  One of the enabling components that will make that nexus possible is a more consistent approach to metadata and to metadata’s contextual wrapper, paradata (a topic I just blogged on and which @stevemidgley is a key sponsor of at DOE).

Recently @BrandtRedd and I have collaborated to underwrite a partnership between the Association of Education Publishers and Creative Commons on their publication of a lightweight extension to – the Open Search collaborative recently launched by Google, Bing and Yahoo, which is intended to accelerate markup of the web’s pages in ways recognized by the major search providers.  In an exercise in mutual self-interest, it is our hope that early adoption of a schema extension can drive an improvement in the search experience for educational resources, while giving OER and commercial publishers sufficient incentive to stay the course because of the improved UX the extension will help them deliver to their customers (and yes, in some cases, advertisers).

Our investment in a new and lightweight schema represents one of the 4 building blocks necessary to create a vibrant, competitive market of high-quality resources for personalized learning (the other three being learning maps, data and identity interop, and APIs for learning orchestration).

You can read more about the extension effort here and I will be blogging shortly on the improvements in UX we can expect as a result of the introduction of and its purposeful leveraging of HTML5 and CSS3


Why do we make learning analytics so @$^*%#$ hard!?

August 5, 2011

Bigger whiteboards would help!

Here we are jamming away earlier this week on a workshop Gates co-hosted with rockstar computergrid maven Ian Foster and the Computation Institute at the University of Chicago.  I’ve listed the attendees and their expertise at the end of this post so you have a sense for the mix in the room.

We gathered with 4 objectives in mind:

  1. Describe a reference implementation of a set of shared services capable of supporting cognitive analytics, adaptive learning and educational datamining across 10 million+ users
  2. Outline the data, analytical applications or infrastructural components we are currently missing in order to deliver such an implementation
  3. Identify the 1 or 2 demonstration projects that need to be built in the next 12-24 months in order to signal new market direction and capabilities required
  4. Flag specific policy or IP enablers that a group should tackle in order to increase likelihood of new market success

Here are some of the highlights of where we came out.  The good news is that after 6 intensive hours we DO have our marching orders to get cracking on a demonstration project (more to come on that topic in future posts):

Paradata matters – strictly speaking, paradata is a class of metadata that captures information about the process used to collect data, or about each observation in the data.  Used thoughtfully, paradata can expand the range of data types you capture, thereby enriching the data you have to work with and the inferences you can derive from its analysis.  This is one of the core principles behind Steve Midgley’s work at DOE.  Put another way: paradata gives you richer context.

Enable analysis of dataflow rather than data – data is too static a concept. We need to start thinking in terms of mining dataflow – kids these days traverse informal and formal learning spaces at increasing speed and frequency.  Educational researchers are still stuck struggling to get access to 2 year old end-of-year test data!  In order for personalized learning to be made actionable at point of service, we need to be able to better track the flow of data for a struggling individual (subject to security and privacy etc.) If we can do it for Medicine, why not Education?  You think my sperm count is any less sensitive than how I am doing in 5th grade math!?  Wait a minute, that didn’t come out right….

There’s a whole new market for services waiting to emerge – from recommendation and predictive services to content aggregation and capability measurement.  It is hard to predict what will actually succeed with teachers, kids and parents, but it is clear that there is a rich group of services that can save teachers time and actually help kids and parents get a handle on why and where they are struggling.  Socos, which is led by Vivienne Ming, is one exciting example of an early start-up in this space.

Trust is earned one recommendation at a time – service providers need to quickly establish some level of trust with us, as potential users, in their ability to support us and secure our repeat business.  That trust needs to be formed as early in the transaction process as possible.  Netflix, iTunes and Amazon all demonstrate the power of recommendations.  However, to really convert you need to provide context, and that’s where most of the current consumer services still fall short.  Why is this resource being recommended to me now?  What is the recommendation based on?  Are there alternatives I might want to consider?  Were they factored in before this choice was prioritized?  The nagging feeling I have here is that the consumer engines actually have the ability to do that now, but choose not to for fear of freaking us out completely in a Big Brother way.  This is why we desperately need Diaspora or a similar concept to gain traction soon, so we can all get our heads around what it means to own and manage a persona and avoid becoming a gadget.

Current approaches to data privacy may be barse ackward – researchers at Microsoft Research are currently pursuing some hard-core work around the concept of Differential Privacy, which asserts that “achieving differential privacy revolves around hiding the presence or absence of a single individual”.  What’s cool about this (and I in no way profess to understand all the math behind it) is that “sharper upper and lower bounds on noise required for achieving differential privacy against a sequence of linear queries can be obtained by understanding the geometry of the query sequence”.  In other words, enough carefully calibrated noise can be added to the answer to any given query to render it essentially private.  Match this with point of service permissioning based on access rights and you have a much more robust and scalable approach to enabling researcher access to data, one that does not require months and years of paper application processing.  For more on this, and for the source of the above quotes, please check out Cynthia Dwork’s paper in the Communications of the Association for Computing Machinery.
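
To make the "add noise to the answer" idea concrete, here is a minimal sketch of the Laplace mechanism, a standard textbook way to answer a counting query with differential privacy.  It is not taken from the workshop or from the paper quoted above; the function names, the sample scores and the epsilon value are all assumptions chosen for illustration.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace(0, scale) distribution."""
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centred on zero.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Answer 'how many records satisfy predicate?' with added Laplace noise."""
    true_count = sum(1 for record in records if predicate(record))
    # Adding or removing one learner changes a count by at most 1 (sensitivity 1),
    # so noise with scale 1/epsilon is the usual calibration for a counting query.
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical 5th grade math scores, one per learner.
scores = [48, 91, 55, 73, 39, 88, 62]
rng = random.Random(2011)
noisy = private_count(scores, lambda s: s < 60, epsilon=0.5, rng=rng)
print(f"Noisy count of struggling learners: {noisy:.2f}")
```

A smaller epsilon means more noise and stronger privacy; a researcher sees an answer that is useful in aggregate but does not reveal whether any one child is in the data.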

The current IRB process needs mending – that’s Institutional Review Board to you.  The groups that exist to protect the rights and welfare of research subjects.  They have the power to reject or approve any and every aspect of a research request.  The result – rather like the horrific Patent Process we are subject to in the US – is a humungous backlog of requests and a byzantine review and approval process.  With the best of intentions we have managed to create a system that is choking the life out of the very research it is meant to enable.  And heaven help you if your request cuts across more than one industry or IRB.

Looking forward to sharing more details as we progress on this area.  For now here is a list of the folks my colleagues and I were lucky enough to work with that day:

Ian Foster (Argonne National Laboratory and University of Chicago, Mathematics and Computer Science)

Paul Goren (University of Chicago Urban Education Institute, Education data research and policy)

Stacy Ehrlich (University of Chicago)

Connie Yowel (MacArthur Foundation, Public Education and Digital Media)

An-Me Chung (MacArthur Foundation)

Ken Koedinger (Carnegie Mellon, Computer Science, Learning Analytics and Cognitive Psychology)

Steve Midgley (Office of Education and Technology, Department of Education, Data interoperability and Online learning)

Helen Taylor Martin (UT Austin, College of Education, Linguistics, Psychology and Classroom Technologies)

Vivienne Ming (Socos, Cognitive Modelling and Predictive Analytics)

Roy Pea (Stanford School of Education, Learning Sciences and Education)

Armistead Sapp (SAS Institute, Software development, Data and Analytics)

Daniel Schwartz (Stanford School of Education, Instructional Methods, Teachable Agents, Cognitive Neuroscience)

John Palmer (Applied Minds, Computer Science and Mathematics)

Tony Hey (Microsoft Research, Technical Computing)

Gary West (CCSSO, Education Information Systems and Research)

Mark Luetzelschwab (Agilix, Education Technology & Systems Interoperability)

Alex Szalay (Johns Hopkins University)

Learning Maps – A new UX for Personalized Learning

August 4, 2011

One of my principal projects is a collaboration with Brandt Redd, Danny Hillis and Danny’s incredible team at Applied Minds.  We’re calling it the “learning map”, a navigational tool for knowledge that learners would be able to use to orient themselves to a particular subject and track their progress in it.  Coupled with a datastore and links to content repositories capable of serving up interventions in context, the map would be able to sustain both rudimentary learning pathways and more advanced functions and services such as predictive analytics and recommendation engines.
The basic idea is that the Learning Map sits at the nexus of information about me as a learner, the content I am consuming, and the diagnostics I am taking to determine my level of understanding of the concepts I am expected to master.  Here’s a graphic that I hope will help you picture what I am describing.

There are 7 core use cases that we believe such a map can help us address:

  1. Teachers can plot their students’ locations on the map, see the resources available for them to use in class, and determine
    appropriate interventions to help the kids progress
  2. Learners can plot themselves on the map, understand what lies ahead, see the resources available to them in their
    current location, and plan their own progressions.  Ultimately they could skin their own maps, plot their own quests and share them with friends
  3. School and District leaders can plot student movement on the map and get a snapshot and trend projection of a body of
    learners, including what resources are being used by the learner population (and which are not)
  4. Curriculum developers and content producers can publish and locate their courses and resources in the map for discovery and use
  5. Application developers can publish and attach their applications to specific points on the map
  6. Investors can see resources currently available in the market and a status of their use (think a resource heatmap for devs and investors)
  7. Data miners and diagnostic service providers can write adaptive algorithms against the map that allow machine-based diagnostics, predictive analytics and recommendation engines for learner interventions (think Bayes net enablement similar to predictive traffic flow in GPS)

All one requires to get started is a set of learning objectives in a machine-readable format.  The author of those objectives would also have the option at publication of describing the relationships between them, thereby enabling a rendering of the progression from topic to topic.  This is but an initial assertion.  Evidential probability analysis would help true learning pathways and relationships between objectives emerge over time as more and more people lay down paths through the subject area.  If one wanted to get truly funky, one could leverage an arcane markup like PR-OWL to weight the relationships between objectives, allowing for further differentiation and customization to an individual’s learning patterns.
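
As a rough sketch of that idea, the snippet below takes an author's asserted prerequisites, renders one valid progression through them, and then estimates edge weights from the paths learners actually took.  The objective IDs and sample paths are hypothetical, and the simple relative-frequency weighting is a stand-in I am assuming for illustration; it is not evidential probability analysis proper, nor PR-OWL.

```python
from collections import Counter
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical objective IDs mapped to the prerequisites their author asserted.
PREREQUISITES = {
    "fractions.concept": set(),
    "fractions.equivalent": {"fractions.concept"},
    "fractions.add": {"fractions.equivalent"},
    "decimals.concept": {"fractions.concept"},
}


def progression(prerequisites):
    """Return one ordering in which every objective follows its prerequisites."""
    return list(TopologicalSorter(prerequisites).static_order())


def transition_weights(observed_paths):
    """Estimate P(next objective | current objective) from observed learner paths."""
    pair_counts = Counter()
    departures = Counter()
    for path in observed_paths:
        for current, following in zip(path, path[1:]):
            pair_counts[(current, following)] += 1
            departures[current] += 1
    return {pair: n / departures[pair[0]] for pair, n in pair_counts.items()}


# Hypothetical paths laid down by three learners.
paths = [
    ["fractions.concept", "fractions.equivalent", "fractions.add"],
    ["fractions.concept", "decimals.concept"],
    ["fractions.concept", "fractions.equivalent"],
]
print(progression(PREREQUISITES))
print(transition_weights(paths))
```

The asserted graph gives the map its initial shape; the weights are what would shift as more learners pass through.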

So what might the rest of the recipe for a learning map look like?  Here’s my guestimate:

  1. A marked up set of learning objectives, such as the new Common Core standards for math and literacy.  XML, OWL or PR-OWL are all viable, as would be RDFa if one wanted to roll like that
  2. A semantic schema to describe key structural elements that make up the standards and, by extension, the map, e.g. Domain, cluster, concept
  3. A machine-readable tagging of each structural element within the map.  Something like a URI.  This would permit linking of data to data, and node to content resources and applications
  4. A way to describe demonstrations and activities required to progress along the standards, e.g. ‘Skill’, ‘Task’
  5. An API to bind the map to either a native or web-based UI, a datastore (ideally a large-scale tuplestore like Freebase) and various content registries such as Steve Midgley’s exciting Learning Registry project
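
Here is a minimal sketch of how the first few ingredients might fit together: each structural element gets a URI and a type, and simple tuples link nodes to each other and to resources.  Every URI, predicate name and class below is an assumption made up for illustration; none of it comes from the Common Core markup, Freebase or the Learning Registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    uri: str    # machine-readable identifier for the structural element
    kind: str   # schema element, e.g. "Domain", "Cluster", "Concept"
    label: str  # human-readable name


class LearningMap:
    """A tiny in-memory stand-in for a map backed by a tuplestore."""

    def __init__(self):
        self.nodes = {}
        self.tuples = set()  # (subject_uri, predicate, object_uri)

    def add_node(self, node: Node) -> None:
        self.nodes[node.uri] = node

    def link(self, subject_uri: str, predicate: str, object_uri: str) -> None:
        self.tuples.add((subject_uri, predicate, object_uri))

    def objects(self, subject_uri: str, predicate: str) -> list:
        """Return everything linked from a node by the given predicate."""
        return sorted(o for s, p, o in self.tuples if s == subject_uri and p == predicate)


# Hypothetical URIs on example.org.
learning_map = LearningMap()
domain = Node("http://example.org/map/fractions", "Domain", "Fractions")
concept = Node("http://example.org/map/fractions/equivalent", "Concept", "Equivalent fractions")
learning_map.add_node(domain)
learning_map.add_node(concept)
learning_map.link(domain.uri, "hasConcept", concept.uri)
learning_map.link(concept.uri, "hasResource", "http://example.org/content/video-42")
print(learning_map.objects(concept.uri, "hasResource"))
```

Because resources are referenced only by URI, the same link structure works whether the content lives locally or on a third-party server.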

Coupled with these basic ingredients we would also require some transactional web service capabilities to support the feedback loops and uses I listed earlier.  In rough increasing order of complexity those would include:

  • Some form of asynchronous or duplex communication between a datastore and the map so learners show up in
    their appropriate locations
  • A way for URI’d nodes in the map to link to URI’d resources located on 3rd party servers, like Google Scholar or Wiki (this is where the Learning Registry might hook in in a powerful way)
  • A way to broker services between nodes and resource repositories
  • A way to build probabilistic analytical engines underneath the map capable of leveraging the map’s tuplestore to plot, diagnose and serve up recommended learner progressions based on their particular performance and consumption patterns across a range of individual and collaborative activities.  Vivienne Ming and her team at Socos are an exciting example of the sort of services we’d like to see emerge in this space
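
To give a flavor of the last capability, here is a minimal sketch using Bayesian Knowledge Tracing, a well-known technique from the intelligent tutoring literature that I am swapping in purely as an example; it is not a description of what Socos or anyone else named here has built.  The slip, guess and learn parameters, the mastery threshold and the node names are all assumed values.

```python
def bkt_update(p_known: float, correct: bool,
               slip: float = 0.1, guess: float = 0.2, learn: float = 0.15) -> float:
    """Update the probability that a learner has mastered a skill after one answer."""
    if correct:
        evidence = p_known * (1 - slip)
        posterior = evidence / (evidence + (1 - p_known) * guess)
    else:
        evidence = p_known * slip
        posterior = evidence / (evidence + (1 - p_known) * (1 - guess))
    # Allow for the chance that the learner picked up the skill during this attempt.
    return posterior + (1 - posterior) * learn


def recommend(mastery: dict, prerequisites: dict, threshold: float = 0.95) -> list:
    """Return unmastered nodes whose prerequisites have all been mastered."""
    return sorted(
        node for node, p in mastery.items()
        if p < threshold
        and all(mastery.get(pre, 0.0) >= threshold for pre in prerequisites.get(node, ()))
    )


# Hypothetical learner state on three map nodes.
mastery = {"count": 0.97, "add": 0.40, "multiply": 0.10}
prerequisites = {"count": set(), "add": {"count"}, "multiply": {"add"}}
mastery["add"] = bkt_update(mastery["add"], correct=True)
print(recommend(mastery, prerequisites))
```

Each answer a learner gives nudges the estimate for that node, and the recommendation simply points at the frontier of the map where prerequisites are in place but mastery is not.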

So what might one of these maps actually look like?  Figure 2 below shows an example.  It was created by Larry Berger and Laurence Holt of Wireless Generation and provides an exciting sample visualization of a learning map for the Common Core Math Standards that could be built using the basic ingredients described above.  Larry and Laurence write: “Known as “the honeycomb,” this application would be interactive and display a student’s progress through the standards.  Each hexagon represents a single skill or concept, and groups of hexagons reflect the clusters of skills and concepts that together make up a standard. Drawing on the data infrastructure of the SLI, such a map could track a student’s progress, with cells turning from red to yellow to green as he mastered components of the standards. The slider on the left side of the screen would allow the student or his teacher to zoom in on the cells, which would display more granular information and links to aligned content and diagnostic assessments to help the student continue to move ahead. Individual student maps could roll up to classroom maps, classrooms could be aggregated to school maps and so on, up to the district and state levels.”

Figure 2: Visualization of the Common Core Learning Map
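
The two mechanics Larry and Laurence describe, cells changing color with mastery and student maps rolling up to classroom maps, can be sketched in a few lines.  The thresholds, skill names and the choice of a simple average are my assumptions for illustration; the honeycomb's actual rules are not specified in the quote.

```python
from statistics import mean


def cell_color(mastery: float, yellow_at: float = 0.4, green_at: float = 0.8) -> str:
    """Map a mastery estimate between 0 and 1 to a honeycomb cell color."""
    if mastery >= green_at:
        return "green"
    if mastery >= yellow_at:
        return "yellow"
    return "red"


def roll_up(maps: list) -> dict:
    """Average per-skill mastery across maps (students to classroom, classrooms to school)."""
    skills = set().union(*(m.keys() for m in maps))
    return {skill: mean(m[skill] for m in maps if skill in m) for skill in sorted(skills)}


# Hypothetical mastery estimates for two students.
students = [
    {"place-value": 1.0, "fractions": 0.2},
    {"place-value": 0.8, "fractions": 0.4},
]
classroom = roll_up(students)
print({skill: cell_color(value) for skill, value in classroom.items()})
```

Because `roll_up` returns the same shape it consumes, classroom maps can be fed back in to produce school maps, and so on up to the district and state levels.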

I am interpreting Larry and Laurence to be describing a visualization of raw XML in a native app or downloadable client.  It would not be hard to add to their list of features the URI links and a way to express sequencing between the hexagons.  One could imagine a service event being triggered either by onMouseover or when a student actually “shows up” in that hexagon, i.e. data informs the map of the student’s new location.  The one tricky part would be brokering the link between a URI describing a hexagon in the visualization from the app, over a firewall, through a service broker, through a proxy or two, over another firewall and into a publisher’s digital content server where a relevant resource is then retrieved.  Steve Midgley and his team are one of the groups working to tackle that problem, which is great, because it’s really tricky and Steve is really smart.

The visualization is exciting for the possibilities it represents and its intuitive UI.  However, it is also limited by its medium.  For example, there is no reason why the same functionality described in the visualization could not also be accomplished using a combination of the APIs contained in the new HTML5, along with the design advances in CSS3 and the performance gains we’re seeing from a JavaScript framework like jQuery or from Python.  Such an approach would also have advantages and afford additional options over a more traditional app approach, namely:

  1. Scalable Vector Graphics – no pixelation and therefore no detriment across device form factors. Key for mobile scenarios
  2. Capability of creating multiple visualizations (trees, roads, mazes) that leverage CSS classes and elements
  3. Ability to render those visualizations in zoom-able 2D or manipulable 3D using the SVG and WebGL capabilities
  4. Ability for users to take the maps offline using a combination of the new App and Data Caches found in HTML5
  5. Ability to run browser-intensive background scripts using the Worker API in order to facilitate more advanced query, and
    script-intensive mapping or diagnostic services
  6. Ability to use the WebSocket API to set up a full duplex connection between a back-end db, web or app server and the UI,
    effectively conferring first-class status on the map and avoiding latency drivers like long-polling and roundtrips, or the need for the client to initiate communications
  7. No procurement process to navigate, no sysadmin installation required
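
To illustrate the first two points, here is a minimal sketch that emits a honeycomb as plain SVG, with cell colors driven entirely by CSS classes.  It is written in Python as a server-side generator only to keep the example self-contained; the layout math, class names and colors are assumptions, and a real map would more likely build the same markup in the browser.

```python
import math
from html import escape

STYLE = "<style>.red{fill:#d9534f}.yellow{fill:#f0ad4e}.green{fill:#5cb85c}</style>"


def hexagon_points(cx: float, cy: float, radius: float) -> str:
    """Return the SVG 'points' attribute for a flat-topped hexagon."""
    corners = []
    for i in range(6):
        angle = math.radians(60 * i)
        x = cx + radius * math.cos(angle)
        y = cy + radius * math.sin(angle)
        corners.append(f"{x:.1f},{y:.1f}")
    return " ".join(corners)


def render_honeycomb(cells, radius: float = 30) -> str:
    """Render (column, row, css_class, title) cells as a standalone SVG document."""
    row_height = math.sqrt(3) * radius
    shapes = []
    width = height = 0.0
    for col, row, css_class, title in cells:
        cx = radius + col * 1.5 * radius
        # Odd columns sit half a row lower so the hexagons interlock.
        cy = row_height / 2 + row * row_height + (row_height / 2 if col % 2 else 0)
        width = max(width, cx + radius)
        height = max(height, cy + row_height / 2)
        shapes.append(
            f'<polygon class="{escape(css_class)}" points="{hexagon_points(cx, cy, radius)}">'
            f"<title>{escape(title)}</title></polygon>"
        )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {width:.1f} {height:.1f}">'
        f"{STYLE}{''.join(shapes)}</svg>"
    )


# Hypothetical cells: (column, row, CSS class, tooltip text).
svg = render_honeycomb([(0, 0, "green", "Count to 100"), (1, 0, "red", "Add <10")])
print(svg)
```

Saving the output to a file with an .svg extension and opening it in a browser shows the cells scaling cleanly at any zoom level, and swapping the stylesheet is enough to re-skin the map.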

Based on my early inspection of HTML5 and its API set I believe we can build an open and extensible Learning Map Web app, and that it would be the sort of project that would lend itself well to the Open Community to sustain.  However, we would still need some solution to the earlier list of system capabilities required to support such an app, namely: integration with a student datastore so learners can be mapped; a way to link URIs across servers; a way to broker services between URIs; and finally, a way to build engines underneath the map capable of supporting adaptive tutoring and diagnostics.

Of these issues, the first could be dealt with by some form of federated datastore.  The last will be dealt with using a combination of datamining, Bayesian analytics and scalable machine learning libraries like Apache Mahout, or with integrated approaches from commercial providers like SAS or IBM, which can now couple Watson’s capabilities to its recently acquired SPSS program.

So that leaves URI linking and transaction services. To solve that problem one could take advantage of Steve’s Registry and its elegant NNTP-like approach.  Here’s an illustration of the Registry’s Transport Network:

There are some questions we need to answer before going with an HTML/script-based approach to produce an actual navigable map:

  • Validation of the assumptions I listed in this posting
  • A sense for when all top 7 browsers will support those assumptions on both desktop and mobile
  • A view on backward compatibility and what our fall-back option is for older browsers (almost certainly an app)
  • A better sense of dev resource requirements we would be incurring
  • The authoring tool required to edit or create new visualizations (more on that in a future post)
  • A way to express relations between concept nodes or clusters, some of them multivariate

I hope this post has been easy to follow and I would greatly appreciate hearing your reaction to it.