Education Week published an article on edtech review services in August 2013 that concluded with the following statement:
As the review sites evolve, observers are watching to see which will emerge as the “go to” resources for educators making decisions about technology tools.
“What they’re trying to do is a noble goal, because I do believe that teachers have an overwhelming number of choices in the marketplace,” said Rob Mancabelli, the co-founder and CEO of BrightBytes Inc., a San Francisco-based learning-analytics company.
“One of the things the review sites need to do better is to base the reviews on data,” he said—for instance, matching a student’s areas of deficiency with technology proven to address those shortcomings.
Mr. Chatterji, who is working on EduStar, agrees with the need for more data.
“We’re all chasing after the same North Star, which is to raise educational outcomes with technology,” he said. “In the ecosystem of [review-site] players, the gold standard is evidence.”
I totally agree. How is an educator or parent to know which math app they should purchase? Which will definitively lead to an advancement of their student’s or child’s knowledge?
After obsessing over this problem for a few years, I’ve come to the conclusion that it is extremely difficult, if not impossible, to get this kind of information.
Which metrics should you use?
The student’s scores in the app? Not all app vendors will be willing or able to provide this data. Even if you have it, normalizing it for comparison against thousands of other apps is a monumental feat. How about the student’s standardized test scores? Those metrics are already questionable, as any educator will tell you.
How do you conduct a scientifically controlled evaluation?
There are many factors that influence a student’s education, from teachers to parents to peers to instructional materials to socio-economic factors to genetics to, well, you get the point. Determining whether a specific app has made a discernible difference is a significant challenge. Some vendors already try this, but how an app performs in a classroom or home full of distractions and how it performs in a pristine laboratory are very different things.
How do you account for individual differences?
Some students will take to a game and learn a lot from it, others will just learn how to cheat the game. Some will excel with just a bit of hands-on guidance, others will need constant attention. How an app performs depends a lot on the learner. Does this mean the app is effective or ineffective? Unfortunately, one size does not fit all.
How do you account for rapidly changing products?
Let’s say you do find a way to conduct a scientifically controlled test in a classroom environment using a reliable metric. The likelihood that the developer will read your report and incorporate your findings is very high. Apps are constantly improved and evolved. What does that mean for your research? It is out-of-date the moment a new version is released.
There are some fantastic and noble efforts underway to solve this challenge, though I think we are all a long way off. That gold standard of data and evidence is monumentally difficult to get in a repeatable, reliable, personalized, and scalable way.
One solution involves using rubrics to evaluate these apps. Though the evaluation requires manual work from someone trained in pedagogical assessment, it is a reasonable proxy for quality, if not true efficacy. Many current edtech review services use some kind of rubric, differing mainly in breadth and depth.
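To make the idea concrete, a rubric boils down to a handful of weighted dimensions that a trained reviewer scores by hand. The dimensions, weights, and scores below are hypothetical, a minimal sketch rather than any real service’s rubric:

```python
# A minimal sketch of rubric-based scoring. The dimension names,
# weights, and scores are hypothetical, not drawn from any real
# edtech review service.
RUBRIC_WEIGHTS = {
    "learning_objectives": 0.4,  # alignment with stated curriculum goals
    "feedback_quality":    0.3,  # how the app responds to student mistakes
    "engagement":          0.2,  # likelihood students stay on task
    "accessibility":       0.1,  # support for diverse learners
}

def rubric_score(scores: dict) -> float:
    """Combine per-dimension scores (0-4 scale) into one weighted score."""
    assert set(scores) == set(RUBRIC_WEIGHTS), "score every dimension"
    return sum(RUBRIC_WEIGHTS[d] * scores[d] for d in RUBRIC_WEIGHTS)

# A reviewer's hypothetical assessment of one math app:
example = {"learning_objectives": 3, "feedback_quality": 2,
           "engagement": 4, "accessibility": 1}
print(round(rubric_score(example), 2))  # 0.4*3 + 0.3*2 + 0.2*4 + 0.1*1 ≈ 2.7
```

The arithmetic is trivial; the hard part, as noted above, is the trained human filling in the scores, which is exactly why this is a proxy for quality rather than evidence of efficacy.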
Another involves in-depth reviews written by an individual with a good understanding of the app’s domain. This is similar to the product reviews seen on popular electronics guides. Such reviews are limited by the expertise and biases of the reviewer, though they can at least provide information from someone who has used, poked, and prodded the app.
One of the weaker solutions is the use of ratings. Because they are relatively easy to collect from reviewers, ratings can harness the collective intelligence of the crowd. They are also easy to understand at a glance and to use as a basic filter. However, they fail to provide any context. Some services get around this by offering multiple rating dimensions, but even these suffer from averaging, which washes out very positive and very negative ratings.
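The averaging problem is easy to see with a toy example. The ratings below are invented for illustration: a polarizing app and a uniformly mediocre one can earn the same average star rating, so the single number can’t tell them apart:

```python
from statistics import mean, stdev

# Hypothetical 1-5 star ratings for two apps (invented data).
polarizing = [5, 5, 5, 1, 1, 1]  # loved by some students, hated by others
mediocre   = [3, 3, 3, 3, 3, 3]  # uniformly "meh" for everyone

# Both apps average to the same star rating...
assert mean(polarizing) == mean(mediocre)

# ...but the spread reveals the difference the average hides.
print(round(stdev(polarizing), 2))  # large spread for the polarizing app
print(stdev(mediocre))              # zero spread for the mediocre one
```

This is also why one-size-fits-all ratings clash with the individual-differences problem above: the polarizing app may be exactly right for some learners, and the average erases that signal.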
These solutions sit on a spectrum, with the wisdom of the crowd on one end and the wisdom of experts on the other.
If you believe the pedagogical value of an educational app could be obtained from the crowd, then try the Yelp model. If you believe the pedagogical value needs to be evaluated by experts, then try the Consumer Reports model. Edtech review services are currently employing strategies all over that spectrum because getting to a true gold standard with actual data and evidence is so difficult.
Will these solutions be enough? Will someone be able to crack the gold standard? I’m not sure it is possible, but I sincerely hope someone will.