TidyBlocks: Project History

TidyBlocks officially started in May 2019, but its roots go back to the creation of Scratch in the early 2000s, and beyond that to the Logo programming language, designed in the late 1960s. Both use turtle graphics to make programming more accessible, but Scratch went further by allowing people to build programs by clicking blocks together. Its interface reduces cognitive load by making simple syntax mistakes impossible (you can't forget semi-colons if they aren't there) and makes the structure of the program much easier to see and understand.

But for loops and if/else statements that animate sprites are only one kind of computation, and judging by the number of students who wrote the AP exams in the US between 2008 and 2018, it's not the most widely used:

Subject              2008       2018     Change
Calculus          222,835    308,538       +38%
Computer Science   15,537     65,133      +319%
Statistics        108,284    222,501      +105%

The growth of Computer Science is impressive, but the absolute numbers for Statistics are still more than three times higher. While Scratch is beautiful and effective, it's not designed for doing or teaching data science: the online version doesn't even include a way to load or store data tables. We therefore wanted to create a Scratch-like environment capable of handling the kinds of questions that come up on the AP exam (and on similar exams in other countries, such as A Levels in the UK). We also wanted to create something that would prepare users for full-strength data science tools in the way that Scratch prepares people for languages like Python and Java.

Maya Gans started work in May 2019 as an intern with RStudio. Over the next three months she built a fully-functional prototype using the same Blockly toolkit that underpins Scratch. She learned a lot in a hurry, and wowed the crowd with her demo at rstudio::conf 2020.

Like most prototypes, though, that first version had a lot of technical debt. Blockly is a complex framework---in our opinion, more complex than it needs to be---and the code in Version 1 was very brittle as a result. To address this, Greg Wilson started rewriting TidyBlocks' internals in March 2020, and in July Justin Singh began work on a modern user interface using React.

And that brings us to where we are now. We have blocks that implement the core operations in the tidyverse, which we think is the most user-friendly framework for data science available. We're a bit light on statistical tests right now, but we can select, filter, mutate, and summarize tidy data, join tables in a couple of different ways, and create plots with Vega-Lite. Oh, and did we mention 100% branch coverage in our unit tests?

We believe we now have a solid foundation for further work. A new block can be added and tested in under fifteen minutes, so we're ready to start adding more statistical tests from the Simple Statistics package, working through examples from old AP exams, and seeing how much of the excellent new book Data Science in Education Using R we can translate into blocks. It's going to be a lot of work, but having a user-friendly interface that can run on school and library computers without any installation requirements and that will serve as a bridge to full-strength data science tools is pretty exciting. If you'd like to help us, please get in touch.

— Greg Wilson / 2020-07-26