Today I released stitch
into the
wild. If you haven’t yet, check out the examples
page to see an example of what stitch does,
and the Github repo for how to
install. I’m using this post to explain why I wrote stitch, and some
issues it tries to solve.
Why knitr / knitpy / stitch / RMarkdown?
Each of these tools or formats have the same high-level goal: produce reproducible, dynamic (to changes in the data) reports. They take some source document (typically markdown) that’s a mixture of text and code and convert it to a destination output (HTML, PDF, docx, etc.).
The main difference from something like pandoc, is that these tools actually execute the code and interweave the output of the code back into the document.
Reproducibility is something I care very deeply about. My workflow when writing a report is typically
- prototype in the notebook or IPython REPL (data cleaning, modeling, visualizing, repeat)
- rewrite and cleanup those prototypes in a
.py
file that produces one or more outputs (figure, table, parameter, etc.) - Write the prose contextualizing a figure or table in markdown
- Source output artifacts (figure or table) when converting the markdown to the final output
This was fine, but had a lot of overhead, and separated the generated report from the code itself (which is sometimes, but not always, what you want).
Stitch aims to make this a simpler process. You (just) write your code and results all in one file, and call
stitch input.md -o output.pdf
Why not Jupyter Notebooks?
A valid question, but I think misguided. I love the notebook, and I use
it every day for exploratory research. That said, there’s a continuum
between all-text reports, and all-code reports. For reports that have a
higher ratio of text:code
, I prefer writing in my comfortable
text-editor (yay spellcheck!) and using stitch / pandoc to compile the
document. For reports that have more code:text
, or that are very early
on in their lifecycle, I prefer notebooks. Use the right tool for the
job.
When writing my pandas ebook, I had to jump through hoops to get from notebook source to final output (epub or PDF) that looked OK. NBConvert was essential to that workflow, and I couldn’t have done without it. I hope that the stitch-based workflow is a bit smoother.
If a tool similar to podoc is developed, then we can have transparent conversion between text-files with executable blocks of code and notebooks. Living the dream.
Why python?
While RMarkdown / knitr are great (and way more usable than stitch at
this point), they’re essentially only for R. The support for other
languages (last I checked) is limited to passing a code chunk into the
python
command-line executable. All state is lost between code chunks.
Stitch supports any language that implements a Jupyter kernel, which is a lot.
Additionally, when RStudio introduced R
Notebooks, they did so
with their own file format, rather than adopting the Jupyter notebook
format. I assume that they were aware of the choice when going their own
way, and made it for the right reasons. But for these types of tasks
(things creating documents) I prefer language-agnostic tools where
possible. It’s certain that RMarkdown / knitr are better than stitch
right now for rendering .Rmd
files. It’s quite likely that they will
always be better at working with R than stitch; specialized tools
exist for a reason.
Misc.
Stitch was heavily inspired by Jan Schulz’s knitpy, so you might want to check that out and see if it fits your needs better. Thanks to Jan for giving guidance on difficulty areas he ran into when writing knitpy.
I wrote stitch in about three weeks of random nights and weekends I had free. I stole time that from family or maintaining pandas. Thanks to my wife and the pandas maintainers for picking up my slack.
The three week thing isn’t a boast. It’s a testament to the rich libraries already available. Stitch simply would not exist if we couldn’t reuse
- pandoc via pypandoc for parsing markdown and converting to the destination output (and for installing pandoc via conda-forge)
- Jupyter for providing kernels as execution contexts and a client for easily communicating with them.
- pandocfilters for wrapping code-chunk output
And of course RMarkdown, knitr, and knitpy for proving that a library like this is useful and giving a design that works.
Stitch is still extremely young. It could benefit from users trying it out, and letting me know what’s working and what isn’t. Please do give it a shot and let me know what you think.
© Tom Augspurger