Overview

• I: Introduction
• II: Basics (Markup, Code Chunks, Graphs and Output)
• III: Advanced (Workflow, Graphic Devices, Caching, Hooks)

I: What are 'dynamic documents'?

"source documents containing both program code and narratives" (Yihui Xie)

program code: e.g., for statistical analyses, making graphs

narratives: literate text, explaining the results or output from the program code

I: Why write dynamic documents?

• Program code and text are side-by-side: no more hunting for the code file or wondering why you ran analysis x
• Code output is automatically included in the final document: no tedious copying, no cut and paste errors
• If your data or code get updated, so does your final document
• It just makes sense: analyses and writing up results & findings go together

I: Why not write dynamic documents?

• You have to conduct analyses and graphs using code, not point-and-click
• You have to write your text and report in a markup language, such as $$\LaTeX$$ or Markdown
• Sometimes it is easier to edit the output from your code manually than programmatically (e.g., deleting a line of output, proper variable names, etc.)
However, if you already:
• use a markup language
• work in R
• or make the same report frequently (e.g., monthly sales reports)
then dynamic documents decrease errors and save time.

II: Markdown

knitr supports many markup languages, but to start, markdown is nice and simple

Headers (level 1, 2, 3): #, ##, ###

Bold, italics: **x**, _x_

Table:

    | Column 1 | Column 2 | Column 3 |
    |:---------|---------:|:--------:|
    | left     | right    | center   |
    | row 2    | text     | text     |

II: R

    # graphics packages
    require(ggplot2)
    # boxplot
    qplot(cut, price, data = diamonds, geom = "boxplot")
    # linear regression
    summary(lm(price ~ carat, data = diamonds))

II: R & Markdown

R Markdown files: .Rmd

Write regular markdown for most of the file

Put R code in chunks. Chunks start with ```{r} and end with ```

Chunk options go at the start between the braces:
{r, options}

Next is a simple, but complete, R markdown file

II: R & Markdown

    # Diamond Cut, Size, and Price

    Diamonds with an _ideal_ cut have a lower median price.

    ```{r}
    # graphics packages
    require(ggplot2)
    # boxplot
    qplot(cut, price, data = diamonds, geom = "boxplot")
    ```

    One explanation for this unexpected finding is that ideal cut diamonds
    also tend to be **smaller**, and size is related to price.

    ```{r}
    summary(lm(price ~ carat, data = diamonds))
    ```

II: R & Markdown Results

We run require(knitr); knit2html("example1.rmd") to knit the file into an HTML report

The defaults are really easy to use, but for a more polished report, we often need to change them.

All the options available are documented here

To start, we can resize the image, add a caption, and hide the source code, using these options:
• fig.width, fig.height
• fig.cap
• echo=FALSE

We will also inline some output and show how to use MathJax for $$\LaTeX$$ equations

II: R & Markdown

    # Diamond Cut, Size, and Price

    Diamonds with an _ideal_ cut have a lower median price.

    ```{r fig.height=3, fig.width=4, fig.cap="Boxplot of Diamond Prices by Cut", echo=FALSE}
    # graphics packages
    require(ggplot2, quietly=TRUE)
    # boxplot
    qplot(cut, price, data = diamonds, geom = "boxplot")
    ```

    One explanation for this unexpected finding is that ideal cut diamonds
    also tend to be **smaller**. The mean is
    $$E(carat | cut = Ideal) = `r mean(subset(diamonds, cut == "Ideal")$carat)`$$,
    and size is related to price.

    ```{r echo=FALSE}
    summary(lm(price ~ carat, data = diamonds))
    ```

II: Customizing

There are many more knitr options, but often, to further refine reports, we need to customize the output from R itself.

We are also going to look at a new type of markup, $$\LaTeX$$. The convention for these files is to use the .rnw extension.

One common output from R is tables, whether snippets of data, descriptive information, or summaries of analyses. The xtable package converts many kinds of R model output, matrices, and data frames into nicely formatted tables for $$\LaTeX$$ or HTML (the HTML output is suitable for HTML or markdown files).

II: R & $$\LaTeX$$

    \documentclass{article}
    \usepackage{floatrow}
    \newfloatcommand{capbtabbox}{table}[][\FBwidth]

    \begin{document}

    << include=FALSE >>=
    require(ggplot2); require(xtable) # load packages
    opts_chunk$set(fig.path="figures/knitr_intro-ex3-")
    @

    \section{Diamond Cut, Size, and Price}

    Diamonds with an \emph{ideal} cut have a lower median price.
    One explanation is that ideal cut diamonds are \emph{smaller}.
    The mean is $E(carat | cut = Ideal) = \Sexpr{mean(subset(diamonds, cut == "Ideal")$carat)}$,
    and size is related to price (shown in the table).

    \begin{figure}[!h]
    \begin{floatrow}
    \capbtabbox{
    << echo=FALSE, results='asis' >>=
    print(xtable(coef(summary(lm(price ~ carat, data = diamonds)))[, -3]),
      floating = FALSE)
    @
    }{
      \caption{Regression predicting diamond price by size}
    }
    \ffigbox{
    << echo=FALSE >>=
    qplot(cut, price, data = diamonds, geom = "boxplot")
    @
    }{
      \caption{Diamond price by quality of cut}
    }
    \end{floatrow}
    \end{figure}

    \end{document}

II: Customizing

We saw a few new commands:

• The tags for R chunks in $$\LaTeX$$ are <<>>= and @, and for inline chunks, \Sexpr{}
• include=FALSE which runs the R code but does not show code or output
• results='asis' which includes R output directly without wrapping in highlighting or markup. Useful when the output is valid markup code (e.g., from xtable)

III: Workflow

As you move from simple, single file projects to ones with multiple files, an automated system can lead to some new issues

By default, knitr names output files based on the input files but with a different extension (e.g., .rnw becomes .tex)

Graphic plot files go into a relative subdirectory, figure/ by default

To have code in one place and the output in another (e.g., for a production server), we can customize the output file and directory

III: Workflow

In code_setup.R:

    options(width = 100, digits = 2, warn=-1, width.cutoff=140)
    opts_knit$set(base.dir="~/SkyDrive/web/Lab-Website")
    opts_chunk$set(warning=FALSE, message=FALSE, echo=FALSE, width.cutoff=140)
    suppressPackageStartupMessages(require(rCharts, quietly=TRUE))

In my .Rhtml files:

    source("code_setup.R")
    if (MO6LOCK) {
      opts_chunk$set(fig.path="figures/outcome6-")
    } else {
      opts_chunk$set(fig.path="figures/outcome-")
    }
    header("Outcome")

Run via knit("file.Rhtml", "path/to/go/file.html")
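The mapping from an input file to its output location can be wrapped in a small helper. This is only a sketch: output_path_for() and the directory names are hypothetical, and the only knitr call assumed is knit(input, output) as shown above.

```r
# Hypothetical helper: build the output path for a knitr source file.
# `out_dir` and `ext` are illustrative arguments, not knitr options.
output_path_for <- function(input, out_dir, ext = "html") {
  stem <- tools::file_path_sans_ext(basename(input))
  file.path(out_dir, paste0(stem, ".", ext))
}

out <- output_path_for("reports/outcome.Rhtml", out_dir = "site")
print(out)

# With knitr installed, the document could then be knitted with:
# knitr::knit("reports/outcome.Rhtml", output = out)
```

Keeping the path logic in one function means the source tree and the published tree can be rearranged without editing every knit() call.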

III: Workflow

Watch out for files overwriting each other: unnamed chunks get the same automatic names in every file, and their figures default to the same figure subdirectory. To avoid collisions, set a custom fig.path, put each file in a separate directory, or name your chunks (well)
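A small base R sketch of why the collision happens. The naming pattern used here (fig.path, then chunk label, then plot number) is an assumption modelled on knitr's defaults, and figure_file() is a hypothetical helper, not a knitr function.

```r
# Sketch of how default figure paths can collide across files.
# Assumed pattern: <fig.path><chunk label>-<plot number>.<ext>
figure_file <- function(fig_path, label, plot_no = 1, ext = "png") {
  paste0(fig_path, label, "-", plot_no, ".", ext)
}

# Two different documents, each with one unnamed chunk and the default path
a_default <- figure_file("figure/", "unnamed-chunk-1")
b_default <- figure_file("figure/", "unnamed-chunk-1")
a_default == b_default  # TRUE: the second document overwrites the first plot

# A per-document prefix (the value fig.path would be set to) avoids the clash
a_custom <- figure_file("figure/report-a-", "unnamed-chunk-1")
b_custom <- figure_file("figure/report-b-", "unnamed-chunk-1")
a_custom == b_custom    # FALSE
```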

If you will be moving files, watch out for absolute vs. relative file paths. If you are putting a presentation online but do not have your own server, you can upload images and link the URLs, in which case you want an absolute path

In the default setup, any change to the document requires recompiling: whether the change is in data, code, or narrative

Given that you are mixing different types of code (R, some markup language, etc.), it is nice to have a good editor. RStudio is pretty awesome; I use Emacs + ESS

Version control can help recover changes if you overwrite something you want back, and help you keep track of your progress. I like git, and GitHub is free and helps make the process easy

III: Caching

For slow computations, rerunning everything when only one code chunk changed is tedious. Even worse, you have to rerun everything if you fix a typo in your narrative

You can cache code chunks in knitr by setting cache.path and the flag cache=TRUE in each chunk you want cached

Caching creates a database of the R objects, saved graphs, and text output from a chunk, as well as an md5 hash of the chunk.
As long as the md5 is the same, the chunk is not rerun, which brings up...
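The hash check can be illustrated in base R. This is a deliberately simplified sketch, not knitr's implementation: cached_eval() is hypothetical, it hashes only the chunk's source text, and it stores a single return value rather than graphs or text output.

```r
# Illustrative hash-based cache; cached_eval() is hypothetical and much
# simpler than knitr's real cache (no graphics, no text output).
cached_eval <- function(code, cache_dir) {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  # Hash the chunk's source text by writing it to a temporary file
  src <- tempfile(fileext = ".R")
  on.exit(unlink(src))
  writeLines(code, src)
  key <- unname(tools::md5sum(src))
  store <- file.path(cache_dir, paste0(key, ".rds"))
  if (file.exists(store)) {
    # Same md5 as a previous run: reuse the stored result
    return(list(value = readRDS(store), from_cache = TRUE))
  }
  value <- eval(parse(text = code), envir = new.env())
  saveRDS(value, store)
  list(value = value, from_cache = FALSE)
}

cache_dir <- file.path(tempdir(), "demo-cache")
unlink(cache_dir, recursive = TRUE)            # start from an empty cache
first  <- cached_eval("sum(1:10)", cache_dir)  # evaluated and stored
second <- cached_eval("sum(1:10)", cache_dir)  # same hash: read from disk
third  <- cached_eval("sum(1:11)", cache_dir)  # code changed: re-evaluated
```

Because only the chunk's own text is hashed, a change in an earlier chunk would go unnoticed here, which is exactly the dependency problem.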

Dependencies: if a code chunk depends on another, you must include dependson or try out knitr's automatic dependency detection (autodep), which is based on global variables

III: Caching

You can specify the chunks that a chunk depends on by index: positive integers count from the start of the document (e.g., dependson = c(1, 4) for chunks 1 and 4), and negative integers count back from the current chunk (e.g., dependson = c(-1, -2) for the previous two chunks)

Caching saves not only objects but also text output and graphics. Still, watch out for side effects, such as loading packages that are needed later or setting global chunk options

If you use chunk referencing by reusing the same name, you cannot cache both chunks. Instead, give the second chunk a new name and use ref.label to reference the other chunk

cache.extra is a great way to cache extra information, such as the version of R or a package, the random seed, etc.
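A possible setup chunk that pulls these caching options together. It is a sketch under assumptions: the cache directory name is illustrative, the options are set globally rather than per chunk, and the chunk containing this code should itself be left uncached.

```r
# Sketch of a setup chunk; requires the knitr package.
library(knitr)

opts_chunk$set(
  cache = TRUE,            # cache every chunk unless a chunk opts out
  cache.path = "cache/",   # illustrative directory for the cache database
  autodep = TRUE,          # let knitr infer dependencies from global variables
  # a different R version changes the hash and invalidates cached chunks
  cache.extra = R.version.string
)
```

Chunks with side effects that later chunks rely on can opt out individually with cache=FALSE.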

III: Graphics Engines

By default, knitr will pick the graphics device to use by the file type. For example, png for HTML files, PDF for $$\LaTeX$$.

Although knitr has default graphics devices, you can use almost any graphics device in R in knitr. Common examples are: pdf, ps, png, bmp, svg.

Finally, you can specify multiple devices (e.g., dev=c('pdf', 'png')) so that, for example, both PNG and PDF image files are made for each graph. This is useful for R markdown files that you may want to convert to HTML or, via pandoc, to PDF.
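Set once in a setup chunk, the same option applies to every plot in the document. A minimal sketch, assuming knitr is installed:

```r
library(knitr)

# Produce both a PDF and a PNG file for every plot in the document
opts_chunk$set(dev = c("pdf", "png"))
```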

III: Custom Hooks

knitr has a default set of functions, called hooks, that control how code in chunks is processed. These include functions that change what happens when an option is set

For advanced users or special cases, you may want to write custom hooks. Writing your own hooks allows you to control what happens before and after a chunk is processed.

Custom hooks are triggered by setting additional options on a code chunk. Combined, this allows flexible customization, such as a simple option to set margins, make a grid layout for graphics, set up animations, etc.
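As a sketch of the margins case, a chunk hook can be registered with knit_hooks$set(). The option name small.mar is an illustrative choice; any chunk that sets it (e.g., small.mar=TRUE in the chunk header) triggers the hook.

```r
library(knitr)

# Chunk hook: runs before and after every chunk that sets small.mar to a
# non-NULL value, e.g. a chunk header containing small.mar=TRUE.
knit_hooks$set(small.mar = function(before, options, envir) {
  if (before) {
    # shrink the plot margins before the chunk's code is evaluated
    par(mar = c(4, 4, 0.1, 0.1))
  }
})
```

The hook is called twice per chunk, once with before = TRUE and once with before = FALSE, so code for the "after" phase would go in an else branch.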

You can also set output hooks to customize the output from R, such as how warnings and error messages are formatted.
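A minimal sketch of an output hook for warnings in a markdown document. format_warning() is a hypothetical formatter; the exact text knitr passes in x (for example, any comment prefix) is not shown here and may differ.

```r
library(knitr)

# Hypothetical formatter: turn a warning's text into a markdown blockquote.
# Output hooks receive the text `x` and the chunk's `options`.
format_warning <- function(x, options) {
  paste0("\n> ", trimws(x), "\n\n")
}

# Register inside the document (e.g., in a setup chunk) so that it is
# applied after knitr has installed its default output hooks.
knit_hooks$set(warning = format_warning)

cat(format_warning("NaNs produced\n", options = list()))
```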
