NYR Conference 2023
Postdoctoral Research and Teaching Fellow, UBC, MDS-Vancouver
Data Science Educator, Posit, PBC
Author, Pandas for Everyone
---
title: "example-analysis"
author: "Daniel Chen"
output: html_document
---
```{r setup, include=FALSE}
library(tidyverse)
library(readxl)
library(writexl)
```
## Load Data
Retrospective Cohort Study of the Effects of
Donor KIR genotype on the reactivation of cytomegalovirus (CMV)
after myeloablative allogeneic hematopoietic stem cell transplant.
```{r}
cmv <- read_excel("data/cmv.xlsx")
head(cmv)
```
.Rmd with {rmarkdown}
Demo file: example-analysis.Rmd
Render Command:
Specify output file (and location):
.Rmd with quarto
Demo file: example-analysis.Rmd
Render Command:
Specify output file:
# output folders only work with quarto projects
touch _quarto.yml
quarto render example-analysis.Rmd \
--toc \
--output output/020-example-analysis-rmd-qmd.html
quarto is a command line tool!
Tiffany Timbers, DSCI 310: Reproducible and trustworthy workflows for data science: https://ubc-dsci.github.io/dsci-310-student/
RMarkdown YAML
RMarkdown and Quarto chunk options:
Demo file: example-analysis.qmd
Render Command:
Specify output file:
nbgrader for course assignment creation + grading
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "4a9a7246-de20-4aac-945a-b8f0e7db0ac6",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import plotnine as p9\n",
"from plotnine import ggplot, aes, geom_histogram\n",
"import statsmodels.formula.api as smf"
]
},
{
"cell_type": "markdown",
"id": "8f8205a7-a172-492a-bb22-e24bc1fc7ce2",
"metadata": {},
"source": [
"## Load Data"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "a0fe3045-4d26-4dba-b673-5f95ee3d635c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ID</th>\n",
" <th>age</th>\n",
" <th>prior.radiation</th>\n",
" <th>aKIRs</th>\n",
" <th>cmv</th>\n",
" <th>donor_negative</th>\n",
" <th>donor_positive</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>61</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>recipient_positive</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>62</td>\n",
" <td>1</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>recipient_negative</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>63</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>recipient_positive</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>33</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>recipient_positive</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>54</td>\n",
" <td>0</td>\n",
" <td>6</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>recipient_positive</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ID age prior.radiation aKIRs cmv donor_negative \\\n",
"0 1 61 0 1 1 recipient_positive \n",
"1 2 62 1 5 0 recipient_negative \n",
"2 3 63 0 3 0 NaN \n",
"3 4 33 1 2 0 recipient_positive \n",
"4 5 54 0 6 0 NaN \n",
"\n",
" donor_positive \n",
"0 NaN \n",
"1 NaN \n",
"2 recipient_positive \n",
"3 NaN \n",
"4 recipient_positive "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cmv = pd.read_excel(\"data/cmv.xlsx\")\n",
"cmv.head()"
]
},
VSCode
Jupyter Lab
IRKernel installed: https://github.com/IRkernel/IRkernel
.ipynb with nbconvert
example-analysis-python.ipynb
example-analysis-r.ipynb
Python Kernel:
jupyter nbconvert \
--to html \
--output output/040-example-analysis-python-jupyter.html \
--execute example-analysis-python.ipynb
R Kernel:
jupyter nbconvert \
--to html \
--output output/050-example-analysis-r-jupyter.html \
--execute example-analysis-r.ipynb
(Hint: they’re the same command)
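To make the hint concrete, here is a minimal sketch that builds both invocations from one helper. The function name `nbconvert_command` is hypothetical (it is not part of nbconvert), and the sketch only assembles and prints the commands rather than running them. The command can stay the same because the kernel to use is recorded in each notebook's own metadata.

```python
import shlex


def nbconvert_command(notebook, output):
    """Build the nbconvert call shown above as an argument list.

    Hypothetical helper for illustration; only the two file names vary.
    """
    return [
        "jupyter", "nbconvert",
        "--to", "html",
        "--output", output,
        "--execute", notebook,
    ]


python_cmd = nbconvert_command(
    "example-analysis-python.ipynb",
    "output/040-example-analysis-python-jupyter.html",
)
r_cmd = nbconvert_command(
    "example-analysis-r.ipynb",
    "output/050-example-analysis-r-jupyter.html",
)

# Print shell-ready versions of both commands for comparison.
print(shlex.join(python_cmd))
print(shlex.join(r_cmd))
```

To actually run one of them from Python, `subprocess.run(python_cmd, check=True)` would work, assuming `jupyter` and the relevant kernel are installed.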
To make your version control diffing easier, you may want to clear the output from the notebook JSON file.
In nbconvert 6.0+, you can use --clear-output --inplace:
jupyter nbconvert --clear-output --inplace example-analysis-python.ipynb
jupyter nbconvert --clear-output --inplace example-analysis-r.ipynb
Or use the --to notebook argument if you want to preserve a rendered notebook.
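As a rough illustration of what clearing output means for the notebook JSON, the sketch below empties `outputs` and resets `execution_count` on every code cell. It uses only the standard library; `clear_outputs` and the two-cell notebook are hypothetical stand-ins, and nbconvert's own implementation may differ in details (for example, in how it treats cell metadata).

```python
import json


def clear_outputs(notebook):
    """Return a copy of a notebook dict with code-cell outputs removed."""
    cleared = json.loads(json.dumps(notebook))  # deep copy via a JSON round trip
    for cell in cleared.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleared


# Hypothetical two-cell notebook, shaped like the JSON shown earlier.
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["## Load Data"]},
        {
            "cell_type": "code",
            "execution_count": 2,
            "metadata": {},
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 2,
                    "metadata": {},
                    "data": {"text/plain": ["(table output)"]},
                }
            ],
            "source": ["cmv.head()"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

cleared = clear_outputs(notebook)
print(json.dumps(cleared["cells"][1], indent=1))
```

After clearing, a diff of the notebook file only shows changes to cell source, not to rendered tables or execution counters.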
.ipynb with quarto
Takes whatever is in the notebook (no additional execution) and renders it (to html by default).
Use --execute to execute the cells and render.
.ipynb with quarto
Python Kernel:
quarto render example-analysis-python.ipynb \
--to html \
--execute \
--toc \
--output-dir output \
--output 060-example-analysis-python-ipynb.html
R Kernel:
From a Jupyter notebook with code output:
example-analysis-python-qmd_meta.ipynb
example-analysis-python-qmd_meta.qmd
Using a notebook with existing output:
You can add quarto #| metadata comments to a cell, and then use the jupyter output directly in a quarto document.
#| label: fig-age_hist
#| fig-cap: >
#| A histogram of the ages in our Cytomegalovirus dataset
ggplot(cmv_tidy, aes(x="age")) + geom_histogram()
Use a quarto shortcode:
Render the example:
jupytext
https://jupytext.readthedocs.io/
Rmd -> qmd
ipynb -> qmd
quarto convert
Daniel Chen. @chendaniely. Repo/Slides: https://github.com/chendaniely/rstatsnyc-2023-quarto