import plotly.express as px
import seaborn as sns
penguins = sns.load_dataset("penguins")
px.scatter(penguins, x="body_mass_g", y="bill_length_mm", color="sex")
or Effective DS Reporting
As a scientist,
you’re a professional writer.
TAČR Research project, Profinit + MFF UK
Based on PURPOSE, AUDIENCE and RESOURCES
Relates to CRISP-DM (Cross-Industry Standard Process for Data Mining) phases
YData profiling
(docs) ydata_profiling data.csv report.html
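The same report can also be produced from Python rather than from the command line. The sketch below is a minimal example, assuming the pandas and ydata-profiling packages are installed; the helper name profile_csv and the default title are hypothetical, while data.csv and report.html mirror the command above.

```python
from pathlib import Path


def profile_csv(csv_path: str, html_path: str, title: str = "Data profile") -> Path:
    """Write an HTML profiling report for a CSV file and return its path."""
    # Imported lazily so this module still loads where the packages are missing.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    report = ProfileReport(df, title=title)
    report.to_file(html_path)
    return Path(html_path)


# Usage (same file names as in the command-line call):
# profile_csv("data.csv", "report.html")
```

Working from Python is handy when the data needs light cleaning before profiling, since any DataFrame can be passed to ProfileReport.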
Knowledge report
Business report
Example report.
M<Ž (cze) while F<M (eng)
And stay tuned, we’ll cover them in the next session.
plotly package.
pgfSweave for TikZ, R2HTML, …; BUT only one at a time!)
# example.Snw
\documentclass[a4paper]{article}
\begin{document}
<<echo=false, results=hide>>=
library(lattice)
library(xtable)
data(cats, package="MASS")
@
\section*{The Cats Data}
Consider the \texttt{cats} regression example from
Venables \& Ripley (1997). The data frame contains
measurements of heart and body weight
of \Sexpr{nrow(cats)} cats (\Sexpr{sum(cats$Sex=="F")}
female, \Sexpr{sum(cats$Sex=="M")} male).
A linear regression model of heart weight by body weight and
sex can be fitted in R using the command
<<>>=
lm1 = lm(Hwt~Bwt*Sex, data=cats)
lm1
@
Tests for significance of the coefficients are shown in
Table~\ref{tab:coef}, a scatter plot including the
regression lines is shown in Figure~\ref{fig:cats}.
\SweaveOpts{echo=false}
<<results=tex>>=
xtable(
lm1,
caption="Linear regression model for cats data.",
label="tab:coef"
)
@
\begin{figure}
\centering
<<fig=true, width=12, height=6>>=
lset(col.whitebg())
print(xyplot(Hwt~Bwt|Sex, data=cats, type=c("p", "r")))
@
\caption{The cats data from package MASS.}
\label{fig:cats}
\end{figure}
\end{document}
R)
jupyter lab – IDE, wrapper around notebooks, extensions, etc.
jupyter nbconvert – converts .ipynb to .html etc.
pretty-jupyter – prettier outputs, dynamic text, ToC, tabsets, etc.
.ipynb is not a format to be shared with business / general public 🙂
nbconvert (based on jinja package)
.html and .pdf.
jupyter nbconvert --to html path/to/notebook.ipynb
--no-input (anytime you can share the .ipynb)
--execute to reevaluate cells (to ensure reproducibility)
pretty-jupyter extension to get ToC, tabsets, prettier output, …
Python, R, Julia and Observable.
Pandoc)
.qmd
.Rmd and .ipynb
Quarto output example
.qmd, .ipynb or .rmd):
quarto render path/to/notebook.qmd --to html
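Both render commands (jupyter nbconvert and quarto render) can be scripted, which helps when several reports have to be rebuilt at once. The following is a minimal sketch using only the Python standard library; the helper names nbconvert_cmd, quarto_cmd and run are hypothetical, and only flags that appear in these slides are used.

```python
import shutil
import subprocess


def nbconvert_cmd(notebook, to="html", hide_input=True, execute=True, template=None):
    """Build the `jupyter nbconvert` argument list for one notebook."""
    cmd = ["jupyter", "nbconvert", "--to", to]
    if hide_input:
        cmd.append("--no-input")  # leave code cells out of the output
    if execute:
        cmd.append("--execute")  # re-run all cells before converting
    if template:
        cmd += ["--template", template]  # e.g. "pj" for pretty-jupyter
    cmd.append(notebook)
    return cmd


def quarto_cmd(document, to="html"):
    """Build the `quarto render` argument list for one document."""
    return ["quarto", "render", document, "--to", to]


def run(cmd):
    """Run a command if its executable is installed; return True on success."""
    if shutil.which(cmd[0]) is None:
        print(f"{cmd[0]} not found, skipping: {' '.join(cmd)}")
        return False
    return subprocess.run(cmd, check=False).returncode == 0


# Usage:
# run(nbconvert_cmd("path/to/notebook.ipynb"))
# run(quarto_cmd("path/to/notebook.qmd"))
```

Building the argument list separately from running it keeps the commands easy to inspect or log before anything is executed.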
quarto first. Then install this extension.
ctrl + shift + k: knit (render) the document
ctrl + alt + i: insert a code block
---
title: "Clickers vs. Nonclickers"
author: "Dominik Matula"
date: "2024-10-04"
abstract: "Some text ..."
format:
  html:
    toc: true
    embed-resources: true
    css: styles.css
    ...
  pdf:
    ...
---
ctrl + space).
format: revealjs in header
h2 headings become slides
h1 headings become section dividers
::: {} sections to set up behaviour, e.g.:
::: {.callout-tip} gives you an info box
::: {.fragment} reveals things on the next click
explorations etc.
.qmd/.ipynb) and rendered files (.html)
Git LFS to reduce repo size
#tidytuesday or this viewer
Your task:
ydata-profiling.
coffee_x_bitterness, coffee_x_acidity)?
prefer_overall column vs. age, gender, number_of_children, political_affiliation, …)
species_code) overview – how many of them have been spotted? Which ones are rare/most common?
latitude & longitude etc. or you can use another dataset and study area characteristics vs. specific species occurrences.)
jupyter nbconvert
--no-input to turn off code cells.
pretty-jupyter package & --template pj to make it better looking
Your task:
Quarto + VSCode on your computer. Install the quarto extension as well.
quarto document (.qmd)
author, title, date
ctrl + shift + k)
format to revealjs in the report header (yaml). Rerender the document.
h2 sections become slides, h1 sections become section separators.
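To try the format switch quickly, the starting document can be generated instead of typed by hand. This is a minimal sketch with a hypothetical helper build_qmd; the header fields (title, author, date, format) come from the task, while the heading and body text are placeholders.

```python
from textwrap import dedent


def build_qmd(title, author, date, fmt="html"):
    """Return the text of a minimal Quarto document with a YAML header."""
    header = dedent(f"""\
        ---
        title: "{title}"
        author: "{author}"
        date: "{date}"
        format: {fmt}
        ---
        """)
    # Placeholder content: with format revealjs, the h1 heading acts as a
    # section separator and the h2 heading starts a slide.
    body = dedent("""\
        # Section divider

        ## First slide

        Some text ...
        """)
    return header + "\n" + body


# Usage: switch the same document between a report and slides.
# print(build_qmd("Clickers vs. Nonclickers", "Dominik Matula", "2024-10-04", fmt="revealjs"))
```

Saving the returned text as a .qmd file and rendering it (ctrl + shift + k in VSCode) with fmt set to html and then revealjs shows the same source as a report and as slides.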