import plotly.express as px
import seaborn as sns
penguins = sns.load_dataset("penguins")
px.scatter(penguins, x="body_mass_g", y="bill_length_mm", color="sex")
or Effective DS Reporting
As a scientist,
you’re a professional writer.
TAČR Research project, Profinit + MFF UK
Based on PURPOSE, AUDIENCE and RESOURCES
Relates to CRISP-DM phases (CRISP-DM: Cross-industry standard process for data mining)
YData profiling (docs)
ydata_profiling data.csv report.html
Knowledge report
Business report
Example report.
M < Ž (Czech) while F < M (English)
And stay tuned, we’ll cover them in the next session.
plotly package.
pgfSweave for TikZ, R2HTML, …; BUT only one at a time!)
# example.Snw
\documentclass[a4paper]{article}
\begin{document}
<<echo=false, results=hide>>=
library(lattice)
library(xtable)
data(cats, package="MASS")
@
\section*{The Cats Data}
Consider the \texttt{cats} regression example from
Venables \& Ripley (1997). The data frame contains
measurements of heart and body weight
of \Sexpr{nrow(cats)} cats (\Sexpr{sum(cats$Sex=="F")}
female, \Sexpr{sum(cats$Sex=="M")} male).
A linear regression model of heart weight by sex and
body weight can be fitted in R using the command
<<>>=
lm1 = lm(Hwt~Bwt*Sex, data=cats)
lm1
@
Tests for significance of the coefficients are shown in
Table~\ref{tab:coef}, a scatter plot including the
regression lines is shown in Figure~\ref{fig:cats}.
\SweaveOpts{echo=false}
<<results=tex>>=
xtable(
lm1,
caption="Linear regression model for cats data.",
label="tab:coef"
)
@
\begin{figure}
\centering
<<fig=true, width=12, height=6>>=
lset(col.whitebg())
print(xyplot(Hwt~Bwt|Sex, data=cats, type=c("p", "r")))
@
\caption{The cats data from package MASS.}
\label{fig:cats}
\end{figure}
\end{document}
R)
jupyter lab: IDE, wrapper around notebooks, extensions, etc.
jupyter nbconvert: converts .ipynb to .html etc.
pretty-jupyter: prettier outputs, dynamic text, ToC, tabsets, etc.
.ipynb is not a format to be shared with business / general public 🙂
nbconvert (based on jinja package)
.html and .pdf.
jupyter nbconvert --to html path/to/notebook.ipynb
--no-input (anytime you can share the .ipynb)
--execute to reevaluate cells (to ensure reproducibility)
pretty-jupyter extension to get ToC, tabsets, prettier output, …
Python, R, Julia and Observable.
Pandoc)
.qmd
.Rmd and .ipynb
Quarto output example
.qmd, .ipynb or .rmd):
quarto render path/to/notebook.qmd --to html
quarto first. Then install this extension.
ctrl + shift + k : knit (render) the document
ctrl + alt + i : insert a code block
---
title: "Clickers vs. Nonclickers"
author: "Dominik Matula"
date: "2024-10-04"
abstract: "Some text ..."
format:
  html:
    toc: true
    embed-resources: true
    css: styles.css
    ...
  pdf:
    ...
---
ctrl + space).
format: revealjs in header
h2 becomes slides
h1 becomes section dividers
::: {} sections to set up behaviour, e.g.:
::: {.callout-tip} gives you an info box
::: {.fragment} reveals things on the next click
explorations etc.
.qmd/.ipynb) and rendered files (.html)
Git LFS to reduce repo size
#tidytuesday or this viewer
Your task:
ydata-profiling.
coffee_x_bitterness, coffee_x_acidity)?
prefer_overall column vs. age, gender, number_of_children, political_affiliation, …)
species_code) overview: how many of them have been spotted? Which ones are rare/most common?
latitude & longitude etc. or you can use another dataset and study area characteristics vs. specific species occurrences.)
jupyter nbconvert
--no-input to turn off code cells.
pretty-jupyter package & --template pj to make it better looking
Your task:
Quarto + VSCode on your computer. Install the Quarto extension as well.
quarto document (.qmd)
author, title, date
ctrl + shift + k)
format to revealjs in the report header (yaml). Rerender the document.
h2 sections become slides, h1 sections become section separators.
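
The command-line rendering steps from both tasks can also be scripted. The sketch below is a minimal, non-authoritative example: the helper names `nbconvert_cmd` and `quarto_render_cmd` are hypothetical, while the flags (`--to`, `--no-input`, `--execute`, `--template pj`) are the ones named in the slides. It only builds and prints the commands; actually running them assumes `jupyter`, `pretty-jupyter` and `quarto` are installed.

```python
import shlex


def nbconvert_cmd(notebook, to="html", hide_code=True, execute=True, template=None):
    """Build the argument list for `jupyter nbconvert` (hypothetical helper)."""
    cmd = ["jupyter", "nbconvert", "--to", to, str(notebook)]
    if hide_code:
        cmd.append("--no-input")  # leave code cells out of the rendered report
    if execute:
        cmd.append("--execute")  # re-run all cells first, for reproducibility
    if template:
        cmd += ["--template", template]  # e.g. "pj" when pretty-jupyter is installed
    return cmd


def quarto_render_cmd(document, to="html"):
    """Build the argument list for `quarto render` (hypothetical helper)."""
    return ["quarto", "render", str(document), "--to", to]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch works
    # even on a machine without Jupyter or Quarto.
    print(shlex.join(nbconvert_cmd("path/to/notebook.ipynb", template="pj")))
    print(shlex.join(quarto_render_cmd("path/to/notebook.qmd")))
```

To execute a command rather than print it, the returned list can be passed to `subprocess.run(cmd, check=True)`. Keeping the arguments as a list avoids shell-quoting problems with paths that contain spaces.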