# Welcome! {.unnumbered}
## How using a few ideas from software engineering can help data scientists, analysts and researchers write reliable code
*This is the Python edition of the book "Building reproducible analytical
pipelines", and it's a work in progress. If you're looking for the R version,
visit [this link](https://www.raps-with-r.dev).*
::: {.content-hidden when-format="pdf"}
<figure>
<a href = "https://leanpub.com/raps-with-python">
<img style="float: right; clear: right" height="200" src="https://d2sofvawe08yqg.cloudfront.net/raps-with-r/s_hero2x?1676902653" alt="You can buy a copy on Leanpub."></img>
</a>
</figure>
:::
Data scientists, statisticians, analysts, researchers, and many other
professionals write a lot of code.

They also read and review a lot of code. They either work in teams and need to
review each other’s code, or need to reproduce results from past projects, be
it for peer review or auditing purposes. And yet they are rarely, if ever,
taught the tools and techniques that make writing, collaborating on, reviewing
and reproducing projects manageable.

This is truly unfortunate, because software engineers face the same challenges
and solved them decades ago.
The aim of this book is to teach you how to use some of the best practices from
software engineering and DevOps to make your projects robust, reliable and
reproducible. It doesn’t matter if you work alone, in a small team or in a big
one. It doesn’t matter if your work gets (peer-)reviewed or audited: the
techniques presented in this book will make your projects more reliable and save
you a lot of frustration!
As someone whose primary job is analysing data, you might think that you are not
a developer. It seems as if developers are these genius types that write
extremely high-quality code and create these super useful packages. The truth is
that you are a developer as well. It’s just that your focus is on writing code
for your own purposes, to get your analyses going, instead of writing code for
others. Or at least, that’s what you think. Because "others" includes your
teammates. It includes reviewers and auditors. It includes anyone who will read
your code, and there will be people who read your code. At the very least,
future you will read your code. By learning how to set up projects and write
code in a way that future you will understand and not want to murder you, you
will naturally improve the quality of your work.
The book can be read for free on
[https://b-rodrigues.github.io/raps_with_py/](https://b-rodrigues.github.io/raps_with_py/)
and you'll be able to buy a DRM-free Epub or PDF on
[Leanpub](https://leanpub.com/)^[https://leanpub.com/] once there's more
content.
This is the *Python edition* of my book titled [Building reproducible analytical
pipelines with R](https://raps-with-r.dev)^[https://raps-with-r.dev]. This means
that a lot of the text is copied over, but all of the code and concepts are
fully adapted to the Python programming language. This book is also
shorter than the R version. Here are the topics that I will cover:
- Dependency management with `pipenv`;
- Some thoughts on functional programming with Python;
- Unit and assertive testing;
- Build automation with `ploomber`;
- Literate programming with Quarto;
- Reproducible environments with Docker;
- Continuous integration and delivery.
While this is not a book for beginners (you really should be familiar with
Python before reading this), I will not assume that you have any knowledge of
the tools discussed. But be warned, this book will require you to take the time
to read it, and then type on your computer. Type *a lot*.
I hope that you will enjoy reading this book and applying the ideas in your
day-to-day work, ideas that should, hopefully, improve the reliability,
traceability and reproducibility of your code. You can read this book for free on
[https://b-rodrigues.github.io/raps_with_py/](https://b-rodrigues.github.io/raps_with_py/).
::: {.content-hidden when-format="pdf"}
You can also buy a physical copy of the R edition of this book on [Amazon](https://www.amazon.com/dp/B0C87H6MGF/).
:::
If you want to get to know me better, read my
[bio](https://www.brodrigues.co/about/me/)^[https://www.brodrigues.co/about/me/].
::: {.content-hidden when-format="pdf"}
You'll also be able to buy a physical copy of the book on Amazon once it's done. In the
meantime, you could buy the [R edition](https://www.amazon.com/dp/B0C87H6MGF/).
If you find this book useful, don’t hesitate to let me know or leave a rating on
Amazon!
:::
You can submit issues and PRs, and ask questions, on the book’s
[GitHub repository](https://github.com/b-rodrigues/raps_with_py)^[https://github.com/b-rodrigues/raps_with_py].