<li><a href="/2024/02/19/extracting-data-from-a-small-csv-file-with-haskell">Extracting data from a small CSV file with Haskell</a></li>
89
-
<li>Extracting data from a small CSV file with Python</li>
89
+
<li><a href="/2024/03/18/extracting-data-from-a-small-csv-file-with-python">Extracting data from a small CSV file with Python</a></li>
90
90
</ul>
91
91
<p>
92
92
For this small task, I don't think that there's a clear winner. I still like my Haskell code the best, but I'm sure someone better at Python could write a much cleaner script. I also have to admit that <a href="https://matplotlib.org/">Matplotlib</a> makes it a breeze to produce nice-looking plots with Python, whereas I don't even know where to start with that with Haskell.
_posts/2024-03-18-extracting-data-from-a-small-csv-file-with-python.html
+5-5
@@ -2,7 +2,7 @@
2
2
layout: post
3
3
title: "Extracting data from a small CSV file with Python"
4
4
description: "My inept adventures with a dynamically typed language."
5
-
date: 2024-02-02 20:29 UTC
5
+
date: 2024-03-18 08:36 UTC
6
6
tags: [Languages]
7
7
image: "/content/binary/sum-pmf-plot.png"
8
8
image_alt: "Bar chart of the sum-of-grades PMF."
@@ -14,7 +14,7 @@
14
14
<em>{{ page.description }}</em>
15
15
</p>
16
16
<p>
17
-
This article is the third in <a href="">a small series about ad-hoc programming in two languages</a>. In <a href="">the previous article</a> you saw how I originally solved a small data extraction and analysis problem with <a href="https://www.haskell.org/">Haskell</a>, even though it was strongly implied that <a href="https://www.python.org/">Python</a> was the language for the job.
17
+
This article is the third in <a href="/2024/02/05/statically-and-dynamically-typed-scripts">a small series about ad-hoc programming in two languages</a>. In <a href="/2024/02/19/extracting-data-from-a-small-csv-file-with-haskell">the previous article</a> you saw how I originally solved a small data extraction and analysis problem with <a href="https://www.haskell.org/">Haskell</a>, even though it was strongly implied that <a href="https://www.python.org/">Python</a> was the language for the job.
18
18
</p>
19
19
<p>
20
20
Months after having solved the problem I'd learned a bit more Python, so I decided to return to it and do it again in Python as an exercise. In this article, I'll briefly describe what I did.
In other Python code that I've written, I've been a heavy user of <a href="https://numpy.org/">NumPy</a>, and while I several times added it to my imports, I never needed it for this task. That was a bit surprising, but I've only done Python programming for a year, and I still don't have a good feel for the ecosystem.
43
+
In other Python code that I've written, I've been a heavy user of <a href="https://numpy.org/">NumPy</a>, and while I several times added it to my imports, I never needed it for this task. That was a bit surprising, but I've only done Python programming for a year, and I still don't have a good feel for the ecosystem.
44
44
</p>
45
45
<p>
46
46
The above code snippet also demonstrates how easy it is to slice a <em>dataframe</em> into columns: <code>grades</code> contains all the values in the (zero-indexed) second column, and <code>experiences</code> likewise the third column.
Notice that <code>combinations</code> doesn't list <code>('o', 'f')</code>, since (apparently) it doesn't consider ordering important. That's more in line with the <a href="https://en.wikipedia.org/wiki/Binomial_coefficient">binomial coefficient</a>, whereas <a href="">my Haskell code</a> considers a tuple like <code>('f', 'o')</code> to be distinct from <code>('o', 'f')</code>. This is completely consistent with how Haskell works, but means that all the counts I arrived at with Haskell are double what they are in this article. Ultimately, <em>6/1406</em> is equal to <em>3/703</em>, so the probabilities are the same. I'll try to call out this factor-of-two difference whenever it occurs.
59
+
Notice that <code>combinations</code> doesn't list <code>('o', 'f')</code>, since (apparently) it doesn't consider ordering important. That's more in line with the <a href="https://en.wikipedia.org/wiki/Binomial_coefficient">binomial coefficient</a>, whereas <a href="/2024/02/19/extracting-data-from-a-small-csv-file-with-haskell">my Haskell code</a> considers a tuple like <code>('f', 'o')</code> to be distinct from <code>('o', 'f')</code>. This is completely consistent with how Haskell works, but means that all the counts I arrived at with Haskell are double what they are in this article. Ultimately, <em>6/1406</em> is equal to <em>3/703</em>, so the probabilities are the same. I'll try to call out this factor-of-two difference whenever it occurs.
60
60
</p>
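The factor-of-two difference described above can be reproduced with the standard library alone. The following is only a minimal sketch, not the code from the post: the input string "foo" and the row count n = 38 are assumptions made for illustration (38 is simply the n for which the number of unordered pairs is 703 and the number of ordered pairs is 1406).

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import comb, perm

# Unordered pairs: each pair of positions is listed once.
unordered = list(combinations("foo", 2))
# Ordered pairs: each pair of positions is listed in both orders.
ordered = list(permutations("foo", 2))

print(unordered)
print(ordered)

# The ordered listing is exactly twice as long as the unordered one.
assert len(ordered) == 2 * len(unordered)

# Assumption: n = 38, inferred only from 703 = C(38, 2) and 1406 = 38 * 37.
n = 38
pairs_unordered = comb(n, 2)  # number of unordered pairs
pairs_ordered = perm(n, 2)    # number of ordered pairs

# Doubling both the count and the total leaves the probability unchanged.
assert Fraction(3, pairs_unordered) == Fraction(6, pairs_ordered)
```

Because both the numerator and the denominator double when ordering matters, either way of counting yields the same probabilities.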
61
61
<p>
62
62
A <code>Counter</code> object counts the number of occurrences of each value, so reading, picking combinations without replacement and adding them together is just two lines of code, and one more to print them:
The bar chart has the same style as before, but obviously displays different data. See the bar chart in the <a href="">previous article</a> for the Excel-based rendition of that data.
164
+
The bar chart has the same style as before, but obviously displays different data. See the bar chart in the <a href="/2024/02/19/extracting-data-from-a-small-csv-file-with-haskell">previous article</a> for the Excel-based rendition of that data.