CHANGELOG.rst (+4 -3)
@@ -3,9 +3,9 @@
 **BACKWARDS-INCOMPATIBLE CHANGES:**

-* :doc:`/scripts/csvclean` now writes its output to standard output and its errors to standard error, instead of to ``basename_out.csv`` and ``basename_err.csv`` files. Consequently, it no longer supports a :code:`--dry-run` flag to output summary information like ``No errors.``, ``42 errors logged to basename_err.csv`` or ``42 rows were joined/reduced to 24 rows after eliminating expected internal line breaks.``.
-* :doc:`/scripts/csvclean` no longer fixes errors by default. Opt in using the :code:`--join-short-rows` option.
-* :doc:`/scripts/csvclean` joins short rows using a newline by default, instead of a space.
+* :doc:`/scripts/csvclean` now writes its output to standard output and its errors to standard error, instead of to ``basename_out.csv`` and ``basename_err.csv`` files. Consequently, it no longer supports a :code:`--dry-run` option to output summary information like ``No errors.``, ``42 errors logged to basename_err.csv`` or ``42 rows were joined/reduced to 24 rows after eliminating expected internal line breaks.``.
+* :doc:`/scripts/csvclean` no longer fixes errors by default. Opt in to the original behavior using the :code:`--join-short-rows` option.
+* :doc:`/scripts/csvclean` joins short rows using a newline by default, instead of a space. Restore the original behavior using the :code:`--separator " "` option.

 Other changes:
@@ -15,6 +15,7 @@ Other changes:
 * :code:`--separator`, to change the string with which to join short rows
 * :code:`--fill-short-rows`, to fill short rows with the missing cells
 * :code:`--fillvalue`, to change the value with which to fill short rows
+* :code:`--empty-columns`, to error on empty columns

 * feat: The :code:`--quoting` option accepts 4 (`csv.QUOTE_STRINGS <https://docs.python.org/3/library/csv.html#csv.QUOTE_STRINGS>`__) and 5 (`csv.QUOTE_NOTNULL <https://docs.python.org/3/library/csv.html#csv.QUOTE_NOTNULL>`__) on Python 3.12.
 * feat: :doc:`/scripts/csvformat`: The :code:`--out-quoting` option accepts 4 (`csv.QUOTE_STRINGS <https://docs.python.org/3/library/csv.html#csv.QUOTE_STRINGS>`__) and 5 (`csv.QUOTE_NOTNULL <https://docs.python.org/3/library/csv.html#csv.QUOTE_NOTNULL>`__) on Python 3.12.
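
The integers accepted by the quoting options correspond to the ``QUOTE_*`` constants of Python's ``csv`` module. A minimal sketch of that mapping follows; the helper name ``quoting_name`` is hypothetical and not part of csvkit. It inspects whichever constants the running interpreter defines, so 4 and 5 resolve only on Python 3.12 or later.

```python
import csv


def quoting_name(value):
    """Return the csv.QUOTE_* constant name for an integer, or None if undefined.

    Hypothetical helper for illustration only; not part of csvkit.
    """
    # Build the mapping from the constants this interpreter actually provides,
    # so QUOTE_STRINGS (4) and QUOTE_NOTNULL (5) appear only on Python 3.12+.
    names = {
        getattr(csv, name): name
        for name in dir(csv)
        if name.startswith("QUOTE_")
    }
    return names.get(value)


if __name__ == "__main__":
    for value in range(6):
        print(value, quoting_name(value))
```

On interpreters older than Python 3.12, the lookup returns ``None`` for 4 and 5, which mirrors why the changelog entries are qualified with "on Python 3.12".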
docs/scripts/csvclean.rst (+27 -2)
@@ -7,7 +7,8 @@ Description
 Cleans a CSV file of common syntax errors:

-- reports rows that have a different number of columns than the header row
+- Reports rows that have a different number of columns than the header row.
+- Reports columns that are empty, if the :code:`--empty-columns` option is set.
 - If a CSV has unquoted cells that contain line breaks, like:

 .. code-block:: none
@@ -103,13 +104,14 @@ All valid rows are written to standard output, and all error rows along with lin
 --fillvalue FILLVALUE
 The value with which to fill short rows. Defaults to
 none.
+--empty-columns Report empty columns as errors.

 See also: :doc:`../common_arguments`.

 Examples
 ========

-Test a file with known bad rows:
+Test a file with data rows that are shorter and longer than the header row:

 .. code-block:: console
@@ -125,6 +127,29 @@ Test a file with known bad rows:
 If any data rows are longer than the header row, you need to add columns manually: for example, by adding one or more delimiters (``,``) to the end of the header row. :code:`csvclean` can't do this, because it is designed to work with standard input, and correcting an error at the start of the CSV data based on an observation later in the CSV data would require holding all the CSV data in memory – which is not an option for large files.
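
The manual fix described above (adding trailing delimiters to the header row) can be sketched in Python. This is only an illustration under stated assumptions: ``pad_header`` is a hypothetical helper, not part of csvkit, and it deliberately reads the whole input into memory, which is the cost that :code:`csvclean` avoids.

```python
import csv
import io


def pad_header(text, delimiter=","):
    """Return CSV text whose header row is padded to the widest row.

    Hypothetical helper for illustration only; not part of csvkit.
    """
    # First pass: load every row to find the widest one. The header can only
    # be corrected after all later rows have been seen.
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return text
    width = max(len(row) for row in rows)

    header, *data = rows
    # Appending empty cells is equivalent to adding trailing delimiters.
    header = header + [""] * (width - len(header))

    # Second pass: write the padded header followed by the unchanged data rows.
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(data)
    return out.getvalue()


if __name__ == "__main__":
    print(pad_header("a,b\n1,2,3\n4\n"), end="")
```

Short data rows are left untouched here; only the header is widened, so the output can then be passed through :code:`csvclean` for the remaining checks.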