account_invoice_import_simple_pdf/README.rst
+2-13
@@ -79,10 +79,9 @@ The module supports 5 different extraction methods:
79
79
1. `PyMuPDF <https://github.com/pymupdf/PyMuPDF>`_ which is a Python binding for `MuPDF <https://mupdf.com/>`_, a lightweight PDF toolkit/viewer/renderer published under the AGPL licence by the company `Artifex Software <https://artifex.com/>`_.
80
80
#. `pdftotext python library <https://pypi.org/project/pdftotext/>`_, which is a python binding for the pdftotext tool.
81
81
#. `pdftotext command line tool <https://en.wikipedia.org/wiki/Pdftotext>`_, which is based on `poppler <https://poppler.freedesktop.org/>`_, a PDF rendering library used by `xpdf <https://www.xpdfreader.com/>`_ and `Evince <https://wiki.gnome.org/Apps/Evince/FrequentlyAskedQuestions>`_ (the PDF reader of `Gnome <https://www.gnome.org/>`_).
82
-
#. `pdfplumber <https://pypi.org/project/pdfplumber/>`_, which is a python library built on top the of the python library `pdfminer.six <https://pypi.org/project/pdfminer.six/>`_. pdfplumber is a pure-python solution, so it's very easy to install on all OSes.
83
82
#. `pypdf <https://github.com/py-pdf/pypdf/>`_, which is one of the most common PDF libs for Python. pypdf is a pure-python solution, so it's very easy to install on all OSes.
84
83
85
-
PyMuPDF and pdftotext both give a very good text output. So far, I can't say which one is best. pdfplumber and pypdf often give lower-quality text output, but their advantage is that they are pure-Python librairies, so you will always be able to install it whatever your technical environnement is.
84
+
PyMuPDF and pdftotext both give very good text output. So far, I can't say which one is best. pypdf often gives lower-quality text output, but its advantage is that it is a pure-Python library, so you will always be able to install it, whatever your technical environment is.
86
85
87
86
You can choose one extraction method and only install the tools/libs for that method.
88
87
@@ -123,15 +122,6 @@ To install **pdftotext command line**, run:
123
122
124
123
sudo apt install poppler-utils
125
124
126
-
Install pdfplumber
127
-
~~~~~~~~~~~~~~~~~~
128
-
129
-
To install the **pdfplumber** python lib, run:
130
-
131
-
.. code::
132
-
133
-
sudo pip3 install --upgrade pdfplumber
134
-
135
125
Install pypdf
136
126
~~~~~~~~~~~~~
137
127
@@ -172,7 +162,7 @@ To force regex to version 2022.3.2, run:
172
162
Configuration
173
163
=============
174
164
175
-
By default, for the PDF to text conversion, the module tries the different methods in the order mentionned in the INSTALL section: it will first try to use **PyMuPDF**; if it fails (for example because the lib is not properly installed), then it will try to use the **pdftotext python lib**, if that one also fails, it will try to use **pdftotext command line** and, if it also fails, it will eventually try **pdfplumber**. If none of the 4 methods work, Odoo will display an error message.
165
+
By default, for the PDF to text conversion, the module tries the different methods in the order mentioned in the INSTALL section: it will first try to use **PyMuPDF**; if that fails (for example because the lib is not properly installed), it will try the **pdftotext python lib**; if that one also fails, it will try the **pdftotext command line**; and if that also fails, it will finally try **pypdf**. If none of the 4 methods works, Odoo will display an error message.
176
166
177
167
If you want to force Odoo to use a specific text extraction method, go to the menu *Configuration > Technical > Parameters > System Parameters* and create a new System Parameter:
178
168
@@ -182,7 +172,6 @@ If you want to force Odoo to use a specific text extraction method, go to the me
182
172
1. pymupdf
183
173
#. pdftotext.lib
184
174
#. pdftotext.cmd
185
-
#. pdfplumber
186
175
#. pypdf
187
176
188
177
In this configuration, Odoo will only use the selected text extraction method and, if it fails, it will display an error message.
account_invoice_import_simple_pdf/readme/CONFIGURE.rst
+1-2
@@ -1,4 +1,4 @@
1
-
By default, for the PDF to text conversion, the module tries the different methods in the order mentionned in the INSTALL section: it will first try to use **PyMuPDF**; if it fails (for example because the lib is not properly installed), then it will try to use the **pdftotext python lib**, if that one also fails, it will try to use **pdftotext command line** and, if it also fails, it will eventually try **pdfplumber**. If none of the 4 methods work, Odoo will display an error message.
1
+
By default, for the PDF to text conversion, the module tries the different methods in the order mentioned in the INSTALL section: it will first try to use **PyMuPDF**; if that fails (for example because the lib is not properly installed), it will try the **pdftotext python lib**; if that one also fails, it will try the **pdftotext command line**; and if that also fails, it will finally try **pypdf**. If none of the 4 methods works, Odoo will display an error message.
2
2
3
3
If you want to force Odoo to use a specific text extraction method, go to the menu *Configuration > Technical > Parameters > System Parameters* and create a new System Parameter:
4
4
@@ -8,7 +8,6 @@ If you want to force Odoo to use a specific text extraction method, go to the me
8
8
1. pymupdf
9
9
#. pdftotext.lib
10
10
#. pdftotext.cmd
11
-
#. pdfplumber
12
11
#. pypdf
13
12
14
13
In this configuration, Odoo will only use the selected text extraction method and, if it fails, it will display an error message.
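The fallback behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the module's actual code: the function ``extract_text``, the exception ``ExtractionError`` and the ``extractors`` dictionary are hypothetical names, and each extractor is assumed to be a callable that takes the PDF bytes and returns text.

```python
class ExtractionError(Exception):
    """Hypothetical error raised when no method could extract text."""


# Order documented in the INSTALL section.
DEFAULT_ORDER = ["pymupdf", "pdftotext.lib", "pdftotext.cmd", "pypdf"]


def extract_text(pdf_bytes, extractors, forced_method=None):
    """Return (method_name, text) from the first method that succeeds.

    extractors: dict mapping a method name to a callable(pdf_bytes) -> str
    (an assumption made for this sketch).
    forced_method: if set, only that method is tried, with no fallback.
    """
    order = [forced_method] if forced_method else DEFAULT_ORDER
    errors = {}
    for method in order:
        extractor = extractors.get(method)
        if extractor is None:
            errors[method] = "not available"
            continue
        try:
            return method, extractor(pdf_bytes)
        except Exception as exc:  # e.g. the lib is not properly installed
            errors[method] = str(exc)
    raise ExtractionError("No extraction method worked: %s" % errors)
```

With no forced method, a failure of one extractor simply moves on to the next one in the list; with a forced method, the first failure ends in an error, which matches the two behaviours described in this section.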
account_invoice_import_simple_pdf/readme/INSTALL.rst
+7-23
@@ -5,10 +5,9 @@ The module supports 5 different extraction methods:
5
5
1. `PyMuPDF <https://github.com/pymupdf/PyMuPDF>`_ which is a Python binding for `MuPDF <https://mupdf.com/>`_, a lightweight PDF toolkit/viewer/renderer published under the AGPL licence by the company `Artifex Software <https://artifex.com/>`_.
6
6
#. `pdftotext python library <https://pypi.org/project/pdftotext/>`_, which is a python binding for the pdftotext tool.
7
7
#. `pdftotext command line tool <https://en.wikipedia.org/wiki/Pdftotext>`_, which is based on `poppler <https://poppler.freedesktop.org/>`_, a PDF rendering library used by `xpdf <https://www.xpdfreader.com/>`_ and `Evince <https://wiki.gnome.org/Apps/Evince/FrequentlyAskedQuestions>`_ (the PDF reader of `Gnome <https://www.gnome.org/>`_).
8
-
#. `pdfplumber <https://pypi.org/project/pdfplumber/>`_, which is a python library built on top the of the python library `pdfminer.six <https://pypi.org/project/pdfminer.six/>`_. pdfplumber is a pure-python solution, so it's very easy to install on all OSes.
9
8
#. `pypdf <https://github.com/py-pdf/pypdf/>`_, which is one of the most common PDF libs for Python. pypdf is a pure-python solution, so it's very easy to install on all OSes.
10
9
11
-
PyMuPDF and pdftotext both give a very good text output. So far, I can't say which one is best. pdfplumber and pypdf often give lower-quality text output, but their advantage is that they are pure-Python librairies, so you will always be able to install it whatever your technical environnement is.
10
+
PyMuPDF and pdftotext both give very good text output. So far, I can't say which one is best. pypdf often gives lower-quality text output, but its advantage is that it is a pure-Python library, so you will always be able to install it, whatever your technical environment is.
12
11
13
12
You can choose one extraction method and only install the tools/libs for that method.
14
13
@@ -19,7 +18,7 @@ Install it via pip:
19
18
20
19
.. code::
21
20
22
-
sudo pip3 install --upgrade pymupdf
21
+
pip3 install --upgrade pymupdf
23
22
24
23
Beware that *PyMuPDF* is not a pure-python library: it uses MuPDF, which is written in C. If a python wheel for your OS, CPU architecture and Python version is available on pypi (check the `list of PyMuPDF wheels <https://pypi.org/project/PyMuPDF/#files>`_ on pypi), it will install smoothly. Otherwise, the installation via pip will require MuPDF and all its development libs to compile the binding.
25
24
@@ -36,7 +35,7 @@ and then install the lib via pip:
36
35
37
36
.. code::
38
37
39
-
sudo pip3 install --upgrade pdftotext
38
+
pip3 install --upgrade pdftotext
40
39
41
40
On OSes other than Debian/Ubuntu, follow the instructions on the `project page <https://github.com/jalan/pdftotext>`_.
42
41
@@ -49,23 +48,14 @@ To install **pdftotext command line**, run:
49
48
50
49
sudo apt install poppler-utils
51
50
52
-
Install pdfplumber
53
-
~~~~~~~~~~~~~~~~~~
54
-
55
-
To install the **pdfplumber** python lib, run:
56
-
57
-
.. code::
58
-
59
-
sudo pip3 install --upgrade pdfplumber
60
-
61
51
Install pypdf
62
52
~~~~~~~~~~~~~
63
53
64
54
To install the **pypdf** python lib, run:
65
55
66
56
.. code::
67
57
68
-
sudo pip3 install --upgrade pypdf
58
+
pip3 install --upgrade pypdf
69
59
70
60
71
61
Other requirements
@@ -80,17 +70,11 @@ The dateparser lib depends itself on regex. So you can install these Python libr
80
70
81
71
.. code::
82
72
83
-
sudo pip3 install --upgrade dateparser
84
-
85
-
The dateparser lib is not compatible with all regex lib versions. As of September 2022, the `version requirement <https://github.com/scrapinghub/dateparser/blob/master/setup.py#L30>`_ declared by dateparser for regex is **!=2019.02.19, !=2021.8.27, <2022.3.15**. So the latest version of regex which is compatible with dateparser is **2022.3.2**. To know the version of regex installed in your environment, run:
86
-
87
-
88
-
.. code::
73
+
pip3 install --upgrade dateparser
89
74
90
-
sudo pip3 show regex
75
+
The dateparser lib is not compatible with all regex lib versions. As of February 2024, the `version requirement <https://github.com/scrapinghub/dateparser/blob/master/setup.py#L36>`_ declared by dateparser for regex is **!=2019.02.19, !=2021.8.27**. So the latest version of dateparser is currently compatible with the latest version of regex. To know the version of regex installed in your environment, run:
<li><a class="reference external" href="https://github.com/pymupdf/PyMuPDF">PyMuPDF</a> which is a Python binding for <a class="reference external" href="https://mupdf.com/">MuPDF</a>, a lightweight PDF toolkit/viewer/renderer published under the AGPL licence by the company <a class="reference external" href="https://artifex.com/">Artifex Software</a>.</li>
434
433
<li><a class="reference external" href="https://pypi.org/project/pdftotext/">pdftotext python library</a>, which is a python binding for the pdftotext tool.</li>
435
434
<li><a class="reference external" href="https://en.wikipedia.org/wiki/Pdftotext">pdftotext command line tool</a>, which is based on <a class="reference external" href="https://poppler.freedesktop.org/">poppler</a>, a PDF rendering library used by <a class="reference external" href="https://www.xpdfreader.com/">xpdf</a> and <a class="reference external" href="https://wiki.gnome.org/Apps/Evince/FrequentlyAskedQuestions">Evince</a> (the PDF reader of <a class="reference external" href="https://www.gnome.org/">Gnome</a>).</li>
436
-
<li><a class="reference external" href="https://pypi.org/project/pdfplumber/">pdfplumber</a>, which is a python library built on top the of the python library <a class="reference external" href="https://pypi.org/project/pdfminer.six/">pdfminer.six</a>. pdfplumber is a pure-python solution, so it’s very easy to install on all OSes.</li>
437
435
<li><a class="reference external" href="https://github.com/py-pdf/pypdf/">pypdf</a>, which is one of the most common PDF libs for Python. pypdf is a pure-python solution, so it’s very easy to install on all OSes.</li>
438
436
</ol>
439
-
<p>PyMuPDF and pdftotext both give a very good text output. So far, I can’t say which one is best. pdfplumber and pypdf often give lower-quality text output, but their advantage is that they are pure-Python librairies, so you will always be able to install it whatever your technical environnement is.</p>
437
+
<p>PyMuPDF and pdftotext both give very good text output. So far, I can’t say which one is best. pypdf often gives lower-quality text output, but its advantage is that it is a pure-Python library, so you will always be able to install it, whatever your technical environment is.</p>
440
438
<p>You can choose one extraction method and only install the tools/libs for that method.</p>
<p>This module also requires the following Python libraries:</p>
485
476
<ul class="simple">
486
477
<li><a class="reference external" href="https://pypi.org/project/regex/">regex</a> which is backward-compatible with the <em>re</em> module of the Python standard library, but has additional functionalities.</li>
<p>By default, for the PDF to text conversion, the module tries the different methods in the order mentionned in the INSTALL section: it will first try to use <strong>PyMuPDF</strong>; if it fails (for example because the lib is not properly installed), then it will try to use the <strong>pdftotext python lib</strong>, if that one also fails, it will try to use <strong>pdftotext command line</strong> and, if it also fails, it will eventually try <strong>pdfplumber</strong>. If none of the 4 methods work, Odoo will display an error message.</p>
<p>By default, for the PDF to text conversion, the module tries the different methods in the order mentioned in the INSTALL section: it will first try to use <strong>PyMuPDF</strong>; if that fails (for example because the lib is not properly installed), it will try the <strong>pdftotext python lib</strong>; if that one also fails, it will try the <strong>pdftotext command line</strong>; and if that also fails, it will finally try <strong>pypdf</strong>. If none of the 4 methods works, Odoo will display an error message.</p>
506
497
<p>If you want to force Odoo to use a specific text extraction method, go to the menu <em>Configuration > Technical > Parameters > System Parameters</em> and create a new System Parameter:</p>
<p>You will find a full demonstration about how to configure each Vendor and import the PDF invoices in this <a class="reference external" href="https://www.youtube.com/watch?v=edsEuXVyEYE">screencast</a>.</p>