Commit 30c26b5

Merge PR OCA#934 into 14.0
Signed-off-by alexis-via
2 parents 023bed2 + 1fce0dc commit 30c26b5

8 files changed

+28
-96
lines changed

account_invoice_import_simple_pdf/README.rst

+2-13
@@ -79,10 +79,9 @@ The module supports 5 different extraction methods:
7979
1. `PyMuPDF <https://github.com/pymupdf/PyMuPDF>`_ which is a Python binding for `MuPDF <https://mupdf.com/>`_, a lightweight PDF toolkit/viewer/renderer published under the AGPL licence by the company `Artifex Software <https://artifex.com/>`_.
8080
#. `pdftotext python library <https://pypi.org/project/pdftotext/>`_, which is a python binding for the pdftotext tool.
8181
#. `pdftotext command line tool <https://en.wikipedia.org/wiki/Pdftotext>`_, which is based on `poppler <https://poppler.freedesktop.org/>`_, a PDF rendering library used by `xpdf <https://www.xpdfreader.com/>`_ and `Evince <https://wiki.gnome.org/Apps/Evince/FrequentlyAskedQuestions>`_ (the PDF reader of `Gnome <https://www.gnome.org/>`_).
82-
#. `pdfplumber <https://pypi.org/project/pdfplumber/>`_, which is a python library built on top the of the python library `pdfminer.six <https://pypi.org/project/pdfminer.six/>`_. pdfplumber is a pure-python solution, so it's very easy to install on all OSes.
8382
#. `pypdf <https://github.com/py-pdf/pypdf/>`_, which is one of the most common PDF lib for Python. pypdf is a pure-python solution, so it's very easy to install on all OSes.
8483

85-
PyMuPDF and pdftotext both give a very good text output. So far, I can't say which one is best. pdfplumber and pypdf often give lower-quality text output, but their advantage is that they are pure-Python librairies, so you will always be able to install it whatever your technical environnement is.
84+
PyMuPDF and pdftotext both give a very good text output. So far, I can't say which one is best. pypdf often gives lower-quality text output, but its advantage is that it is a pure-Python librairy, so you will always be able to install it whatever your technical environnement is.
8685

8786
You can choose one extraction method and only install the tools/libs for that method.
8887

@@ -123,15 +122,6 @@ To install **pdftotext command line**, run:
123122
124123
sudo apt install poppler-utils
125124
126-
Install pdfplumber
127-
~~~~~~~~~~~~~~~~~~
128-
129-
To install the **pdfplumber** python lib, run:
130-
131-
.. code::
132-
133-
sudo pip3 install --upgrade pdfplumber
134-
135125
Install pypdf
136126
~~~~~~~~~~~~~
137127

@@ -172,7 +162,7 @@ To force regex to version 2022.3.2, run:
172162
Configuration
173163
=============
174164

175-
By default, for the PDF to text conversion, the module tries the different methods in the order mentionned in the INSTALL section: it will first try to use **PyMuPDF**; if it fails (for example because the lib is not properly installed), then it will try to use the **pdftotext python lib**, if that one also fails, it will try to use **pdftotext command line** and, if it also fails, it will eventually try **pdfplumber**. If none of the 4 methods work, Odoo will display an error message.
165+
By default, for the PDF to text conversion, the module tries the different methods in the order mentionned in the INSTALL section: it will first try to use **PyMuPDF**; if it fails (for example because the lib is not properly installed), then it will try to use the **pdftotext python lib**, if that one also fails, it will try to use **pdftotext command line** and, if it also fails, it will eventually try **pypdf**. If none of the 4 methods work, Odoo will display an error message.
176166

177167
If you want to force Odoo to use a specific text extraction method, go to the menu *Configuration > Technical > Parameters > System Parameters* and create a new System Parameter:
178168

@@ -182,7 +172,6 @@ If you want to force Odoo to use a specific text extraction method, go to the me
182172
1. pymupdf
183173
#. pdftotext.lib
184174
#. pdftotext.cmd
185-
#. pdfplumber
186175
#. pypdf
187176

188177
In this configuration, Odoo will only use the selected text extraction method and, if it fails, it will display an error message.
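
Reviewer note: the fallback order described in the Configuration text above can be sketched as follows. This is a minimal illustration, not the module's actual code: `extract_text`, `ExtractionError` and the `extractors` mapping are hypothetical names, and only the order of the method keys is taken from the README.

```python
from typing import Callable, Dict, Optional

# Order described in the README after this commit; "pdfplumber" is no longer listed.
DEFAULT_ORDER = ["pymupdf", "pdftotext.lib", "pdftotext.cmd", "pypdf"]


class ExtractionError(Exception):
    """Raised when no method produced text (hypothetical name)."""


def extract_text(
    pdf_bytes: bytes,
    extractors: Dict[str, Callable[[bytes], str]],
    forced_method: Optional[str] = None,
) -> str:
    """Try each method in order, or only the forced one when it is set."""
    methods = [forced_method] if forced_method else DEFAULT_ORDER
    errors = []
    for name in methods:
        extractor = extractors.get(name)
        if extractor is None:
            # The tool or lib for this method is not installed.
            errors.append(f"{name}: not available")
            continue
        try:
            return extractor(pdf_bytes)
        except Exception as exc:
            # Any failure moves on to the next method in the list.
            errors.append(f"{name}: {exc}")
    raise ExtractionError("; ".join(errors))
```

With a forced method the list has a single entry, so a failure goes straight to the error, which matches the behaviour the README describes for the system parameter.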

account_invoice_import_simple_pdf/__manifest__.py

-1
@@ -14,7 +14,6 @@
1414
"depends": ["account_invoice_import"],
1515
"external_dependencies": {
1616
"python": [
17-
"pdfplumber",
1817
"regex",
1918
"dateparser",
2019
"pypdf>=3.1.0",

account_invoice_import_simple_pdf/readme/CONFIGURE.rst

+1-2
@@ -1,4 +1,4 @@
1-
By default, for the PDF to text conversion, the module tries the different methods in the order mentionned in the INSTALL section: it will first try to use **PyMuPDF**; if it fails (for example because the lib is not properly installed), then it will try to use the **pdftotext python lib**, if that one also fails, it will try to use **pdftotext command line** and, if it also fails, it will eventually try **pdfplumber**. If none of the 4 methods work, Odoo will display an error message.
1+
By default, for the PDF to text conversion, the module tries the different methods in the order mentionned in the INSTALL section: it will first try to use **PyMuPDF**; if it fails (for example because the lib is not properly installed), then it will try to use the **pdftotext python lib**, if that one also fails, it will try to use **pdftotext command line** and, if it also fails, it will eventually try **pypdf**. If none of the 4 methods work, Odoo will display an error message.
22

33
If you want to force Odoo to use a specific text extraction method, go to the menu *Configuration > Technical > Parameters > System Parameters* and create a new System Parameter:
44

@@ -8,7 +8,6 @@ If you want to force Odoo to use a specific text extraction method, go to the me
88
1. pymupdf
99
#. pdftotext.lib
1010
#. pdftotext.cmd
11-
#. pdfplumber
1211
#. pypdf
1312

1413
In this configuration, Odoo will only use the selected text extraction method and, if it fails, it will display an error message.
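
Reviewer note: since `pdfplumber` is no longer an accepted value, a small check before writing the system parameter may help catch stale configurations. The sketch below is an assumption-laden illustration: the key is the one shown in the generated index.html of this commit, `check_pdf2txt_value` is a hypothetical helper, and stripping whitespace is my own choice rather than documented module behaviour.

```python
# Key as shown in the generated index.html of this commit.
PARAM_KEY = "invoice_import_simple_pdf.pdf2txt"

# Values listed in CONFIGURE.rst after this commit ("pdfplumber" was removed).
ALLOWED_VALUES = ("pymupdf", "pdftotext.lib", "pdftotext.cmd", "pypdf")


def check_pdf2txt_value(value: str) -> str:
    """Return the stripped value, or raise ValueError if it is not listed."""
    cleaned = value.strip()
    if cleaned not in ALLOWED_VALUES:
        raise ValueError(
            f"{cleaned!r} is not one of: {', '.join(ALLOWED_VALUES)}"
        )
    return cleaned


# In an Odoo shell, where `env` is provided by Odoo, the value could be stored with:
#   env["ir.config_parameter"].sudo().set_param(
#       PARAM_KEY, check_pdf2txt_value("pypdf"))
```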

account_invoice_import_simple_pdf/readme/INSTALL.rst

+7-23
@@ -5,10 +5,9 @@ The module supports 5 different extraction methods:
55
1. `PyMuPDF <https://github.com/pymupdf/PyMuPDF>`_ which is a Python binding for `MuPDF <https://mupdf.com/>`_, a lightweight PDF toolkit/viewer/renderer published under the AGPL licence by the company `Artifex Software <https://artifex.com/>`_.
66
#. `pdftotext python library <https://pypi.org/project/pdftotext/>`_, which is a python binding for the pdftotext tool.
77
#. `pdftotext command line tool <https://en.wikipedia.org/wiki/Pdftotext>`_, which is based on `poppler <https://poppler.freedesktop.org/>`_, a PDF rendering library used by `xpdf <https://www.xpdfreader.com/>`_ and `Evince <https://wiki.gnome.org/Apps/Evince/FrequentlyAskedQuestions>`_ (the PDF reader of `Gnome <https://www.gnome.org/>`_).
8-
#. `pdfplumber <https://pypi.org/project/pdfplumber/>`_, which is a python library built on top the of the python library `pdfminer.six <https://pypi.org/project/pdfminer.six/>`_. pdfplumber is a pure-python solution, so it's very easy to install on all OSes.
98
#. `pypdf <https://github.com/py-pdf/pypdf/>`_, which is one of the most common PDF lib for Python. pypdf is a pure-python solution, so it's very easy to install on all OSes.
109

11-
PyMuPDF and pdftotext both give a very good text output. So far, I can't say which one is best. pdfplumber and pypdf often give lower-quality text output, but their advantage is that they are pure-Python librairies, so you will always be able to install it whatever your technical environnement is.
10+
PyMuPDF and pdftotext both give a very good text output. So far, I can't say which one is best. pypdf often gives lower-quality text output, but its advantage is that it is a pure-Python librairy, so you will always be able to install it whatever your technical environnement is.
1211

1312
You can choose one extraction method and only install the tools/libs for that method.
1413

@@ -19,7 +18,7 @@ Install it via pip:
1918

2019
.. code::
2120
22-
sudo pip3 install --upgrade pymupdf
21+
pip3 install --upgrade pymupdf
2322
2423
Beware that *PyMuPDF* is not a pure-python library: it uses MuPDF, which is written in C language. If a python wheel for your OS, CPU architecture and Python version is available on pypi (check the `list of PyMuPDF wheels <https://pypi.org/project/PyMuPDF/#files>`_ on pypi), it will install smoothly. Otherwize, the installation via pip will require MuPDF and all its development libs to compile the binding.
2524

@@ -36,7 +35,7 @@ and then install the lib via pip:
3635

3736
.. code::
3837
39-
sudo pip3 install --upgrade pdftotext
38+
pip3 install --upgrade pdftotext
4039
4140
On OSes other than Debian/Ubuntu, follow the instructions on the `project page <https://github.com/jalan/pdftotext>`_.
4241

@@ -49,23 +48,14 @@ To install **pdftotext command line**, run:
4948
5049
sudo apt install poppler-utils
5150
52-
Install pdfplumber
53-
~~~~~~~~~~~~~~~~~~
54-
55-
To install the **pdfplumber** python lib, run:
56-
57-
.. code::
58-
59-
sudo pip3 install --upgrade pdfplumber
60-
6151
Install pypdf
6252
~~~~~~~~~~~~~
6353

6454
To install the **pypdf** python lib, run:
6555

6656
.. code::
6757
68-
sudo pip3 install --upgrade pypdf
58+
pip3 install --upgrade pypdf
6959
7060
7161
Other requirements
@@ -80,17 +70,11 @@ The dateparser lib depends itself on regex. So you can install these Python libr
8070

8171
.. code::
8272
83-
sudo pip3 install --upgrade dateparser
84-
85-
The dateparser lib is not compatible with all regex lib versions. As of September 2022, the `version requirement <https://github.com/scrapinghub/dateparser/blob/master/setup.py#L30>`_ declared by dateparser for regex is **!=2019.02.19, !=2021.8.27, <2022.3.15**. So the latest version of regex which is compatible with dateparser is **2022.3.2**. To know the version of regex installed in your environment, run:
86-
87-
88-
.. code::
73+
pip3 install --upgrade dateparser
8974
90-
sudo pip3 show regex
75+
The dateparser lib is not compatible with all regex lib versions. As of February 2024, the `version requirement <https://github.com/scrapinghub/dateparser/blob/master/setup.py#L36>`_ declared by dateparser for regex is **!=2019.02.19, !=2021.8.27**. So the latest version of dateparser is currenly compatible with the latest version of regex. To know the version of regex installed in your environment, run:
9176

92-
To force regex to version 2022.3.2, run:
9377

9478
.. code::
9579
96-
sudo pip3 install regex==2022.3.2
80+
pip3 show regex
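
Reviewer note: the same check can be done from Python instead of `pip3 show regex`. This is a rough sketch, assuming the two exclusions quoted in the updated INSTALL.rst are the only constraints; `regex_version_is_allowed` and `installed_regex_version` are hypothetical helper names, and the parser assumes purely numeric, dot-separated versions.

```python
from importlib import metadata
from typing import Optional, Tuple

# Exclusions quoted in the updated INSTALL.rst (as of February 2024).
EXCLUDED = {(2019, 2, 19), (2021, 8, 27)}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '2019.02.19' into (2019, 2, 19); assumes numeric parts only."""
    return tuple(int(part) for part in version.split("."))


def regex_version_is_allowed(version: str) -> bool:
    """True unless the version is one of the excluded releases."""
    return parse_version(version) not in EXCLUDED


def installed_regex_version() -> Optional[str]:
    """Return the installed regex version, or None if regex is not installed."""
    try:
        return metadata.version("regex")
    except metadata.PackageNotFoundError:
        return None
```

Comparing integer tuples rather than raw strings means `2019.02.19` and `2019.2.19` are treated as the same release.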

account_invoice_import_simple_pdf/static/description/index.html

+18-28
@@ -410,17 +410,16 @@ <h1 class="title">Account Invoice Import Simple PDF</h1>
410410
<li><a class="reference internal" href="#install-pymupdf" id="toc-entry-2">Install PyMuPDF</a></li>
411411
<li><a class="reference internal" href="#install-pdftotext-python-lib" id="toc-entry-3">Install pdftotext python lib</a></li>
412412
<li><a class="reference internal" href="#install-pdftotext-command-line" id="toc-entry-4">Install pdftotext command line</a></li>
413-
<li><a class="reference internal" href="#install-pdfplumber" id="toc-entry-5">Install pdfplumber</a></li>
414-
<li><a class="reference internal" href="#install-pypdf" id="toc-entry-6">Install pypdf</a></li>
415-
<li><a class="reference internal" href="#other-requirements" id="toc-entry-7">Other requirements</a></li>
413+
<li><a class="reference internal" href="#install-pypdf" id="toc-entry-5">Install pypdf</a></li>
414+
<li><a class="reference internal" href="#other-requirements" id="toc-entry-6">Other requirements</a></li>
416415
</ul>
417416
</li>
418-
<li><a class="reference internal" href="#configuration" id="toc-entry-8">Configuration</a></li>
419-
<li><a class="reference internal" href="#bug-tracker" id="toc-entry-9">Bug Tracker</a></li>
420-
<li><a class="reference internal" href="#credits" id="toc-entry-10">Credits</a><ul>
421-
<li><a class="reference internal" href="#authors" id="toc-entry-11">Authors</a></li>
422-
<li><a class="reference internal" href="#contributors" id="toc-entry-12">Contributors</a></li>
423-
<li><a class="reference internal" href="#maintainers" id="toc-entry-13">Maintainers</a></li>
417+
<li><a class="reference internal" href="#configuration" id="toc-entry-7">Configuration</a></li>
418+
<li><a class="reference internal" href="#bug-tracker" id="toc-entry-8">Bug Tracker</a></li>
419+
<li><a class="reference internal" href="#credits" id="toc-entry-9">Credits</a><ul>
420+
<li><a class="reference internal" href="#authors" id="toc-entry-10">Authors</a></li>
421+
<li><a class="reference internal" href="#contributors" id="toc-entry-11">Contributors</a></li>
422+
<li><a class="reference internal" href="#maintainers" id="toc-entry-12">Maintainers</a></li>
424423
</ul>
425424
</li>
426425
</ul>
@@ -433,10 +432,9 @@ <h1><a class="toc-backref" href="#toc-entry-1">Installation</a></h1>
433432
<li><a class="reference external" href="https://github.com/pymupdf/PyMuPDF">PyMuPDF</a> which is a Python binding for <a class="reference external" href="https://mupdf.com/">MuPDF</a>, a lightweight PDF toolkit/viewer/renderer published under the AGPL licence by the company <a class="reference external" href="https://artifex.com/">Artifex Software</a>.</li>
434433
<li><a class="reference external" href="https://pypi.org/project/pdftotext/">pdftotext python library</a>, which is a python binding for the pdftotext tool.</li>
435434
<li><a class="reference external" href="https://en.wikipedia.org/wiki/Pdftotext">pdftotext command line tool</a>, which is based on <a class="reference external" href="https://poppler.freedesktop.org/">poppler</a>, a PDF rendering library used by <a class="reference external" href="https://www.xpdfreader.com/">xpdf</a> and <a class="reference external" href="https://wiki.gnome.org/Apps/Evince/FrequentlyAskedQuestions">Evince</a> (the PDF reader of <a class="reference external" href="https://www.gnome.org/">Gnome</a>).</li>
436-
<li><a class="reference external" href="https://pypi.org/project/pdfplumber/">pdfplumber</a>, which is a python library built on top the of the python library <a class="reference external" href="https://pypi.org/project/pdfminer.six/">pdfminer.six</a>. pdfplumber is a pure-python solution, so it’s very easy to install on all OSes.</li>
437435
<li><a class="reference external" href="https://github.com/py-pdf/pypdf/">pypdf</a>, which is one of the most common PDF lib for Python. pypdf is a pure-python solution, so it’s very easy to install on all OSes.</li>
438436
</ol>
439-
<p>PyMuPDF and pdftotext both give a very good text output. So far, I can’t say which one is best. pdfplumber and pypdf often give lower-quality text output, but their advantage is that they are pure-Python librairies, so you will always be able to install it whatever your technical environnement is.</p>
437+
<p>PyMuPDF and pdftotext both give a very good text output. So far, I can’t say which one is best. pypdf often gives lower-quality text output, but its advantage is that it is a pure-Python librairy, so you will always be able to install it whatever your technical environnement is.</p>
440438
<p>You can choose one extraction method and only install the tools/libs for that method.</p>
441439
<div class="section" id="install-pymupdf">
442440
<h2><a class="toc-backref" href="#toc-entry-2">Install PyMuPDF</a></h2>
@@ -465,22 +463,15 @@ <h2><a class="toc-backref" href="#toc-entry-4">Install pdftotext command line</a
465463
sudo apt install poppler-utils
466464
</pre>
467465
</div>
468-
<div class="section" id="install-pdfplumber">
469-
<h2><a class="toc-backref" href="#toc-entry-5">Install pdfplumber</a></h2>
470-
<p>To install the <strong>pdfplumber</strong> python lib, run:</p>
471-
<pre class="code literal-block">
472-
sudo pip3 install --upgrade pdfplumber
473-
</pre>
474-
</div>
475466
<div class="section" id="install-pypdf">
476-
<h2><a class="toc-backref" href="#toc-entry-6">Install pypdf</a></h2>
467+
<h2><a class="toc-backref" href="#toc-entry-5">Install pypdf</a></h2>
477468
<p>To install the <strong>pypdf</strong> python lib, run:</p>
478469
<pre class="code literal-block">
479470
sudo pip3 install --upgrade pypdf
480471
</pre>
481472
</div>
482473
<div class="section" id="other-requirements">
483-
<h2><a class="toc-backref" href="#toc-entry-7">Other requirements</a></h2>
474+
<h2><a class="toc-backref" href="#toc-entry-6">Other requirements</a></h2>
484475
<p>This module also requires the following Python libraries:</p>
485476
<ul class="simple">
486477
<li><a class="reference external" href="https://pypi.org/project/regex/">regex</a> which is backward-compatible with the <em>re</em> module of the Python standard library, but has additional functionalities.</li>
@@ -501,16 +492,15 @@ <h2><a class="toc-backref" href="#toc-entry-7">Other requirements</a></h2>
501492
</div>
502493
</div>
503494
<div class="section" id="configuration">
504-
<h1><a class="toc-backref" href="#toc-entry-8">Configuration</a></h1>
505-
<p>By default, for the PDF to text conversion, the module tries the different methods in the order mentionned in the INSTALL section: it will first try to use <strong>PyMuPDF</strong>; if it fails (for example because the lib is not properly installed), then it will try to use the <strong>pdftotext python lib</strong>, if that one also fails, it will try to use <strong>pdftotext command line</strong> and, if it also fails, it will eventually try <strong>pdfplumber</strong>. If none of the 4 methods work, Odoo will display an error message.</p>
495+
<h1><a class="toc-backref" href="#toc-entry-7">Configuration</a></h1>
496+
<p>By default, for the PDF to text conversion, the module tries the different methods in the order mentionned in the INSTALL section: it will first try to use <strong>PyMuPDF</strong>; if it fails (for example because the lib is not properly installed), then it will try to use the <strong>pdftotext python lib</strong>, if that one also fails, it will try to use <strong>pdftotext command line</strong> and, if it also fails, it will eventually try <strong>pypdf</strong>. If none of the 4 methods work, Odoo will display an error message.</p>
506497
<p>If you want to force Odoo to use a specific text extraction method, go to the menu <em>Configuration &gt; Technical &gt; Parameters &gt; System Parameters</em> and create a new System Parameter:</p>
507498
<ul class="simple">
508499
<li><em>Key</em>: <strong>invoice_import_simple_pdf.pdf2txt</strong></li>
509500
<li><em>Value</em>: select the proper value for the method you want to use:<ol class="arabic">
510501
<li>pymupdf</li>
511502
<li>pdftotext.lib</li>
512503
<li>pdftotext.cmd</li>
513-
<li>pdfplumber</li>
514504
<li>pypdf</li>
515505
</ol>
516506
</li>
@@ -519,29 +509,29 @@ <h1><a class="toc-backref" href="#toc-entry-8">Configuration</a></h1>
519509
<p>You will find a full demonstration about how to configure each Vendor and import the PDF invoices in this <a class="reference external" href="https://www.youtube.com/watch?v=edsEuXVyEYE">screencast</a>.</p>
520510
</div>
521511
<div class="section" id="bug-tracker">
522-
<h1><a class="toc-backref" href="#toc-entry-9">Bug Tracker</a></h1>
512+
<h1><a class="toc-backref" href="#toc-entry-8">Bug Tracker</a></h1>
523513
<p>Bugs are tracked on <a class="reference external" href="https://github.com/OCA/edi/issues">GitHub Issues</a>.
524514
In case of trouble, please check there if your issue has already been reported.
525515
If you spotted it first, help us to smash it by providing a detailed and welcomed
526516
<a class="reference external" href="https://github.com/OCA/edi/issues/new?body=module:%20account_invoice_import_simple_pdf%0Aversion:%2014.0%0A%0A**Steps%20to%20reproduce**%0A-%20...%0A%0A**Current%20behavior**%0A%0A**Expected%20behavior**">feedback</a>.</p>
527517
<p>Do not contact contributors directly about support or help with technical issues.</p>
528518
</div>
529519
<div class="section" id="credits">
530-
<h1><a class="toc-backref" href="#toc-entry-10">Credits</a></h1>
520+
<h1><a class="toc-backref" href="#toc-entry-9">Credits</a></h1>
531521
<div class="section" id="authors">
532-
<h2><a class="toc-backref" href="#toc-entry-11">Authors</a></h2>
522+
<h2><a class="toc-backref" href="#toc-entry-10">Authors</a></h2>
533523
<ul class="simple">
534524
<li>Akretion</li>
535525
</ul>
536526
</div>
537527
<div class="section" id="contributors">
538-
<h2><a class="toc-backref" href="#toc-entry-12">Contributors</a></h2>
528+
<h2><a class="toc-backref" href="#toc-entry-11">Contributors</a></h2>
539529
<ul class="simple">
540530
<li>Alexis de Lattre &lt;<a class="reference external" href="mailto:alexis.delattre&#64;akretion.com">alexis.delattre&#64;akretion.com</a>&gt;</li>
541531
</ul>
542532
</div>
543533
<div class="section" id="maintainers">
544-
<h2><a class="toc-backref" href="#toc-entry-13">Maintainers</a></h2>
534+
<h2><a class="toc-backref" href="#toc-entry-12">Maintainers</a></h2>
545535
<p>This module is maintained by the OCA.</p>
546536
<a class="reference external image-reference" href="https://odoo-community.org"><img alt="Odoo Community Association" src="https://odoo-community.org/logo.png" /></a>
547537
<p>OCA, or the Odoo Community Association, is a nonprofit organization whose

account_invoice_import_simple_pdf/tests/test_invoice_import.py

-1
@@ -568,7 +568,6 @@ def _complete_import_specific_method(self, method):
568568
def test_specific_python_methods(self):
569569
# test only pure-pdf methods
570570
# because we are sure they work on the Github test environment
571-
self._complete_import_specific_method("pdfplumber")
572571
self._complete_import_specific_method("pypdf")
573572

574573
def test_test_mode(self):
