Merge PR OCA#934 into 14.0

OCA-git-bot · OCA-git-bot · commit 30c26b50a4ef · 2024-03-14T08:53:20.000Z
Signed-off-by alexis-via
diff --git a/account_invoice_import_simple_pdf/README.rst b/account_invoice_import_simple_pdf/README.rst
@@ -79,10 +79,9 @@ The module supports 5 different extraction methods:
 1. `PyMuPDF <https://github.com/pymupdf/PyMuPDF>`_ which is a Python binding for `MuPDF <https://mupdf.com/>`_, a lightweight PDF toolkit/viewer/renderer published under the AGPL licence by the company `Artifex Software <https://artifex.com/>`_.
 #. `pdftotext python library <https://pypi.org/project/pdftotext/>`_, which is a python binding for the pdftotext tool.
 #. `pdftotext command line tool <https://en.wikipedia.org/wiki/Pdftotext>`_, which is based on `poppler <https://poppler.freedesktop.org/>`_, a PDF rendering library used by `xpdf <https://www.xpdfreader.com/>`_ and `Evince <https://wiki.gnome.org/Apps/Evince/FrequentlyAskedQuestions>`_ (the PDF reader of `Gnome <https://www.gnome.org/>`_).
-#. `pdfplumber <https://pypi.org/project/pdfplumber/>`_, which is a python library built on top the of the python library `pdfminer.six <https://pypi.org/project/pdfminer.six/>`_. pdfplumber is a pure-python solution, so it's very easy to install on all OSes.
 #. `pypdf <https://github.com/py-pdf/pypdf/>`_, which is one of the most common PDF lib for Python. pypdf is a pure-python solution, so it's very easy to install on all OSes.
 
-PyMuPDF and pdftotext both give a very good text output. So far, I can't say which one is best. pdfplumber and pypdf often give lower-quality text output, but their advantage is that they are pure-Python librairies, so you will always be able to install it whatever your technical environnement is.
+PyMuPDF and pdftotext both give a very good text output. So far, I can't say which one is best. pypdf often gives lower-quality text output, but its advantage is that it is a pure-Python library, so you will always be able to install it whatever your technical environment is.
 
 You can choose one extraction method and only install the tools/libs for that method.
 
@@ -123,15 +122,6 @@ To install **pdftotext command line**, run:
 
   sudo apt install poppler-utils
 
-Install pdfplumber
-~~~~~~~~~~~~~~~~~~
-
-To install the **pdfplumber** python lib, run:
-
-.. code::
-
-  sudo pip3 install --upgrade pdfplumber
-
 Install pypdf
 ~~~~~~~~~~~~~
 
@@ -172,7 +162,7 @@ To force regex to version 2022.3.2, run:
 Configuration
 =============
 
-By default, for the PDF to text conversion, the module tries the different methods in the order mentionned in the INSTALL section: it will first try to use **PyMuPDF**; if it fails (for example because the lib is not properly installed), then it will try to use the **pdftotext python lib**, if that one also fails, it will try to use **pdftotext command line** and, if it also fails, it will eventually try **pdfplumber**. If none of the 4 methods work, Odoo will display an error message.
+By default, for the PDF to text conversion, the module tries the different methods in the order mentioned in the INSTALL section: it will first try to use **PyMuPDF**; if it fails (for example because the lib is not properly installed), then it will try to use the **pdftotext python lib**; if that one also fails, it will try to use **pdftotext command line** and, if it also fails, it will finally try **pypdf**. If none of the 4 methods work, Odoo will display an error message.
 
 If you want to force Odoo to use a specific text extraction method, go to the menu *Configuration > Technical > Parameters > System Parameters* and create a new System Parameter:
 
@@ -182,7 +172,6 @@ If you want to force Odoo to use a specific text extraction method, go to the me
   1. pymupdf
   #. pdftotext.lib
   #. pdftotext.cmd
-  #. pdfplumber
   #. pypdf
 
 In this configuration, Odoo will only use the selected text extraction method and, if it fails, it will display an error message.
diff --git a/account_invoice_import_simple_pdf/__manifest__.py b/account_invoice_import_simple_pdf/__manifest__.py
@@ -14,7 +14,6 @@
     "depends": ["account_invoice_import"],
     "external_dependencies": {
         "python": [
-            "pdfplumber",
             "regex",
             "dateparser",
             "pypdf>=3.1.0",
diff --git a/account_invoice_import_simple_pdf/readme/CONFIGURE.rst b/account_invoice_import_simple_pdf/readme/CONFIGURE.rst
@@ -1,4 +1,4 @@
-By default, for the PDF to text conversion, the module tries the different methods in the order mentionned in the INSTALL section: it will first try to use **PyMuPDF**; if it fails (for example because the lib is not properly installed), then it will try to use the **pdftotext python lib**, if that one also fails, it will try to use **pdftotext command line** and, if it also fails, it will eventually try **pdfplumber**. If none of the 4 methods work, Odoo will display an error message.
+By default, for the PDF to text conversion, the module tries the different methods in the order mentioned in the INSTALL section: it will first try to use **PyMuPDF**; if it fails (for example because the lib is not properly installed), then it will try to use the **pdftotext python lib**; if that one also fails, it will try to use **pdftotext command line** and, if it also fails, it will finally try **pypdf**. If none of the 4 methods work, Odoo will display an error message.
 
 If you want to force Odoo to use a specific text extraction method, go to the menu *Configuration > Technical > Parameters > System Parameters* and create a new System Parameter:
 
@@ -8,7 +8,6 @@ If you want to force Odoo to use a specific text extraction method, go to the me
   1. pymupdf
   #. pdftotext.lib
   #. pdftotext.cmd
-  #. pdfplumber
   #. pypdf
 
 In this configuration, Odoo will only use the selected text extraction method and, if it fails, it will display an error message.
diff --git a/account_invoice_import_simple_pdf/readme/INSTALL.rst b/account_invoice_import_simple_pdf/readme/INSTALL.rst
@@ -5,10 +5,9 @@ The module supports 5 different extraction methods:
 1. `PyMuPDF <https://github.com/pymupdf/PyMuPDF>`_ which is a Python binding for `MuPDF <https://mupdf.com/>`_, a lightweight PDF toolkit/viewer/renderer published under the AGPL licence by the company `Artifex Software <https://artifex.com/>`_.
 #. `pdftotext python library <https://pypi.org/project/pdftotext/>`_, which is a python binding for the pdftotext tool.
 #. `pdftotext command line tool <https://en.wikipedia.org/wiki/Pdftotext>`_, which is based on `poppler <https://poppler.freedesktop.org/>`_, a PDF rendering library used by `xpdf <https://www.xpdfreader.com/>`_ and `Evince <https://wiki.gnome.org/Apps/Evince/FrequentlyAskedQuestions>`_ (the PDF reader of `Gnome <https://www.gnome.org/>`_).
-#. `pdfplumber <https://pypi.org/project/pdfplumber/>`_, which is a python library built on top the of the python library `pdfminer.six <https://pypi.org/project/pdfminer.six/>`_. pdfplumber is a pure-python solution, so it's very easy to install on all OSes.
 #. `pypdf <https://github.com/py-pdf/pypdf/>`_, which is one of the most common PDF lib for Python. pypdf is a pure-python solution, so it's very easy to install on all OSes.
 
-PyMuPDF and pdftotext both give a very good text output. So far, I can't say which one is best. pdfplumber and pypdf often give lower-quality text output, but their advantage is that they are pure-Python librairies, so you will always be able to install it whatever your technical environnement is.
+PyMuPDF and pdftotext both give a very good text output. So far, I can't say which one is best. pypdf often gives lower-quality text output, but its advantage is that it is a pure-Python library, so you will always be able to install it whatever your technical environment is.
 
 You can choose one extraction method and only install the tools/libs for that method.
 
@@ -19,7 +18,7 @@ Install it via pip:
 
 .. code::
 
-  sudo pip3 install --upgrade pymupdf
+  pip3 install --upgrade pymupdf
 
 Beware that *PyMuPDF* is not a pure-python library: it uses MuPDF, which is written in C language. If a python wheel for your OS, CPU architecture and Python version is available on pypi (check the `list of PyMuPDF wheels <https://pypi.org/project/PyMuPDF/#files>`_ on pypi), it will install smoothly. Otherwize, the installation via pip will require MuPDF and all its development libs to compile the binding.
 
@@ -36,7 +35,7 @@ and then install the lib via pip:
 
 .. code::
 
-  sudo pip3 install --upgrade pdftotext
+  pip3 install --upgrade pdftotext
 
 On OSes other than Debian/Ubuntu, follow the instructions on the `project page <https://github.com/jalan/pdftotext>`_.
 
@@ -49,23 +48,14 @@ To install **pdftotext command line**, run:
 
   sudo apt install poppler-utils
 
-Install pdfplumber
-~~~~~~~~~~~~~~~~~~
-
-To install the **pdfplumber** python lib, run:
-
-.. code::
-
-  sudo pip3 install --upgrade pdfplumber
-
 Install pypdf
 ~~~~~~~~~~~~~
 
 To install the **pypdf** python lib, run:
 
 .. code::
 
-  sudo pip3 install --upgrade pypdf
+  pip3 install --upgrade pypdf
 
 
 Other requirements
@@ -80,17 +70,11 @@ The dateparser lib depends itself on regex. So you can install these Python libr
 
 .. code::
 
-  sudo pip3 install --upgrade dateparser
-
-The dateparser lib is not compatible with all regex lib versions. As of September 2022, the `version requirement <https://github.com/scrapinghub/dateparser/blob/master/setup.py#L30>`_ declared by dateparser for regex is **!=2019.02.19, !=2021.8.27, <2022.3.15**. So the latest version of regex which is compatible with dateparser is **2022.3.2**. To know the version of regex installed in your environment, run:
-
-
-.. code::
+  pip3 install --upgrade dateparser
 
-  sudo pip3 show regex
+The dateparser lib is not compatible with all regex lib versions. As of February 2024, the `version requirement <https://github.com/scrapinghub/dateparser/blob/master/setup.py#L36>`_ declared by dateparser for regex is **!=2019.02.19, !=2021.8.27**. So the latest version of dateparser is currently compatible with the latest version of regex. To know the version of regex installed in your environment, run:
 
-To force regex to version 2022.3.2, run:
 
 .. code::
 
-  sudo pip3 install regex==2022.3.2
+  pip3 show regex
diff --git a/account_invoice_import_simple_pdf/static/description/index.html b/account_invoice_import_simple_pdf/static/description/index.html
@@ -410,17 +410,16 @@ <h1 class="title">Account Invoice Import Simple PDF</h1>
 <li><a class="reference internal" href="#install-pymupdf" id="toc-entry-2">Install PyMuPDF</a></li>
 <li><a class="reference internal" href="#install-pdftotext-python-lib" id="toc-entry-3">Install pdftotext python lib</a></li>
 <li><a class="reference internal" href="#install-pdftotext-command-line" id="toc-entry-4">Install pdftotext command line</a></li>
-<li><a class="reference internal" href="#install-pdfplumber" id="toc-entry-5">Install pdfplumber</a></li>
-<li><a class="reference internal" href="#install-pypdf" id="toc-entry-6">Install pypdf</a></li>
-<li><a class="reference internal" href="#other-requirements" id="toc-entry-7">Other requirements</a></li>
+<li><a class="reference internal" href="#install-pypdf" id="toc-entry-5">Install pypdf</a></li>
+<li><a class="reference internal" href="#other-requirements" id="toc-entry-6">Other requirements</a></li>
 </ul>
 </li>
-<li><a class="reference internal" href="#configuration" id="toc-entry-8">Configuration</a></li>
-<li><a class="reference internal" href="#bug-tracker" id="toc-entry-9">Bug Tracker</a></li>
-<li><a class="reference internal" href="#credits" id="toc-entry-10">Credits</a><ul>
-<li><a class="reference internal" href="#authors" id="toc-entry-11">Authors</a></li>
-<li><a class="reference internal" href="#contributors" id="toc-entry-12">Contributors</a></li>
-<li><a class="reference internal" href="#maintainers" id="toc-entry-13">Maintainers</a></li>
+<li><a class="reference internal" href="#configuration" id="toc-entry-7">Configuration</a></li>
+<li><a class="reference internal" href="#bug-tracker" id="toc-entry-8">Bug Tracker</a></li>
+<li><a class="reference internal" href="#credits" id="toc-entry-9">Credits</a><ul>
+<li><a class="reference internal" href="#authors" id="toc-entry-10">Authors</a></li>
+<li><a class="reference internal" href="#contributors" id="toc-entry-11">Contributors</a></li>
+<li><a class="reference internal" href="#maintainers" id="toc-entry-12">Maintainers</a></li>
 </ul>
 </li>
 </ul>
@@ -433,10 +432,9 @@ <h1><a class="toc-backref" href="#toc-entry-1">Installation</a></h1>
 <li><a class="reference external" href="https://github.com/pymupdf/PyMuPDF">PyMuPDF</a> which is a Python binding for <a class="reference external" href="https://mupdf.com/">MuPDF</a>, a lightweight PDF toolkit/viewer/renderer published under the AGPL licence by the company <a class="reference external" href="https://artifex.com/">Artifex Software</a>.</li>
 <li><a class="reference external" href="https://pypi.org/project/pdftotext/">pdftotext python library</a>, which is a python binding for the pdftotext tool.</li>
 <li><a class="reference external" href="https://en.wikipedia.org/wiki/Pdftotext">pdftotext command line tool</a>, which is based on <a class="reference external" href="https://poppler.freedesktop.org/">poppler</a>, a PDF rendering library used by <a class="reference external" href="https://www.xpdfreader.com/">xpdf</a> and <a class="reference external" href="https://wiki.gnome.org/Apps/Evince/FrequentlyAskedQuestions">Evince</a> (the PDF reader of <a class="reference external" href="https://www.gnome.org/">Gnome</a>).</li>
-<li><a class="reference external" href="https://pypi.org/project/pdfplumber/">pdfplumber</a>, which is a python library built on top the of the python library <a class="reference external" href="https://pypi.org/project/pdfminer.six/">pdfminer.six</a>. pdfplumber is a pure-python solution, so it’s very easy to install on all OSes.</li>
 <li><a class="reference external" href="https://github.com/py-pdf/pypdf/">pypdf</a>, which is one of the most common PDF lib for Python. pypdf is a pure-python solution, so it’s very easy to install on all OSes.</li>
 </ol>
-<p>PyMuPDF and pdftotext both give a very good text output. So far, I can’t say which one is best. pdfplumber and pypdf often give lower-quality text output, but their advantage is that they are pure-Python librairies, so you will always be able to install it whatever your technical environnement is.</p>
+<p>PyMuPDF and pdftotext both give a very good text output. So far, I can’t say which one is best. pypdf often gives lower-quality text output, but its advantage is that it is a pure-Python library, so you will always be able to install it whatever your technical environment is.</p>
 <p>You can choose one extraction method and only install the tools/libs for that method.</p>
 <div class="section" id="install-pymupdf">
 <h2><a class="toc-backref" href="#toc-entry-2">Install PyMuPDF</a></h2>
@@ -465,22 +463,15 @@ <h2><a class="toc-backref" href="#toc-entry-4">Install pdftotext command line</a
 sudo apt install poppler-utils
 </pre>
 </div>
-<div class="section" id="install-pdfplumber">
-<h2><a class="toc-backref" href="#toc-entry-5">Install pdfplumber</a></h2>
-<p>To install the <strong>pdfplumber</strong> python lib, run:</p>
-<pre class="code literal-block">
-sudo pip3 install --upgrade pdfplumber
-</pre>
-</div>
 <div class="section" id="install-pypdf">
-<h2><a class="toc-backref" href="#toc-entry-6">Install pypdf</a></h2>
+<h2><a class="toc-backref" href="#toc-entry-5">Install pypdf</a></h2>
 <p>To install the <strong>pypdf</strong> python lib, run:</p>
 <pre class="code literal-block">
 sudo pip3 install --upgrade pypdf
 </pre>
 </div>
 <div class="section" id="other-requirements">
-<h2><a class="toc-backref" href="#toc-entry-7">Other requirements</a></h2>
+<h2><a class="toc-backref" href="#toc-entry-6">Other requirements</a></h2>
 <p>This module also requires the following Python libraries:</p>
 <ul class="simple">
 <li><a class="reference external" href="https://pypi.org/project/regex/">regex</a> which is backward-compatible with the <em>re</em> module of the Python standard library, but has additional functionalities.</li>
@@ -501,16 +492,15 @@ <h2><a class="toc-backref" href="#toc-entry-7">Other requirements</a></h2>
 </div>
 </div>
 <div class="section" id="configuration">
-<h1><a class="toc-backref" href="#toc-entry-8">Configuration</a></h1>
-<p>By default, for the PDF to text conversion, the module tries the different methods in the order mentionned in the INSTALL section: it will first try to use <strong>PyMuPDF</strong>; if it fails (for example because the lib is not properly installed), then it will try to use the <strong>pdftotext python lib</strong>, if that one also fails, it will try to use <strong>pdftotext command line</strong> and, if it also fails, it will eventually try <strong>pdfplumber</strong>. If none of the 4 methods work, Odoo will display an error message.</p>
+<h1><a class="toc-backref" href="#toc-entry-7">Configuration</a></h1>
+<p>By default, for the PDF to text conversion, the module tries the different methods in the order mentioned in the INSTALL section: it will first try to use <strong>PyMuPDF</strong>; if it fails (for example because the lib is not properly installed), then it will try to use the <strong>pdftotext python lib</strong>; if that one also fails, it will try to use <strong>pdftotext command line</strong> and, if it also fails, it will finally try <strong>pypdf</strong>. If none of the 4 methods work, Odoo will display an error message.</p>
 <p>If you want to force Odoo to use a specific text extraction method, go to the menu <em>Configuration &gt; Technical &gt; Parameters &gt; System Parameters</em> and create a new System Parameter:</p>
 <ul class="simple">
 <li><em>Key</em>: <strong>invoice_import_simple_pdf.pdf2txt</strong></li>
 <li><em>Value</em>: select the proper value for the method you want to use:<ol class="arabic">
 <li>pymupdf</li>
 <li>pdftotext.lib</li>
 <li>pdftotext.cmd</li>
-<li>pdfplumber</li>
 <li>pypdf</li>
 </ol>
 </li>
@@ -519,29 +509,29 @@ <h1><a class="toc-backref" href="#toc-entry-8">Configuration</a></h1>
 <p>You will find a full demonstration about how to configure each Vendor and import the PDF invoices in this <a class="reference external" href="https://www.youtube.com/watch?v=edsEuXVyEYE">screencast</a>.</p>
 </div>
 <div class="section" id="bug-tracker">
-<h1><a class="toc-backref" href="#toc-entry-9">Bug Tracker</a></h1>
+<h1><a class="toc-backref" href="#toc-entry-8">Bug Tracker</a></h1>
 <p>Bugs are tracked on <a class="reference external" href="https://github.com/OCA/edi/issues">GitHub Issues</a>.
 In case of trouble, please check there if your issue has already been reported.
 If you spotted it first, help us to smash it by providing a detailed and welcomed
 <a class="reference external" href="https://github.com/OCA/edi/issues/new?body=module:%20account_invoice_import_simple_pdf%0Aversion:%2014.0%0A%0A**Steps%20to%20reproduce**%0A-%20...%0A%0A**Current%20behavior**%0A%0A**Expected%20behavior**">feedback</a>.</p>
 <p>Do not contact contributors directly about support or help with technical issues.</p>
 </div>
 <div class="section" id="credits">
-<h1><a class="toc-backref" href="#toc-entry-10">Credits</a></h1>
+<h1><a class="toc-backref" href="#toc-entry-9">Credits</a></h1>
 <div class="section" id="authors">
-<h2><a class="toc-backref" href="#toc-entry-11">Authors</a></h2>
+<h2><a class="toc-backref" href="#toc-entry-10">Authors</a></h2>
 <ul class="simple">
 <li>Akretion</li>
 </ul>
 </div>
 <div class="section" id="contributors">
-<h2><a class="toc-backref" href="#toc-entry-12">Contributors</a></h2>
+<h2><a class="toc-backref" href="#toc-entry-11">Contributors</a></h2>
 <ul class="simple">
 <li>Alexis de Lattre &lt;<a class="reference external" href="mailto:alexis.delattre&#64;akretion.com">alexis.delattre&#64;akretion.com</a>&gt;</li>
 </ul>
 </div>
 <div class="section" id="maintainers">
-<h2><a class="toc-backref" href="#toc-entry-13">Maintainers</a></h2>
+<h2><a class="toc-backref" href="#toc-entry-12">Maintainers</a></h2>
 <p>This module is maintained by the OCA.</p>
 <a class="reference external image-reference" href="https://odoo-community.org"><img alt="Odoo Community Association" src="https://odoo-community.org/logo.png" /></a>
 <p>OCA, or the Odoo Community Association, is a nonprofit organization whose
diff --git a/account_invoice_import_simple_pdf/tests/test_invoice_import.py b/account_invoice_import_simple_pdf/tests/test_invoice_import.py
@@ -568,7 +568,6 @@ def _complete_import_specific_method(self, method):
     def test_specific_python_methods(self):
         # test only pure-pdf methods
         # because we are sure they work on the Github test environment
-        self._complete_import_specific_method("pdfplumber")
         self._complete_import_specific_method("pypdf")
 
     def test_test_mode(self):
diff --git a/account_invoice_import_simple_pdf/wizard/account_invoice_import.py b/account_invoice_import_simple_pdf/wizard/account_invoice_import.py
diff --git a/requirements.txt b/requirements.txt
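
Reviewer note: the fallback order described in CONFIGURE.rst after this change (PyMuPDF, then the pdftotext python lib, then the pdftotext command line, then pypdf) could be sketched roughly as below. This is only an illustration of the documented behaviour, not the module's actual code, which lives in wizard/account_invoice_import.py and is not shown in this diff. The names `DEFAULT_ORDER`, `ExtractionError` and `extract_text` are hypothetical, and the extractors are passed in as plain callables so that the sketch does not depend on any PDF library being installed.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Order documented in CONFIGURE.rst after pdfplumber was dropped.
# The constant name is hypothetical; the values are the documented
# choices for the invoice_import_simple_pdf.pdf2txt system parameter.
DEFAULT_ORDER: List[str] = ["pymupdf", "pdftotext.lib", "pdftotext.cmd", "pypdf"]


class ExtractionError(Exception):
    """Raised when no text extraction method succeeds (hypothetical name)."""


def extract_text(
    pdf_bytes: bytes,
    extractors: Dict[str, Callable[[bytes], str]],
    forced_method: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (method_name, text) from the first extractor that works.

    If forced_method is set, only that method is tried, mirroring the
    documented behaviour when the system parameter is configured.
    """
    if forced_method:
        if forced_method not in DEFAULT_ORDER:
            raise ExtractionError(f"Unknown method: {forced_method}")
        methods = [forced_method]
    else:
        methods = DEFAULT_ORDER

    errors = []
    for name in methods:
        func = extractors.get(name)
        if func is None:
            # Method not available in this environment: try the next one.
            errors.append(f"{name}: not available")
            continue
        try:
            return name, func(pdf_bytes)
        except Exception as exc:  # e.g. the lib is not properly installed
            errors.append(f"{name}: {exc}")
    raise ExtractionError("; ".join(errors))
```

With stub extractors in which only the last method succeeds, the default call falls through to pypdf, while forcing a failing method raises `ExtractionError` instead of falling back.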