[15.0][ADD] account_edi_simple_pdf #959

Draft
wants to merge 65 commits into
base: 15.0
b329442
Initial check-in of account_invoice_import_simple_pdf
alexis-via Jul 26, 2021
21f2178
Temporary workaround of odoo bug https://github.com/odoo/odoo/issues/…
alexis-via Aug 24, 2021
3db926e
Improve handling of start/end string cut when there are multiplue occ…
alexis-via Aug 31, 2021
5d474a1
[FIX] when the invoice has the VAT number of the supplier and also th…
alexis-via Sep 15, 2021
652a056
[FIX] Read of the "Page Analysis" parameter
alexis-via Oct 13, 2021
7e01191
Improve date extraction when month is a string with accents
alexis-via Oct 24, 2021
8110358
account_invoice_import_simple_pdf: coma -> comma (typo fix)
alexis-via Oct 25, 2021
7dcfd53
account_invoice_import_simple_pdf: easier extensibility for new fields
alexis-via Oct 25, 2021
378fa63
[UPD] Update account_invoice_import_simple_pdf.pot
oca-travis Oct 26, 2021
4b990ad
[UPD] README.rst
OCA-git-bot Oct 26, 2021
23e0a74
[FIX] account_invoice_import_simple_pdf: Remove exclude with invoice2…
etobella Nov 29, 2021
5c7b040
[FIX] account_invoice_import_simple_pdf: extract_rule position_min/po…
alexis-via Feb 8, 2022
56861f7
account_invoice_import_simple_pdf: add onchange on date_format on fie…
alexis-via Feb 8, 2022
6d9cd0b
account_invoice_import_simple_pdf: install pymupdf by Debian package …
alexis-via Feb 8, 2022
da5a3b1
same player try again
alexis-via Feb 8, 2022
c06155e
Same player try again again
alexis-via Feb 8, 2022
bb7d8f7
Same player try again again again
alexis-via Feb 8, 2022
6401fbc
account_invoice_import_simple_pdf: support multiple tools for text ex…
alexis-via Feb 12, 2022
82b0754
[FIX] access to form view of partners for users who are not accountants
alexis-via Feb 13, 2022
4ce5a06
[UPD] Update account_invoice_import_simple_pdf.pot
Feb 14, 2022
cc37ad2
[UPD] README.rst
OCA-git-bot Feb 14, 2022
c4edb73
account_invoice_import_simple_pdf 14.0.2.0.0
OCA-git-bot Feb 14, 2022
28ebc1d
[FIX] account_invoice_import_simple_pdf: Fix view replace
etobella Mar 31, 2022
fa34c70
Use fix version of dateparser
flotho Apr 26, 2022
ab48b42
account_invoice_import_simple_pdf 14.0.2.1.0
OCA-git-bot Jun 16, 2022
eab48ba
account_invoice_import_simple_pdf: add apostrophe as thousand separator
alexis-via Jun 29, 2022
ba3f2b7
account_invoice_import_simple_pdf: parse July 5th, 2022 as date
alexis-via Jun 29, 2022
49fbe22
[UPD] Update account_invoice_import_simple_pdf.pot
Jun 29, 2022
82bd8f0
account_invoice_import_simple_pdf 14.0.2.2.0
OCA-git-bot Jun 29, 2022
e2b33d0
[FIX] simple_pdf: bad string
alexis-via Jul 14, 2022
1db45b0
[UPD] Update account_invoice_import_simple_pdf.pot
Jul 15, 2022
6d4aed0
account_invoice_import_simple_pdf 14.0.2.2.1
OCA-git-bot Jul 15, 2022
9a0808b
account_invoice_import: improve handling of simple PDF invoices
alexis-via Jul 14, 2022
48c87c2
Added translation using Weblate (French)
klodr Aug 5, 2022
17db551
Translated using Weblate (French)
klodr Aug 5, 2022
55cc691
Translated using Weblate (French)
Sep 20, 2022
952c7b9
simple_pdf: add warning about constraint on regex version imposed by …
alexis-via Sep 21, 2022
93c4b6b
simple_pdf: use another invoice as test invoice
alexis-via Sep 22, 2022
7513998
simple_pdf: raise error if thousand sep = decimal sep
alexis-via Sep 27, 2022
fa1ddda
[UPD] Update account_invoice_import_simple_pdf.pot
Sep 27, 2022
c263a35
[UPD] README.rst
OCA-git-bot Sep 27, 2022
1c2ca03
account_invoice_import_simple_pdf 14.0.3.0.0
OCA-git-bot Sep 27, 2022
7532251
Update translation files
oca-transbot Sep 27, 2022
cd28586
simple_pdf: allow to match partners on additionnal fields
alexis-via Oct 8, 2022
98fde0f
[UPD] Update account_invoice_import_simple_pdf.pot
Oct 12, 2022
d2b2348
account_invoice_import_simple_pdf 14.0.3.1.0
OCA-git-bot Oct 12, 2022
108402c
Update translation files
oca-transbot Oct 12, 2022
2df2c67
Translated using Weblate (French)
klodr May 29, 2023
8c820b3
simple_pdf: add a new type 'Any Character' on invoice number parsing
alexis-via May 30, 2023
e14339c
account_invoice_import_simple_pdf 14.0.3.2.0
OCA-git-bot Jun 6, 2023
459534f
[FIX] remove pin version of dateparser
florian-dacosta Jan 24, 2023
f4adb74
[UPD] README.rst
OCA-git-bot Sep 3, 2023
ba8bac1
[UPD] README.rst
OCA-git-bot Sep 3, 2023
6ce51db
account_invoice_import_simple_pdf 14.0.3.2.1
OCA-git-bot Sep 3, 2023
8db1b69
[UPD] README.rst
OCA-git-bot Sep 3, 2023
0d6c9d7
simple_pdf: works with newer PyMuPDF versions
alexis-via Oct 20, 2023
35c64f7
[BOT] post-merge updates
OCA-git-bot Oct 24, 2023
732986a
Added translation using Weblate (Spanish)
Ivorra78 Nov 25, 2023
e48ea57
Translated using Weblate (Spanish)
Ivorra78 Nov 25, 2023
b1602c2
Translated using Weblate (Spanish)
Ivorra78 Nov 25, 2023
422f5ad
account_invoice_import_simple_pdf: remove pdfplumber
alexis-via Feb 13, 2024
7238d29
account_invoice_import_simple_pdf: update INSTALL.rst about version o…
alexis-via Feb 13, 2024
3aa3cf7
[BOT] post-merge updates
OCA-git-bot Mar 14, 2024
d4f09bd
[IMP] account_invoice_import_simple_pdf: black, isort, prettier
hbrunn Mar 29, 2024
eedc885
[ADD] account_edi_simple_pdf
hbrunn Mar 29, 2024
220 changes: 220 additions & 0 deletions account_edi_simple_pdf/README.rst
@@ -0,0 +1,220 @@
=================
Import Simple PDF
=================

..
   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   !! This file is generated by oca-gen-addon-readme !!
   !! changes will be overwritten.                   !!
   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   !! source digest: sha256:5ab1ebb7747c3603828622a11eb4d2fe9419bb3ccda3f1bbbcaa733014ddcb54
   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

.. |badge1| image:: https://img.shields.io/badge/maturity-Beta-yellow.png
    :target: https://odoo-community.org/page/development-status
    :alt: Beta
.. |badge2| image:: https://img.shields.io/badge/licence-AGPL--3-blue.png
    :target: http://www.gnu.org/licenses/agpl-3.0-standalone.html
    :alt: License: AGPL-3
.. |badge3| image:: https://img.shields.io/badge/github-OCA%2Fedi-lightgray.png?logo=github
    :target: https://github.com/OCA/edi/tree/15.0/account_edi_simple_pdf
    :alt: OCA/edi
.. |badge4| image:: https://img.shields.io/badge/weblate-Translate%20me-F47D42.png
    :target: https://translation.odoo-community.org/projects/edi-15-0/edi-15-0-account_edi_simple_pdf
    :alt: Translate me on Weblate
.. |badge5| image:: https://img.shields.io/badge/runboat-Try%20me-875A7B.png
    :target: https://runboat.odoo-community.org/builds?repo=OCA/edi&target_branch=15.0
    :alt: Try me on Runboat

|badge1| |badge2| |badge3| |badge4| |badge5|

This module extends Odoo's vendor bill import mechanism with support for simple PDF invoices, i.e. PDF invoices that don't have an embedded XML file.

* Possibility to add support for a new vendor without developer skills: the accountant can do it!
* Adding support for a new vendor is faster.
* More tolerance on vendor invoice layout changes.
* Easier to install.

This module uses the following design when importing a PDF vendor bill:

1. raw text extraction of the PDF file,
2. identify the partner using the VAT number (if the VAT number is present in the raw text extraction) or some keywords,
3. use regular expressions (regex) to extract the data needed to create the vendor bill in Odoo (single line configuration).

Under the hood, regular expressions are auto-generated from the configuration made by the user in Odoo. No need to be a regex expert! But you can still write regexes to extract some fields for some very specific needs.
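
As an illustration of how a regex could be generated from such a configuration, here is a minimal sketch for the date field. It is not the module's actual code: the function ``build_date_regex`` and the two mapping tables are hypothetical, and only the option values ``dd-mm-y4`` and ``slash`` are taken from the demo data of this module.

```python
import re

# Hypothetical mapping tables: the option names mirror the demo data
# ("dd-mm-y4", "slash"), but the module's real tables may differ.
SEPARATORS = {"slash": "/", "dash": "-", "dot": "."}
TOKENS = {"dd": r"\d{1,2}", "mm": r"\d{1,2}", "y4": r"\d{4}", "y2": r"\d{2}"}


def build_date_regex(date_format: str, separator: str) -> str:
    """Turn a user-facing date format into a regex pattern string."""
    sep = re.escape(SEPARATORS[separator])
    parts = [TOKENS[token] for token in date_format.split("-")]
    # \b keeps the pattern from matching inside a longer run of digits.
    return r"\b" + sep.join(parts) + r"\b"


pattern = build_date_regex("dd-mm-y4", "slash")
text = "Facture du 05/07/2022 - Total TTC 42,00"
print(re.findall(pattern, text))
```

The accountant only picks a format and a separator; the pattern itself stays hidden, which is the point of the design described above.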

The module can extract the following fields:

* Total Amount with taxes
* Total Untaxed Amount
* Total Tax Amount
* Invoice Date
* Due Date
* Start Date
* End Date
* Invoice Number
* Description (for that field, you have to write a regex)

In this list, only 3 fields are required:

* Invoice Date
* 2 out of the 3 Amount fields (the 3rd can be deduced from the other 2: Total Amount = Total Untaxed + Total Tax)
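
The deduction of the third amount can be sketched as follows. This is only an illustration of the arithmetic stated above; the function name ``complete_amounts`` is hypothetical and does not come from the module.

```python
from decimal import Decimal
from typing import Optional, Tuple


def complete_amounts(
    total: Optional[Decimal] = None,
    untaxed: Optional[Decimal] = None,
    tax: Optional[Decimal] = None,
) -> Tuple[Decimal, Decimal, Decimal]:
    """Return (total, untaxed, tax), deducing the one that is missing."""
    known = [amount for amount in (total, untaxed, tax) if amount is not None]
    if len(known) < 2:
        raise ValueError("At least 2 of the 3 amounts are required")
    # Total Amount = Total Untaxed + Total Tax, rearranged as needed.
    if total is None:
        total = untaxed + tax
    elif untaxed is None:
        untaxed = total - tax
    elif tax is None:
        tax = total - untaxed
    return total, untaxed, tax


print(complete_amounts(total=Decimal("120.00"), untaxed=Decimal("100.00")))
```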

To take advantage of the fields *Start Date* and *End Date*, you need the OCA module *account_invoice_start_end_dates* from the `account-closing <https://github.com/OCA/account-closing>`_ project.

**Table of contents**

.. contents::
   :local:

Installation
============

The most important technical component of this module is the tool that converts the PDF to text. Converting PDF to text is not an easy job. As outlined in this `blog post <https://dida.do/blog/how-to-extract-text-from-pdf>`_, different tools can give quite different results. The best results are usually achieved with tools based on a PDF viewer, which excludes pure-Python tools. But pure-Python tools are easier to install than tools based on a PDF viewer. It is important to understand that, if you change the PDF to text tool, you will almost certainly get a slightly different text output, which may oblige you to update the field extraction rules. That can be time-consuming if you have already configured many vendors.

The module supports 4 different extraction methods:

1. `PyMuPDF <https://github.com/pymupdf/PyMuPDF>`_ which is a Python binding for `MuPDF <https://mupdf.com/>`_, a lightweight PDF toolkit/viewer/renderer published under the AGPL licence by the company `Artifex Software <https://artifex.com/>`_.
#. `pdftotext python library <https://pypi.org/project/pdftotext/>`_, which is a python binding for the pdftotext tool.
#. `pdftotext command line tool <https://en.wikipedia.org/wiki/Pdftotext>`_, which is based on `poppler <https://poppler.freedesktop.org/>`_, a PDF rendering library used by `xpdf <https://www.xpdfreader.com/>`_ and `Evince <https://wiki.gnome.org/Apps/Evince/FrequentlyAskedQuestions>`_ (the PDF reader of `Gnome <https://www.gnome.org/>`_).
#. `pypdf <https://github.com/py-pdf/pypdf/>`_, which is one of the most common PDF libs for Python. pypdf is a pure-Python solution, so it's very easy to install on all OSes.

PyMuPDF and pdftotext both give a very good text output. So far, I can't say which one is best. pypdf often gives lower-quality text output, but its advantage is that it is a pure-Python library, so you will always be able to install it whatever your technical environment is.

You can choose one extraction method and only install the tools/libs for that method.

Install PyMuPDF
~~~~~~~~~~~~~~~

Install it via pip:

.. code::

    pip3 install --upgrade pymupdf

Beware that *PyMuPDF* is not a pure-Python library: it uses MuPDF, which is written in C. If a Python wheel for your OS, CPU architecture and Python version is available on PyPI (check the `list of PyMuPDF wheels <https://pypi.org/project/PyMuPDF/#files>`_ on PyPI), it will install smoothly. Otherwise, the installation via pip will require MuPDF and all its development libs to compile the binding.

Install pdftotext python lib
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To install **pdftotext python lib**, run:

.. code::

    sudo apt install build-essential libpoppler-cpp-dev pkg-config python3-dev

and then install the lib via pip:

.. code::

    pip3 install --upgrade pdftotext

On OSes other than Debian/Ubuntu, follow the instructions on the `project page <https://github.com/jalan/pdftotext>`_.

Install pdftotext command line
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To install **pdftotext command line**, run:

.. code::

    sudo apt install poppler-utils

Install pypdf
~~~~~~~~~~~~~

To install the **pypdf** python lib, run:

.. code::

    pip3 install --upgrade pypdf


Other requirements
~~~~~~~~~~~~~~~~~~

This module also requires the following Python libraries:

* `regex <https://pypi.org/project/regex/>`_ which is backward-compatible with the *re* module of the Python standard library, but has additional functionalities.
* `dateparser <https://github.com/scrapinghub/dateparser>`_ which is a powerful date parsing library.

The dateparser lib depends itself on regex. So you can install these Python libraries via pip with the following command:

.. code::

    pip3 install --upgrade dateparser

The dateparser lib is not compatible with all regex lib versions. As of February 2024, the `version requirement <https://github.com/scrapinghub/dateparser/blob/master/setup.py#L36>`_ declared by dateparser for regex is **!=2019.02.19, !=2021.8.27**. So the latest version of dateparser is currently compatible with the latest version of regex. To know the version of regex installed in your environment, run:


.. code::

    pip3 show regex
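
The same check can be scripted. The sketch below hard-codes the two versions excluded by the requirement quoted above (as of February 2024) and assumes that regex versions are purely numeric dotted strings; the helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions excluded by dateparser's requirement "!=2019.02.19, !=2021.8.27".
EXCLUDED = {(2019, 2, 19), (2021, 8, 27)}


def as_tuple(version_string: str) -> tuple:
    """'2019.02.19' -> (2019, 2, 19); leading zeros are not significant."""
    return tuple(int(part) for part in version_string.split("."))


def is_excluded(version_string: str) -> bool:
    return as_tuple(version_string) in EXCLUDED


def installed_regex_status() -> str:
    try:
        installed = version("regex")
    except PackageNotFoundError:
        return "regex is not installed"
    try:
        excluded = is_excluded(installed)
    except ValueError:
        return f"regex {installed}: unrecognised version format"
    verdict = "excluded by dateparser" if excluded else "accepted"
    return f"regex {installed}: {verdict}"


print(installed_regex_status())
```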

Configuration
=============

By default, for the PDF to text conversion, the module tries the different methods in the order listed in the Installation section: it first tries **PyMuPDF**; if that fails (for example because the lib is not properly installed), it tries the **pdftotext python lib**; if that one also fails, it tries the **pdftotext command line**; and if that also fails, it finally tries **pypdf**. If none of the 4 methods yields any text (whether that text can then be parsed successfully is another matter), Odoo displays an error message.

If you want to force Odoo to use a specific text extraction method, go to the menu *Configuration > Technical > Parameters > System Parameters* and create a new System Parameter:

* *Key*: **invoice_import_simple_pdf.pdf2txt**
* *Value*: select the proper value for the method you want to use:

  1. pymupdf
  #. pdftotext.lib
  #. pdftotext.cmd
  #. pypdf

In this configuration, Odoo will only use the selected text extraction method and, if it fails, it will display an error message.
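
The fallback behaviour and the forced method can be summarised by the following sketch. It is an assumption about the control flow, not the module's code: ``pdf_to_text`` is a hypothetical name, and the extractors are stand-in functions so that the example runs without PyMuPDF, pdftotext or pypdf installed.

```python
from typing import Callable, Dict, Optional

Extractor = Callable[[bytes], str]


def pdf_to_text(
    pdf: bytes, extractors: Dict[str, Extractor], forced: Optional[str] = None
) -> str:
    """Try each extractor in order, or only the forced one, until text comes out."""
    names = [forced] if forced else list(extractors)
    for name in names:
        try:
            text = extractors[name](pdf)
        except Exception:
            # Typically the library is missing or the tool crashed: try the next one.
            continue
        if text and text.strip():
            return text
    raise RuntimeError("No text could be extracted with: %s" % ", ".join(names))


# Stand-in extractors (hypothetical), keyed by the parameter values listed above.
def broken(pdf: bytes) -> str:
    raise ImportError("library not installed")


def working(pdf: bytes) -> str:
    return "Invoice 2022-001"


extractors = {"pymupdf": broken, "pypdf": working}
print(pdf_to_text(b"%PDF-", extractors))
```

Calling ``pdf_to_text(b"%PDF-", extractors, forced="pymupdf")`` raises the error instead of falling back, which matches the forced configuration described above.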

You will find a full demonstration about how to configure each Vendor and import the PDF invoices in this `screencast <https://www.youtube.com/watch?v=edsEuXVyEYE>`_.

Usage
=====

- go to Invoicing -> Vendors -> Bills
- press the button Upload and upload a PDF file. Now the PDF file will be processed by account_edi_simple_pdf

In case your PDF file contains Factur-X data and you have Factur-X activated (through the core module ``account_edi_facturx``, which is auto-installed), that functionality will be executed instead.

Bug Tracker
===========

Bugs are tracked on `GitHub Issues <https://github.com/OCA/edi/issues>`_.
In case of trouble, please check there if your issue has already been reported.
If you spotted it first, help us to smash it by providing a detailed and welcomed
`feedback <https://github.com/OCA/edi/issues/new?body=module:%20account_edi_simple_pdf%0Aversion:%2015.0%0A%0A**Steps%20to%20reproduce**%0A-%20...%0A%0A**Current%20behavior**%0A%0A**Expected%20behavior**>`_.

Do not contact contributors directly about support or help with technical issues.

Credits
=======

Authors
~~~~~~~

* Akretion
* Hunki Enterprises BV

Contributors
~~~~~~~~~~~~

* Alexis de Lattre <alexis.delattre@akretion.com>

Maintainers
~~~~~~~~~~~

This module is maintained by the OCA.

.. image:: https://odoo-community.org/logo.png
   :alt: Odoo Community Association
   :target: https://odoo-community.org

OCA, or the Odoo Community Association, is a nonprofit organization whose
mission is to support the collaborative development of Odoo features and
promote its widespread use.

This module is part of the `OCA/edi <https://github.com/OCA/edi/tree/15.0/account_edi_simple_pdf>`_ project on GitHub.

You are welcome to contribute. To learn how please visit https://odoo-community.org/page/Contribute.
1 change: 1 addition & 0 deletions account_edi_simple_pdf/__init__.py
@@ -0,0 +1 @@
from . import models
31 changes: 31 additions & 0 deletions account_edi_simple_pdf/__manifest__.py
@@ -0,0 +1,31 @@
# Copyright 2021 Akretion France (http://www.akretion.com/)
# @author: Alexis de Lattre <alexis.delattre@akretion.com>
# License AGPL-3.0 or later (http://www.gnu.org/licenses/agpl).

{
    "name": "Import Simple PDF",
    "version": "15.0.4.0.0",
    "category": "Accounting/Accounting",
    "license": "AGPL-3",
    "summary": "Import simple PDF vendor bills",
    "author": "Akretion,Hunki Enterprises BV,Odoo Community Association (OCA)",
    "website": "https://github.com/OCA/edi",
    "depends": ["account_edi"],
    "external_dependencies": {
        "python": [
            "regex",
            "dateparser",
            "pypdf>=3.1.0",
        ],
        "deb": ["libmupdf-dev", "mupdf", "mupdf-tools", "poppler-utils"],
    },
    "data": [
        "security/ir.model.access.csv",
        "views/res_partner.xml",
        "views/account_invoice_import_simple_pdf_fields.xml",
        "views/account_invoice_import_simple_pdf_invoice_number.xml",
    ],
    "demo": ["demo/demo_data.xml"],
    "installable": True,
    "application": True,
}
62 changes: 62 additions & 0 deletions account_edi_simple_pdf/demo/demo_data.xml
@@ -0,0 +1,62 @@
<?xml version="1.0" encoding="utf-8" ?>
<odoo noupdate="1">

    <record id="mobile_phone" model="product.product">
        <field name="name">Mobile phone</field>
        <field name="categ_id" ref="product.product_category_5" />
        <field name="sale_ok" eval="False" />
        <field name="purchase_ok" eval="True" />
        <field name="type">service</field>
    </record>

    <record id="bouygues_telecom" model="res.partner">
        <field name="name">Bouygues Telecom</field>
        <field name="is_company" eval="True" />
        <field name="supplier_rank">1</field>
        <field name="street">37 rue Boissière</field>
        <field name="zip">75116</field>
        <field name="city">Paris</field>
        <field name="country_id" ref="base.fr" />
        <field name="website">http://www.bouyguestelecom.fr</field>
        <field name="vat">FR74397480930</field>
        <field name="simple_pdf_date_format">dd-mm-y4</field>
        <field name="simple_pdf_date_separator">slash</field>
        <field name="simple_pdf_currency_id" ref="base.EUR" />
        <field name="simple_pdf_pages">first</field>
        <field name="simple_pdf_decimal_separator">comma</field>
        <field name="simple_pdf_thousand_separator">space</field>
    </record>

    <record id="inv_number1" model="account.invoice.import.simple.pdf.invoice.number">
        <field name="partner_id" ref="bouygues_telecom" />
        <field name="string_type">digit</field>
        <field name="occurrence_min">14</field>
        <field name="occurrence_max">14</field>
    </record>

    <record id="inv_amount_total" model="account.invoice.import.simple.pdf.fields">
        <field name="partner_id" ref="bouygues_telecom" />
        <field name="name">amount_total</field>
        <field name="extract_rule">max</field>
    </record>

    <record id="inv_amount_untaxed" model="account.invoice.import.simple.pdf.fields">
        <field name="partner_id" ref="bouygues_telecom" />
        <field name="name">amount_untaxed</field>
        <field name="extract_rule">first</field>
        <field name="start">Montant de la facture soumis à TVA</field>
    </record>

    <record id="inv_date" model="account.invoice.import.simple.pdf.fields">
        <field name="partner_id" ref="bouygues_telecom" />
        <field name="name">date</field>
        <field name="extract_rule">first</field>
    </record>

    <record id="inv_invoice_number" model="account.invoice.import.simple.pdf.fields">
        <field name="partner_id" ref="bouygues_telecom" />
        <field name="name">invoice_number</field>
        <field name="extract_rule">first</field>
    </record>

</odoo>
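
The demo partner above sets a comma as decimal separator, a space as thousand separator, and the ``max`` extract rule for ``amount_total``. The sketch below shows one way such a configuration could be applied to raw text. It is not taken from the module: the function names and the ``SEPARATOR_CHARS`` table are hypothetical, and reading ``max`` as "keep the largest amount found" is an assumption based on the option name.

```python
import re
from decimal import Decimal

# Hypothetical table: option names follow the demo data and the commit messages.
SEPARATOR_CHARS = {"comma": ",", "dot": ".", "space": " ", "apostrophe": "'"}


def parse_amount(raw: str, decimal_separator: str, thousand_separator: str) -> Decimal:
    """Convert a localized amount such as '1 234,56' to a Decimal."""
    if decimal_separator == thousand_separator:
        raise ValueError("Decimal and thousand separators must differ")
    dec = SEPARATOR_CHARS[decimal_separator]
    thou = SEPARATOR_CHARS[thousand_separator]
    # Drop thousand separators first, then normalize the decimal mark.
    return Decimal(raw.replace(thou, "").replace(dec, "."))


def max_amount(text: str, decimal_separator: str, thousand_separator: str) -> Decimal:
    """Find every amount with 2 decimals in the text and keep the largest."""
    dec = re.escape(SEPARATOR_CHARS[decimal_separator])
    thou = re.escape(SEPARATOR_CHARS[thousand_separator])
    pattern = rf"\d{{1,3}}(?:{thou}\d{{3}})*{dec}\d{{2}}"
    amounts = [
        parse_amount(match, decimal_separator, thousand_separator)
        for match in re.findall(pattern, text)
    ]
    if not amounts:
        raise ValueError("No amount found in text")
    return max(amounts)


text = "Total HT 1 000,00 TVA 200,00 Total TTC 1 200,00"
print(max_amount(text, "comma", "space"))
```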