> Receipt/invoice parser. Takes a list of PDFs/images → pdftotext/tesseract → Anthropic Claude API for extraction → contenteditable HTML. We use the Claude Haiku model to reduce cost while maintaining high accuracy.
The receipt-parser is a tool that takes PDF files or images of receipts and invoices as input. It gets their text with `pdftotext` (for PDFs) or `tesseract` optical character recognition (OCR, for images), then sends that text to the Anthropic Claude API to extract the relevant information. The extracted data is presented as an easy-to-read HTML page that you can edit directly in the browser.
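To illustrate the pipeline described above, here is a minimal Python sketch of its first and last stages. It assumes PDFs go to `pdftotext` and images go to `tesseract`, with both tools printing their text to stdout. The Claude extraction step is deliberately left out: the sketch simply assumes that step yields a flat dict of field names to strings. The function names and the set of image extensions are hypothetical and are not taken from the project's code.

```python
import html
import subprocess
from pathlib import Path

# Hypothetical set of image extensions routed to tesseract; the real tool may differ.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".webp"}


def text_extraction_command(path: str) -> list[str]:
    """Return a command that prints the text of `path` to stdout."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        # pdftotext writes to stdout when the output file is "-".
        return ["pdftotext", path, "-"]
    if suffix in IMAGE_SUFFIXES:
        # tesseract writes to stdout when the output base is "stdout".
        return ["tesseract", path, "stdout"]
    raise ValueError(f"unsupported file type: {path}")


def extract_text(path: str) -> str:
    """Run pdftotext or tesseract on `path` and return the captured text."""
    result = subprocess.run(
        text_extraction_command(path), capture_output=True, text=True, check=True
    )
    return result.stdout


def render_editable_html(fields: dict[str, str]) -> str:
    """Render extracted fields as a table whose value cells are contenteditable."""
    rows = "\n".join(
        f"  <tr><th>{html.escape(key)}</th>"
        f'<td contenteditable="true">{html.escape(value)}</td></tr>'
        for key, value in fields.items()
    )
    return f"<table>\n{rows}\n</table>"
```

For example, `render_editable_html({"Vendor": "A & B"})` produces a one-row table whose value cell carries `contenteditable="true"`, with the `&` escaped to `&amp;`. Escaping matters here because OCR and model output is untrusted text that ends up inside HTML.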
Then quit Nautilus by running `nautilus -q` in a terminal, so that it picks up the new script the next time it starts.
Once Nautilus has restarted, you should be able to right-click any PDF or image file (or a selection of several) and choose "Scripts" > "parse-receipts" to run the receipt parser on the selected files.
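Nautilus runs entries in its Scripts menu as ordinary executables and, for local files, lists the current selection in the `NAUTILUS_SCRIPT_SELECTED_FILE_PATHS` environment variable, one path per line. The following is a minimal Python sketch of reading that selection. `selected_files` is a hypothetical helper, not this project's script; it falls back to command-line arguments so the same script can also be run by hand from a terminal.

```python
import os
import sys


def selected_files(argv: list[str], environ: dict[str, str]) -> list[str]:
    """Return the files the user selected in Nautilus (hypothetical helper)."""
    # Nautilus lists local selections one per line in this variable.
    raw = environ.get("NAUTILUS_SCRIPT_SELECTED_FILE_PATHS", "")
    paths = [line for line in raw.splitlines() if line]
    # Fall back to command-line arguments, e.g. when run from a terminal.
    return paths or argv[1:]


if __name__ == "__main__":
    for path in selected_files(sys.argv, dict(os.environ)):
        print(path)
```

Splitting on newlines means a file name that itself contains a newline would be mangled; that is a limitation of the environment variable's format rather than of this sketch.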
## Contributing
Contributions are very welcome, both issues and pull requests! Please mention in your pull request that you release your work under the GPL-3.0 (see below).
See [CONTRIBUTING.md](./CONTRIBUTING.md) for a guide on what to expect when submitting a pull request or issue to this project.
If you'd like to donate to say an extra thank you, the sponsor button at the top of the page (if you're on GitHub) will take you to my [Liberapay profile](https://liberapay.com/sbrl) :-)
## License
This project is released under the GNU General Public License 3.0. The full license text is included in the `LICENSE` file in this repository. TLDRLegal has a [great summary](https://www.tldrlegal.com/license/gnu-general-public-license-v3-gpl-3) of the license if you're interested.