# receipt-parser

> Receipt/invoice parser. Takes a list of PDF/images → pdftotext/tesseract → Anthropic Claude API for extraction → contenteditable HTML.

We use the Claude Haiku model to reduce cost while maintaining high accuracy.

receipt-parser takes PDF files or images of receipts and invoices as input, and extracts the relevant information using optical character recognition (OCR) and the Anthropic Claude API. The extracted data is then presented in an easy-to-read, editable HTML format.

## System Requirements

- Node.js
- Bash
- Tesseract (`sudo apt install tesseract-ocr`)
- Anthropic API key

## Usage

First, clone the repository:

`git clone https://github.com/your-username/receipt-parser.git`

Then, install the required dependencies:

`npm install`

Obtain an API key from Anthropic.

Run the script like so:

```bash
./index.sh /path/to/receipt1.pdf /path/to/receipt2.pdf ...
```

The script takes a list of files. It processes every PDF and image file (any image format Tesseract supports) as follows:

- Extract the text with pdftotext or Tesseract OCR
- Convert that text into a machine-readable JSON object with the Anthropic Claude API
- Generate an HTML file with the extracted data for each input file

### Nautilus Script

To register the script as a Nautilus script (for easy right-click access in the file manager), run this command from the root of this repository:

```bash
ln -s "$(pwd)/src/index.sh" ~/.local/share/nautilus/scripts/parse-receipts
```

Then, restart Nautilus by running `nautilus -q` in the terminal.

After restarting Nautilus, you should be able to right-click on any PDF or image file and select "Scripts" > "parse-receipts" to run the receipt parser on the selected files.

## Contributing

Contributions are very welcome - both issues and pull requests! Please mention in your pull request that you release your work under the MPL-2.0 (see below).
See [CONTRIBUTING.md](./CONTRIBUTING.md) for a guide on what to expect when submitting a pull request or issue to this project.

If you're feeling that way inclined, the sponsor button at the top of the page (if you're on GitHub) will take you to my [Liberapay profile](https://liberapay.com/sbrl), where you can donate to say an extra thank you :-)

## License

This project is released under the GNU General Public License v3.0. The full license text is included in the `LICENSE` file in this repository. TLDRLegal has a [great summary](https://www.tldrlegal.com/license/gnu-general-public-license-v3-gpl-3) of the license if you're interested.