This repository demonstrates the use of Tesseract OCR in Python for text extraction from various image formats. It processes multiple images and extracts their textual content using the pytesseract library.
Supported image formats include .png, .jpg, .bmp, .gif, .webp, and more.
Follow these steps to set up and run the project:
git clone https://github.com/LahcenEzzara/python-tesseract-explorer.git
cd python-tesseract-explorer
It is recommended to use a virtual environment to manage dependencies.
python3 -m venv venv
source venv/bin/activate # On Linux/Mac
# OR
venv\Scripts\activate # On Windows
Install the required Python libraries:
pip install -r requirements.txt
Ensure Tesseract OCR is installed on your system. For Ubuntu, you can use:
sudo apt update
sudo apt install tesseract-ocr
To process Arabic or other languages, install their respective Tesseract language data. For example:
sudo apt install tesseract-ocr-ara
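To confirm that a language pack such as `ara` is actually available after installation, the output of `tesseract --list-langs` can be inspected. The following is a minimal sketch and is not part of this repository: the helper names `parse_list_langs` and `installed_languages` are hypothetical, and the parser assumes the usual output shape of a "List of available languages ..." header followed by one language code per line.

```python
import shutil
import subprocess


def parse_list_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header.
        if not line or line.lower().startswith("list of"):
            continue
        langs.append(line)
    return langs


def installed_languages():
    """Return installed Tesseract language codes, or [] if the binary is missing."""
    if shutil.which("tesseract") is None:
        return []
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Some Tesseract versions print the list to stderr, so both streams are read.
    # Assumption: neither stream carries unrelated warning lines.
    return parse_list_langs(result.stdout + "\n" + result.stderr)


if __name__ == "__main__":
    langs = installed_languages()
    print("ara installed:", "ara" in langs)
```

If `ara` is missing from the list, the language data package above has likely not been installed for the Tesseract build that is on your PATH.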
The main script processes images in the images/ folder and extracts text from each. To run the script:
python main.py
The extracted text for each image will be printed in the terminal.
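The general shape of such a script can be sketched as follows. This is an illustration under stated assumptions, not the repository's actual `main.py`: the function names, the extension list, and the `lang` parameter are hypothetical, and how the real script selects a language per image is not documented here. The only library calls used are `PIL.Image.open` and `pytesseract.image_to_string`.

```python
from pathlib import Path

# Extensions named in this README; the real script may accept more.
IMAGE_EXTENSIONS = {".png", ".jpg", ".bmp", ".gif", ".webp"}


def find_images(folder):
    """Return the image files directly inside `folder`, sorted by path."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def extract_text(image_path, lang="eng"):
    """Run Tesseract on one image and return the recognized text."""
    # Imported lazily so find_images() works even without the OCR packages.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=lang).strip()


def main(folder="images"):
    for path in find_images(folder):
        print(f"Processing: {path.as_posix()}")
        text = extract_text(path)
        print(f"Extracted Text from {path.name}:")
        print(text)
        print("-" * 40)


if __name__ == "__main__":
    main()
```

For non-English images, a language code such as `lang="ara"` would need to be passed to `extract_text`, and the matching Tesseract language data must be installed.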
python-tesseract-explorer/
├── images/ # Folder containing test images
│ ├── test_ar.png
│ ├── test_la.png
│ ├── test-european.jpg
│ ├── test-small.jpg
│ ├── test.bmp
│ ├── test.gif
│ ├── test.jpg
│ ├── test.png
│ ├── test.webp
│ └── ...
├── main.py # Python script for OCR
├── requirements.txt # Python dependencies
├── README.md # Project documentation
└── LICENSE # License file
When running the script, you will see output similar to this:
Processing: images/test_ar.png
Extracted Text from test_ar.png:
السلام عليكم
----------------------------------------
Processing: images/test_la.png
Extracted Text from test_la.png:
Hello World!
----------------------------------------
...
The script depends on the pytesseract and Pillow Python packages. Install them using:
pip install pytesseract pillow
Contributions are welcome! Please feel free to submit issues or pull requests.
This project is licensed under the MIT License. See the LICENSE file for details.
If Tesseract is not found, ensure the tesseract binary is properly installed and accessible from the command line.
If language data is missing, download the required .traineddata file from the Tesseract tessdata repository and place it in your Tesseract tessdata folder.
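A rough way to check for the binary and for language data from Python, using only the standard library, is sketched below. The helper names are hypothetical, and the tessdata folder location varies by system and Tesseract version, so the path is passed in by the caller rather than guessed.

```python
import os
import shutil


def binary_found(name="tesseract"):
    """Return True if an executable called `name` is reachable through PATH."""
    return shutil.which(name) is not None


def traineddata_languages(tessdata_dir):
    """Return language codes that have a .traineddata file in `tessdata_dir`."""
    if not os.path.isdir(tessdata_dir):
        return []
    suffix = ".traineddata"
    return sorted(
        name[: -len(suffix)]
        for name in os.listdir(tessdata_dir)
        if name.endswith(suffix)
    )


if __name__ == "__main__":
    print("tesseract on PATH:", binary_found())
    # Assumption: TESSDATA_PREFIX, if set, points at the tessdata folder itself.
    folder = os.environ.get("TESSDATA_PREFIX", "")
    if folder:
        print("language data found:", traineddata_languages(folder))
```

If `binary_found()` returns False, the install step for Tesseract OCR has not taken effect in the current shell. If a language code is absent from the list, its `.traineddata` file still needs to be placed in that folder.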