python-tesseract-explorer

Python Tesseract Explorer

This repository demonstrates text extraction from images with Tesseract OCR in Python. The script runs the pytesseract library over a set of sample images in several formats (PNG, JPEG, BMP, GIF, WebP) and prints the text found in each.

Features


Installation

Follow these steps to set up and run the project:

1. Clone the Repository

git clone https://github.com/LahcenEzzara/python-tesseract-explorer.git
cd python-tesseract-explorer

2. Set Up Python Environment

It is recommended to use a virtual environment to manage dependencies.

python3 -m venv venv
source venv/bin/activate  # On Linux/Mac
# OR
venv\Scripts\activate  # On Windows

3. Install Dependencies

Install the required Python libraries:

pip install -r requirements.txt

4. Install Tesseract OCR

pytesseract is only a wrapper, so the Tesseract OCR engine itself must be installed on your system. On Ubuntu, you can use:

sudo apt update
sudo apt install tesseract-ocr
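
To confirm the installation before running any OCR, it can help to check that the executable is reachable. The following is a minimal sketch, not part of this repository: the helper name find_tesseract is hypothetical, and it assumes the executable is called tesseract and is on your PATH.

```python
import shutil


def find_tesseract(binary="tesseract"):
    """Return the full path of the Tesseract executable, or None if it is not on PATH.

    `find_tesseract` is a hypothetical helper for illustration only.
    """
    return shutil.which(binary)


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract not found on PATH; install it before running main.py")
    else:
        print(f"tesseract found at {path}")
```

If the executable lives somewhere that is not on PATH, pytesseract can be pointed at it by setting pytesseract.pytesseract.tesseract_cmd to the full path.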

5. Install Additional Language Support

To process Arabic or other non-English text, install the matching Tesseract language data. For example, on Ubuntu the Arabic data is installed with:

sudo apt install tesseract-ocr-ara
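
To see which language data Tesseract can actually find, you can list the installed languages with the tesseract --list-langs command. The sketch below wraps that command; the function names parse_langs and installed_languages are hypothetical, and the parsing assumes that language codes never contain spaces while the header line does.

```python
import subprocess


def parse_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`.

    Assumption: codes such as 'ara' or 'eng' contain no spaces, while the
    header line ('List of available languages ...') does, so any line with
    a space is skipped.
    """
    codes = []
    for line in output.splitlines():
        line = line.strip()
        if line and " " not in line:
            codes.append(line)
    return codes


def installed_languages(binary="tesseract"):
    """Run `tesseract --list-langs` and return the language codes it reports."""
    result = subprocess.run(
        [binary, "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # capture the list wherever Tesseract writes it
        text=True,
        check=True,
    )
    return parse_langs(result.stdout)


if __name__ == "__main__":
    try:
        langs = installed_languages()
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("Could not run tesseract --list-langs")
    else:
        print("Arabic data installed:", "ara" in langs)
```

A language code that appears in this list can then be passed to pytesseract through the lang argument, for example pytesseract.image_to_string(image, lang="ara").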

Usage

The main script processes images in the images/ folder and extracts text from each. To run the script:

python main.py

The extracted text for each image will be printed in the terminal.
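
The flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual contents of main.py: the function names, the set of extensions (taken from the sample files in this README), the default language "eng", and the separator width are all hypothetical choices.

```python
from pathlib import Path

# Assumption: extensions taken from the sample files listed in this README.
IMAGE_EXTENSIONS = {".png", ".jpg", ".bmp", ".gif", ".webp"}


def find_images(folder="images"):
    """Return the image files in `folder`, sorted by path (empty if the folder is missing)."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def extract_text(path, lang="eng"):
    """Run Tesseract on one image and return the recognized text."""
    # Imported here so find_images() stays usable without the OCR stack installed.
    import pytesseract
    from PIL import Image

    with Image.open(path) as image:
        return pytesseract.image_to_string(image, lang=lang)


def main(folder="images"):
    for path in find_images(folder):
        print(f"Processing: {path}")
        print(f"Extracted Text from {path.name}:")
        print(extract_text(path))
        print("-" * 40)


if __name__ == "__main__":
    main()
```

Keeping file discovery separate from the OCR call makes it easy to test the folder scan without Tesseract installed, and to pass a different lang value (such as "ara") for images in other languages.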


Directory Structure

python-tesseract-explorer/
├── images/                 # Folder containing test images
│   ├── test_ar.png
│   ├── test_la.png
│   ├── test-european.jpg
│   ├── test-small.jpg
│   ├── test.bmp
│   ├── test.gif
│   ├── test.jpg
│   ├── test.png
│   ├── test.webp
│   └── ...
├── main.py                 # Python script for OCR
├── requirements.txt        # Python dependencies
├── README.md               # Project documentation
└── LICENSE                 # License file

Example Output

When running the script, you will see output similar to this:

Processing: images/test_ar.png
Extracted Text from test_ar.png:
السلام عليكم

----------------------------------------
Processing: images/test_la.png
Extracted Text from test_la.png:
Hello World!

----------------------------------------
...

Dependencies

The project depends on two Python packages, pytesseract and Pillow. Install them using:

pip install pytesseract pillow

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.


License

This project is licensed under the MIT License. See the LICENSE file for details.


Notes