This project provides a Python library and CLI tool for interacting with Google Lens's OCR functionality via the API used in Chromium. This allows you to process images and extract text data, including full text, coordinates, and stitched text using various methods.
- Extract the full text: Extract the full text from the image.
- Coordinate Extraction: Extract the text along with its coordinates.
- Stitched text: Restore text from coordinate blocks using various methods:
- Old method: Sequential stitching of text.
- New method: Improved stitching that reconstructs the text line by line. Not recommended for rotated text; use the old method in that case.
- Scan images from URLs: Process images directly from URLs without downloading them manually.
- Cookie Management: Download and manage cookies from a file in Netscape format or directly through the configuration.
- Proxy Support: Supports HTTP, HTTPS, and SOCKS4/5 proxies to make requests over different networks.
PS: Lens has a problem with the way it displays full text, which is why methods have been added that stitch text from coordinates.
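The difference between the two stitching approaches can be illustrated with a minimal sketch. This is not the library's implementation: the function names and the `line_tolerance` parameter are assumptions made for illustration, and the block layout (a `coordinates` list starting with Y, then X) follows the coordinate description later in this README.

```python
def stitch_sequential(blocks):
    """Join block texts in the order they were returned."""
    return " ".join(block["text"] for block in blocks)


def stitch_by_lines(blocks, line_tolerance=0.02):
    """Group blocks into lines by Y, then order each line by X."""
    lines = []
    for block in sorted(blocks, key=lambda b: b["coordinates"][0]):
        y = block["coordinates"][0]
        # Blocks whose Y is close to the current line's Y join that line
        if lines and abs(lines[-1]["y"] - y) <= line_tolerance:
            lines[-1]["blocks"].append(block)
        else:
            lines.append({"y": y, "blocks": [block]})
    return "\n".join(
        " ".join(
            b["text"]
            for b in sorted(line["blocks"], key=lambda b: b["coordinates"][1])
        )
        for line in lines
    )


blocks = [
    {"text": "world", "coordinates": [0.10, 0.60, 0.2, 0.05, 0, 0]},
    {"text": "Hello", "coordinates": [0.10, 0.10, 0.2, 0.05, 0, 0]},
    {"text": "again", "coordinates": [0.30, 0.10, 0.2, 0.05, 0, 0]},
]
print(stitch_sequential(blocks))
print(stitch_by_lines(blocks))
```

Because the grouping in this sketch relies on blocks of one line sharing a similar Y value, it would misgroup rotated text, which is consistent with the caveat about the new method.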
You can install the package using pip:
pip install chrome-lens-py
pip install -U chrome-lens-py
pip install git+https://github.com/bropines/chrome-lens-py.git
Clone the repository and install the package:
git clone https://github.com/bropines/chrome-lens-api-py.git
cd chrome-lens-api-py
pip install -r requirements.txt
pip install .
You can use the lens_scan command from the CLI to process images and extract text data, or you can use the Python API to integrate this functionality into your own projects.
CLI Usage
lens_scan <image_source> <data_type>
- <image_source>: Path to the image file or URL.
- <data_type>: Type of data to extract (see below).
- all: Get all data (full text, coordinates, and stitched text using both methods).
- full_text_default: Get only the default full text.
- full_text_old_method: Get stitched text using the old sequential method.
- full_text_new_method: Get stitched text using the new enhanced method.
- coordinates: Get text along with coordinates.
To extract text using the new method for stitching from a local file:
lens_scan path/to/image.jpg full_text_new_method
To extract text using the new method for stitching from a URL:
lens_scan https://example.com/image.jpg full_text_new_method
To get all available data from a local file:
lens_scan path/to/image.jpg all
To get all available data from a URL:
lens_scan https://example.com/image.jpg all
You can use the -h or --help option to display usage information:
lens_scan -h
Programmatic API Usage
In addition to the CLI tool, this project provides a Python API that can be used in your scripts.
First, import the LensAPI class:
from chrome_lens_py import LensAPI
Instantiate the API:
api = LensAPI()
Process an image:
- Get all data from a local file:
result = api.get_all_data('path/to/image.jpg')
print(result)
- Get all data from a URL:
result = api.get_all_data('https://example.com/image.jpg')
print(result)
- Get the default full text from a local file:
result = api.get_full_text('path/to/image.jpg')
print(result)
- Get the default full text from a URL:
result = api.get_full_text('https://example.com/image.jpg')
print(result)
- Get stitched text using the old method from a local file:
result = api.get_stitched_text_sequential('path/to/image.jpg')
print(result)
- Get stitched text using the old method from a URL:
result = api.get_stitched_text_sequential('https://example.com/image.jpg')
print(result)
- Get stitched text using the new method from a local file:
result = api.get_stitched_text_smart('path/to/image.jpg')
print(result)
- Get stitched text using the new method from a URL:
result = api.get_stitched_text_smart('https://example.com/image.jpg')
print(result)
- Get text with coordinates from a local file:
result = api.get_text_with_coordinates('path/to/image.jpg')
print(result)
- Get text with coordinates from a URL:
result = api.get_text_with_coordinates('https://example.com/image.jpg')
print(result)
You can customize the behavior of the LensAPI by passing a config dictionary when instantiating the class. This allows you to control various aspects of the API, such as headers, proxies, cookie management, debugging, and request timing.
The following keys can be used in the config dictionary:
- header_type: Selects the set of headers to use for requests.
  - 'default': Uses the default set of headers.
  - 'custom': Uses a custom set of headers.
api = LensAPI(config={'header_type': 'custom'})
- proxy: Specifies a proxy server for making requests. Supports HTTP, HTTPS, and SOCKS proxies.
api = LensAPI(config={'proxy': 'socks5://127.0.0.1:2080'})
- cookies: Manages cookies for the session. Can be a file path to a Netscape format cookie file, a cookie string, or a cookie dictionary.
api = LensAPI(config={'cookies': '/path/to/cookie_file.txt'})
api = LensAPI(config={'cookies': '__Secure-ENID=...; NID=...'})
api = LensAPI(config={'cookies': {'__Secure-ENID': {'name': '...', 'value': '...', 'expires': ...}, 'NID': {'name': '...', 'value': '...', 'expires': ...}}})
- sleep_time: Sets the delay in milliseconds between consecutive API requests. This is particularly useful in batch processing to avoid overloading the server.
api = LensAPI(config={'sleep_time': 500})  # Set a 500ms delay
- debug_out: Specifies the file path to save the raw API response for debugging purposes when the logging level is set to DEBUG.
api = LensAPI(config={'debug_out': '/path/to/response_debug.txt'})
Cookie Management
This project supports the management of cookies through various methods.
To receive cookies in Netscape format, you can use the following extensions:
- Chrome (Chromium): Cookie Editor
- Firefox: Cookie Editor
Loading Cookies from a Netscape Format File:
- You can load cookies from a Netscape format file by specifying the file path.
Programmatic API:
config = {
    'headers': {
        'cookie': '/path/to/cookie_file.txt'
    }
}
api = LensAPI(config=config)
CLI:
lens_scan path/to/image.jpg all -c /path/to/cookie_file.txt
Passing Cookies Directly as a String:
- You can also pass cookies directly as a string in the configuration or via CLI.
Programmatic API:
config = {
    'headers': {
        'cookie': '__Secure-ENID=17.SE=-dizH-; NID=511=---bcDwC4fo0--lgfi0n2-'
    }
}
api = LensAPI(config=config)
or
config = {
    'headers': {
        'cookie': {
            '__Secure-ENID': {
                'name': '__Secure-ENID',
                'value': '',
                'expires': 1756858205,
            },
            'NID': {
                'name': 'NID',
                'value': '517=4.......',
                'expires': 1756858205,
            }
        }
    }
}
api = LensAPI(config=config)
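To show how the string form relates to individual cookies, here is a small standalone sketch that splits a cookie string into name/value pairs. The helper `parse_cookie_string` is hypothetical and is not part of the library; it only illustrates that a cookie value may itself contain `=` characters, so each pair should be split on the first `=` only.

```python
def parse_cookie_string(cookie_string):
    """Split 'name=value; name2=value2' into a {name: value} dict."""
    cookies = {}
    for pair in cookie_string.split(";"):
        # partition() splits on the first '=' only, keeping '=' inside values
        name, separator, value = pair.strip().partition("=")
        if separator and name:
            cookies[name] = value
    return cookies


parsed = parse_cookie_string(
    "__Secure-ENID=17.SE=-dizH-; NID=511=---bcDwC4fo0--lgfi0n2-"
)
print(parsed)
```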
Proxy Support
You can make requests through a proxy server using the API or CLI. The library supports HTTP, HTTPS, and SOCKS4/5 proxies.
- Set Proxy in API:
config = {
    'proxy': 'socks5://127.0.0.1:2080'
}
api = LensAPI(config=config)
- Set Proxy in CLI:
lens_scan path/to/image.jpg all -p socks5://127.0.0.1:2080
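Before passing a proxy URL to the library, it can help to check that it uses one of the schemes listed above. The following is a minimal sketch using only the standard library; `check_proxy_url` and `SUPPORTED_SCHEMES` are hypothetical names, and the library may validate proxies differently or not at all.

```python
from urllib.parse import urlparse

# Schemes named in this README; the exact set accepted by the library is an assumption
SUPPORTED_SCHEMES = {"http", "https", "socks4", "socks5"}


def check_proxy_url(proxy_url):
    """Return (scheme, host, port) or raise ValueError for an unusable URL."""
    parsed = urlparse(proxy_url)
    if parsed.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"Unsupported proxy scheme: {parsed.scheme!r}")
    if not parsed.hostname:
        raise ValueError("Proxy URL has no host")
    return parsed.scheme, parsed.hostname, parsed.port


print(check_proxy_url("socks5://127.0.0.1:2080"))
```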
Programmatic API Methods
- get_all_data(image_source): Returns all available data for the given image source (file path or URL).
- get_full_text(image_source): Returns only the full text from the image source.
- get_text_with_coordinates(image_source): Returns text along with its coordinates in JSON format from the image source.
- get_stitched_text_smart(image_source): Returns stitched text using the enhanced method from the image source.
- get_stitched_text_sequential(image_source): Returns stitched text using the basic sequential method from the image source.
Working with Coordinates
In our project, coordinates are used to define the position, size, and rotation of text on an image. Each text region is described by a set of values that help accurately determine where and how to display the text. Here's how these values are interpreted:
- Y Coordinate: The first value in the coordinates array represents the vertical position of the top-left corner of the text region on the image. The value is expressed as a fraction of the image's total height, with 0.0 corresponding to the top edge and 1.0 to the bottom.
- X Coordinate: The second value indicates the horizontal position of the top-left corner of the text region. The value is expressed as a fraction of the image's total width, where 0.0 corresponds to the left edge and 1.0 to the right.
- Width: The third value represents the width of the text region as a fraction of the image's total width. This value determines how much horizontal space the text will occupy.
- Height: The fourth value indicates the height of the text region as a fraction of the image's total height.
- Fifth Parameter: In the current data, this parameter is always zero and appears to be unused. It might be reserved for future use or specific text modifications.
- Sixth Parameter: Specifies the rotation angle of the text region in degrees. Positive values indicate clockwise rotation, while negative values indicate counterclockwise rotation.
Coordinates are measured from the top-left corner of the image. This means that (0.0, 0.0) corresponds to the very top-left corner of the image, while (1.0, 1.0) corresponds to the very bottom-right corner.
{
"text": "Sample text",
"coordinates": [
0.5,
0.5,
0.3,
0.1,
0,
-45
]
}
In this example:
- 0.5: Y coordinate (50% of the image height, text centered vertically).
- 0.5: X coordinate (50% of the image width, text centered horizontally).
- 0.3: width of the text region (30% of the image width).
- 0.1: height of the text region (10% of the image height).
- 0: not used, default value (possibly reserved for future use).
- -45: rotation angle of the text counterclockwise by 45 degrees.
These values are used to accurately place, scale, and display the text on the image.
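As a rough illustration of how the fractional values map onto an actual image, the sketch below scales them by the image dimensions. The helper `to_pixels` and the 1000x800 image size are assumptions made for this example; it takes the field order described above at face value and is not the library's own conversion code.

```python
def to_pixels(coordinates, image_width, image_height):
    """Scale a fractional [y, x, width, height, unused, angle] list to pixels."""
    y, x, width, height, _unused, angle = coordinates
    return {
        "x": round(x * image_width),
        "y": round(y * image_height),
        "width": round(width * image_width),
        "height": round(height * image_height),
        "angle": angle,  # degrees; left unchanged by scaling
    }


region = {"text": "Sample text", "coordinates": [0.5, 0.5, 0.3, 0.1, 0, -45]}
pixel_box = to_pixels(region["coordinates"], image_width=1000, image_height=800)
print(pixel_box)
```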
You can choose the coordinate output format: percentages or pixels. By default, coordinates are output in percentages, but you can switch to pixels using the appropriate settings.
When using the command line, you can specify the coordinate format using the --coordinate-format flag. Acceptable values are 'percent' or 'pixels'.
Usage Examples:
- Output coordinates in percentages (default):
lens_scan image.jpg coordinates
- Output coordinates in pixels:
lens_scan image.jpg coordinates --coordinate-format=pixels
When using the programmatic API, you can pass the coordinate_format parameter to the methods of the LensAPI class. Acceptable values are 'percent' or 'pixels'.
Usage Example:
from chrome_lens_py import LensAPI
api = LensAPI()
# Path to the image
image_path = 'image.jpg'
# Get data with coordinates in pixels
result = api.get_all_data(image_path, coordinate_format='pixels')
print(result)
- When selecting the 'pixels' format, coordinates will be calculated relative to the original dimensions of the image, even if the image was resized for processing.
- If the format is not specified, coordinates are output in percentages by default.
- When working with pixel coordinates, ensure you use the original image for accurate placement of text regions.
Debugging and Logging
When using the CLI tool lens_scan, you can control the logging level using the --debug flag. There are two levels available:
- --debug=info: Enables logging of informational messages, which include general information about the processing steps.
- --debug=debug: Enables detailed debugging messages, including verbose output and the saving of the raw response from the API to a file named response_debug.txt in the current directory.
Example Usage:
- To run with informational logging:
lens_scan path/to/image.jpg all --debug=info
- To run with detailed debugging logging:
lens_scan path/to/image.jpg all --debug=debug
When using --debug=debug, the library will save the raw response from the API to response_debug.txt in the current working directory. This can be useful for deep debugging and understanding the exact response from the API.
When using the API in your Python scripts, you can control the logging level by configuring the logging module and by passing the logging_level parameter when instantiating the LensAPI class.
Example Usage:
import logging
from chrome_lens_py import LensAPI
# Configure logging
logging.basicConfig(level=logging.DEBUG)
# Instantiate the API with the desired logging level
api = LensAPI(logging_level=logging.DEBUG)
# Process an image
result = api.get_all_data('path/to/image.jpg')
print(result)
The logging_level parameter accepts standard logging levels from the logging module, such as logging.INFO, logging.DEBUG, etc.
When the logging level is set to DEBUG, the library will output detailed debugging information and save the raw API response to response_debug.txt in the current directory.
The --debug-out flag lets you specify where the server response is saved when the logging level is DEBUG. By default, as described above, it is saved in the current working directory, that is, the folder from which the command is run.
- INFO level: Provides general information about the process, such as when requests are sent and responses are received.
- DEBUG level: Provides detailed information useful for debugging, including internal state and saved responses.
Configuration Management
When running the CLI tool lens_scan, the application determines settings based on the following priority order (from highest to lowest):
- Command-line arguments (CLI): Options specified directly when running the command have the highest priority.
- Environment variables: If a setting is not specified in the CLI, the application will check for corresponding environment variables.
- Configuration file: If a setting is not found in the CLI arguments or environment variables, the application will look into the configuration file.
- Default values: If a setting is not specified in any of the above, default values are used.
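The priority order above can be sketched as a layered lookup. This is an illustration only, not the application's actual resolution code: `resolve_settings` is a hypothetical helper, and the two defaults shown are the ones stated elsewhere in this README.

```python
from collections import ChainMap

# Defaults stated elsewhere in this README
DEFAULTS = {"coordinate_format": "percent", "sleep_time": 1000}


def resolve_settings(cli_args, env_vars, file_config, defaults=DEFAULTS):
    """Merge setting sources; earlier sources win over later ones."""
    # Drop unset (None) values so they do not shadow lower-priority sources
    layers = [
        {key: value for key, value in layer.items() if value is not None}
        for layer in (cli_args, env_vars, file_config)
    ]
    return dict(ChainMap(*layers, defaults))


settings = resolve_settings(
    cli_args={"coordinate_format": "pixels", "proxy": None},
    env_vars={"proxy": "socks5://127.0.0.1:2080"},
    file_config={"coordinate_format": "percent", "debug": "info"},
)
print(settings)
```

Here the CLI value for `coordinate_format` overrides the one from the file, the proxy comes from the environment because the CLI left it unset, and `sleep_time` falls through to the default.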
- The default configuration file is located in the user's configuration directory, which varies by operating system:
  - Windows: C:\Users\<YourUserName>\.config\chrome-lens-py\config.json
  - Unix/Linux: /home/<YourUserName>/.config/chrome-lens-py/config.json
  - macOS: /Users/<YourUserName>/Library/Application Support/chrome-lens-py/config.json
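If you need to locate the default file from your own script, the paths listed above could be derived roughly as follows. `default_config_path` is a hypothetical helper written for this example; the application's real lookup logic may differ.

```python
import platform
from pathlib import Path


def default_config_path(system=None, home=None):
    """Build the default config path for the given (or current) OS."""
    system = system or platform.system()
    home = Path(home) if home else Path.home()
    if system == "Darwin":  # macOS
        base = home / "Library" / "Application Support"
    else:  # Windows and Unix/Linux both use ~/.config in the list above
        base = home / ".config"
    return base / "chrome-lens-py" / "config.json"


print(default_config_path())
```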
- You can specify a custom configuration file using the --config-file flag:
lens_scan --config-file path/to/your/config.json <image_source> <data_type>
- When a custom configuration file is specified, it is treated as read-only and will not be modified by the application.
The configuration file is a JSON file that can include the following settings:
- proxy: Specify a proxy server to route requests.
{ "proxy": "socks5://username:[email protected]:1080" }
- cookies: Specify cookies to use with requests. This can be a path to a cookies file or a cookie string.
{ "cookies": "path/to/your/cookie_file.txt" }
or
{ "cookies": "__Secure-ENID=17.SE=-dizH-; NID=511=---bcDwC4fo0--lgfi0n2-" }
- coordinate_format: Set the format of output coordinates. Acceptable values are "percent" or "pixels".
{ "coordinate_format": "pixels" }
- debug: Set the logging level. Acceptable values are "info" or "debug".
{ "debug": "debug" }
- data_type: Set the type of output data.
{ "data_type": "all" }
Here is an example of a configuration file that combines several of these settings:
{
"proxy": "socks5://username:[email protected]:1080",
"cookies": "path/to/your/cookie_file.txt",
"coordinate_format": "pixels",
"debug": "debug"
}
- To update the default configuration file with new settings from the CLI, use the -uc or --update-config flag.
lens_scan <image_source> <data_type> [options] -uc
- Note: The configuration file will only be updated if it's the default configuration file (i.e., not specified via --config-file).
- Only specific settings will be updated:
  - Settings that can be updated: coordinate_format, debug, data_type
  - Settings that will not be updated: proxy, cookies, image_source
- This allows you to persist certain settings across runs without affecting critical configurations like proxy settings or cookies.
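The selective update described above amounts to copying only an allow-listed set of keys into the saved configuration. A minimal sketch follows; `PERSISTED_KEYS` and `merge_persisted_settings` are hypothetical names, and the application's own update logic may differ.

```python
# Keys this README lists as updatable via -uc
PERSISTED_KEYS = {"coordinate_format", "debug", "data_type"}


def merge_persisted_settings(saved_config, run_settings):
    """Return a copy of saved_config updated with allow-listed run settings."""
    updated = dict(saved_config)
    for key in PERSISTED_KEYS:
        if run_settings.get(key) is not None:
            updated[key] = run_settings[key]
    return updated


saved = {"proxy": "socks5://127.0.0.1:2080", "coordinate_format": "percent"}
run = {"coordinate_format": "pixels", "proxy": "http://10.0.0.1:8080"}
print(merge_persisted_settings(saved, run))
```

In this sketch the saved proxy is left untouched even though the run used a different one.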
- Updating the coordinate format in the default configuration file:
lens_scan path/to/image.jpg all --coordinate-format=pixels -uc
  - This command will set the coordinate format to pixels for the current run and update the default configuration file so that future runs will also use pixels as the coordinate format.
- Using a proxy without updating the configuration file:
lens_scan path/to/image.jpg all -p socks5://127.0.0.1:2080
  - The proxy setting will be used for this run but will not be saved to the configuration file.
- Specifying a custom configuration file (read-only):
lens_scan --config-file path/to/config.json path/to/image.jpg all
  - The application will use settings from the specified configuration file but will not modify it, even if the -uc flag is used.
You can also specify settings via environment variables:
- LENS_SCAN_PROXY: Set the proxy server.
export LENS_SCAN_PROXY="socks5://username:[email protected]:1080"
- LENS_SCAN_COOKIES: Provide cookies.
export LENS_SCAN_COOKIES="__Secure-ENID=17.SE=-dizH-; NID=511=---"
- LENS_SCAN_CONFIG_PATH: Specify a custom configuration file.
export LENS_SCAN_CONFIG_PATH="path/to/your/config.json"
Batch Processing
This project supports batch processing of images when a directory path is provided instead of a single image file. The application will process all image files in the specified directory.
To perform batch processing via the command line, simply provide the path to the directory containing the images instead of a single image file.
lens_scan path/to/directory <data_type> [options]
- path/to/directory: Path to the directory containing image files.
- <data_type>: Type of data to extract (e.g., all, full_text_default, etc.).
- [options]: Additional options such as --out-txt.
Example:
lens_scan /path/to/images all --out-txt=per_file
The --out-txt flag allows you to control how the output is saved when processing multiple images:
- --out-txt=per_file: Outputs each result to a separate text file based on the image name within the same directory.
- --out-txt=filename.txt: Outputs all results into a single text file with the specified name within the same directory.
- No --out-txt flag: By default, all results are saved into a file named output.txt within the same directory.
Examples:
- Output to Separate Files Per Image:
lens_scan /path/to/images all --out-txt=per_file
This command processes all images in /path/to/images and saves each result to a separate text file named after the image (e.g., image1.txt, image2.txt).
- Output All Results to a Single File:
lens_scan /path/to/images all --out-txt=results.txt
This command processes all images and saves all results into results.txt within the same directory.
- Default Output (output.txt):
lens_scan /path/to/images all
Without specifying --out-txt, the results are saved into output.txt within the same directory.
When outputting to a single file (default behavior or when specifying a filename with --out-txt), the format of the output file is:
#filename1.jpg
Extracted text from filename1.jpg
#filename2.png
Extracted text from filename2.png
...
Each image's extracted text is prefixed with a # followed by the filename, and the text retains the original formatting, including newline characters.
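If you want to read such a combined file back into per-image results, a small parser along these lines may be enough. Both helpers are hypothetical and written for this example; they assume each entry starts with a line beginning with `#`, exactly as in the layout shown above, so extracted text that itself contains a line starting with `#` would be misread as a new entry.

```python
def format_results(results):
    """Render {filename: text} in the '#filename' layout shown above."""
    return "\n".join(f"#{name}\n{text}" for name, text in results.items())


def parse_results(content):
    """Split a combined output file back into {filename: text}."""
    collected = {}
    current = None
    for line in content.splitlines():
        if line.startswith("#"):
            current = line[1:]
            collected[current] = []
        elif current is not None:
            collected[current].append(line)
    return {name: "\n".join(lines) for name, lines in collected.items()}


original = {"filename1.jpg": "Hello\nworld", "filename2.png": "Second image"}
round_trip = parse_results(format_results(original))
print(round_trip)
```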
To avoid overwhelming the API and to comply with rate limiting policies, the library introduces a delay between processing each image. By default, this sleep time is set to 1000 milliseconds (1 second). You can adjust this delay using the -st or --sleep-time flag, specifying the time in milliseconds.
Example:
lens_scan /path/to/images all -st 500
This command sets the sleep time to 500 milliseconds between processing each image.
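The effect of the sleep time can be pictured as a pause between consecutive items in a loop. The sketch below is a generic illustration, not the library's batch code; `process_with_delay` is a hypothetical helper and `str.upper` stands in for the real per-image request.

```python
import time


def process_with_delay(items, process, sleep_time_ms=1000):
    """Apply process() to each item, pausing between consecutive items."""
    results = {}
    for index, item in enumerate(items):
        if index > 0:
            # Convert milliseconds to the seconds expected by time.sleep()
            time.sleep(sleep_time_ms / 1000)
        results[item] = process(item)
    return results


demo = process_with_delay(["a.jpg", "b.png"], str.upper, sleep_time_ms=10)
print(demo)
```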
You can also perform batch processing using the Python API by providing a directory path to the methods.
Example:
from chrome_lens_py import LensAPI

api = LensAPI(sleep_time=500)  # Set sleep time to 500 milliseconds

# Path to the directory containing images
directory_path = '/path/to/images'

# Process the directory to extract full text from each image
results = api.get_full_text(directory_path)

# Iterate through the results
for filename, text in results.items():
    # A failed image is stored as a dict with an 'error' key, so check the type first
    if isinstance(text, dict) and 'error' in text:
        print(f"Error processing {filename}: {text['error']}")
    else:
        print(f"# {filename}")
        print(text)
        print()
- Supported Image Files: Only image files with supported MIME types will be processed. Non-image files or unsupported formats will be ignored.
- Adjusting Sleep Time: The sleep time between requests can be adjusted to meet your needs, but be cautious when reducing it to prevent being rate-limited by the API.
- Error Handling: If an error occurs while processing an image, the error message will be stored in the results under that filename.
- Output Files: When using --out-txt=per_file, the output text files will be saved in the same directory as the images, with the same base filename and a .txt extension.
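To preview which files in a directory would count as images, you can filter by MIME type with the standard library. This is a sketch under the assumption that the file extension determines the MIME type; `list_image_files` is a hypothetical helper, and the set of formats the library actually accepts may be narrower.

```python
import mimetypes
import tempfile
from pathlib import Path


def list_image_files(directory):
    """Return files whose guessed MIME type starts with 'image/'."""
    images = []
    for path in sorted(Path(directory).iterdir()):
        mime_type, _encoding = mimetypes.guess_type(path.name)
        if path.is_file() and mime_type and mime_type.startswith("image/"):
            images.append(path)
    return images


# Demonstrate on a throwaway directory containing one non-image file
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.jpg", "notes.txt", "c.png"):
        (Path(tmp) / name).touch()
    found = [path.name for path in list_image_files(tmp)]
print(found)
```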
/chrome-lens-api-py
│
├── /src
│ ├── /chrome_lens_py
│ │ ├── __init__.py # Package initialization
│ │ ├── constants.py # Constants used in the project
│ │ ├── utils.py # Utility functions
│ │ ├── image_processing.py # Image processing module
│ │ ├── request_handler.py # API request handling module
│ │ ├── text_processing.py # Text processing module
│ │ ├── lens_api.py # API interface for use in other scripts
│ │ └── main.py # CLI tool entry point
│
├── setup.py # Installation setup
├── README.md # Project description and usage guide
└── requirements.txt # Project dependencies
Special thanks to dimdenGD for the method of text extraction used in this project. You can check out their work on the chrome-lens-ocr repository. This project is inspired by their approach to leveraging Google Lens OCR functionality.
- Add scan by url
- Add output in pixels
- Move all methods from chrome-lens-ocr
- cookie!?
- Do everything beautifully, and not like 400 lines of code, cut into modules by GPT chat
- Something else very, very important...
This project is licensed under the MIT License. See the LICENSE file for more details.
This project is intended for educational purposes only. The use of Google Lens OCR functionality must comply with Google's Terms of Service. The author of this project is not responsible for any misuse of this software or for any consequences arising from its use. Users are solely responsible for ensuring that their use of this software complies with all applicable laws and regulations.