Created using Python, JavaScript, and image-to-text with ChatGPT Vision
-
Ensure you are on Node 18.17.0 or higher
-
You can use NVM to change the version of Node:
nvm use 18.17.0
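
If you want to confirm the Node version from a script before starting, a minimal sketch follows. `isVersionAtLeast` is a hypothetical helper written for this example and is not part of `web_agent.js`; it only handles plain dotted versions such as `18.17.0`.

```javascript
// Hypothetical helper (not part of web_agent.js): compares two dotted
// version strings number by number, treating missing parts as 0.
function isVersionAtLeast(current, minimum) {
  const a = current.split('.').map(Number);
  const b = minimum.split('.').map(Number);
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const x = a[i] || 0;
    const y = b[i] || 0;
    if (x !== y) return x > y;
  }
  return true; // all parts equal
}

// process.versions.node is the running Node version without the leading "v".
if (!isVersionAtLeast(process.versions.node, '18.17.0')) {
  console.warn(
    `Node ${process.versions.node} is older than 18.17.0; try: nvm use 18.17.0`
  );
}
```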
- Ensure Python 3 is available
- Create an ENV with your OpenAI API key, or use OpenAI's virtual environment
- To create the OpenAI virtual environment, run the command below. This gives you an isolated environment for every package you might need
python3 -m venv openai-env
- Then activate the environment using:
source openai-env/bin/activate
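
For the API key mentioned in the setup steps above, one way to fail early when it is missing is sketched below. The variable name `OPENAI_API_KEY` and the helper `getApiKey` are assumptions made for this example; use whatever name `web_agent.js` actually reads.

```javascript
// Hypothetical helper (not part of web_agent.js): reads the key from an
// environment map, defaulting to process.env.
// OPENAI_API_KEY is an assumed variable name.
function getApiKey(env = process.env, name = 'OPENAI_API_KEY') {
  const key = (env[name] || '').trim();
  if (!key) {
    throw new Error(
      `${name} is not set; add it to your environment before running the agent`
    );
  }
  return key;
}
```

Calling `getApiKey()` at startup turns a missing key into an immediate, readable error instead of a failed request later in the chat.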
Getting started with the agent is fairly simple.
Run the following file in the terminal to begin chatting with the agent:
node web_agent.js
The actions the web-scraping agent can perform are as follows:
- The agent does an initial Google search based on your first question.
- Using Puppeteer, it takes a screenshot of the current browser page and changes the HTML to put a "red border" around links and buttons.
- On your next question, an underlying process uses ChatGPT Vision to tell Puppeteer which link to click next, based on what is relevant to your request.
- The agent repeats the screenshot and click steps until it can no longer click on any more links. (This is a drawback I am seeking to solve.)
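
The loop described above can be sketched roughly as follows. This is not the code in `web_agent.js`: `runAgent`, `browser`, `askModel`, and `maxSteps` are hypothetical names used for illustration, with the browser and the model passed in as plain objects so the flow can be read (and tried) without Puppeteer or an API key. In a real run, `browser.screenshot` might wrap Puppeteer's `page.screenshot({ encoding: 'base64' })`. The `maxSteps` cap is an assumption added here as one possible way to bound the loop, not the author's solution to the drawback.

```javascript
// Builds the user message for a vision-capable chat model: the question plus
// the screenshot as a base64 data URL, using OpenAI's content-part format
// for image inputs.
function buildVisionMessage(question, screenshotBase64) {
  return {
    role: 'user',
    content: [
      { type: 'text', text: question },
      {
        type: 'image_url',
        image_url: { url: `data:image/png;base64,${screenshotBase64}` },
      },
    ],
  };
}

// Hypothetical loop. `browser` must provide highlightLinks(), screenshot()
// and clickLink(text); `askModel(message)` must resolve to the text of the
// link to click, or a falsy value when nothing relevant is left.
async function runAgent(question, browser, askModel, maxSteps = 5) {
  const clickedLinks = [];
  for (let step = 0; step < maxSteps; step++) {
    await browser.highlightLinks();           // draw red borders on links/buttons
    const shot = await browser.screenshot();  // base64-encoded PNG
    const linkText = await askModel(buildVisionMessage(question, shot));
    if (!linkText) break;                     // model found nothing to click
    const clicked = await browser.clickLink(linkText);
    if (!clicked) break;                      // link could not be clicked
    clickedLinks.push(linkText);
  }
  return clickedLinks;
}
```

Passing the browser and model in as arguments keeps the stopping rules (no link chosen, click failed, step cap reached) in one place, which may make it easier to experiment with other ways of ending the loop.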