myGPTReader

myGPTReader is a Slack bot that can read any webpage, ebook, or document and summarize it with chatGPT. It can also talk to you by voice, using the content in the channel.

It is still in development, but you can try it out by joining this channel.

The exciting part is that the development of this project is also paired with chatGPT. I document the development process in this CDDR file.

Features

Integrated with slack bot
- Bot replies to messages in the same thread
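
A minimal sketch of how a reply can be kept in the same thread, assuming the bot receives a Slack message event as a dict and posts with Slack's `chat.postMessage`. The helper name `build_thread_reply` is hypothetical and not taken from this repository.

```python
def build_thread_reply(event: dict, text: str) -> dict:
    """Build the arguments for chat.postMessage so the reply lands in the thread."""
    # A message that is already inside a thread carries thread_ts;
    # a top-level message only has ts, which then becomes the thread root.
    thread_ts = event.get("thread_ts") or event["ts"]
    return {"channel": event["channel"], "text": text, "thread_ts": thread_ts}
```

The returned dict can be passed as keyword arguments to whichever Slack client the bot uses.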
Support web page reading with chatGPT
- Consider using a Cloudflare Worker to scrape the HTML content
  - Self-hosting Web Scraper
  - Restrict access to the web scraper: only the API server may reach it, via Cloudflare Access
- Consider using a headless browser to scrape web page content such as a Twitter thread
- Consider using OCR to scrape web page content (a web crawler takes the screenshot, then OCR extracts the text)
  - ~~Azure OCR~~
  - Google Vision
  - may use GPT4
Support RSS reading with chatGPT
- RSS is a bunch of links, so it is equivalent to reading a web page to get the content.
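
Since a feed reduces to a list of links, the first step can be sketched with the standard library alone. This assumes an RSS 2.0 feed with `<item><link>` elements (Atom feeds are not handled), and `extract_rss_links` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import List


def extract_rss_links(rss_xml: str) -> List[str]:
    """Return the link of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link and link.strip():
            links.append(link.strip())
    return links
```

Each returned URL can then go through the same path as a web page pasted into the channel.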
~~Support newsletter reading with chatGPT~~
- Most newsletters are public and can be accessed online, so we can just give the url to the slack bot.
Prompt fine-tune
- Support for custom prompt
- Show prompt templates by slack app slash commands
- Auto collect good prompts to the #gpt-prompt channel by message shortcut
Cost saving
- by caching the web page llama index
  - ~~Consider using sqlite-vss to store and search the text embeddings~~
  - ~~Use chromadb to store and search the text embeddings~~
  - Use the llama index file to restore the index
- Consider using sentence-transformers or txtai to generate embeddings (vectors)
  - Not as good as the OpenAI embeddings, so rolled back to the OpenAI embeddings. Enabling custom embeddings also requires a server with at least 2GB of memory, which still increases the cost.
- Consider fine-tuning the chunk size of the index nodes and the prompt to save cost
  - If the chunk size is too big, the index nodes become too large and the cost is high.
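
The caching idea above can be sketched as one index file per URL, named by a hash of the URL. This is only an illustration: `index_cache_path` and `get_or_build_index` are hypothetical names, and the `build`, `load`, and `save` callables stand in for whatever the index library provides, so no real llama index API is shown here.

```python
import hashlib
from pathlib import Path


def index_cache_path(url: str, cache_dir: str = "index_cache") -> Path:
    """Map a URL to a stable file name so the same page reuses one index file."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return Path(cache_dir) / f"{digest}.json"


def get_or_build_index(url, build, load, save, cache_dir="index_cache"):
    """Restore the index from disk if present, otherwise build and store it."""
    path = index_cache_path(url, cache_dir)
    if path.exists():
        return load(path)  # cache hit: no scraping or embedding cost
    index = build(url)     # cache miss: pay the cost once
    path.parent.mkdir(parents=True, exist_ok=True)
    save(index, path)
    return index
```

A hash is used instead of the raw URL because URLs contain characters that are not safe in file names.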
Bot can read historical messages from the same thread, thus providing context to chatGPT
- Changing the number of output tokens
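
One way to turn thread history into chat context, sketched under assumptions: each Slack message is a dict with `user` and `text`, the bot's own user ID is known, and a character budget stands in for a real token count. `thread_to_chat_messages` is a hypothetical name.

```python
from typing import Dict, List


def thread_to_chat_messages(thread: List[dict], bot_user_id: str,
                            max_chars: int = 4000) -> List[Dict[str, str]]:
    """Convert thread messages (oldest first) into chat messages within a budget."""
    messages = []
    used = 0
    # Walk from newest to oldest so the most recent turns survive trimming.
    for msg in reversed(thread):
        text = msg.get("text", "")
        if used + len(text) > max_chars:
            break
        role = "assistant" if msg.get("user") == bot_user_id else "user"
        messages.append({"role": role, "content": text})
        used += len(text)
    messages.reverse()  # restore chronological order
    return messages
```

Keeping the input smaller leaves more room for output tokens within the model's context limit.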
Index fine-tune
- Use the GPTListIndex to summarize multiple URLs
- Use the GPTTreeIndex with summarize mode to summarize a single web page
Bot regularly sends hot ~~summaries (expensive)~~ news to the Slack channel (#daily-news)
- ~~Refer to this approach~~
  - World News
    - Zhihu daily hot answers
    - V2EX daily hot topics
    - 1point3acres daily hot topics
    - Reddit world hot news
  - Dev News
    - Hacker News daily hot topics
    - Product Hunt daily hot topics
  - Invest News
    - Xueqiu daily hot topics
    - Jisilu daily hot topics
Support file reading and analysis 💥
- Because of the high cost, use a Slack user ID whitelist to restrict access to this feature
- Cache the file Documents to save extraction cost
- EPUB
- DOCX
- MD
- TEXT
- PDF
  - Use Google Vision to handle the PDF reading
- Image
  - may use GPT4
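
The whitelist check and the supported formats can be combined into a single gate, sketched below. The extension set is an assumption derived from the list above (`.txt` for TEXT is a guess, and images are left out), and `can_read_file` is a hypothetical name.

```python
from pathlib import Path
from typing import Set

# Extensions assumed for the document formats listed above.
SUPPORTED_EXTENSIONS = {".epub", ".docx", ".md", ".txt", ".pdf"}


def can_read_file(user_id: str, filename: str, whitelist: Set[str]) -> bool:
    """Return True only for whitelisted users uploading a supported file type."""
    if user_id not in whitelist:
        return False
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```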
Support voice reading ~~with self-hosting whisper~~
- (whisper -> chatGPT -> azure text2speech) for language speaking practice 💥
- Supported languages
  - Chinese
  - English
    - 🇺🇸
    - 🇬🇧
    - 🇦🇺
    - 🇮🇳
  - Japanese
  - German
Integrated with Azure OpenAI Service
User access limit
- Limit the number of requests to the bot per user per day to save cost
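
A per-user daily limit can be sketched with an in-memory counter keyed by user and date. This is an illustration only: `DailyLimiter` is a hypothetical class, and counts kept in memory are lost when the server restarts.

```python
from collections import defaultdict
from datetime import date
from typing import Optional


class DailyLimiter:
    """Allow at most max_per_day requests per user per calendar day."""

    def __init__(self, max_per_day: int):
        self.max_per_day = max_per_day
        self._counts = defaultdict(int)  # (user_id, day) -> requests so far

    def allow(self, user_id: str, today: Optional[date] = None) -> bool:
        key = (user_id, today or date.today())
        if self._counts[key] >= self.max_per_day:
            return False
        self._counts[key] += 1
        return True
```

Because the date is part of the key, the count resets on its own at the start of each day.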
Support discord bot ❓
Rewrite the code in Typescript ❓
Upgrade chat model (gpt-3.5-turbo) to GPT4 (gpt-4-0314) 💥
Documentation
Publish the bot so that it can be used in other workspaces
- Slack marketplace

peixiaobin/myGPTReader

Folders and files

Latest commit

History

Repository files navigation

myGPTReader

Features

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages