GPTS-Crawler-DataSet

  1. Obtain detailed metadata for each GPT from various internet channels.
  2. Make the collected GPTS dataset public.

https://www.topgpts.club/


Features

  • High success rate when crawling.
  • Retries automatically when an exception occurs (a minimal retry sketch follows this list).
  • Supports resuming from breakpoints, so an interrupted run continues where it left off.
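
For illustration, here is a minimal TypeScript retry helper with exponential backoff; the attempt count and delays are assumptions, not the crawler's actual settings.

// Retry an async operation a few times with exponential backoff.
async function withRetry<T>(fn: () => Promise<T>, attempts = 3, delayMs = 1000): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Wait 1s, 2s, 4s, ... before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, delayMs * 2 ** attempt));
    }
  }
  throw lastError;
}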

GPTS Dataset

gizmos.jsonl file

Each line contains complete metadata of a GPT, formatted as follows:

{
   "id": "g-09h5uQiFC",
   "organization_id": "org-DBPI2J2yWFv4MX06zS0084p2",
   "short_url": "g-09h5uQiFC-ms-roxana",
   "author": {
      "user_id": "user-D1v1q4QlhTH4hw9dGQZFxH1O",
      "display_name": "robotsbuildingeducation.com",
      "link_to": "https://robotsbuildingeducation.com",
      "selected_display": "website",
      "is_verified": true
   },
   "voice": {
      "id": "ember"
   },
   "workspace_id": null,
   "model": null,
   "instructions": null,
   "settings": null,
   "display": {
      "name": "Ms. Roxana",
      "description": "The AI Mentor",
      "welcome_message": "Hello",
      "prompt_starters": [
         "Hola... let's learn 😁"
      ],
      "profile_picture_url": "https://files.oaiusercontent.com/file-qcwptAh58EBhwh7c9gs3om63?se=2123-10-15T10%3A53%3A35Z&sp=r&sv=2021-08-06&sr=b&rscc=max-age%3D31536000%2C%20immutable&rscd=attachment%3B%20filename%3DEBOOK%2520%25282%2529.png&sig=ANxSurYw7dfGjpzlehF1PWJKQB4kp2Uok3DHfAw0Trg%3D",
      "categories": []
   },
   "share_recipient": "marketplace",
   "updated_at": "2023-11-17T02:09:37.466844+00:00",
   "last_interacted_at": null,
   "tags": [
      "public",
      "reportable"
   ],
   "version": null,
   "live_version": null,
   "training_disabled": null,
   "allowed_sharing_recipients": null,
   "review_info": null,
   "appeal_info": null,
   "vanity_metrics": null
}
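
As a reading aid, here is a small TypeScript sketch that streams gizmos.jsonl and parses each line; the GizmoRecord interface only covers the fields shown above and is an assumption about the full schema.

import { createReadStream } from "node:fs";
import { createInterface } from "node:readline";

// Partial shape of one gizmos.jsonl record; fields not listed are kept loosely typed.
interface GizmoRecord {
  id: string;
  organization_id: string;
  short_url: string;
  author: { user_id: string; display_name: string; link_to: string | null; is_verified: boolean };
  display: { name: string; description: string; prompt_starters: string[]; categories: string[] };
  updated_at: string;
  tags: string[];
  [key: string]: unknown;
}

// Stream the JSONL file line by line and parse each non-empty line.
async function readGizmos(path = "gizmos.jsonl"): Promise<GizmoRecord[]> {
  const records: GizmoRecord[] = [];
  const rl = createInterface({ input: createReadStream(path), crlfDelay: Infinity });
  for await (const line of rl) {
    if (line.trim().length === 0) continue;
    records.push(JSON.parse(line) as GizmoRecord);
  }
  return records;
}

readGizmos().then((gizmos) => console.log(`Loaded ${gizmos.length} GPTs`));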

Crawling Data

  1. Ensure Node.js >= 16 is installed.

  2. Clone the project:

git clone https://github.com/ahaapple/GPTS-Crawler-Dataset

  3. Install dependencies:

npm i
npx playwright install

  4. Update the gpts-url-list file.

  5. Crawl GPTS metadata:

npm start

  6. Newly crawled GPTS metadata is appended to the gizmos.jsonl file (a simplified sketch of this append flow follows).
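
To illustrate the flow, here is a simplified Playwright sketch, not the project's actual crawler: it opens one GPT share URL and appends a record to gizmos.jsonl. The example URL, the extraction via the page title, and the record fields are placeholders.

import { chromium } from "playwright";
import { appendFileSync } from "node:fs";

// Open one GPT share page and append whatever we extract as a single JSONL line.
async function crawlOne(url: string): Promise<void> {
  const browser = await chromium.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: "domcontentloaded" });
    const title = await page.title(); // placeholder for the real metadata extraction
    const record = {
      short_url: url.split("/").pop(),
      name: title,
      crawled_at: new Date().toISOString(),
    };
    appendFileSync("gizmos.jsonl", JSON.stringify(record) + "\n");
  } finally {
    await browser.close();
  }
}

crawlOne("https://chat.openai.com/g/g-09h5uQiFC-ms-roxana").catch(console.error);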

Crawling GitHub Issue Comment

First, modify the following configuration in the issue file:

const owner = 'airyland';
const repo = 'gptshunter.com';
const issueNumber = 1;
// See https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens for how to create a personal access token.
const token = 'your github token';

Then execute:

npm run issue
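
For reference, here is a TypeScript sketch of roughly what the issue crawler needs to do, not the project's actual script: list the comments on the configured issue through the GitHub REST API. It assumes global fetch (Node.js 18+) and a token in the GITHUB_TOKEN environment variable.

const owner = "airyland";
const repo = "gptshunter.com";
const issueNumber = 1;
const token = process.env.GITHUB_TOKEN ?? "your github token";

// List comments on the issue; each comment body may contain GPT share URLs.
async function listIssueComments(): Promise<string[]> {
  const url = `https://api.github.com/repos/${owner}/${repo}/issues/${issueNumber}/comments?per_page=100`;
  const res = await fetch(url, {
    headers: {
      Accept: "application/vnd.github+json",
      Authorization: `Bearer ${token}`,
    },
  });
  if (!res.ok) throw new Error(`GitHub API returned ${res.status}`);
  const comments = (await res.json()) as Array<{ body: string }>;
  return comments.map((c) => c.body);
}

listIssueComments().then((bodies) => console.log(`${bodies.length} comments`)).catch(console.error);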

Deduplicate gpts-urls File

npm run deduplicate-urls

Deduplicate GPTS Dataset File

npm run deduplicate-gpts
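
Both deduplication steps boil down to the same idea. The sketch below assumes gpts-url-list is deduplicated by exact line and gizmos.jsonl by the "id" field, which may differ from the actual scripts.

import { readFileSync, writeFileSync } from "node:fs";

// Keep only the first occurrence of each URL line.
function dedupeUrls(path = "gpts-url-list"): void {
  const lines = readFileSync(path, "utf8").split("\n").map((l) => l.trim()).filter(Boolean);
  writeFileSync(path, [...new Set(lines)].join("\n") + "\n");
}

// Keep only the first record for each GPT id.
function dedupeGizmos(path = "gizmos.jsonl"): void {
  const seen = new Set<string>();
  const unique: string[] = [];
  for (const line of readFileSync(path, "utf8").split("\n").filter(Boolean)) {
    const { id } = JSON.parse(line) as { id: string };
    if (!seen.has(id)) {
      seen.add(id);
      unique.push(line);
    }
  }
  writeFileSync(path, unique.join("\n") + "\n");
}

dedupeUrls();
dedupeGizmos();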

Contributions Welcome

We welcome everyone to contribute to the GPTS public dataset. You can contribute in the following ways:

  1. Comment your GPTS URL in the issue at ahaapple#1.
  2. Directly update the gpts-url-list file with your GPTS URL.
  3. Directly update the gizmos.jsonl file with your crawled metadata.

Roadmap

  • Support more data sources.
  • Handle cases where the gizmos.jsonl file becomes very large.

Thanks To

  1. gpts-works: https://github.com/all-in-aigc/gpts-works
  2. gptshunter issue data source: airyland/gptshunter.com#1
  3. GPTHub data source: https://github.com/lencx/GPTHub/blob/main/gpthub.json
