Pinned
-
llava-Qwen2-7B-Instruct-Chinese-CLIP
Public. The llava-Qwen2-7B-Instruct-Chinese-CLIP model, with enhanced Chinese text recognition and meme-meaning recognition, approaching the recognition level of gpt4o and claude-3.5-sonnet.
-
chinese-meme-description-dataset
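Public. To help improve the performance of small models on image-text description tasks, this work combines two high-quality Chinese meme datasets and uses six large language models (LLMs), Gemini-1.5-pro, Gemini-1.5-flash, Gemini-1.0-pro-vision, gpt4o, claude-3.5-sonnet, and Yi-Vision, to produce high-quality annotations, generating rich image-text descriptions.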
-
ImageText-Question-answer-pairs-58K-Claude-3.5-Sonnnet
Public. From the VisualGenome dataset V1.2, 21717 images were randomly selected. Using the Claude-3-opus-20240229 and Claude-3-sonnet-20240620 models, a total of 58312 question-answer pairs were generated,…
Python
-
podcast-player
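Public. A podcast player built with PyQt5, featuring audio playback, subtitle display, and real-time translation. It supports three translation services (Google Translate, Gemini, and SiliconCloud), displays bilingual subtitles in real time, and supports word-level synchronized highlighting. The player also offers subtitle caching and a playback history, giving users a smooth podcast-learning experience.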
Python 4
-
VisualDataset100K
Public. VisualDataset100K: A comprehensive image question-answering dataset created using large vision-language models. It includes 100K detailed image descriptions, 100K & 58K Q&A pairs, and datasets for …
Python 1