Commit
Merge pull request eosphoros-ai#11 from csunny/dev
Add architecture design
csunny authored May 4, 2023
2 parents e043467 + 52698a8 commit eb48333
Showing 4 changed files with 22 additions and 1 deletion.
4 changes: 4 additions & 0 deletions README.md
@@ -6,6 +6,9 @@ An Open Database-GPT Experiment
[DB-GPT](https://github.com/csunny/DB-GPT) is an experimental open-source application based on [FastChat](https://github.com/lm-sys/FastChat), using [vicuna-13b](https://huggingface.co/Tribbiani/vicuna-13b) as its base model. It also combines [langchain](https://github.com/hwchase17/langchain) and [llama-index](https://github.com/jerryjliu/llama_index) to perform [In-Context Learning](https://arxiv.org/abs/2301.00234) over an existing knowledge base, augmenting the model with database-related knowledge. It can handle SQL generation, SQL diagnosis, database knowledge Q&A, and other related tasks.


## Project Design
![](https://github.com/csunny/DB-GPT/blob/dev/asserts/pilot.png)

[DB-GPT](https://github.com/csunny/DB-GPT) is an experimental open-source application that builds upon the [FastChat](https://github.com/lm-sys/FastChat) framework and uses vicuna as its base model. Additionally, it incorporates langchain and llama-index to embed an existing knowledge base and improve its Database-QA capabilities.

Overall, it is intended to be a sophisticated tool for working with databases. If you have any specific questions about how to use or implement DB-GPT in your work, please let me know and I'll do my best to assist you.
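Below is a minimal, hypothetical sketch of the knowledge-embedding flow described above, using langchain's CharacterTextSplitter and Chroma as this repository does; the HuggingFaceEmbeddings backend, file name, and question are placeholders rather than the project's actual wiring.

```python
# Hypothetical sketch of In-Context Learning over a local knowledge base.
# The embedding backend, file name, and question are illustrative placeholders.
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

with open("database_docs.txt", "r") as f:
    knowledge = f.read()

# Split the document into chunks and index them in a local vector store.
splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = splitter.split_text(knowledge)
docsearch = Chroma.from_texts(
    texts, embeddings, metadatas=[{"source": str(i)} for i in range(len(texts))]
)

# Retrieve the most relevant chunks and prepend them to the prompt.
question = "How do I create an index on the orders table?"
hits = docsearch.similarity_search(question, k=3)
context = "\n".join(doc.page_content for doc in hits)
prompt = f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
```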
@@ -15,6 +18,7 @@ Run on an RTX 4090 GPU (the original video is not sped up!, [YouTube link](https://www

![](https://github.com/csunny/DB-GPT/blob/dev/asserts/演示.gif)


- SQL generation example
First select the target database; the model then generates SQL based on that database's schema information (a rough sketch of this flow follows below).
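
A rough, illustrative sketch of that schema-to-SQL idea, not the project's actual prompt; the schema text, question, and `generate()` call are hypothetical placeholders.

```python
# Illustrative only: build a prompt from the selected database's schema and ask the
# model for SQL. The schema text and the generate() call are hypothetical placeholders.
schema = """
CREATE TABLE users (id INT PRIMARY KEY, name VARCHAR(64));
CREATE TABLE orders (id INT PRIMARY KEY, user_id INT, amount DECIMAL(10, 2));
"""

question = "Total order amount per user name"

prompt = (
    "You are a SQL assistant. Given the database schema below, "
    "write one SQL query that answers the question.\n\n"
    f"Schema:\n{schema}\n"
    f"Question: {question}\nSQL:"
)

# sql = generate(prompt)  # `generate` stands in for the configured LLM endpoint (e.g. vicuna-13b)
```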

Binary file added asserts/pilot.png
4 changes: 3 additions & 1 deletion pilot/configs/model_config.py
@@ -13,9 +13,11 @@
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
llm_model_config = {
"flan-t5-base": os.path.join(model_path, "flan-t5-base"),
"vicuna-13b": os.path.join(model_path, "vicuna-13b")
"vicuna-13b": os.path.join(model_path, "vicuna-13b"),
"sentence-transforms": os.path.join(model_path, "all-MiniLM-L6-v2")
}


LLM_MODEL = "vicuna-13b"
LIMIT_MODEL_CONCURRENCY = 5
MAX_POSITION_EMBEDDINGS = 2048
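
Not part of this diff, but a plausible way the new `sentence-transforms` entry could be consumed, assuming the `sentence-transformers` package is installed and the model files exist at the configured path.

```python
# Hypothetical usage of the new "sentence-transforms" config entry: load the local
# all-MiniLM-L6-v2 model and embed a few sentences. Assumes `sentence-transformers`
# is installed and the model files exist at the configured path.
from sentence_transformers import SentenceTransformer

from pilot.configs.model_config import llm_model_config

model = SentenceTransformer(llm_model_config["sentence-transforms"])
vectors = model.encode([
    "What tables exist in this database?",
    "Show me last week's orders.",
])
print(vectors.shape)  # (2, 384) for all-MiniLM-L6-v2
```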
15 changes: 15 additions & 0 deletions pilot/vector_store/extract_tovec.py
@@ -5,6 +5,8 @@
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from pilot.model.vicuna_llm import VicunaEmbeddingLLM
# from langchain.embeddings import SentenceTransformerEmbeddings


embeddings = VicunaEmbeddingLLM()

@@ -20,4 +22,17 @@ def knownledge_tovec(filename):
return docsearch


# def knownledge_tovec_st(filename):
#     """Use sentence-transformers to embed the document.
#     https://github.com/UKPLab/sentence-transformers
#     """
#     from pilot.configs.model_config import llm_model_config
#     # SentenceTransformerEmbeddings wraps HuggingFaceEmbeddings, which takes `model_name`.
#     embeddings = SentenceTransformerEmbeddings(model_name=llm_model_config["sentence-transforms"])

#     with open(filename, "r") as f:
#         knownledge = f.read()

#     text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
#     texts = text_splitter.split_text(knownledge)
#     docsearch = Chroma.from_texts(texts, embeddings, metadatas=[{"source": str(i)} for i in range(len(texts))])
#     return docsearch
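
For context, a hypothetical usage sketch of `knownledge_tovec`, assuming the Vicuna embedding endpoint is running; the knowledge file path and query are placeholders.

```python
# Hypothetical usage of knownledge_tovec: index a local text file and retrieve the
# chunks most similar to a question. Assumes the Vicuna embedding server is reachable;
# the file path and query are placeholders.
from pilot.vector_store.extract_tovec import knownledge_tovec

docsearch = knownledge_tovec("datasets/mysql_manual.txt")
for doc in docsearch.similarity_search("How do I add an index to a large table?", k=3):
    print(doc.metadata.get("source"), doc.page_content[:80])
```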
