Convert files into Markdown so that RAG pipelines and LLMs can understand them. The project is built on markitdown and on MinerU, which provides high-quality PDF parsing. Two PDF modes are currently supported: a simple mode, which uses pdfminer and is fast, and an advanced mode, which uses MinerU and its models to parse PDFs and is slow.
FastAPI provides built-in API documentation at http://127.0.0.1:20926/docs
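Besides the interactive /docs page, FastAPI by default also serves the machine-readable OpenAPI schema at /openapi.json (unless the app overrides `openapi_url`). The sketch below assumes that default and lists the endpoints the service exposes; `list_operations` and `fetch_schema` are illustrative helper names, not part of this project.

```python
import json
import urllib.request

# Address from this README; adjust if the service runs elsewhere.
BASE_URL = "http://127.0.0.1:20926"


def list_operations(schema: dict) -> list[str]:
    """Return sorted 'METHOD /path' strings for every operation in an OpenAPI schema."""
    http_methods = {"get", "post", "put", "patch", "delete", "head", "options"}
    ops = []
    for path, item in schema.get("paths", {}).items():
        # A path item may also hold non-operation keys such as "parameters"; skip those.
        for method in item:
            if method in http_methods:
                ops.append(f"{method.upper()} {path}")
    return sorted(ops)


def fetch_schema(base_url: str = BASE_URL) -> dict:
    """Download the schema from /openapi.json (FastAPI's default location, assumed here)."""
    with urllib.request.urlopen(f"{base_url}/openapi.json") as resp:
        return json.load(resp)
```

With the server running, `print(list_operations(fetch_schema()))` should include the `/api/jobs` routes shown in this README.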
Request (submit a job):
curl -X 'POST' \
  'http://127.0.0.1:20926/api/jobs' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@CoA.pdf;type=application/pdf' \
  -F 'pdf_mode=advanced'
Response:
{
  "job_id": "29bbad6b-c167-41f0-8a29-99551c499263"
}
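The same upload can be made from Python. This is a minimal sketch using only the standard library (no `requests` dependency), assuming the server address above; `encode_multipart` and `submit_job` are hypothetical helpers, and the upload form-field name `file` is an assumption, so confirm it on the /docs page.

```python
import json
import urllib.request
import uuid
from pathlib import Path

BASE_URL = "http://127.0.0.1:20926"


def encode_multipart(fields: dict[str, str], file_field: str, filename: str,
                     content: bytes, content_type: str, boundary: str) -> bytes:
    """Build a multipart/form-data body with plain text fields and one file part."""
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        parts.append(f"{value}\r\n".encode())
    parts.append(f"--{boundary}\r\n".encode())
    parts.append(
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'.encode()
    )
    parts.append(f"Content-Type: {content_type}\r\n\r\n".encode())
    parts.append(content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing delimiter
    return b"".join(parts)


def submit_job(pdf_path: str, pdf_mode: str = "advanced", base_url: str = BASE_URL) -> str:
    """POST a PDF to /api/jobs and return the job_id from the JSON response."""
    path = Path(pdf_path)
    boundary = uuid.uuid4().hex
    body = encode_multipart(
        {"pdf_mode": pdf_mode},
        file_field="file",  # assumed field name; check /docs
        filename=path.name,
        content=path.read_bytes(),
        content_type="application/pdf",
        boundary=boundary,
    )
    req = urllib.request.Request(
        f"{base_url}/api/jobs",
        data=body,
        method="POST",
        headers={
            "Accept": "application/json",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["job_id"]
```

Usage: `job_id = submit_job("CoA.pdf", pdf_mode="advanced")`. Only `advanced` appears in this README as a `pdf_mode` value; check /docs for the value that selects the simple mode.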
Request (check job status):
curl -X 'GET' \
  'http://127.0.0.1:20926/api/jobs/29bbad6b-c167-41f0-8a29-99551c499263' \
  -H 'accept: application/json'
Response:
{
  "job_id": "29bbad6b-c167-41f0-8a29-99551c499263",
  "status": "completed",
  "filename": "CoA.pdf",
  "params": {
    "pdf_mode": "advanced"
  },
  "error": null
}
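Because conversion (especially in advanced mode) can be slow, a client will usually poll this endpoint until the job finishes. This is a sketch under assumptions: only the `completed` status is documented here, so the loop treats a non-null `error` field as failure and keeps waiting on any other status. `get_job` and `wait_for_job` are hypothetical helpers; `fetch` and `sleep` are injectable so the loop can be exercised without a server.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://127.0.0.1:20926"


def get_job(job_id: str, base_url: str = BASE_URL) -> dict:
    """GET /api/jobs/{job_id} and return the decoded JSON status document."""
    req = urllib.request.Request(f"{base_url}/api/jobs/{job_id}",
                                 headers={"Accept": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_job(job_id: str, fetch: Callable[[str], dict] = get_job,
                 interval: float = 2.0, timeout: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until status == 'completed'; raise on a reported error or on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_id)
        if job.get("status") == "completed":
            return job
        # Assumption: a failed job reports a non-null "error" field.
        if job.get("error"):
            raise RuntimeError(f"job {job_id} failed: {job['error']}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job.get('status')!r} after {timeout}s")
        sleep(interval)
```

Usage: `job = wait_for_job("29bbad6b-c167-41f0-8a29-99551c499263")`.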
Request (download the result):
curl -X 'GET' \
  'http://127.0.0.1:20926/api/jobs/29bbad6b-c167-41f0-8a29-99551c499263/result' \
  -H 'accept: application/json'
Response: a file containing the conversion result.
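A sketch for saving the result from Python, assuming the endpoint returns the file body directly in the response; `download_result` is a hypothetical helper, and the output path is left to the caller. The `opener` parameter exists only so the function can be exercised without a running server.

```python
import urllib.request
from pathlib import Path
from typing import Callable

BASE_URL = "http://127.0.0.1:20926"


def download_result(job_id: str, dest: str, base_url: str = BASE_URL,
                    opener: Callable = urllib.request.urlopen) -> Path:
    """GET /api/jobs/{job_id}/result and write the raw response body to dest."""
    req = urllib.request.Request(f"{base_url}/api/jobs/{job_id}/result")
    with opener(req) as resp:
        data = resp.read()
    out = Path(dest)
    out.write_bytes(data)
    return out
```

Usage: `download_result(job_id, "CoA.md")` if the result is Markdown; pick a file extension that matches the response's `Content-Type`.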
TODO
- Rewrite the image URLs in MinerU output to point to the local host
- Add a cloud parsing mode
- Add a simple web page