Configure a Marker API endpoint and use it to convert your PDFs into Markdown with high-quality OCR, embedded tables, images, and every other benefit of Marker!

L3-N0X/obsidian-marker

🌟 Introduction

Welcome to this Obsidian PDF to Markdown Converter! This plugin brings the power of advanced PDF conversion directly into your Obsidian vault. By leveraging the capabilities of Marker through a self-hosted API or by using the hosted solution on datalab.to, this plugin offers a seamless way to transform your PDFs into rich, formatted Markdown files, with support for tables, formulas and more!

Important

This plugin requires a Marker API endpoint to function. Without an endpoint, the plugin won't work.

The related projects are Marker and Marker API, both credited in the Acknowledgements section.

🚀 Features

  • OCR Capabilities: Convert scanned PDFs to rich markdown
  • Formula Detection: Accurately captures and converts mathematical formulas
  • Table Extraction: Preserves table structures in your Markdown output
  • Image Handling: Extracts and saves images from your PDFs and includes them in the markdown
  • Mobile Compatibility: Works on both desktop and mobile Obsidian apps
  • Flexible Output: Choose between full content extraction or specific elements (text/images)

🛠 Why This Plugin?

  1. Superior Extraction: Utilizes the Marker project's advanced AI model for high-quality conversions
  2. Mobile Accessibility: Unlike many converters, this works seamlessly on mobile devices (when the API is accessible)
  3. Customizable: Tailor the conversion process to your specific needs
  4. Obsidian Integration: Converts PDFs directly within your Obsidian environment

♥️ Support the Project

If you enjoy this plugin, feel free to star the repository and share it with others! If you want to support development, consider buying me a coffee.

📋 Requirements

To use this plugin, you'll need:

  1. A working Obsidian installation
  2. Access to a Marker API endpoint (self-hosted or paid service)

🔧 Setup

  1. Install the plugin in your Obsidian vault

  2. (Optional) Set up the self-hosted Marker API:

    • Use Docker on a machine with a solid GPU/CPU
    • (Optional) Make the endpoint available to other devices (e.g., using Tailscale)
    • Alternatively, host in the cloud or run the Python server as needed
  3. Configure your Marker API endpoint in the plugin settings
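
The default self-hosted endpoint listed in the settings ('localhost:8000') has no scheme. As a minimal sketch, assuming a full base URL is needed before any request is sent, a helper along these lines could add a scheme and trim trailing slashes. The name `normalizeEndpoint` and the fallback to `http://` are assumptions for illustration, not the plugin's actual code.

```typescript
// Hypothetical helper: not taken from the plugin's source.
// Turns a user-entered endpoint such as "localhost:8000" into a base URL.
function normalizeEndpoint(raw: string): string {
  const trimmed = raw.trim();
  if (trimmed === "") {
    throw new Error("Marker API endpoint is empty");
  }
  // Assumption: fall back to plain HTTP when the user gave no scheme.
  const withScheme = /^https?:\/\//i.test(trimmed)
    ? trimmed
    : `http://${trimmed}`;
  // Drop trailing slashes so request paths can be appended consistently.
  return withScheme.replace(/\/+$/, "");
}
```

If the endpoint is exposed to other devices (for example over Tailscale), entering the scheme explicitly avoids relying on the fallback.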

Which solution should I use?

| Solution | Pros | Cons |
| --- | --- | --- |
| Hosted on datalab.to (recommended) | No setup required, fast and reliable, supports the developer and is easily accessible from anywhere | Costs a few dollars |
| Self-Hosted via Docker | Full control over the conversion process, no costs for the API | Requires a powerful machine, setup can be complex for beginners |
| Self-Hosted via Python | Easy to set up, no Docker required | May be slower than the Docker solution, less control over the process |

⚙️ Settings

| Setting | Default | Description |
| --- | --- | --- |
| API Endpoint | 'datalab' | Select the API endpoint to use, either 'Datalab' or 'Selfhosted'. |
| Marker API Endpoint | 'localhost:8000' | The endpoint to use for the Marker API. Only shown when 'Selfhosted' is selected as the API endpoint. |
| API Key | - | Enter your Datalab API key. Only shown when 'Datalab' is selected as the API endpoint. |
| Languages | - | The languages to use if OCR is needed, separated by commas. Only shown when 'Datalab' is selected as the API endpoint. |
| Force OCR | false | Force OCR. Enable this when auto-detection often fails, and make sure to set the correct languages. Only shown when 'Datalab' is selected as the API endpoint. |
| Paginate | false | Add horizontal rules between each page. Only shown when 'Datalab' is selected as the API endpoint. |
| New Folder for Each PDF | true | Create a new folder for each PDF that is converted. |
| Move PDF to Folder | false | Move the PDF to the folder after conversion. Only shown when 'New Folder for Each PDF' is enabled. |
| Create Asset Subfolder | true | Create an asset subfolder for images. |
| Extract Content | 'all' | Select the content to extract from the PDF. Options: 'Extract everything', 'Text Only', 'Images Only'. |
| Write Metadata | true | Write metadata as frontmatter in the Markdown file. |
| Delete Original PDF | false | Delete the original PDF after conversion. |
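
For readers who think in code, the settings above can be sketched as a TypeScript object. The defaults mirror the table, but every field name here is hypothetical, as are the 'text' and 'images' values for Extract Content (the table only documents 'all'). The `isVisible` helper encodes the "Only shown when ..." rules from the descriptions, and `parseLanguages` splits the comma-separated Languages value.

```typescript
// Hypothetical field names; only the default values come from the table.
type ApiEndpoint = "datalab" | "selfhosted";
// Assumption: 'text' and 'images' are illustrative; only 'all' is documented.
type ExtractContent = "all" | "text" | "images";

interface MarkerSettings {
  apiEndpoint: ApiEndpoint;
  markerEndpoint: string;
  apiKey: string;
  languages: string;
  forceOcr: boolean;
  paginate: boolean;
  createFolder: boolean;
  movePdfToFolder: boolean;
  createAssetSubfolder: boolean;
  extractContent: ExtractContent;
  writeMetadata: boolean;
  deleteOriginal: boolean;
}

const DEFAULT_SETTINGS: MarkerSettings = {
  apiEndpoint: "datalab",
  markerEndpoint: "localhost:8000",
  apiKey: "", // the table lists no default
  languages: "", // the table lists no default
  forceOcr: false,
  paginate: false,
  createFolder: true,
  movePdfToFolder: false,
  createAssetSubfolder: true,
  extractContent: "all",
  writeMetadata: true,
  deleteOriginal: false,
};

// Split the comma-separated "Languages" value into a clean list.
function parseLanguages(value: string): string[] {
  return value
    .split(",")
    .map((lang) => lang.trim())
    .filter((lang) => lang.length > 0);
}

// Apply the "Only shown when ..." rules from the settings descriptions.
function isVisible(key: keyof MarkerSettings, s: MarkerSettings): boolean {
  const datalabOnly: Array<keyof MarkerSettings> = [
    "apiKey",
    "languages",
    "forceOcr",
    "paginate",
  ];
  if (datalabOnly.indexOf(key) !== -1) return s.apiEndpoint === "datalab";
  if (key === "markerEndpoint") return s.apiEndpoint === "selfhosted";
  if (key === "movePdfToFolder") return s.createFolder;
  return true;
}
```

With the defaults, the API Key field is visible and the Marker API Endpoint field is hidden; switching `apiEndpoint` to "selfhosted" reverses that.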

🙏 Acknowledgements

This plugin wouldn't be possible without the incredible work of:

  • Marker Project: The AI model powering the conversions
  • Marker API: The API that enables self-hosting of the conversion service

A huge thank you to these projects for their contributions to the community!

🐛 Troubleshooting

If you encounter issues related to the plugin itself, please open an issue in this repository. For problems with the conversion process or API, please refer to the Marker and Marker API repositories.

🤝 Contributing

Contributions, issues, and feature requests are welcome! Feel free to check the issues page.


Happy converting! 📚➡️📝
