This project contains a library to convert Emacs org-mode files to XML: om-to-xml.
The goal is a complete and accurate translation of the internal
org-mode
data structures to XML. This produces XML that isn’t
especially pretty, but the assumption is that downstream XML
processing tools can be used to transform it.
- Renamed default branch to
main
. - Updated dependency (the
om.el
package was renamedorg-ml
)
This projects started began with org-to-xml, but development stalled when I realized that I didn’t really have a solid understanding of the underlying Org data structures and what I really needed to write first was a library that provided a consistent API that I could use in the conversion. (This is not a criticism of the Org developers; there are still corners of Emacs lisp that I don’t understand very well.)
Lo and behold! Nate Dwarshuis has written just such a library! The
om-to-xml
library is a rewrite of my XML conversion on top of that
library.
It does a more complete and accurate job and has a few extensibility
mechanisms that org-to-xml
did not:
- Whitespace handling
- Attribute handling
- Custom element processing
Install the library somewhere. I use straight.el, but you can just download the library and put it on your load path any way you like.
Run m-x om-to-xml
in a buffer where you’re viewing an Org mode file.
An XML representation will be constructed and saved. If the Org mode
file is notes.org
, the XML file will be saved in notes.xml
.
For the curious, here’s how it works.
Consider, an org file:
#+TITLE: Some Title
#+AUTHOR: Norman Walsh
#+DATE: 2019-02-19
A paragraph with <markup> in it. This isn’t intended to be meaningful
or useful.
* First level heading
:PROPERTIES:
:CUSTOM_ID: first
:END:
** TODO This is an example TODO item.
DEADLINE: <2019-02-26 Tue +1w>
:PROPERTIES:
:CREATED: [2019-02-19 Tue 06:39]
:SRC: [[file:/projects/emacs/org-to-xml/README.md::For%20the%20curious,%20here%E2%80%99s%20how%20it%20works.]]
:END:
See [[https://orgmode.org/][org-mode]] for more information about ~org-mode~.
- First, it’s parsed by
om
, swaths of which I’ve elided:((keyword (:key "TITLE" :value "Some Title")) (keyword (:key "AUTHOR" :value "Norman Walsh")) (keyword (:key "DATE" :value "2019-02-19")) (paragraph (:parent (section (:post-affiliated 64) #1)) #("A paragraph with <markup> in it. This isn’t intended to be meaningful or useful. " 0 81 (:parent #1))) (headline (:raw-value "First level heading" :level 1 :CUSTOM_ID "first" :title (#("First level heading" 0 19 (:parent #1)))) (section (:parent #1) (property-drawer (:parent #2) (node-property (:key "CUSTOM_ID" :value "first" :parent #3)))) (headline (:raw-value "This is an example TODO item." :level 2 :todo-keyword #("TODO" 0 4 (fontified t face org-todo)) :todo-type todo :deadline (timestamp (:type active :raw-value "<2019-02-26 Tue +1w>" :year-start 2019 :month-start 2 :day-start 26 :year-end 2019 :month-end 2 :day-end 26 :repeater-type cumulate :repeater-value 1 :repeater-unit week)) :SRC "[[file:/projects/emacs/org-to-xml/README.md::For%20the%20curious,%20here%E2%80%99s%20how%20it%20works.]]" :CREATED "[2019-02-19 Tue 06:39]" :title (#("This is an example TODO item." 0 29 (:parent #2))) :parent #1) (section (:parent #2) (planning (:closed nil :deadline (timestamp (:type active :raw-value "<2019-02-26 Tue +1w>" :year-start 2019 :month-start 2 :day-start 26 :year-end 2019 :month-end 2 :day-end 26 :repeater-type cumulate :repeater-value 1 :repeater-unit week)) :parent #3)) (property-drawer (:parent #3) (node-property (:key "CREATED" :value "[2019-02-19 Tue 06:39]" :parent #4)) (node-property (:key "SRC" :value "[[file:/projects/emacs/org-to-xml/README.md::For%20the%20curious,%20here%E2%80%99s%20how%20it%20works.]]" :parent #4))) (paragraph (:parent #3) #("See " 0 4 (:parent #4)) (link (:type "https" :path "//orgmode.org/" :format bracket :raw-link "https://orgmode.org/" :parent #4) #("org-mode" 0 8 (:parent #5))) #("for more information about " 0 27 (:parent #4)) (code (:value "org-mode" :parent #4)) #(". " 0 2 (:parent #4)))))))
- A buffer is created to store the XML, and the om data is traversed emiting XML elements for each node. The node properties become attributes, except for ignored properties and properties that come from a properties drawer which are always ignored.
- If a post-processing function has been provided, it is run.
- Save the file.
<?xml version="1.0"?> <!-- Converted from org-mode to XML by om-to-xml version 0.0.1 --> <!-- See https://github.com/ndw/org-to-xml --> <document xmlns="https://nwalsh.com/ns/org-to-xml"> <keyword key="TITLE" value="Some Title"/> <keyword key="AUTHOR" value="Norman Walsh"/> <keyword key="DATE" value="2019-02-19"/> <paragraph>A paragraph with <markup> in it. This isn’t intended to be meaningful or useful. </paragraph> <headline raw-value="First level heading" level="1"><title>First level heading</title> <section> <property-drawer> <node-property key="CUSTOM_ID" value="first"/> </property-drawer> </section> <headline raw-value="This is an example TODO item." level="2" todo-keyword="TODO" todo-type="todo" SRC="[[file:/projects/emacs/org-to-xml/README.md::For%20the%20curious,%20here%E2%80%99s%20how%20it%20works.]]" CREATED="[2019-02-19 Tue 06:39]"><deadline><timestamp type="active" raw-value="<2019-02-26 Tue +1w>" year-start="2019" month-start="2" day-start="26" year-end="2019" month-end="2" day-end="26" repeater-type="cumulate" repeater-value="1" repeater-unit="week"/></deadline><title>This is an example TODO item.</title> <section><planning/> <property-drawer> <node-property key="CREATED" value="[2019-02-19 Tue 06:39]"/> <node-property key="SRC" value="[[file:/projects/emacs/org-to-xml/README.md::For%20the%20curious,%20here%E2%80%99s%20how%20it%20works.]]"/> </property-drawer> <paragraph>See <link type="https" path="//orgmode.org/" format="bracket" raw-link="https://orgmode.org/">org-mode</link> for more information about <code value="org-mode"/>. </paragraph> </section> </headline> </headline></document>
It’s been twenty years since I tried to do anything much more interesting than a keybinding in elisp. I expect the code, especially the tree walking, is embarrassingly crude. Suggestions for improvement, or simply pointers to the bits of the elisp manual I should read again, most humbly solicited.
I also confess, I’m completely winging it on current function naming/namspacing conventions.
There are two obvious ways to approach the problem of converting .org files to .xml.
- Use the ox framework.
- Do it the hard way.
My goal in this project is to have a complete dump of the org
structures in XML. That rules out the ox
framework. The ox
framework is definitely the place to start if you want to convert from
an unknown org file and extract the information that you know about.
But it flattens structures like the property drawer so that it’s
impossible to extract everything with fidelity, even the things you
don’t know about.
So this code attempts to do it the hard way. But I’m also lying when I say I want a complete dump of the org structures. I want a dump of the meaningful structures. One person’s meaning is another person’s pointless cruft, however.
Examples of structures I don’t consider meaningful:
- The
pre-blank
andpost-blank
properties that the org data structures use to encode spaces in some circumstances. - Leading blanks in code blocks.
- Leading spaces in paragraphs.
It’s likely that this list will grow as I learn more about the Org data strutures. Unless I give up on this project altogether, of course.
You can still find org-to-xml.el
in this repository’s history (at tag
0.0.5, for example). I’ve removed it from the master branch because I
really think it’s a dead end and it caused confusion for at least some
users.