org-to-xml

This is a library to convert Emacs org-mode files to XML. The resulting XML isn’t especially pretty, but that’s not the goal. The goal is a complete and accurate translation of the internal org-mode data structures to XML.

The assumption is that downstream XML processing tools can be used to transform it. I plan to add a few XSLT examples to this repository.

For the curious, here’s how it works.

Consider an org file:

#+TITLE: Some Title
#+AUTHOR: Norman Walsh
#+DATE: 2019-02-19

A paragraph with <markup> in it. This isn’t intended to be meaningful
or useful.

* First level heading
  :PROPERTIES:
  :CUSTOM_ID: first
  :END:

** TODO This is an example TODO item.
   DEADLINE: <2019-02-26 Tue +1w>
   :PROPERTIES:
   :CREATED:  [2019-02-19 Tue 06:39]
   :SRC:      [[file:/projects/emacs/org-to-xml/README.md::For%20the%20curious,%20here%E2%80%99s%20how%20it%20works.]]
   :END:

See [[https://orgmode.org/][org-mode]] for more information about ~org-mode~.

First, it’s parsed by org-element-parse-buffer. Here is the result, swaths of which I’ve elided:

(org-data nil (section (:begin 1 :end 146 :contents-begin 1
:contents-end 145 :post-blank 1 :post-affiliated 1 :parent #0)
(keyword (:key "TITLE" :value "Some Title" :begin 1 :end 21
:post-blank 0 :post-affiliated 1 :parent #1))
(keyword (:key "AUTHOR" :value "Norman Walsh" :begin 21 :end 44
:post-blank 0 :post-affiliated 21 :parent #1))
(keyword (:key "DATE" :value "2019-02-19" :begin 44 :end 64
:post-blank 1 :post-affiliated 44 :parent #1))
(paragraph (:begin 64 :end 145 :contents-begin 64 :contents-end 145
:post-blank 0 :post-affiliated 64 :parent #1) #("A paragraph with
<markup> in it. This isn’t intended to be meaningful or useful. " 0 81
(:parent #2))))
(headline (:raw-value "First level heading" :begin 146 :end 544
:pre-blank 0 :contents-begin 168 :contents-end 544 :level 1 :priority
…
(link (:type "https" :path "//orgmode.org/" :format bracket
:raw-link "https://orgmode.org/" :application nil :search-option nil
:begin 470 :end 505 :contents-begin 494 :contents-end 502 :post-blank
1 :parent #4) #("org-mode" 0 8 (:parent #5))) #("for more information
about " 0 27 (:parent #4)) (code (:value "org-mode" :begin 532 :end
542 :post-blank 0 :parent #4)) #(". " 0 2 (:parent #4)))))

We set up a buffer to store the XML, then walk over this data structure, emitting XML elements for each sub-expression. The node properties become attributes, except for the properties listed in org-to-xml-ignore-symbols or properties that come from a properties drawer, which are ignored.
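
For readers who don't speak elisp, here is a rough Python sketch of that kind of tree walk. It is not the code in org-to-xml.el. The node shape (type, properties, children), the emit function, and the contents of IGNORE are all assumptions made for illustration; IGNORE is only a guess based on which properties are absent from the sample output.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical stand-in for org-to-xml-ignore-symbols; the real list lives in
# org-to-xml.el and may differ.
IGNORE = {"begin", "end", "contents-begin", "contents-end",
          "pre-blank", "post-blank", "post-affiliated", "parent"}


def emit(node):
    """Return XML text for a node shaped like (type, props, *children)."""
    if isinstance(node, str):
        return escape(node)  # text leaves: escape <, > and &
    tag, props, *children = node
    # Properties become attributes unless they are ignored or have no value.
    attrs = "".join(
        f" {name}={quoteattr(str(value))}"
        for name, value in props.items()
        if name not in IGNORE and value is not None
    )
    body = "".join(emit(child) for child in children)
    return f"<{tag}{attrs}>{body}</{tag}>"


tree = ("paragraph", {"begin": 64, "end": 145, "post-blank": 0},
        "See ",
        ("link", {"type": "https", "path": "//orgmode.org/",
                  "format": "bracket", "application": None, "begin": 470},
         "org-mode"),
        " & more.")
xml = emit(tree)
print(xml)
```

The recursion mirrors the nesting of the parsed data: each sub-expression turns into one element, and strings turn into escaped text.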

Finally, we do a little post-processing cleanup on the XML:

- Replace occurrences of <tag …></tag> with <tag …/>.
- Remove leading spaces from <paragraph> elements.
- Un-indent code blocks so that they begin on the left margin.
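
The three cleanup steps could be sketched in Python roughly as follows. This is an illustration under assumptions, not the elisp the library actually runs: the function names are made up, "leading spaces" is read as whitespace at the start of each line inside a paragraph, and the regular expressions assume attribute values never contain a literal < or >.

```python
import re
import textwrap


def collapse_empty(xml):
    """Rewrite <tag ...></tag> as <tag .../>."""
    return re.sub(r"<([\w:.-]+)((?:\s[^<>]*)?)></\1>", r"<\1\2/>", xml)


def strip_paragraph_indent(xml):
    """Remove leading spaces and tabs from every line inside <paragraph>."""
    def fix(match):
        start, body, end = match.groups()
        return start + re.sub(r"(?m)^[ \t]+", "", body) + end
    pattern = r"(<paragraph(?:\s[^<>]*)?>)(.*?)(</paragraph>)"
    return re.sub(pattern, fix, xml, flags=re.S)


def unindent_code(code):
    """Shift a code block left until its least-indented line is at column 0."""
    return textwrap.dedent(code)


sample = '<paragraph>  Some text\n  here.</paragraph><example a="1"></example>'
cleaned = strip_paragraph_indent(collapse_empty(sample))
print(cleaned)
print(unindent_code("    (a)\n      (b)\n"))
```

Doing this on the serialized text rather than on the tree keeps the walk itself simple, at the cost of relying on pattern matching.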

And then save the file, swaths of which I have also elided.

<?xml version="1.0"?>
<!-- Converted from org-mode to XML by org-to-xml version 0.0.3 -->
<!-- See https://github.com/ndw/org-to-xml -->
<org-data xmlns="https://nwalsh.com/ns/org-to-xml"><section>
<keyword key="TITLE">Some Title</keyword>
<keyword key="AUTHOR">Norman Walsh</keyword>
<keyword key="DATE">2019-02-19</keyword>
<paragraph>A paragraph with &lt;markup&gt; in it. This isn’t
intended to be meaningful or useful.
</paragraph></section>
<headline level="1"><title>First level heading</title>
…
<link type="https" path="//orgmode.org/" format="bracket"
raw-link="https://orgmode.org/">org-mode</link> for more information
about <code>org-mode</code>.
</paragraph></section></headline></headline></org-data>
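
As a taste of the downstream processing this is meant to enable, here is a small Python sketch using only the standard library (XSLT would do just as well). SAMPLE is a hand-trimmed fragment based on the output above, not a complete file, and the choice of what to extract is arbitrary. The namespace URI is the one shown in the output.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the sample output; the "o" prefix is arbitrary.
NS = {"o": "https://nwalsh.com/ns/org-to-xml"}

# Hand-trimmed fragment of the output, for illustration only.
SAMPLE = """<org-data xmlns="https://nwalsh.com/ns/org-to-xml"><section>
<keyword key="TITLE">Some Title</keyword>
<keyword key="AUTHOR">Norman Walsh</keyword>
<keyword key="DATE">2019-02-19</keyword>
</section>
<headline level="1"><title>First level heading</title></headline>
</org-data>"""

root = ET.fromstring(SAMPLE)

# Collect the document keywords into a dictionary keyed by name.
keywords = {k.get("key"): k.text for k in root.iterfind(".//o:keyword", NS)}

# Pair every headline's level with its title text.
headings = [(int(h.get("level")), h.findtext("o:title", namespaces=NS))
            for h in root.iterfind(".//o:headline", NS)]

print(keywords)
print(headings)
```

Because the elements live in a namespace, every path step needs the prefix; forgetting it is the usual reason such queries come back empty.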

It’s been twenty years since I tried to do anything much more interesting than a keybinding in elisp. I expect the code, especially the tree walking, is embarrassingly crude. Suggestions for improvement, or simply pointers to the bits of the elisp manual I should read again, most humbly solicited.

I also confess I’m completely winging it on current function naming/namespacing conventions.

Pros and Cons

There are two obvious ways to approach the problem of converting .org files to .xml.

1. Use the ox framework.
2. Do it the hard way.

My goal in this project is to have a complete dump of the org structures in XML. That rules out the ox framework. The ox framework is definitely the place to start if you want to convert from an unknown org file and extract the information that you know about. But it flattens structures like the property drawer so that it’s impossible to extract everything with fidelity, even the things you don’t know about.

So this code attempts to do it the hard way. But I’m also lying when I say I want a complete dump of the org structures. I want a dump of the meaningful structures. One person’s meaning is another person’s pointless cruft, however.

Examples of structures I don’t consider meaningful:

- The pre-blank and post-blank properties that the org data structures use to encode spaces in some circumstances.
- Leading blanks in code blocks.
- Leading spaces in paragraphs.

It’s likely that this list will grow as I learn more about the org-mode data structures. Unless I give up on this project altogether, of course.
