Org Merge Driver Implementation Notes

#+FILETAGS :note:doc:

Project Details

Contact Information

Andrew Young, youngar17 at gmail.com

Milestones [42%]

Start Project

prototype parser (limited functionality)

prototype tree construction

prototype tree change detection

prototype tree merging

finish prototype

tree data interface

org-mode parser

general merging rule interface

org mode change detection

org mode merging rules

additional features

finishing touches

documentation and distribution

Coding Conventions

elisp conventions
must sign a copyright grant for the FSF!
GNU Coding Standards
use Doxygen to document the code

Collaboration

Notes

General Merging Implementation

Elements

an element is any individual element in a file
elements are stored as a hierarchy
- this is to identify not just changes of line-by-line content, but also of relationships (e.g. data is moved)
elements are defined by their mapping entity
elements may not always have a perfectly identifying trait
- i.e. a unique object identifier
defined by:
- how to read elements
- how to write the elements
- what changes are available
- how to map them
- how to merge changes
every element has a parent, and a list of sub-elements
- the sub elements list can be empty
an ordered set of elements are represented by:
- mapping which needs identical parents
- each element is the next one’s parent

Node Identification

Node identification is mapping elements from one file to the same

node in a different file

whether or not two elements are mapped depends on the element type, and it’s mapping rules
a mapping which is missing another element should represent an element being added or removed
- this is really up to the element’s rules, it can do whatever it wants with the information
two main types of mappings:
- one which needs to have the same mapped parent
- one which does not need to have the same parent
a single mapping is from 1 file to another, but can have more then one elements from each
- this can be the case for indistinguishable elements
mapping function an element type should have access to:
- know if the element’s parents are mapped
- the element data for each
- if the compared element is already mapped, it should have access to the mapping
mapping function will not have access to:
- the coordinate of the element in the document
mapping may be affected by whether the elements are considered ‘ordered’ or ‘unordered’
- ordered mapping can use an edit script and detect differences in a sequence with exact matching rules
- unordered mapping will not take into account the order in which items appear in
mapping function can consider more elements of a
elements which are indistinguishable

Actions or Changes

Different changes to the same node can be grouped into 3 categories:
- identical changes
- non-conflicting changes
- conflicting changes
changes will only be classified after all changes have been detected
a change will be applied to a mapping
These changes apply to all elements:
- adding element
- removing element
- reordering children
Every other element will have their own set of changes
- ie: property changing, moving
- these will be detected by the program upon successful mapping between files
- at the same time as all other changes
if text has not changed in a branch of elements, then it will not be mapped?
- this will not work, if mapping can happen across different levels

rules

should merging rules should consider all changes across all elements
- or only changes to a specific element

Results

results will be printed in standard conflicting file style:

#begin_quote >>> there file === your file <<< #end_quote

the contents of a conlflicting change will have to be printed by each change pair representing a conflict

Implementation Notes

Very large file support -> stream reading files
- this could cause many slow-downs
#begin_ blocks could cause a problem
Tables
Agendas? Calendar?
meta data
- stored with the element?
- could be used to assist mapping (i.e. UIDs)
need to support writing certain text in different encoding
need to be able to add and remove features / rule conservatism
- mapping conservatism
- this will automatically affect change detection
  - will rules be needed for this?
- change merging rules
there will be some user defined values that are not defined in this file
are there any other org-mode rules that wont be in the file?

What I don’t have to do

detection of file name changes

Potential Problems

Commit and Merge timing problems

A merge will always be the same for two files. However, the order of mergin and branching can sometimes produce different files. This problem stems from the fact that ‘diff’ is the difference between two repository states, and merges aren’t transitive across a branches history. http://r6.ca/blog/20110416T204742Z.html

example master and branch

edit branch edit master merge master -> branch edit master merge master -> branch

can produce different results from a merge than

edit branch edit master edit master merge master -> branch

how to combat? I think that this can be combated by keeping unique IDs for headings, like in mobile org.

caveats? what if someone is relying on the non-transitive behavior? (i don’t know how)

Prototype

Parsing

elements: the only element in the prototype is headlines
headlines
- identified by a newline staring with stars and ending with a space
- headline level is the number of stars
- following text is the title
everything not in a title is in the text
- text in the prototype is not its own element, just a property of the heading
the file is parsed into a tree
- with all headings considered a subheading, top level headings are a subheading of the document
if a heading level is skipped (i.e. going from * to ***) it is still a child of the other
headings are all assumed to be unordered
the entire file is loaded into memory and must remain until an org_document no longer needs to exist

Mapping

mapping is the process of matching two headings in a file
a mapping can represent indistinguishable elements
it is expected that as the files are processed, they will change into
mappings are stored in a hiearchy, with the same as the elements
mapping happens with a locality heurisitic

Changes

detected by matching a heading title exactly
measuring changes from the ancestor to each of the new files
each heading will have 2 pointers to changes that affect them
if there are 2 changes to a heading, then there is a conflict
if there are two headings of the same name, it will always produce a conflict (should this be implemented?) when removing, or when 2 change text
every heading which is indistinguishable must have headings
changes are ‘add heading’, ‘remove heading’, ‘change text’
a heading must be in the same place
changes are actually applied when the document is printed to a file
- conflict messages are automatically inserted

Changes

There are only basic changes for unordered lists implemented:
Add heading
- when a new heading has been added to the current spot
Remove heading
Modify text
Move a heading

Difference Detection

use the same difference detection algorithm used in the UNIX diff program[fn:4] for ordered lists

Output

-

Org-merge-tool

Org-Mode Data Representation

Elements below a heading will be considered unordered
Numbered list
- Identified as a whole if it matches exactly

Org-mode Document Elements

Headlines

Headings will be considered unordered trees
- A heading will be matched by its UID if it has one
- headings will be matched by their title otherwise
Never ordered
- Keywords not apart of the name

Keywords

TODO, user defined, properties
perfile keywords #+TODO:

What is a perfile

Cookies: [#A]

always in square brackets?
Always in headings?
[#A], [0%], [0/0]

Plain lists

within an outline tree entry
unordered (start with ‘-‘, ‘+’, ‘*’ as bullets)
ordered (numeral followed by
description (i.e. ::) (considered unordered items)
ends at two blank lines or less indented
must have the same indentation level
can have checkboxes

Blocks #BEGIN_.. blocks

#+STARTUP

Drawers

:DRAWERNAME: this dra :END:
time stamp note in a drawer (C-c C-z)
drawers need to be defined in variable org-drawers
- or by file basis: #+DRAWERS: HIDDEN PROPERTIES STATE
PROPERTIES for properties

Properties

always in PROPERTIES drawer

Can Properties be consolidated with drawers?

Timestamps

in < > or [ ] brackets at the end of a headline
- Can actually be placed almost anywhere
- Can be inside

Footnotes

[[fn:2]
[fn:1] footnotes!
footnote inline definitions[fn:3]
- [fn:: this is inline]
there is a function to automatically
footnote defintions do not need to be ordered
- see org-footnote-auto-label

Tables (needs more work)

table row order might matter
’/’ must be first field,
- only 1?
Table references will be updated to match

Define Table Elt

Calculations? Exist?

Hooks?

org-footnote-auto-adjust fixes footnote defintion order

Command Line Interface

The cli will be written using argp and will conform to the gnu standards. Existing mergetool interfaces will be used a basis.

Options

Command Line Output

File Output

Merge Strategies

TODO-List

Lexer

Responsibilities

The Lexer is responsible for:

Complete Analysis of the input files
- There is no longer any parser preforming intermediate analysis.

The Lexer is NOT responsible for:

Establishing the file stream.
There is no longer

Make explicit the Lexer’s responsibilities

Especially relating to the parser.

Implementation Overview

The Lexer is implemented with flex.
- Documentation for flex can be found here.

Tokens

Org Elements

The document elements are placed into a tree for processing

aeuaoeu

List of Org Elements

Headings

Headings level is

Matching Heading Elements

Heading elements may be matched by text if their parents are matched
- this allows parents to be matched by UID.
Heading elements may be matched to any element with the same UID
Heading elements may only match elements of the same type.

Modification Guesser?

look for differences in the files
1. this seems like it might have to be O(n^2)
create a list of changes (ordered?)
1. tree of changes (first element is nothing)
2. find conflicting changes
process the conflicting changes, applying generic rules

Modification Merger

if A and B both add a heading with the same name in the same place, should it conflict or should both be added?
if a heading is moved in A, and B adds a subheading, should this conflict as B is editing a removed heading in A, or should it merge with the new subheading under the moved heading in A

Footnotes and Bibiliography

git-merge-changelogprovides a good example on how to write a merge driver
Three-way XML Merge

[fn:4] The basic algorithm is described in “An O(ND) Difference Algorithm and its Variations”, Eugene W. Myers, ‘Algorithmica’ Vol. 1 No. 2, 1986, pp. 251-266; and in “A File Comparison Program”, Webb Miller and Eugene W. Myers, ‘Software–Practice and Experience’ Vol. 15 No. 11, 1985, pp. 1025-1040. The algorithm was independently discovered as described in “Algorithms for Approximate String

Files

implementation.org

Latest commit

History

implementation.org

File metadata and controls

Org Merge Driver Implementation Notes

Project Details

Contact Information

Milestones [42%]

Start Project

prototype parser (limited functionality)

prototype tree construction

prototype tree change detection

prototype tree merging

finish prototype

tree data interface

org-mode parser

general merging rule interface

org mode change detection

org mode merging rules

additional features

finishing touches

documentation and distribution

Coding Conventions

Collaboration

Notes

General Merging Implementation

Elements

Node Identification

Actions or Changes

rules

Results

Implementation Notes

What I don’t have to do

Potential Problems

Commit and Merge timing problems

Prototype

Parsing

Mapping

Changes

Changes

Difference Detection

Output

Org-merge-tool

Org-Mode Data Representation

Org-mode Document Elements

Headlines

Keywords

What is a perfile

Cookies: [#A]

Tags

Plain lists

Blocks #BEGIN_.. blocks

Drawers

Properties

Can Properties be consolidated with drawers?

Timestamps

Footnotes

Tables (needs more work)

Define Table Elt

Hooks?

Command Line Interface

Options

Command Line Output

File Output

Merge Strategies

TODO-List

Lexer

Responsibilities

Make explicit the Lexer’s responsibilities

Implementation Overview

Tokens

Org Elements

aeuaoeu

List of Org Elements

Headings

Matching Heading Elements

Modification Guesser?

Modification Merger