Skip to content

Latest commit

 

History

History
executable file
·
351 lines (318 loc) · 12.3 KB

implementation.org

File metadata and controls

executable file
·
351 lines (318 loc) · 12.3 KB

Org Merge Driver Implementation Notes

#+FILETAGS :note:doc:

Project Details

Contact Information

Andrew Young, youngar17 at gmail.com

Milestones [42%]

Start Project

prototype parser (limited functionality)

prototype tree construction

prototype tree change detection

prototype tree merging

finish prototype

tree data interface

org-mode parser

general merging rule interface

org mode change detection

org mode merging rules

additional features

finishing touches

documentation and distribution

Coding Conventions

Collaboration

Notes

General Merging Implementation

Elements

  • an element is any individual element in a file
  • elements are stored as a hierarchy
    • this is to identify not just changes of line-by-line content, but also of relationships (e.g. data is moved)
  • elements are defined by their mapping entity
  • elements may not always have a perfectly identifying trait
    • i.e. a unique object identifier
  • defined by:
    • how to read elements
    • how to write the elements
    • what changes are available
    • how to map them
    • how to merge changes
  • every element has a parent, and a list of sub-elements
    • the sub elements list can be empty
  • an ordered set of elements are represented by:
    • mapping which needs identical parents
    • each element is the next one’s parent

Node Identification

  • Node identification is mapping elements from one file to the same

node in a different file

  • whether or not two elements are mapped depends on the element type, and it’s mapping rules
  • a mapping which is missing another element should represent an element being added or removed
    • this is really up to the element’s rules, it can do whatever it wants with the information
  • two main types of mappings:
    • one which needs to have the same mapped parent
    • one which does not need to have the same parent
  • a single mapping is from 1 file to another, but can have more then one elements from each
    • this can be the case for indistinguishable elements
  • mapping function an element type should have access to:
    • know if the element’s parents are mapped
    • the element data for each
    • if the compared element is already mapped, it should have access to the mapping
  • mapping function will not have access to:
    • the coordinate of the element in the document
  • mapping may be affected by whether the elements are considered ‘ordered’ or ‘unordered’
    • ordered mapping can use an edit script and detect differences in a sequence with exact matching rules
    • unordered mapping will not take into account the order in which items appear in
  • mapping function can consider more elements of a
  • elements which are indistinguishable

Actions or Changes

  • Different changes to the same node can be grouped into 3 categories:
    • identical changes
    • non-conflicting changes
    • conflicting changes
  • changes will only be classified after all changes have been detected
  • a change will be applied to a mapping
  • These changes apply to all elements:
    • adding element
    • removing element
    • reordering children
  • Every other element will have their own set of changes
    • ie: property changing, moving
    • these will be detected by the program upon successful mapping between files
    • at the same time as all other changes
  • if text has not changed in a branch of elements, then it will not be mapped?
    • this will not work, if mapping can happen across different levels

rules

  • should merging rules should consider all changes across all elements
    • or only changes to a specific element

Results

  • results will be printed in standard conflicting file style:

#begin_quote >>> there file === your file <<< #end_quote

  • the contents of a conlflicting change will have to be printed by each change pair representing a conflict

Implementation Notes

  • Very large file support -> stream reading files
    • this could cause many slow-downs
  • #begin_ blocks could cause a problem
  • Tables
  • Agendas? Calendar?
  • meta data
    • stored with the element?
    • could be used to assist mapping (i.e. UIDs)
  • need to support writing certain text in different encoding
  • need to be able to add and remove features / rule conservatism
    • mapping conservatism
    • this will automatically affect change detection
      • will rules be needed for this?
    • change merging rules
  • there will be some user defined values that are not defined in this file
  • are there any other org-mode rules that wont be in the file?

What I don’t have to do

  • detection of file name changes

Potential Problems

Commit and Merge timing problems

A merge will always be the same for two files. However, the order of mergin and branching can sometimes produce different files. This problem stems from the fact that ‘diff’ is the difference between two repository states, and merges aren’t transitive across a branches history. http://r6.ca/blog/20110416T204742Z.html

example master and branch

edit branch edit master merge master -> branch edit master merge master -> branch

can produce different results from a merge than

edit branch edit master edit master merge master -> branch

how to combat? I think that this can be combated by keeping unique IDs for headings, like in mobile org.

caveats? what if someone is relying on the non-transitive behavior? (i don’t know how)

Prototype

Parsing

  • elements: the only element in the prototype is headlines
  • headlines
    • identified by a newline staring with stars and ending with a space
    • headline level is the number of stars
    • following text is the title
  • everything not in a title is in the text
    • text in the prototype is not its own element, just a property of the heading
  • the file is parsed into a tree
    • with all headings considered a subheading, top level headings are a subheading of the document
  • if a heading level is skipped (i.e. going from * to ***) it is still a child of the other
  • headings are all assumed to be unordered
  • the entire file is loaded into memory and must remain until an org_document no longer needs to exist

Mapping

  • mapping is the process of matching two headings in a file
  • a mapping can represent indistinguishable elements
  • it is expected that as the files are processed, they will change into
  • mappings are stored in a hiearchy, with the same as the elements
  • mapping happens with a locality heurisitic

Changes

  • detected by matching a heading title exactly
  • measuring changes from the ancestor to each of the new files
  • each heading will have 2 pointers to changes that affect them
  • if there are 2 changes to a heading, then there is a conflict
  • if there are two headings of the same name, it will always produce a conflict (should this be implemented?) when removing, or when 2 change text
  • every heading which is indistinguishable must have headings
  • changes are ‘add heading’, ‘remove heading’, ‘change text’
  • a heading must be in the same place
  • changes are actually applied when the document is printed to a file
    • conflict messages are automatically inserted

Changes

  • There are only basic changes for unordered lists implemented:
  • Add heading
    • when a new heading has been added to the current spot
  • Remove heading
  • Modify text
  • Move a heading

Difference Detection

  • use the same difference detection algorithm used in the UNIX diff program[fn:4] for ordered lists

Output

-

Org-merge-tool

Org-Mode Data Representation

  • Elements below a heading will be considered unordered
  • Numbered list
    • Identified as a whole if it matches exactly

Org-mode Document Elements

Headlines

  • Headings will be considered unordered trees
    • A heading will be matched by its UID if it has one
    • headings will be matched by their title otherwise

    Never ordered

    • Keywords not apart of the name

Keywords

  • TODO, user defined, properties
  • perfile keywords #+TODO:
What is a perfile

Cookies: [#A]

  • always in square brackets?
  • Always in headings?
  • [#A], [0%], [0/0]

Tags

  • only for headlines
  • #+filetags for tags for everytihng in the file

Plain lists

  • within an outline tree entry
  • unordered (start with ‘-‘, ‘+’, ‘*’ as bullets)
  • ordered (numeral followed by
  • description (i.e. ::) (considered unordered items)
  • ends at two blank lines or less indented
  • must have the same indentation level
  • can have checkboxes

Blocks #BEGIN_.. blocks

  • #+STARTUP

Drawers

  • :DRAWERNAME: this dra :END:
  • time stamp note in a drawer (C-c C-z)
  • drawers need to be defined in variable org-drawers
    • or by file basis: #+DRAWERS: HIDDEN PROPERTIES STATE
  • PROPERTIES for properties

Properties

  • always in PROPERTIES drawer
Can Properties be consolidated with drawers?

Timestamps

  • in < > or [ ] brackets at the end of a headline
    • Can actually be placed almost anywhere
    • Can be inside

Footnotes

  • [[fn:2]
  • [fn:1] footnotes!
  • footnote inline definitions[fn:3]
    • [fn:: this is inline]
  • there is a function to automatically
  • footnote defintions do not need to be ordered
    • see org-footnote-auto-label

Tables (needs more work)

  • table row order might matter
  • ’/’ must be first field,
    • only 1?
  • Table references will be updated to match
Define Table Elt
  • Calculations? Exist?

Hooks?

  • org-footnote-auto-adjust fixes footnote defintion order

Command Line Interface

The cli will be written using argp and will conform to the gnu standards. Existing mergetool interfaces will be used a basis.

Options

Command Line Output

File Output

Merge Strategies

TODO-List

Lexer

Responsibilities

The Lexer is responsible for:

  • Complete Analysis of the input files
    • There is no longer any parser preforming intermediate analysis.

The Lexer is NOT responsible for:

  • Establishing the file stream.
  • There is no longer

Make explicit the Lexer’s responsibilities

  • Especially relating to the parser.

Implementation Overview

  • The Lexer is implemented with flex.
    • Documentation for flex can be found here.

Tokens

Org Elements

  • The document elements are placed into a tree for processing

aeuaoeu

List of Org Elements

Headings

Headings level is

Matching Heading Elements

  • Heading elements may be matched by text if their parents are matched
    • this allows parents to be matched by UID.
  • Heading elements may be matched to any element with the same UID
  • Heading elements may only match elements of the same type.

Modification Guesser?

  1. look for differences in the files
    1. this seems like it might have to be O(n^2)
  2. create a list of changes (ordered?)
    1. tree of changes (first element is nothing)
    2. find conflicting changes
  3. process the conflicting changes, applying generic rules

Modification Merger

  • if A and B both add a heading with the same name in the same place, should it conflict or should both be added?
  • if a heading is moved in A, and B adds a subheading, should this conflict as B is editing a removed heading in A, or should it merge with the new subheading under the moved heading in A

Footnotes and Bibiliography

[fn:4] The basic algorithm is described in “An O(ND) Difference Algorithm and its Variations”, Eugene W. Myers, ‘Algorithmica’ Vol. 1 No. 2, 1986, pp. 251-266; and in “A File Comparison Program”, Webb Miller and Eugene W. Myers, ‘Software–Practice and Experience’ Vol. 15 No. 11, 1985, pp. 1025-1040. The algorithm was independently discovered as described in “Algorithms for Approximate String

General Algorithm