Character counts (concatenated XML files) FIO #2

Open
DavidHaslam opened this issue Oct 24, 2017 · 7 comments

@DavidHaslam

DavidHaslam commented Oct 24, 2017

FIO. The attached text file is a character frequency count for the 1361 xml files concatenated from the chap folder (NB. Analysis now includes Romans, and excludes bogus copyright lines.)

merged.xml.character.frequency.txt

The XML entity &amp;amp; occurs 2,091 times. There are no other entities.
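
For anyone wanting to re-check the entity tally, here is a minimal sketch of one way to do it. It is not the tool that produced the attached file; the function name and the sample string are made up for illustration, and it assumes the count is taken on the raw, unparsed XML text.

```python
import re
from collections import Counter

# Matches named (&amp;), decimal (&#182;) and hexadecimal (&#xB6;) references.
ENTITY_RE = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")


def count_entities(raw_xml: str) -> Counter:
    """Tally every entity reference found in the raw (unparsed) XML text."""
    return Counter(ENTITY_RE.findall(raw_xml))


# Made-up sample, not taken from the corpus.
sample = "<ab>thou &amp; thy sonnes &amp; thy house</ab>"
for entity, n in count_entities(sample).items():
    print(entity, n)
```

Counting on the raw text matters here: an XML parser would resolve the references before you ever saw them.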

Of particular interest are the non-ASCII letters and characters:

U+00B6	¶	2,977	PILCROW SIGN
U+00C6	Æ	1	LATIN CAPITAL LETTER AE
U+00E6	æ	7	LATIN SMALL LETTER AE
U+00FE	þ	204	LATIN SMALL LETTER THORN
U+0101	ā	5	LATIN SMALL LETTER A WITH MACRON
U+0113	ē	36	LATIN SMALL LETTER E WITH MACRON
U+014D	ō	153	LATIN SMALL LETTER O WITH MACRON
U+016B	ū	6	LATIN SMALL LETTER U WITH MACRON
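
A table in this shape can be regenerated with a few lines of Python. This is only a sketch under assumptions: the function name and the sample string are hypothetical, and the real input would be the concatenated XML read as UTF-8.

```python
import unicodedata
from collections import Counter


def non_ascii_report(text: str) -> list[tuple[str, str, int, str]]:
    """Return (code point, character, count, Unicode name) for each non-ASCII character."""
    counts = Counter(text)
    rows = []
    for ch, n in sorted(counts.items()):
        if ord(ch) > 0x7F:
            name = unicodedata.name(ch, "<unnamed>")
            rows.append((f"U+{ord(ch):04X}", ch, n, name))
    return rows


# Made-up sample, not taken from the corpus.
sample = "\u00b6 And \u00fee people said \u00fet \u00fee word was good"
for row in non_ascii_report(sample):
    print("\t".join(str(field) for field in row))
```

Rows come out in code point order, which matches the ordering used in the table above.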

Evidently the source web-site made no systematic attempt to use the following letter, which was in the original KJV of 1611.

U+017F	ſ	LATIN SMALL LETTER LONG S

Reverse engineering a fix for this discrepancy would not be a simple task.
Even so, the long s might only have been present in the translators' added words that were styled in roman typeface, in the chapter descriptions in head elements, and in the page titles in fw elements.

cf. The main text of the KJV was in blackletter typeface.

@DavidHaslam
Author

Aside: Even in modern editions, the last Pilcrow sign in the KJV occurs in Acts 20:36.
It's conjectured that the 1611 printers simply ran out of moveable type for this character,
and that all subsequent editions simply followed suit.

btw. Modern editions are largely descendants of Benjamin Blayney's 1769 Oxford University Press Edition, albeit with minor textual differences for those published by the Cambridge University Press.

@DavidHaslam
Author

DavidHaslam commented Oct 24, 2017

All 204 instances of the letter thorn are in these two words:

  • þe (191 times)
  • þt (13 times)

The same English words spelled without the thorn are far more numerous. An intriguing inconsistency.
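
The comparison can be sketched as below. The pairing of þe with "the" and þt with "that" is my assumption about which plain spellings are meant, the function name is hypothetical, and the sketch expects text with the markup already stripped (otherwise element and attribute names get counted as words).

```python
import re
from collections import Counter

# Assumed pairing of thorn spellings with their plain equivalents.
VARIANTS = {"\u00fee": "the", "\u00fet": "that"}


def spelling_counts(text: str) -> dict[str, tuple[int, int]]:
    """Map each thorn spelling to (count with thorn, count of the plain spelling)."""
    # \w+ is Unicode-aware for str patterns, so the thorn counts as a letter.
    words = Counter(re.findall(r"\w+", text))
    return {thorn: (words[thorn], words[plain]) for thorn, plain in VARIANTS.items()}


# Made-up sample, not taken from the corpus.
sample = "\u00fee people and the elders said \u00fet the word that \u00fee prophet spake"
for thorn, (with_thorn, without) in spelling_counts(sample).items():
    print(f"{thorn}: {with_thorn} with thorn, {without} without")
```

The match is case-sensitive, so a capitalised "The" is not counted; lower-casing the text first would fold those in.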

@lb42
Owner

lb42 commented Oct 25, 2017

As elsewhere, the characters you get in the XML are the characters in the HTML source I used. These are for the most part pretty faithful to the KJV 1611 source, judging by the page images provided, but occasional spots of roman within the black letter (see e.g. Heb 7.20) and the long-s glyph variant don't seem to have been systematically recorded.

@DavidHaslam
Author

Heb 7.20 is a nice example of roman text with a word containing the long s, namely Prieſt, as is the next verse that has Prieſts. The latter is an interesting example of where the 1611 edition has it as an added word in Roman typeface, yet the modern editions do not have the word Priests styled in italics.

btw. Modern editions do not have the word priest (singular or plural) capitalised here either.

I wonder when these changes were made, and whether they were noted by F H A Scrivener?

@DavidHaslam
Author

btw. Another surprise was the change of spelling from othe to oath in the space of two consecutive verses!

@DavidHaslam
Author

Interesting to observe that none of the possessives in 1611 ended with apostrophe & letter s (or vice versa).

In fact the sole apostrophe (\x27) occurs in the word wing'd found in Ezekiel 17:3, thus:

<ab n="3">And say, Thus saith the Lord God, A great eagle with great wings, long wing'd, full of feathers, which had diuers colours, camevnto Lebanon, and tooke the highest branch of the Cedar.<note> Hebr. embroydering.</note></ab>
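
A search of this kind might look like the sketch below, run here on the verse quoted above. The function name is hypothetical, and it assumes the ab elements carry no XML namespace; if the real files declare one, the tag lookup would need the namespaced name.

```python
import re
import xml.etree.ElementTree as ET

APOSTROPHE = "\u0027"  # typewriter apostrophe


def words_with_apostrophe(xml_fragment: str) -> list[tuple[str, str]]:
    """Return (verse number, word) for every word containing an apostrophe."""
    root = ET.fromstring(xml_fragment)
    hits = []
    for ab in root.iter("ab"):  # iter() also visits the root element itself
        text = "".join(ab.itertext())  # includes the text of nested note elements
        for word in re.findall(r"[\w']+", text):
            if APOSTROPHE in word:
                hits.append((ab.get("n", "?"), word))
    return hits


verse = """<ab n="3">And say, Thus saith the Lord God, A great eagle with great wings, long wing'd, full of feathers, which had diuers colours, camevnto Lebanon, and tooke the highest branch of the Cedar.<note> Hebr. embroydering.</note></ab>"""
print(words_with_apostrophe(verse))
```

Because the search runs on parsed text nodes, quotation marks that delimit attribute values are never mistaken for apostrophes.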

@DavidHaslam
Author

Refer to https://en.wikipedia.org/wiki/Apostrophe#Typographic_form

Should we replace the typewriter apostrophe with the right single quotation mark, U+2019?

Here's what the verse looks like:

[Screenshot of the verse, 2017-10-31 14 55 34]
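
If the answer is yes, the substitution could be done roughly as follows. This is a sketch, not a proposal for the actual conversion script: the function name is hypothetical, and a blanket replacement is only reasonable because the apostrophe in question sits inside a word. Text with opening single quotes would need U+2018 handled separately.

```python
import xml.etree.ElementTree as ET

TYPEWRITER = "\u0027"
RIGHT_SINGLE_QUOTE = "\u2019"


def curl_apostrophes(root: ET.Element) -> None:
    """Replace typewriter apostrophes in text nodes only, leaving attributes alone."""
    for node in root.iter():
        if node.text:
            node.text = node.text.replace(TYPEWRITER, RIGHT_SINGLE_QUOTE)
        if node.tail:
            node.tail = node.tail.replace(TYPEWRITER, RIGHT_SINGLE_QUOTE)


# Shortened form of the verse, for illustration only.
root = ET.fromstring('<ab n="3">long wing\'d, full of feathers<note> Hebr. embroydering.</note></ab>')
curl_apostrophes(root)
print(ET.tostring(root, encoding="unicode"))
```

Working on text and tail rather than on the serialized string keeps the change away from attribute values and markup.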

@lb42

Aside: At least the transcribers didn't have greateagle or offeathers !
