Character counts (concatenated XML files) FIO #2
Comments
Aside: Even in modern editions, the last pilcrow sign in the KJV occurs at Acts 20:36. Btw, modern editions are largely descendants of Benjamin Blayney's 1769 Oxford University Press edition, albeit with minor textual differences in those published by Cambridge University Press.
All 204 instances of the letter thorn are in these two words:
The same English words spelled without the thorn are far more numerous. An intriguing inconsistency.
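A tally like the one above can be reproduced with a short script. This is only a sketch: it assumes the text has already been extracted from the XML as a plain string, that the thorn is encoded as U+00FE or U+00DE, and that words are separated by whitespace. The function name `words_with_thorn` is made up for illustration.

```python
import string
from collections import Counter

# Assumption: the transcription encodes thorn as U+00FE (þ) or U+00DE (Þ).
THORN = "\u00FE\u00DE"

def words_with_thorn(text: str) -> Counter:
    """Count each whitespace-separated word that contains a thorn."""
    counts = Counter()
    for token in text.split():
        # Strip ASCII punctuation from both ends so "word," and "word" match.
        word = token.strip(string.punctuation)
        if any(ch in THORN for ch in word):
            counts[word] += 1
    return counts
```

Summing the values of the returned counter gives the total number of thorn-bearing word tokens, which can then be compared with counts of the same words spelled without the thorn.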
As elsewhere, the characters you get in the XML are the characters in the HTML source I used. These are for the most part pretty faithful to the KJV 1611 source, judging by the page images provided, but occasional spots of roman within the black letter (see e.g. Heb 7.20) and the long-s glyph variant don't seem to have been systematically recorded.
Heb 7.20 is a nice example of roman text with a word containing the long s, namely btw. Modern editions do not have the word I wonder when these changes were made, and whether they were noted by F H A Scrivener? |
btw. Another surprise was the change of spelling from |
Interesting to observe that none of the possessives in 1611 ended with apostrophe & letter s (or vice versa). In fact the sole apostrophe (
Refer to https://en.wikipedia.org/wiki/Apostrophe#Typographic_form
Should we replace the typewriter apostrophe by the single right quotation mark?
Here's what the verse looks like:
Aside: At least the transcribers didn't have
FIO. The attached text file is a character frequency count for the 1361 XML files concatenated from the chap folder. (NB. The analysis now includes Romans and excludes the bogus copyright lines.)
merged.xml.character.frequency.txt
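For anyone wanting to regenerate a count like the attached one, here is a minimal sketch. It is not the script that produced the attachment. It assumes the files are UTF-8, and the function names and the tab-separated report layout are illustrative choices only.

```python
import unicodedata
from collections import Counter
from pathlib import Path

def read_concatenated(folder: str, pattern: str = "*.xml") -> str:
    """Concatenate every file in `folder` matching `pattern`, in sorted order."""
    paths = sorted(Path(folder).glob(pattern))
    return "".join(p.read_text(encoding="utf-8") for p in paths)

def char_frequency(text: str) -> Counter:
    """Count every code point in the text, markup and whitespace included."""
    return Counter(text)

def format_report(freq: Counter) -> str:
    """One line per character: code point, count, Unicode name (tab-separated)."""
    lines = []
    # Most frequent first; ties broken by code point order.
    for ch, n in sorted(freq.items(), key=lambda kv: (-kv[1], kv[0])):
        name = unicodedata.name(ch, "<unnamed>")  # control characters have no name
        lines.append(f"U+{ord(ch):04X}\t{n}\t{name}")
    return "\n".join(lines)
```

Usage would be along the lines of `print(format_report(char_frequency(read_concatenated("chap"))))`, where `chap` is the folder named above. Note that this counts raw file contents, so tag names and attribute values contribute to the totals.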
The XML entity `&amp;` occurs 2091 times. There are no other entities.
Of particular interest are the non-ASCII letters and characters:
Evidently the source website made no systematic attempt to record the long s (ſ), a letter that was used in the original KJV of 1611.
Reverse engineering a fix for this discrepancy would not be a simple task.
Even so, the long s might only have been present in the translators' added words, which were set in roman type, and in the chapter descriptions in head elements and the page titles in fw elements.
cf. The main text of the KJV was in blackletter typeface.
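The entity tally and the non-ASCII listing described above could be checked with something like the following. This is a sketch under assumptions: it works on the raw, unparsed XML string, and the regular expression covers named and numeric character references in general rather than anything specific to these files. Both function names are hypothetical.

```python
import re
from collections import Counter

# Matches named (&amp;), decimal (&#38;) and hexadecimal (&#x26;) references.
ENTITY_RE = re.compile(r"&(?:#[0-9]+|#x[0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")

def entity_counts(raw_xml: str) -> Counter:
    """Count each distinct entity reference in the unparsed XML source."""
    return Counter(m.group(0) for m in ENTITY_RE.finditer(raw_xml))

def non_ascii_counts(text: str) -> Counter:
    """Count every character outside the ASCII range (code point > 127)."""
    return Counter(ch for ch in text if ord(ch) > 127)
```

Looking up `"\u017F"` (the long s) in the result of `non_ascii_counts` gives a quick measure of how often the source recorded that glyph.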