neilpa/emma-ghent-curtis_the-administratrix

The Administratrix
Emma Ghent Curtis - 1889

Scans harvested from the PDF provided by Google Books
https://www.google.com/books/edition/The_Administratrix/Ug5FAQAAMAAJ

Copyright clearance has already been requested and obtained from Project Gutenberg (PG) via rule 1

Extracted each page as a PNG via built-in macOS functionality
- Once at 300dpi for OCR
- Once at 150dpi as source for the uploaded images

Used Rescribe (Tesseract wrapper) to OCR the text
- Enabled "Automatically clean image sides", which avoided the "Digitized by Google" footer

I did not run this through guiprep; getting that working on macOS is a pain

There are no illustrations, only the cover and back cover of the book
I've included those and all the blank pages, since I wasn't sure whether they should be removed

Still todo
- replace a few images that were overly cropped
- clean up blank/cover page text files
- run everything through pngcrush
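
The pngcrush step could be scripted roughly as follows. This is only a sketch, not something that has been run against this repo: the helper names `pngcrush_command` and `crush_directory` are made up for illustration, it assumes the page images sit in a single directory as `*.png`, and it assumes a pngcrush build that supports the `-ow` (overwrite in place) and `-q` (quiet) options. By default it only returns the commands it would run, so they can be checked first.

```python
from pathlib import Path
import shutil
import subprocess


def pngcrush_command(png: Path) -> list[str]:
    """Build a pngcrush invocation that rewrites `png` in place."""
    # -ow overwrites the input file; -q suppresses progress output.
    return ["pngcrush", "-q", "-ow", str(png)]


def crush_directory(directory: Path, dry_run: bool = True) -> list[list[str]]:
    """Return (and optionally run) one pngcrush command per PNG in `directory`."""
    commands = [pngcrush_command(p) for p in sorted(directory.glob("*.png"))]
    if not dry_run:
        # Fail early with a clear message if pngcrush is not installed.
        if shutil.which("pngcrush") is None:
            raise RuntimeError("pngcrush not found on PATH")
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

Calling `crush_directory(Path("some/image/dir"))` lists the planned commands; passing `dry_run=False` actually rewrites the files, so it is probably worth committing the uncrushed images first.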
