Skip to content

Simple tool for fixing common misspellings, typos in source code

License

Notifications You must be signed in to change notification settings

avallete/misspell-fixer

Repository files navigation

Misspell Fixer

Build Status Coverage Status Circle CI Build Status Issue Count

==============

Utility to fix common misspellings, typos in source code. There are lots of typical misspellings in program code. Typically they are more eye-catching in the living code but they can easily hide in comments, examples, samples, notes and documentation. With this utility you can fix a large number of them very quickly.

Be aware that the utility does not check or fix file names. It can easily happen that a misspelled word is fixed in a file name in a program's code, but the file itself will not be renamed by this utility.

Also important to be very careful when fixing public APIs!

A manual review is always needed to verify that nothing has been broken.

Synopsis

misspell_fixer	[OPTION] target[s]

Options, Arguments

target[s] can be any file[s] or directory/ies.

Main options:

  • -r Real run mode: Overwrites the original files with the fixed one. Without this option the originals will be untouched.
  • -n Disable backups. (By default the modified files' originals will be saved with the .$$.BAK suffix.)
  • -P n Enable processing on n forks. For example: -P 4 processes the files in 4 threads. (-s option is not supported)
  • -f Fast mode. (Equivalent with -P4)
  • -h Help. Displays this usage.

Performance note: -s, -v or the lack of -n or -r use a slower processing internal loop. So usually -frn without -s and -v are the highest performing combination.

Output control options:

  • -s Shows diffs of changes.
  • -v Verbose mode: shows the iterated files. (Without the prefiltering step)
  • -o Verbose mode: shows progress (prints a dot for each file scanned, a comma for each file fix iteration/file.)
  • -d Debug mode: shows all steps of the core logic.

By default only a subset of rules are enabled (around 100). You can enable more rules with the following options:

  • -u Enable less safe rules. (Manual review's importance is more significatnt...) (Around ten rules.)
  • -g Enable rules to convert British English to US English. (These rules aren't exactly typos but sometimes they can be useful.) (Around ten rules.)
  • -R Enable rare rules. (Few hundred rules.)
  • -V Enable very rare rules. (Mostly from the wikipedia article.) (More than four thousand rules.)
  • -D Enable rules based on lintian.debian.org ( git:ebac9a7, ~2300 )

The processing speed decreases as you activate more rules. But with newer greps this is much less significant.

File filtering options:

  • -N Enable file name filtering. For example: -N '*.cpp' -N '*.h'
  • -i Walk through source code management system's internal directories. (do not ignore .git, .svn, .hg, CVS)
  • -b Process binary, generated files. (do not ignore *.gif, *.jpg, *.jpeg, *.png, *.zip, *.gz, *.bz2, *.xz, *.rar, *.po, *.pdf, *.woff, yarn.lock, package-lock.json, composer.lock, *.mo)
  • -m Disable file size checks. Default is to ignore files > 1MB. (usually csv, compressed JS, ..)

Sample usage

By default nothing important will happen

$ misspell_fixer.sh target

Fixing the files with displaying each fixed file:

$ misspell_fixer.sh -rv target

Showing only the diffs without modifying the originals:

$ misspell_fixer.sh -sv target

Showing the diffs with progress and fixing the found typos:

$ misspell_fixer.sh -rsv target

Fast mode example, no backups: (highest performance)

$ misspell_fixer.sh -frn target

The previous with all rules enabled:

$ misspell_fixer.sh -frunRVD target

It is based on the following sources for common misspellings:

Dependencies - "On the shoulders of giants"

The script itself is just a misspelling database and some glue in bash between grep and sed. grep's -F combined with sed's line targeting makes the script quite efficient. -F enables parallel pattern matching with the https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm . Unfortunately this seem to work well with -w only in the newer (3+) versions.

A little more comprehensive list:

  • bash
  • find
  • sed
  • grep
  • diff
  • sort
  • tee
  • cut
  • rm
  • xargs

Authors

  • Veres Lajos
  • ka7

Original source

https://github.com/vlajos/misspell_fixer

Feel free to use!

About

Simple tool for fixing common misspellings, typos in source code

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • sed 94.3%
  • Shell 5.2%
  • Other 0.5%