Correct commonly misspelled English words... quickly.
$ misspell all.html your.txt important.md files.go
your.txt:42:10 found "langauge" a misspelling of "language"
# ^ file, line, column
You'll need golang 1.5 or newer installed to compile it. But after that it's a standalone binary.
If people want pre-compiled binaries, file a ticket please.
- Automatic Corrections
- Converting UK spellings to US
- Using pipes and stdin
- Golang special support
- gometalinter support
- Changing output format
- Checking a folder recursively
- Performance
- Known Issues
- Debugging
- False Negatives and missing words
- Origin of Word Lists
- Software License
- Problem statement
- Other spelling correctors
- Other ideas
Just add the -w flag!
$ misspell -w all.html your.txt important.md files.go
your.txt:9:21:corrected "langauge" to "language"
# ^booyah
Add the -locale US flag!
$ misspell -locale US important.txt
important.txt:10:20 found "colour" a misspelling of "color"
Add the -locale UK flag!
$ echo "My favorite color is blue" | misspell -locale UK
stdin:1:3:found "favorite color" a misspelling of "favourite colour"
Help is appreciated as I'm neither British nor an expert in the English language.
You can run misspell recursively using the following shell tricks:
misspell directory/**/*
or
find . -name '*' | xargs misspell
Yes!
Print messages to stderr only:
$ echo "zeebra" | misspell
stdin:1:0:found "zeebra" a misspelling of "zebra"
Print messages to stderr, and corrected text to stdout:
$ echo "zeebra" | misspell -w
stdin:1:0:corrected "zeebra" to "zebra"
zebra
Only print the corrected text to stdout:
$ echo "zeebra" | misspell -w -q
zebra
Yes! If the file ends in .go, then misspell will only check spelling in comments.
If you want to force a file to be checked as a golang source, use -source=go on the command line. Conversely, you can check a golang source as if it were pure text by using -source=text. You might want to do this since many variable names have misspellings in them!
I'm told that using -source=go works well for Ruby, JavaScript, Java, C and C++.
It doesn't work well for Python and Bash.
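To make "only check spelling in comments" concrete, here is a minimal sketch that pulls the comment text out of Go source using the standard go/scanner package. It illustrates the idea only and is not a description of how misspell itself is implemented; the function name extractComments and the sample source are hypothetical.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// extractComments returns the text of every comment found in src.
// Identifiers, strings and other tokens are skipped.
func extractComments(src []byte) []string {
	fset := token.NewFileSet()
	file := fset.AddFile("example.go", fset.Base(), len(src))

	var s scanner.Scanner
	// ScanComments makes the scanner return comments as COMMENT tokens
	// instead of silently dropping them.
	s.Init(file, src, nil, scanner.ScanComments)

	var comments []string
	for {
		_, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		if tok == token.COMMENT {
			comments = append(comments, lit)
		}
	}
	return comments
}

func main() {
	src := []byte("package p\n\n// langauge is misspelled here\nvar langauge = 1 /* and zeebra here */\n")
	for _, c := range extractComments(src) {
		fmt.Println(c)
	}
}
```

Only the two comments are printed, so a spelling check run over that output would never see the variable name langauge.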
gometalinter runs multiple golang linters. Starting on 2016-06-12, gometalinter supports misspell natively, but it is disabled by default.
# update your copy of gometalinter
go get -u github.com/alecthomas/gometalinter
# install updates and misspell
gometalinter --install --update
To use, just enable misspell:
gometalinter --enable misspell ./...
Note that gometalinter only checks golang files, and uses the default options of misspell.
You may wish to run this on your plaintext (.txt) and/or markdown files too.
Using the -f template flag you can pass in a golang text template to format the output.
The built-in template uses everything, including the js function to escape the original text.
{{ .Filename }}:{{ .Line }}:{{ .Column }}:corrected "{{ js .Original }}" to "{{ js .Corrected }}"
To just print probable misspellings:
-f '{{ .Original }}'
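To see how such a template is evaluated, the sketch below renders both templates above with Go's text/template package, whose built-in js function does the escaping. The field names come from the templates; the struct name Diff, the helper render, and the sample values are assumptions made for illustration, not misspell's actual types.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Diff carries the fields referenced by the templates above.
// The type itself is hypothetical.
type Diff struct {
	Filename  string
	Line      int
	Column    int
	Original  string
	Corrected string
}

// render applies a Go text template to a single finding.
func render(format string, d Diff) (string, error) {
	tmpl, err := template.New("output").Parse(format)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, d); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	d := Diff{Filename: "your.txt", Line: 9, Column: 21, Original: "langauge", Corrected: "language"}
	formats := []string{
		`{{ .Filename }}:{{ .Line }}:{{ .Column }}:corrected "{{ js .Original }}" to "{{ js .Corrected }}"`,
		`{{ .Original }}`,
	}
	for _, format := range formats {
		out, err := render(format, d)
		if err != nil {
			panic(err)
		}
		fmt.Println(out)
	}
}
```

The first template reproduces the default message layout, and the second prints only the misspelled word.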
This corrects commonly misspelled English words in computer source code, and other text-based formats (.txt, .md, etc).
It is designed to run quickly so it can be used as a pre-commit hook with minimal burden on the developer.
It does not work with binary formats (e.g. Word).
It is not a complete spell-checking program nor a grammar checker.
Some other misspelling correctors:
- https://github.com/vlajos/misspell_fixer
- https://github.com/lyda/misspell-check
- https://github.com/lucasdemarchi
They all work but had problems that prevented me from using them at scale:
- slow, all of the above check one misspelling at a time (i.e. linear) using regexps
- not MIT/Apache2 licensed (or equivalent)
- have dependencies that don't work for me (python3, bash, linux sed, etc)
- don't understand American vs. British English and sometimes make unwelcome "corrections"
That said, they might be perfect for you and many have more features than this project!
Misspell is easily 100x to 1000x faster than other spelling correctors. You should be able to check and correct 1000 files in under 250ms.
This uses the mighty power of golang's strings.Replacer, which is an implementation or variation of the Aho–Corasick algorithm. This makes multiple substring matches simultaneously.
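As a minimal sketch of that approach, the example below hands a tiny, hypothetical word list to strings.NewReplacer and corrects a string in a single call. The helper name newCorrector is an assumption for illustration; the real word list is far larger.

```go
package main

import (
	"fmt"
	"strings"
)

// newCorrector builds a replacer from a flat list of
// misspelling, correction, misspelling, correction, ...
func newCorrector(pairs ...string) *strings.Replacer {
	return strings.NewReplacer(pairs...)
}

func main() {
	// A tiny illustrative word list.
	r := newCorrector(
		"langauge", "language",
		"zeebra", "zebra",
	)
	fmt.Println(r.Replace("a zeebra speaks no langauge"))
	// Output: a zebra speaks no language
}
```

All pairs are applied in one call to Replace, rather than running one search per misspelling.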
In addition, this uses multiple CPU cores to work on multiple files.
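One way to spread work across cores is a small worker pool, sketched below. This is an assumption about the general technique rather than misspell's actual code: in-memory strings stand in for files, and the function name correctAll and the word list are hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
	"sync"
)

// correctAll applies the misspelling/correction pairs to every document,
// using one worker goroutine per CPU core.
func correctAll(docs map[string]string, pairs ...string) map[string]string {
	r := strings.NewReplacer(pairs...) // safe for concurrent use
	names := make(chan string)
	results := make(map[string]string, len(docs))

	var mu sync.Mutex
	var wg sync.WaitGroup
	for i := 0; i < runtime.NumCPU(); i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for name := range names {
				fixed := r.Replace(docs[name]) // docs is only read here
				mu.Lock()
				results[name] = fixed
				mu.Unlock()
			}
		}()
	}

	// Feed document names to the workers, then wait for them to drain the queue.
	for name := range docs {
		names <- name
	}
	close(names)
	wg.Wait()
	return results
}

func main() {
	docs := map[string]string{"a.txt": "zeebra", "b.md": "langauge"}
	fixed := correctAll(docs, "langauge", "language", "zeebra", "zebra")
	for name, text := range fixed {
		fmt.Printf("%s: %s\n", name, text)
	}
}
```

Because the workers finish in no fixed order, anything they print is interleaved unpredictably, which is why a dedicated debug mode helps when tracking down a single correction.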
Unlike the other projects, this doesn't know what a "word" is. There may be more false positives and false negatives due to this. On the other hand, it sometimes catches things others don't.
Either way, please file bugs and we'll fix them!
Since it operates in parallel to make corrections, it can be hard to determine exactly which word was corrected.
Run using the -debug flag on the file you want. It should then print what word it is trying to correct. Then file a bug describing the problem.
Thanks!
The matching function is case-sensitive, so variable names that are multiple words either in all-upper or all-lower case can sometimes cause false positives. For instance, a variable named bodyreader could trigger a false positive, since the yrea in the middle could be corrected to year.
Other problems happen if the variable name uses an English contraction that should use an apostrophe. The best way of fixing this is to use the Effective Go naming conventions and use camelCase for variable names. You can check your code using golint.
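The bodyreader case can be reproduced with a plain substring replacer. The sketch below assumes a one-entry word list holding only the yrea to year pair from the example; the function name naiveFix is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// naiveFix replaces the misspelling wherever it occurs,
// with no notion of where a word starts or ends.
func naiveFix(text string) string {
	// Illustrative one-entry word list taken from the example in the text.
	return strings.NewReplacer("yrea", "year").Replace(text)
}

func main() {
	// All-lowercase name: "yrea" spans the boundary of "body" and "reader".
	fmt.Println(naiveFix("bodyreader")) // prints "bodyearder"

	// camelCase name: the case-sensitive match no longer fires.
	fmt.Println(naiveFix("bodyReader")) // prints "bodyReader"
}
```

Because matching is case-sensitive, the capital R in the camelCase name is enough to keep it from being mangled.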
It started with a word list from Wikipedia. Unfortunately, this list had to be heavily edited as many of the words are obsolete or based on mistakes made on mechanical typewriters (I'm guessing).
Additional words were added based on actual mistakes seen in the wild (meaning self-generated).
Variations of UK and US spellings are based on many sources including:
- http://www.tysto.com/uk-us-spelling-list.html (with heavy editing, many are incorrect)
- http://www.oxforddictionaries.com/us/words/american-and-british-spelling-american (excellent site but incomplete)
- Diffing US and UK scowl dictionaries
American English is more accepting of spelling variations than is British English, so "what is American or not" is subject to opinion. Corrections and help welcome.
### What are some other enhancements that could be done?

Here are some ideas for enhancements:
Capitalization of proper nouns could be done (e.g. weekday and month names, country names, language names).
Opinionated US spellings: US English has a number of words with alternate spellings. Think adviser vs. advisor. While "advisor" is not wrong, the opinionated US locale would correct "advisor" to "adviser".
Versioning: Some type of versioning is needed so reporting mistakes and errors is easier.
Feedback: Mistakes would be sent to some server for aggregation and feedback review.