gmarma23/WordUnscrambler


Word Unscrambler

Unscramble provided character strings to form derived words.

Core Implementation

A character string is encoded as a vector holding the number of occurrences of each character, so every permutation of the same characters yields the same vector. The vector is then mapped to an integer signature by an approximation of an injective function, so that distinct vectors correspond to distinct integers and vice versa.
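The encoding described above can be illustrated with a short sketch. It is not the repository's actual function: the names `count_vector` and `signature`, the lowercase a-z alphabet, and the use of a prime product (which is injective by unique factorisation) are all assumptions made for illustration.

```python
from collections import Counter
from math import prod

# Assumption: a lowercase a-z alphabet, with one distinct prime per character.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]


def count_vector(chars):
    """Return the number of occurrences of each alphabet character.

    Characters outside ALPHABET are ignored in this sketch.
    """
    counts = Counter(chars)
    return [counts[c] for c in ALPHABET]


def signature(chars):
    """Map the count vector to one integer: the product of prime ** count."""
    return prod(p ** n for p, n in zip(PRIMES, count_vector(chars)))


# Anagrams share a count vector, hence a signature.
print(signature("dog") == signature("god"))  # True
print(signature("dog") == signature("dig"))  # False
```

A prime product grows quickly with word length, which Python's arbitrary-precision integers handle without overflow.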

A plain txt wordlist file is used to generate a JSON dictionary whose keys are integer signatures and whose values are the lists of words from the initial wordlist that share that signature. This mapping is performed only once per plain wordlist, and the resulting mapped wordlist can then be reused as often as needed.
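A minimal sketch of that mapping step follows. The function names are hypothetical, and `signature` is a stand-in (the sorted characters read as a big-endian integer) rather than the project's own signature function; any integer that is identical for anagrams would serve here. JSON object keys are always strings, so the integer is stored as its decimal string.

```python
import json
from collections import defaultdict


def signature(chars):
    # Stand-in signature (assumption): sorted characters read as an integer.
    return int.from_bytes("".join(sorted(chars)).encode("utf-8"), "big")


def build_mapped_wordlist(words):
    """Group words under the string form of their integer signature."""
    mapped = defaultdict(list)
    for word in words:
        mapped[str(signature(word))].append(word)
    return dict(mapped)


words = ["dog", "god", "go", "do"]
mapped = build_mapped_wordlist(words)

# Serialise once; the resulting text is what a mapped wordlist file would hold.
mapped_json = json.dumps(mapped, indent=2)
print(mapped_json)
```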

Whenever a character string needs to be unscrambled, its integer signature is calculated first, and that signature is then looked up as a dictionary key in an already mapped wordlist. The corresponding dictionary value is returned as the results list, populated with derived words.
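The lookup can be sketched as a single dictionary access. As before, `signature` is a stand-in and `unscramble` is a hypothetical name; the small `mapped` dictionary is built inline and round-tripped through JSON text to mimic loading a mapped wordlist.

```python
import json


def signature(chars):
    # Stand-in signature (assumption): sorted characters read as an integer.
    return int.from_bytes("".join(sorted(chars)).encode("utf-8"), "big")


def unscramble(chars, mapped):
    """Return the words stored under the signature of chars, or []."""
    return mapped.get(str(signature(chars)), [])


# A tiny mapped wordlist, passed through JSON as a loaded file would be.
groups = (["dog", "god"], ["go"], ["do"])
mapped = json.loads(json.dumps({str(signature(g[0])): g for g in groups}))

print(unscramble("ogd", mapped))  # ['dog', 'god']
print(unscramble("xyz", mapped))  # []
```

Because the expensive grouping work happens once at mapping time, each query costs one signature computation plus one key lookup.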

Two already mapped wordlists can be found in the %project_dir%/wordlists/mapped/ directory:

  • english_wordlist_simple.json (DEFAULT MAPPED WORDLIST)
  • english_wordlist_complex.json

Usage

Run main.py and use:

  1. The unscramble option to unscramble character strings.
  • Use the --chars (or -c) argument to provide a single character string.
  • Use the --file (or -f) argument to provide multiple character strings via a txt file (one string per line). The results are written to the %project_dir%/output/ directory as a txt file named %initial_filename%_unscrambled.txt.

    Optional:

  • Use the --subsets (or -s) argument to also unscramble all subsets of the provided character string.
  • Use the --wordlist (or -w) argument to provide a path to a custom mapped wordlist. If no custom mapped wordlist is provided, the default english_wordlist_simple.json is used.

  2. The map option to generate a custom mapped wordlist.
  • Use the --wordlist (or -w) argument to specify the path to a plain txt wordlist file.
  • Use the --valid-chars (or -v) argument to provide a string of characters that are permitted in words of the new custom mapped wordlist. While a mapped wordlist is being created, any word from the plain wordlist that contains an invalid character is not registered.
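
The --valid-chars filtering rule can be sketched as a simple membership test. The helper name `is_registrable` is hypothetical, and whether the tool normalises case before checking is not stated here, so this sketch compares characters exactly as given.

```python
def is_registrable(word, valid_chars):
    """True if the word is non-empty and uses only permitted characters."""
    return bool(word) and set(word) <= set(valid_chars)


valid_chars = "abcdefghijklmnopqrstuvwxyz-"
plain_words = ["dog", "well-known", "naïve", "it's", ""]

# Words containing any character outside valid_chars are skipped.
kept = [w for w in plain_words if is_registrable(w, valid_chars)]
print(kept)  # ['dog', 'well-known']
```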


Examples:

python main.py unscramble -c "ogd"

ogd --> ['dog', 'god']

python main.py unscramble -c "ogd" -s

og --> ['go']
od --> ['do']
ogd --> ['dog', 'god']
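
One way to enumerate the subsets used by -s is sketched below; it is an assumption, not the project's code. The name `sub_strings` is hypothetical, and the minimum length of 2 is inferred only from the example output above, which lists no single-character entries. Subsets with no matching words (such as "gd") would simply produce no result line.

```python
from itertools import combinations


def sub_strings(chars, min_len=2):
    """Yield each distinct sub-multiset of chars once, shortest first."""
    seen = set()
    for size in range(min_len, len(chars) + 1):
        for combo in combinations(chars, size):
            key = "".join(sorted(combo))  # same letters count as one subset
            if key not in seen:
                seen.add(key)
                yield "".join(combo)


print(list(sub_strings("ogd")))  # ['og', 'od', 'gd', 'ogd']
```

Each yielded string can then be looked up by signature in the mapped wordlist, exactly like a single --chars query.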

python main.py unscramble -f "%path_to_input_txt_file%"

python main.py map -w "%path_to_plain_txt_wordlist%" -v "abcdefghijklmnopqrstuvwxyz-"
