Switch infer spaces from doubles to floats
Storing the word costs as floats takes about half the memory that the
previous doubles consumed. The reduced precision shouldn't affect the
results of word splitting.
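
To make the memory and precision trade-off concrete, here is a minimal sketch. The sample cost value is made up for illustration and does not come from the actual word list. It compares the per-element size of a JVM `double` and `float` and measures the relative error introduced by narrowing one value.

```clojure
;; Illustrative only: the sample value is hypothetical, not taken from the word list.
(def sample-cost (Math/log 12345.678))

;; Narrow the double to a float, as the commit does for each word cost.
(def as-float (float sample-cost))

;; Bytes per array element on the JVM: 8 for a double, 4 for a float.
(def bytes-per-double Double/BYTES)
(def bytes-per-float Float/BYTES)

;; Relative error introduced by the narrowing conversion.
(def relative-error
  (/ (Math/abs (double (- sample-cost as-float))) sample-cost))

(println "bytes per double:" bytes-per-double)
(println "bytes per float: " bytes-per-float)
(println "relative error:  " relative-error)
```

A float keeps roughly seven significant decimal digits, so the costs of two candidate splits would have to agree to about that many digits before the narrowing could change which one wins.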
iethree committed Dec 7, 2017
1 parent 3c4267b commit 538d981
Showing 1 changed file with 5 additions and 5 deletions.
10 changes: 5 additions & 5 deletions src/metabase/util/infer_spaces.clj
@@ -20,7 +20,7 @@
(let [log-count (Math/log (count words))]
(into (sorted-map)
(map-indexed (fn [idx word]
-[(hash word) (Math/log (* (inc idx) log-count))])
+[(hash word) (float (Math/log (* (inc idx) log-count)))])
words))))

;; # Build arrays for a cost lookup, assuming Zipf's law and cost = -math.log(probability).
@@ -35,9 +35,9 @@
"Array of word hash values, ordered by that hash value"
(int-array (keys sorted-words)))

-(def ^:private ^"[D" word-cost
-"Array of word cost doubles, ordered by the hash value for that word"
-(double-array (vals sorted-words)))
+(def ^:private ^"[F" word-cost
+"Array of word cost floats, ordered by the hash value for that word"
+(float-array (vals sorted-words)))

;; maxword = max(len(x) for x in words)
(def ^:private max-word
@@ -97,7 +97,7 @@
[input]
(let [s (s/lower-case input)
cost (build-cost-array s)]
-(loop [i (double (count s))
+(loop [i (float (count s))
out []]
(if-not (pos? i)
(reverse out)
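
The hunks above only show how the two parallel arrays are built, not how they are read. The following is a minimal sketch of one way such a lookup could work, assuming a binary search over the sorted hash array. The three-word dictionary and the names `words`, `sorted-costs`, `hashes`, `costs`, and `lookup-cost` are hypothetical and are not taken from `infer_spaces.clj`. The cost expression is copied from the diff: for a word at 1-based rank `r` in a list of `N` words it evaluates to `log(r * log N)`.

```clojure
(import 'java.util.Arrays)

;; Hypothetical three-word dictionary, most frequent word first.
(def words ["the" "of" "and"])

;; Same construction as in the diff: hash -> float cost, sorted by hash.
(def sorted-costs
  (let [log-count (Math/log (count words))]
    (into (sorted-map)
          (map-indexed (fn [idx word]
                         [(hash word) (float (Math/log (* (inc idx) log-count)))])
                       words))))

;; Parallel primitive arrays: position i in `costs` belongs to hash i in `hashes`.
(def ^"[I" hashes (int-array (keys sorted-costs)))
(def ^"[F" costs (float-array (vals sorted-costs)))

(defn lookup-cost
  "Hypothetical helper: returns the float cost for `word`, or nil if it is unknown."
  [word]
  (let [idx (Arrays/binarySearch hashes (int (hash word)))]
    ;; binarySearch returns a negative number when the key is absent.
    (when-not (neg? idx)
      (aget costs idx))))

(println (lookup-cost "the"))
(println (lookup-cost "zebra"))
```

Because the costs live in a primitive `float[]` rather than in boxed values, the saving from this commit is four bytes per dictionary word; the hash array is unchanged.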