It's fast to add elements to the head of a list and the tail of a vector. Clojure takes advantage of this by adding elements to the optimal position of a sequence based on it's type. Being totally oblivious to this fact, I spent quite a while trying to pin down the difference in behavior between my Python reference implementation and my own version. Once I discovered the source of the program, I had a classic hacker aha moment and a great deal of satisfaction.
Note that this program stays in the realm of immutable data 100% of the time. I believe this may be the reason that it's much slower than the Python version, but I could be wrong. For whatever reason, there's a pretty large speed difference between the two programs. If anyone has any suggestions, I'll be happy to try them.
(ns markov (use clojure.contrib.str-utils)) (set! *warn-on-reflection* true) (defn flatten "Takes any nested combination of sequential things (lists, vectors, etc.) and returns their contents as a single, flat sequence. (flatten nil) returns nil." [x] (filter (complement sequential?) (rest (tree-seq sequential? seq x)))) (defn rand-elt "Return a random element of this seq" [s] (nth s (rand-int (count s)))) (defn clean [txt] "clean given txt for symbols disruptive to markov chains" (let [new-txt (re-gsub #"[:;,^\"()]" "" txt) new-txt (re-gsub #"'(?!(d|t|ve|m|ll|s|de|re))" "" new-txt)] new-txt)) (defn chain-lengths [markov-chain] "return a set of lengths for each element in the collection" (let [markov-keys (map keys markov-chain)] (set (for [x markov-keys] (count x))))) (defn max-chain-length [markov-chain] "return the length lf the longest chain" (apply max (chain-lengths markov-chain))) (defn chain "Take a list of words and build a markov chain out of them. The length is the size of the key in number of words." ([words] (chain words 3)) ([words length] (loop [markov-chain {} keychain (for [x (range length)] nil) words (map clean words)] (let [first-word (first words)] (if (seq words) (recur (assoc markov-chain keychain (cons first-word (get markov-chain keychain))) (concat (rest keychain) [first-word]) (rest words)) (assoc markov-chain keychain [])))))) (defn split-sentence [text] "Convert a string to a collection on common boundaries" (filter seq (re-split #"[,.!?()\d]+\s*" text))) (defn file-chain "Create a markov chain from the contents of a given file" ([file] (file-chain file 3)) ([file length] (let [sentences (split-sentence (slurp file)) flatten-list (fn [& x] (flatten (list x)))] (loop [markov-chain {} words sentences] (if (seq words) (recur (merge-with flatten-list markov-chain (chain (re-split #"\s+" (first words)))) (rest words)) markov-chain))))) (defn construct-sentence "Build a sentence from a markov chain structure. Given a Markov chain (any size key), Seed (to start the sentence) and Proc (a function for choosing the next word), returns a sentence composed until is reaches the end of a chain (an end of sentence)." ([markov-chain] (construct-sentence markov-chain nil rand-elt)) ([markov-chain seed] (construct-sentence markov-chain seed rand-elt)) ([markov-chain seed proc] (loop [words (if seed seed (rand-elt (keys markov-chain))) sentence (str-join " " (filter identity words))] (if (seq (markov-chain words)) (let [word-new (proc (markov-chain words))] (recur (concat (rest words) [word-new]) (str-join " " (into [sentence] [word-new])))) sentence))))Here's some example usage:
(ns main (use markov)) (def markov (file-chain "corpus.txt")) (doseq [x (range 100)] (doseq [x (range 3)] (println (construct-sentence markov))) (println))
Cool, looks good. It's always fun to see how much you can do with a little amount of code in Clojure. I had a go at this myself back in January, you can see the code on my blog.
ReplyDeleteHi, i am looking for a markov rewriter myself. can you pls give the python code. my website is http://www.chainsrewriting.com
ReplyDelete