framed.external.core

->set

(->set path)
Construct an ExternalSet from a file path

aggregate-by

(aggregate-by key-func coll)
(aggregate-by key-func map-func coll)
(aggregate-by key-func map-func batchsize coll)
Groups items in coll according to (key-func item) and uses map-func to
transform the final partition values. Returns a lazy seq of partitions,
of the form [k (map-func vs)]

Invariant: Output will be sorted with respect to key-func

Ex:
  (aggregate-by
    :user
    (fn [events] (count events))
    10000
    [{:user "a" :event "Login"}
     {:user "b" :event "Signup"}
     {:user "a" :event "Purchase"}
     {:user "c" :event "Add To Cart"}])
  ; => (["a" 2] ["b" 1] ["c" 1])
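
  For reference, the result shape described above can be reproduced in memory
  with clojure.core alone. This is only a sketch of the semantics: the helper
  name aggregate-by-in-memory is hypothetical, it holds the whole collection in
  memory, and it is not how this namespace implements aggregate-by.

```clojure
;; Hypothetical in-memory equivalent, for illustrating the output shape only.
(defn aggregate-by-in-memory
  [key-func map-func coll]
  (->> (group-by key-func coll)                  ; {k [items...]}
       (into (sorted-map))                       ; order partitions by key
       (map (fn [[k vs]] [k (map-func vs)]))))   ; [k (map-func vs)] pairs

(aggregate-by-in-memory
  :user
  count
  [{:user "a" :event "Login"}
   {:user "b" :event "Signup"}
   {:user "a" :event "Purchase"}
   {:user "c" :event "Add To Cart"}])
; => (["a" 2] ["b" 1] ["c" 1])
```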

distinct

(distinct coll)
Return a sequence of distinct elements in coll.
Order of returned elements is undefined.

intersection

(intersection s0 s1)
(intersection batchsize s0 s1)
Return the intersection of two comparable sequences as a lazy seq.
Ex:
  (def s1 (e/set [1 2 3]))
  (def s2 (e/set [2 3 4 5]))
  (e/intersection s1 s2)
  ; => (2 3)
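
  One common way to intersect comparable sequences lazily is to walk two
  already-sorted inputs in step. The sketch below shows that merge step using
  clojure.core only. The name sorted-intersection is hypothetical, and whether
  this namespace works this way internally is an assumption, not something the
  docstring states.

```clojure
;; Hypothetical helper: assumes xs and ys are each sorted ascending
;; with respect to clojure.core/compare.
(defn sorted-intersection
  [xs ys]
  (lazy-seq
    (loop [xs (seq xs)
           ys (seq ys)]
      (when (and xs ys)
        (let [c (compare (first xs) (first ys))]
          (cond
            (neg? c) (recur (next xs) ys)        ; head of xs is smaller: skip it
            (pos? c) (recur xs (next ys))        ; head of ys is smaller: skip it
            :else    (cons (first xs)            ; equal heads: emit, advance both
                           (sorted-intersection (next xs) (next ys)))))))))

(sorted-intersection [1 2 3] [2 3 4 5])
; => (2 3)
```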

keyed-partition-by

(keyed-partition-by f coll)
Applies f to each value in coll, splitting on each different f value.
Returns a lazy seq of pairs, each containing the split value and the seq
of values in the partition

Note: the partitions themselves are produced lazily. To accomplish this, f is
applied twice to each element in a partition, so this function is unsuitable
when f is expensive or has side effects.

See http://stackoverflow.com/questions/24738261/lazy-partition-by
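
A minimal sketch of the approach the note describes, written with clojure.core
only. The name lazy-keyed-partition-by is hypothetical and this is not
necessarily the code used by this namespace. It shows why f runs twice per
element: once in take-while to build the partition, and once in drop-while to
skip past it.

```clojure
;; Hypothetical sketch of a lazy, keyed partition-by.
(defn lazy-keyed-partition-by
  [f coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (let [k     (f (first s))
            same? #(= k (f %))]
        (cons [k (take-while same? s)]                   ; lazy partition
              (lazy-keyed-partition-by f (drop-while same? s)))))))

(lazy-keyed-partition-by odd? [1 3 2 4 5])
; => ([true (1 3)] [false (2 4)] [true (5)])
```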

set

(set coll)
Construct an ExternalSet from coll

shuffle

(shuffle coll)
(shuffle rng coll)
Shuffle coll in constant space and return a lazy seq of the results

rng - optional java.util.Random for deterministic testing
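
To illustrate why a seeded java.util.Random makes a shuffle reproducible, here
is an in-memory sketch built on java.util.Collections/shuffle. The helper name
seeded-shuffle is hypothetical, and it does not run in constant space like the
function documented above.

```clojure
;; Hypothetical in-memory helper: same seed, same permutation.
(defn seeded-shuffle
  [seed coll]
  (let [items (java.util.ArrayList. ^java.util.Collection coll)]
    (java.util.Collections/shuffle items (java.util.Random. seed))
    (vec items)))

;; Two runs with the same seed produce the same ordering.
(= (seeded-shuffle 42 [1 2 3 4 5])
   (seeded-shuffle 42 [1 2 3 4 5]))
; => true
```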

sort

(sort batchsize coll)
(sort batchsize compx coll)
Sort coll in constant space and return a lazy seq of the results
batchsize - Integer representing max number of elements to write to a file
compx - 3-way comparator to sort on (returning int, not boolean)
        (Defaults to clojure.core/compare)
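
As an illustration of the comparator contract, the sketch below defines a
3-way comparator for descending order. The name descending is hypothetical,
and the demonstration uses clojure.core/sort so that it runs without this
namespace loaded.

```clojure
;; A 3-way comparator returns a negative int, zero, or a positive int.
;; A boolean predicate such as > does not meet that contract.
(defn descending
  [a b]
  (compare b a))

(descending 2 2)
; => 0

(clojure.core/sort descending [3 1 2])
; => (3 2 1)
```

Under the signature documented above, such a comparator would be passed as
compx, for example (sort 10000 descending coll), where 10000 is an arbitrary
batchsize.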

sort-by

(sort-by batchsize keyfn coll)
Sort a coll in constant space, where sort order is determined
by (keyfn item)
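
Ordering by (keyfn item) can be expressed as a 3-way comparator over the
extracted keys. The sketch below shows that relationship with clojure.core
only. The name keyfn->comparator is hypothetical, and whether sort-by is
implemented this way here is an assumption.

```clojure
;; Hypothetical helper: build a 3-way comparator from a key function.
(defn keyfn->comparator
  [keyfn]
  (fn [a b]
    (compare (keyfn a) (keyfn b))))

(clojure.core/sort (keyfn->comparator :age)
                   [{:name "x" :age 30}
                    {:name "y" :age 25}])
; => ({:name "y", :age 25} {:name "x", :age 30})
```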