framed.external.core
->set
(->set path)
Construct an ExternalSet from a file path
aggregate-by
(aggregate-by key-func coll)
(aggregate-by key-func map-func coll)
(aggregate-by key-func map-func batchsize coll)
Groups items in coll according to (key-func item) and uses map-func to
transform the values of each final partition. Returns a lazy seq of pairs
of the form [k (map-func vs)], where vs are the items whose key is k.
Invariant: output is sorted with respect to key-func.
Ex:
  (aggregate-by
    :user
    (fn [events] (count events))
    10000
    [{:user "a" :event "Login"}
     {:user "b" :event "Signup"}
     {:user "a" :event "Purchase"}
     {:user "c" :event "Add To Cart"}])
  ; => (["a" 2] ["b" 1] ["c" 1])
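The grouping behaviour described above can be approximated in memory with clojure.core alone. This is only a sketch of the semantics: aggregate-by-sketch is a hypothetical name, it ignores batchsize, and it is not the library's implementation.

```clojure
;; Hypothetical in-memory stand-in for aggregate-by (illustration only).
(defn aggregate-by-sketch
  [key-func map-func coll]
  (->> coll
       (sort-by key-func)        ; sorting makes items with equal keys adjacent
       (partition-by key-func)   ; one partition per run of equal keys
       (map (fn [vs] [(key-func (first vs)) (map-func vs)]))))

(def events
  [{:user "a" :event "Login"}
   {:user "b" :event "Signup"}
   {:user "a" :event "Purchase"}
   {:user "c" :event "Add To Cart"}])

(aggregate-by-sketch :user count events)
;; => (["a" 2] ["b" 1] ["c" 1])
```

Sorting first is what makes the output ordered by key-func, which matches the stated invariant.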
distinct
(distinct coll)
Return a sequence of distinct elements in coll.
Order of returned elements is undefined.
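Since the order is undefined, callers that compare results should sort them or compare them as sets. The sketch below shows one common way to deduplicate via sorting, using only clojure.core. distinct-sketch is a hypothetical name, the elements are assumed to be mutually comparable, and whether the library works this way is an assumption.

```clojure
;; Hypothetical in-memory stand-in (illustration only):
;; sort, then drop adjacent duplicates.
(defn distinct-sketch
  [coll]
  (dedupe (sort coll)))

(distinct-sketch [3 1 3 2 1])
;; => (1 2 3)
```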
intersection
(intersection s0 s1)
(intersection batchsize s0 s1)
Return the intersection of two comparable sequences as a lazy seq
Ex:
  (require '[framed.external.core :as e])
  (def s1 (e/set [1 2 3]))
  (def s2 (e/set [2 3 4 5]))
  (e/intersection s1 s2)
  ; => (2 3)
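A lazy intersection of two comparable sequences can be sketched as a merge-style walk, assuming both inputs are already sorted ascending under clojure.core/compare. sorted-intersection is a hypothetical helper written for illustration; it is not the library's code and it ignores batchsize.

```clojure
;; Hypothetical helper (illustration only): both inputs must be sorted.
(defn sorted-intersection
  [xs ys]
  (lazy-seq
    (when (and (seq xs) (seq ys))
      (let [x (first xs)
            y (first ys)
            c (compare x y)]
        (cond
          (neg? c) (sorted-intersection (rest xs) ys)   ; x is too small, skip it
          (pos? c) (sorted-intersection xs (rest ys))   ; y is too small, skip it
          :else    (cons x (sorted-intersection (rest xs) (rest ys))))))))

(sorted-intersection [1 2 3] [2 3 4 5])
;; => (2 3)
```

Each step only looks at the head of each sequence, so neither input needs to be held in memory as a whole.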
keyed-partition-by
(keyed-partition-by f coll)
Applies f to each value in coll, splitting it each time f returns a new
value. Returns a lazy seq of pairs, each containing the split value and the
seq of values in that partition.
Note: the partitions themselves are lazy. To accomplish this, f is applied
twice to each element, so this function is unsuitable when f is expensive
or has side effects.
See http://stackoverflow.com/questions/24738261/lazy-partition-by
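One way to get partitions that are themselves lazy is to pair take-while with drop-while, which is also why f ends up being called twice per element. The following is a minimal sketch under that assumption; keyed-partition-by-sketch is a hypothetical name and may differ from the library's actual code.

```clojure
;; Hypothetical sketch (illustration only).
(defn keyed-partition-by-sketch
  [f coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (let [k       (f (first s))
            same-k? #(= k (f %))]
        ;; take-while and drop-while each call f on the same elements,
        ;; so f runs twice per element once everything is realized.
        (cons [k (take-while same-k? s)]
              (keyed-partition-by-sketch f (drop-while same-k? s)))))))

(keyed-partition-by-sketch odd? [1 3 5 2 4 7])
;; => ([true (1 3 5)] [false (2 4)] [true (7)])
```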
set
(set coll)
Construct an ExternalSet from coll
shuffle
(shuffle coll)
(shuffle rng coll)
Shuffle coll in constant space and return a lazy seq of the results
rng - optional java.util.Random for deterministic testing
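The value of passing an rng is that a seeded java.util.Random makes the shuffle repeatable in tests. The sketch below illustrates that idea with clojure.core only. shuffle-sketch is a hypothetical name, and this in-memory stand-in is neither constant-space nor lazy, unlike what the docstring describes.

```clojure
;; Hypothetical in-memory stand-in (illustration only): tag each element
;; with a random key drawn from rng, sort by that key, then drop the keys.
(defn shuffle-sketch
  [^java.util.Random rng coll]
  (->> coll
       (map (fn [x] [(.nextLong rng) x]))
       (sort-by first)
       (map second)))

;; Two generators with the same seed produce the same permutation.
(= (shuffle-sketch (java.util.Random. 42) (range 10))
   (shuffle-sketch (java.util.Random. 42) (range 10)))
;; => true
```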
sort
(sort batchsize coll)
(sort batchsize compx coll)
Sorts coll in constant space and returns a lazy seq of the results
batchsize - Integer representing max number of elements to write to a file
compx - 3-way comparator to sort on (returning int, not boolean)
(Defaults to clojure.core/compare)
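The roles of batchsize and compx can be illustrated with a batch-then-merge sketch: split the input into batches, sort each batch, and lazily merge the sorted runs using the 3-way comparator. merge-sorted and batched-sort are hypothetical names. Batches stay in memory here rather than being written to files, so this shows the general shape of an external sort and not the library's implementation.

```clojure
;; Hypothetical helpers (illustration only).
(defn merge-sorted
  "Lazily merges two sequences that are each sorted under compx."
  [compx xs ys]
  (lazy-seq
    (cond
      (empty? xs) ys
      (empty? ys) xs
      ;; compx is 3-way: a positive result means (first ys) sorts first
      (pos? (compx (first xs) (first ys)))
      (cons (first ys) (merge-sorted compx xs (rest ys)))
      :else
      (cons (first xs) (merge-sorted compx (rest xs) ys)))))

(defn batched-sort
  [batchsize compx coll]
  (->> coll
       (partition-all batchsize)              ; at most batchsize items per batch
       (map #(sort compx %))                  ; sort each batch independently
       (reduce (partial merge-sorted compx) ())))

(batched-sort 2 compare [5 3 1 4 2])
;; => (1 2 3 4 5)

;; A 3-way comparator for descending order returns an int, not a boolean.
(batched-sort 2 (fn [a b] (compare b a)) [5 3 1 4 2])
;; => (5 4 3 2 1)
```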
sort-by
(sort-by batchsize keyfn coll)
Sort a coll in constant space, where sort order is determined
by (keyfn item)
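Sorting by (keyfn item) is equivalent to sorting with a 3-way comparator that compares the keys. The sketch below shows that relationship using clojure.core/sort. keyfn->comparator is a hypothetical helper, and whether the library builds sort-by on top of sort in this way is an assumption.

```clojure
;; Hypothetical helper (illustration only): turn a key function into
;; a 3-way comparator over the keys.
(defn keyfn->comparator
  [keyfn]
  (fn [a b] (compare (keyfn a) (keyfn b))))

(sort (keyfn->comparator :age)
      [{:name "x" :age 30} {:name "y" :age 25}])
;; => ({:name "y", :age 25} {:name "x", :age 30})
```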