To qualify the question with an example, consider the problem of generating the power set of a set of integers (or some subset of that set).

For example, consider the set of integers from 1 up to 30 and then consider generating all the possible subsets of some given length, from that set. Start with subsets of 1 element, then generate all subsets with 2 elements etc up to some number k of elements, where k can be a number lower than 30, say up to 5. Then consider a set of 31 elements and again generate subsets from 1 through 5 or six (a ‘reasonable’ number). And so on.

Now, if you try to do that and conditional upon the power of your machine, I predict that at some point you will hit a wall. At some point, there will be a k such that trying to generate all subsets of size k+1 will be impractical. Where, by ‘impractical’ you may read ‘freezing’/’stalling’ system.

Please note that during the process (loop) of generating the subsets, no other operation is performed-just the Length[Subsets[set,{k}]] instruction is executed (I used Length[] under the impression that as soon as Subsets[] returned, the Length[] function would make the list garbage collected-if that is a correct term-thereby freeing memory resources for the following subsets).

Also note, that what consistently appeared during several trials was the fact that although the cpu cores were roughly idling, the system ram was seriously devoted to the task (almost 100% of memory at various points of the tests).

The code I used is the following:

csubs = Compile[

{{sample, _Integer, 1}, {k, _Integer}},

Length[Subsets[sample, {k}]],

RuntimeAttributes -> {Listable},

Parallelization -> True,

CompilationTarget -> “C”

]

subs[sample_, k_] := subs[sample, k] = csubs[sample, k]

Map[(

{T, k, S} = #;

tag = Map[

StringJoin,

MapAt[

ToString, {{“Sample size = “, T}, {“Breaks = “, k}, {“Subsets = “, S}},

{All, 2}]];

tmp = PrintTemporary[tag];

If[

NumericQ[timing[T, k]],

Null,

{timing[T, k], junk} = RepeatedTiming[subs[Take[RANGE, T], k], 10]

];

NotebookDelete[tmp];

{T, k, S, timing[T, k]}

) &, selection]

where selection is a list of {sample size, subsets, Binomial[sample size, subsets]} and RANGE=Range[100]

So, to reiterate the question, does something like what is illustrated in the code above require more cores or more memory to run sufficiently (or a different strategy of attacking the prob altogether?). Please take into consideration, that all those subsets generated would be further processed, in more realistic versions of this code ie the code inside the compiled function would contain more elaborate operations, than mere calls to Length[].

=================

Related: Alternative to Subsets to generate k-combinations

– jkuczm

Oct 9 at 19:08

=================

1 Answer

1

=================

[A bit long for a comment]

This specific task, extracting all subsets of some modest size from a larger set, does require memory. Even if you pack the result it will take up space.

ss=Subsets[Range[31],{6}];

Developer`PackedArrayQ[ss]

(* Out[3]= False *)

ByteCount[ss]

(* Out[4]= 141366032 *)

We can of course pack this but still it’s only a constant compression factor.

ss2 = Developer`ToPackedArray[ss];

Developer`PackedArrayQ[ss2]

(* Out[7]= True *)

ByteCount[ss2]

(* Out[8]= 35341640 *)

The upshot is if you want to do processing on these, it might make sense to swap speed for memory and just generate them one (or a modest batch) at a time, process, generate next, process…

Of course if there is a reasonable limit on how large the set of subsets will be, then generating all in one go is likely to have the best performance.

edit

One can specify a range for which subsets are desired, according to the ordering produced (lexicographic, but that’s probably not important).

Subsets[Range[7], {3}, {1, 4}]

Subsets[Range[7], {3}, {5, 8}]

(* Out[6]= {{1, 2, 3}, {1, 2, 4}, {1, 2, 5}, {1, 2, 6}}

Out[7]= {{1, 2, 7}, {1, 3, 4}, {1, 3, 5}, {1, 3, 6}} *)

The upshot is that this provided one means to break the problem into chunks. There are other ways of course; for example just think of how one might compute subsets recursively.

In terms of speed, packing after the fact will not help for the particular example where all that was computed are subset lengths. Where it might be useful is when the subsequent processing can benefit from input that is packed.

end edit

Is there a way to break up something like Subsets[<69>,{7}]? Does mma have something analogous to python’s generators? This computation has about a billion lists as output and I think it is ‘chunks’ of large Subsets[] that cause the bottleneck. This is a regression of Log[timing] on sample size and subset size without and with packed arrays respectively !fitted models output. There doesn’t seem to be much difference.

– user42582

Oct 9 at 6:30

I responded to chunking and packing vs not speed issue in an edit.

– Daniel Lichtblau

Oct 9 at 19:01