A method for selecting a subset of best items w.r.t. a black-box reward function: Given π items, the algorithm searches for an optimal π π’ππ ππ‘ of a maximum number of π items, which maximizes a blackbox reward function π(π π’ππ ππ‘).
The algorithm generates new subset candidates based on the novelty of the items while also considering what is the ratio of presence of an item that exist among the top previously tried subsets (by sorting them wrt rewards achieved).
The algorithm is not yet perfect but just takes few thousand trials to find 4 out of 5 best items from a set of 100 items where one would need to make up to few millions of trials with random search to achieve the same local maximum.
Example Use-case: Select best combination of k stocks out of all US stocks for training an asset allocation model.