Position: Leverage Foundational Models for Black-Box Optimization

This paper explores the use of Large Language Models (LLMs) to address challenges in Black Box Optimization (BBO), particularly multi-modality and task generalization. The authors propose framing BBO around sequence-based foundation models, leveraging LLMs' ability to integrate information from multiple modalities into stronger optimization strategies.

Motivation: Traditional BBO techniques struggle with multi-modality and task generalization

The position paper by [Son24P] advocates using LLM-based foundation models for Black Box Optimization (BBO). The goal of BBO is to optimize an objective function given only evaluations of the function (i.e., no gradients or other higher-order information about the function). A common example of a BBO task is neural network architecture search, where the objective is to maximize classification accuracy across different architectures. Classical BBO approaches include grid search, random search, and Bayesian optimization.
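As a minimal illustration of the black-box setting, consider random search: the optimizer touches the objective only through point evaluations, with no gradient access. This is a toy sketch, not code from the paper:

```python
import random

def random_search(objective, bounds, n_trials=50, seed=0):
    """Minimal random-search BBO: the objective is queried only through
    point evaluations -- no gradients or structural information."""
    rng = random.Random(seed)
    best_x, best_y = None, float("-inf")
    for _ in range(n_trials):
        # Sample a candidate uniformly within the box constraints.
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        y = objective(x)  # the only access we have to the black box
        if y > best_y:
            best_x, best_y = x, y
    return best_x, best_y

# Toy objective: maximize -(x - 2)^2, optimum at x = 2.
best_x, best_y = random_search(lambda x: -(x[0] - 2.0) ** 2, bounds=[(-5.0, 5.0)])
```

Grid search and Bayesian optimization follow the same evaluate-only contract; they differ only in how the next candidate is chosen.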

Figure 1. [Son24P], Figure 1. Foundation Models can learn priors from a wide variety of sources, such as world knowledge, domain-specific documents, and actual experimental evaluations. Such models can then perform black-box optimization over various search spaces (e.g. hyperparameters, code, natural language) and feedbacks (numeric values, categorical ratings, and subjective sentiment).

More recent BBO algorithms typically try to incorporate inductive biases or priors into the search problem, e.g., domain knowledge, parameter constraints, the search history etc. One particular goal of these approaches is to perform meta-learning, i.e., to develop algorithms that can automatically provide priors for various tasks from different domains without additional task-specific training. However, constructing reliable priors that work across multiple tasks and can take in data from multiple modalities (values, text, images) is challenging.

Position: LLMs can process multi-modal data and be fine-tuned to different tasks

Figure 2. [Son24P], Figure 2. Black-box optimization loop with sequential foundation models. Using metadata $m$ and history $h$, the model proposes candidates $x$ which are checked for feasibility, evaluated, and then appended to the history.

The central point made in this paper is that LLMs are a promising candidate for tackling this challenge (Figure 1). The key idea is to interpret BBO as a sequence-learning problem: given a search space $\mathcal{X}$ containing hyperparameter settings $x \in \mathcal{X}$ and a sequence or history $h_{1:t-1}$ of previous settings $x_{1:t-1}$ with corresponding objective function values $y_{1:t-1}$, the goal is to predict the next element in the sequence, i.e., a new hyperparameter setting $x_t$.
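Concretely, the sequence view reduces one BBO step to "predict $x_t$ from $h_{1:t-1}$". A minimal sketch of that interface follows; the trivial heuristic below is only a stand-in for an LLM-based proposer, not the paper's method:

```python
from typing import List, Tuple

History = List[Tuple[float, float]]  # [(x_1, y_1), ..., (x_{t-1}, y_{t-1})]

def propose_next(history: History) -> float:
    """Stand-in for a sequence model: predict x_t given h_{1:t-1}.
    Here: a trivial heuristic that perturbs the best setting so far;
    a real foundation model would condition on the whole sequence."""
    if not history:
        return 0.0
    best_x, _ = max(history, key=lambda xy: xy[1])
    return best_x + 0.1

h = [(1.0, -1.0), (2.0, 0.0), (3.0, -1.0)]
x_t = propose_next(h)  # best setting so far is 2.0, so x_t = 2.1
```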

Transformer-based LLMs excel at sequence learning and meet several critical requirements for foundation-model-based BBO:

  • Multi-modality: They can process large amounts of data from various modalities.
  • Pre-training: They can be pre-trained to acquire extensive world knowledge.
  • Fine-tuning: They can be fine-tuned with task-specific information.

The workflow for using LLM-based foundation models for BBO is visualized in Figure 2.
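The loop in Figure 2 can be sketched as follows. The proposer and feasibility check are placeholders, and the metadata $m$ is omitted for brevity:

```python
def bbo_loop(objective, propose, is_feasible, n_steps=10):
    """Black-box optimization loop following Figure 2:
    propose a candidate, check feasibility, evaluate, append to history."""
    history = []
    for _ in range(n_steps):
        x = propose(history)          # model proposes x_t from h_{1:t-1}
        if not is_feasible(x):        # infeasible candidates are rejected
            continue
        y = objective(x)              # black-box evaluation
        history.append((x, y))        # extend the history to h_{1:t}
    return history

# Toy run: deterministic proposer sweeping 0.0, 0.5, 1.0, ... over [0, 4].
hist = bbo_loop(
    objective=lambda x: -(x - 2.0) ** 2,
    propose=lambda h: len(h) * 0.5,
    is_feasible=lambda x: 0.0 <= x <= 4.0,
    n_steps=10,
)
```

In the paper's framing, `propose` is where the foundation model sits: it conditions on the metadata and the full history rather than on a hand-crafted rule.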

The authors also give an overview of common techniques for BBO, summarizing the increasing capabilities as we move from hand-crafted genetic algorithms, model-based BBO, and feature-based meta-learning to sequence-based, attention-based, token-based, and finally LLM-based algorithms (Table 1; paper Section 3.2).

Table 1. [Son24P], Table 2. Classes of methods organized by their capabilities. Note: method classes are ordered by development; e.g., "attention-based" methods may incorporate techniques available up to their development, such as meta-learning, but not LLMs.

Finally, the authors collect a set of challenges and open questions for BBO with LLMs (paper Section 4). They argue that there is a need for

  1. better data representation and multimodality datasets for training models on multi-modal tasks
  2. a common guideline or format for encoding BBO (meta)-data to be processed by LLMs
  3. large open-source evaluation datasets
  4. better generalization and customization of LLMs for different tasks
  5. new benchmarks for metadata-rich BBO to better test the capabilities of LLMs
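For point 2, one plausible text encoding of BBO metadata and history into an LLM prompt might look like the following. The format is an illustrative assumption, not a standard proposed by the paper:

```python
def encode_history(metadata: dict, history: list) -> str:
    """Serialize BBO metadata and evaluation history into a plain-text
    prompt. The line format here is an illustrative assumption."""
    lines = [f"task: {metadata.get('task', 'unknown')}"]
    for name, (lo, hi) in metadata.get("search_space", {}).items():
        lines.append(f"param {name}: range [{lo}, {hi}]")
    for t, (x, y) in enumerate(history, start=1):
        lines.append(f"trial {t}: x={x}, y={y}")
    lines.append("propose next x:")
    return "\n".join(lines)

prompt = encode_history(
    {"task": "tune learning rate", "search_space": {"lr": (1e-4, 1e-1)}},
    [(0.01, 0.82), (0.05, 0.75)],
)
```

A shared convention of this kind would let the same pre-trained model consume histories from different tasks and domains, which is exactly the generalization the authors call for.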

This paper is an interesting read and provides a comprehensive overview of the limitations of classical BBO methods and the possibilities of Large Language Models for Black Box Optimization.

References

[Son24P] Song et al., "Position: Leverage Foundational Models for Black-Box Optimization," 2024.
