Motivation: Traditional BBO techniques struggle with multi-modality and task generalization
The position paper by [Son24P] advocates using LLM-based foundation models for Black Box Optimization (BBO). The goal of BBO is to optimize an objective function given only evaluations of that function (i.e., no gradients or other higher-order information). A common example of a BBO task is neural network architecture search, where the objective is to maximize classification accuracy over different architectures. Classical BBO approaches include grid search, random search, and Bayesian optimization.
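To make the setting concrete, here is a minimal random-search sketch on a toy objective; the search space, the `black_box_objective` stand-in, and all parameter names are illustrative assumptions, not taken from the paper.

```python
import random

def black_box_objective(x: dict) -> float:
    # Illustrative stand-in: in a real BBO task this would, e.g., train a
    # network with these hyperparameters and return validation accuracy.
    return -(x["lr"] - 0.01) ** 2 - 0.1 * abs(x["layers"] - 4)

def random_search(n_trials: int = 50, seed: int = 0):
    rng = random.Random(seed)
    best_x, best_y = None, float("-inf")
    for _ in range(n_trials):
        # Sample a candidate; the optimizer only ever sees f(x), no gradients.
        x = {"lr": 10 ** rng.uniform(-4, -1), "layers": rng.randint(1, 8)}
        y = black_box_objective(x)
        if y > best_y:
            best_x, best_y = x, y
    return best_x, best_y

print(random_search())
```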
More recent BBO algorithms typically try to incorporate inductive biases or priors into the search problem, e.g., domain knowledge, parameter constraints, or the search history. One particular goal of these approaches is meta-learning, i.e., developing algorithms that can automatically provide priors for tasks from different domains without additional task-specific training. However, constructing reliable priors that work across multiple tasks and can take in data from multiple modalities (values, text, images) is challenging.
Position: LLMs can process multi-modal data and be fine-tuned to different tasks
The central point made in this paper is that LLMs are a promising candidate for tackling this challenge (Figure 1). The key idea is to interpret BBO as a sequence-learning problem: given a search space $\mathcal{X}$ of hyperparameter settings $x \in \mathcal{X}$ and a history $h_{1:t-1}$ of previous settings $x_{1:t-1}$ with corresponding objective function values $y_{1:t-1}$, the goal is to predict the next element of the sequence, i.e., a new hyperparameter setting $x_t$.
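As a rough illustration of this sequence view, the snippet below serializes a toy history $h_{1:t-1}$ into a text prompt from which an LLM could be asked to propose $x_t$; the prompt format, the parameter names, and the commented-out `query_llm`/`parse_setting` calls are assumptions for illustration, not the encoding used in the paper.

```python
def history_to_prompt(history):
    # history: list of (x, y) pairs, i.e., h_{1:t-1} in the notation above.
    lines = ["Propose the next hyperparameter setting x to maximize y."]
    for i, (x, y) in enumerate(history, start=1):
        lines.append(f"x_{i}: lr={x['lr']:.4g}, layers={x['layers']} -> y_{i}={y:.4f}")
    lines.append(f"x_{len(history) + 1}:")  # the LLM continues the sequence here
    return "\n".join(lines)

history = [({"lr": 0.1, "layers": 2}, 0.71), ({"lr": 0.01, "layers": 4}, 0.83)]
prompt = history_to_prompt(history)
# next_x = parse_setting(query_llm(prompt))  # hypothetical LLM call and parser
print(prompt)
```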
Transformer-based LLMs excel at sequence learning and meet several critical requirements for a BBO foundation model:
- Multi-modality: They can process large amounts of data from various modalities.
- Pre-training: They can be pre-trained to acquire extensive world knowledge.
- Fine-tuning: They can be fine-tuned with task-specific information.
The workflow for using LLM-based foundation models for BBO is visualized in Figure 2.
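In code, that workflow amounts to a propose-evaluate-append loop; the sketch below assumes hypothetical `query_llm` and `black_box_objective` callables and is not the authors' implementation.

```python
def llm_bbo_loop(query_llm, black_box_objective, n_steps: int = 20):
    """Generic LLM-in-the-loop BBO: the LLM proposes x_t from the history,
    the black box returns y_t, and the pair extends the sequence."""
    history = []  # h_{1:t-1} as a list of (x_i, y_i) pairs
    for _ in range(n_steps):
        x = query_llm(history)       # LLM suggests the next setting x_t
        y = black_box_objective(x)   # only the function value is observed
        history.append((x, y))
    return max(history, key=lambda pair: pair[1])  # best (x, y) found
```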
The authors also give an overview of common techniques for BBO, summarizing the increasing capabilities as one moves from hand-crafted genetic algorithms, model-based BBO, and feature-based meta-learning to sequence-based, attention-based, token-based, and finally LLM-based algorithms (Table 1, paper Section 3.2).
Finally, the authors collect a set of challenges and open questions for BBO with LLMs (paper section 4). They argue that there is a need for
- better data representation and multi-modal datasets for training models on multi-modal tasks
- a common guideline or format for encoding BBO (meta-)data so it can be processed by LLMs (see the sketch after this list)
- large open-source evaluation datasets
- better generalization and customization of LLMs for different tasks
- new benchmarks for metadata-rich BBO to better test the capabilities of LLMs
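As an illustration of the second point (a shared encoding format), one might imagine a record like the following for a single trial plus its metadata; this schema is a hypothetical example, not a format proposed by the authors.

```python
# Hypothetical, assumed schema for one BBO trial with metadata.
trial_record = {
    "task": "neural architecture search (image classification)",
    "search_space": {
        "lr": {"type": "float", "log_scale": True, "range": [1e-4, 1e-1]},
        "layers": {"type": "int", "range": [1, 8]},
    },
    "observation": {"x": {"lr": 0.01, "layers": 4}, "y": 0.83},
    "metadata": {"dataset": "CIFAR-10", "budget": "1 GPU-hour"},
}
```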
This paper is an interesting read and provides a comprehensive overview of the limitations of classical BBO methods and the possibilities of Large Language Models for Black Box Optimization.