When it comes to developing and refining machine learning models, hyperparameter optimization plays a crucial role in ensuring optimal performance for specific tasks. Among the various methods available, the Bayesian optimization algorithm stands out for its efficiency and effectiveness. Unlike traditional methods like random search and grid search, Bayesian optimization leverages results from previous evaluations, reducing the time needed to find the best parameters.
This method systematically refines hyperparameters, focusing on combinations most likely to yield better performance and enhancing the model’s generalization ability on test data.
In this article, we will delve into model-based search using Bayesian optimization, illustrated with an example from Serokell’s practice.
What is a hyperparameter?
Hyperparameters in machine learning are settings chosen before training that configure the model’s structure or control the learning process. Unlike model parameters, which are learned from the data during training, hyperparameters must be specified in advance.
Examples of hyperparameters include activation functions and layer architecture in neural networks, as well as the number of trees and features in random forests. The choice of hyperparameters significantly impacts model performance: a poor choice can lead to overfitting or underfitting.

The goal of hyperparameter optimization in machine learning is to find the hyperparameters of a given ML algorithm that deliver the best performance as measured on a validation set.
Below are examples of hyperparameters for two algorithms, random forest and gradient boosting machine (GBM):
| Algorithm | Hyperparameters |
| --- | --- |
| Random forest | Number of trees, maximum tree depth, number of features considered at each split, minimum number of samples required to split a node |
| Gradient Boosting Machine (GBM) | Number of boosting iterations (trees), learning rate, maximum tree depth, fraction of samples used to fit each tree (subsampling) |
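To make this concrete, here is a small illustration (using scikit-learn; the values are arbitrary examples, not recommendations) of how such hyperparameters are fixed before training:

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

# Hyperparameters are set before training starts; the values below are arbitrary examples.
rf = RandomForestClassifier(
    n_estimators=300,      # number of trees
    max_depth=10,          # maximum depth of each tree
    max_features="sqrt",   # features considered at each split
    min_samples_split=4,   # minimum samples required to split a node
)

gbm = GradientBoostingClassifier(
    n_estimators=200,      # number of boosting stages
    learning_rate=0.05,    # shrinkage applied to each tree's contribution
    max_depth=3,           # depth of the individual trees
    subsample=0.8,         # fraction of samples used to fit each tree
)
```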
For a more detailed explanation of hyperparameters, refer to this article.
Hyperparameter optimization approaches
There are four main approaches to hyperparameter optimization.
- Manual search: Hyperparameters are set by hand, and the model is trained and evaluated iteratively after each adjustment.
- Grid search: This method automates the process by exhaustively evaluating a predefined grid of hyperparameter values.
- Random search: This approach also automates the process, doing so by randomly sampling parameter values for evaluation.
- Model-based search: By combining automation and feedback, this approach adjusts hyperparameter values based on previous trial results.
Random search typically outperforms grid search at finding good settings quickly. Manual search requires constant involvement but benefits from what the practitioner learns with each attempt. Grid search runs automatically but doesn’t learn from past results, and random search, while also automatic, doesn’t improve over time either. Model-based search combines the strengths of the other approaches: it’s automated and learns from each trial.
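As a point of reference, here is a minimal sketch (assuming scikit-learn) of grid search and random search on a random forest; note that neither method uses the results of earlier trials to decide what to try next:

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Grid search: every combination in the predefined grid is evaluated.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [5, 10, None]},
    cv=3,
).fit(X, y)

# Random search: a fixed budget of randomly sampled combinations is evaluated.
rand = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": randint(50, 500), "max_depth": randint(2, 20)},
    n_iter=10,
    cv=3,
    random_state=0,
).fit(X, y)

print(grid.best_params_, rand.best_params_)
```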

In the upcoming sections, we’ll explore model-based search in detail.
Model-based search using Bayesian optimization
Bayesian optimization is a highly efficient strategy for optimizing objective functions that are costly to evaluate. It’s particularly beneficial in scenarios where function evaluation takes a long time, such as tuning hyperparameters for machine learning models. Bayesian optimization works by constructing a probabilistic model that maps input parameters to a probability distribution of possible outcomes. This model is then used to make informed guesses about where in the parameter space the objective function might achieve its optimal value.
The process involves two main components: the surrogate model and the acquisition function.
- The surrogate model, often a Gaussian process, estimates the objective function and continuously updates as new data points are evaluated. This model provides a measure of uncertainty or confidence in the predictions at unexplored points in the parameter space.
- The acquisition function uses the surrogate model’s predictions to determine which point in the parameter space should be evaluated next. It balances exploring new areas with high uncertainty and exploiting areas known to offer promising results.
Bayesian optimization iteratively selects the next points to evaluate by optimizing the acquisition function, assesses the objective function at these points, and updates the surrogate model with the new findings. This cycle continues until a stopping criterion is met, such as a maximum number of iterations or a satisfactory level of optimization. This method is highly effective in finding the global optimum of complex functions with a relatively small number of function evaluations, making it ideal for hyperparameter tuning and other optimization tasks where function evaluations are costly.
Surrogate model
The surrogate model (SM) is a probabilistic estimator that can fit the observed data points and quantify the uncertainty of unobserved areas. The SM approximates the unknown black-box function f(x).
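As a rough illustration (using scikit-learn’s Gaussian process regressor; the observed points here are made up), the surrogate fits the evaluations collected so far and returns both a mean prediction and an uncertainty estimate at unobserved points:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Evaluations of the black-box function f(x) observed so far.
X_observed = np.array([[0.1], [0.4], [0.9]])
y_observed = np.array([0.8, 0.3, 0.6])

surrogate = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
surrogate.fit(X_observed, y_observed)

# The surrogate provides a mean and a standard deviation (uncertainty) everywhere in the domain.
X_candidates = np.linspace(0, 1, 100).reshape(-1, 1)
mean, std = surrogate.predict(X_candidates, return_std=True)
```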
Acquisition function
The acquisition function determines which areas in the domain of f(x) are worth exploiting and which are worth exploring. It assigns high values to promising or unexplored areas, and low values to areas that are suboptimal or have already been sampled.
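Continuing the sketch above, one common acquisition function is expected improvement; the Bayesian optimization loop then alternates between maximizing it and updating the surrogate. This is an illustrative sketch for a minimization problem, not the exact implementation used later:

```python
from scipy.stats import norm

def expected_improvement(candidates, surrogate, y_best, xi=0.01):
    """High where the surrogate predicts improvement over y_best or is very uncertain."""
    mean, std = surrogate.predict(candidates, return_std=True)
    std = np.maximum(std, 1e-9)             # avoid division by zero
    z = (y_best - mean - xi) / std          # standardized predicted improvement
    return (y_best - mean - xi) * norm.cdf(z) + std * norm.pdf(z)

# One iteration of the Bayesian optimization loop:
ei = expected_improvement(X_candidates, surrogate, y_best=y_observed.min())
x_next = X_candidates[np.argmax(ei)]        # point that maximizes the acquisition function
# ...then evaluate f(x_next), append the result to the observed data, refit the surrogate,
# and repeat until the evaluation budget is exhausted.
```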

Example
We start by generating a binary classification dataset with 2000 samples and 25 features using scikit-learn. This dataset includes seven informative features crucial for our binary classification task, as well as ten intentionally redundant features. While designed for binary classification, this dataset can be adapted for regression analyses as well.
We then split the dataset, reserving 20% for testing and using the remaining 80% for training our model.
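In code, the setup described above might look like this (a sketch; the random seed is arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic binary classification dataset: 2000 samples, 25 features,
# of which 7 are informative and 10 are redundant.
X, y = make_classification(
    n_samples=2000,
    n_features=25,
    n_informative=7,
    n_redundant=10,
    random_state=42,
)

# Hold out 20% of the data for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```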
The next step involves optimizing the hyperparameters of a random forest model to enhance its accuracy. We define an objective function that measures our model’s performance using the negative accuracy score, turning the goal of maximizing accuracy into a minimization problem that fits our optimization framework. We outline the search space for hyperparameters, including criteria for splitting, number of trees, maximum depth, and maximum features.
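A sketch of the objective function and search space, assuming the hyperopt library and the training split from the previous snippet (the exact ranges and the use of cross-validation are illustrative choices):

```python
from hyperopt import hp
from hyperopt.pyll import scope
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def objective(params):
    """Return the negative accuracy so that minimizing it maximizes accuracy."""
    model = RandomForestClassifier(**params, random_state=42, n_jobs=-1)
    # Cross-validated accuracy on the training split serves as the performance estimate here.
    accuracy = cross_val_score(model, X_train, y_train, cv=3, scoring="accuracy").mean()
    return -accuracy

search_space = {
    "criterion": hp.choice("criterion", ["gini", "entropy"]),
    "n_estimators": scope.int(hp.quniform("n_estimators", 50, 500, 25)),
    "max_depth": scope.int(hp.quniform("max_depth", 2, 20, 1)),
    "max_features": hp.choice("max_features", ["sqrt", "log2", None]),
}
```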
Finally, we employ the Bayesian optimization algorithm using the fmin function to systematically explore our defined search space based on performance feedback from our objective function.
This iterative process records each trial to refine our search and find the optimal hyperparameter settings. The aim is to identify the configuration that minimizes our objective function, leading to the highest possible accuracy for the random forest model on our task.
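The optimization call itself might look like this (continuing the sketch; the evaluation budget is arbitrary):

```python
from hyperopt import Trials, fmin, tpe

trials = Trials()  # records every evaluated configuration and its score

best = fmin(
    fn=objective,          # function to minimize (negative accuracy)
    space=search_space,    # hyperparameter search space defined above
    algo=tpe.suggest,      # Tree-structured Parzen Estimator, hyperopt's model-based optimizer
    max_evals=50,          # budget of objective evaluations
    trials=trials,
)
# Note: for hp.choice parameters, `best` contains the index of the chosen option.
print(best)
```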
You can use hyperopt together with scikit-learn for your own tasks.
For more information on fmin and other aspects of Bayesian optimization, we recommend referring to this paper.
How to improve Bayesian optimization?
Two enhancements can significantly boost the efficiency of the Bayesian optimization algorithm.
Cold start problem
The first enhancement addresses the “cold start problem,” where the initial phase of the algorithm relies on randomly sampling hyperparameters to build the surrogate model. A more effective initiation strategy can notably reduce the iterations needed to converge to optimal values. This requires an approach that minimizes reliance on random sampling, aiming to start the optimization process closer to the most promising hyperparameter values.
The challenge is that the choice of starting points has a significant impact on the time and resources required for the optimization to converge.

Hyperparameter recommendation system
The second enhancement involves creating a hyperparameter recommendation system that functions similarly to recommendation algorithms used by streaming services. This system uses matrix factorization techniques to suggest suitable hyperparameters based on past data.
To develop such a recommender system, we used offline datasets described by meta-features reflecting characteristics such as data distributions, sample sizes, and column counts. By training models on these datasets with various hyperparameters and identifying the best-performing ones, we built a database linking dataset meta-features to ideal hyperparameters. This knowledge base guides the selection of hyperparameters for new datasets, enhancing the optimization process.
The hyperparameter recommendation system improves model optimization by analyzing dataset characteristics and comparing them to a comprehensive knowledge base. It calculates meta-features of the current dataset, such as data distribution, sample size, and feature count, and recommends hyperparameters based on the similarity to datasets with known optimal hyperparameters. This approach provides well-chosen initial hyperparameters, effectively addressing the “cold start” challenge and eliminating the need for random sampling to initiate the optimization process.
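A hedged sketch of the idea: compute meta-features for the new dataset and look up the most similar dataset in the knowledge base. The specific meta-features and the knowledge base shown here are hypothetical simplifications:

```python
import numpy as np

def meta_features(X, y):
    """Hypothetical meta-feature vector: sample count, feature count, class balance, spread."""
    return np.array([
        X.shape[0],                                       # number of samples
        X.shape[1],                                       # number of features
        np.bincount(y).min() / np.bincount(y).max(),      # class balance ratio
        float(np.mean(np.abs(X - X.mean(axis=0)))),       # average feature spread
    ])

def recommend_hyperparameters(X, y, knowledge_base):
    """knowledge_base: list of (meta_feature_vector, best_hyperparameters) pairs from offline runs."""
    # In practice the meta-features would be normalized before computing distances.
    query = meta_features(X, y)
    distances = [np.linalg.norm(query - mf) for mf, _ in knowledge_base]
    _, best_params = knowledge_base[int(np.argmin(distances))]
    return best_params  # used as the starting point instead of random sampling
```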
To further streamline the optimization process, a method was implemented to reduce the computational intensity of training models with various hyperparameters. Instead of direct training, a predictive model was introduced to estimate the performance of any given hyperparameter set. This predictive model was trained on a diverse collection of tasks using an offline dataset pairing hyperparameters with performance metrics. Ensemble learning can be used to build this predictor with data from various sources if such a dataset does not exist initially, ensuring a broad and robust foundation. Upon deployment, the system fine-tuned this predictor using data from its early recommendations, enhancing its accuracy for future hyperparameter suggestions.
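The performance predictor could be sketched as a regressor trained on (hyperparameter configuration, score) pairs collected offline; the data and model choice below are assumptions made purely for illustration:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical offline history: each row encodes a hyperparameter configuration,
# and the target is the validation score measured for that configuration.
rng = np.random.default_rng(0)
history_configs = rng.uniform(size=(500, 3))        # encoded hyperparameter sets (placeholder data)
history_scores = rng.uniform(0.6, 0.95, size=500)   # their observed validation scores (placeholder data)

# A regressor that estimates the score of a configuration without training a full model.
performance_predictor = GradientBoostingRegressor().fit(history_configs, history_scores)

candidate = rng.uniform(size=(1, 3))                # an encoded candidate configuration
estimated_score = performance_predictor.predict(candidate)
```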
Conclusion
In conclusion, the Bayesian optimization algorithm is a highly effective method for hyperparameter optimization in machine learning models. By utilizing a probabilistic model to guide the search for the optimal set of hyperparameters, Bayesian optimization significantly reduces the number of trials needed to find an optimal solution. This not only saves time and computational resources but also enables the development of more accurate and robust machine learning models, as demonstrated in the example discussed.