Scratch work only.

Inspired by Austin Kleon’s Show Your Work, I am recording notes, snippets, and curiosities from my Data Science work. You won’t find any finished projects here, only scratch work.

Notes

Ryan Melvin

Hyperopt, part 3 (conditional parameters)

The (shockingly) little Hyperopt documentation that exists mentions conditional hyperparameter tuning. (For example, I only need a degree parameter if my SVM has a polynomial kernel.) However, after trying three different examples of how to use conditional parameters, I was ready to give up because none of them worked! Then, I found a Kaggle tutorial that explained I have to unpack the conditions myself whenever they apply; scikit-learn, for example, can’t do that for me. So, here is a working (for me, at least) example of how to use conditional hyperparameters in Hyperopt with scikit-learn classifiers.
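Roughly, the pattern looks like the sketch below (not the exact code from my experiment; the iris data and the parameter ranges are just placeholders):

```python
# Sketch of a conditional search space: the kernel choice nests a degree
# parameter that only exists for the 'poly' branch, and the objective
# unpacks it before building the SVC (scikit-learn won't do that for us).
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # placeholder data

space = {
    'C': hp.loguniform('C', -3, 3),
    'kernel': hp.choice('kernel', [
        {'ktype': 'rbf'},
        {'ktype': 'poly', 'degree': hp.quniform('degree', 2, 5, 1)},
    ]),
}

def objective(params):
    kernel = params['kernel']
    kwargs = {'C': params['C'], 'kernel': kernel['ktype']}
    if kernel['ktype'] == 'poly':
        # The conditional parameter only appears when the 'poly' branch is drawn.
        kwargs['degree'] = int(kernel['degree'])
    score = cross_val_score(SVC(**kwargs), X, y, cv=5).mean()
    return {'loss': -score, 'status': STATUS_OK}  # Hyperopt minimizes

best = fmin(objective, space, algo=tpe.suggest, max_evals=50, trials=Trials())
print(best)
```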

Read More
Ryan Melvin

Scikit-optimize

I’m continuing to explore tools that automate hyperparameter tuning for machine learning. Today’s explorations brought me to scikit-optimize, which, unlike Hyperopt, has thorough documentation. It also offers BayesSearchCV, a drop-in replacement for scikit-learn’s GridSearchCV or RandomizedSearchCV. Despite the nearly unforgivable lack of documentation, Hyperopt does seem to be something of a standard of practice in the machine learning world, and its extensibility makes it powerful and broadly applicable (i.e., not just to scikit-learn models).
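For my own reference, a minimal sketch of what the drop-in looks like (the iris data and the search ranges below are placeholders, not from an actual experiment):

```python
# Sketch of BayesSearchCV as a drop-in replacement for GridSearchCV.
from skopt import BayesSearchCV
from skopt.space import Categorical, Real
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # placeholder data

opt = BayesSearchCV(
    SVC(),
    {
        'C': Real(1e-3, 1e3, prior='log-uniform'),
        'gamma': Real(1e-4, 1e1, prior='log-uniform'),
        'kernel': Categorical(['linear', 'rbf']),
    },
    n_iter=32,  # number of parameter settings sampled by the optimizer
    cv=5,
    random_state=0,
)
opt.fit(X, y)
print(opt.best_params_, opt.best_score_)
```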

Read More
Ryan Melvin

Hyperopt, part 2

As expected, the base Hyperopt package for tuning machine learning parameters gives much more control than Hyperopt-sklearn. However, there is virtually no documentation for it. It is only thanks to a Kaggle tutorial and a Medium post that I was able to figure out how to do anything with Hyperopt.
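As a reminder to myself, the basic recipe is roughly the sketch below, with both the search space and the cross-validation scheme mine to choose (the breast-cancer data and the ranges are placeholders):

```python
# Sketch of tuning a random forest with base Hyperopt: I define the space,
# the objective, and the cross-validation scheme myself.
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)  # placeholder data
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # my choice of CV

space = {
    'n_estimators': hp.quniform('n_estimators', 50, 500, 50),
    'max_depth': hp.quniform('max_depth', 2, 20, 1),
    'max_features': hp.uniform('max_features', 0.1, 1.0),
}

def objective(params):
    clf = RandomForestClassifier(
        n_estimators=int(params['n_estimators']),
        max_depth=int(params['max_depth']),
        max_features=params['max_features'],
        random_state=0,
    )
    score = cross_val_score(clf, X, y, cv=cv).mean()
    return {'loss': -score, 'status': STATUS_OK}  # Hyperopt minimizes

best = fmin(objective, space, algo=tpe.suggest, max_evals=50, trials=Trials())
print(best)
```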

Read More
Ryan Melvin

R boot

I’ve used the boot library in R before, but I did not realize just how simple it could be. I had worried about creating grids of synthetic data to sample from. Today I learned, thanks to this post on R-bloggers, that all you really need is a wrapper around your model call. I don’t think I could make a better example than they did, so I’ll simply suggest you go read it.

Read More
Ryan Melvin

Hyperopt

I am interested in automating hyperparameter tuning for machine learning models. So far I have favored grid searches, manually expanding the grid whenever a “best” parameter falls on the edge (see Jason Brownlee’s post on the topic). Today, I came across the Python package Hyperopt and its scikit-learn-specific wrapper Hyperopt-sklearn, which have built-in algorithms for strategically searching a parameter space. In a quick test, Hyperopt-sklearn returned a higher-accuracy model than my own “manual” tuning of a random forest. However, I cannot find in the Hyperopt-sklearn documentation how to specify a cross-validation method, so the comparison is likely not a fair one. Hyperopt itself (without the sklearn wrapper) seems to give more control over the search space and the cross-validation method employed. I plan to play with that soon to see what things look like with more control.
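The quick test looked roughly like the sketch below (the breast-cancer data here is a placeholder, and the hpsklearn helper names vary between versions; newer releases use random_forest_classifier instead of random_forest):

```python
# Sketch of the Hyperopt-sklearn quick test. Helper names differ between
# hpsklearn versions (newer releases use random_forest_classifier).
from hpsklearn import HyperoptEstimator, random_forest
from hyperopt import tpe
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)  # placeholder data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

estim = HyperoptEstimator(
    classifier=random_forest('rf'),  # search only random forest configurations
    preprocessing=[],                # skip the preprocessing search
    algo=tpe.suggest,
    max_evals=25,
    trial_timeout=120,
)
estim.fit(X_train, y_train)
print(estim.score(X_test, y_test))
print(estim.best_model())
```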

Read More