Posts

Showing posts from August, 2017

Cluster-Robust Regression in Python

This is a short blog post about handling scenarios where one is investigating a correlation across some domain while controlling for an expected correlation across some other domain. If that sentence made no sense to you, don't worry; here is a simple example:

Research Question: Are Instagram users more likely to also post on Facebook or on Twitter?

Analysis Plan: Perform a t-test on the difference in means between how often Instagram users post on Facebook and how often they post on Twitter.

Problem: Some users have Facebook accounts (Group A), some users have Twitter accounts (Group B), and some users have both (Group A/B), so we can't really use a related-samples t-test or an independent-samples t-test. Also, some users post many photos on Instagram while others make only the occasional post. We could therefore expect the difference between users to be larger than the difference we actually want to measure, which is Facebo
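To make the problem concrete, here is a minimal sketch of one way to handle repeated observations per user: an ordinary least squares fit with standard errors clustered on the user. It assumes statsmodels and pandas are installed. The data are simulated, and the column names (`user_id`, `platform`, `cross_posts`), group sizes, and effect sizes are all hypothetical, chosen only for illustration rather than taken from the post.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

rows = []
for user in range(90):
    # Hypothetical design: a third of users link Facebook only (Group A),
    # a third Twitter only (Group B), and a third both (Group A/B).
    platforms = [["facebook"], ["twitter"], ["facebook", "twitter"]][user % 3]
    # User-level activity shared by every row from this user; this is what
    # makes observations within a user correlated.
    activity = rng.normal(0.0, 2.0)
    for platform in platforms:
        for _ in range(5):
            cross_posts = (
                5.0
                + activity
                + 0.5 * (platform == "twitter")  # simulated platform effect
                + rng.normal(0.0, 1.0)
            )
            rows.append(
                {"user_id": user, "platform": platform, "cross_posts": cross_posts}
            )

df = pd.DataFrame(rows)

model = smf.ols("cross_posts ~ C(platform)", data=df)

# Default fit: standard errors assume all rows are independent.
naive = model.fit()

# Cluster-robust fit: same coefficients, but standard errors allow
# arbitrary correlation among rows that share a user_id.
robust = model.fit(cov_type="cluster", cov_kwds={"groups": df["user_id"]})

print("naive SE:\n", naive.bse)
print("cluster-robust SE:\n", robust.bse)
```

The two fits return identical coefficients; only the covariance estimate, and therefore the standard errors and p-values, changes.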

Parallel Hyper-Parameter Optimization in Python

Tuning the hyper-parameters used in many machine learning algorithms is as much an art as it is a science. Thankfully, we can use a few tools to do it more effectively. One of these is Grid Search, the process of creating a "grid" of possible hyper-parameter values, testing each combination of values via k-fold cross-validation, and choosing the "best" combination based on performance on a user-defined metric such as accuracy, area under the ROC curve, or sensitivity. This process is very computationally expensive, especially as the number of hyper-parameters increases. We can significantly reduce the time taken to perform grid search by using parallel computing if we have a multi-core CPU or a CPU that supports hyper-threading. The idea of parallel computing is sometimes intimidating even to veteran programmers; thankfully, the work of parallel scaling can be done automatically through SK-Le
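As an illustration of the idea, here is a minimal sketch of a parallel grid search using scikit-learn's `GridSearchCV`, where `n_jobs=-1` asks it to use all available CPU cores. The synthetic dataset, the random forest estimator, and the grid values are assumptions picked for the example; whether the post goes on to use this exact setup is not known from the excerpt.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Small synthetic binary classification problem (illustrative only).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# The "grid": every combination of these values is evaluated (2 x 2 = 4).
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [3, None],
}

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    cv=5,               # 5-fold cross-validation for each combination
    scoring="roc_auc",  # user-defined metric: area under the ROC curve
    n_jobs=-1,          # run the fits in parallel on all available cores
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best mean CV score:", search.best_score_)
```

With 4 combinations and 5 folds, this runs 20 cross-validation fits plus one final refit on the full data; the cross-validation fits are independent of one another, which is why they parallelize well.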