Box-Cox Transformations in Python
Many common machine learning algorithms assume data is normally distributed.
But what if your data isn't?
I experienced this frustration first-hand during my undergraduate thesis. I was attempting to predict which category of online slot machine a customer was using, based on information such as their bet size and speed of play. Unfortunately, no matter which algorithm I used or which hyper-parameters I modified, I couldn't achieve accuracy above roughly 60%.
Near the end of the semester, I was reading about improving classifier performance when I had my "Eureka!" moment: of course none of these algorithms were performing well. When people play slot machines, the vast majority bet the minimum stake, and only the most adventurous and financially well-off players bet significantly more. My data was indeed not normally distributed. A quick Google search for "How to fix non-normally distributed data" revealed the Box-Cox transformation, a seemingly simple way to transform data so that it is closer to a normal distribution. After writing a simple script to perform the transformation, my accuracy jumped to nearly 80%, an improvement of about 20 percentage points.
The Transformation
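The transformation relies on a single parameter, lambda (λ). Its value is typically considered within the range -5 to 5 and is estimated from your data; SciPy, for example, chooses the λ that maximizes the log-likelihood function. For strictly positive data y, the transformation is y(λ) = (y^λ - 1) / λ when λ ≠ 0, and y(λ) = ln(y) when λ = 0.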
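To make the piecewise formula concrete, here is a minimal NumPy sketch of the one-parameter Box-Cox transform with a fixed λ. The helper name `box_cox` and the sample values are illustrative assumptions; the loop only checks that this hand-written version agrees with `scipy.stats.boxcox` when `lmbda` is supplied explicitly.

```python
import numpy as np
from scipy import stats


def box_cox(y, lam):
    """Hypothetical helper: apply the Box-Cox transform with a fixed lambda.

    Requires strictly positive input, matching the formula's domain.
    """
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        # The limit of (y**lam - 1) / lam as lam -> 0 is the natural log
        return np.log(y)
    return (y ** lam - 1) / lam


data = np.array([1.0, 2.0, 5.0, 10.0, 50.0])

# With lmbda given, scipy.stats.boxcox returns only the transformed array,
# so it can be compared directly with the hand-written version.
for lam in (-1.0, 0.0, 0.5, 2.0):
    assert np.allclose(box_cox(data, lam), stats.boxcox(data, lmbda=lam))
```

The λ = 0 branch is what keeps the transform continuous: as λ shrinks toward zero, (y^λ - 1) / λ approaches ln(y).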
Note: the transformation is only defined for strictly positive values; for data containing zeros or negative values, a second formulation can be used instead.
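The post does not say which second formulation it has in mind, so the sketch below shows two common options rather than the author's specific one. Option 1 shifts the data so every value is strictly positive before applying Box-Cox (the shift of `1 - min(x)` is an arbitrary illustrative choice). Option 2 swaps in a different technique, the Yeo-Johnson transform (`scipy.stats.yeojohnson`), which accepts any real values. The array `x` is made-up example data.

```python
import numpy as np
from scipy import stats

# Made-up example data containing zeros and negative values
x = np.array([-3.0, -1.0, 0.0, 2.0, 4.0, 15.0, 40.0])

# Option 1: shift so the smallest value becomes 1, then apply Box-Cox.
# The size of the shift is an arbitrary choice made for illustration.
shift = 1.0 - x.min()
x_shifted_bc, lam_bc = stats.boxcox(x + shift)

# Option 2: Yeo-Johnson, a related power transform defined for all reals.
x_yj, lam_yj = stats.yeojohnson(x)

print(f"shift={shift}, Box-Cox lambda={lam_bc:.3f}")
print(f"Yeo-Johnson lambda={lam_yj:.3f}")
```

Shifting is simple but the fitted λ depends on the shift you pick; Yeo-Johnson avoids that choice at the cost of being a different transform.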
Writing Code
While the transformation is a tad easier in R, we can still perform it relatively easily in Python using the SciPy library. I will use some sample data from the Bureau of Transportation Statistics, specifically flight durations. My specific dataset is available here.
Let's begin by loading the data and visualizing it as a histogram.
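The original loading code is not reproduced here, so this is a minimal stand-in under a stated assumption: instead of the Bureau of Transportation Statistics file, it generates synthetic, right-skewed "flight durations" from a log-normal distribution. The helper `text_histogram` is hypothetical and prints a rough text histogram so the example runs without a plotting library; the skewness value gives a numeric sense of how lopsided the data is.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the flight-duration column (minutes). The real
# dataset would be loaded from the author's file instead (for example with
# pandas.read_csv); its file and column names are not given in the post.
rng = np.random.default_rng(seed=0)
durations = rng.lognormal(mean=4.5, sigma=0.6, size=5000)


def text_histogram(values, bins=12, width=50):
    """Hypothetical helper: print a rough text histogram of `values`."""
    counts, edges = np.histogram(values, bins=bins)
    scale = width / counts.max()
    for count, left, right in zip(counts, edges[:-1], edges[1:]):
        print(f"{left:8.1f}-{right:8.1f} | {'#' * int(count * scale)}")


text_histogram(durations)
print(f"skewness: {stats.skew(durations):.2f}")
```

If matplotlib is installed, `plt.hist(durations, bins=50)` followed by `plt.show()` draws the graphical version of the same histogram.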
This data, while not terrible, is significantly skewed. Let's see if we can improve its shape.
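A minimal sketch of the transformation step using `scipy.stats.boxcox`, again on synthetic log-normal data standing in for the real flight durations (an assumption, so the numbers will not match the author's dataset). Leaving `lmbda` unset lets SciPy estimate λ and return it alongside the transformed values; comparing skewness before and after gives a quick check that the shape improved.

```python
import numpy as np
from scipy import stats

# Synthetic, right-skewed stand-in for the flight durations (minutes)
rng = np.random.default_rng(seed=0)
durations = rng.lognormal(mean=4.5, sigma=0.6, size=5000)

# With lmbda left as None, SciPy picks the lambda that maximizes the
# log-likelihood function and returns it as the second output.
transformed, fitted_lambda = stats.boxcox(durations)

print(f"fitted lambda: {fitted_lambda:.3f}")
print(f"skewness before: {stats.skew(durations):.2f}")
print(f"skewness after:  {stats.skew(transformed):.2f}")
```

Because the synthetic data is log-normal, the fitted λ should land close to 0, which is the log transform; real data will generally produce a different λ.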
The transformed data is now much closer to a normal distribution and is ready to be used or transformed further.
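When the transformed data feeds a machine learning model, one reasonable pattern (a sketch, not necessarily what the author did) is to estimate λ on the training data only, reuse that same λ for unseen data, and use `scipy.special.inv_boxcox` to map transformed values such as predictions back to the original units. The gamma-distributed data and the 800/200 split are illustrative assumptions.

```python
import numpy as np
from scipy import stats
from scipy.special import inv_boxcox

# Synthetic positive, right-skewed data split into train and test parts
rng = np.random.default_rng(seed=1)
values = rng.gamma(shape=2.0, scale=30.0, size=1000)
train, test = values[:800], values[800:]

# Estimate lambda from the training data only...
train_bc, lam = stats.boxcox(train)

# ...then apply that same lambda to unseen data, so both sets share one
# transformation and nothing is estimated from the test set.
test_bc = stats.boxcox(test, lmbda=lam)

# inv_boxcox maps transformed values back to the original units
recovered = inv_boxcox(test_bc, lam)
print(f"lambda fitted on train: {lam:.3f}")
print("round trip matches:", np.allclose(recovered, test))
```

scikit-learn's `PowerTransformer(method="box-cox")` wraps the same fit-then-apply idea in its `fit`/`transform` interface if you are already working in that library.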
Conclusion
Performing Box-Cox transformations is a powerful and elegant way of normalizing skewed data and can lead to significant improvements in machine learning performance. Our sample data shows this: the significantly skewed flight durations became much closer to normally distributed after the transformation.