CaltechTHESIS
  A Caltech Library Service

Boosting Boosting

Citation

Appel, Ron (2017) Boosting Boosting. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/Z9N29V0J. https://resolver.caltech.edu/CaltechTHESIS:06052017-221842127

Abstract

Machine learning is becoming prevalent in all aspects of our lives. For some applications, there is a need for simple but accurate white-box systems that are able to train efficiently and with little data.

"Boosting" is an intuitive method that combines many simple (possibly inaccurate) predictors to form a powerful, accurate classifier. Boosted classifiers are easy to use and exhibit the fastest test-time speeds when implemented as a cascade. However, they have a few drawbacks: training decision trees is a relatively slow procedure, and from a theoretical standpoint, no simple unified framework for cost-sensitive multi-class boosting exists. Furthermore, (axis-aligned) decision trees may be inadequate in some situations, thereby stalling training; and even where they are sufficiently useful, they do not capture the intrinsic nature of the data, as they tend to form boundaries that overfit.
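
The weight-and-combine idea described above can be sketched as a minimal AdaBoost loop over axis-aligned decision stumps. This is a generic illustration of boosting, not the thesis's own algorithms; all function and variable names here are illustrative.

```python
import numpy as np

def train_adaboost(X, y, n_rounds=10):
    """AdaBoost with axis-aligned decision stumps (generic sketch).

    X: (n, d) array of features; y: (n,) array of labels in {-1, +1}.
    Returns a list of (feature, threshold, polarity, alpha) weak learners.
    """
    n, d = X.shape
    w = np.full(n, 1.0 / n)              # per-example weights, uniform at first
    ensemble = []
    for _ in range(n_rounds):
        # Pick the stump with the lowest weighted training error.
        best, best_err = None, np.inf
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for pol in (+1, -1):
                    pred = pol * np.where(X[:, j] >= thr, 1, -1)
                    err = w[pred != y].sum()
                    if err < best_err:
                        best, best_err = (j, thr, pol), err
        j, thr, pol = best
        err = max(best_err, 1e-12)       # guard against log(0) on a perfect stump
        alpha = 0.5 * np.log((1.0 - err) / err)
        pred = pol * np.where(X[:, j] >= thr, 1, -1)
        w *= np.exp(-alpha * y * pred)   # up-weight examples this stump got wrong
        w /= w.sum()
        ensemble.append((j, thr, pol, alpha))
    return ensemble

def predict(ensemble, X):
    """Sign of the alpha-weighted vote of all stumps."""
    score = np.zeros(len(X))
    for j, thr, pol, alpha in ensemble:
        score += alpha * pol * np.where(X[:, j] >= thr, 1, -1)
    return np.sign(score)
```

On a one-dimensional "interval" dataset (labels -1, -1, +1, +1, -1, -1), no single stump classifies every point, but a few boosted stumps do; this is the weak-to-strong effect the paragraph above refers to.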

My thesis focuses on remedying these three drawbacks of boosting. Ch. III outlines a method (called QuickBoost) that trains identical classifiers an order of magnitude faster than before, based on a proof of a bound. In Ch. IV, a unified framework for cost-sensitive multi-class boosting (called REBEL) is proposed, both advancing theory and demonstrating empirical gains. Finally, Ch. V describes a novel family of weak learners (called Localized Similarities) that guarantees theoretical bounds and outperforms decision trees and Neural Networks (as well as several other commonly used classification methods) on a range of datasets.

The culmination of my work is an easy-to-use, fast-training, cost-sensitive multi-class boosting framework whose functionality is interpretable (since each weak learner is a simple comparison of similarity), and whose performance is better than that of Neural Networks and other competing methods. It is the tool that everyone should have in their toolbox and the first one they try.

Item Type: Thesis (Dissertation (Ph.D.))
Subject Keywords: Boosting
Degree Grantor: California Institute of Technology
Division: Engineering and Applied Science
Major Option: Electrical Engineering
Thesis Availability: Public (worldwide access)
Research Advisor(s):
  • Perona, Pietro
Thesis Committee:
  • Perona, Pietro (chair)
  • Abu-Mostafa, Yaser S.
  • Bruck, Jehoshua
  • Hassibi, Babak
  • Yue, Yisong
Defense Date: 22 May 2017
Record Number: CaltechTHESIS:06052017-221842127
Persistent URL: https://resolver.caltech.edu/CaltechTHESIS:06052017-221842127
DOI: 10.7907/Z9N29V0J
Related URLs:
  • http://resolver.caltech.edu/CaltechAUTHORS:20131007-134123893 (Related Document): Article adapted for Ch. 3
  • http://resolver.caltech.edu/CaltechAUTHORS:20160922-125241569 (Related Document): Article adapted for Ch. 4
Default Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 10292
Collection: CaltechTHESIS
Deposited By: Ron Appel
Deposited On: 07 Jun 2017 17:41
Last Modified: 08 Nov 2023 00:44

Thesis Files

PDF - Final Version (8MB). See Usage Policy.
