Claire Chiang
Nov 9, 2020


Paper Summary - XGBoost: A Scalable Tree Boosting System

eXtreme Gradient Boosting ("XGBoost") is famous in data competitions. It provides a tree ensemble method while also improving computing performance through the way data is stored both on disk and in memory. This article is a summary of what I learned from the paper (Ref. 1).

XGBoost provides three split finding algorithms: exact greedy, approximate global, and approximate local. The exact greedy algorithm enumerates all possible split points, so it is time-consuming and memory-hungry. Approximate global and approximate local are two alternatives for when the exact greedy algorithm does not fit your data or you need a more efficient way to find split points. For the approximate algorithms, the key step is how to select candidate split points without enumerating every possibility. An existing algorithm, quantile sketch, can only treat each instance equally, but in XGBoost each instance carries a weight derived from the loss function. Therefore, a novel distributed weighted quantile sketch algorithm is introduced to solve this; it supports merge and prune operations while maintaining the accuracy level.
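As a rough sketch of how this shows up in practice (the mapping is my own, not from the paper), the exact and approximate split finding strategies correspond to values of the `tree_method` parameter in the xgboost Python package. The synthetic data and hyperparameters below are assumptions made purely for illustration.

```python
# Minimal sketch: train the same model with different split-finding strategies.
# "exact" is the exact greedy algorithm; "approx" and "hist" use approximate,
# quantile-based candidate splits. Data and hyperparameters are illustrative.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))
y = (X[:, 0] + rng.normal(scale=0.1, size=10_000) > 0).astype(int)
dtrain = xgb.DMatrix(X, label=y)

for method in ["exact", "approx", "hist"]:
    params = {
        "objective": "binary:logistic",
        "tree_method": method,  # split finding strategy
        "max_depth": 6,
        "eta": 0.3,
    }
    bst = xgb.train(params, dtrain, num_boost_round=10)
    print(method, "finished training")
```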

Regarding sparsity, three common causes are mentioned in the paper: missing values, frequent zero entries, and one-hot encoding. Therefore, XGBoost is sparsity-aware to help users with this: it adds a default direction to each tree node, and sparse or missing entries simply follow that direction. The experiment results in the paper show that the sparsity-aware algorithm performs much better than the basic algorithm.
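A hedged sketch of what this means for a user: in the xgboost Python package, NaN entries are treated as missing by default and each split learns a default direction for them. The synthetic data below (with entries knocked out at random) is only meant to show that training runs directly on sparse input.

```python
# Sparsity-aware training sketch: np.nan marks missing cells, and the learned
# trees route them along a per-node default direction. Data is synthetic.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(42)
X = rng.normal(size=(5_000, 10))
mask = rng.random(X.shape) < 0.4   # knock out ~40% of entries
X[mask] = np.nan
y = (np.nan_to_num(X[:, 0]) > 0).astype(int)

dtrain = xgb.DMatrix(X, label=y, missing=np.nan)  # nan = missing value
params = {"objective": "binary:logistic", "max_depth": 4}
bst = xgb.train(params, dtrain, num_boost_round=20)
print(bst.eval(dtrain))
```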

The previous paragraphs summarize how XGBoost improves the split finding algorithms themselves. The following paragraphs are about how it arranges data and implements parallel computing.

XGBoost stores the sorted feature values of each column in a block, an in-memory unit. This layout lets XGBoost read the data once and compute the statistics for all split candidates in that single pass. Collecting the statistics per column also makes the computation easy to parallelize.
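Because the statistics for different columns can be collected independently, split finding parallelizes across threads. A small sketch (timings here are illustrative, not results from the paper) is to vary the `nthread` parameter and compare wall-clock time:

```python
# Sketch: the per-column block layout lets split statistics be gathered in
# parallel, so more threads should reduce training time on a large matrix.
import time
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(1)
X = rng.normal(size=(200_000, 50))
y = rng.integers(0, 2, size=200_000)
dtrain = xgb.DMatrix(X, label=y)

for n_threads in (1, 4):
    params = {"objective": "binary:logistic",
              "tree_method": "hist",
              "nthread": n_threads}
    start = time.perf_counter()
    xgb.train(params, dtrain, num_boost_round=20)
    print(f"{n_threads} thread(s): {time.perf_counter() - start:.2f}s")
```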

The concept is applied not only in memory but also to how data is stored on disk, which the paper calls the out-of-core setting. Two main techniques are used to make better use of disk bandwidth. One is block compression, where each block is compressed by column. The other is block sharding, which splits the data across multiple disks.
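For completeness, a hedged sketch of how the out-of-core mode can be invoked from the Python package. The file name and cache prefix below are hypothetical, and the `#cache` suffix follows the older external-memory syntax; newer XGBoost releases expose this through a DataIter-based API instead, so treat this as illustrative only.

```python
# External-memory sketch: "train.libsvm" is assumed to be a libsvm-format file
# on disk that is too large for memory; the part after '#' names an on-disk
# cache prefix. Exact syntax depends on the XGBoost version and build.
import xgboost as xgb

dtrain = xgb.DMatrix("train.libsvm#dtrain.cache")

params = {"objective": "binary:logistic", "tree_method": "hist"}
bst = xgb.train(params, dtrain, num_boost_round=50)
bst.save_model("model_external_memory.json")
```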

In this paper, four data sets are used to run the experiments. The authors designed the experiments to validate XGBoost's features and show how XGBoost contributes to the community. In the experiments, the authors compare XGBoost with popular packages, including pGBRT, Spark MLLib, H2O, scikit-learn, and R's gbm. Both a smaller dataset (473k rows with 700 features) and a larger dataset (10M rows with 4,227 features) are used for comparison. The output of these experiments is the time consumed, showing how the packages perform under different numbers of threads or training examples, and the results support the claims made in the previous sections of the paper.

Ref. 1: Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785–794).

Other references:

1. Slides from the author for another talk:

http://datascience.la/xgboost-workshop-and-meetup-talk-with-tianqi-chen/

https://files.speakerdeck.com/presentations/5c6dab45648344208185d2b1ab4fdc95/XGBoost-Newest.pdf

2. Slides from Martin Jullum in 2018

https://static1.squarespace.com/static/59f31b56be42d6ba6ad697b2/t/5a72f3ee8165f596c6ec1ee7/1517482994580/Presentatation+BI+lunch+XGBoost.pdf
