From 3f817658bbd4c21367f16fc5767668eb00393903 Mon Sep 17 00:00:00 2001
From: Philip Cho
Date: Wed, 29 May 2019 00:40:42 +0000
Subject: [PATCH] Add ACM SIGIR 2019 paper

---
 block_distributed.html | 73 ++++++++++++++++++++++++++++++++++++++++++
 index.html             | 13 ++++++++
 2 files changed, 86 insertions(+)
 create mode 100644 block_distributed.html

diff --git a/block_distributed.html b/block_distributed.html
new file mode 100644
index 0000000..0482d4a
--- /dev/null
+++ b/block_distributed.html
@@ -0,0 +1,73 @@
+
+
+
+
+
+
+
+
+
+          Treelite: toolbox for decision tree deployment
+
+
+
+

[← Go back to profile]

+

+ Block-distributed Gradient Boosted Trees +

+

+with +Theodore Vasiloudis and +Henrik Boström +

+

+Paper presented at +ACM SIGIR (2019) +

+

Download

+ +

Synopsis

+

+          The Gradient Boosted Tree (GBT) algorithm is one of
+          the most popular machine learning algorithms used in production, for
+          tasks that include Click-Through Rate (CTR) prediction and
+          learning-to-rank. To deal with the massive datasets available today,
+          many distributed GBT methods have been proposed. However, they all
+          assume a row-distributed dataset, addressing scalability only with
+          respect to the number of data points and not the number of features,
+          which increases communication cost for high-dimensional data. In order
+          to allow for scalability across both the data point and feature
+          dimensions, and to reduce communication cost, we propose
+          block-distributed GBTs. We achieve communication
+          efficiency by making full use of the data sparsity and adapting the
+          QuickScorer algorithm to the block-distributed setting. We evaluate our
+          approach using datasets with millions of features, and demonstrate that
+          we are able to achieve multiple orders of magnitude reduction in
+          communication cost for sparse data, with no loss in accuracy, while
+          providing a more scalable design. As a result, we are able to reduce
+          the training time for high-dimensional data, and allow more
+          cost-effective scale-out without the need for expensive network
+          communication.
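The row-and-column partitioning described in the synopsis can be illustrated with a small sketch. This is a toy illustration only, not the paper's implementation: the function name `block_partition` and the block counts are invented here, and a dense NumPy array stands in for the sparse feature matrices the paper targets.

```python
# Illustrative sketch: splitting a feature matrix into a grid of
# row-blocks x column-blocks, so each worker holds only a subset of
# data points AND a subset of features (rather than all features for
# its rows, as in a purely row-distributed layout).
import numpy as np

def block_partition(X, n_row_blocks, n_col_blocks):
    """Split X into an n_row_blocks x n_col_blocks grid of sub-matrices."""
    row_parts = np.array_split(X, n_row_blocks, axis=0)
    return [np.array_split(part, n_col_blocks, axis=1) for part in row_parts]

X = np.arange(24).reshape(6, 4)   # 6 data points, 4 features
blocks = block_partition(X, n_row_blocks=3, n_col_blocks=2)

# Worker (i, j) holds block-row i restricted to block-column j's features.
assert blocks[0][0].shape == (2, 2)
# No data is lost or duplicated across the grid.
assert sum(b.size for row in blocks for b in row) == X.size
```

In this layout, gradient and histogram exchanges only need to cross block boundaries, which is where the paper's sparsity-aware communication savings come from.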

+

Publication Details

+ +

[← Go back to profile]

+
+

diff --git a/index.html b/index.html
index 56e6c5d..e57bf6c 100644
--- a/index.html
+++ b/index.html
@@ -86,6 +86,19 @@

Experience

Peer-reviewed publications