Welcome to the StreamSVM page!


This page hosts the code for StreamSVM. StreamSVM Ver 1.0.1 is now available here.

What is StreamSVM?

StreamSVM is the fastest implementation for training linear SVMs on large datasets that do not fit in your computer's memory.


The code requires the following:
- Intel TBB

It has been tested with the following versions of the packages:
- kyotocabinet-1.2.72.tar.gz
- libboost 1.33.1
- Intel tbb40_20120408oss


---- StreamSVM Ver 1.0.0 streamsvm.tar.gz (18KB) streamsvm.zip (19KB)

---- StreamSVM Ver 1.0.1 streamsvm.tar.gz (19KB) streamsvm.zip (24KB)


This is the result of training an SVM on the "webspam" dataset. We split the data and used the first 80% of the dataset, the size of which is about 19GB.
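As a rough illustration of the split described above, here is a minimal sketch in Python. It is not the script used for the experiment. It assumes the dataset is a text file with one example per line (for instance the LIBSVM format) and that "the first 80%" is counted in lines. The function names and file paths are hypothetical. The file is streamed twice, so it never has to fit in memory.

```python
def count_lines(path):
    """Count lines by streaming the file, so it never has to fit in memory."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)


def split_first_fraction(src, head_path, tail_path, fraction=0.8):
    """Write the first `fraction` of the lines of `src` to `head_path`
    and the remaining lines to `tail_path`.

    Assumption: one example per line. Returns (n_head, n_tail).
    """
    total = count_lines(src)          # first pass: count examples
    n_head = int(total * fraction)    # number of lines kept for training
    with open(src, "rb") as f, \
         open(head_path, "wb") as head, \
         open(tail_path, "wb") as tail:
        # second pass: route each line to the head or tail file
        for i, line in enumerate(f):
            (head if i < n_head else tail).write(line)
    return n_head, total - n_head
```

For example, `split_first_fraction("webspam.txt", "webspam_train.txt", "webspam_rest.txt")` would write the first 80% of the lines to the training file; these file names are placeholders.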

The standard liblinear package cannot handle big data unless there is sufficient memory; otherwise, memory swapping makes training almost impossible. The Selective Block Minimization (SBM) scheme, which is implemented in liblinear-cdblock, is the alternative when the data is larger than the available memory. Our implementation of the Dual Cached Loops scheme, StreamSVM, performs much better than SBM and is competitive with liblinear running with sufficient memory. (I would like to simulate the performance of liblinear under restricted memory, but I do not know how to do it.)

For more details, see the paper referenced below.


"Linear Support Vector Machines via Dual Cached Loops" @KDD2012, S. Matsushima, S.V.N. Vishwanathan, A. J. Smola