Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Performance Bug Report] Specific input cause time difference in SparseVector.Java #179

Open
SomeoneAlice opened this issue Aug 17, 2021 · 1 comment

Comments

@SomeoneAlice
Copy link

We are working on the Algorithmic Complexity Denial-of-Service problem and detected a performance bug from your code.

We didn’t create a pull request because we're not sure whether this bug can be triggered from the user interface. We also do not understand the functionality of this code snippet as you do. Thanks for your understanding.

Outcome

In SparseVector.Java the method sortImpl would take more time for specific input.

Reasons

The sortImpl function uses quick sort algorithm. sortIndice has checked if the input array is sorted to avoid unnecessary time consumption, but there are still some problems. The sortImpl function implements a quick sort algorithm with center-most pivot, it can matigate the complexity problem in most cases. But when a well-designed malicious case was input, the ACDoS would be triggered with algorithm complexity increased to O(n^2).

Suggestion

It's inevitable that quick sort alogrithm takes O(n^2) time in specific case. It may be useful to limit the input size to avoid the ACDoS attack.

Repeatability

A simplified test case is provided here.

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;
import java.util.function.BiConsumer;

/**
 * A sparse vector represented by an indices array and a values array.
 */
public class SparseVector{

	private static final long serialVersionUID = -3756357155623064883L;
	/**
	 * Size of the vector. n = -1 indicates that the vector size is undetermined.
	 * <p>
	 * <p>Package private to allow access from {@link MatVecOp} and {@link BLAS}.
	 */
	int n;

	/**
	 * Column indices.
	 * <p>
	 * Package private to allow access from {@link MatVecOp} and {@link BLAS}.
	 */
	int[] indices;

	/**
	 * Column values.
	 * <p>
	 * Package private to allow access from {@link MatVecOp} and {@link BLAS}.
	 */
	double[] values;

	/**
	 * Construct an empty sparse vector with undetermined size.
	 */
	public SparseVector() {
		this(-1);
	}

	/**
	 * Construct an empty sparse vector with determined size.
	 */
	public SparseVector(int n) {
		this.n = n;
		this.indices = new int[0];
		this.values = new double[0];
	}

	/**
	 * Construct a sparse vector with the given indices and values.
	 *
	 * @throws IllegalArgumentException If size of indices array and values array differ.
	 * @throws IllegalArgumentException If n >= 0 and the indices are out of bound.
	 */
	public SparseVector(int n, int[] indices, double[] values) {
		this.n = n;
		this.indices = indices;
		this.values = values;
		checkSizeAndIndicesRange();
	}

	/**
	 * Construct a sparse vector with given indices to values map.
	 *
	 * @throws IllegalArgumentException If n >= 0 and the indices are out of bound.
	 */
	public SparseVector(int n, Map <Integer, Double> kv) {
		this.n = n;
		int nnz = kv.size();
		int[] indices = new int[nnz];
		double[] values = new double[nnz];

		int pos = 0;
		for (Map.Entry <Integer, Double> entry : kv.entrySet()) {
			indices[pos] = entry.getKey();
			values[pos] = entry.getValue();
			pos++;
		}

		this.indices = indices;
		this.values = values;
		checkSizeAndIndicesRange();

	}

	/**
	 * Check whether the indices array and values array are of the same size,
	 * and whether vector indices are in valid range.
	 */
	private void checkSizeAndIndicesRange() {
		if (indices.length != values.length) {
			throw new IllegalArgumentException("Indices size and values size should be the same.");
		}
		for (int i = 0; i < indices.length; i++) {
			if (indices[i] < 0 || (n >= 0 && indices[i] >= n)) {
				throw new IllegalArgumentException("Index out of bound.");
			}
		}
	}

	/**
	 * Sort the indices and values using quick sort.
	 */
	private static void sortImpl(int[] indices, double[] values, int low, int high) {
		int pivotPos = (low + high) / 2;
		int pivot = indices[pivotPos];
		indices[pivotPos] = indices[high];
		indices[high] = pivot;
		double t = values[pivotPos];
		values[pivotPos] = values[high];
		values[high] = t;

		int pos = low - 1;
		for (int i = low; i <= high; i++) {
			if (indices[i] <= pivot) {
				pos++;
				int tempI = indices[pos];
				double tempD = values[pos];
				indices[pos] = indices[i];
				values[pos] = values[i];
				indices[i] = tempI;
				values[i] = tempD;
			}
		}
		if (high > pos + 1) {
			sortImpl(indices, values, pos + 1, high);
		}
		while (pos - 1 > low && indices[pos - 1] == pivot) {
			pos--;
		}
		if (pos - 1 > low) {
			sortImpl(indices, values, low, pos - 1);
		}
	}
	
	private void sortIndices() {
		boolean outOfOrder = false;
		for (int i = 0; i < this.indices.length - 1; i++) {
			if (this.indices[i] > this.indices[i + 1]) {
				outOfOrder = true;
				break;
			}
		}
		if (outOfOrder) {
			sortImpl(this.indices, this.values, 0, this.indices.length - 1);
		}
	}
	
	public static void main(String[] args) {
		int[] indices1 = new int[]{2,4,6,8,10,11,3,7,1,9,5}; //malicious input 
		int[] indices2 = new int[]{1,3,9,2,7,8,4,11,10,6,5}; //random input
		double[] values = new double[]{5,2,3,7,3,7,5,8,3,4,6};
		SparseVector MyVector1 = new SparseVector(12,indices1,values);
		SparseVector MyVector2 = new SparseVector(12,indices2,values);
		long startTime,endTime;
		startTime=System.nanoTime();   
		MyVector1.sortIndices();
		endTime=System.nanoTime();   
		System.out.println("malicious input: "+(endTime-startTime)+"ns");  
		startTime=System.nanoTime();     
		MyVector2.sortIndices();
		endTime=System.nanoTime();   
		System.out.println("random input: "+(endTime-startTime)+"ns"); 
	}
	
}

@weibozhao
Copy link
Collaborator

weibozhao commented Aug 27, 2021

Hi, someoneAlice

thanks for your suggestion to sparseVector performance.

I run the code you provided. The stability of this code is not very well. If I change the test order, the performance will vary greatly.

Below I give a test code, The performance difference between the two data is less than 20%.

  @Test
public void testPerformance() {
	final int N = 1000000;
	SparseVector[] malicious_vecs = new SparseVector[N];
	for (int i = 0; i < N; ++i) {
		malicious_vecs[i] = new SparseVector(12,
			new int[] {2, 4, 6, 8, 10, 11, 3, 7, 1, 9, 5},
			new double[] {5, 2, 3, 7, 3, 7, 5, 8, 3, 4, 6});
	}
	SparseVector[] random_vecs = new SparseVector[N];
	for (int i = 0; i < N; ++i) {
		random_vecs[i] = new SparseVector(12,
			new int[] {1, 3, 9, 2, 7, 8, 4, 11, 10, 6, 5},
			new double[] {5, 2, 3, 7, 3, 7, 5, 8, 3, 4, 6});
	}
	long startTime, endTime;
	startTime = System.nanoTime();
	for (int i = 0; i < N; ++i) {
		random_vecs[i].sortIndices();
	}
	endTime = System.nanoTime();
	System.out.println("random input: " + (endTime - startTime) + "ms");

	startTime = System.nanoTime();
	for (int i = 0; i < N; ++i) {
		malicious_vecs[i].sortIndices();
	}
	endTime = System.nanoTime();
	System.out.println("malici input: " + (endTime - startTime) + "ms");
}

In addition, in the field of machine learning, the indices of SparseVector have a certain randomness, so the problem of quick sort algorithm maybe not very serious.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants