java.lang.Object
- com.qfs.vector.impl.AVector
- - com.qfs.vector.impl.AReadOnlyVector
  - - com.qfs.vector.impl.ADelegateReadOnlyVector
    - - com.activeviam.risk.core.vector.impl.ReadOnlySparseVector

All Implemented Interfaces:

com.qfs.chunk.IArray, com.qfs.chunk.IArrayReader, com.qfs.chunk.IArrayWriter, com.qfs.chunk.IWritableArray, com.qfs.monitoring.memory.IMemoryMonitored, com.qfs.vector.IDelegateVector, com.qfs.vector.IReleasableVector, com.qfs.vector.IVector, Serializable

Direct Known Subclasses:

ReadOnlySparseVector.SparseRandomAccessVector
```
public class ReadOnlySparseVector
extends com.qfs.vector.impl.ADelegateReadOnlyVector
```
Read-only IVector implementation of a view of an underlying IVector who contains a special, compact, representation of an allegedly-sparse vector.
A "sparse" vector is a vector where a given single value (so called "implicit value") appears many time in the vector. The more times such value appears the more sparse the vector is considered. The other values in the vector are called the "explicit values".
From an external (caller) code perspective, this read-only vector implementation provides natural access to the original vector values w/o requiring any knowledge of the internal compact representation of those values.
The point of using such implementation as opposed to a naive "dense" representation of the original vector is to spare memory, since the internal compact representation gives us a memory footprint that is very close to the footprint of the explicit values only as long as the original vector size is greater than 100.
The overhead of the compact representation is so small that there very little chance that we end up consuming more memory than if using a naive representation unless the original vector is very dense. E.g., for original vector size = 100, we will consume more only if explicit values account for more than 96% of all values. For size = 10, this will happen if explicit values account for more than 70% of all values.
A good formula to estimate memory consumption by the internal representation (S values) vs. the naive representation (N values), considering a "density ratio" = # explicit values / # values, is: S/N = density ratio + 0.016 + 3/N.
One must use generateSparseVectorInternalRepresentation(double[], double) to create an adequate compact representation, based on a given vanilla (expanded) representation of any vector (allegedly sparse). Alternatively, one may use generateSparseVectorInternalRepresentation(int, double, int[], double[]) to generate such compact representation. The compact representation can then be given to ActivePivot for transfer into an off-heap vector (custom code shall not create the off-heap vector directly because it doesn't know on which NUMA node to locate it). Lastly, once ActivePivot has created the proper off-heap vector, one must use custom code to wrap that vector into an instance of ReadOnlySparseVector. This is what guarantees, ultimately, that other components can access the sparse vector content transparently (as per usual vector API), w/o having to know anything about the specifics of the internal compact representation at work.
Hence, a typical pattern for loading sparse vectors in ActivePivot would be:
1. Get vanilla (expanded) vector representation from source.
2. Feed to Datastore (in a vector field) the internal (compact) representation of that vector obtained via generateSparseVectorInternalRepresentation(double[], double) (... or, alternatively, feed the vanilla representation and use insertion-time update-where procedure to do the conversion later on).
3. Use a commit-time update-where procedure to retrieve the vector automatically created by the core product (which now wraps the double[] that contains the internal compact representation) in the vector field value, and then wrap it into a ReadOnlySparseVector instance and set this as the new the vector field value.
The internal, compact, representation consists in several double values, where:
- The first double value must be interpreted as a long value that represents the packing of two int values, and where:
  - The first int value is the size of the original vector ('vanilla' representation).
  - The second int value is the size (in 'words') of the bitmap that tracks implicit values.
- The second double value is the implicit value used to consider sparsity of the original vector.
- The B next double values must be interpreted as long values that constitute the bitmap words to track the positioning of implicit values in the original vector.
- The remaining double values are the explicit values from the original sparse vector.
See Also:

Serialized Form

Nested Class Summary

Nested Classes
Modifier and Type	Class	Description
`static class`	`ReadOnlySparseVector.ReadOnlySparseVectorBinding`	Custom binding to use more efficient custom ReadOnlySparseVector methods for aggregation methods: plus, minus and copyFrom.
`protected static class`	`ReadOnlySparseVector.SparseRandomAccessVector`	Random access version of a `ReadOnlySparseVector`.

Field Summary
- Fields inherited from class com.qfs.vector.impl.ADelegateReadOnlyVector
  underlying

Constructor Summary

Constructors
Constructor Description

ReadOnlySparseVector(com.qfs.vector.IVector underlying)

Method Summary

All Methods Static Methods Instance Methods Concrete Methods
Modifier and Type	Method	Description
`double`	`average()`	Mean of a vector contains the arithmetic average of its components.
`com.qfs.util.IPrimitiveIterator`	`bottomK(int k)`	Returns an `iterator` of a collection composed of the k smallest elements of the vector.
`int[]`	`bottomKIndices(int kArg)`	Return the k indices of the k smallest elements of the vector sorted in ascending order.
`void`	`copyTo(double[] dst)`
`static double[]`	`generateSparseVectorInternalRepresentation(double[] vector, double implicitValue)`	Generate an internal compact representation of a (allegedly) sparse vector.
`static double[]`	`generateSparseVectorInternalRepresentation(int vectorSize, double implicitValue, int[] explicitValueIndexes, double[] explicitValues)`	Generate an internal compact representation of a (allegedly) sparse vector.
`int`	`getBindingId()`
`int`	`getComponentType()`
`double[]`	`getDoubleArray()`	Will return the full double array containing whole SparseVector.
`protected int`	`nearestRank(int length, double r)`	Convenient method for quantile using the Nearest Rank definition of quantile
`double`	`quantileDouble(double r)`	Compute the quantile of order `r` using the Nearest Rank definition of quantile with rounding.
`int`	`quantileIndex(double r)`	Compute the quantile index of order `r` using the Nearest Rank definition of quantile with rounding.
`Object`	`read(int position)`
`double`	`readDouble(int position)`
`int`	`size()`
`com.qfs.vector.ITransientVector`	`sort()`
`com.qfs.vector.IVector`	`subVector(int start, int length)`
`com.qfs.util.IPrimitiveIterator`	`topK(int k)`	Returns an `iterator` of a collection composed of the k biggest elements of the vector.
`int[]`	`topKIndices(int kArg)`	Return the k indices of the k biggest elements of the vector sorted in ascending order.
`com.qfs.vector.IRandomAccessVector`	`toRandomAccessVector()`
`double`	`variance()`	Variance of a vector contains the arithmetic variance of its components.

Methods inherited from class com.qfs.vector.impl.ADelegateReadOnlyVector
acquireReference, cloneOnHeap, collect, getMemoryStatistic, getUnderlying, hashCode, releaseReference

Methods inherited from class com.qfs.vector.impl.AReadOnlyVector
addDouble, addFloat, addInt, addLong, copyFrom, divide, divide, minus, minusNegativeValues, minusPositiveValues, plus, plusNegativeValues, plusPositiveValues, scale, scale, scale, scale, translate, translate, translate, translate, write, writeBoolean, writeDouble, writeFloat, writeInt, writeLong

Methods inherited from class com.qfs.vector.impl.AVector
applyAsDouble, applyAsFloat, applyAsInt, applyAsLong, checkIndex, checkIndex, copyFrom, copyFrom, copyFrom, copyFrom, copyFrom, copyTo, copyTo, copyTo, copyTo, equals, isNull, quantileFloat, quantileInt, quantileLong, readBoolean, readFloat, readInt, readLong, sumDouble, sumFloat, sumInt, sumLong, toArray, toDoubleArray, toFloatArray, toIntArray, toLongArray, toString, toString

Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait

Methods inherited from interface com.qfs.chunk.IArrayReader
isNull, readBoolean, readFloat, readInt, readLong

Methods inherited from interface com.qfs.chunk.IArrayWriter
addDouble, addFloat, addInt, addLong, writeBoolean, writeDouble, writeFloat, writeInt, writeLong

Methods inherited from interface com.qfs.vector.IVector
applyAsDouble, applyAsFloat, applyAsInt, applyAsLong, copyFrom, copyFrom, copyFrom, copyFrom, copyFrom, copyFrom, copyTo, copyTo, copyTo, copyTo, divide, divide, equals, minus, minusNegativeValues, minusPositiveValues, plus, plusNegativeValues, plusPositiveValues, quantileFloat, quantileInt, quantileLong, scale, scale, scale, scale, sumDouble, sumFloat, sumInt, sumLong, toArray, toDoubleArray, toFloatArray, toIntArray, toLongArray, translate, translate, translate, translate, write

- Constructor Detail
  - ReadOnlySparseVector
```
public ReadOnlySparseVector(com.qfs.vector.IVector underlying)
```
- Method Detail
  - generateSparseVectorInternalRepresentation
```
public static double[] generateSparseVectorInternalRepresentation(double[] vector,
                                                                  double implicitValue)
```
    Generate an internal compact representation of a (allegedly) sparse vector.
    We use Double.compare(double, double) to judge whether or not a given vector value is equal to the specified implicitValue.
    What is returned by this utility method is expected to be wrapped into a vector of double elements, which itself must be wrapped into a ReadOnlySparseVector.
    Refer to ReadOnlySparseVector class Javadoc for more details about the internal representation format and a typical design pattern for loading such data in ActivePivot.
    
    Parameters:
    
    vector - The vanilla (expanded) representation of the vector.
    
    implicitValue - The value to be considered as implicit when looking at vector sparsity.
    
    Returns:
    
    Internal compact representation of the (allegedly) sparse vector.
  - generateSparseVectorInternalRepresentation
```
public static double[] generateSparseVectorInternalRepresentation(int vectorSize,
                                                                  double implicitValue,
                                                                  int[] explicitValueIndexes,
                                                                  double[] explicitValues)
```
    Generate an internal compact representation of a (allegedly) sparse vector.
    What is returned by this utility method is expected to be wrapped into a vector of double elements, which itself must be wrapped into a ReadOnlySparseVector.
    Refer to ReadOnlySparseVector class Javadoc for more details about the internal representation format and a typical design pattern for loading such data in ActivePivot.
    
    Parameters:
    
    vectorSize - The size of the vector.
    
    implicitValue - The value to be considered as implicit when looking at vector sparsity.
    
    explicitValueIndexes - The indexes, by ascending order, where we find explicit values. Might be null if no explicit value at all.
    
    explicitValues - The explicit values (ordered consistently w.r.t. explicitValueIndexes). Might be null if no explicit value at all.
    
    Returns:
    
    Internal compact representation of the (allegedly) sparse vector.
  - getComponentType
```
public int getComponentType()
```
    Specified by:
    
    getComponentType in interface com.qfs.vector.IVector
    
    Overrides:
    
    getComponentType in class com.qfs.vector.impl.ADelegateReadOnlyVector
  - getBindingId
```
public int getBindingId()
```
  - size
```
public int size()
```
    Specified by:
    
    size in interface com.qfs.vector.IVector
    
    Overrides:
    
    size in class com.qfs.vector.impl.ADelegateReadOnlyVector
  - read
```
public Object read(int position)
```
    Specified by:
    
    read in interface com.qfs.chunk.IArrayReader
    
    Overrides:
    
    read in class com.qfs.vector.impl.AVector
  - readDouble
```
public double readDouble(int position)
```
    Specified by:
    
    readDouble in interface com.qfs.chunk.IArrayReader
    
    Overrides:
    
    readDouble in class com.qfs.vector.impl.AVector
  - toRandomAccessVector
```
public com.qfs.vector.IRandomAccessVector toRandomAccessVector()
```
  - subVector
```
public com.qfs.vector.IVector subVector(int start,
                                        int length)
```
  - copyTo
```
public void copyTo(double[] dst)
```
    Specified by:
    
    copyTo in interface com.qfs.vector.IVector
    
    Overrides:
    
    copyTo in class com.qfs.vector.impl.AVector
  - sort
```
public com.qfs.vector.ITransientVector sort()
```
  - topK
```
public com.qfs.util.IPrimitiveIterator topK(int k)
```
    Returns an iterator of a collection composed of the k biggest elements of the vector. The iterator iterates over the k biggest elements from the smallest element to the biggest element. The algorithm used for this version of topK is a dynamic based algorithm. Because most of the vector is expected to be the implicit value we only need to iterate over the explicit values to find the topK. We should expect the performance of this algorithm to be O(e + k). e is the total count of explicits that exist in this sparse vector and k is the value passed as a parameter for the number of elements to return. If you expect that your implicits will be smaller then your "bottom" most topK then you can also expect this algorithm to just have a performance of O(e). This is because we skip iterating over implicit values entirely in this case. Additionally some temporary memory is used in this algorithm. The memory used will be of size O(k) where k is the value passed as a parameter for the number of elements to return.
    
    Parameters:
    
    k - The number of elements to return.
    
    Returns:
    
    An iterator over the k biggest elements.
  - topKIndices
```
public int[] topKIndices(int kArg)
```
    Return the k indices of the k biggest elements of the vector sorted in ascending order. The algorithm used for this version of topKIndices is a dynamic based algorithm. One of the challenges of finding indices for values, implicit or explicit, is that the only way to know what index a value originally was at is to query the internal bitmap of the underlying array. This requires the sparse vector to iterate through the bitmap to find the necessary explicit and implicit indexes for values. Because this algorithm finds the original indexes for values in the bitmap the performance size of this algorithm is O(n) where n is the original size of the vector uncompressed. However it should be noted that is the worst case scenario performance and can only be achieved in a few ways. One such way is if k = n then o(n) will always be expected. Additionally some temporary memory is used in this algorithm. The memory used will be of size O(k) where k is the value passed as a parameter for the number of elements to return.
    
    Parameters:
    
    kArg - The number of elements to return.
    
    Returns:
    
    The k indices of the k biggest elements.
  - bottomK
```
public com.qfs.util.IPrimitiveIterator bottomK(int k)
```
    Returns an iterator of a collection composed of the k smallest elements of the vector. The iterator iterates over the k smallest elements from the biggest element to the smallest element. The algorithm used for this version of bottomK is a dynamic based algorithm. Because most of the vector is expected to be the implicit value we only need to iterate over the explicit values to find the topK. We should expect the performance of this algorithm to be O(e + k). e is the total count of explicits that exist in this sparse vector and k is the value passed as a parameter for the number of elements to return. If you expect that your implicits will be greater then your "top" most bottomK then you can also expect this algorithm to just have a performance of O(e). This is because we skip iterating over implicit values entirely in this case. Additionally some temporary memory is used in this algorithm. The memory used will be of size O(k) where k is the value passed as a parameter for the number of elements to return.
    
    Parameters:
    
    k - The number of elements to return.
    
    Returns:
    
    An iterator over the k smallest elements.
  - bottomKIndices
```
public int[] bottomKIndices(int kArg)
```
    Return the k indices of the k smallest elements of the vector sorted in ascending order. The algorithm used for this version of bottomKIndices is a dynamic based algorithm. One of the challenges of finding indices for values, implicit or explicit, is that the only way to know what index a value originally was at is to query the internal bitmap of the underlying array. This requires the sparse vector to iterate through the bitmap to find the necessary explicit and implicit indexes for values. Because this algorithm finds the original indexes for values in the bitmap the performance size of this algorithm is O(n) where n is the original size of the vector uncompressed. However it should be noted that is the worst case scenario performance and can only be achieved in a few ways. One such way is if k = n then o(n) will always be expected. Additionally some temporary memory is used in this algorithm. The memory used will be of size O(k) where k is the value passed as a parameter for the number of elements to return.
    
    Parameters:
    
    kArg - The number of elements to return.
    
    Returns:
    
    The k indices of the k smallest elements.
  - quantileDouble
```
public double quantileDouble(double r)
```
    Compute the quantile of order r using the Nearest Rank definition of quantile with rounding.
    For instance, if r = 1/2, the function returns the median. For r = 1/4, the first quartile. For r = 3/4, the third quartile.
    This method is only supported for vectors that can retrieve their content as double without information loss. This implementation for the SparseVector uses bottomK or topK depending on if the rank is higher, equal or less then 0.5. The performance growth of this method is equal to the performance of either of those methods though we never have to iterate more then half of the entire underlying array in practice.
    
    Specified by:
    
    quantileDouble in interface com.qfs.vector.IVector
    
    Overrides:
    
    quantileDouble in class com.qfs.vector.impl.AVector
    
    Parameters:
    
    r - the order of the quantile, a double in ]0.0, 1.0]
    
    Returns:
    
    the quantile of order r
  - quantileIndex
```
public int quantileIndex(double r)
```
    Compute the quantile index of order r using the Nearest Rank definition of quantile with rounding.
    For instance, if r = 1/2, the function returns the median. For r = 1/4, the first quartile. For r = 3/4, the third quartile.
    This method is only supported for vectors that can retrieve their content as double without information loss. This implementation for the SparseVector uses bottomKIndices or topKIndices depending on if the rank is higher, equal or less then 0.5. The performance growth of this method is equal to the performance of either of those methods though we never have to iterate more then half of the entire underlying array in practice.
    
    Parameters:
    
    r - the order of the quantile, a double in ]0.0, 1.0]
    
    Returns:
    
    the quantile of order r
  - nearestRank
```
protected int nearestRank(int length,
                          double r)
```
    Convenient method for quantile using the Nearest Rank definition of quantile
    
    Parameters:
    
    length - The length of the vector.
    
    r - The radix, as a number between 0 and 1.
    
    Returns:
    
    The nearest rank.
  - average
```
public double average()
```
    Mean of a vector contains the arithmetic average of its components. The algorithm used for this version of average is a simple iterative based algorithm. The performance size of this algorithm should be expected to be Î¸(e) where e is the total count of explicits that exist in this sparse vector.
    
    Specified by:
    
    average in interface com.qfs.vector.IVector
    
    Overrides:
    
    average in class com.qfs.vector.impl.ADelegateReadOnlyVector
    
    Returns:
    
    the mean of the vector
  - variance
```
public double variance()
```
    Variance of a vector contains the arithmetic variance of its components. The algorithm used for this version of average is a Shifted Data based algorithm. In such an algorithm we take a value that should be close to the mean (the implicit value) and shift all of our explicit values before summing or squaring them. The larger our vector is the closer to the true variance this algorithm becomes. The performance size of this algorithm should be expected to be Î¸(e) where e is the total count of explicits that exist in this sparse vector.
    
    Returns:
    
    the variance of the vector
  - getDoubleArray
```
public double[] getDoubleArray()
```
    Will return the full double array containing whole SparseVector. This is more efficient than calling getDouble for every value in the vector.
    
    Returns:
    
    double array containing whole sparse vector

Class ReadOnlySparseVector

Nested Class Summary

Field Summary

Fields inherited from class com.qfs.vector.impl.ADelegateReadOnlyVector

Constructor Summary

Method Summary

Methods inherited from class com.qfs.vector.impl.ADelegateReadOnlyVector

Methods inherited from class com.qfs.vector.impl.AReadOnlyVector

Methods inherited from class com.qfs.vector.impl.AVector

Methods inherited from class java.lang.Object

Methods inherited from interface com.qfs.chunk.IArrayReader

Methods inherited from interface com.qfs.chunk.IArrayWriter

Methods inherited from interface com.qfs.vector.IVector

Constructor Detail

ReadOnlySparseVector

Method Detail

generateSparseVectorInternalRepresentation

generateSparseVectorInternalRepresentation

getComponentType

getBindingId

size

read

readDouble

toRandomAccessVector

subVector

copyTo

sort

topK

topKIndices

bottomK

bottomKIndices

quantileDouble

quantileIndex

nearestRank

average

variance

getDoubleArray