Class KMeansCluster

java.lang.Object
com.linkedin.dagli.util.cloneable.AbstractCloneable<S>
com.linkedin.dagli.producer.AbstractChildProducer<R,I,S>
com.linkedin.dagli.transformer.AbstractPreparableTransformer1<double[],com.linkedin.dagli.vector.ScoredVector,NearestDoubleArray,KMeansCluster>
com.linkedin.dagli.clustering.KMeansCluster
All Implemented Interfaces:
com.linkedin.dagli.producer.ChildProducer<com.linkedin.dagli.vector.ScoredVector>, com.linkedin.dagli.producer.Producer<com.linkedin.dagli.vector.ScoredVector>, com.linkedin.dagli.producer.ProducerType<com.linkedin.dagli.vector.ScoredVector,com.linkedin.dagli.transformer.PreparableTransformer<com.linkedin.dagli.vector.ScoredVector,NearestDoubleArray>>, com.linkedin.dagli.transformer.PreparableTransformer<com.linkedin.dagli.vector.ScoredVector,NearestDoubleArray>, com.linkedin.dagli.transformer.PreparableTransformer1<double[],com.linkedin.dagli.vector.ScoredVector,NearestDoubleArray>, com.linkedin.dagli.transformer.Transformer<com.linkedin.dagli.vector.ScoredVector>, com.linkedin.dagli.transformer.Transformer1<double[],com.linkedin.dagli.vector.ScoredVector>, com.linkedin.dagli.transformer.TransformerWithInputBound<double[],com.linkedin.dagli.vector.ScoredVector>, com.linkedin.dagli.util.named.Named, java.io.Serializable, java.lang.Cloneable

@ValueEquality
public class KMeansCluster
extends com.linkedin.dagli.transformer.AbstractPreparableTransformer1<double[],com.linkedin.dagli.vector.ScoredVector,NearestDoubleArray,KMeansCluster>
Clusters vectors into k groups using the KMeans++ algorithm and produces, for each vector, its cluster assignment as an index in the range [0, k-1].
See Also:
Serialized Form
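
For orientation, the seeding step that gives k-means++ its name can be sketched in plain Java. This is a minimal stand-alone illustration under stated assumptions, not Dagli's implementation: the class KMeansPlusPlusSketch and its methods are hypothetical, it uses java.util.Random, and it works on double[] vectors with squared Euclidean distance. The first centroid is drawn uniformly; each later centroid is drawn with probability proportional to its squared distance from the nearest centroid already chosen.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical stand-alone sketch of k-means++ seeding; not a Dagli class. */
final class KMeansPlusPlusSketch {
  /** Squared Euclidean distance between two equal-length vectors. */
  static double squaredDistance(double[] a, double[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  /** Picks k initial centroids: the first uniformly, the rest proportionally to squared distance. */
  static double[][] chooseInitialCentroids(double[][] points, int k, long seed) {
    Random random = new Random(seed);
    double[][] centroids = new double[k][];
    centroids[0] = points[random.nextInt(points.length)].clone();

    // minDist[i] holds the squared distance from point i to its nearest chosen centroid
    double[] minDist = new double[points.length];
    Arrays.fill(minDist, Double.POSITIVE_INFINITY);

    for (int c = 1; c < k; c++) {
      double total = 0;
      for (int i = 0; i < points.length; i++) {
        minDist[i] = Math.min(minDist[i], squaredDistance(points[i], centroids[c - 1]));
        total += minDist[i];
      }
      // Sample index i with probability minDist[i] / total
      double target = random.nextDouble() * total;
      int chosen = points.length - 1;
      double cumulative = 0;
      for (int i = 0; i < points.length; i++) {
        cumulative += minDist[i];
        if (cumulative > target) {
          chosen = i;
          break;
        }
      }
      centroids[c] = points[chosen].clone();
    }
    return centroids;
  }

  /** Returns the index in [0, k-1] of the centroid nearest to the given vector. */
  static int assign(double[] vector, double[][] centroids) {
    int best = 0;
    for (int c = 1; c < centroids.length; c++) {
      if (squaredDistance(vector, centroids[c]) < squaredDistance(vector, centroids[best])) {
        best = c;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    double[][] points = {{0, 0}, {0, 1}, {10, 10}, {10, 11}};
    double[][] centroids = chooseInitialCentroids(points, 2, 0L);
    for (double[] point : points) {
      System.out.println(Arrays.toString(point) + " -> cluster " + assign(point, centroids));
    }
  }
}
```

A point that is already a centroid has squared distance 0 and so is not drawn again while any other point remains at a positive distance, which is why the seeds tend to spread across the data.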
  • Nested Class Summary

    Nested classes/interfaces inherited from class com.linkedin.dagli.transformer.AbstractPreparableTransformer1

    com.linkedin.dagli.transformer.AbstractPreparableTransformer1.InternalAPI
  • Field Summary

    Fields 
    Modifier and Type Field Description
    protected com.linkedin.dagli.producer.Producer<? extends double[]> _input1  
  • Constructor Summary

    Constructors 
    Constructor Description
    KMeansCluster()
    Creates a new KMeans clusterer with k = 10 and unlimited iterations.
    KMeansCluster(int k)
    Creates a new KMeans clusterer with the specified value of k and unlimited iterations.
  • Method Summary

    Modifier and Type Method Description
    protected KMeansCluster clone()  
    protected boolean computeEqualsUnsafe(KMeansCluster arg0)
    protected int computeHashCode()
    boolean equals(java.lang.Object arg0)
    protected com.linkedin.dagli.reducer.ClassReducerTable getClassReducerTable()  
    protected java.lang.String getDefaultName()  
    protected java.lang.String getDefaultShortName()  
    protected java.util.Collection<? extends com.linkedin.dagli.reducer.Reducer<? super KMeansCluster>> getGraphReducers()  
    protected com.linkedin.dagli.handle.ProducerHandle<KMeansCluster> getHandle()  
    protected com.linkedin.dagli.producer.Producer<? extends double[]> getInput1()  
    protected java.util.List<com.linkedin.dagli.producer.Producer<?>> getInputList()  
    java.lang.String getName()  
    protected com.linkedin.dagli.clustering.KMeansCluster.Preparer getPreparer(com.linkedin.dagli.preparer.PreparerContext context)
    protected java.lang.reflect.Type getResultSupertype()  
    java.lang.String getShortName()  
    protected boolean handleEquality​(KMeansCluster arg0)  
    protected int handleHashCode()  
    protected boolean hasAlwaysConstantResult()  
    boolean hasConstantResult()  
    int hashCode()  
    protected boolean hasName()  
    com.linkedin.dagli.transformer.internal.PreparableTransformer1InternalAPI<double[],com.linkedin.dagli.vector.ScoredVector,NearestDoubleArray,KMeansCluster> internalAPI()
    protected com.linkedin.dagli.dag.Graph<java.lang.Object> subgraph()  
    java.lang.String toString()  
    protected KMeansCluster withAllInputs(com.linkedin.dagli.producer.Producer<? extends double[]> arg0)
    com.linkedin.dagli.input.DenseFeatureVectorInput<KMeansCluster> withInput()
    KMeansCluster withInput(com.linkedin.dagli.producer.Producer<? extends com.linkedin.dagli.math.vector.DenseVector> vectorInput)
    Creates a copy of this instance that will obtain DenseVector inputs from the specified Producer.
    protected KMeansCluster withInput1(com.linkedin.dagli.producer.Producer<? extends double[]> arg0)
    KMeansCluster withInputArray(com.linkedin.dagli.producer.Producer<? extends double[]> arrayInput)
    Creates a copy of this instance that will obtain its inputs from the specified Producer of double arrays.
    KMeansCluster withK(int k)
    Returns a copy of this KMeansCluster that will compute the specified number of clusters.
    KMeansCluster withMaxIterations(int maxIterations)
    Returns a copy of this KMeansCluster transformer that uses the specified maximum number of iterations to optimize the clusters.
    KMeansCluster withName(java.lang.String arg0)
    KMeansCluster withSeed(long seed)
    Returns a copy of this KMeansCluster that uses the specified random seed (by default, 0) for initialization.

    Methods inherited from class com.linkedin.dagli.transformer.AbstractPreparableTransformer1

    createInternalAPI, hasIdempotentPreparer

    Methods inherited from class com.linkedin.dagli.producer.AbstractChildProducer

    validate

    Methods inherited from class com.linkedin.dagli.util.cloneable.AbstractCloneable

    clone

    Methods inherited from class java.lang.Object

    finalize, getClass, notify, notifyAll, wait, wait, wait

    Methods inherited from interface com.linkedin.dagli.transformer.PreparableTransformer1

    internalAPI

    Methods inherited from interface com.linkedin.dagli.producer.Producer

    getName, getShortName, hasConstantResult, validate
  • Field Details

    • _input1

      protected com.linkedin.dagli.producer.Producer<? extends double[]> _input1
  • Constructor Details

    • KMeansCluster

      public KMeansCluster()
      Creates a new KMeans clusterer with k = 10 and unlimited iterations.
    • KMeansCluster

      public KMeansCluster(int k)
      Creates a new KMeans clusterer with the specified value of k and unlimited iterations.
      Parameters:
      k - the number of clusters to be computed
  • Method Details

    • withInput

      public KMeansCluster withInput(com.linkedin.dagli.producer.Producer<? extends com.linkedin.dagli.math.vector.DenseVector> vectorInput)
      Creates a copy of this instance that will obtain DenseVector inputs from the specified Producer. The highest non-zero element index must be less than Integer.MAX_VALUE.
      Parameters:
      vectorInput - a producer that will provide DenseVector inputs to this transformer
      Returns:
      a copy of this instance that will get its inputs from the specified Producer
    • withInput

      public com.linkedin.dagli.input.DenseFeatureVectorInput<KMeansCluster> withInput()
      Returns:
      a configurator for specifying the input(s) comprising the vector to be clustered
    • withInputArray

      public KMeansCluster withInputArray(com.linkedin.dagli.producer.Producer<? extends double[]> arrayInput)
      Creates a copy of this instance that will obtain its inputs from the specified Producer of double arrays.
      Parameters:
      arrayInput - a producer that will provide double[] inputs to this transformer
      Returns:
      a copy of this instance that will get its inputs from the specified Producer
    • withK

      public KMeansCluster withK(int k)
      Returns a copy of this KMeansCluster that will compute the specified number of clusters. The default number of clusters is 10.
      Parameters:
      k - the number of clusters
      Returns:
      a copy of this KMeansCluster with the specified k
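
As the Returns entries on this page indicate, the "with" methods return modified copies rather than changing the receiver. A minimal sketch of that copy-returning pattern follows. The class ClusterSettingsSketch is hypothetical and is not part of Dagli; it only mirrors the three settings and the defaults documented on this page (k = 10, seed 0, unlimited iterations).

```java
/** Hypothetical settings holder illustrating copy-returning "with" methods; not a Dagli class. */
final class ClusterSettingsSketch implements Cloneable {
  private int k = 10;              // documented default number of clusters
  private long seed = 0;           // documented default seed
  private int maxIterations = -1;  // -1 means unlimited, the documented default

  int getK() { return k; }
  long getSeed() { return seed; }
  int getMaxIterations() { return maxIterations; }

  /** Returns a copy with the given k; the receiver is left unchanged. */
  ClusterSettingsSketch withK(int k) {
    ClusterSettingsSketch copy = copy();
    copy.k = k;
    return copy;
  }

  /** Returns a copy with the given seed; the receiver is left unchanged. */
  ClusterSettingsSketch withSeed(long seed) {
    ClusterSettingsSketch copy = copy();
    copy.seed = seed;
    return copy;
  }

  /** Returns a copy with the given iteration limit; the receiver is left unchanged. */
  ClusterSettingsSketch withMaxIterations(int maxIterations) {
    ClusterSettingsSketch copy = copy();
    copy.maxIterations = maxIterations;
    return copy;
  }

  private ClusterSettingsSketch copy() {
    try {
      return (ClusterSettingsSketch) super.clone();
    } catch (CloneNotSupportedException e) {
      throw new AssertionError(e); // cannot happen: this class implements Cloneable
    }
  }

  public static void main(String[] args) {
    ClusterSettingsSketch defaults = new ClusterSettingsSketch();
    ClusterSettingsSketch tuned = defaults.withK(3).withSeed(42L).withMaxIterations(100);
    System.out.println(defaults.getK() + " " + tuned.getK()); // prints "10 3"
  }
}
```

Because each call returns a new object, a shared base configuration can be reused safely and the calls can be chained in any order.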
    • withSeed

      public KMeansCluster withSeed(long seed)
      Returns a copy of this KMeansCluster that uses the specified random seed (by default, 0) for initialization. Using a fixed seed keeps results consistent from run to run.
      Parameters:
      seed - the random seed to use
      Returns:
      a copy of this KMeansCluster with the specified seed
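
The reproducibility that a fixed seed provides can be illustrated with java.util.Random. This is only a sketch: which random generator Dagli uses internally is not stated on this page, and the class SeedReproducibilitySketch and its draw method are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical illustration of seed-driven reproducibility; not a Dagli class. */
final class SeedReproducibilitySketch {
  /** Draws n indices in [0, bound) from a generator created with the given seed. */
  static int[] draw(long seed, int n, int bound) {
    Random random = new Random(seed);
    int[] result = new int[n];
    for (int i = 0; i < n; i++) {
      result[i] = random.nextInt(bound);
    }
    return result;
  }

  public static void main(String[] args) {
    // Two generators built from the same seed yield the same sequence of draws
    System.out.println(Arrays.equals(draw(0L, 5, 100), draw(0L, 5, 100))); // prints "true"
  }
}
```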
    • withMaxIterations

      public KMeansCluster withMaxIterations(int maxIterations)
      Returns a copy of this KMeansCluster transformer that uses the specified maximum number of iterations to optimize the clusters. -1 indicates "unlimited". The default is unlimited.
      Parameters:
      maxIterations - the maximum number of iterations that will be performed, or -1 for no limit
      Returns:
      a copy of this KMeansCluster, modified to use the specified maximum number of iterations
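
How an iteration cap with -1 as "unlimited" might interact with the optimization loop can be sketched with the standard Lloyd iterations (assign each point to its nearest centroid, then move each centroid to the mean of its points). This is an assumption-laden illustration, not Dagli's code: the class LloydIterationSketch is hypothetical, and stopping once no assignment changes is the usual convergence test rather than something this page documents.

```java
import java.util.Arrays;

/** Hypothetical sketch of Lloyd's iterations with an optional cap; not a Dagli class. */
final class LloydIterationSketch {
  /** Returns the index of the centroid nearest to the point (squared Euclidean distance). */
  static int nearest(double[] point, double[][] centroids) {
    int best = 0;
    double bestDist = Double.POSITIVE_INFINITY;
    for (int c = 0; c < centroids.length; c++) {
      double dist = 0;
      for (int d = 0; d < point.length; d++) {
        double diff = point[d] - centroids[c][d];
        dist += diff * diff;
      }
      if (dist < bestDist) {
        bestDist = dist;
        best = c;
      }
    }
    return best;
  }

  /**
   * Refines the centroids in place and returns the final assignments.
   * A negative maxIterations means no limit: the loop runs until assignments stop changing.
   */
  static int[] cluster(double[][] points, double[][] centroids, int maxIterations) {
    int[] assignment = new int[points.length];
    Arrays.fill(assignment, -1); // -1 marks "not yet assigned"
    for (int iter = 0; maxIterations < 0 || iter < maxIterations; iter++) {
      // Assignment step: attach every point to its nearest centroid
      boolean changed = false;
      for (int i = 0; i < points.length; i++) {
        int n = nearest(points[i], centroids);
        if (n != assignment[i]) {
          assignment[i] = n;
          changed = true;
        }
      }
      if (!changed) {
        break; // converged before reaching the cap
      }
      // Update step: move each non-empty centroid to the mean of its points
      double[][] sums = new double[centroids.length][centroids[0].length];
      int[] counts = new int[centroids.length];
      for (int i = 0; i < points.length; i++) {
        counts[assignment[i]]++;
        for (int d = 0; d < points[i].length; d++) {
          sums[assignment[i]][d] += points[i][d];
        }
      }
      for (int c = 0; c < centroids.length; c++) {
        for (int d = 0; counts[c] > 0 && d < centroids[c].length; d++) {
          centroids[c][d] = sums[c][d] / counts[c];
        }
      }
    }
    return assignment;
  }

  public static void main(String[] args) {
    double[][] points = {{0, 0}, {0, 2}, {10, 10}, {10, 12}};
    double[][] centroids = {{0, 0}, {10, 10}};
    int[] assignment = cluster(points, centroids, -1); // -1: iterate until convergence
    System.out.println(Arrays.toString(assignment)); // prints "[0, 0, 1, 1]"
  }
}
```

A cap trades some cluster quality for a bounded preparation time; with no cap the loop in this sketch stops only when an assignment pass changes nothing.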
    • getPreparer

      protected com.linkedin.dagli.clustering.KMeansCluster.Preparer getPreparer(com.linkedin.dagli.preparer.PreparerContext context)
      Specified by:
      getPreparer in class com.linkedin.dagli.transformer.AbstractPreparableTransformer1<double[],com.linkedin.dagli.vector.ScoredVector,NearestDoubleArray,KMeansCluster>
    • getInputList

      protected java.util.List<com.linkedin.dagli.producer.Producer<?>> getInputList()
    • getInput1

      protected com.linkedin.dagli.producer.Producer<? extends double[]> getInput1()
    • withAllInputs

      protected KMeansCluster withAllInputs(com.linkedin.dagli.producer.Producer<? extends double[]> arg0)
    • withInput1

      protected KMeansCluster withInput1(com.linkedin.dagli.producer.Producer<? extends double[]> arg0)
    • getName

      public java.lang.String getName()
      Specified by:
      getName in interface com.linkedin.dagli.util.named.Named
      Specified by:
      getName in interface com.linkedin.dagli.producer.Producer<R extends java.lang.Object>
    • getShortName

      public java.lang.String getShortName()
      Specified by:
      getShortName in interface com.linkedin.dagli.util.named.Named
      Specified by:
      getShortName in interface com.linkedin.dagli.producer.Producer<R extends java.lang.Object>
    • hasName

      protected boolean hasName()
    • getDefaultName

      protected java.lang.String getDefaultName()
    • getDefaultShortName

      protected java.lang.String getDefaultShortName()
    • withName

      public KMeansCluster withName(java.lang.String arg0)
    • getGraphReducers

      protected java.util.Collection<? extends com.linkedin.dagli.reducer.Reducer<? super KMeansCluster>> getGraphReducers()
    • getClassReducerTable

      protected com.linkedin.dagli.reducer.ClassReducerTable getClassReducerTable()
    • hasAlwaysConstantResult

      protected boolean hasAlwaysConstantResult()
    • hasConstantResult

      public final boolean hasConstantResult()
      Specified by:
      hasConstantResult in interface com.linkedin.dagli.producer.Producer<R extends java.lang.Object>
    • subgraph

      protected com.linkedin.dagli.dag.Graph<java.lang.Object> subgraph()
    • hashCode

      public final int hashCode()
      Overrides:
      hashCode in class java.lang.Object
    • equals

      public final boolean equals(java.lang.Object arg0)
      Overrides:
      equals in class java.lang.Object
    • computeEqualsUnsafe

      protected boolean computeEqualsUnsafe(KMeansCluster arg0)
    • computeHashCode

      protected int computeHashCode()
    • handleEquality

      protected boolean handleEquality(KMeansCluster arg0)
    • handleHashCode

      protected int handleHashCode()
    • internalAPI

      public com.linkedin.dagli.transformer.internal.PreparableTransformer1InternalAPI<double[],com.linkedin.dagli.vector.ScoredVector,NearestDoubleArray,KMeansCluster> internalAPI()
      Specified by:
      internalAPI in interface com.linkedin.dagli.producer.Producer<R extends java.lang.Object>
    • clone

      protected KMeansCluster clone()
      Overrides:
      clone in class com.linkedin.dagli.util.cloneable.AbstractCloneable<S extends com.linkedin.dagli.producer.AbstractProducer<R,I,S>>
    • getHandle

      protected final com.linkedin.dagli.handle.ProducerHandle<KMeansCluster> getHandle()
    • toString

      public java.lang.String toString()
      Overrides:
      toString in class java.lang.Object
    • getResultSupertype

      protected java.lang.reflect.Type getResultSupertype()