Package com.linkedin.dagli.clustering
Class KMeansCluster
java.lang.Object
com.linkedin.dagli.util.cloneable.AbstractCloneable<S>
com.linkedin.dagli.producer.AbstractChildProducer<R,I,S>
com.linkedin.dagli.transformer.AbstractPreparableTransformer1<double[],com.linkedin.dagli.vector.ScoredVector,NearestDoubleArray,KMeansCluster>
com.linkedin.dagli.clustering.KMeansCluster
- All Implemented Interfaces:
com.linkedin.dagli.producer.ChildProducer<com.linkedin.dagli.vector.ScoredVector>, com.linkedin.dagli.producer.Producer<com.linkedin.dagli.vector.ScoredVector>, com.linkedin.dagli.producer.ProducerType<com.linkedin.dagli.vector.ScoredVector,com.linkedin.dagli.transformer.PreparableTransformer<com.linkedin.dagli.vector.ScoredVector,NearestDoubleArray>>, com.linkedin.dagli.transformer.PreparableTransformer<com.linkedin.dagli.vector.ScoredVector,NearestDoubleArray>, com.linkedin.dagli.transformer.PreparableTransformer1<double[],com.linkedin.dagli.vector.ScoredVector,NearestDoubleArray>, com.linkedin.dagli.transformer.Transformer<com.linkedin.dagli.vector.ScoredVector>, com.linkedin.dagli.transformer.Transformer1<double[],com.linkedin.dagli.vector.ScoredVector>, com.linkedin.dagli.transformer.TransformerWithInputBound<double[],com.linkedin.dagli.vector.ScoredVector>, com.linkedin.dagli.util.named.Named, java.io.Serializable, java.lang.Cloneable
@ValueEquality
public class KMeansCluster
extends com.linkedin.dagli.transformer.AbstractPreparableTransformer1<double[],com.linkedin.dagli.vector.ScoredVector,NearestDoubleArray,KMeansCluster>
Clusters vectors into k groups via the KMeans++ algorithm and produces, for each vector, a cluster assignment in the
range [0, k-1].
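The class description names KMeans++. As a rough illustration of the seeding step usually associated with that name, the sketch below picks each new centroid with probability proportional to its squared distance from the nearest centroid already chosen. It is not Dagli's implementation: the class `KMeansPlusPlusSeedingSketch`, its method names, and the use of `java.util.Random` and squared Euclidean distance are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative k-means++ seeding in plain Java; hypothetical, not Dagli code. */
final class KMeansPlusPlusSeedingSketch {
  /** Squared Euclidean distance between two equal-length arrays. */
  static double squaredDistance(double[] a, double[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  /** Picks k initial centroids from points; requires 1 <= k <= points.length. */
  static double[][] seedCentroids(double[][] points, int k, long seed) {
    Random random = new Random(seed);
    double[][] centroids = new double[k][];
    // First centroid: chosen uniformly at random.
    centroids[0] = points[random.nextInt(points.length)].clone();

    // minDist[i] = squared distance from point i to its nearest chosen centroid.
    double[] minDist = new double[points.length];
    Arrays.fill(minDist, Double.POSITIVE_INFINITY);

    for (int c = 1; c < k; c++) {
      double total = 0;
      for (int i = 0; i < points.length; i++) {
        minDist[i] = Math.min(minDist[i], squaredDistance(points[i], centroids[c - 1]));
        total += minDist[i];
      }
      // Sample the next centroid with probability proportional to minDist[i].
      double target = random.nextDouble() * total;
      int chosen = points.length - 1; // fallback when every distance is zero
      double cumulative = 0;
      for (int i = 0; i < points.length; i++) {
        cumulative += minDist[i];
        if (cumulative > target) {
          chosen = i;
          break;
        }
      }
      centroids[c] = points[chosen].clone();
    }
    return centroids;
  }

  public static void main(String[] args) {
    double[][] points = {{0, 0}, {0, 1}, {10, 10}, {10, 11}};
    System.out.println(Arrays.deepToString(seedCentroids(points, 2, 0L)));
  }
}
```

Because a point that is already a centroid has distance zero, it cannot be sampled again while any other point remains at a positive distance.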
- See Also:
- Serialized Form
-
Nested Class Summary
-
Field Summary
Fields
Modifier and Type Field Description
protected com.linkedin.dagli.producer.Producer<? extends double[]> _input1
-
Constructor Summary
Constructors
Constructor Description
KMeansCluster()
    Creates a new KMeans clusterer with k = 10 and unlimited iterations.
KMeansCluster(int k)
    Creates a new KMeans clusterer with the specified value of k and unlimited iterations.
-
Method Summary
Modifier and Type Method Description
protected KMeansCluster clone()
protected boolean computeEqualsUnsafe(KMeansCluster arg0)
protected int computeHashCode()
boolean equals(java.lang.Object arg0)
protected com.linkedin.dagli.reducer.ClassReducerTable getClassReducerTable()
protected java.lang.String getDefaultName()
protected java.lang.String getDefaultShortName()
protected java.util.Collection<? extends com.linkedin.dagli.reducer.Reducer<? super KMeansCluster>> getGraphReducers()
protected com.linkedin.dagli.handle.ProducerHandle<KMeansCluster> getHandle()
protected com.linkedin.dagli.producer.Producer<? extends double[]> getInput1()
protected java.util.List<com.linkedin.dagli.producer.Producer<?>> getInputList()
java.lang.String getName()
protected com.linkedin.dagli.clustering.KMeansCluster.Preparer getPreparer(com.linkedin.dagli.preparer.PreparerContext context)
protected java.lang.reflect.Type getResultSupertype()
java.lang.String getShortName()
protected boolean handleEquality(KMeansCluster arg0)
protected int handleHashCode()
protected boolean hasAlwaysConstantResult()
boolean hasConstantResult()
int hashCode()
protected boolean hasName()
com.linkedin.dagli.transformer.internal.PreparableTransformer1InternalAPI<double[],com.linkedin.dagli.vector.ScoredVector,NearestDoubleArray,KMeansCluster> internalAPI()
protected com.linkedin.dagli.dag.Graph<java.lang.Object> subgraph()
java.lang.String toString()
protected KMeansCluster withAllInputs(com.linkedin.dagli.producer.Producer<? extends double[]> arg0)
com.linkedin.dagli.input.DenseFeatureVectorInput<KMeansCluster> withInput()
KMeansCluster withInput(com.linkedin.dagli.producer.Producer<? extends com.linkedin.dagli.math.vector.DenseVector> vectorInput)
    Creates a copy of this instance that will obtain DenseVector inputs from the specified Producer.
protected KMeansCluster withInput1(com.linkedin.dagli.producer.Producer<? extends double[]> arg0)
KMeansCluster withInputArray(com.linkedin.dagli.producer.Producer<? extends double[]> arrayInput)
    Creates a copy of this instance that will obtain its inputs from the specified Producer of double arrays.
KMeansCluster withK(int k)
    Sets the number of clusters that will be computed.
KMeansCluster withMaxIterations(int maxIterations)
    Returns a copy of this KMeansCluster transformer with the specified maximum number of iterations that will be used to optimize the clusters.
KMeansCluster withName(java.lang.String arg0)
KMeansCluster withSeed(long seed)
    Sets the random seed (by default, 0) used for initialization.
Methods inherited from class com.linkedin.dagli.transformer.AbstractPreparableTransformer1
createInternalAPI, hasIdempotentPreparer
-
Field Details
-
_input1
protected com.linkedin.dagli.producer.Producer<? extends double[]> _input1
-
-
Constructor Details
-
KMeansCluster
public KMeansCluster()
Creates a new KMeans clusterer with k = 10 and unlimited iterations.
-
KMeansCluster
public KMeansCluster(int k)
Creates a new KMeans clusterer with the specified value of k and unlimited iterations.
- Parameters:
k - the number of clusters to be computed
-
-
Method Details
-
withInput
public KMeansCluster withInput(com.linkedin.dagli.producer.Producer<? extends com.linkedin.dagli.math.vector.DenseVector> vectorInput)
Creates a copy of this instance that will obtain DenseVector inputs from the specified Producer. The highest non-zero element index must be less than Integer.MAX_VALUE.
- Parameters:
vectorInput - a producer that will provide DenseVector inputs to this transformer
- Returns:
- a copy of this instance that will get its inputs from the specified Producer
-
withInput
public com.linkedin.dagli.input.DenseFeatureVectorInput<KMeansCluster> withInput()
- Returns:
- a configurator for specifying the input(s) comprising the vector to be clustered
-
withInputArray
public KMeansCluster withInputArray(com.linkedin.dagli.producer.Producer<? extends double[]> arrayInput)
Creates a copy of this instance that will obtain its inputs from the specified Producer of double arrays.
- Parameters:
arrayInput - a producer that will provide double[] inputs to this transformer
- Returns:
- a copy of this instance that will get its inputs from the specified Producer
-
withK
public KMeansCluster withK(int k)
Sets the number of clusters that will be computed. The default number of clusters is 10.
- Parameters:
k - the number of clusters
- Returns:
- a copy of this KMeansCluster with the specified k
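With k clusters, each vector's assignment is an index in [0, k-1], as stated in the class description. A minimal sketch of how such an index could be computed once centroids are known is shown below. The class `NearestCentroidSketch` and its `assign` method are hypothetical, and squared Euclidean distance is an assumption; the actual transformer reports its result as a ScoredVector rather than a bare index.

```java
/** Illustrative nearest-centroid assignment; hypothetical, not Dagli code. */
final class NearestCentroidSketch {
  /** Returns the index in [0, centroids.length - 1] of the centroid closest to vector. */
  static int assign(double[] vector, double[][] centroids) {
    int best = 0;
    double bestDistance = Double.POSITIVE_INFINITY;
    for (int c = 0; c < centroids.length; c++) {
      // Squared Euclidean distance (assumed metric) to centroid c.
      double distance = 0;
      for (int i = 0; i < vector.length; i++) {
        double d = vector[i] - centroids[c][i];
        distance += d * d;
      }
      if (distance < bestDistance) {
        bestDistance = distance;
        best = c;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    double[][] centroids = {{0, 0}, {10, 10}, {-5, 5}};
    System.out.println(assign(new double[] {9, 9}, centroids));
  }
}
```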
-
withSeed
public KMeansCluster withSeed(long seed)
Sets the random seed (by default, 0) used for initialization. Using a fixed seed keeps results consistent from run to run.
- Parameters:
seed - the random seed to use
- Returns:
- a copy of this KMeansCluster with the specified seed
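The run-to-run consistency that a fixed seed provides can be illustrated with `java.util.Random`: two generators created with the same seed yield the same sequence, so any initialization driven by them makes the same choices. This is only a sketch of the general principle. Which random number generator the transformer actually uses is not stated here, and `SeedReproducibilitySketch` and `drawIndices` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrates seed-driven reproducibility; hypothetical, not Dagli code. */
final class SeedReproducibilitySketch {
  /** Draws count indices in [0, bound) from a generator created with the given seed. */
  static int[] drawIndices(long seed, int count, int bound) {
    Random random = new Random(seed);
    int[] indices = new int[count];
    for (int i = 0; i < count; i++) {
      indices[i] = random.nextInt(bound);
    }
    return indices;
  }

  public static void main(String[] args) {
    // Both calls use seed 0, so the two printed arrays are identical.
    System.out.println(Arrays.toString(drawIndices(0L, 5, 100)));
    System.out.println(Arrays.toString(drawIndices(0L, 5, 100)));
  }
}
```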
-
withMaxIterations
public KMeansCluster withMaxIterations(int maxIterations)
Returns a copy of this KMeansCluster transformer with the specified maximum number of iterations that will be used to optimize the clusters. -1 indicates "unlimited". The default is unlimited.
- Parameters:
maxIterations - the maximum number of iterations that will be performed, or -1 for no limit
- Returns:
- a copy of this KMeansCluster, modified to use the specified maximum number of iterations.
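To make the "-1 means unlimited" convention concrete, the sketch below runs a simple Lloyd-style refinement on one-dimensional data and stops either at convergence or when the iteration cap is reached. It is an assumption-laden illustration, not the transformer's optimization loop: `MaxIterationsSketch` and `refine` are hypothetical, and treating any negative value as "no limit" is a choice made for this example.

```java
import java.util.Arrays;

/** Illustrates an iteration cap where a negative value means no limit; hypothetical, not Dagli code. */
final class MaxIterationsSketch {
  /** Refines 1-D centroids; stops at convergence or after maxIterations passes (negative = no limit). */
  static double[] refine(double[] points, double[] initialCentroids, int maxIterations) {
    double[] centroids = initialCentroids.clone();
    for (int iter = 0; maxIterations < 0 || iter < maxIterations; iter++) {
      double[] sums = new double[centroids.length];
      int[] counts = new int[centroids.length];
      // Assignment step: each point goes to its nearest centroid.
      for (double p : points) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
          if (Math.abs(p - centroids[c]) < Math.abs(p - centroids[best])) {
            best = c;
          }
        }
        sums[best] += p;
        counts[best]++;
      }
      // Update step: move each non-empty centroid to the mean of its points.
      double[] updated = centroids.clone();
      for (int c = 0; c < centroids.length; c++) {
        if (counts[c] > 0) {
          updated[c] = sums[c] / counts[c];
        }
      }
      if (Arrays.equals(updated, centroids)) {
        break; // converged: another pass would change nothing
      }
      centroids = updated;
    }
    return centroids;
  }

  public static void main(String[] args) {
    double[] points = {1, 2, 3, 10, 11, 12};
    System.out.println(Arrays.toString(refine(points, new double[] {0, 5}, -1)));
  }
}
```

With a cap, the loop may return before the centroids have settled; without one, it runs until an update leaves them unchanged.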
-
getPreparer
protected com.linkedin.dagli.clustering.KMeansCluster.Preparer getPreparer(com.linkedin.dagli.preparer.PreparerContext context)
- Specified by:
getPreparer in class com.linkedin.dagli.transformer.AbstractPreparableTransformer1<double[],com.linkedin.dagli.vector.ScoredVector,NearestDoubleArray,KMeansCluster>
-
getInputList
protected java.util.List<com.linkedin.dagli.producer.Producer<?>> getInputList()
-
getInput1
protected com.linkedin.dagli.producer.Producer<? extends double[]> getInput1()
-
withAllInputs
protected KMeansCluster withAllInputs(com.linkedin.dagli.producer.Producer<? extends double[]> arg0)
-
withInput1
-
getName
public java.lang.String getName()
- Specified by:
getName in interface com.linkedin.dagli.util.named.Named
- Specified by:
getName in interface com.linkedin.dagli.producer.Producer<R extends java.lang.Object>
-
getShortName
public java.lang.String getShortName()
- Specified by:
getShortName in interface com.linkedin.dagli.util.named.Named
- Specified by:
getShortName in interface com.linkedin.dagli.producer.Producer<R extends java.lang.Object>
-
hasName
protected boolean hasName()
-
getDefaultName
protected java.lang.String getDefaultName()
-
getDefaultShortName
protected java.lang.String getDefaultShortName()
-
withName
-
getGraphReducers
protected java.util.Collection<? extends com.linkedin.dagli.reducer.Reducer<? super KMeansCluster>> getGraphReducers()
-
getClassReducerTable
protected com.linkedin.dagli.reducer.ClassReducerTable getClassReducerTable()
-
hasAlwaysConstantResult
protected boolean hasAlwaysConstantResult()
-
hasConstantResult
public final boolean hasConstantResult()
- Specified by:
hasConstantResult in interface com.linkedin.dagli.producer.Producer<R extends java.lang.Object>
-
subgraph
protected com.linkedin.dagli.dag.Graph<java.lang.Object> subgraph()
-
hashCode
public final int hashCode()
- Overrides:
hashCode in class java.lang.Object
-
equals
public final boolean equals(java.lang.Object arg0)
- Overrides:
equals in class java.lang.Object
-
computeEqualsUnsafe
-
computeHashCode
protected int computeHashCode()
-
handleEquality
-
handleHashCode
protected int handleHashCode()
-
internalAPI
public com.linkedin.dagli.transformer.internal.PreparableTransformer1InternalAPI<double[],com.linkedin.dagli.vector.ScoredVector,NearestDoubleArray,KMeansCluster> internalAPI()
- Specified by:
internalAPI in interface com.linkedin.dagli.producer.Producer<R extends java.lang.Object>
-
clone
- Overrides:
clone in class com.linkedin.dagli.util.cloneable.AbstractCloneable<S extends com.linkedin.dagli.producer.AbstractProducer<R,I,S>>
-
getHandle
-
toString
public java.lang.String toString()
- Overrides:
toString in class java.lang.Object
-
getResultSupertype
protected java.lang.reflect.Type getResultSupertype()
-