public final class SparkUtils extends Object
| Modifier and Type | Method and Description |
|---|---|
| `static co.cask.cdap.api.data.format.StructuredRecord.Builder` | `cloneRecord(co.cask.cdap.api.data.format.StructuredRecord record, co.cask.cdap.api.data.schema.Schema outputSchema, String predictionField)` Creates a builder based off the given record. |
| `static Map<Integer,Integer>` | `getCategoricalFeatureInfo(co.cask.cdap.api.data.schema.Schema inputSchema, String featuresToInclude, String featuresToExclude, String labelField, String cardinalityMapping)` Get the feature to cardinality mapping provided by the user. |
| `static Map<String,Integer>` | `getFeatureList(co.cask.cdap.api.data.schema.Schema inputSchema, String featuresToInclude, String featuresToExclude, String predictionField)` Get the list of features to be used for training/prediction, depending on the featuresToInclude or featuresToExclude list. |
| `static List<String>` | `getInputFieldValue(co.cask.cdap.api.data.format.StructuredRecord input, String inputField, com.google.common.base.Splitter splitter)` Gets the input field for feature generation. |
| `static co.cask.cdap.api.data.schema.Schema` | `getOutputSchema(co.cask.cdap.api.data.schema.Schema inputSchema, String predictionField)` |
| `static void` | `validateConfigParameters(co.cask.cdap.api.data.schema.Schema inputSchema, String featuresToInclude, String featuresToExclude, String predictionField, String cardinalityMapping)` Validate the config parameters for the spark sink and spark compute classes. |
| `static void` | `validateFeatureGeneratorConfig(co.cask.cdap.api.data.schema.Schema inputSchema, Map<String,String> map, String pattern)` Validate config parameters for feature generator classes. |
| `static void` | `validateLabelFieldForTrainer(co.cask.cdap.api.data.schema.Schema inputSchema, String labelField)` Validate label field for trainer. |
| `static void` | `validateTextField(co.cask.cdap.api.data.schema.Schema inputSchema, String key)` Validate the input field to be used for text based feature generation. |
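
The summary for getFeatureList says the feature list depends on the featuresToInclude or featuresToExclude list, but it does not spell out the selection rules. The following is a minimal sketch of one plausible reading, not the SparkUtils implementation. It assumes both parameters are comma-separated field names, that a non-empty include list takes precedence, that the prediction field is never a feature, and that the Integer values are positions in the input schema. The class `FeatureSelectionSketch` and its methods are hypothetical, and a plain `List<String>` of field names stands in for the CDAP `Schema`.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; not part of SparkUtils. */
final class FeatureSelectionSketch {

  /** Splits a comma-separated list of names; null or blank input gives an empty set. */
  static Set<String> parseNames(String csv) {
    Set<String> names = new LinkedHashSet<>();
    if (csv == null) {
      return names;
    }
    for (String part : csv.split(",")) {
      String name = part.trim();
      if (!name.isEmpty()) {
        names.add(name);
      }
    }
    return names;
  }

  /**
   * Maps each selected field name to its position in fieldNames.
   * Assumed rules: a non-empty include list wins; otherwise every field is kept
   * except the excluded ones. The prediction field is always skipped.
   */
  static Map<String, Integer> selectFeatures(List<String> fieldNames,
                                             String featuresToInclude,
                                             String featuresToExclude,
                                             String predictionField) {
    Set<String> include = parseNames(featuresToInclude);
    Set<String> exclude = parseNames(featuresToExclude);
    Map<String, Integer> selected = new LinkedHashMap<>();
    for (int i = 0; i < fieldNames.size(); i++) {
      String name = fieldNames.get(i);
      if (name.equals(predictionField)) {
        continue;
      }
      boolean keep = include.isEmpty() ? !exclude.contains(name) : include.contains(name);
      if (keep) {
        selected.put(name, i);
      }
    }
    return selected;
  }
}
```

Under these assumptions, selecting from the fields `age, height, city, label` with `featuresToInclude = "age, city"` and `predictionField = "label"` keeps `age` at position 0 and `city` at position 2. Passing `featuresToExclude = "height"` with no include list gives the same result.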
public static void validateConfigParameters(co.cask.cdap.api.data.schema.Schema inputSchema,
@Nullable String featuresToInclude,
@Nullable String featuresToExclude,
String predictionField,
@Nullable String cardinalityMapping)
inputSchema - schema of the received record.
featuresToInclude - features to be used for training/prediction.
featuresToExclude - features to be excluded when training/predicting.
predictionField - field containing the prediction values.

public static Map<String,Integer> getFeatureList(co.cask.cdap.api.data.schema.Schema inputSchema,
@Nullable String featuresToInclude,
@Nullable String featuresToExclude,
String predictionField)
inputSchema - schema of the received record.
featuresToInclude - features to be used for training/prediction.
featuresToExclude - features to be excluded when training/predicting.
predictionField - field containing the prediction values.

public static Map<Integer,Integer> getCategoricalFeatureInfo(co.cask.cdap.api.data.schema.Schema inputSchema,
@Nullable String featuresToInclude,
@Nullable String featuresToExclude,
String labelField,
@Nullable String cardinalityMapping)
inputSchema - schema of the received record.
featuresToInclude - features to be used for training/prediction.
featuresToExclude - features to be excluded when training/predicting.
labelField - field containing the prediction values.
cardinalityMapping - feature to cardinality mapping specified for categorical features.

public static void validateLabelFieldForTrainer(co.cask.cdap.api.data.schema.Schema inputSchema,
String labelField)
inputSchema - schema of the received record.
labelField - field from which to get the prediction.

public static co.cask.cdap.api.data.format.StructuredRecord.Builder cloneRecord(co.cask.cdap.api.data.format.StructuredRecord record,
co.cask.cdap.api.data.schema.Schema outputSchema,
String predictionField)
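
The cardinalityMapping parameter of getCategoricalFeatureInfo is documented only as a "feature to cardinality mapping", and the method returns a `Map<Integer,Integer>`. As a rough sketch of how such a string might be turned into that shape, the example below assumes a `name:cardinality` format with comma-separated pairs and assumes the returned keys are feature positions. Neither assumption is confirmed by this page. The class `CardinalityMappingSketch` and its method are hypothetical and are not part of SparkUtils.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only; not part of SparkUtils. */
final class CardinalityMappingSketch {

  /**
   * Converts an assumed "name:cardinality,name:cardinality" string into a map
   * from feature position to cardinality, using featureIndex to look up positions.
   * A null or blank mapping yields an empty map.
   */
  static Map<Integer, Integer> toIndexedCardinalities(Map<String, Integer> featureIndex,
                                                      String cardinalityMapping) {
    Map<Integer, Integer> result = new TreeMap<>();
    if (cardinalityMapping == null || cardinalityMapping.trim().isEmpty()) {
      return result;
    }
    for (String pair : cardinalityMapping.split(",")) {
      String[] keyValue = pair.split(":");
      if (keyValue.length != 2) {
        throw new IllegalArgumentException("Expected name:cardinality but got '" + pair + "'");
      }
      String name = keyValue[0].trim();
      Integer index = featureIndex.get(name);
      if (index == null) {
        throw new IllegalArgumentException("Unknown feature '" + name + "'");
      }
      // Integer.parseInt rejects non-numeric cardinalities with a NumberFormatException.
      result.put(index, Integer.parseInt(keyValue[1].trim()));
    }
    return result;
  }
}
```

With a feature index of `age` at 0 and `city` at 2, the string `"city:5, age:3"` would produce a map from 0 to 3 and from 2 to 5. Rejecting unknown feature names early is a design choice of this sketch, so that a typo in the mapping fails at validation time rather than during training.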
public static co.cask.cdap.api.data.schema.Schema getOutputSchema(co.cask.cdap.api.data.schema.Schema inputSchema,
String predictionField)
public static void validateFeatureGeneratorConfig(co.cask.cdap.api.data.schema.Schema inputSchema,
Map<String,String> map,
String pattern)
inputSchema - input Schema
map - Map of the input fields to map to the transformed output fields.

public static void validateTextField(co.cask.cdap.api.data.schema.Schema inputSchema,
String key)
inputSchema - input schema coming in from the previous stage
key - text field on which to perform text based feature generation

public static List<String> getInputFieldValue(co.cask.cdap.api.data.format.StructuredRecord input,
String inputField,
com.google.common.base.Splitter splitter)
input - input Structured Record
inputField - field to use for feature generation
splitter - Splitter object to be used for splitting the input string

Copyright © 2017 Cask Data, Inc. Licensed under the Apache License, Version 2.0.