Preprocessor

  1. This endpoint preprocesses data and outputs X_train, X_test, y_train and y_test for use in subsequent endpoints.

  2. It mainly includes two classes:

  3. Feature-selector: A class that performs feature selection for machine learning or data preprocessing. It includes four methods for identifying features to remove:
     a) find columns with a missing percentage greater than a specified threshold (default = 60%);
     b) find columns with a single unique value;
     c) find collinear variables with a correlation greater than a specified correlation coefficient (default = 0.8);
     d) find columns whose number of unique values reaches 95% of the sample size.

  4. Pre_processor: A class that performs the data split, scaling, LDA and skewness correction for numerical data, and uses SelectKBest (SKBest) with the chi-square test to select categorical features.
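
The four feature-selection checks described above can be sketched in plain Python as follows. This is a minimal illustration, not the endpoint's actual implementation: the helper names `pearson` and `find_features_to_drop`, the list-of-dictionaries input, and the sample rows are all hypothetical. It also assumes that collinearity is judged on the absolute value of the correlation and that, of two collinear columns, the later one is flagged.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: correlation is undefined, treat as 0
    return cov / sqrt(var_x * var_y)

def find_features_to_drop(rows, numeric_cols, missing_threshold=0.6,
                          corr_threshold=0.8, unique_ratio=0.95):
    """Map each check to the columns it flags (hypothetical helper)."""
    n = len(rows)
    columns = sorted({key for row in rows for key in row})
    flagged = {"missing": [], "single_unique": [],
               "collinear": [], "high_cardinality": []}
    for col in columns:
        present = [row[col] for row in rows if row.get(col) is not None]
        # a) missing fraction above the threshold
        if 1 - len(present) / n > missing_threshold:
            flagged["missing"].append(col)
        distinct = len(set(present))
        # b) a single unique value
        if distinct == 1:
            flagged["single_unique"].append(col)
        # d) number of unique values reaches unique_ratio of the sample size
        if distinct >= unique_ratio * n:
            flagged["high_cardinality"].append(col)
    # c) for each highly correlated pair, flag the later column (assumption)
    for i, first in enumerate(numeric_cols):
        for second in numeric_cols[i + 1:]:
            pairs = [(r[first], r[second]) for r in rows
                     if r.get(first) is not None and r.get(second) is not None]
            if len(pairs) > 1:
                xs, ys = zip(*pairs)
                if (abs(pearson(xs, ys)) > corr_threshold
                        and second not in flagged["collinear"]):
                    flagged["collinear"].append(second)
    return flagged

# Hypothetical sample rows, already one-hot encoded for gender.
rows = [
    {"customerId": "c1", "gender_male": 1, "gender_female": 0, "nationality": "SG", "race": None},
    {"customerId": "c2", "gender_male": 0, "gender_female": 1, "nationality": "SG", "race": None},
    {"customerId": "c3", "gender_male": 1, "gender_female": 0, "nationality": "SG", "race": "A"},
    {"customerId": "c4", "gender_male": 0, "gender_female": 1, "nationality": "SG", "race": None},
]
flagged = find_features_to_drop(rows, ["gender_male", "gender_female"])
print(flagged)
```

With these rows, `gender_female` is flagged as collinear with `gender_male` (correlation of -1), which mirrors the `corr_features` / `drop_features` pairing shown in the response body.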

Request Body

| Name | Datatype | Description | Mandatory | Sample Values | Notes |
| --- | --- | --- | --- | --- | --- |
| clientId | String | Used for identification, as information is stored under a unique ID value | Yes | | |
| customers | Array of Dictionary | Array containing each customer's personal information. The fields differ per project, based on what we're getting from the client's DB | The array is mandatory. For each customer's information in the dictionary, the only mandatory field is customerId | "nationality", "gender", "age", "race", "platformCountry", "personalIncome", "savingsRatio", "agentId", "customerId" values of each customer. | The set of key-value pairs in each dictionary of the array always changes. The input validation of the key-value pairs is defined and set by the customer |
| goals | Array of Dictionary | Array containing each customer's goal information. The fields differ per project, based on what we're getting from the client's DB | The array is mandatory. For each customer's information in the dictionary, the only mandatory fields are customerId and goalType | "customerId", "goalType", "status", "goalName", "goalValue", "goalStartDate", "goalEndDate", "goalPriority", "initialInvestment", "contributionAmount", "contributionFrequency", "lastContributionDate", "riskProfileId", "modelPortfolioId", "id", "createdAt", "modifiedAt", "createdBy", "modifiedBy". | The set of key-value pairs in each dictionary of the array always changes. The input validation of the key-value pairs is defined and set by the customer |
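
As an illustration of the mandatory fields listed above, the sketch below builds a small request body and checks it before sending. The function `validate_request` and all sample values (`"client-001"`, `"c1"`, `"retirement"`, and so on) are hypothetical; the real input validation is defined per customer, so treat this only as a minimal example of the rules stated in the table.

```python
def validate_request(body):
    """Return a list of problems found in a preprocessor request body."""
    errors = []
    # clientId is mandatory and must be a string
    if not isinstance(body.get("clientId"), str) or not body["clientId"]:
        errors.append("clientId is required and must be a non-empty string")
    # Mandatory keys inside each dictionary of the two mandatory arrays
    required = {"customers": ["customerId"], "goals": ["customerId", "goalType"]}
    for array_name, keys in required.items():
        items = body.get(array_name)
        if not isinstance(items, list):
            errors.append(f"{array_name} is required and must be an array")
            continue
        for index, item in enumerate(items):
            for key in keys:
                if not isinstance(item, dict) or item.get(key) in (None, ""):
                    errors.append(f"{array_name}[{index}] is missing {key}")
    return errors

# Hypothetical request body; optional fields vary per project.
request_body = {
    "clientId": "client-001",
    "customers": [
        {"customerId": "c1", "gender": "male", "age": 34, "personalIncome": 5200.0},
    ],
    "goals": [
        {"customerId": "c1", "goalType": "retirement", "goalValue": 250000},
    ],
}
print(validate_request(request_body))
```

Running the check client-side is optional; it simply surfaces missing `customerId` or `goalType` values before the request is made.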

Response Body

| Name | Datatype | Description | Sample Values | No. of decimals | Notes |
| --- | --- | --- | --- | --- | --- |
| categoricFeaturesForModel | Array | Array of categoric features included in training | ["gender_male", ] | | |
| categoricImportance | Array of dictionaries | Importance of each categorical feature | { "name": "gender_male", "value": 1.38217 } | 5 | |
| ↳ name | String | Name of the categoric feature | "gender_male" | | |
| ↳ value | Number | Importance value of the feature | | 5 | |
| ↳ categorical_feature placeholder | Number | | | 0 | |
| collinear | Dictionary | Contains corr_features, corr_values, and drop_features | | 5 | |
| ↳ corr_features | Dictionary | Features that are highly correlated with drop_features, and will be included for training | {"0": "gender_male"} | | |
| ↳ corr_values | Dictionary | Correlation values | {"0": -1} | | |
| ↳ drop_features | Dictionary | Features that are highly correlated with corr_features, and will be dropped | {"0": "gender_female"} | | |
| missingFractionOfFeatures | Array of dictionaries | Ratio of missing values for each feature | { "name": "gender", "value": 0 } | | |
| ↳ name | String | Name of the feature | "gender" | | |
| ↳ value | Number | Missing fraction of the feature | | 5 | |
| ↳ feature placeholder | Dictionary | | | | |
| numericFeaturesForModel | Array | Array of numerical features included in training | ["age", "personalIncome"] | | |
| numericalImportance | Array of dictionaries | Importance of each numerical feature | { "name": "personalIncome", "value": 6.51383 } | | |
| ↳ name | String | Name of the feature | | | |
| ↳ value | Number | Importance value of the feature | | 5 | |
| ↳ numerical_feature placeholder | Number | | | 5 | |
| skewAfterTransform | Array of dictionaries | Skewness of each feature after transformation | [ { "name": "age", "value": 0.04826 }, … ] | | |
| ↳ name | String | Name of the feature | | | |
| ↳ value | Number | Skewness of the feature after transformation | | 5 | |
| ↳ numerical_feature placeholder | Number | | | 5 | |
| skewB4Transform | Array of dictionaries | Skewness of each feature before transformation | [ { "name": "age", "value": 0.58215 }, … ] | 5 | PowerTransformer will be applied for features with a skewness > 0.5 or < -0.5 |
| ↳ name | String | Name of the feature | | | |
| ↳ value | Number | Skewness of the feature before transformation | | 5 | |
| ↳ numerical_feature placeholder | Number | | | 5 | |
| uniqueValueInCustomers | Dictionary | Contains the number of unique values for each of the categoric and numeric features | | | |
| ↳ categoricFeatures | Array of dictionaries | Number of unique values in each categoric feature | { "name": "nationality", "value": 1 }, { "name": "race", "value": 1 }, | | |
| ↳↳ name | String | Name of the feature in customers | | | |
| ↳↳ value | Integer | Number of unique values for each feature | | 0 | |
| ↳↳ categoric_features placeholder | Number | | | 0 | |
| ↳ numericFeatures | Array of dictionaries | Number of unique values in each numeric feature | [ { "name": "age", "value": 82 }, { "name": "personalIncome", "value": 976 } ] | | |
| ↳↳ numerical_feature placeholder | Number | | | 0 | |
| uniqueValueInGoals | Array of dictionaries | Unique goal types | [ { "name": "goalType", "value": 9 } ] | | |
| ↳ name | String | Name of goal | | | |
| ↳ value | Integer | Number of unique goals | | 0 | |
| ↳ prediction_key_name placeholder | Number | | | 0 | |
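
To show how the skewness fields in the response might be read, the sketch below lists the features that exceed the documented PowerTransformer threshold (skewness > 0.5 or < -0.5) and pairs each feature's skewness before and after transformation. The helper names are hypothetical, the `age` values come from the samples above, and the `personalIncome` skewness values are invented for illustration only.

```python
def features_needing_transform(skew_before, threshold=0.5):
    """Names of features whose skewness is > threshold or < -threshold."""
    return [entry["name"] for entry in skew_before
            if abs(entry["value"]) > threshold]

def skew_change(response):
    """Pair each feature's skewness before and after the transformation."""
    after = {entry["name"]: entry["value"]
             for entry in response["skewAfterTransform"]}
    return {entry["name"]: (entry["value"], after.get(entry["name"]))
            for entry in response["skewB4Transform"]}

# Partial, hypothetical response: only the two skewness arrays are shown.
response = {
    "skewB4Transform": [
        {"name": "age", "value": 0.58215},
        {"name": "personalIncome", "value": -0.31},  # invented value
    ],
    "skewAfterTransform": [
        {"name": "age", "value": 0.04826},
        {"name": "personalIncome", "value": -0.31},  # invented value
    ],
}

print(features_needing_transform(response["skewB4Transform"]))
print(skew_change(response))
```

Here only `age` crosses the threshold, so under the rule in the Notes column it is the only feature that would have been passed through PowerTransformer.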