POST https://api-lib.bambu.life/api/autoMl/v2/dataRetrieval
Description
This endpoint retrieves the dataset and returns a dictionary that lists the sample size and information about the categoric and numeric features in the dataset.
Request Body
Name | Datatype | Description | Mandatory | Sample value | Notes |
---|---|---|---|---|---|
clientId | String | Used for identification, as information is stored under a unique ID value | Yes | ||
customers | Array of Dictionary | Array containing each customer's information. The fields differ per project, based on what we get from the client's DB. | The array is mandatory. Within each customer's dictionary, the only mandatory field is customerId. | [ { "gender": "male", "age": 67, "personalIncome": 24299, "customerId": "0", "nationality": "US", "race": "American", "platformCountry": "US" }, { "gender": "female", "age": 20, "personalIncome": 7790, "customerId": "1", "nationality": "US", "race": "American", "platformCountry": "US" } . . . ] | The set of key-value pairs in each dictionary varies. Input validation for these key-value pairs is defined and set by the customer. |
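
A request body matching the table above could be assembled along these lines. This is a minimal sketch using only the Python standard library: the helper names `build_payload` and `build_request`, the `clientId` value, and the `Content-Type` header are assumptions for illustration, and any authentication the service may require is not covered here.

```python
import json
import urllib.request

# Endpoint URL as given at the top of this page.
DATA_RETRIEVAL_URL = "https://api-lib.bambu.life/api/autoMl/v2/dataRetrieval"


def build_payload(client_id, customers):
    """Assemble the request body and check the documented mandatory fields."""
    if not client_id:
        raise ValueError("clientId is mandatory")
    if not isinstance(customers, list):
        raise ValueError("customers must be an array")
    for index, customer in enumerate(customers):
        # customerId is the only mandatory field inside each dictionary.
        if "customerId" not in customer:
            raise ValueError(f"customers[{index}] is missing customerId")
    return {"clientId": client_id, "customers": customers}


def build_request(payload):
    """Prepare (but do not send) a JSON POST request for the endpoint."""
    return urllib.request.Request(
        DATA_RETRIEVAL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


payload = build_payload(
    "demo-client",  # hypothetical clientId
    [
        {"gender": "male", "age": 67, "personalIncome": 24299, "customerId": "0"},
        {"gender": "female", "age": 20, "personalIncome": 7790, "customerId": "1"},
    ],
)
request = build_request(payload)
# To send the request: urllib.request.urlopen(request)
```

Checking for `customerId` before sending keeps the one documented per-customer requirement on the caller's side; all other fields are project-specific and are left unvalidated here.
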
Response Body
Name | Datatype | Description | Sample value | No. of decimal places | Notes |
---|---|---|---|---|---|
baseCategoricFeatures | Array of string | List of categorical features of each customer | [ "race", "gender", "platformCountry", "nationality", "agentId" ] | | This is the enum from which the user selects the categoric features they are interested in |
baseNumericFeatures | Array of string | List of numerical features of each customer | [ "age", "personalIncome", "savingsRatio" ] | | This is the enum from which the user selects the numeric features they are interested in |
categoricFeaturesInfo | Array of dictionaries | Contains dataType, missing, nunique and unique values for each categoric feature | "categoricFeaturesInfo": [ { "info": { "dataType": "categoric", "missing": 0, "nunique": 2, "unique": [ "male", "female" ] }, "name": "gender" }, { "info": { "dataType": "categoric", "missing": 0, "nunique": 1, "unique": [ "US" ] }, "name": "nationality" } ] | ||
dataType | String | Type of feature | categoric | ||
missing | Integer | Number of missing values for the feature | 0 | ||
nunique | Integer | Number of unique values | 2 | ||
unique | Array | List of unique values | ["female", "male"] | ||
numericFeaturesInfo | Array of dictionaries | Contains information for each numeric feature | "numericFeaturesInfo": [ { "info": { "25Percentile": 31, "50Percentile": 46, "75Percentile": 62, "dataType": "numeric", "kurtosis": -0.50211, "max": 100, "mean": 49.48949, "min": 19, "missing": 0.0, "nunique": 82.0, "skewness": 0.62364, "standardDeviation": 21.89702 }, "name": "age" }, { "info": { "25Percentile": 2807.5, "50Percentile": 5923, "75Percentile": 10727, "dataType": "numeric", "kurtosis": 2.30218, "max": 35101, "mean": 8077.2092, "min": 8, "missing": 0, "nunique": 976, "skewness": 1.56952, "standardDeviation": 7381.92973 }, "name": "personalIncome"} ] | | Contains "25Percentile", "50Percentile", "75Percentile", "dataType", "kurtosis", "max", "mean", "min", "missing", "nunique", "skewness", "standardDeviation". |
25Percentile | Float | | 30 | 5 | |
50Percentile | Float | | 42 | 5 | |
75Percentile | Float | | 55 | 5 | |
dataType | String | | numeric | | |
kurtosis | Float | | 1.12121 | 5 | |
max | Float | | 70 | 5 | |
mean | Float | | 42.5646 | 5 | |
min | Float | | 18 | 5 | |
missing | Integer | | 0 | 0 | |
nunique | Integer | | 53 | 0 | |
skewness | Float | | 0.07156 | 5 | |
standardDeviation | Float | | 14.19541 | 5 | |
numberOfCategoricFeatures | Integer | Total number of categorical features for each customer | 5 | 0 | |
numberOfNumericFeatures | Integer | Total number of numerical features for each customer | 3 | 0 | |
sampleSize | Integer | The length of the "customers" array | 5000 | 0 | |
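
Because each entry in `categoricFeaturesInfo` and `numericFeaturesInfo` pairs a `name` with an `info` dictionary, a client may find it convenient to index the response by feature name. The sketch below assumes the response has already been parsed from JSON into a Python dictionary; `index_feature_info` is a hypothetical helper, and the response shown is a trimmed illustration built from the sample values in the table above, not a real server reply.

```python
def index_feature_info(response):
    """Map each feature name to its "info" dictionary."""
    by_name = {}
    for key in ("categoricFeaturesInfo", "numericFeaturesInfo"):
        for entry in response.get(key, []):
            by_name[entry["name"]] = entry["info"]
    return by_name


# Trimmed illustration assembled from the sample values documented above.
response = {
    "categoricFeaturesInfo": [
        {
            "info": {"dataType": "categoric", "missing": 0, "nunique": 2,
                     "unique": ["male", "female"]},
            "name": "gender",
        },
    ],
    "numericFeaturesInfo": [
        {
            "info": {"25Percentile": 31, "50Percentile": 46, "75Percentile": 62,
                     "dataType": "numeric", "max": 100, "mean": 49.48949,
                     "min": 19, "missing": 0, "nunique": 82},
            "name": "age",
        },
    ],
    "sampleSize": 5000,
}

info = index_feature_info(response)

# Example lookups: the categories seen for "gender" and the
# interquartile range of "age" derived from the reported percentiles.
gender_values = info["gender"]["unique"]
age_iqr = info["age"]["75Percentile"] - info["age"]["25Percentile"]
```

With the values above, `gender_values` is `["male", "female"]` and `age_iqr` is `31`.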