Formula
Add new features to your dataset.
Inputs
- Data: input dataset
Outputs
- Data: dataset with additional features
Formula allows computing new columns by combining the existing ones with a user-defined expression. The resulting column can be categorical, numerical or textual.
For numeric variables, it sufices to provide a name and an expression.
- List of constructed variables
- Add or remove variables
- New feature name
- Expression in Python
- Select a feature
- Select a function
- Produce a report
- Press Send to communicate changes
The following example shows construction of a categorical variable: its value is "lower" is "sepal length" is below 6, "mid" if it is at least 6 but below 7, and "higher" otherwise. Note that spaces need to be replaced by underscores (sepal_length
).
- List of variable definitions
- Add or remove variables
- New feature name
- Expression in Python
- If checked, the feature is put among meta attributes
- Select a feature to use in expression
- Select a function to use in expression
- Optional list of values, used to define their order
- Press Send to compute and output data
Hints
If you are unfamiliar with Python math language, here's a quick introduction.
Expressions can use the following operators:
+
,-
,*
,/
: addition, subtraction, multiplication, division//
: integer division%
: remainder after integer division**
: exponentiation (for square root square by 0.5)<
,>
,<=
,>=
less than, greater than, less or equal, greater or equal==
equal!=
not equal- if-else: value
if
condition else other-value (see the above example
See more here.