Transformers
Transformers are used for many different tasks, but the most common in machine learning are:
- Feature extraction
- Model scoring
Feature extraction is the process of taking one or more features from an input dataset and deriving new features from them. In the case of data frames, the features come from the input data frame and are written to the output data frame.
Some examples of feature extraction are:
- One hot encoding - Converting an integer value to a vector of 1s and 0s
- Feature selection - Running analysis to determine which features are most effective for training a predictive ML algorithm (e.g. CHI2)
- Math - Basic mathematical functions, such as dividing one feature by another or taking the log of a feature
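The one hot encoding and math examples above can be sketched in plain Python. This is a minimal illustration and not the library's API: the names `one_hot` and `add_features`, the column names, and the list of dicts standing in for a data frame are all assumptions made for this example.

```python
import math

def one_hot(index, size):
    """Encode an integer category index as a vector of 1s and 0s."""
    if not 0 <= index < size:
        raise ValueError("index out of range")
    return [1.0 if i == index else 0.0 for i in range(size)]

def add_features(rows, num_regions):
    """Return new rows with derived features added; the input rows are not modified."""
    out = []
    for row in rows:
        new_row = dict(row)  # copy, so the input "data frame" stays unchanged
        new_row["region_vec"] = one_hot(row["region"], num_regions)
        new_row["price_per_sqft"] = row["price"] / row["sqft"]  # dividing two features
        new_row["log_sqft"] = math.log(row["sqft"])             # log of a feature
        out.append(new_row)
    return out

# Hypothetical input data: one row with an integer-encoded region.
rows = [{"region": 2, "price": 300000.0, "sqft": 1500.0}]
result = add_features(rows, num_regions=4)
```

The derived features are written to a new set of rows, which mirrors the input data frame and output data frame split described above.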
Regression is used to predict a continuous numeric value, such as the price of a car or a home. Regression models, for the most part, operate on a vector of doubles called a “feature vector”. The feature vector contains all of the known information about what is being predicted. In the case of predicting the price of a house, the feature vector will have things like the encoded region where the house is, the square footage, how many bathrooms there are, how old it is, etc.
See a list of supported regression models.
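As a rough sketch of how a regression model scores a feature vector, the snippet below uses a linear model, which is only one possible kind of regression model. The function name `predict_price`, the feature layout, and every weight are made-up values for illustration; a real model would learn its weights from training data.

```python
def predict_price(features, weights, intercept):
    """Score a feature vector with a linear model: intercept + sum(w_i * x_i)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return intercept + sum(w * x for w, x in zip(weights, features))

# Hypothetical feature vector layout:
# [region_a, region_b, square_footage, bathrooms, age_in_years]
features = [0.0, 1.0, 1500.0, 2.0, 10.0]

# Made-up weights, not the output of any real training run.
weights = [10000.0, 25000.0, 100.0, 5000.0, -500.0]

price = predict_price(features, weights, intercept=50000.0)
```

The first two entries show how an encoded region can sit alongside plain numeric features in the same vector of doubles.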
Classification is used to predict categorical information. An example is making a binary prediction of whether or not to give a consumer a loan. Another example is predicting what type of sound is contained in a .wav file, or whether or not there is a person in an image.
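The loan example can be sketched as a binary classifier. The snippet below assumes a logistic-regression-style model, which is one of many possible classifiers; the function name `predict_loan`, the two features, the weights, and the 0.5 threshold are all hypothetical.

```python
import math

def predict_loan(features, weights, intercept, threshold=0.5):
    """Return (label, probability), where label 1 means 'approve' and 0 means 'decline'."""
    score = intercept + sum(w * x for w, x in zip(weights, features))
    probability = 1.0 / (1.0 + math.exp(-score))  # logistic function
    label = 1 if probability >= threshold else 0
    return label, probability

# Hypothetical feature vector: [income in thousands, debt-to-income ratio]
weights = [0.05, -4.0]   # made-up weights for illustration
intercept = -2.0

approved, p_approved = predict_loan([80.0, 0.2], weights, intercept)
declined, p_declined = predict_loan([30.0, 0.6], weights, intercept)
```

The model outputs a probability, and the threshold turns it into one of a fixed set of categories, which is what separates classification from regression.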
Clustering is used to assign a label to similar data (thus categorizing/clustering it). It is similar to classification in that the predictions are discrete values from a set. Unlike classification models, though, clustering models are part of the unsupervised family of models and do not operate on labeled data. This is useful for feature engineering, anomaly detection, and many other tasks.
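A minimal sketch of scoring with a clustering model follows, assuming a k-means-style model whose centroids have already been computed. The function name `assign_cluster` and the centroid values are hypothetical; no labeled data is involved at any point.

```python
def assign_cluster(point, centroids):
    """Return the index of the centroid closest to the point (squared Euclidean distance)."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))

# Hypothetical centroids from an already-fitted model.
centroids = [[0.0, 0.0], [10.0, 10.0]]

points = [[1.0, 2.0], [9.0, 8.0], [4.0, 4.0]]
labels = [assign_cluster(p, centroids) for p in points]
```

Each prediction is a discrete cluster index, so the output can be used as a new categorical feature in later steps.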
Transformers can do ANYTHING! This is just a sample of their most common uses. You can also build transformers to resize images, resample sound data, import data from different data sources, or anything else you can think of. The sky is the limit.