ARTML Models
The ART-ML technique can be applied to all linear regression and classification algorithms, such as MLR, Naïve Bayesian, SVM, LDA, and PCA. This section describes how to use the artml library to build different real-time regression and classification models.
Classification models
Naive Bayesian
The Naïve Bayesian classifier is based on Bayes’ theorem with independence assumptions between attributes. In the real-time version of Bayesian classifiers, we calculate the likelihood and the prior probabilities from the Basic Element Table (BET), which can be updated in real time.
The artml library provides Gaussian and Multinomial Naive Bayes models. For numerical attributes, the likelihood can be calculated from the normal distribution equation, so the Gaussian model can be used. For categorical data, use the Multinomial model.
from artml.naive_bayes import GaussianNB
gnb = GaussianNB()
gnb.fit(BET, 'Target1','Target2')
gnb.predict(TestingData)
gnb.score(TestingData, y_test)
After importing GaussianNB, fit the classifier by passing the Basic Element Table (BET) and the target names. Even for a binary classification task, two target names ('class' and 'notclass') must be given. To predict, pass the testing dataframe as the input.
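To illustrate why the Gaussian variant suits real-time updates, the sketch below keeps only a count, a sum, and a sum of squares per class and feature, and derives the prior and the normal likelihood from them. It is not artml's implementation and does not reproduce the BET layout; RunningGaussianNB and its methods are hypothetical names used only for illustration.

```python
import math
from collections import defaultdict


class RunningGaussianNB:
    """Toy Gaussian naive Bayes driven by running sums (illustrative only)."""

    def __init__(self):
        self.count = defaultdict(int)                        # rows seen per class
        self.sum = defaultdict(lambda: defaultdict(float))    # per class, per feature
        self.sumsq = defaultdict(lambda: defaultdict(float))  # per class, per feature

    def update(self, row, label):
        """Fold one observation (dict of feature -> value) into the sums."""
        self.count[label] += 1
        for name, value in row.items():
            self.sum[label][name] += value
            self.sumsq[label][name] += value * value

    def _log_posterior(self, row, label):
        n = self.count[label]
        total = sum(self.count.values())
        score = math.log(n / total)                          # log prior
        for name, value in row.items():
            mean = self.sum[label][name] / n
            # Small constant keeps the variance strictly positive.
            var = self.sumsq[label][name] / n - mean * mean + 1e-9
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return score

    def predict(self, row):
        """Return the class with the highest log posterior."""
        return max(self.count, key=lambda label: self._log_posterior(row, label))


model = RunningGaussianNB()
for height, label in [(150, "short"), (155, "short"), (180, "tall"), (185, "tall")]:
    model.update({"height": height}, label)
print(model.predict({"height": 152}))
```

Because each new row only increments the sums, the model never needs to revisit earlier data, which is the property the BET-based classifiers rely on.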
from artml.naive_bayes import MultinomialNB
mnb = MultinomialNB()
mnb.fit(BET, 'Target1','Target2')
mnb.predict(TestingData)
mnb.score(TestingData, y_test)
Using MultinomialNB is similar to using GaussianNB, and the same syntax applies to categorical data. Make sure to one-hot encode all categorical values when building the BET.
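One-hot encoding can be done with pandas before the BET is built. The example below is a minimal sketch with made-up column names; how the encoded dataframe is then turned into a BET is not covered in this section, so that step is left out.

```python
import pandas as pd

# Hypothetical raw data: two categorical columns and a numeric target.
raw = pd.DataFrame({
    "colour": ["red", "green", "red"],
    "size": ["S", "M", "S"],
    "churn": [1, 0, 1],
})

# One indicator column per category value; other columns pass through unchanged.
encoded = pd.get_dummies(raw, columns=["colour", "size"], dtype=int)
print(list(encoded.columns))
```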
Linear Discriminant Analysis
LDA is based on searching for a linear combination of attributes that best separates the two classes (0 and 1) of a binary attribute. LDA supports binary classification only.
from artml.models import lda
lda = lda.LinearDiscriminantAnalysis()
lda.fit(BET, 'Target_name')
lda.predict(TestingData)
To fit the lda classifier, the Basic Element Table (BET) and the target name are the only arguments required. Unlike Naive Bayes, only one target name is given (for example, only the churn variable rather than both churn and notchurn). To predict, pass the testing dataframe as the input. For multiclass classification, use QDA instead of LDA.
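For readers who want to see what a binary linear discriminant computes, here is a minimal NumPy sketch using class means and a pooled covariance matrix. It works on raw arrays rather than a BET and is not artml's code; fit_binary_lda and predict_binary_lda are hypothetical helper names.

```python
import numpy as np


def fit_binary_lda(X, y):
    """Return (weights, bias) of a two-class linear discriminant."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance shared by both classes.
    scatter = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    pooled = scatter / (len(X) - 2)
    weights = np.linalg.solve(pooled, mu1 - mu0)
    # Midpoint threshold, shifted by the log ratio of class frequencies.
    bias = -0.5 * weights @ (mu0 + mu1) + np.log(len(X1) / len(X0))
    return weights, bias


def predict_binary_lda(X, weights, bias):
    """Label 1 where the discriminant is positive, otherwise 0."""
    return (X @ weights + bias > 0).astype(int)


X = np.array([[1.0, 2.0], [2.0, 1.0], [1.5, 1.8],
              [6.0, 7.0], [7.0, 6.0], [6.5, 6.8]])
y = np.array([0, 0, 0, 1, 1, 1])
weights, bias = fit_binary_lda(X, y)
print(predict_binary_lda(X, weights, bias))
```

The call to np.linalg.solve is where a singular covariance matrix would cause a failure.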
Quadratic Discriminant Analysis
QDA is a general discriminant function with quadratic decision boundaries which can be used to classify datasets with two or more classes.
from artml.models import QDA  # assumed module name, inferred from the QDA.QuadraticDiscriminantAnalysis() call below
QDA = QDA.QuadraticDiscriminantAnalysis()
QDA.fit(BET, 'Target_name')
QDA.predict(TestingData)
The arguments for fitting the classifier and making predictions are the same as for lda classification.
Note: Since LDA and QDA involve inverting the covariance matrices, make sure that these matrices are not singular. (Avoid null features and multicollinearity.)
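A quick check on the raw features can catch a singular covariance matrix before fitting. The helper below is a sketch, not part of artml; covariance_is_invertible is a hypothetical name and the condition-number cutoff of 1e8 is an arbitrary illustrative choice.

```python
import numpy as np


def covariance_is_invertible(X, max_condition=1e8):
    """True when the feature covariance is full rank and not badly conditioned."""
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    full_rank = np.linalg.matrix_rank(cov) == cov.shape[0]
    return bool(full_rank and np.linalg.cond(cov) < max_condition)


rng = np.random.default_rng(0)
independent = rng.normal(size=(50, 3))
# Append a column that is an exact multiple of the first one.
collinear = np.column_stack([independent, 2.0 * independent[:, 0]])
print(covariance_is_invertible(independent), covariance_is_invertible(collinear))
```

A constant (null) feature fails the same check, because its variance is zero.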
Support Vector classifier
Support Vector Machine (SVM) performs classification by finding the hyperplane that maximizes the margin between the two classes. (SVC can be used for binary or multiclass classification)
from artml.models import svm
svc = svm.LinearSVC()
svc.fit(BET, 'Target_name1', 'Target_name2', c=0.0001)
svc.predict(TestingData)
svc.score(TestingData, y_test)
After importing LinearSVC, fit the classifier by passing the Basic Element Table (BET) and the target names. Even for a binary classification task, two target names ('class' and 'notclass') must be given, as with GaussianNB. Apart from the BET and the target names, a value for c must be supplied. c is the tuning parameter and should be adjusted to improve classifier performance. To predict, pass the testing dataframe as the input.
Note: Vary the value of c in powers of ten to find the optimum value. [0.001, 0.1, 1, 10, 100, 1000, ...]
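The search over powers of ten can be sketched as a small loop. The helpers powers_of_ten and best_c are hypothetical, and toy_score is a stand-in so the example runs on its own; in practice the scoring function would fit the classifier with the given c and return its score on held-out data.

```python
def powers_of_ten(low_exp, high_exp):
    """Candidate values from 10**low_exp to 10**high_exp, one per decade."""
    return [10.0 ** e for e in range(low_exp, high_exp + 1)]


def best_c(candidates, score_fn):
    """Return the candidate with the highest score; score_fn is supplied by the caller."""
    return max(candidates, key=score_fn)


grid = powers_of_ten(-3, 3)


def toy_score(c):
    """Stand-in scorer that peaks at c = 10, used only to make the example runnable."""
    return -abs(c - 10.0)


print(best_c(grid, toy_score))
```

Once a good decade is found, a finer grid around it can be searched the same way.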
Regression Models
Multiple Linear regression
Multiple Linear Regression (MLR) is a method used to model the linear relationship between a target (dependent variable) and one or more attributes (independent variables). Using MLR in artml is very similar to using sklearn regression models.
from artml.models import MLR
lr = MLR.LinearRegression()
lr.fit(BET, 'Target')
lr.predict(TestingData)
The only arguments needed to fit the MLR model are the BET and the target name. Use artml.metrics to compute R2 and other regression metrics.
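As a reference for what R2 measures, the coefficient of determination can be computed by hand as follows. This is a plain-Python sketch, not the artml.metrics implementation, whose function names are not shown in this section; r2_score here is a locally defined helper.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


print(r2_score([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```

A value of 1 means perfect predictions, and 0 means the model does no better than always predicting the mean of the target.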
Note: Avoid multicollinearity in the features. Multicollinearity leads to highly unreliable regression coefficient estimates and large errors on the test data. Use regression feature selection techniques to choose the best features for MLR.
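One common way to spot multicollinearity is the variance inflation factor (VIF), which can be read off the diagonal of the inverse correlation matrix. The sketch below uses NumPy on synthetic data; variance_inflation_factors is a hypothetical helper and is not part of artml.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF per column, taken from the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))


rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.01 * rng.normal(size=200)   # nearly a copy of x1
vif = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(np.round(vif, 1))
```

A frequently used rule of thumb treats a VIF above 5 or 10 as a sign that the feature is largely explained by the others and is a candidate for removal.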
RidgeMLR
artml also supports regularized linear regression. Since Lasso regression does not have an analytical solution, we can use the Ridge regression model instead and update it in real time using the Basic Element Table.
from artml.models import MLR
Rlr = MLR.RidgeRegression()
Rlr.fit(BET, 'Target', c=0.1)
Rlr.predict(TestingData)
In the above arguments, c is the regularization parameter. It can be adjusted to reduce overfitting by penalizing larger weights in the model parameters.
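The analytical solution is what makes real-time updates possible: ridge coefficients depend on the data only through the accumulated sums X^T X and X^T y. The sketch below shows that idea with NumPy on raw batches. It is not artml's code and does not use the BET format; RunningRidge is a hypothetical class, and the intercept is omitted for brevity.

```python
import numpy as np


class RunningRidge:
    """Ridge regression (no intercept) solved from accumulated sums."""

    def __init__(self, n_features, c=0.1):
        self.c = c
        self.xtx = np.zeros((n_features, n_features))  # running X^T X
        self.xty = np.zeros(n_features)                # running X^T y

    def update(self, X, y):
        """Fold a new batch into the sums without revisiting old rows."""
        self.xtx += X.T @ X
        self.xty += X.T @ y

    def coefficients(self):
        """Solve (X^T X + c I) w = X^T y for the current sums."""
        penalty = self.c * np.eye(len(self.xty))
        return np.linalg.solve(self.xtx + penalty, self.xty)


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -1.0])

model = RunningRidge(n_features=2, c=1e-6)
model.update(X[:50], y[:50])    # first batch
model.update(X[50:], y[50:])    # later batch
print(model.coefficients())
```

Larger values of c shrink the coefficients toward zero, which is the penalty on large weights described above.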
Support Vector regression
LinearSVR is a powerful model for regression. The implementation of support vector regression is similar to SVC. By building new features, we can introduce nonlinearity into the LinearSVR model.
from artml.models import svm
svr = svm.LinearSVR()
svr.fit(BET, 'Target', c=0.1)
svr.predict(TestingData)
Other than the BET and the target name, SVR takes the tuning parameter c. The default value of c is 0.1, and it can be adjusted to improve model performance.
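Building new features to introduce nonlinearity can be as simple as adding squared and interaction columns before the BET is built. The example below is a minimal pandas sketch with made-up column names; it only prepares the dataframe and does not call artml.

```python
import pandas as pd

# Hypothetical numeric features.
data = pd.DataFrame({"x1": [1.0, 2.0, 3.0], "x2": [0.5, 1.5, 2.5]})

# A linear model on the expanded columns is quadratic in the original ones.
expanded = data.assign(
    x1_squared=data["x1"] ** 2,
    x1_times_x2=data["x1"] * data["x2"],
)
print(expanded)
```

The same derived columns must be added to the testing dataframe before calling predict.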