Channel: SCN : Discussion List - SAP HANA Developer Center

Need help with Logistic regression prediction


Hi All,

 

I am facing an issue with the logistic regression and forecast with logistic regression PAL functions in SAP HANA: the predictions I get are incorrect. Here is my situation:

 

I have 10 training records indicating whether a customer bought a product, encoded as 0 (did not buy) and 1 (bought).

For example, customer 1001, with a salary of 50000 and an age of 45, bought the product (BOUGHT = 1), whereas customer 1003, with a salary of 25000 and an age of 23, did not (BOUGHT = 0).

The data is such that customers with a low salary and a young age do not buy the product, whereas older customers with a higher salary do.
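To check what a logistic regression fit on these 10 records should look like independently of HANA, here is a minimal pure-Python sketch. It is not PAL's algorithm. It assumes batch gradient descent on standardized SALARY and AGE with a small L2 penalty. The function names and the `lr`, `iterations` and `l2` values are hypothetical, illustrative choices. The penalty matters because these training records are perfectly separable: every buyer has salary >= 40000 and age >= 45, and every non-buyer has salary <= 35000 and age <= 35. Without a penalty, the maximum-likelihood coefficients would grow without bound.

```python
import math

# Training records from the post: (CUSTID, SALARY, AGE, BOUGHT)
TRAIN = [
    (1001, 50000, 45, 1), (1002, 20000, 20, 0), (1003, 25000, 23, 0),
    (1004, 40000, 47, 1), (1005, 35000, 35, 0), (1006, 75000, 50, 1),
    (1007, 60000, 50, 1), (1008, 55000, 65, 1), (1009, 20000, 20, 0),
    (1010, 20000, 23, 0),
]


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit(rows, lr=0.5, iterations=2000, l2=0.01):
    """Fit intercept + SALARY + AGE weights by batch gradient descent.

    Features are standardized (mean 0, std 1) first, because SALARY is
    roughly 1000x larger than AGE. Returns (weights, (mean, std) per feature).
    """
    features = [(r[1], r[2]) for r in rows]
    labels = [r[3] for r in rows]
    n = len(rows)
    stats = []
    for col in zip(*features):
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        stats.append((mean, std))
    scaled = [[(v - m) / s for v, (m, s) in zip(x, stats)] for x in features]

    w = [0.0, 0.0, 0.0]  # intercept, salary, age (on the standardized scale)
    for _ in range(iterations):
        grad = [0.0, 0.0, 0.0]
        for x, y in zip(scaled, labels):
            err = sigmoid(w[0] + w[1] * x[0] + w[2] * x[1]) - y
            grad[0] += err
            grad[1] += err * x[0]
            grad[2] += err * x[1]
        # Small L2 penalty on the slopes (not the intercept) keeps the
        # weights finite even though the training data is separable.
        w[0] -= lr * grad[0] / n
        w[1] -= lr * (grad[1] / n + l2 * w[1])
        w[2] -= lr * (grad[2] / n + l2 * w[2])
    return w, stats


def predict(w, stats, salary, age):
    """Return (probability of BOUGHT = 1, predicted class at 0.5)."""
    x = [(v - m) / s for v, (m, s) in zip((salary, age), stats)]
    p = sigmoid(w[0] + w[1] * x[0] + w[2] * x[1])
    return p, int(p >= 0.5)
```

For example, `w, stats = fit(TRAIN)` followed by `predict(w, stats, 48000, 44)` scores customer 1011. Standardizing first keeps SALARY from dominating the gradient purely because of its units.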

 

Salary and age are the variables that determine whether a customer buys. The training data is passed to the logistic regression function, which returns a coefficient table. Below is the code, which also loads the training data:

SET SCHEMA TRAINING;

-- LOGISTIC REGRESSION PMML DATA code starts

DROP TYPE PAL_T_RG_DATA;
DROP TYPE PAL_T_RG_PARAMS;
DROP TYPE PAL_T_RG_COEFF;
DROP TYPE PAL_T_RG_PMML;
DROP TABLE PAL_RG_SIGNATURE;
DROP TABLE RG_PARAMS;
DROP TABLE RG_COEFF;
DROP TABLE RG_PMML;

CREATE TYPE PAL_T_RG_DATA AS TABLE (CUSTID INTEGER, SALARY DOUBLE, AGE INTEGER, BOUGHT INTEGER);
CREATE TYPE PAL_T_RG_PARAMS AS TABLE (NAME VARCHAR(50), INTARGS INTEGER, DOUBLEARGS DOUBLE, STRINGARGS VARCHAR(100));
CREATE TYPE PAL_T_RG_COEFF AS TABLE (ID INTEGER, AI DOUBLE);
CREATE TYPE PAL_T_RG_PMML AS TABLE (ID INTEGER, PMML VARCHAR(5000));

CREATE COLUMN TABLE PAL_RG_SIGNATURE (ID INTEGER, TYPENAME VARCHAR(100), DIRECTION VARCHAR(100));
TRUNCATE TABLE PAL_RG_SIGNATURE;
INSERT INTO PAL_RG_SIGNATURE VALUES (1, 'TRAINING.PAL_T_RG_DATA', 'in');
INSERT INTO PAL_RG_SIGNATURE VALUES (2, 'TRAINING.PAL_T_RG_PARAMS', 'in');
INSERT INTO PAL_RG_SIGNATURE VALUES (3, 'TRAINING.PAL_T_RG_COEFF', 'out');
INSERT INTO PAL_RG_SIGNATURE VALUES (4, 'TRAINING.PAL_T_RG_PMML', 'out');
SELECT * FROM TRAINING.PAL_RG_SIGNATURE;
GRANT SELECT ON TRAINING.PAL_RG_SIGNATURE TO SYSTEM;

CALL "SYSTEM"."AFL_WRAPPER_ERASER" ('PAL_RG');
CALL SYSTEM.AFL_WRAPPER_GENERATOR ('PAL_RG', 'AFLPAL', 'LOGISTICREGRESSION', "TRAINING"."PAL_RG_SIGNATURE");

CREATE COLUMN TABLE PAL_RG_DATA LIKE PAL_T_RG_DATA;

INSERT INTO PAL_RG_DATA VALUES (1001, 50000, 45, 1);
INSERT INTO PAL_RG_DATA VALUES (1002, 20000, 20, 0);
INSERT INTO PAL_RG_DATA VALUES (1003, 25000, 23, 0);
INSERT INTO PAL_RG_DATA VALUES (1004, 40000, 47, 1);
INSERT INTO PAL_RG_DATA VALUES (1005, 35000, 35, 0);
INSERT INTO PAL_RG_DATA VALUES (1006, 75000, 50, 1);
INSERT INTO PAL_RG_DATA VALUES (1007, 60000, 50, 1);
INSERT INTO PAL_RG_DATA VALUES (1008, 55000, 65, 1);
INSERT INTO PAL_RG_DATA VALUES (1009, 20000, 20, 0);
INSERT INTO PAL_RG_DATA VALUES (1010, 20000, 23, 0);

SELECT * FROM PAL_RG_DATA;

CREATE COLUMN TABLE RG_PARAMS LIKE PAL_T_RG_PARAMS;
CREATE COLUMN TABLE RG_COEFF LIKE PAL_T_RG_COEFF;
CREATE COLUMN TABLE RG_PMML LIKE PAL_T_RG_PMML;

INSERT INTO RG_PARAMS VALUES ('THREAD_NUMBER', 4, null, null);
INSERT INTO RG_PARAMS VALUES ('MAX_ITERATION', 100, null, null);
INSERT INTO RG_PARAMS VALUES ('EXIT_THRESHOLD', null, 0.00001, null);
INSERT INTO RG_PARAMS VALUES ('VARIABLE_NUM', 2, null, null);
INSERT INTO RG_PARAMS VALUES ('METHOD', 0, null, null);
INSERT INTO RG_PARAMS VALUES ('PMML_EXPORT', 2, null, null);
INSERT INTO RG_PARAMS VALUES ('CATEGORY_COL', 3, null, null);

SELECT * FROM RG_PARAMS;

TRUNCATE TABLE RG_COEFF;
TRUNCATE TABLE RG_PMML;

CALL _SYS_AFL.PAL_RG (PAL_RG_DATA, RG_PARAMS, RG_COEFF, RG_PMML) WITH OVERVIEW;

SELECT * FROM RG_COEFF;
SELECT * FROM RG_PMML;

--Code ends

Logistic regression coefficient table output:

Logistic regression PMML table output:

 

We have another set of 10 records for which we want to predict whether each customer buys or not.

These are passed to the forecast with logistic regression function together with the model produced by the logistic regression function (the PMML output stored in RG_PMML). Below is the code:

--FORECAST/PREDICTION WITH LOGISTIC REGRESSION code starts

DROP TYPE TRAINING.PAL_T_FRG_PREDICT;
DROP TYPE TRAINING.PAL_T_FRG_CONTROL;
DROP TYPE TRAINING.PAL_T_FRG_COEFF;
DROP TYPE TRAINING.PAL_T_FRG_FITTED;
CREATE TYPE PAL_T_FRG_PREDICT AS TABLE ("CUSTID" INTEGER, "SALARY" DOUBLE, "AGE" INTEGER);
CREATE TYPE PAL_T_FRG_CONTROL AS TABLE (NAME VARCHAR(60), INTARGS INTEGER, DOUBLEARGS DOUBLE, STRINGARGS VARCHAR(100));
CREATE TYPE PAL_T_FRG_COEFF AS TABLE (ID INTEGER, AI VARCHAR(5000));
CREATE TYPE PAL_T_FRG_FITTED AS TABLE ("ID" INTEGER, "FITTED" DOUBLE, "TYPE" INTEGER);

DROP TABLE TRAINING.PAL_FRG_SIGN;
CREATE COLUMN TABLE PAL_FRG_SIGN ("ID" INTEGER, "TYPENAME" VARCHAR(100), "DIRECTION" VARCHAR(100));
TRUNCATE TABLE TRAINING.PAL_FRG_SIGN;

INSERT INTO PAL_FRG_SIGN VALUES (1, 'TRAINING.PAL_T_FRG_PREDICT', 'IN');
INSERT INTO PAL_FRG_SIGN VALUES (2, 'TRAINING.PAL_T_FRG_CONTROL', 'IN');
INSERT INTO PAL_FRG_SIGN VALUES (3, 'TRAINING.PAL_T_FRG_COEFF', 'IN');
INSERT INTO PAL_FRG_SIGN VALUES (4, 'TRAINING.PAL_T_FRG_FITTED', 'OUT');

GRANT SELECT ON TRAINING.PAL_FRG_SIGN TO SYSTEM;

CALL SYSTEM.AFL_WRAPPER_ERASER('PAL_FRLGR_PROC');
CALL SYSTEM.AFL_WRAPPER_GENERATOR('PAL_FRLGR_PROC', 'AFLPAL', 'FORECASTWITHLOGISTICR', TRAINING.PAL_FRG_SIGN);

DROP TABLE TRAINING.PAL_FRG_PREDICT;
CREATE COLUMN TABLE PAL_FRG_PREDICT LIKE TRAINING.PAL_T_FRG_PREDICT;
TRUNCATE TABLE TRAINING.PAL_FRG_PREDICT;

INSERT INTO PAL_FRG_PREDICT VALUES (1011, 48000, 44);
INSERT INTO PAL_FRG_PREDICT VALUES (1012, 18000, 22);
INSERT INTO PAL_FRG_PREDICT VALUES (1013, 28000, 25);
INSERT INTO PAL_FRG_PREDICT VALUES (1014, 35000, 30);
INSERT INTO PAL_FRG_PREDICT VALUES (1015, 50000, 50);
INSERT INTO PAL_FRG_PREDICT VALUES (1016, 25000, 27);
INSERT INTO PAL_FRG_PREDICT VALUES (1017, 50000, 52);
INSERT INTO PAL_FRG_PREDICT VALUES (1018, 70000, 67);
INSERT INTO PAL_FRG_PREDICT VALUES (1019, 40000, 47);
INSERT INTO PAL_FRG_PREDICT VALUES (1020, 25000, 42);

DROP TABLE TRAINING.PAL_FRG_CONTROL;
CREATE COLUMN TABLE PAL_FRG_CONTROL LIKE PAL_T_FRG_CONTROL;
TRUNCATE TABLE TRAINING.PAL_FRG_CONTROL;
INSERT INTO PAL_FRG_CONTROL VALUES ('THREAD_NUMBER', 8, null, null);
INSERT INTO PAL_FRG_CONTROL VALUES ('CATEGORY_COL', 3, null, null);
INSERT INTO PAL_FRG_CONTROL VALUES ('MODEL_FORMAT', 1, null, null);

DROP TABLE TRAINING.PAL_FRG_COEFF;
CREATE COLUMN TABLE PAL_FRG_COEFF LIKE PAL_T_FRG_COEFF;
TRUNCATE TABLE TRAINING.PAL_FRG_COEFF;
INSERT INTO TRAINING.PAL_FRG_COEFF SELECT * FROM TRAINING.RG_PMML;

DROP TABLE TRAINING.PAL_FRG_FITTED;
CREATE COLUMN TABLE PAL_FRG_FITTED LIKE PAL_T_FRG_FITTED;
TRUNCATE TABLE TRAINING.PAL_FRG_FITTED;

CALL _SYS_AFL.PAL_FRLGR_PROC (TRAINING.PAL_FRG_PREDICT, TRAINING.PAL_FRG_CONTROL, TRAINING.PAL_FRG_COEFF, TRAINING.PAL_FRG_FITTED) WITH OVERVIEW;

SELECT * FROM TRAINING.PAL_FRG_FITTED;

--Code ends


Predicted/fitted table output from the forecast with logistic regression function:

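Expected prediction:

| CUSTID | SALARY | AGE | Expected BOUGHT |
|--------|--------|-----|-----------------|
| 1011   | 48000  | 44  | 1               |
| 1012   | 18000  | 22  | 0               |
| 1013   | 28000  | 25  | 0               |
| 1014   | 35000  | 30  | 0               |
| 1015   | 50000  | 50  | 1               |
| 1016   | 25000  | 27  | 0               |
| 1017   | 50000  | 52  | 1               |
| 1018   | 70000  | 67  | 1               |
| 1019   | 40000  | 47  | 1               |
| 1020   | 25000  | 42  | 1               |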
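To compare the PAL output with the expected values above, the standard logistic regression scoring rule, p = 1 / (1 + exp(-(a0 + a1*SALARY + a2*AGE))), can be reproduced in a few lines of Python. This is a sketch, not PAL's implementation. It assumes the coefficient table holds the intercept followed by one coefficient per variable in column order (ID 0 = intercept, ID 1 = SALARY, ID 2 = AGE), applied to raw, unscaled inputs. It also assumes that a probability of 0.5 or more means BOUGHT = 1. The names `score` and `compare` are hypothetical.

```python
import math

# Test records from the post: (CUSTID, SALARY, AGE, expected BOUGHT)
TEST = [
    (1011, 48000, 44, 1), (1012, 18000, 22, 0), (1013, 28000, 25, 0),
    (1014, 35000, 30, 0), (1015, 50000, 50, 1), (1016, 25000, 27, 0),
    (1017, 50000, 52, 1), (1018, 70000, 67, 1), (1019, 40000, 47, 1),
    (1020, 25000, 42, 1),
]


def score(coeffs, salary, age, threshold=0.5):
    """Return (probability, class) for one record.

    `coeffs` is assumed to be (intercept, salary coefficient, age coefficient),
    e.g. the AI values of RG_COEFF ordered by ID (an assumption, not confirmed).
    """
    a0, a1, a2 = coeffs
    z = a0 + a1 * salary + a2 * age
    # Numerically stable logistic function: never calls exp() on a large
    # positive argument, so huge coefficients cannot raise OverflowError.
    if z >= 0:
        p = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        p = e / (1.0 + e)
    return p, int(p >= threshold)


def compare(coeffs):
    """List (CUSTID, probability, predicted, expected) for every test record."""
    return [(cid, *score(coeffs, sal, age), expected)
            for cid, sal, age, expected in TEST]
```

Passing the AI values from RG_COEFF, e.g. `for row in compare((a0, a1, a2)): print(row)`, lists each customer's probability and predicted class next to the expected value. The stable form of the sigmoid is used because, with unscaled salaries, the linear term can become large enough to overflow a naive `math.exp(-z)`.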

 

The predictions returned by logistic regression / forecast with logistic regression do not match these expected values.

Can anybody help me get correct predictions using logistic regression?

 

Thanks and Regards,

M.N.Adinarayanan

