201301.ch2.r.mining4_10수업.eng.hwp
?read.table
dataset<-read.csv(file="D:/abc/dm.hw.01.dataset.01.csv")
?glm
Y<-dataset[,1]
X<-as.matrix(dataset[,-1])
length(Y)
dim(X)
plot(X[,1],Y)
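plot(X[,10],Y) # even if the relationship does not look linear, do not drop the variable from the data
Y<-(Y>0)*1 # recode the response as binary: 1 if Y>0, otherwise 0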
myfit<-summary(glm(Y~X,family="binomial")) # see ?glm and ?family
myfit # myfit already holds the summary, so printing it gives the output below
Call:
glm(formula = Y ~ X, family = "binomial")

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.77621  -0.09955   0.00015   0.10863   1.80266

Coefficients:
            Estimate Std. Error z value Pr(>|z|)        # Pr(>|z|) is the p-value
(Intercept)   0.5261     0.4483   1.173 0.240618
XX1           5.1485     1.5909   3.236 0.001211 **
XX2           3.6369     1.0841   3.355 0.000795 ***
XX3           4.9238     1.5072   3.267 0.001087 **
XX4           2.3151     0.9106   2.542 0.011009 *
XX5           2.5618     0.9186   2.789 0.005290 **
XX6          -0.7336     0.5726  -1.281 0.200161
XX7           1.6666     0.7810   2.134 0.032849 *
XX8           1.3549     0.6470   2.094 0.036238 *
XX9          -0.5826     0.5456  -1.068 0.285604
XX10          0.3062     0.4082   0.750 0.453197
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 138.589  on 99  degrees of freedom
Residual deviance:  39.849  on 89  degrees of freedom
AIC: 61.849

Number of Fisher Scoring iterations: 8
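As a quick consistency check on the output above: for a 0/1 response the saturated model has log-likelihood zero, so the residual deviance equals -2 times the log-likelihood, and the AIC is the residual deviance plus twice the number of estimated coefficients. A small sketch using only the numbers printed in the summary (the variable names here are made up for illustration):

```r
resid.dev <- 39.849   # residual deviance from the summary
n.coef    <- 11       # intercept + 10 slopes
n.obs     <- 99 + 1   # null deviance df is n - 1

aic      <- resid.dev + 2 * n.coef   # should match the reported AIC
df.resid <- n.obs - n.coef           # should match the reported residual df
print(c(aic = aic, df.resid = df.resid))
```

Both values agree with the printed AIC of 61.849 and the 89 residual degrees of freedom.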
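names(myfit) # list the components of the summary object (where are the coefficients?)
myfit$coef # coefficient table (partial match for myfit$coefficients)
myfit$coef[,1] # first column: the estimates
X[1,] # predictors of the first observation
B<-myfit$coef[,1]
X0<-c(1,X[1,]) # prepend 1 for the intercept
L0<-t(X0)%*%B # linear predictor for the first observation
exp(L0)/(1+exp(L0)) # inverse logit: estimated P(Y=1) for the first observation
L1<-cbind(1,X)%*%B # linear predictors for all observations
hat.y<-(exp(L1)/(1+exp(L1))>0.5)*1 # predict 1 when the estimated probability exceeds 0.5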
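The manual calculation above (linear predictor, then inverse logit) should give the same probabilities as R's built-in `plogis()` and `predict(..., type = "response")`. The sketch below checks this on simulated data, since the class dataset is not reproduced here; the sample size, number of predictors, and true coefficients are all made-up assumptions.

```r
# Simulated stand-in for the class dataset (sizes and coefficients are made up).
set.seed(1)
n <- 100
X <- matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, paste0("X", 1:3)))
Y <- rbinom(n, 1, as.vector(plogis(0.5 + X %*% c(1.5, -1, 0.5))))

fit <- glm(Y ~ X, family = "binomial")
B <- coef(fit)   # same values as summary(fit)$coefficients[, 1]

# Manual route used in the notes: linear predictor, then inverse logit.
L1 <- cbind(1, X) %*% B
p.manual <- as.vector(exp(L1) / (1 + exp(L1)))

# Built-in routes.
p.plogis  <- as.vector(plogis(L1))
p.predict <- as.vector(predict(fit, type = "response"))

# Largest disagreement between the manual and built-in probabilities.
max.diff <- max(abs(p.manual - p.predict), abs(p.manual - p.plogis))
print(max.diff)

hat.y <- (p.predict > 0.5) * 1   # same 0.5 cutoff as in the notes
```

The three routes should differ only by floating-point rounding, so `predict()` can replace the hand-written matrix product once the formula is understood.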
e<-cbind(Y,hat.y) # compare Y with hat.y side by side
e
sen<-sum(hat.y[Y==1])/sum(Y) # sensitivity: share of actual 1s predicted as 1
spc<-sum(1-hat.y[Y==0])/(length(Y)-sum(Y)) # specificity: share of actual 0s predicted as 0
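Sensitivity and specificity can also be read off a confusion table, which makes the counts behind each ratio visible. A minimal sketch with hypothetical labels and predictions (the ten values below are invented purely to illustrate the bookkeeping, not taken from the class data):

```r
# Hypothetical labels and predictions, only to illustrate the bookkeeping.
Y     <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 1)
hat.y <- c(1, 1, 0, 1, 0, 0, 1, 0, 0, 1)

# Rows: actual class, columns: predicted class.
tab <- table(actual    = factor(Y,     levels = c(0, 1)),
             predicted = factor(hat.y, levels = c(0, 1)))
print(tab)

# True positives over actual positives.
sen <- as.numeric(tab["1", "1"]) / sum(tab["1", ])
# True negatives over actual negatives.
spc <- as.numeric(tab["0", "0"]) / sum(tab["0", ])
print(c(sensitivity = sen, specificity = spc))
```

Fixing the factor levels to `c(0, 1)` keeps the table 2 x 2 even if one class never appears among the predictions.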