Yet another study of Sazae-san Janken

  • ????????????????????

素晴らしいウェブサイトを見つけたので、二番煎じでこっそりやってみました。


R。


################################
# Sazae-san Janken analysis
################################

################
# Making data
X <- read.table("sazae.txt",sep="\t",header=F)
levels(X$V3) <- c("g","c","p","na")
X <- subset(X,V3!="na")  # Remove NA

expander <- function(X,num.history){
  n <- length(X)
  sapply(1:num.history,function(k) X[k:(n-num.history+k)])
}

Y <- data.frame(expander(X$V3,10))

 

######################################
# Fundamental analysis of Janken data
######################################

##########################################
# Is the Gu-Choki-Pa share is biased?
chisq.test(table(Y$X1))
# Chi-squared test for given probabilities
#
# data: table(Y$X1) 
# X-squared = 0.8767, df = 2, p-value = 0.6451

# => We cannot say the share is biased.


################################
# Depends on last week?
chisq.test(table(Y$X1,Y$X2))
## Pearson's Chi-squared test
##
## data: table(Y$X1, Y$X2)
## X-squared = 75.6329, df = 4, p-value = 1.464e-15

# => Yes!!

#################
# Show table
table(Y$X1,Y$X2)
##     c   g   p
## c  70 139 145
## g 135  73 122
## p 149 119  70

# => Same hands does not occur successively!!



###############################
# Comparing with 2-weeks ago
table(Y$X1,Y$X3)
##     c   g   p
## c  87 136 131
## g 124  67 139
## p 144 128  66

##############################
# Comparing with 3-weeks ago
table(Y$X1,Y$X4)
#     c   g   p
# c 101 124 129
# g 129  81 120
# p 125 125  88

#############################
# Comparing with 4-weeks ago
table(Y$X1,Y$X5)
# c g p
# c 138 114 102
# g 114 110 106
# p 102 107 129

# => It seems that data at 1~3-weeks ago effective for prediciton??



###########################
# FFT analysis
signal <- as.integer(as.factor(Y$X1))
signal <- signal - mean(signal)
plot(abs(fft(signal))^2,type="l",main="Power spectrum of Janken signal")
dev2bitmap(file="janken-spectrum.jpg",taa=4,gaa=4)
which.max(abs(fft(signal))[1:500])
 
# => Very peaky at index 244, this indicates that the period is about 4...
# => Not completely random, HITONO-TSUKURISHI-MONO..



###################################
# Prediction via machine learning
###################################

#########
# SVM
library(e1071)
svm.fit <- svm(X1~X2+X3+X4,data=Y,cross=20)
summary(svm.fit)

######################################################################
## Result 
##-------------------------------------------------------------------
##
## Call:
## svm(formula = X1 ~ X2 + X3 + X4, data = Y, cross = 20)
##
## Parameters:
## SVM-Type: C-classification 
## SVM-Kernel: radial 
## cost: 1 
## gamma: 0.1428571 
##
## Number of Support Vectors: 898   ## <= Too many support vectors!!! ;(
##
## ( 287 304 307 )
##
##
## Number of Classes: 3 
##
## Levels: 
## c g p
##
## 20-fold cross-validation on training data:
##
## Total Accuracy: 53.7182   # <= This is estimated generalized error!(%)
## Single Accuracies:
## 64.70588 43.13725 60.78431 56.86275 47.05882 62.7451 54.90196 41.17647
## 54.90196 59.61538 49.01961 54.90196 60.78431 52.94118 54.90196 47.05882
## 58.82353 62.7451 37.2549 50

 

################
# random forest
library(randomForest)
cross <- 20
result <- rep(0,cross)
nsamp <- length(Y$X1)
id.train <- sample(1:cross,nsamp,T)
for(i in 1:cross){
  Y.train <- Y[id.train!=i,]
  Y.test <- Y[id.train==i,]
  rf.fit <- randomForest(X1~X2+X3+X4,data=Y.train)
  result[i] <- sum(predict(rf.fit,Y.test)==Y.test$X1)/length(Y.test$X1)
}

result

###########################################################
## 20-fold CV result
##---------------------------------------------------------
##
## [1] 0.3454545 0.5918367 0.6122449 0.5294118 0.5111111 0.4629630 0.5967742
## [8] 0.6200000 0.5454545 0.5660377 0.4864865 0.6521739 0.5967742 0.5000000
## [15] 0.4754098 0.5636364 0.5454545 0.5263158 0.5192308 0.5510204

mean(result)

####################################################
## Generalized error result 
##--------------------------------------------------
## [1] 0.5398895


################################################
# How many variables should be included?
f <- list()
f[[1]] <- X1 ~ X2
f[[2]] <- X1 ~ X2 + X3
f[[3]] <- X1 ~ X2 + X3 + X4
f[[4]] <- X1 ~ X2 + X3 + X4 + X5
f[[5]] <- X1 ~ X2 + X3 + X4 + X5 + X6
f[[6]] <- X1 ~ X2 + X3 + X4 + X5 + X6 + X7
f[[7]] <- X1 ~ X2 + X3 + X4 + X5 + X6 + X7 + X8

svm.result <- lapply(f, function(formula){
  svm.fit.tmp <- svm(formula,data=Y,cross=20)
  list(formula,summary(svm.fit.tmp))
})

sapply(svm.result,function(x){
  paste(round(x[[2]]$tot.accuracy,3),
        paste(x[[1]],collapse=""),
        sep="% : ")
})

## [1] "42.368% : X1`~`X2"
## [2] "49.119% : X1`~`X2 + X3"
## [3] "53.718% : X1`~`X2 + X3 + X4" # <= Finest accuracy!!
## [4] "50.098% : X1`~`X2 + X3 + X4 + X5"
## [5] "51.174% : X1`~`X2 + X3 + X4 + X5 + X6"
## [6] "50.587% : X1`~`X2 + X3 + X4 + X5 + X6 + X7"
## [7] "50.294% : X1`~`X2 + X3 + X4 + X5 + X6 + X7 + X8"


#############################################
# Finally, lets make a janken recipe!!
janken <- c("g","c","p")
tbl <- expand.grid(janken,janken,janken)
colnames(tbl) <- paste("X",2:4,sep="")
recipe <- cbind(tbl,predict=predict(svm.fit,tbl))
colnames(recipe) <- c("Last week","2 weeks ago","3 weeks ago","Prediction")

####################################################
##
## Janken recipe!!
## -- You can win 53.7% of the time!!
## -- Accuracy is calculated by cross-validation
## -- This is SVM result.
##
##--------------------------------------------------
##    Last week 2 weeks ago 3 weeks ago Prediction
## 1          g           g           g          p
## 2          c           g           g          p
## 3          p           g           g          g
## 4          g           c           g          p
## 5          c           c           g          c
## 6          p           c           g          c
## 7          g           p           g          g
## 8          c           p           g          c
## 9          p           p           g          g
## 10         g           g           c          p
## 11         c           g           c          p
## 12         p           g           c          c
## 13         g           c           c          p
## 14         c           c           c          c
## 15         p           c           c          c
## 16         g           p           c          p
## 17         c           p           c          c
## 18         p           p           c          c
## 19         g           g           p          g
## 20         c           g           p          p
## 21         p           g           p          g
## 22         g           c           p          g
## 23         c           c           p          c
## 24         p           c           p          c
## 25         g           p           p          g
## 26         c           p           p          c
## 27         p           p           p          g
###################################################


怪しげなジャンケンスペクトル。244のところにピークがあるから、だいたい周期は4というところだろうか。これは変数選択の結果ともあっている。周期が4というところがいかにも人が「ランダムっぽく」設定している、ということかもしれない。

ところで周期性の検定ってどうやるんだろう?

はてなブックマーク - Yet another study of Sazae-san Janken
Pocket