
Super Simple Stock Price Prediction in Python with Machine Learning (Using Ensemble Voting)

Super simple machine learning in Python: use an ensemble (voting) classifier to predict whether a stock closes up or down the next day.

1. Install the tools

$ pip install scikit-learn pandas-datareader rgf-python xgboost
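
To confirm all four packages import cleanly, an optional one-liner check (not part of the original steps):

$ python -c "import sklearn, pandas_datareader, rgf, xgboost; print('OK')"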

2. Create the file

pred.py

import pandas_datareader as pdr
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.ensemble import VotingClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
import xgboost as xgb
from sklearn.naive_bayes import BernoulliNB
from rgf.sklearn import RGFClassifier
from sklearn.neural_network import MLPClassifier

df = pdr.get_data_yahoo("AAPL", "2010-11-01", "2020-11-01")
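# Build two features from the raw OHLCV data: a 2-day SMA of the close
# and a Force Index feature (here computed as close * volume).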
df["Diff"] = df.Close.diff()
df["SMA_2"] = df.Close.rolling(2).mean()
df["Force_Index"] = df["Close"] * df["Volume"]
df["y"] = df["Diff"].apply(lambda x: 1 if x > 0 else 0).shift(-1)
df = df.drop(
   ["Open", "High", "Low", "Close", "Volume", "Diff", "Adj Close"],
   axis=1,
).dropna()
# print(df)
X = df.drop(["y"], axis=1).values
y = df["y"].values
X_train, X_test, y_train, y_test = train_test_split(
   X,
   y,
   test_size=0.2,
   shuffle=False,
)
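# Six heterogeneous base models; the scale-sensitive ones (SVC, MLP) are wrapped with StandardScaler.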
estimators = [
   ('xgb', xgb.XGBClassifier()),
   ('lr', LogisticRegression()),
   ('nb', BernoulliNB()),
   ('rgf', RGFClassifier()),
   ('svm', make_pipeline(StandardScaler(), SVC(gamma="auto"))),
   ('mlp', make_pipeline(StandardScaler(), MLPClassifier(random_state=0, shuffle=False))),
]
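# Hard (majority) voting is the default: the class predicted by most of the six models wins.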
clf = VotingClassifier(estimators)
clf.fit(
   X_train,
   y_train,
)
y_pred = clf.predict(X_test)
print(accuracy_score(y_test, y_pred))

3. Run

$ python pred.py

0.5416666666666666
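
The voting ensemble lands at around 54%. To also check how each of the six members scores on its own, note that VotingClassifier keeps the fitted copies in named_estimators_ after fit. A minimal sketch you could append to pred.py (it reuses clf, X_test, y_test, and accuracy_score from above):

# Accuracy of each fitted base estimator on the same test set.
for name, est in clf.named_estimators_.items():
    print(name, accuracy_score(y_test, est.predict(X_test)))

VotingClassifier also supports voting="soft" (averaging predicted probabilities instead of counting votes), but every member then needs predict_proba, so the SVC would have to be built with probability=True.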

That's it. Super simple!

4. Results


Running the same data and features through XGBoost, DNN, LSTM, GRU, RNN, LogisticRegression, k-nearest neighbor, RandomForest, BernoulliNB, SVM, RGF, MLP, Bagging, Voting, Stacking, LightGBM, TCN, and HGBC, the MLP came out on top:

XGBoost            0.5119047619047619
DNN                0.5496031746031746
LSTM               0.5178571428571429
GRU                0.5138888888888888
RNN                0.5376984126984127
LogisticRegression 0.5496031746031746
k-nearest neighbor 0.5198412698412699
RandomForest       0.49603174603174605
BernoulliNB        0.5496031746031746
SVM                0.5396825396825397
RGF                0.5158730158730159
MLP                0.5694444444444444
Bagging            0.5297619047619048
Voting             0.5416666666666666
Stacking           0.5218253968253969
LightGBM           0.5456349206349206
TCN                0.5198412698412699
HGBC               0.5
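
For reference, running the winning MLP on its own is just the 'mlp' pipeline from pred.py without the ensemble. A minimal sketch, reusing the imports, the X_train/X_test/y_train/y_test split, and accuracy_score from pred.py:

# Same pipeline as the 'mlp' voting member, trained and scored alone.
mlp = make_pipeline(StandardScaler(), MLPClassifier(random_state=0, shuffle=False))
mlp.fit(X_train, y_train)
print(accuracy_score(y_test, mlp.predict(X_test)))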
