Table of Contents
- About Time Series Analysis
- Terms Related to Time Series
- Predicting Stock Price
About Time Series Analysis
Time series analysis is a set of statistical methods for working with time series data and for finding trends in data over time.
So what is time series data? It is a sequence of observations recorded at successive points in time, usually at regular intervals.
In its simplest form it tracks a single variable, with one value recorded at each point in time.
Data collected for more than one variable at a single point in time is known as cross-sectional data.
Terms Related to Time Series
Stationary data: a series whose mean remains constant over time (stricter definitions also require a constant variance).
Non-stationary data: a series whose mean changes over time, for example because it follows a trend.
Some examples of time series data are temperature readings, stock prices, and house prices over time.
Differencing: replacing each value with its change from the previous value. It is used to de-trend the data and make the series stationary.
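To make the effect of differencing concrete, here is a minimal sketch on a made-up trending series (the data and the helper `half_means` are assumptions for illustration, not part of the stock dataset). It compares the mean of the first and second half of the series before and after differencing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy series: a linear trend plus small noise (not the article's data)
rng = np.random.default_rng(0)
dates = pd.date_range("2021-01-01", periods=100, freq="D")
series = pd.Series(0.5 * np.arange(100) + rng.normal(0, 0.1, 100), index=dates)

def half_means(s):
    """Return the mean of the first half and of the second half of a series."""
    mid = len(s) // 2
    return s.iloc[:mid].mean(), s.iloc[mid:].mean()

# First-order differencing: y'_t = y_t - y_(t-1); the first value is NaN and is dropped
diffed = series.diff().dropna()

raw_first, raw_second = half_means(series)
diff_first, diff_second = half_means(diffed)
print(f"raw means:         {raw_first:.2f} vs {raw_second:.2f}")
print(f"differenced means: {diff_first:.2f} vs {diff_second:.2f}")
```

The two halves of the raw series have clearly different means because of the trend, while the halves of the differenced series have nearly the same mean.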
Exponential smoothing: a method for predicting the next period's value from historical data. It takes a weighted average of past observations, with weights that decay exponentially for older observations, so that random fluctuations in individual observations tend to cancel out. It is used for short-term forecasting.
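As a rough illustration, simple exponential smoothing can be sketched in a few lines. The prices and the smoothing factor `alpha` below are assumptions chosen for the example, and `simple_exp_smoothing` is a hypothetical helper, not a library function.

```python
import pandas as pd

def simple_exp_smoothing(values, alpha):
    """Simple exponential smoothing: s_t = alpha * y_t + (1 - alpha) * s_(t-1), with s_0 = y_0."""
    smoothed = [values[0]]
    for y in values[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed

# Hypothetical closing prices, for illustration only
prices = [100.0, 102.0, 101.0, 105.0, 107.0]
alpha = 0.5  # assumed smoothing factor; values closer to 1 weight recent data more

smoothed = simple_exp_smoothing(prices, alpha)
next_forecast = smoothed[-1]  # the one-step-ahead forecast is the last smoothed level

# pandas applies the same recursion when adjust=False
pandas_smoothed = pd.Series(prices).ewm(alpha=alpha, adjust=False).mean().tolist()
print(smoothed)
print(next_forecast)
```

The forecast for the next period is simply the last smoothed value, which is why the method suits short-term prediction better than long horizons.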
LSTM stands for Long Short-Term Memory. It is a type of Recurrent Neural Network (RNN) used in deep learning; unlike a feed-forward network, it has feedback connections, which make it capable of learning order dependencies in sequence prediction problems.
It takes three-dimensional input (samples, time steps, features). It is used for classification, image processing, video processing, speech recognition, and forecasting.
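The three input dimensions are easier to see on a small array. The sketch below uses made-up numbers and NumPy only; it shows how a 2-D array of windows gains a third axis for the single feature.

```python
import numpy as np

# Hypothetical batch: 4 samples, each a window of 3 consecutive (scaled) prices
windows = np.array([
    [0.10, 0.20, 0.30],
    [0.20, 0.30, 0.40],
    [0.30, 0.40, 0.50],
    [0.40, 0.50, 0.60],
])
print(windows.shape)  # (4, 3): samples x time steps

# Add a trailing axis for the single feature (the price itself)
lstm_input = windows[..., np.newaxis]
print(lstm_input.shape)  # (4, 3, 1): samples x time steps x features
```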
Predicting Stock Price
We are going to predict the stock price using a multi-layer LSTM model.
The libraries needed are Pandas, NumPy, Scikit-Learn, Plotly, TensorFlow and Keras.
    # Importing libraries
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import plotly.io as plt_io
    import plotly.express as px
    import plotly.figure_factory as ff
    from tensorflow import keras
    from sklearn.preprocessing import MinMaxScaler

    # Read the data file, parsing the Date column and using it as the index
    stock_price = pd.read_csv('Stock data.csv', parse_dates=['Date'])
    stock_price = stock_price.set_index('Date')
    stock_price.head(5)

Output:

                        Open          High  ...     Adj Close   Volume
    Date                                    ...
    2020-03-03  38480.890625  38754.238281  ...  38623.699219  10600.0
    2020-03-04  38715.718750  38791.699219  ...  38409.480469  15300.0
    2020-03-05  38604.250000  38887.800781  ...  38470.609375  13500.0
    2020-03-06  37613.960938  37747.070313  ...  37576.621094  19000.0
    2020-03-09  36950.199219  36950.199219  ...  35634.949219  18800.0

    # Retaining only the closing price
    stock_price = stock_price[['Close']]
    stock_price.head(5)

Output:

                       Close
    Date
    2020-03-03  38623.699219
    2020-03-04  38409.480469
    2020-03-05  38470.609375
    2020-03-06  37576.621094
    2020-03-09  35634.949219
After loading the data, the next step is to check whether it contains NaN values; then we plot the closing price.
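    # Show the rows that contain at least one NaN value
    # (the output below lists all columns, so it was produced on the full DataFrame)
    stock_price[stock_price.isnull().any(axis=1)]

Output:

                Open  High  Low  Close  Adj Close  Volume
    Date
    2020-11-14   NaN   NaN  NaN    NaN        NaN     NaN
    2021-01-01   NaN   NaN  NaN    NaN        NaN     NaN

    # Drop the rows with NaN values
    stock_price = stock_price.dropna()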
We got two rows of null values, on 14-11-2020 and 01-01-2021. The market was closed on these days for national holidays, so these are not really missing values and we can drop them.
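The same check can be tried on a tiny, made-up DataFrame. The sketch below is an illustration only: the values are hypothetical, and `how="all"` is a deliberately stricter variant of the article's `dropna()` that removes a row only when every column is empty.

```python
import numpy as np
import pandas as pd

# Hypothetical prices with one all-NaN row standing in for a market holiday
df = pd.DataFrame(
    {"Open": [100.0, np.nan, 102.0], "Close": [101.0, np.nan, 103.0]},
    index=pd.to_datetime(["2020-11-13", "2020-11-14", "2020-11-16"]),
)

# Rows where at least one column is NaN
rows_with_nan = df[df.isnull().any(axis=1)]
print(rows_with_nan.index.tolist())

# Drop only rows in which every column is NaN; partially filled rows are kept for inspection
cleaned = df.dropna(how="all")
print(len(df), "->", len(cleaned))
```

Keeping partially filled rows makes it easier to tell a closed market (every field empty) from a genuine data gap (only some fields empty).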
Let's visualize the stock's price trend. We are using the Plotly module, so we change the default renderer from svg to browser; otherwise the plots may not be visible. Run the following code:
    # Render Plotly figures in the browser
    plt_io.renderers.default = 'browser'

    # Line chart of the closing price over time
    stock_chart = px.line(x=stock_price.index,
                          y=stock_price['Close'],
                          title="Closing Price Of Stock")
    stock_chart.show()
Figure 1 : Interactive plot of Stock Data using plotly rendered on a browser
As we can see, the data is not stationary: it follows a trend. An LSTM model can work well for this kind of data.
The next step is to split the data into train and test sets so that we can train our model.
    # Use the first 70% of the rows for training and the rest for testing
    train_rec = int(len(stock_price) * 0.70)

    train_df = stock_price[0:train_rec]
    print(train_df.shape)   # (174, 1)

    test_df = stock_price[train_rec:]
    print(test_df.shape)    # (75, 1)
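Note that the split is chronological rather than random: the model is trained on older prices and tested on newer ones. A minimal sketch on a hypothetical 10-day series (not the article's data) makes this explicit.

```python
import pandas as pd

# Hypothetical 10-day price series indexed by date
dates = pd.date_range("2021-01-01", periods=10, freq="D")
prices = pd.DataFrame({"Close": range(100, 110)}, index=dates)

split = int(len(prices) * 0.70)   # same 70/30 ratio as in the article
train_df = prices.iloc[:split]    # earliest rows
test_df = prices.iloc[split:]     # most recent rows

# No shuffling: every training date precedes every test date
print(train_df.index.max(), "<", test_df.index.min())
```

Shuffling before splitting would let the model see prices from the future of the test period, which usually makes the results look better than they are.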
Because the prices span a wide range, let's scale them to the interval [0, 1]. The scaler is fitted on the training set only and then applied to the test set.
    # Fit the scaler on the training data and apply it to both sets
    sc = MinMaxScaler(feature_range=(0, 1))
    train_scaled = sc.fit_transform(train_df)
    test_scaled = sc.transform(test_df)

    def dataset(data, n_features):
        """Build (window, next value) pairs from a single-column 2-D array."""
        X, Y = [], []
        for i in range(len(data) - n_features - 1):
            a = data[i:(i + n_features), 0]
            X.append(a)
            Y.append(data[i + n_features, 0])
        return np.array(X), np.array(Y)

    n_features = 3  # It takes the previous 3 days' stock price to predict the next day's price
    X_train, y_train = dataset(train_scaled, n_features)
    X_test, y_test = dataset(test_scaled, n_features)
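To see what the windowing function produces, it helps to run it on a tiny array. The sketch below repeats the same windowing logic on hypothetical values 1.0 to 7.0 and prints each window with its target.

```python
import numpy as np

def dataset(data, n_features):
    """Same windowing logic as the article's function."""
    X, Y = [], []
    for i in range(len(data) - n_features - 1):
        X.append(data[i:(i + n_features), 0])
        Y.append(data[i + n_features, 0])
    return np.array(X), np.array(Y)

# Hypothetical values as a single-column 2-D array, the shape MinMaxScaler returns
toy = np.arange(1, 8, dtype=float).reshape(-1, 1)   # values 1.0 .. 7.0

X, Y = dataset(toy, n_features=3)
for window, target in zip(X, Y):
    print(window, "->", target)
```

With seven values this yields three pairs. The `- 1` in the loop range leaves out the last possible pair (`[4, 5, 6] -> 7`), which is consistent with 174 training rows producing 170 samples.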
We are going to use an LSTM, which expects 3-D input of the form (samples, time steps, features), so we have to reshape the data into 3-D.
    # Reshape to (samples, time steps, features) with a single feature
    X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
    X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)

    print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
    # (170, 3, 1) (170,) (71, 3, 1) (71,)
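Now we will create the model: three stacked LSTM layers with 150 units each, dropout of 0.3 after the first two, and a dense output layer with a single unit for predicting the stock price. We are using the mean squared error loss function and the Adam optimizer.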
    # Create the model
    inputs = keras.layers.Input(shape=(X_train.shape[1], X_train.shape[2]))
    x = keras.layers.LSTM(150, return_sequences=True)(inputs)
    x = keras.layers.Dropout(0.3)(x)
    x = keras.layers.LSTM(150, return_sequences=True)(x)
    x = keras.layers.Dropout(0.3)(x)
    x = keras.layers.LSTM(150)(x)
    outputs = keras.layers.Dense(1, activation='linear')(x)

    model = keras.Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer='adam', loss="mean_squared_error",
                  metrics=['mean_squared_error'])
    model.summary()

Output:

    Model: "model"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #
    =================================================================
    input_1 (InputLayer)         [(None, 3, 1)]            0
    _________________________________________________________________
    lstm (LSTM)                  (None, 3, 150)            91200
    _________________________________________________________________
    dropout (Dropout)            (None, 3, 150)            0
    _________________________________________________________________
    lstm_1 (LSTM)                (None, 3, 150)            180600
    _________________________________________________________________
    dropout_1 (Dropout)          (None, 3, 150)            0
    _________________________________________________________________
    lstm_2 (LSTM)                (None, 150)               180600
    _________________________________________________________________
    dense (Dense)                (None, 1)                 151
    =================================================================
    Total params: 452,551
    Trainable params: 452,551
    Non-trainable params: 0

    # Train the model
    history = model.fit(X_train, y_train, epochs=25, batch_size=60,
                        validation_split=0.2)

    # Predict on the test set and convert back to the original price scale
    pred = model.predict(X_test)
    pred = sc.inverse_transform(pred)

    test_actual = y_test.reshape(X_test.shape[0], 1)
    test_actual = sc.inverse_transform(test_actual)

    # Append the predicted values to the list
    test_predicted = []
    for i in pred:
        test_predicted.append(i[0])

    df_predicted = pd.DataFrame()
    df_predicted['Prediction'] = test_predicted
    df_predicted['Actual'] = test_actual
    df_predicted.head(5)

Output:

         Prediction        Actual
    0  43248.277344  44180.050781
    1  43497.429688  43599.960938
    2  43822.769531  43882.250000
    3  43804.261719  44077.148438
    4  43559.601562  44523.019531
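The parameter counts in the model summary can be reproduced by hand, which is a useful check that the layers are wired as intended. The sketch below assumes the standard LSTM layout (four gates, each with input weights, recurrent weights and a bias); `lstm_params` and `dense_params` are hypothetical helpers, not Keras functions.

```python
def lstm_params(units, input_dim):
    """Trainable parameters of one LSTM layer: 4 gates, each with
    input weights, recurrent weights, and a bias vector."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    """Trainable parameters of a dense layer: weights plus biases."""
    return units * input_dim + units

layer_params = [
    lstm_params(150, 1),     # first LSTM sees 1 feature per time step
    lstm_params(150, 150),   # second LSTM sees the 150 outputs of the first
    lstm_params(150, 150),   # third LSTM
    dense_params(1, 150),    # output layer
]
print(layer_params, sum(layer_params))
```

The dropout layers add no parameters, so the four numbers account for the full total of 452,551.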
We now have the predicted price and the actual price in a single dataframe. Let's compare them by visualizing them.
    def interactive_plot(df, title):
        """Plot every column of df as a separate line on one chart."""
        stock_chart = px.line(title=title)
        for i in df.columns[0:]:
            stock_chart.add_scatter(x=df.index, y=df[i], name=i)
        stock_chart.show()

    interactive_plot(df_predicted, "Stock Price Predicted")
Figure 2 : Plot of Actual vs Predicted Stock Data using plotly
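A plot gives a visual impression; a few error metrics put a number on it. The sketch below is an illustration that uses only the five prediction/actual pairs printed earlier by `df_predicted.head(5)`; the choice of MAE, RMSE and mean signed error as metrics is an assumption, not something the original workflow computes.

```python
import numpy as np

# The five prediction/actual pairs printed by df_predicted.head(5)
predicted = np.array([43248.277344, 43497.429688, 43822.769531, 43804.261719, 43559.601562])
actual = np.array([44180.050781, 43599.960938, 43882.250000, 44077.148438, 44523.019531])

errors = predicted - actual
mae = np.mean(np.abs(errors))          # mean absolute error
rmse = np.sqrt(np.mean(errors ** 2))   # root mean squared error
bias = np.mean(errors)                 # sign shows whether predictions run above (+) or below (-) actual

print(f"MAE:  {mae:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"Bias: {bias:.2f}")
```

Five rows are only a small sample; applying the same calculations to the `Prediction` and `Actual` columns of the full `df_predicted` would give the figures for the whole test set.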
We can see that the model tracked the recent data points well, although it predicted slightly higher values than the actual stock price. In this article we learned about time series analysis and the LSTM model, saw how to build an LSTM model for stock price prediction, and finally compared the predicted stock price with the actual price.