Table of Contents

  1. About Time Series Analysis
  2. Terms Related to Time Series
  3. Predicting Stock Price Using LSTM Model
    1. Importing necessary Libraries and Dataset
    2. Checking for NaN values and trend
    3. Model Building
    4. Visualization
  4. Summary

 

About Time Series Analysis 

Time series analysis is a set of statistical methods for working with time series data, in particular for finding trends in data with respect to time.

What is time series data? It is data recorded sequentially at successive points in time or over successive intervals.

In its simplest form it consists of a single variable observed at various times.

If the data is instead collected for more than one variable at the same point in time, it is known as cross-sectional data.

 

Terms Related to Time Series

Stationary Data : Data whose mean value remains constant over time is called stationary data.

Non-Stationary Data : Data whose mean value changes over time is called non-stationary data.

Some examples of time series data are temperature values, stock prices, and house prices over time.

Differencing : Replacing each value with its change from the previous value. It is used to make a series stationary, that is, to de-trend the data.
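
As a rough illustration of differencing, the sketch below builds a small made-up series with a steady upward trend and applies pandas `Series.diff()`. The values and the names `trend_series` and `differenced` are hypothetical and are not taken from the stock data used later.

```python
import pandas as pd

# Hypothetical series with a steady upward trend (not the article's stock data)
trend_series = pd.Series([100.0, 102.0, 104.0, 106.0, 108.0])

# First-order differencing: the value at time t minus the value at time t-1.
# The first element has no predecessor, so it becomes NaN and is dropped.
differenced = trend_series.diff().dropna()

print(differenced.tolist())  # [2.0, 2.0, 2.0, 2.0]
```

The trend disappears after differencing: every step is the same constant change, so the mean of the differenced series no longer drifts over time.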

Exponential smoothing : It predicts the next period's value from historical data by taking a weighted average of past observations in which the weights decay exponentially with age, so that random fluctuations in individual observations tend to cancel each other out. It is used for short-term forecasting.
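
Simple exponential smoothing can be sketched with pandas as follows. The series values and the smoothing factor `alpha = 0.5` are made up for illustration. With `adjust=False`, `ewm(...).mean()` computes the recursion s_t = alpha * x_t + (1 - alpha) * s_(t-1), starting from s_0 = x_0, and the last smoothed value serves as the forecast for the next period.

```python
import pandas as pd

# Hypothetical observations (not the article's stock data)
prices = pd.Series([10.0, 12.0, 11.0, 13.0])

# Assumed smoothing factor: larger values give more weight to recent observations
alpha = 0.5

# adjust=False selects the recursive form s_t = alpha * x_t + (1 - alpha) * s_(t-1)
smoothed = prices.ewm(alpha=alpha, adjust=False).mean()

# In simple exponential smoothing the one-step-ahead forecast is the last smoothed value
next_forecast = smoothed.iloc[-1]

print(smoothed.tolist())  # [10.0, 11.0, 11.0, 12.0]
print(next_forecast)      # 12.0
```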

 

Predicting Stock Price Using LSTM Model

LSTM stands for Long Short-Term Memory. It is a type of Recurrent Neural Network (RNN) used in deep learning. Unlike a feed-forward network, it has feedback connections, which make it capable of learning order dependencies in sequence prediction problems.

It takes three-dimensional input (samples, time steps, features). It is used for classification, image processing, video processing, speech recognition and forecasting.

We are going to predict the stock price using a multi-layer LSTM model.

 

Importing necessary Libraries and Dataset

The libraries needed are Pandas, NumPy, Matplotlib, Scikit-Learn, Plotly, TensorFlow and Keras.

    # Importing Libraries
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import plotly.io as plt_io
    import plotly.express as px
    import plotly.figure_factory as ff
    from tensorflow import keras
    from sklearn.preprocessing import MinMaxScaler
    
    # Read the data file
    stock_price = pd.read_csv('Stock data.csv', parse_dates=['Date'])
    stock_price = stock_price.set_index('Date')
    stock_price.head(5)

                    Open          High  ...     Adj Close   Volume
Date                                    ...                       
2020-03-03  38480.890625  38754.238281  ...  38623.699219  10600.0
2020-03-04  38715.718750  38791.699219  ...  38409.480469  15300.0
2020-03-05  38604.250000  38887.800781  ...  38470.609375  13500.0
2020-03-06  37613.960938  37747.070313  ...  37576.621094  19000.0
2020-03-09  36950.199219  36950.199219  ...  35634.949219  18800.0

    # Retaining only the closing price
    stock_price = stock_price[['Close']]
    stock_price.head(5)

                   Close
Date                    
2020-03-03  38623.699219
2020-03-04  38409.480469
2020-03-05  38470.609375
2020-03-06  37576.621094
2020-03-09  35634.949219

 

Checking for NaN values and trend

After loading the data, the next step is to check whether it contains NaN values. We then plot the closing price on a graph.

    stock_price[stock_price.isnull().any(axis=1)]

            Open  High  Low  Close  Adj Close  Volume
Date                                                 
2020-11-14   NaN   NaN  NaN    NaN        NaN     NaN
2021-01-01   NaN   NaN  NaN    NaN        NaN     NaN

    stock_price=stock_price.dropna()

We got 2 null rows, on 14-11-2020 and 01-01-2021. The market was closed on these days because of national holidays, so they are not really missing values and we can drop them.

Let’s visualize the price trend of the stock. We are using the plotly module, so we change the default renderer from svg to browser; without this, the plots may not be visible. Run the following code:

    plt_io.renderers.default='browser'
    stock_chart=px.line(x = stock_price.index, y = stock_price['Close'], title = "Closing Price Of Stock")
    stock_chart.show()

Figure 1 : Interactive plot of Stock Data using plotly rendered on a browser

 

As we can see, the data is not stationary: it follows a trend. An LSTM model is a reasonable choice for predictions on this kind of data.

The next step is to split the data into train and test sets to train our model.

    train_rec = int(len(stock_price) * 0.70)
    train_df = stock_price[0:train_rec]
    print(train_df.shape)

(174, 1)

    test_df = stock_price[train_rec:]
    print(test_df.shape)

(75, 1)

Our data has a very wide range of prices, so let’s scale it down to the range 0 to 1.

    sc = MinMaxScaler(feature_range = (0, 1))
    train_scaled = sc.fit_transform(train_df)
    test_scaled = sc.transform(test_df)
    
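    def dataset(data, n_features):
        """Build sliding windows: n_features consecutive values -> the next value."""
        X, Y = [], []
        # Note: the extra "- 1" leaves the final possible window unused
        for i in range(len(data) - n_features - 1):
            X.append(data[i:(i + n_features), 0])  # input window
            Y.append(data[i + n_features, 0])      # value right after the window
        return np.array(X), np.array(Y)
    
    n_features = 3  # Use the previous 3 days' prices to predict the next day's price
    X_train, y_train = dataset(train_scaled, n_features)
    X_test, y_test = dataset(test_scaled, n_features)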
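
To see what the windowing helper produces, the sketch below applies the same logic to a toy column holding the numbers 0 to 9. The toy data and the name `make_windows` are illustrative assumptions; the loop bounds mirror the `dataset` function above.

```python
import numpy as np

def make_windows(data, n_features):
    """Same sliding-window logic as the article's dataset() helper."""
    X, Y = [], []
    for i in range(len(data) - n_features - 1):
        X.append(data[i:(i + n_features), 0])  # n_features consecutive values
        Y.append(data[i + n_features, 0])      # the value that follows them
    return np.array(X), np.array(Y)

# Toy single-column data: 0, 1, ..., 9 (stands in for the scaled closing prices)
toy = np.arange(10, dtype=float).reshape(-1, 1)

X, Y = make_windows(toy, n_features=3)

print(X.shape, Y.shape)  # (6, 3) (6,)
print(X[0], Y[0])        # [0. 1. 2.] 3.0
print(X[-1], Y[-1])      # [5. 6. 7.] 8.0
```

Ten rows with a window of 3 give 6 samples, which matches the pattern seen later in the article, where 174 training rows yield 170 samples. The last value (9) is never used as a target because of the `- 1` in the loop bound.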

Model Building

We are going to use an LSTM network, which expects 3-D input of shape (samples, time steps, features), so we have to reshape the data into 3-D.

    X_train = X_train.reshape(X_train.shape[0] ,X_train.shape[1],1)
    X_test = X_test.reshape(X_test.shape[0] ,X_test.shape[1],1)
    print(X_train.shape , y_train.shape , X_test.shape , y_test.shape)

(170, 3, 1) (170,) (71, 3, 1) (71,)

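Now we will create a model with three stacked LSTM layers of 150 units each, with a dropout of 0.3 after each of the first two, followed by a one-unit dense output layer that predicts the stock price. We use the mean squared error loss function and the Adam optimizer.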

    # Create the model
    inputs = keras.layers.Input(shape=(X_train.shape[1], X_train.shape[2]))
    x = keras.layers.LSTM(150, return_sequences= True)(inputs)
    x = keras.layers.Dropout(0.3)(x)
    x = keras.layers.LSTM(150, return_sequences=True)(x)
    x = keras.layers.Dropout(0.3)(x)
    x = keras.layers.LSTM(150)(x)
    outputs = keras.layers.Dense(1, activation='linear')(x)

    model = keras.Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer='adam', loss="mean_squared_error", metrics=['mean_squared_error'])
    model.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 3, 1)]            0         
_________________________________________________________________
lstm (LSTM)                  (None, 3, 150)            91200     
_________________________________________________________________
dropout (Dropout)            (None, 3, 150)            0         
_________________________________________________________________
lstm_1 (LSTM)                (None, 3, 150)            180600    
_________________________________________________________________
dropout_1 (Dropout)          (None, 3, 150)            0         
_________________________________________________________________
lstm_2 (LSTM)                (None, 150)               180600    
_________________________________________________________________
dense (Dense)                (None, 1)                 151       
=================================================================
Total params: 452,551
Trainable params: 452,551
Non-trainable params: 0
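
The parameter counts in the summary can be checked by hand. An LSTM layer has four gates, each with weights over the input and the previous hidden state plus a bias, which gives 4 * (units * (input_dim + units) + units) parameters. The helper names below are hypothetical; this is a small arithmetic check, not part of the Keras API.

```python
def lstm_params(input_dim, units):
    """Parameters of a standard LSTM layer: 4 gates, each with input weights,
    recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    """Parameters of a dense layer: one weight per input per unit, plus biases."""
    return input_dim * units + units

first = lstm_params(input_dim=1, units=150)     # input has 1 feature per time step
second = lstm_params(input_dim=150, units=150)  # receives the 150 outputs of the first LSTM
third = lstm_params(input_dim=150, units=150)
output = dense_params(input_dim=150, units=1)

total = first + second + third + output
print(first, second, third, output, total)  # 91200 180600 180600 151 452551
```

Dropout layers add no parameters, which is why they show 0 in the summary.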

    # Train the model; validation_split holds out the last 20% of the training samples
    history = model.fit(X_train, y_train, epochs=25, batch_size=60, validation_split=0.2)

    # Predict on the test windows and map the results back to the original price scale
    pred = sc.inverse_transform(model.predict(X_test))
    test_actual = sc.inverse_transform(y_test.reshape(-1, 1))

    # Collect predicted and actual prices side by side
    df_predicted = pd.DataFrame({
        'Prediction': pred.ravel(),
        'Actual': test_actual.ravel(),
    })
    df_predicted.head(5)

     Prediction        Actual
0  43248.277344  44180.050781
1  43497.429688  43599.960938
2  43822.769531  43882.250000
3  43804.261719  44077.148438
4  43559.601562  44523.019531
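
The comparison can also be summarized numerically. The sketch below computes the mean absolute error (MAE) and root mean squared error (RMSE) with NumPy on the five rows printed above, so the figures describe only that sample; applying the same lines to the full `df_predicted` frame would cover the whole test set. The name `sample` is hypothetical.

```python
import numpy as np
import pandas as pd

# The five rows shown in the output above
sample = pd.DataFrame({
    'Prediction': [43248.277344, 43497.429688, 43822.769531, 43804.261719, 43559.601562],
    'Actual': [44180.050781, 43599.960938, 43882.250000, 44077.148438, 44523.019531],
})

# Positive error means the model predicted below the actual price
errors = sample['Actual'] - sample['Prediction']

mae = errors.abs().mean()              # average size of the error
rmse = np.sqrt((errors ** 2).mean())   # penalizes large errors more heavily

print(f"MAE: {mae:.2f}, RMSE: {rmse:.2f}")
```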

Visualization

We now have the predicted and actual prices in a single dataframe; let’s compare them by plotting both.

    def interactive_plot(df, title):
      # Draw one line per column of df on a shared figure
      stock_chart = px.line(title = title)
      for i in df.columns:
        stock_chart.add_scatter(x = df.index, y = df[i], name = i)
      stock_chart.show()

    interactive_plot(df_predicted, "Stock Price Predicted")

Figure 2 : Plot of Actual vs Predicted Stock Data using plotly

 

Summary

We can see that our model worked well for the most recent dates, although there it predicted slightly higher values than the actual stock price. In this article we have learnt about Time Series Analysis and the LSTM model, and seen how to build an LSTM model for stock price prediction. Finally, we compared the predicted stock price with the actual price. Do read this interesting article on MNIST digit classification using Logistic Regression.

 

 

About the Authors:

Sachin Kumar Gupta

Sachin is a Mechanical Engineer and data science enthusiast. He loves to find trends in data and extract useful information from it. He has executed projects on Machine Learning and Deep Learning using Python.

 

Mohan Rai

Mohan Rai is an alumnus of IIM Bangalore. He completed his MBA and his Bachelor of Science (Statistics) at the University of Pune. He is a Certified Data Scientist by EMC. Mohan is a learner and has been enriching his experience throughout his career by exposing himself to several opportunities in the capacity of an Advisor, Consultant and Business Owner. He has more than 18 years’ experience in the field of Analytics and has worked as an Analytics SME in domains including IT, Banking, Construction, Real Estate, Automobile, Component Manufacturing and Retail. His functional scope covers areas including Training, Research, Sales, Market Research, Sales Planning, and Market Strategy.