Axis Bank Stock Data Analysis Project Blog Post
Data Analysis of axis bank stock market time-series dataset.
- AxisBank Stock Data Analysis
- Downloading the Dataset
- Data Preparation and Cleaning
- Summary of the operations done till now:
- Exploratory Analysis and Visualization
- Summary of above exploratory Analysis:
- Exploratory Analysis of stock quotes year-wise for Axis Bank:
- Asking and Answering Questions
- Inferences and Conclusion
- References and Future Work
AxisBank Stock Data Analysis
The project is based on the dataset I obtained from kaggle. The Analysis I am performing is on the 'AXISBANK' stock market data from 2019-2021.AXISBANK is one of the stocks listed in NIFTY50 index. The NIFTY 50 is a benchmark Indian stock market index that represents the weighted average of 50 of the largest Indian companies listed on the National Stock Exchange. It is one of the two main stock indices used in India, the other being the BSE SENSEX. The Analysis is performed on the stock quote data of "AXIS BANK" from the dataset of NIFTY50 Stock Market data obtained from kaggle repo.
Axis Bank Limited, formerly known as UTI Bank (1993–2007), is an Indian banking and financial services company headquartered in Mumbai, Maharashtra.It sells financial services to large and mid-size companies, SMEs and retail businesses.
The bank was founded on 3 December 1993 as UTI Bank, opening its registered office in Ahmedabad and a corporate office in Mumbai. The bank was promoted jointly by the Administrator of the Unit Trust of India (UTI), Life Insurance Corporation of India (LIC), General Insurance Corporation, National Insurance Company, The New India Assurance Company, The Oriental Insurance Corporation and United India Insurance Company. The first branch was inaugurated on 2 April 1994 in Ahmedabad by Manmohan Singh, then finance minister of India \ I chose this dataset because of the importance of NIFTY50 listed stocks on Indian economy. In most ways the NIFTY50 presents how well the Indian capital markets are doing.
Downloading the Dataset
In this section of the Jupyter notebook we are going to download an interesting data set from kaggle dataset repositories. We are using python library called OpenDatasets for downloading from kaggle. While downloading we are asked for kaggle user id and API token key for accessing the dataset from kaggle. Kaggle is a platform used for obtaining datasets and various other datascience tasks.
!pip install jovian opendatasets --upgrade --quiet
Let's begin by downloading the data, and listing the files within the dataset.
dataset_url = 'https://www.kaggle.com/rohanrao/nifty50-stock-market-data'
import opendatasets as od
od.download(dataset_url)
The dataset has been downloaded and extracted.
data_dir = './nifty50-stock-market-data'
import os
os.listdir(data_dir)
Let us save and upload our work to Jovian before continuing.
project_name = "nifty50-stockmarket-data" # change this (use lowercase letters and hyphens only)
Data Preparation and Cleaning
Data Preparation and Cleansing constitutes the first part of the Data Analysis project for any dataset. We do this process inorder to obtain retain valuable data from the data frame, one that is relevant for our analysis. The process is also used to remove erroneous values from the dataset(ex. NaN to 0). After the preparation of data and cleansing, the data can be used for analysis.</br> In our dataframe we have a lot of non-releavant information, so we are going to drop few columns in the dataframe and fix some of the elements in data frame for better analysis. We are also going to change the Date column into DateTime format which can be further used to group the data by months/year.
import pandas as pd
import numpy as np
axis_df= pd.read_csv(data_dir + "/AXISBANK.csv")
axis_df.info()
axis_df.describe()
axis_df
axis_df['Symbol'] = np.where(axis_df['Symbol'] == 'UTIBANK', 'AXISBANK', axis_df['Symbol'])
axis_df
axis_new_df = axis_df.drop(['Last','Series', 'VWAP', 'Trades','Deliverable Volume','%Deliverble'], axis=1)
axis_new_df
def getIndexes(dfObj, value):
''' Get index positions of value in dataframe i.e. dfObj.'''
listOfPos = list()
# Get bool dataframe with True at positions where the given value exists
result = dfObj.isin([value])
# Get list of columns that contains the value
seriesObj = result.any()
columnNames = list(seriesObj[seriesObj == True].index)
# Iterate over list of columns and fetch the rows indexes where value exists
for col in columnNames:
rows = list(result[col][result[col] == True].index)
for row in rows:
listOfPos.append((row, col))
# Return a list of tuples indicating the positions of value in the dataframe
return listOfPos
listOfPosition_axis = getIndexes(axis_df, '2019-01-01')
listOfPosition_axis
axis_new_df.drop(axis_new_df.loc[0:4728].index, inplace = True)
axis_new_df
Summary of the operations done till now:
- we have taken a csv file containing stock data of AXIS BANK from the data set of nifty50 stocks and performed data cleansing operations on them.</br>
- Originally, the data from the data set is noticed as stock price quotations from the year 2001 but for our analysis we have taken data for the years 2019-2021</br>
- Then we have dropped the columns that are not relevant for our analysis by using pandas dataframe operations.
axis_new_df.reset_index(drop=True, inplace=True)
axis_new_df
axis_new_df['Date'] = pd.to_datetime(axis_new_df['Date']) # we changed the Dates into Datetime format from the object format
axis_new_df.info()
axis_new_df['Daily Lag'] = axis_new_df['Close'].shift(1) # Added a new column Daily Lag to calculate daily returns of the stock
axis_new_df['Daily Returns'] = (axis_new_df['Daily Lag']/axis_new_df['Close']) -1
axis_dailyret_df = axis_new_df.drop(['Prev Close', 'Open','High', 'Low','Close','Daily Lag'], axis = 1)
axis_dailyret_df
Exploratory Analysis and Visualization
Here we compute the mean, max/min stock quotes of the stock AXISBANK. We specifically compute the mean of the Daily returns column. we are going to do the analysis by first converting the index datewise to month wise to have a good consolidated dataframe to analyze in broad timeline. we are going to divide the data frame into three for the years 2019, 2020, 2021 respectively, in order to analyze the yearly performance of the stock.
Let's begin by importingmatplotlib.pyplot
and seaborn
.
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 10
matplotlib.rcParams['figure.figsize'] = (15, 5)
matplotlib.rcParams['figure.facecolor'] = '#00000000'
Here we are going to explore the daily Returns column by plotting a line graph of daily returns v/s Months. Now we can see that daily returns are growing across months in the years 2019-2021.
plt.style.use('fivethirtyeight')
axis_dailyret_plot=axis_dailyret_df.groupby(axis_dailyret_df['Date'].dt.strftime('%B'))['Daily Returns'].sum().sort_values()
plt.plot(axis_dailyret_plot)
axis_new_df['Year'] = pd.DatetimeIndex(axis_new_df['Date']).year
axis_new_df
axis2019_df = axis_new_df[axis_new_df.Year == 2019 ]
axis2020_df = axis_new_df[axis_new_df.Year == 2020 ]
axis2021_df = axis_new_df[axis_new_df.Year == 2021 ]
axis2019_df.reset_index(drop = True, inplace = True)
axis2019_df
axis2020_df.reset_index(drop = True, inplace = True)
axis2020_df
axis2021_df.reset_index(drop=True, inplace=True)
axis2021_df
Summary of above exploratory Analysis:
In the above code cells, we performed plotting of the data by exploring a column from the data. We have divided the DataFrame into three data frames containing the stock quote data from year-wise i.e., for the years 2019, 2020, 2021. For dividing the DataFrame year-wise we have added a new column called 'Year' which is generated from the DataTime values of the column "Date".
axis_range_df = axis_dailyret_df['Daily Returns'].max() - axis_dailyret_df['Daily Returns'].min()
axis_range_df
axis_mean_df = axis_dailyret_df['Daily Returns'].mean()
axis_mean_df
In the above two code cells, we have computed the range i.e. the difference between maximum and minimum value of the column. We have also calculated the mean of the daily returns of the Axis Bank stock.
Exploratory Analysis of stock quotes year-wise for Axis Bank:
In this section we have plotted the Closing values of the stock throughout the year for the years 2019,2020,2021. We have only partial data for 2021(i.e. till Apr 2021). We have also done a plot to compare the performance throughout the year for the years 2019 and 2020(since we had full data for the respective years).
plt.style.use('fivethirtyeight')
plt.plot(axis2019_df['Date'],axis2019_df['Close'] )
plt.title('Closing Values of stock for the year 2019')
plt.xlabel(None)
plt.ylabel('Closing price of the stock')
plt.plot(axis2020_df['Date'],axis2020_df['Close'])
plt.title('Closing Values of stock for the year 2020')
plt.xlabel(None)
plt.ylabel('Closing price of the stock')
plt.plot(axis2021_df['Date'],axis2021_df['Close'])
plt.title('Closing Values of stock for the year 2021 Till April Month')
plt.xlabel(None)
plt.ylabel('Closing price of the stock')
TODO - Explore one or more columns by plotting a graph below, and add some explanation about it
plt.style.use('fivethirtyeight')
plt.plot(axis2019_df['Date'], axis2019_df['Close'],linewidth=3, label = '2019')
plt.plot(axis2020_df["Date"],axis2020_df['Close'],linewidth=3, label = '2020')
plt.legend(loc='best' )
plt.title('Closing Values of stock for the years 2019 and 2020')
plt.xlabel(None)
plt.ylabel('Closing price of the stock')
print(plt.style.available)
Let us save and upload our work to Jovian before continuing
Asking and Answering Questions
In this section, we are going to answer some of the questions regarding the dataset using various data analysis libraries like Numpy, Pandas, Matplotlib and seaborn. By using the tools we can see how useful the libraries come in handy while doing Inference on a dataset.
plt.plot(axis2019_df['Date'], axis2019_df['Close'],linewidth=3, label = '2019')
plt.plot(axis2020_df["Date"],axis2020_df['Close'],linewidth=3, label = '2020')
plt.plot(axis2021_df["Date"], axis2021_df['Close'],linewidth = 3, label = '2021')
plt.legend(loc='best' )
plt.title('Closing Price of stock for the years 2019-2021(Till April)')
plt.xlabel(None)
plt.ylabel('Closing price of the stock')
print('The Maximum closing price of the stock during 2019-2021 is',axis_new_df['Close'].max())
print('The Minimum closing price of the stock during 2019-2021 is',axis_new_df['Close'].min())
print('The Index for the Maximum closing price in the dataframe is',getIndexes(axis_new_df, axis_new_df['Close'].max()))
print('The Index for the Minimum closing price in the dataframe is',getIndexes(axis_new_df, axis_new_df['Close'].min()))
print(axis_new_df.iloc[104])
print(axis_new_df.iloc[303])
- As we can see from the above one of the two plots there was a dip in the closing price during the year 2020. The Maximum Closing price occurred on 2019-06-04(Close = 822.8). The lowest of closing price during the years occurred on 2020-03-24(Close = 303.15). This can say that the start of the pandemic has caused the steep down curve for the stock's closing price.
plt.plot(axis2019_df["Date"],axis2019_df["Volume"],linewidth=2, label = '2019')
plt.plot(axis2020_df["Date"],axis2020_df["Volume"],linewidth=2, label = '2020')
plt.plot(axis2021_df["Date"],axis2021_df["Volume"],linewidth=2, label = '2021')
plt.legend(loc='best')
plt.title('Volume of stock traded in the years 2019-2021(till April)')
plt.ylabel('Volume')
plt.xlabel(None)
print('The Maximum volume of the stock traded during 2019-2021 is',axis_new_df['Volume'].max())
print('The Minimum volume of the stock traded during 2019-2021 is',axis_new_df['Volume'].min())
print('The Index for the Maximum volume stock traded in the dataframe is',getIndexes(axis_new_df, axis_new_df['Volume'].max()))
print('The Index for the Minimum volume stock traded in the dataframe is',getIndexes(axis_new_df, axis_new_df['Volume'].min()))
print(axis_new_df.iloc[357])
print(axis_new_df.iloc[200])
As we can see from the above graph a lot of volume of trade happened during 2020. That means the stock was transacted a lot during the year 2020. The highest Volumed of stock is traded on 2020-06-16(Volume =96190274) and the Minimum volume of the stock traded during 2019-2021 is on 2019-10-27(Volume = 965772)
Q2: What was the daily return of the stock on average?
The daily return measures the price change in a stock's price as a percentage of the previous day's closing price. A positive return means the stock has grown in value, while a negative return means it has lost value. we will also attempt to calculate the maximum daily return of the stock during 2019-2021.
plt.plot(axis_new_df['Date'],axis_new_df['Daily Returns'], linewidth=2 ,label = 'Daily Returns')
plt.legend(loc='best' )
plt.title('Daily Returns of stock for the years 2019-2021(Till April)')
plt.xlabel(None)
plt.ylabel('Daily Returns of the stock')
plt.plot(axis_new_df['Date'],axis_new_df['Daily Returns'], linestyle='--', marker='o')
plt.title('Daily Returns of stock for the years 2019-2021(Till April)')
plt.xlabel(None)
plt.ylabel('Daily Returns of the stock')
print('The Maximum daily return during the years 2020 is',axis_new_df['Daily Returns'].max())
index = getIndexes(axis_new_df, axis_new_df['Daily Returns'].max())
axis_new_df.iloc[302]
def getIndexes(dfObj, value):
''' Get index positions of value in dataframe i.e. dfObj.'''
listOfPos = list()
# Get bool dataframe with True at positions where the given value exists
result = dfObj.isin([value])
# Get list of columns that contains the value
seriesObj = result.any()
columnNames = list(seriesObj[seriesObj == True].index)
# Iterate over list of columns and fetch the rows indexes where value exists
for col in columnNames:
rows = list(result[col][result[col] == True].index)
for row in rows:
listOfPos.append((row, col))
# Return a list of tuples indicating the positions of value in the dataframe
return listOfPos
As we can see from the plot there were high daily returns for the stock around late March 2020 and then there was ups and downs from April- July 2020 . we can see that the most changes in daily returns occurred during April 2020 - July 2020 and at other times the daily returns were almost flat. The maximum daily returns for the stock during 2019-2021 occurred on 2020-03-23(observed from the pandas table above).
Avgdailyret_2019 =axis2019_df['Daily Returns'].sum()/len(axis2019_df['Daily Returns'])
Avgdailyret_2020 =axis2020_df['Daily Returns'].sum()/len(axis2020_df['Daily Returns'])
Avgdailyret_2021 =axis2021_df['Daily Returns'].sum()/len(axis2021_df['Daily Returns'])
# create a dataset
data_dailyret = {'2019': Avgdailyret_2019, '2020':Avgdailyret_2020, '2021':Avgdailyret_2021}
Years = list(data_dailyret.keys())
Avgdailyret = list(data_dailyret.values())
# plotting a bar chart
plt.figure(figsize=(10, 7))
plt.bar(Years, Avgdailyret, color ='maroon',width = 0.3)
plt.xlabel("Years")
plt.ylabel("Average Daily Returns of the Stock Traded")
plt.title("Average Daily Returns of the Stock over the years 2019-2021(Till April) (in 10^7)")
plt.show()
plt.figure(figsize=(12, 7))
sns.distplot(axis_new_df['Daily Returns'].dropna(), bins=100, color='purple')
plt.title(' Histogram of Daily Returns')
plt.tight_layout()
Avgvol_2019 =axis2019_df['Volume'].sum()/len(axis2019_df['Volume'])
Avgvol_2020 =axis2020_df['Volume'].sum()/len(axis2020_df['Volume'])
Avgvol_2021 =axis2021_df['Volume'].sum()/len(axis2021_df['Volume'])
# create a dataset
data_volume = {'2019': Avgvol_2019, '2020':Avgvol_2020, '2021':Avgvol_2021}
Years = list(data_volume.keys())
AvgVol = list(data_volume.values())
# plotting a bar chart
plt.figure(figsize=(13, 7))
plt.bar(Years, AvgVol, color ='maroon',width = 0.3)
plt.xlabel("Years")
plt.ylabel("Average Volume of the Stock Traded")
plt.title("Average Trading volume of the Stock over the years 2019-2021(Till April) (in 10^7)")
plt.show()
From the above plot we can say that more volume of the Axis Bank stock is traded during the year 2020. We can see a significant rise in the trading volume of the stock from 2019 to 2020.
Avgclose_2019 =axis2019_df['Close'].sum()/len(axis2019_df['Close'])
Avgclose_2020 =axis2020_df['Close'].sum()/len(axis2020_df['Close'])
Avgclose_2021 =axis2021_df['Close'].sum()/len(axis2021_df['Close'])
# create a dataset
data_volume = {'2019': Avgclose_2019, '2020':Avgclose_2020, '2021':Avgclose_2021}
Years = list(data_volume.keys())
AvgClose = list(data_volume.values())
# plotting a bar chart
plt.figure(figsize=(13, 7))
plt.bar(Years, AvgClose, color ='maroon',width = 0.3)
plt.xlabel("Years")
plt.ylabel("Average Closding Price of the Stock Traded")
plt.title("Average Closing price of the Stock over the years 2019-2021(Till April) (in 10^7)")
plt.show()
We have seen the Trading Volume of the stock is more during the year 2020. In contrast, the Year 2020 has the lowest average closing price among the other two. But for the years 2019 and 2021 the Average closing price is almost same, there is not much change in the value.
Inferences and Conclusion
Inferences : The above data analysis is done on the data set of stock quotes for AXIS BANK during the years 2019-2021. From the Analysis we can say that during the year 2020 there has been a lot of unsteady growth, there has been rise in the volume of stock traded on the exchange, that means there has been a lot of transactions of the stock. The stock has seen a swift traffic in buy/sell during the year 2020 and has fallen back to normal in the year 2021. In contrast to the volume of the stock the closing price of the stock has decreased during the year 2020, which can be concluded as the volume of the stock traded has no relation to the price change of the stock(while most people think there can be a correlation among the two values). The price decrease for the stock may have been due to the pandemic rise in India during the year 2020.
References and Future Work
Future Ideas for the Analyis:
- I am planning to go forward with this basic Analysis of the AXISBANK stock quotes and build a Machine Learning model predicting the future stock prices.
- I plan to automate the Data Analysis process for every stock in the NIFTY50 Index by defining reusable functions and automating the Analysis procedures.
- Study more strong correlations between the different quotes of the stock and analyze how and why they are related in that fashion.
REFRENCES/LINKS USED FOR THIS PROJECT :
- https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html
- https://stackoverflow.com/questions/16683701/in-pandas-how-to-get-the-index-of-a-known-value
- https://towardsdatascience.com/working-with-datetime-in-pandas-dataframe-663f7af6c587
- https://thispointer.com/python-find-indexes-of-an-element-in-pandas-dataframe/
- https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html#timeseries-friendly-merging
- https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
- https://towardsdatascience.com/financial-analytics-exploratory-data-analysis-of-stock-data-d98cbadf98b9
- https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.transpose.html
- https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.set_index.html
- https://pandas.pydata.org/docs/reference/api/pandas.merge.html
- https://stackoverflow.com/questions/14661701/how-to-drop-a-list-of-rows-from-pandas-dataframe
- https://www.interviewqs.com/ddi-code-snippets/extract-month-year-pandas
- https://stackoverflow.com/questions/18172851/deleting-dataframe-row-in-pandas-based-on-column-value
- https://queirozf.com/entries/matplotlib-examples-displaying-and-configuring-legends
- https://jakevdp.github.io/PythonDataScienceHandbook/04.06-customizing-legends.html
- https://matplotlib.org/stable/tutorials/intermediate/legend_guide.html
- https://matplotlib.org/devdocs/gallery/subplots_axes_and_figures/subplots_demo.html
- https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html
- https://stackoverflow.com/questions/332289/how-do-you-change-the-size-of-figures-drawn-with-matplotlib
- https://www.investopedia.com/articles/investing/093014/stock-quotes-explained.asp
- https://stackoverflow.com/questions/44908383/how-can-i-group-by-month-from-a-datefield-using-python-pandas
- https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.hist.html
- https://note.nkmk.me/en/python-pandas-dataframe-rename/
- https://stackoverflow.com/questions/24748848/pandas-find-the-maximum-range-in-all-the-columns-of-dataframe
- https://stackoverflow.com/questions/29233283/plotting-multiple-lines-in-different-colors-with-pandas-dataframe
- https://jakevdp.github.io/PythonDataScienceHandbook/04.14-visualization-with-seaborn.html
- https://www.geeksforgeeks.org/python-pandas-extracting-rows-using-loc/