This project lets users analyze past and current stock data for the FAANG companies (Facebook, Apple, Amazon, Netflix, Google). To carry out predictive, comparative and quantitative analysis, we extracted the relevant information from '.csv' files containing the stock market data. We imported several packages that make the analysis more efficient. We read the '.csv' files with pandas, which splits each row on the comma delimiter and stores the result in a table. Because the FAANG companies held their IPOs at different times, we plot data only from January 2013 onward for consistency. We store the stock data in pandas DataFrames and pull from them the columns needed for graphs that show important information about each company's stock. We also attempt hypothesis testing to check how well the data fits the line plots and scatter plots we produce.
We used Kaggle, a popular website that gives users access to thousands of public datasets. One user, Aayush Mishra, uploaded a dataset named FAANG- Complete Stock Data that contains all FAANG stock ticks with the features Date, Open, Close, High, Low, Adj Close and Volume. The link to this dataset is provided below:
https://www.kaggle.com/aayushmishra1512/faang-complete-stock-data
The dataset is reliable: we used Yahoo Stocks to verify that the ticks in the dataset matched the data recorded by Yahoo Stocks.
Today, stocks are extremely important: they play an integral role in a company's growth and can also shape an individual's wealth. People have long invested in company shares and traded stocks, and some have made a fortune by investing in companies that grew rapidly in the last few years. However, stocks have followed erratic patterns over the years. They have suffered during recessions and have had setbacks due to internal or external factors. Our group decided to work on a stock analysis that would help us and other readers understand more about the stock market and visualize the attributes of a particular stock at a particular time. We also wanted to examine how stock trends changed during recessions (2001, 2008) and compare them with the trends today amid a worldwide pandemic.
We used Plotly, a Python library for creating interactive plots. Some of its functionality is listed below:
1) Hovering over any data allows a user to see what is contained in that datapoint on the plot.
2) Click on Compare Data On Hover to compare multiple datapoints on the same plot.
3) Drop Down menus can be used to toggle functionality/visualizations on the plots.
Here we import the packages and modules used in the project. We primarily use pandas and pyplot from matplotlib, together with Plotly for the interactive graphs. The functions in these modules keep our code compact and let us add interactive graphs, which makes the final product visual and easier for readers to understand.
!pip3 install plotly==4.14.1
import json
import plotly.figure_factory as ff
import plotly.graph_objects as go
import plotly.express as px
import plotly.offline as py
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=False)
import pandas as pd
import numpy as np
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.api as sm
import sklearn
import math
from datetime import datetime, date
from sklearn import preprocessing
from sklearn import datasets
from sklearn import utils
from sklearn import linear_model
from sklearn.metrics import *
from sklearn.preprocessing import *
from statsmodels.formula.api import ols
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
In this part, we add all our collected data to our ipynb file which can then be processed and analyzed.
In the cell below we initialize a DataFrame for each of the FAANG companies. Each company's stock data is stored in a ".csv" file, that is, as comma-separated values, and we create a pandas DataFrame for each company by reading its file.
facebook = pd.read_csv("data/Facebook.csv", sep=',')
apple = pd.read_csv("data/Apple.csv", sep=',')
amazon = pd.read_csv("data/Amazon.csv", sep=',')
netflix = pd.read_csv("data/Netflix.csv", sep=',')
google = pd.read_csv("data/Google.csv", sep=',')
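The five reads above follow the same pattern, so they could also be written as one loop. This is a minimal sketch, assuming the same `data/<Company>.csv` layout; the `COMPANIES` list and the `load_frames` helper are hypothetical names introduced here, not part of the project.

```python
from pathlib import Path

import pandas as pd

# Hypothetical list of file stems; assumes files are named data/<Company>.csv.
COMPANIES = ["Facebook", "Apple", "Amazon", "Netflix", "Google"]


def load_frames(data_dir, companies=COMPANIES):
    """Read <data_dir>/<Company>.csv for each company into a dict of DataFrames."""
    return {name: pd.read_csv(Path(data_dir) / f"{name}.csv") for name in companies}
```

With this helper, `frames = load_frames("data")` would give `frames["Facebook"]`, `frames["Apple"]` and so on, at the cost of replacing the five separate variables with dictionary lookups.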
In this part we process the data, also called Data Cleaning. We restrict the time range and format the data to our requirements.
When we read the .csv files above, the Date column of each DataFrame was stored as strings. While a date is a plain string, pandas cannot treat it as a point in time, so it cannot be used for date comparisons, date arithmetic or a proper time axis on a graph. Hence, we converted the entire Date column of each DataFrame from strings to datetime values, which we can use to create graphs, to compare dates, and to study stock trends on a particular day or range of dates.
facebook['Date'] = pd.to_datetime(facebook['Date'])
apple['Date'] = pd.to_datetime(apple['Date'])
amazon['Date'] = pd.to_datetime(amazon['Date'])
netflix['Date'] = pd.to_datetime(netflix['Date'])
google['Date'] = pd.to_datetime(google['Date'])
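To illustrate what the conversion enables, here is a small sketch on a made-up two-row table (the values are illustrative and not taken from the dataset). After `pd.to_datetime`, the column supports date comparisons and the `.dt` accessor.

```python
import pandas as pd

# Toy frame standing in for one of the stock tables (made-up values).
toy = pd.DataFrame({"Date": ["2013-01-02", "2013-01-03"], "Close": [28.0, 27.77]})

# Convert the string column to datetime values.
toy["Date"] = pd.to_datetime(toy["Date"])

# Datetime columns can be compared against timestamps...
later = toy[toy["Date"] > pd.Timestamp("2013-01-02")]

# ...and expose calendar fields through the .dt accessor.
years = toy["Date"].dt.year
```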
Here we start setting up interactive graphs that depict relevant information about the stocks. Below, we keep only the rows in each DataFrame dated after 2012. We did this because the FAANG companies had very different IPO dates. IPO stands for Initial Public Offering, the point at which a company's shares first become available for the public to buy.
To graph the data, we first restricted each DataFrame to the stock data from Jan 2013 to Aug 2020. We then reset the index of each DataFrame, discarding the old index, so that the DataFrames are ready to plot.
facebook = facebook[(facebook['Date'].dt.year > 2012) & (facebook['Date'].dt.year < 2021)]
apple = apple[(apple['Date'].dt.year > 2012) & (apple['Date'].dt.year < 2021)]
amazon = amazon[(amazon['Date'].dt.year > 2012) & (amazon['Date'].dt.year < 2021)]
netflix = netflix[(netflix['Date'].dt.year > 2012) & (netflix['Date'].dt.year < 2021)]
google = google[(google['Date'].dt.year > 2012) & (google['Date'].dt.year < 2021)]
facebook = facebook.reset_index(drop=True)
apple = apple.reset_index(drop=True)
amazon = amazon.reset_index(drop=True)
netflix = netflix.reset_index(drop=True)
google = google.reset_index(drop=True)
facebook
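The filter-then-reset pattern above is repeated for each company, so it could be wrapped in a small function. This is a sketch only; `clip_years` is a hypothetical helper name, and it assumes the `Date` column has already been converted to datetime values.

```python
import pandas as pd


def clip_years(df, first_year=2013, last_year=2020):
    """Keep rows whose Date falls in [first_year, last_year] and renumber the index."""
    years = df["Date"].dt.year
    # Series.between is inclusive on both ends by default.
    return df[years.between(first_year, last_year)].reset_index(drop=True)


# Made-up dates straddling the range boundaries.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2012-12-31", "2013-01-02", "2020-08-03", "2021-01-04"]),
    "Close": [1.0, 2.0, 3.0, 4.0],
})
clipped = clip_years(toy)
```

Usage would then look like `facebook = clip_years(facebook)`, which keeps the same rows as the year comparisons above.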
The following plots are visually attractive and interactive. Each matrix-like graph lists the stock attributes (open price, close price, volume and so on) along its left and bottom edges. We computed the day-over-day percentage change of each attribute with the .pct_change() function from pandas and stored the result in a new DataFrame. We then computed the pairwise correlations between those changes and drew them as a matrix plot, where the color of each cell encodes the correlation between two attributes of the same company's stock. This correlation helps users understand how a change in one attribute goes along with a change in another. We have provided a color legend so that users can read the graph more easily.
corr_df_fb = facebook[['Open', 'Close', 'High', 'Low', 'Adj Close', 'Volume']].copy(deep=True)
retscomp_fb = corr_df_fb.pct_change()
corr_fb = retscomp_fb.corr()
corr_fb
fig = px.imshow(corr_fb)
fig.update_layout(title='Correlation between Features of Facebook Stock')
iplot(fig,show_link=False)
corr_df_ap = apple[['Open', 'Close', 'High', 'Low', 'Adj Close', 'Volume']].copy(deep=True)
retscomp_ap = corr_df_ap.pct_change()
corr_ap = retscomp_ap.corr()
corr_ap
fig = px.imshow(corr_ap)
fig.update_layout(title='Correlation between Features of Apple Stock')
iplot(fig,show_link=False)
corr_df_am = amazon[['Open', 'Close', 'High', 'Low', 'Adj Close', 'Volume']].copy(deep=True)
retscomp_am = corr_df_am.pct_change()
corr_am = retscomp_am.corr()
corr_am
fig = px.imshow(corr_am)
fig.update_layout(title='Correlation between Features of Amazon Stock')
iplot(fig,show_link=False)
corr_df_ne = netflix[['Open', 'Close', 'High', 'Low', 'Adj Close', 'Volume']].copy(deep=True)
retscomp_ne = corr_df_ne.pct_change()
corr_ne = retscomp_ne.corr()
corr_ne
fig = px.imshow(corr_ne)
fig.update_layout(title='Correlation between Features of Netflix Stock')
iplot(fig,show_link=False)
corr_df_go = google[['Open', 'Close', 'High', 'Low', 'Adj Close', 'Volume']].copy(deep=True)
retscomp_go = corr_df_go.pct_change()
corr_go = retscomp_go.corr()
corr_go
fig = px.imshow(corr_go)
fig.update_layout(title='Correlation between Features of Google Stock')
iplot(fig,show_link=False)
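To show what `pct_change()` followed by `corr()` computes, here is a sketch on a made-up four-day table (the numbers are illustrative, not taken from the dataset). The Close column is constructed to be always 5% above Open, so the two have identical daily percentage changes.

```python
import pandas as pd

prices = pd.DataFrame({
    "Open":   [10.0, 11.0, 12.1, 11.0],
    "Close":  [10.5, 11.55, 12.705, 11.55],  # always 5% above Open
    "Volume": [100, 90, 120, 80],
})

# Fractional change from the previous row; the first row has no predecessor and is NaN.
returns = prices.pct_change()

# Pairwise Pearson correlation between the columns of daily changes.
corr = returns.corr()
```

Because Open and Close move in lockstep here, their correlation comes out at essentially 1, which is the kind of cell that appears at the bright end of the color scale in the heatmaps.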
facebook['Company'] = ['Facebook']*len(facebook)
apple['Company'] = ['Apple']*len(apple)
amazon['Company'] = ['Amazon']*len(amazon)
netflix['Company'] = ['Netflix']*len(netflix)
google['Company'] = ['Google']*len(google)
frames = [facebook, apple, amazon, netflix, google]
result = pd.concat(frames)
# Derive the calendar year of each row directly from the Date column.
result['Date'] = pd.to_datetime(result['Date'])
result['Year'] = result['Date'].dt.year
# Mean trading volume for each (Company, Year) pair.
comp = result.groupby(['Company', 'Year'])
vol_df = comp['Volume'].mean().reset_index(name='Volume Mean')
fig = go.Figure()
avg_vol = vol_df['Volume Mean'].mean()
stand_vol = vol_df['Volume Mean'].std()
vol_df = vol_df.reset_index(drop=True)
# z-score of each yearly mean volume relative to all company-year means.
vol_df['standard_vol'] = (vol_df['Volume Mean'] - avg_vol) / stand_vol
avg_close = result.groupby('Date')['Close'].mean()
stand_close = result.groupby('Date')['Close'].std()
stand_close = stand_close.reset_index()
avg_close = avg_close.reset_index()
result = result.reset_index(drop=True)
# z-score of each close price relative to all companies' close prices on the same date.
close_by_date = result.groupby('Date')['Close']
result['standard_close'] = (result['Close'] - close_by_date.transform('mean')) / close_by_date.transform('std')
result
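The `standard_close` column is a per-date z-score: each close price minus the mean close across companies on that date, divided by the standard deviation across companies on that date. A small sketch on made-up prices for three hypothetical companies A, B and C shows the arithmetic, using `groupby(...).transform` as one possible way to compute it.

```python
import pandas as pd

toy = pd.DataFrame({
    "Date": pd.to_datetime(["2013-01-02"] * 3 + ["2013-01-03"] * 3),
    "Company": ["A", "B", "C"] * 2,          # hypothetical companies
    "Close": [10.0, 20.0, 30.0, 12.0, 18.0, 30.0],
})

by_date = toy.groupby("Date")["Close"]
# transform() returns a value for every row, aligned with the original index.
toy["standard_close"] = (toy["Close"] - by_date.transform("mean")) / by_date.transform("std")
```

On the first date the mean is 20 and the sample standard deviation is 10, so the three z-scores are -1, 0 and 1.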
In this section we have made five graphs, one for each FAANG company. Each graph shows the opening price, the closing price, the volume, and the 14-, 21- and 100-day moving averages of the closing price. All of these are important to investors, who use them to decide whether to buy or sell a stock. We calculated the moving averages because many buyers and sellers base their actions on them.
Each graph also shows the yearly mean volume as bars behind the price lines. The volumes are scaled down by a constant so that they fit on the same axis as the prices. The dropdown menu also offers an option to view the unscaled volume bars on their own.
The graphs in this section are all interactive. Each attribute is drawn as a separate trace, and a user can select which one to study from the dropdown menu.
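The moving averages described in this section are rolling means of the closing price. To make the effect of `min_periods=1` concrete, here is a small sketch on a made-up five-value series; the 3-day window is chosen only to keep the arithmetic readable and is not one of the windows used in this project.

```python
import pandas as pd

close = pd.Series([10.0, 12.0, 14.0, 16.0, 18.0])

# With min_periods=1 the early values average whatever data is available so far:
# 10, (10+12)/2, (10+12+14)/3, (12+14+16)/3, (14+16+18)/3
ma3 = close.rolling(window=3, min_periods=1).mean()

# Without min_periods, the result is NaN until a full window of 3 values exists.
strict = close.rolling(window=3).mean()
```

The practical consequence is that the first points of a 100-day average computed with `min_periods=1` are averages over far fewer than 100 days, so the start of that line tracks the raw price more closely than the rest.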
The graph below depicts the opening price, closing price, and 14-, 21- and 100-day moving averages for Facebook. The plot shows an inverse relation between volume and price: as volume decreases, price increases. This is a pattern observed in the data rather than a general rule, since volume measures how many shares change hands, not how many shares exist. Investors and users can study this graph to inform their decisions or to learn about the past and present of the Facebook stock.
avg_14 = facebook.Close.rolling(window=14, min_periods=1).mean()
avg_21 = facebook.Close.rolling(window=21, min_periods=1).mean()
avg_100 = facebook.Close.rolling(window=100, min_periods=1).mean()
x_fb = facebook['Date']
y_fb = facebook['Open']
z_fb = facebook['Close']
fig = go.Figure()
fig.add_trace(go.Scatter(x=x_fb, y=y_fb, name='Open',
line=dict(color='royalblue', width=1.5)))
fig.add_trace(go.Scatter(x=x_fb, y=z_fb, name = 'Close',
line=dict(color='firebrick', width=1.5)))
fig.add_trace(go.Scatter(x=x_fb, y=avg_14, name = '14 Day Close Avg',
line=dict(color='goldenrod', width=1.5)))
fig.add_trace(go.Scatter(x=x_fb, y=avg_21, name = '21 Day Close Avg',
line=dict(color='orangered', width=1.5)))
fig.add_trace(go.Scatter(x=x_fb, y=avg_100, name = '100 Day Close Avg',
line=dict(color='mediumorchid', width=1.5)))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Facebook']['Year'],
y=vol_df[vol_df['Company'] == 'Facebook']['Volume Mean']/200000, name='Volume (scaled)',
marker_color='slategray', opacity=0.3))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Facebook']['Year'],
y=vol_df[vol_df['Company'] == 'Facebook']['Volume Mean'], name='Volume',
marker_color='slategray', visible='legendonly'))
fig.update_layout(title='Open/Close prices and Volume for Facebook from Jan 2013 to Aug 2020',
xaxis_title='Date',
yaxis_title='Open/Close/Volume', template="plotly_dark")
fig.update_layout(
updatemenus=[
dict(
buttons=list([
dict(label = 'All',
method = 'update',
args = [{'visible': [True, True, True, True, True, True, False]},
{'title': 'All',
'showlegend':True}]),
dict(label = 'Open Price',
method = 'update',
args = [{'visible': [True, False, False, False, False, False, False]},
{'title': 'Open Price',
'showlegend':True}]),
dict(label = 'Close Price',
method = 'update',
args = [{'visible': [False, True, False, False, False, False, False]},
{'title': 'Close Price',
'showlegend':True}]),
dict(label = '14 Day Moving Average',
method = 'update',
args = [{'visible': [False, False, True, False, False, False, False]},
{'title': '14 Day Moving Average',
'showlegend':True}]),
dict(label = '21 Day Moving Average',
method = 'update',
args = [{'visible': [False, False, False, True, False, False, False]},
{'title': '21 Day Moving Average',
'showlegend':True}]),
dict(label = '100 Day Moving Average',
method = 'update',
args = [{'visible': [False, False, False, False, True, False, False]},
{'title': '100 Day Moving Average',
'showlegend':True}]),
dict(label = 'Volume (not scaled)',
method = 'update',
args = [{'visible': [False, False, False, False, False, False, True]},
{'title': 'Volume (not scaled)',
'showlegend':True}]),
]),
direction="down",
pad={"r": 10, "t": 10},
showactive=True,
x=0.1,
xanchor="left",
y=1.1,
yanchor="top"
),
]
)
fig.update_layout(
autosize=False,
width=1000,
height=650,)
iplot(fig,show_link=False)
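Each dropdown button above carries a hand-written list of `True`/`False` values with one entry per trace, which is easy to get out of sync when traces are added. The sketch below builds such lists programmatically. `visibility_buttons` is a hypothetical helper, not part of Plotly. It returns plain dictionaries in the same shape as the `buttons` list above. Unlike the hand-written menu, it emits one button for every trace, including the scaled volume.

```python
def visibility_buttons(trace_names, hidden_in_all=()):
    """Build 'update' button dicts: one 'All' button plus one button per trace."""
    n = len(trace_names)
    # The 'All' view shows every trace except those explicitly hidden.
    all_mask = [name not in hidden_in_all for name in trace_names]
    buttons = [dict(label="All", method="update",
                    args=[{"visible": all_mask},
                          {"title": "All", "showlegend": True}])]
    for i, name in enumerate(trace_names):
        # Show only trace i; the list order must match the order traces were added.
        mask = [j == i for j in range(n)]
        buttons.append(dict(label=name, method="update",
                            args=[{"visible": mask},
                                  {"title": name, "showlegend": True}]))
    return buttons


names = ["Open", "Close", "14 Day Close Avg", "21 Day Close Avg",
         "100 Day Close Avg", "Volume (scaled)", "Volume"]
buttons = visibility_buttons(names, hidden_in_all={"Volume"})
```

The result could be passed as `buttons=buttons` inside the `updatemenus` dictionary in place of the literal list, with labels adjusted as desired.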
The graph below depicts the opening price, closing price, and 14-, 21- and 100-day moving averages for Apple. The plot shows an inverse relation between volume and price: as volume decreases, price increases. This is a pattern observed in the data rather than a general rule, since volume measures how many shares change hands, not how many shares exist. Investors and users can study this graph to inform their decisions or to learn about the past and present of the Apple stock.
avg_14 = apple.Close.rolling(window=14, min_periods=1).mean()
avg_21 = apple.Close.rolling(window=21, min_periods=1).mean()
avg_100 = apple.Close.rolling(window=100, min_periods=1).mean()
x_ap = apple['Date']
y_ap = apple['Open']
z_ap = apple['Close']
fig = go.Figure()
fig.add_trace(go.Scatter(x=x_ap, y=y_ap, name='Open',
line=dict(color='royalblue', width=1.5)))
fig.add_trace(go.Scatter(x=x_ap, y=z_ap, name = 'Close',
line=dict(color='firebrick', width=1.5)))
fig.add_trace(go.Scatter(x=x_ap, y=avg_14, name = '14 Day Close Avg',
                         line=dict(color='goldenrod', width=1.5)))
fig.add_trace(go.Scatter(x=x_ap, y=avg_21, name = '21 Day Close Avg',
                         line=dict(color='orangered', width=1.5)))
fig.add_trace(go.Scatter(x=x_ap, y=avg_100, name = '100 Day Close Avg',
                         line=dict(color='mediumorchid', width=1.5)))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Apple']['Year'],
y=vol_df[vol_df['Company'] == 'Apple']['Volume Mean']/3500000, name='Volume (scaled)',
marker_color='slategray', opacity=0.3))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Apple']['Year'],
y=vol_df[vol_df['Company'] == 'Apple']['Volume Mean'], name='Volume',
marker_color='slategray', visible='legendonly'))
fig.update_layout(title='Open/Close prices and Volume for Apple from Jan 2013 to Aug 2020',
xaxis_title='Date',
yaxis_title='Open/Close/Volume', template="plotly_dark")
fig.update_layout(
updatemenus=[
dict(
buttons=list([
dict(label = 'All',
method = 'update',
args = [{'visible': [True, True, True, True, True, True, False]},
{'title': 'All',
'showlegend':True}]),
dict(label = 'Open Price',
method = 'update',
args = [{'visible': [True, False, False, False, False, False, False]},
{'title': 'Open Price',
'showlegend':True}]),
dict(label = 'Close Price',
method = 'update',
args = [{'visible': [False, True, False, False, False, False, False]},
{'title': 'Close Price',
'showlegend':True}]),
dict(label = '14 Day Moving Average',
method = 'update',
args = [{'visible': [False, False, True, False, False, False, False]},
{'title': '14 Day Moving Average',
'showlegend':True}]),
dict(label = '21 Day Moving Average',
method = 'update',
args = [{'visible': [False, False, False, True, False, False, False]},
{'title': '21 Day Moving Average',
'showlegend':True}]),
dict(label = '100 Day Moving Average',
method = 'update',
args = [{'visible': [False, False, False, False, True, False, False]},
{'title': '100 Day Moving Average',
'showlegend':True}]),
dict(label = 'Volume (not scaled)',
method = 'update',
args = [{'visible': [False, False, False, False, False, False, True]},
{'title': 'Volume (not scaled)',
'showlegend':True}]),
]),
direction="down",
pad={"r": 10, "t": 10},
showactive=True,
x=0.1,
xanchor="left",
y=1.1,
yanchor="top"
),
]
)
fig.update_layout(
autosize=False,
width=1000,
height=650,)
iplot(fig,show_link=False)
The graph below depicts the opening price, closing price, and 14-, 21- and 100-day moving averages for Amazon. The plot shows an inverse relation between volume and price: as volume decreases, price increases. This is a pattern observed in the data rather than a general rule, since volume measures how many shares change hands, not how many shares exist. Investors and users can study this graph to inform their decisions or to learn about the past and present of the Amazon stock.
avg_14 = amazon.Close.rolling(window=14, min_periods=1).mean()
avg_21 = amazon.Close.rolling(window=21, min_periods=1).mean()
avg_100 = amazon.Close.rolling(window=100, min_periods=1).mean()
x_am = amazon['Date']
y_am = amazon['Open']
z_am = amazon['Close']
fig = go.Figure()
fig.add_trace(go.Scatter(x=x_am, y=y_am, name='Open',
line=dict(color='royalblue', width=1.5)))
fig.add_trace(go.Scatter(x=x_am, y=z_am, name = 'Close',
line=dict(color='firebrick', width=1.5)))
fig.add_trace(go.Scatter(x=x_am, y=avg_14, name = '14 Day Close Avg',
                         line=dict(color='goldenrod', width=1.5)))
fig.add_trace(go.Scatter(x=x_am, y=avg_21, name = '21 Day Close Avg',
                         line=dict(color='orangered', width=1.5)))
fig.add_trace(go.Scatter(x=x_am, y=avg_100, name = '100 Day Close Avg',
                         line=dict(color='mediumorchid', width=1.5)))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Amazon']['Year'],
y=vol_df[vol_df['Company'] == 'Amazon']['Volume Mean']/2000, name='Volume (scaled)',
marker_color='slategray', opacity=0.3))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Amazon']['Year'],
y=vol_df[vol_df['Company'] == 'Amazon']['Volume Mean'], name='Volume',
marker_color='slategray', visible='legendonly'))
fig.update_layout(title='Open/Close prices and Volume for Amazon from Jan 2013 to Aug 2020',
xaxis_title='Date',
yaxis_title='Open/Close/Volume', template="plotly_dark")
fig.update_layout(
updatemenus=[
dict(
buttons=list([
dict(label = 'All',
method = 'update',
args = [{'visible': [True, True, True, True, True, True, False]},
{'title': 'All',
'showlegend':True}]),
dict(label = 'Open Price',
method = 'update',
args = [{'visible': [True, False, False, False, False, False, False]},
{'title': 'Open Price',
'showlegend':True}]),
dict(label = 'Close Price',
method = 'update',
args = [{'visible': [False, True, False, False, False, False, False]},
{'title': 'Close Price',
'showlegend':True}]),
dict(label = '14 Day Moving Average',
method = 'update',
args = [{'visible': [False, False, True, False, False, False, False]},
{'title': '14 Day Moving Average',
'showlegend':True}]),
dict(label = '21 Day Moving Average',
method = 'update',
args = [{'visible': [False, False, False, True, False, False, False]},
{'title': '21 Day Moving Average',
'showlegend':True}]),
dict(label = '100 Day Moving Average',
method = 'update',
args = [{'visible': [False, False, False, False, True, False, False]},
{'title': '100 Day Moving Average',
'showlegend':True}]),
dict(label = 'Volume (not scaled)',
method = 'update',
args = [{'visible': [False, False, False, False, False, False, True]},
{'title': 'Volume (not scaled)',
'showlegend':True}]),
]),
direction="down",
pad={"r": 10, "t": 10},
showactive=True,
x=0.1,
xanchor="left",
y=1.1,
yanchor="top"
),
]
)
fig.update_layout(
autosize=False,
width=1000,
height=650,)
iplot(fig,show_link=False)
The graph below depicts the opening price, closing price, and 14-, 21- and 100-day moving averages for Netflix. The plot shows an inverse relation between volume and price: as volume decreases, price increases. This is a pattern observed in the data rather than a general rule, since volume measures how many shares change hands, not how many shares exist. Investors and users can study this graph to inform their decisions or to learn about the past and present of the Netflix stock.
avg_14 = netflix.Close.rolling(window=14, min_periods=1).mean()
avg_21 = netflix.Close.rolling(window=21, min_periods=1).mean()
avg_100 = netflix.Close.rolling(window=100, min_periods=1).mean()
x_ne = netflix['Date']
y_ne = netflix['Open']
z_ne = netflix['Close']
fig = go.Figure()
fig.add_trace(go.Scatter(x=x_ne, y=y_ne, name='Open',
line=dict(color='royalblue', width=1.5)))
fig.add_trace(go.Scatter(x=x_ne, y=z_ne, name = 'Close',
line=dict(color='firebrick', width=1.5)))
fig.add_trace(go.Scatter(x=x_ne, y=avg_14, name = '14 Day Close Avg',
                         line=dict(color='goldenrod', width=1.5)))
fig.add_trace(go.Scatter(x=x_ne, y=avg_21, name = '21 Day Close Avg',
                         line=dict(color='orangered', width=1.5)))
fig.add_trace(go.Scatter(x=x_ne, y=avg_100, name = '100 Day Close Avg',
                         line=dict(color='mediumorchid', width=1.5)))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Netflix']['Year'],
y=vol_df[vol_df['Company'] == 'Netflix']['Volume Mean']/50000, name='Volume (scaled)',
marker_color='slategray', opacity=0.3))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Netflix']['Year'],
y=vol_df[vol_df['Company'] == 'Netflix']['Volume Mean'], name='Volume',
marker_color='slategray', visible='legendonly'))
fig.update_layout(title='Open/Close prices and Volume for Netflix from Jan 2013 to Aug 2020',
xaxis_title='Date',
yaxis_title='Open/Close/Volume', template="plotly_dark")
fig.update_layout(
updatemenus=[
dict(
buttons=list([
dict(label = 'All',
method = 'update',
args = [{'visible': [True, True, True, True, True, True, False]},
{'title': 'All',
'showlegend':True}]),
dict(label = 'Open Price',
method = 'update',
args = [{'visible': [True, False, False, False, False, False, False]},
{'title': 'Open Price',
'showlegend':True}]),
dict(label = 'Close Price',
method = 'update',
args = [{'visible': [False, True, False, False, False, False, False]},
{'title': 'Close Price',
'showlegend':True}]),
dict(label = '14 Day Moving Average',
method = 'update',
args = [{'visible': [False, False, True, False, False, False, False]},
{'title': '14 Day Moving Average',
'showlegend':True}]),
dict(label = '21 Day Moving Average',
method = 'update',
args = [{'visible': [False, False, False, True, False, False, False]},
{'title': '21 Day Moving Average',
'showlegend':True}]),
dict(label = '100 Day Moving Average',
method = 'update',
args = [{'visible': [False, False, False, False, True, False, False]},
{'title': '100 Day Moving Average',
'showlegend':True}]),
dict(label = 'Volume (not scaled)',
method = 'update',
args = [{'visible': [False, False, False, False, False, False, True]},
{'title': 'Volume (not scaled)',
'showlegend':True}]),
]),
direction="down",
pad={"r": 10, "t": 10},
showactive=True,
x=0.1,
xanchor="left",
y=1.1,
yanchor="top"
),
]
)
fig.update_layout(
autosize=False,
width=1000,
height=650,)
iplot(fig,show_link=False)
The graph below depicts the opening price, closing price, and 14-, 21- and 100-day moving averages for Google. The plot shows an inverse relation between volume and price: as volume decreases, price increases. This is a pattern observed in the data rather than a general rule, since volume measures how many shares change hands, not how many shares exist. Investors and users can study this graph to inform their decisions or to learn about the past and present of the Google stock.
avg_14 = google.Close.rolling(window=14, min_periods=1).mean()
avg_21 = google.Close.rolling(window=21, min_periods=1).mean()
avg_100 = google.Close.rolling(window=100, min_periods=1).mean()
x_go = google['Date']
y_go = google['Open']
z_go = google['Close']
fig = go.Figure()
fig.add_trace(go.Scatter(x=x_go, y=y_go, name='Open',
line=dict(color='royalblue', width=1.5)))
fig.add_trace(go.Scatter(x=x_go, y=z_go, name = 'Close',
line=dict(color='firebrick', width=1.5)))
fig.add_trace(go.Scatter(x=x_go, y=avg_14, name = '14 Day Close Avg',
                         line=dict(color='goldenrod', width=1.5)))
fig.add_trace(go.Scatter(x=x_go, y=avg_21, name = '21 Day Close Avg',
                         line=dict(color='orangered', width=1.5)))
fig.add_trace(go.Scatter(x=x_go, y=avg_100, name = '100 Day Close Avg',
                         line=dict(color='mediumorchid', width=1.5)))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Google']['Year'],
y=vol_df[vol_df['Company'] == 'Google']['Volume Mean']/2000, name='Volume (scaled)',
marker_color='slategray', opacity=0.3))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Google']['Year'],
y=vol_df[vol_df['Company'] == 'Google']['Volume Mean'], name='Volume',
marker_color='slategray', visible='legendonly'))
fig.update_layout(title='Open/Close prices and Volume for Google from Jan 2013 to Aug 2020',
xaxis_title='Date',
yaxis_title='Open/Close/Volume', template="plotly_dark")
fig.update_layout(
updatemenus=[
dict(
buttons=list([
dict(label = 'All',
method = 'update',
args = [{'visible': [True, True, True, True, True, True, False]},
{'title': 'All',
'showlegend':True}]),
dict(label = 'Open Price',
method = 'update',
args = [{'visible': [True, False, False, False, False, False, False]},
{'title': 'Open Price',
'showlegend':True}]),
dict(label = 'Close Price',
method = 'update',
args = [{'visible': [False, True, False, False, False, False, False]},
{'title': 'Close Price',
'showlegend':True}]),
dict(label = '14 Day Moving Average',
method = 'update',
args = [{'visible': [False, False, True, False, False, False, False]},
{'title': '14 Day Moving Average',
'showlegend':True}]),
dict(label = '21 Day Moving Average',
method = 'update',
args = [{'visible': [False, False, False, True, False, False, False]},
{'title': '21 Day Moving Average',
'showlegend':True}]),
dict(label = '100 Day Moving Average',
method = 'update',
args = [{'visible': [False, False, False, False, True, False, False]},
{'title': '100 Day Moving Average',
'showlegend':True}]),
dict(label = 'Volume (not scaled)',
method = 'update',
args = [{'visible': [False, False, False, False, False, False, True]},
{'title': 'Volume (not scaled)',
'showlegend':True}]),
]),
direction="down",
pad={"r": 10, "t": 10},
showactive=True,
x=0.1,
xanchor="left",
y=1.1,
yanchor="top"
),
]
)
fig.update_layout(
autosize=False,
width=1000,
height=650,)
iplot(fig,show_link=False)
df_corr = pd.DataFrame()
df_corr['Facebook'] = facebook['Close']
df_corr['Apple'] = apple['Close']
df_corr['Amazon'] = amazon['Close']
df_corr['Netflix'] = netflix['Close']
df_corr['Google'] = google['Close']
retscomp = df_corr.pct_change()
corr = retscomp.corr()
corr
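The `df_corr` table above lines up the five Close columns by row position, which matches dates only if every company's table contains exactly the same trading days. The sketch below shows a date-aligned alternative, assuming a long table with `Date`, `Company` and `Close` columns like `result`. The toy values are made up, and this is one possible approach rather than the method used in this project.

```python
import pandas as pd

long_df = pd.DataFrame({
    "Date": pd.to_datetime(["2013-01-02", "2013-01-03", "2013-01-04"] * 2),
    "Company": ["Facebook"] * 3 + ["Apple"] * 3,
    "Close": [28.0, 27.8, 28.8, 19.6, 19.4, 18.8],   # made-up prices
})

# One row per date, one column per company, so rows are aligned by date.
wide = long_df.pivot(index="Date", columns="Company", values="Close")

# Daily percentage changes, then pairwise correlation between companies.
corr_by_date = wide.pct_change().corr()
```

If a company were missing a trading day, the pivot would leave a NaN in that cell instead of silently shifting later rows against the other companies.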
We plotted box plots of all the price features (Open, Close, High, Low, Adj Close) for each company to see whether they are similar. As the box plots show, the price features are fairly similar within each company, which supports the assumption we make in the next part (using the close price exclusively for regression and ML).
fig = go.Figure()
fig.add_trace(go.Box(y=facebook.Close, name='Close'))
fig.add_trace(go.Box(y=facebook.Open, name='Open'))
fig.add_trace(go.Box(y=facebook.Low, name='Low'))
fig.add_trace(go.Box(y=facebook.High, name='High'))
fig.add_trace(go.Box(y=facebook['Adj Close'], name='Adj Close'))
fig.add_trace(go.Box(y=apple.Close, name='Close', visible='legendonly'))
fig.add_trace(go.Box(y=apple.Open, name='Open', visible='legendonly'))
fig.add_trace(go.Box(y=apple.Low, name='Low', visible='legendonly'))
fig.add_trace(go.Box(y=apple.High, name='High', visible='legendonly'))
fig.add_trace(go.Box(y=apple['Adj Close'], name='Adj Close', visible='legendonly'))
fig.add_trace(go.Box(y=amazon.Close, name='Close', visible='legendonly'))
fig.add_trace(go.Box(y=amazon.Open, name='Open', visible='legendonly'))
fig.add_trace(go.Box(y=amazon.Low, name='Low', visible='legendonly'))
fig.add_trace(go.Box(y=amazon.High, name='High', visible='legendonly'))
fig.add_trace(go.Box(y=amazon['Adj Close'], name='Adj Close', visible='legendonly'))
fig.add_trace(go.Box(y=netflix.Close, name='Close', visible='legendonly'))
fig.add_trace(go.Box(y=netflix.Open, name='Open', visible='legendonly'))
fig.add_trace(go.Box(y=netflix.Low, name='Low', visible='legendonly'))
fig.add_trace(go.Box(y=netflix.High, name='High', visible='legendonly'))
fig.add_trace(go.Box(y=netflix['Adj Close'], name='Adj Close', visible='legendonly'))
fig.add_trace(go.Box(y=google.Close, name='Close', visible='legendonly'))
fig.add_trace(go.Box(y=google.Open, name='Open', visible='legendonly'))
fig.add_trace(go.Box(y=google.Low, name='Low', visible='legendonly'))
fig.add_trace(go.Box(y=google.High, name='High', visible='legendonly'))
fig.add_trace(go.Box(y=google['Adj Close'], name='Adj Close', visible='legendonly'))
fig.update_layout(title='Price features for All Companies from Jan 2013 to Aug 2020',
                  yaxis_title='Price', template="plotly_dark")
fig.update_layout(
updatemenus=[
dict(
buttons=list([
dict(label = 'Facebook',
method = 'update',
args = [{'visible': [True, True, True, True, True,
False, False, False, False, False,
False, False, False, False, False,
False, False, False, False, False,
False, False, False, False, False]},
{'title': 'Facebook',
'showlegend':True}]),
dict(label = 'Apple',
method = 'update',
args = [{'visible': [False, False, False, False, False,
True, True, True, True, True,
False, False, False, False, False,
False, False, False, False, False,
False, False, False, False, False]},
{'title': 'Apple',
'showlegend':True}]),
dict(label = 'Amazon',
method = 'update',
args = [{'visible': [False, False, False, False, False,
False, False, False, False, False,
True, True, True, True, True,
False, False, False, False, False,
False, False, False, False, False]},
{'title': 'Amazon',
'showlegend':True}]),
dict(label = 'Netflix',
method = 'update',
args = [{'visible': [False, False, False, False, False,
False, False, False, False, False,
False, False, False, False, False,
True, True, True, True, True,
False, False, False, False, False]},
{'title': 'Netflix',
'showlegend':True}]),
dict(label = 'Google',
method = 'update',
args = [{'visible': [False, False, False, False, False,
False, False, False, False, False,
False, False, False, False, False,
False, False, False, False, False,
True, True, True, True, True]},
{'title': 'Google',
'showlegend':True}]),
]),
direction="down",
pad={"r": 10, "t": 10},
showactive=True,
x=0.1,
xanchor="left",
y=1.1,
yanchor="top"
),
]
)
iplot(fig,show_link=False)
The following plot is an interactive correlation matrix (heatmap) with the companies listed along the left and bottom borders. We computed the percentage change in closing price for each company using the .pct_change() function from pandas and stored these values in a new dataframe, from which the correlations were calculated. Each cell of the matrix is filled with a color that represents the correlation between the stock prices of two companies. This correlation helps users see how closely a rise or fall in one company's stock price is accompanied by a rise or fall in the price of another company in the same sector. A color scale next to the plot lets users read the graph more efficiently.
ASSUMPTION:
We noticed that the Open, Close, High and Low prices were fairly similar. Also, when it comes to actually buying and selling stock, traders normally base their decisions on the close price. This is why, from this point forward, we analyze stocks based on their Close Price.
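As a minimal sketch of how a correlation matrix such as `corr` can be built, the snippet below applies `.pct_change()` and `.corr()` to a small table of made-up closing prices. The price values and the names `closes`, `returns` and `corr_example` are assumptions for illustration, not the notebook's own data or variables.

```python
import pandas as pd

# Made-up closing prices for illustration only (not taken from the dataset).
closes = pd.DataFrame({
    "FB":   [100.0, 102.0, 101.0, 105.0, 107.0],
    "AAPL": [50.0, 50.5, 50.8, 52.0, 52.2],
    "NFLX": [300.0, 297.0, 303.0, 300.0, 306.0],
})

# Row-over-row percentage change; the first row is NaN because it has no predecessor.
returns = closes.pct_change().dropna()

# Pairwise Pearson correlation between the companies' percentage changes.
corr_example = returns.corr()
print(corr_example.round(2))
```

Correlating percentage changes rather than raw prices avoids the situation where two stocks look strongly correlated only because both trend upward over the same period.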
fig = px.imshow(corr)
fig.update_layout(title='Correlation between All FAANG Stocks Close Price')
iplot(fig,show_link=False)
The graph below shows the closing prices of the FAANG stocks. One of its most useful features is that it is interactive: the user can select a particular timeframe within the given date range and hover over a point to find the closing price of the concerned stock on that exact date. Furthermore, if five different scatter traces in one graph make it hard to read, the drop-down menu lets the user display the closing price of only one stock at a time. We selected the closing price of each stock for every month in the years 2013 - 2020 by choosing it from the dataframe we created, and then used functions provided by plotly to build this interactive graph.
fig = go.Figure()
fig.add_trace(go.Scatter(x=facebook.Date, y=facebook.Close, name='FB'))
fig.add_trace(go.Scatter(x=apple.Date, y=apple.Close, name='AAPL'))
fig.add_trace(go.Scatter(x=amazon.Date, y=amazon.Close, name='AMZN'))
fig.add_trace(go.Scatter(x=netflix.Date, y=netflix.Close, name='NFLX'))
fig.add_trace(go.Scatter(x=google.Date, y=google.Close, name='GOOG'))
fig.update_layout(title='Close prices for All Companies from Jan 2013 to Aug 2020',
xaxis_title='Date',
yaxis_title='Close Price', template="plotly_dark")
fig.update_layout(
updatemenus=[
dict(
buttons=list([
dict(label = 'All',
method = 'update',
args = [{'visible': [True, True, True, True, True]},
{'title': 'All',
'showlegend':True}]),
dict(label = 'Facebook',
method = 'update',
args = [{'visible': [True, False, False, False, False]},
{'title': 'FB',
'showlegend':True}]),
dict(label = 'Apple',
method = 'update',
args = [{'visible': [False, True, False, False, False]},
{'title': 'AAPL',
'showlegend':True}]),
dict(label = 'Amazon',
method = 'update',
args = [{'visible': [False, False, True, False, False]},
{'title': 'AMZN',
'showlegend':True}]),
dict(label = 'Netflix',
method = 'update',
args = [{'visible': [False, False, False, True, False]},
{'title': 'NFLX',
'showlegend':True}]),
dict(label = 'Google',
method = 'update',
args = [{'visible': [False, False, False, False, True]},
{'title': 'GOOG',
'showlegend':True}]),
]),
direction="down",
pad={"r": 10, "t": 10},
showactive=True,
x=0.1,
xanchor="left",
y=1.1,
yanchor="top"
),
]
)
fig.update_layout(
autosize=False,
width=1000,
height=650,)
iplot(fig,show_link=False)
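The drop-down menus in this notebook spell out every visibility list by hand. As a hedged sketch, a small helper could generate the same kind of button definitions. The function name `make_buttons` and its parameters are hypothetical and not part of plotly; the helper only builds plain dictionaries in the shape the `updatemenus` buttons above use.

```python
def make_buttons(labels, traces_per_label=1, include_all=False):
    """Build 'update' button dicts that each show only one group of traces.

    Hypothetical helper: assumes the traces were added to the figure in the
    same order as `labels`, with `traces_per_label` consecutive traces each.
    """
    n_traces = len(labels) * traces_per_label
    buttons = []
    if include_all:
        buttons.append(dict(label="All", method="update",
                            args=[{"visible": [True] * n_traces},
                                  {"title": "All", "showlegend": True}]))
    for i, label in enumerate(labels):
        start = i * traces_per_label
        # True only for the traces that belong to this label.
        visible = [start <= j < start + traces_per_label for j in range(n_traces)]
        buttons.append(dict(label=label, method="update",
                            args=[{"visible": visible},
                                  {"title": label, "showlegend": True}]))
    return buttons


buttons = make_buttons(["Facebook", "Apple", "Amazon", "Netflix", "Google"],
                       include_all=True)
print(buttons[2]["args"][0]["visible"])
```

The returned list could be passed as `buttons=` inside an `updatemenus` entry. Note that this sketch uses the label as the plot title, whereas the notebook uses ticker symbols for the titles in this graph.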
The prices of individual investment securities can vary widely, so a common reporting practice is to standardize or index these values to a baseline value. In this section we have standardized the closing prices of each FAANG stock and plotted them on one interactive graph, which makes the stocks comparable despite their different price levels. The user can hover over the graph and read the values for every stock at once by using the “Compare Data on Hover” function of the graph.
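One common way to standardize prices is a per-company z-score. The sketch below assumes that approach; the notebook's own `standard_close` column may have been computed differently. The data and the column name `standard_close_example` are made up for illustration.

```python
import pandas as pd

# Made-up prices for two hypothetical companies (illustration only).
prices = pd.DataFrame({
    "Company": ["A", "A", "A", "B", "B", "B"],
    "Close":   [10.0, 12.0, 14.0, 1000.0, 1100.0, 1200.0],
})

# z-score within each company: subtract the company's mean close and
# divide by the company's (sample) standard deviation.
grouped = prices.groupby("Company")["Close"]
prices["standard_close_example"] = (
    (prices["Close"] - grouped.transform("mean")) / grouped.transform("std")
)
print(prices)
```

After this transformation both companies are on the same scale even though their raw prices differ by two orders of magnitude, which is what makes a shared y-axis meaningful.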
fig = px.line(result, x="Date", y="standard_close", color='Company')
fig.update_layout(title='Standardized Close prices for All Companies from Jan 2013 to Aug 2020',
xaxis_title='Date',
yaxis_title='Standardized Close Price', template="plotly_dark")
iplot(fig,show_link=False)
Volume measures the number of shares traded in a stock or contracts traded in futures or options. Volume can be an indicator of market strength, as rising markets on increasing volume are typically viewed as strong and healthy. In this part we plot a bar chart that compares the standardized mean yearly trading volume of each company's stock from 2013 - 2020.
result['Date'] = pd.to_datetime(result['Date'])
# Extract the calendar year from each date
result['Year'] = result['Date'].dt.year
# Mean trading volume for each (Company, Year) pair
vol_df = result.groupby(['Company', 'Year'], as_index=False)['Volume'].mean()
vol_df = vol_df.rename(columns={'Volume': 'Volume Mean'})
fig = go.Figure()
# Standardize the yearly mean volumes (z-score across all companies and years)
avg_vol = vol_df['Volume Mean'].mean()
stand_vol = vol_df['Volume Mean'].std()
vol_df = vol_df.reset_index(drop=True)
vol_df['standard_vol'] = (vol_df['Volume Mean'] - avg_vol) / stand_vol
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Facebook']['Year'], y=vol_df[vol_df['Company'] == 'Facebook']['standard_vol'], name='FB'))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Apple']['Year'], y=vol_df[vol_df['Company'] == 'Apple']['standard_vol'], name='AAPL'))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Amazon']['Year'], y=vol_df[vol_df['Company'] == 'Amazon']['standard_vol'], name='AMZN'))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Netflix']['Year'], y=vol_df[vol_df['Company'] == 'Netflix']['standard_vol'], name='NFLX'))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Google']['Year'], y=vol_df[vol_df['Company'] == 'Google']['standard_vol'], name='GOOG'))
fig.update_layout(title='Standardized Volume for All Companies from Jan 2013 to Aug 2020 Grouped by Year',
xaxis_title='Year',
yaxis_title='Standard Volume', template="plotly_dark")
iplot(fig,show_link=False)
This set of graphs shows the fitted regression models for the stock prices of the FAANG companies. This is where machine learning comes into use. We split the data into two sets: training and testing values. The training values are used to fit the model so that it learns the patterns in the data. The testing values are then used to check how well the model predicts values it has not seen during training. We have plotted regression fits for this data to find the best-fitting method out of Linear, k-NN and Decision Tree Regression.
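The idea of fitting on training data and checking on held-out testing data can be sketched with numpy alone. This is an illustration on synthetic data, with `np.polyfit` standing in for the regression model; it is not the notebook's scikit-learn code, and all names and values below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "price vs. time" data: a linear trend plus noise (illustration only).
t = np.arange(200, dtype=float)
price = 50.0 + 0.3 * t + rng.normal(scale=2.0, size=t.size)

# Shuffle the indices and hold out 25% of the points for testing.
idx = rng.permutation(t.size)
n_test = t.size // 4
test_idx, train_idx = idx[:n_test], idx[n_test:]

# Fit a straight line using the training points only.
slope, intercept = np.polyfit(t[train_idx], price[train_idx], deg=1)

# Evaluate on the held-out points, which the fit has not seen.
pred = slope * t[test_idx] + intercept
rmse = np.sqrt(np.mean((price[test_idx] - pred) ** 2))
print(f"slope={slope:.3f}, test RMSE={rmse:.3f}")
```

A low error on the testing points suggests the fitted pattern generalizes beyond the points used for training; a model that only memorizes the training data would do noticeably worse here.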
This graph overlays the three regression techniques on the training and testing data for Facebook's closing prices, so that we can see which model best fits the relationship between time and Close Price. The linear regression trace gives a line of best fit, which makes the overall trend easier to read.
# Convert each date to a Unix timestamp in seconds so it can serve as a numeric feature
facebook['timestamp'] = pd.to_datetime(facebook.Date).astype('int64') // (10**9)
X = np.array(facebook['timestamp']).reshape(-1,1)
y = np.array(facebook['Close'])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression()
model.fit(X_train, y_train)
x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))
fig = go.Figure()
fig.add_trace(go.Scatter(x=X_train.squeeze(), y=y_train, name='Training Data', mode='markers'))
fig.add_trace(go.Scatter(x=X_test.squeeze(), y=y_test, name='Testing Data', mode='markers'))
fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Linear Regression'))
model = KNeighborsRegressor()
model.fit(X_train, y_train)
x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))
fig.add_trace(go.Scatter(x=x_range, y=y_range, name='kNN Regressor'))
model = DecisionTreeRegressor()
model.fit(X_train, y_train)
x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))
fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Decision Tree', marker_color='gold'))
fig.update_layout(
updatemenus=[
dict(
buttons=list([
dict(label = 'All',
method = 'update',
args = [{'visible': [True, True, True, True, True]},
{'title': 'All',
'showlegend':True}]),
dict(label = 'Linear Regression',
method = 'update',
args = [{'visible': [True, True, True, False, False]},
{'title': 'Linear Regression',
'showlegend':True}]),
dict(label = 'k-NN Regressor',
method = 'update',
args = [{'visible': [True, True, False, True, False]},
{'title': 'k-NN Regressor',
'showlegend':True}]),
dict(label = 'Decision Tree Regressor',
method = 'update',
args = [{'visible': [True, True, False, False, True]},
{'title': 'Decision Tree Regressor',
'showlegend':True}]),
]),
direction="down",
pad={"r": 10, "t": 10},
showactive=True,
x=0.1,
xanchor="left",
y=1.1,
yanchor="top"
),
]
)
fig.update_layout(title='Regression Line Fit for Facebook from Jan 2013 to Aug 2020',
xaxis_title='Date',
yaxis_title='Close Price', template="plotly_dark")
fig.update_layout(
autosize=False,
width=1000,
height=650,)
iplot(fig,show_link=False)
results = sm.OLS(y,X).fit()
print(results.summary())
A hypothesis test is conducted to see whether there is a relationship between time and Close Price for Facebook at a 95% confidence level.
Hypothesis -
Null Hypothesis: There is no relationship between time and Close Price for Facebook.
Alternative Hypothesis: There is a relationship between time and Close Price for Facebook.
Decision Rule -
The alpha value here is 0.05. We reject the null hypothesis if the p-value is less than alpha.
Test Statistic -
p-value = 0.000 (from the OLS summary above)
Decision -
The p-value we get is 0.000, which is less than the alpha value of 0.05, so we reject the null hypothesis in favor of the alternative hypothesis.
Conclusion -
As the null hypothesis is rejected, we can conclude that there is a relationship between time and Close Price of Facebook.
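To make the test concrete, the sketch below computes the slope, its t-statistic and an approximate two-sided p-value for a regression of price on time with an intercept. It is an illustration on synthetic data, not the notebook's statsmodels output: the function `slope_test` is hypothetical, and the p-value uses a normal approximation to the t distribution, which is reasonable only for large samples.

```python
import math
import numpy as np


def slope_test(x, y):
    """OLS of y on x with an intercept.

    Returns the slope, its t-statistic and an approximate two-sided p-value.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # Column of ones so the fitted line is not forced through the origin.
    design = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    # Residual variance with n - 2 degrees of freedom (two fitted parameters).
    sigma2 = float(resid @ resid) / (n - 2)
    se_slope = math.sqrt(sigma2 / float(np.sum((x - x.mean()) ** 2)))
    t_stat = float(coef[1]) / se_slope
    # Normal approximation to the t distribution (assumption: large n).
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))
    return float(coef[1]), t_stat, p_value


# Synthetic trending series (illustration only).
rng = np.random.default_rng(1)
days = np.arange(500, dtype=float)
trending = 100.0 + 0.2 * days + rng.normal(scale=5.0, size=days.size)
slope, t_stat, p_value = slope_test(days, trending)
print(f"slope={slope:.3f}, t={t_stat:.1f}, p={p_value:.3g}")
```

Note that `sm.OLS` does not add an intercept on its own; a constant column has to be supplied, for example with `sm.add_constant(X)`, if the fitted line should not be forced through the origin.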
This graph overlays the three regression techniques on the training and testing data for Apple's closing prices, so that we can see which model best fits the relationship between time and Close Price. The linear regression trace gives a line of best fit, which makes the overall trend easier to read.
# Convert each date to a Unix timestamp in seconds so it can serve as a numeric feature
apple['timestamp'] = pd.to_datetime(apple.Date).astype('int64') // (10**9)
X = np.array(apple['timestamp']).reshape(-1,1)
y = np.array(apple['Close'])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression()
model.fit(X_train, y_train)
x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))
fig = go.Figure()
fig.add_trace(go.Scatter(x=X_train.squeeze(), y=y_train, name='Training Data', mode='markers'))
fig.add_trace(go.Scatter(x=X_test.squeeze(), y=y_test, name='Testing Data', mode='markers'))
fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Linear Regression'))
model = KNeighborsRegressor()
model.fit(X_train, y_train)
x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))
fig.add_trace(go.Scatter(x=x_range, y=y_range, name='kNN Regressor'))
model = DecisionTreeRegressor()
model.fit(X_train, y_train)
x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))
fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Decision Tree', marker_color='gold'))
fig.update_layout(
updatemenus=[
dict(
buttons=list([
dict(label = 'All',
method = 'update',
args = [{'visible': [True, True, True, True, True]},
{'title': 'All',
'showlegend':True}]),
dict(label = 'Linear Regression',
method = 'update',
args = [{'visible': [True, True, True, False, False]},
{'title': 'Linear Regression',
'showlegend':True}]),
dict(label = 'k-NN Regressor',
method = 'update',
args = [{'visible': [True, True, False, True, False]},
{'title': 'k-NN Regressor',
'showlegend':True}]),
dict(label = 'Decision Tree Regressor',
method = 'update',
args = [{'visible': [True, True, False, False, True]},
{'title': 'Decision Tree Regressor',
'showlegend':True}]),
]),
direction="down",
pad={"r": 10, "t": 10},
showactive=True,
x=0.1,
xanchor="left",
y=1.1,
yanchor="top"
),
]
)
fig.update_layout(title='Regression Line Fit for Apple from Jan 2013 to Aug 2020',
xaxis_title='Date',
yaxis_title='Close Price', template="plotly_dark")
fig.update_layout(
autosize=False,
width=1000,
height=650,)
iplot(fig,show_link=False)
results = sm.OLS(y,X).fit()
print(results.summary())
A hypothesis test is conducted to see whether there is a relationship between time and Close Price for Apple at a 95% confidence level.
Hypothesis -
Null Hypothesis: There is no relationship between time and Close Price for Apple.
Alternative Hypothesis: There is a relationship between time and Close Price for Apple.
Decision Rule -
The alpha value here is 0.05. We reject the null hypothesis if the p-value is less than alpha.
Test Statistic -
p-value = 0.000 (from the OLS summary above)
Decision -
The p-value we get is 0.000, which is less than the alpha value of 0.05, so we reject the null hypothesis in favor of the alternative hypothesis.
Conclusion -
As the null hypothesis is rejected, we can conclude that there is a relationship between time and Close Price of Apple.
This graph overlays the three regression techniques on the training and testing data for Amazon's closing prices, so that we can see which model best fits the relationship between time and Close Price. The linear regression trace gives a line of best fit, which makes the overall trend easier to read.
# Convert each date to a Unix timestamp in seconds so it can serve as a numeric feature
amazon['timestamp'] = pd.to_datetime(amazon.Date).astype('int64') // (10**9)
X = np.array(amazon['timestamp']).reshape(-1,1)
y = np.array(amazon['Close'])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression()
model.fit(X_train, y_train)
x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))
fig = go.Figure()
fig.add_trace(go.Scatter(x=X_train.squeeze(), y=y_train, name='Training Data', mode='markers'))
fig.add_trace(go.Scatter(x=X_test.squeeze(), y=y_test, name='Testing Data', mode='markers'))
fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Linear Regression'))
model = KNeighborsRegressor()
model.fit(X_train, y_train)
x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))
fig.add_trace(go.Scatter(x=x_range, y=y_range, name='kNN Regressor'))
model = DecisionTreeRegressor()
model.fit(X_train, y_train)
x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))
fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Decision Tree', marker_color='gold'))
fig.update_layout(
updatemenus=[
dict(
buttons=list([
dict(label = 'All',
method = 'update',
args = [{'visible': [True, True, True, True, True]},
{'title': 'All',
'showlegend':True}]),
dict(label = 'Linear Regression',
method = 'update',
args = [{'visible': [True, True, True, False, False]},
{'title': 'Linear Regression',
'showlegend':True}]),
dict(label = 'k-NN Regressor',
method = 'update',
args = [{'visible': [True, True, False, True, False]},
{'title': 'k-NN Regressor',
'showlegend':True}]),
dict(label = 'Decision Tree Regressor',
method = 'update',
args = [{'visible': [True, True, False, False, True]},
{'title': 'Decision Tree Regressor',
'showlegend':True}]),
]),
direction="down",
pad={"r": 10, "t": 10},
showactive=True,
x=0.1,
xanchor="left",
y=1.1,
yanchor="top"
),
]
)
fig.update_layout(title='Regression Line Fit for Amazon from Jan 2013 to Aug 2020',
xaxis_title='Date',
yaxis_title='Close Price', template="plotly_dark")
fig.update_layout(
autosize=False,
width=1000,
height=650,)
iplot(fig,show_link=False)
results = sm.OLS(y,X).fit()
print(results.summary())
A hypothesis test is conducted to see whether there is a relationship between time and Close Price for Amazon at a 95% confidence level.
Hypothesis -
Null Hypothesis: There is no relationship between time and Close Price for Amazon.
Alternative Hypothesis: There is a relationship between time and Close Price for Amazon.
Decision Rule -
The alpha value here is 0.05. We reject the null hypothesis if the p-value is less than alpha.
Test Statistic -
p-value = 0.000 (from the OLS summary above)
Decision -
The p-value we get is 0.000, which is less than the alpha value of 0.05, so we reject the null hypothesis in favor of the alternative hypothesis.
Conclusion -
As the null hypothesis is rejected, we can conclude that there is a relationship between time and Close Price of Amazon.
This graph overlays the three regression techniques on the training and testing data for Netflix's closing prices, so that we can see which model best fits the relationship between time and Close Price. The linear regression trace gives a line of best fit, which makes the overall trend easier to read.
# Convert each date to a Unix timestamp in seconds so it can serve as a numeric feature
netflix['timestamp'] = pd.to_datetime(netflix.Date).astype('int64') // (10**9)
X = np.array(netflix['timestamp']).reshape(-1,1)
y = np.array(netflix['Close'])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression()
model.fit(X_train, y_train)
x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))
fig = go.Figure()
fig.add_trace(go.Scatter(x=X_train.squeeze(), y=y_train, name='Training Data', mode='markers'))
fig.add_trace(go.Scatter(x=X_test.squeeze(), y=y_test, name='Testing Data', mode='markers'))
fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Linear Regression'))
model = KNeighborsRegressor()
model.fit(X_train, y_train)
x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))
fig.add_trace(go.Scatter(x=x_range, y=y_range, name='kNN Regressor'))
model = DecisionTreeRegressor()
model.fit(X_train, y_train)
x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))
fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Decision Tree', marker_color='gold'))
fig.update_layout(
updatemenus=[
dict(
buttons=list([
dict(label = 'All',
method = 'update',
args = [{'visible': [True, True, True, True, True]},
{'title': 'All',
'showlegend':True}]),
dict(label = 'Linear Regression',
method = 'update',
args = [{'visible': [True, True, True, False, False]},
{'title': 'Linear Regression',
'showlegend':True}]),
dict(label = 'k-NN Regressor',
method = 'update',
args = [{'visible': [True, True, False, True, False]},
{'title': 'k-NN Regressor',
'showlegend':True}]),
dict(label = 'Decision Tree Regressor',
method = 'update',
args = [{'visible': [True, True, False, False, True]},
{'title': 'Decision Tree Regressor',
'showlegend':True}]),
]),
direction="down",
pad={"r": 10, "t": 10},
showactive=True,
x=0.1,
xanchor="left",
y=1.1,
yanchor="top"
),
]
)
fig.update_layout(title='Regression Line Fit for Netflix from Jan 2013 to Aug 2020',
xaxis_title='Date',
yaxis_title='Close Price', template="plotly_dark")
fig.update_layout(
autosize=False,
width=1000,
height=650,)
iplot(fig,show_link=False)
results = sm.OLS(y,X).fit()
print(results.summary())
A hypothesis test is conducted to see whether there is a relationship between time and Close Price for Netflix at a 95% confidence level.
Hypothesis -
Null Hypothesis: There is no relationship between time and Close Price for Netflix.
Alternative Hypothesis: There is a relationship between time and Close Price for Netflix.
Decision Rule -
The alpha value here is 0.05. We reject the null hypothesis if the p-value is less than alpha.
Test Statistic -
p-value = 0.000 (from the OLS summary above)
Decision -
The p-value we get is 0.000, which is less than the alpha value of 0.05, so we reject the null hypothesis in favor of the alternative hypothesis.
Conclusion -
As the null hypothesis is rejected, we can conclude that there is a relationship between time and Close Price of Netflix.
This graph overlays the three regression techniques on the training and testing data for Google's closing prices, so that we can see which model best fits the relationship between time and Close Price. The linear regression trace gives a line of best fit, which makes the overall trend easier to read.
# Convert each date to a Unix timestamp in seconds so it can serve as a numeric feature
google['timestamp'] = pd.to_datetime(google.Date).astype('int64') // (10**9)
X = np.array(google['timestamp']).reshape(-1,1)
y = np.array(google['Close'])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression()
model.fit(X_train, y_train)
x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))
fig = go.Figure()
fig.add_trace(go.Scatter(x=X_train.squeeze(), y=y_train, name='Training Data', mode='markers'))
fig.add_trace(go.Scatter(x=X_test.squeeze(), y=y_test, name='Testing Data', mode='markers'))
fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Linear Regression'))
model = KNeighborsRegressor()
model.fit(X_train, y_train)
x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))
fig.add_trace(go.Scatter(x=x_range, y=y_range, name='kNN Regressor'))
model = DecisionTreeRegressor()
model.fit(X_train, y_train)
x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))
fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Decision Tree', marker_color='gold'))
fig.update_layout(
updatemenus=[
dict(
buttons=list([
dict(label = 'All',
method = 'update',
args = [{'visible': [True, True, True, True, True]},
{'title': 'All',
'showlegend':True}]),
dict(label = 'Linear Regression',
method = 'update',
args = [{'visible': [True, True, True, False, False]},
{'title': 'Linear Regression',
'showlegend':True}]),
dict(label = 'k-NN Regressor',
method = 'update',
args = [{'visible': [True, True, False, True, False]},
{'title': 'k-NN Regressor',
'showlegend':True}]),
dict(label = 'Decision Tree Regressor',
method = 'update',
args = [{'visible': [True, True, False, False, True]},
{'title': 'Decision Tree Regressor',
'showlegend':True}]),
]),
direction="down",
pad={"r": 10, "t": 10},
showactive=True,
x=0.1,
xanchor="left",
y=1.1,
yanchor="top"
),
]
)
fig.update_layout(title='Regression Line Fit for Google from Jan 2013 to Aug 2020',
xaxis_title='Date',
yaxis_title='Close Price', template="plotly_dark")
fig.update_layout(
autosize=False,
width=1000,
height=650,)
iplot(fig,show_link=False)
results = sm.OLS(y,X).fit()
print(results.summary())
A hypothesis test is conducted to see whether there is a relationship between time and Close Price for Google at a 95% confidence level.
Hypothesis -
Null Hypothesis: There is no relationship between time and Close Price for Google.
Alternative Hypothesis: There is a relationship between time and Close Price for Google.
Decision Rule -
The alpha value here is 0.05. We reject the null hypothesis if the p-value is less than alpha.
Test Statistic -
p-value = 0.000 (from the OLS summary above)
Decision -
The p-value we get is 0.000, which is less than the alpha value of 0.05, so we reject the null hypothesis in favor of the alternative hypothesis.
Conclusion -
As the null hypothesis is rejected, we can conclude that there is a relationship between time and Close Price of Google.
In this section we use Decision Tree, Linear and k-NN regressors to predict closing prices for new data points. Each graph is interactive and has different “modes”: the user can select a prediction technique from the drop-down menu right above the graph. Each mode shows the actual and predicted closing prices for the selected FAANG stock.
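The prediction target in the code below is built by shifting the Close column, so that each row is paired with the closing price a fixed number of rows later. The toy example here shows that step in isolation; the prices and the name `horizon` are made up for illustration (the notebook itself uses `future_days = 500`).

```python
import pandas as pd

# Ten made-up closing prices (illustration only).
toy = pd.DataFrame({"Close": [10.0, 11.0, 12.0, 13.0, 14.0,
                              15.0, 16.0, 17.0, 18.0, 19.0]})
horizon = 3  # hypothetical look-ahead, in rows

# Each row's target is the closing price `horizon` rows later.
toy["Prediction"] = toy["Close"].shift(-horizon)

# The last `horizon` rows have no future value, so their target is NaN
# and they are dropped from the training arrays.
features = toy[["Close"]].to_numpy()[:-horizon]
targets = toy["Prediction"].to_numpy()[:-horizon]
print(toy)
```

With this layout a model learns to map today's closing price to the price `horizon` rows ahead, which is why the final rows can only be predicted and not used for fitting.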
The graph below shows the actual and predicted closing prices for Facebook. Investors and other users can study it to inform their decisions or to learn about the past and present behavior of Facebook stock.
df = facebook[['Close']].copy(deep=True)
# Number of rows (trading days) to look ahead
future_days = 500
# Target: the closing price future_days rows later
df['Prediction'] = df[['Close']].shift(-future_days)
# Drop the last future_days rows, which have no target value
X = np.array(df.drop(columns=['Prediction']))[:-future_days]
y = np.array(df['Prediction'])[:-future_days]
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)
tree = DecisionTreeRegressor().fit(x_train, y_train)
lr = LinearRegression().fit(x_train, y_train)
knn = KNeighborsRegressor().fit(x_train, y_train)
# Features used to predict the final future_days closing prices
x_future = df.drop(columns=['Prediction'])[:-future_days]
x_future = x_future.tail(future_days)
x_future = np.array(x_future)
tree_prediction = tree.predict(x_future)
lr_prediction = lr.predict(x_future)
knn_prediction = knn.predict(x_future)
predictions = tree_prediction
valid = df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]
fig = go.Figure()
fig.add_trace(go.Scatter(x=df.index.values, y=df['Close'], name='Actual Close',
line=dict(width=1.5)))
fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close D-Tree',
line=dict(width=1.5)))
predictions = lr_prediction
valid = df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]
fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close Lin Reg',
line=dict(width=1.5)))
predictions = knn_prediction
valid = df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]
fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close k-NN',
marker_color='gold',
line=dict(width=1.5)))
fig.update_layout(
updatemenus=[
dict(
buttons=list([
dict(label = 'All',
method = 'update',
args = [{'visible': [True, True, True, True]},
{'title': 'All',
'showlegend':True}]),
dict(label = 'Decision Tree Prediction',
method = 'update',
args = [{'visible': [True, True, False, False]},
{'title': 'Decision Tree Regressor',
'showlegend':True}]),
dict(label = 'Linear Regression Prediction',
method = 'update',
args = [{'visible': [True, False, True, False]},
{'title': 'Linear Regression',
'showlegend':True}]),
dict(label = 'k-NN Regressor Prediction',
method = 'update',
args = [{'visible': [True, False, False, True]},
{'title': 'k-NN Regressor',
'showlegend':True}]),
]),
direction="down",
pad={"r": 10, "t": 10},
showactive=True,
x=0.1,
xanchor="left",
y=1.1,
yanchor="top"
),
]
)
fig.update_layout(title='Predicted Values for Facebook For the last 500 Days',
xaxis_title='Date',
yaxis_title='Close Price', template="plotly_dark")
fig.update_layout(
autosize=False,
width=1000,
height=650,)
iplot(fig,show_link=False)
Here we calculate the scores for the three different models: k-NN Regressor, Decision Tree Regressor, and Linear Regressor. For these regressors, the value returned by .score() is the coefficient of determination (R²) on the testing data. The k-NN Regressor model has a score of 0.8749537040637907, the Decision Tree Regressor has a score of 0.7799458755648638, and the Linear Regressor has a score of 0.7997626275855734. Looking at these scores, all three models fit the testing data well; however, the k-NN Regressor is the most accurate predicting model.
confidencetree = tree.score(x_test, y_test)
confidencelr = lr.score(x_test, y_test)
confidenceknn = knn.score(x_test, y_test)
print("k-NN Regressor Accuracy Score: " + str(confidenceknn))
print("Decision Tree Regressor Accuracy Score: " + str(confidencetree))
print("Linear Regressor Accuracy Score: " + str(confidencelr))
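For scikit-learn regressors, `.score(X, y)` returns the coefficient of determination R² rather than a classification accuracy. The sketch below computes the same quantity by hand on made-up numbers; the function name `r_squared` and the values are assumptions for illustration.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    return float(1.0 - ss_res / ss_tot)


# Made-up actual prices (illustration only).
actual = [100.0, 110.0, 120.0, 130.0]
perfect = r_squared(actual, actual)                      # predictions match exactly
baseline = r_squared(actual, [115.0] * 4)                # always predict the mean
close_fit = r_squared(actual, [102.0, 108.0, 123.0, 129.0])
print(perfect, baseline, close_fit)
```

A score of 1 means the predictions match the actual values exactly, 0 means the model does no better than always predicting the mean, and negative values are possible for models that do worse than that.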
The graph below shows the actual and predicted closing prices for Apple. Investors and other users can study it to inform their decisions or to learn about the past and present behavior of Apple stock.
df = apple[['Close']].copy(deep=True)
# Number of rows (trading days) to look ahead
future_days = 500
# Target: the closing price future_days rows later
df['Prediction'] = df[['Close']].shift(-future_days)
# Drop the last future_days rows, which have no target value
X = np.array(df.drop(columns=['Prediction']))[:-future_days]
y = np.array(df['Prediction'])[:-future_days]
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)
tree = DecisionTreeRegressor().fit(x_train, y_train)
lr = LinearRegression().fit(x_train, y_train)
knn = KNeighborsRegressor().fit(x_train, y_train)
# Features used to predict the final future_days closing prices
x_future = df.drop(columns=['Prediction'])[:-future_days]
x_future = x_future.tail(future_days)
x_future = np.array(x_future)
tree_prediction = tree.predict(x_future)
lr_prediction = lr.predict(x_future)
knn_prediction = knn.predict(x_future)
predictions = tree_prediction
valid = df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]
fig = go.Figure()
fig.add_trace(go.Scatter(x=df.index.values, y=df['Close'], name='Actual Close',
line=dict(width=1.5)))
fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close D-Tree',
line=dict(width=1.5)))
predictions = lr_prediction
valid = df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]
fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close Lin Reg',
line=dict(width=1.5)))
predictions = knn_prediction
valid = df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]
fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close k-NN',
marker_color='gold',
line=dict(width=1.5)))
fig.update_layout(
updatemenus=[
dict(
buttons=list([
dict(label = 'All',
method = 'update',
args = [{'visible': [True, True, True, True]},
{'title': 'All',
'showlegend':True}]),
dict(label = 'Decision Tree Prediction',
method = 'update',
args = [{'visible': [True, True, False, False]},
{'title': 'Decision Tree Regressor',
'showlegend':True}]),
dict(label = 'Linear Regression Prediction',
method = 'update',
args = [{'visible': [True, False, True, False]},
{'title': 'Linear Regression',
'showlegend':True}]),
dict(label = 'k-NN Regressor Prediction',
method = 'update',
args = [{'visible': [True, False, False, True]},
{'title': 'k-NN Regressor',
'showlegend':True}]),
]),
direction="down",
pad={"r": 10, "t": 10},
showactive=True,
x=0.1,
xanchor="left",
y=1.1,
yanchor="top"
),
]
)
fig.update_layout(title='Predicted Values for Apple For the last 500 Days',
xaxis_title='Date',
yaxis_title='Close Price', template="plotly_dark")
fig.update_layout(
autosize=False,
width=1000,
height=650,)
iplot(fig,show_link=False)
Here we calculate the scores for the three different models: k-NN Regressor, Decision Tree Regressor, and Linear Regressor. For these regressors, the value returned by .score() is the coefficient of determination (R²) on the testing data. The k-NN Regressor model has a score of 0.8810877897308647, the Decision Tree Regressor has a score of 0.8153372738432569, and the Linear Regressor has a score of 0.7450194573323083. Looking at these scores, all three models fit the testing data well; however, the k-NN Regressor is the most accurate predicting model.
confidencetree = tree.score(x_test, y_test)
confidencelr = lr.score(x_test, y_test)
confidenceknn = knn.score(x_test, y_test)
print("k-NN Regressor Accuracy Score: " + str(confidenceknn))
print("Decision Tree Regressor Accuracy Score: " + str(confidencetree))
print("Linear Regressor Accuracy Score: " + str(confidencelr))
The graph below depicts the actual and predicted closing prices for Amazon. Investors and users can study this and make decisions that could help benefit them or even get information about the past and present of the Amazon stocks.
df = amazon[['Close']].copy(deep=True)
future_days = 500
df['Prediction'] = df[['Close']].shift(-future_days)
X = np.array(df.drop(columns=['Prediction']))[:-future_days]
y = np.array(df['Prediction'])[:-future_days]
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)
tree = DecisionTreeRegressor().fit(x_train, y_train)
lr = LinearRegression().fit(x_train, y_train)
knn = KNeighborsRegressor().fit(x_train, y_train)
x_future = df.drop(columns=['Prediction'])[:-future_days]
x_future = x_future.tail(future_days)
x_future = np.array(x_future)
tree_prediction = tree.predict(x_future)
lr_prediction = lr.predict(x_future)
knn_prediction = knn.predict(x_future)
predictions = tree_prediction
valid = df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]
fig = go.Figure()
fig.add_trace(go.Scatter(x=df.index.values, y=df['Close'], name='Actual Close',
line=dict(width=1.5)))
fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close D-Tree',
line=dict(width=1.5)))
predictions = lr_prediction
valid = df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]
fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close Lin Reg',
line=dict(width=1.5)))
predictions = knn_prediction
valid = df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]
fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close k-NN',
marker_color='gold',
line=dict(width=1.5)))
fig.update_layout(
updatemenus=[
dict(
buttons=list([
dict(label = 'All',
method = 'update',
args = [{'visible': [True, True, True, True]},
{'title': 'All',
'showlegend':True}]),
dict(label = 'Decision Tree Prediction',
method = 'update',
args = [{'visible': [True, True, False, False]},
{'title': 'Decision Tree Regressor',
'showlegend':True}]),
dict(label = 'Linear Regression Prediction',
method = 'update',
args = [{'visible': [True, False, True, False]},
{'title': 'Linear Regression',
'showlegend':True}]),
dict(label = 'k-NN Regressor Prediction',
method = 'update',
args = [{'visible': [True, False, False, True]},
{'title': 'k-NN Regressor',
'showlegend':True}]),
]),
direction="down",
pad={"r": 10, "t": 10},
showactive=True,
x=0.1,
xanchor="left",
y=1.1,
yanchor="top"
),
]
)
fig.update_layout(title='Predicted Values for Amazon For the last 500 Days',
xaxis_title='Date',
yaxis_title='Close Price', template="plotly_dark")
fig.update_layout(
autosize=False,
width=1000,
height=650,)
iplot(fig,show_link=False)
Here we compute a score for each of the three models: k-NN Regressor, Decision Tree Regressor, and Linear Regression. For a regressor, the score method returns the R-squared value on the test split, not a classification accuracy. The k-NN Regressor scores 0.9481650428008439, the Decision Tree Regressor scores 0.9183419691660628, and Linear Regression scores 0.8465830403947987. All three models score well on the test split, and the k-NN Regressor is the most accurate of the three.
confidencetree = tree.score(x_test, y_test)
confidencelr = lr.score(x_test, y_test)
confidenceknn = knn.score(x_test, y_test)
print("k-NN Regressor R-squared Score: " + str(confidenceknn))
print("Decision Tree Regressor R-squared Score: " + str(confidencetree))
print("Linear Regression R-squared Score: " + str(confidencelr))
The graph below shows the actual and predicted closing prices for Netflix. Investors and other readers can use it to review the past and present behavior of Netflix stock and to inform their own decisions.
df = netflix[['Close']].copy(deep=True)
future_days = 500
df['Prediction'] = df[['Close']].shift(-future_days)
X = np.array(df.drop(columns=['Prediction']))[:-future_days]
y = np.array(df['Prediction'])[:-future_days]
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)
tree = DecisionTreeRegressor().fit(x_train, y_train)
lr = LinearRegression().fit(x_train, y_train)
knn = KNeighborsRegressor().fit(x_train, y_train)
x_future = df.drop(columns=['Prediction'])[:-future_days]
x_future = x_future.tail(future_days)
x_future = np.array(x_future)
tree_prediction = tree.predict(x_future)
lr_prediction = lr.predict(x_future)
knn_prediction = knn.predict(x_future)
predictions = tree_prediction
valid = df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]
fig = go.Figure()
fig.add_trace(go.Scatter(x=df.index.values, y=df['Close'], name='Actual Close',
line=dict(width=1.5)))
fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close D-Tree',
line=dict(width=1.5)))
predictions = lr_prediction
valid = df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]
fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close Lin Reg',
line=dict(width=1.5)))
predictions = knn_prediction
valid = df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]
fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close k-NN',
marker_color='gold',
line=dict(width=1.5)))
fig.update_layout(
updatemenus=[
dict(
buttons=list([
dict(label = 'All',
method = 'update',
args = [{'visible': [True, True, True, True]},
{'title': 'All',
'showlegend':True}]),
dict(label = 'Decision Tree Prediction',
method = 'update',
args = [{'visible': [True, True, False, False]},
{'title': 'Decision Tree Regressor',
'showlegend':True}]),
dict(label = 'Linear Regression Prediction',
method = 'update',
args = [{'visible': [True, False, True, False]},
{'title': 'Linear Regression',
'showlegend':True}]),
dict(label = 'k-NN Regressor Prediction',
method = 'update',
args = [{'visible': [True, False, False, True]},
{'title': 'k-NN Regressor',
'showlegend':True}]),
]),
direction="down",
pad={"r": 10, "t": 10},
showactive=True,
x=0.1,
xanchor="left",
y=1.1,
yanchor="top"
),
]
)
fig.update_layout(title='Predicted Values for Netflix For the last 500 Days',
xaxis_title='Date',
yaxis_title='Close Price', template="plotly_dark")
fig.update_layout(
autosize=False,
width=1000,
height=650,)
iplot(fig,show_link=False)
Here we compute a score for each of the three models: k-NN Regressor, Decision Tree Regressor, and Linear Regression. For a regressor, the score method returns the R-squared value on the test split, not a classification accuracy. The k-NN Regressor scores 0.8285444089046352, the Decision Tree Regressor scores 0.7158090364057252, and Linear Regression scores 0.6464066460021082. The k-NN Regressor is again the most accurate of the three, while the Decision Tree and Linear Regression scores are noticeably lower for Netflix than for the other companies.
confidencetree = tree.score(x_test, y_test)
confidencelr = lr.score(x_test, y_test)
confidenceknn = knn.score(x_test, y_test)
print("k-NN Regressor R-squared Score: " + str(confidenceknn))
print("Decision Tree Regressor R-squared Score: " + str(confidencetree))
print("Linear Regression R-squared Score: " + str(confidencelr))
The graph below shows the actual and predicted closing prices for Google. Investors and other readers can use it to review the past and present behavior of Google stock and to inform their own decisions.
df = google[['Close']].copy(deep=True)
future_days = 500
df['Prediction'] = df[['Close']].shift(-future_days)
X = np.array(df.drop(columns=['Prediction']))[:-future_days]
y = np.array(df['Prediction'])[:-future_days]
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)
tree = DecisionTreeRegressor().fit(x_train, y_train)
lr = LinearRegression().fit(x_train, y_train)
knn = KNeighborsRegressor().fit(x_train, y_train)
x_future = df.drop(columns=['Prediction'])[:-future_days]
x_future = x_future.tail(future_days)
x_future = np.array(x_future)
tree_prediction = tree.predict(x_future)
lr_prediction = lr.predict(x_future)
knn_prediction = knn.predict(x_future)
predictions = tree_prediction
valid = df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]
fig = go.Figure()
fig.add_trace(go.Scatter(x=df.index.values, y=df['Close'], name='Actual Close',
line=dict(width=1.5)))
fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close D-Tree',
line=dict(width=1.5)))
predictions = lr_prediction
valid = df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]
fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close Lin Reg',
line=dict(width=1.5)))
predictions = knn_prediction
valid = df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]
fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close k-NN',
marker_color='gold',
line=dict(width=1.5)))
fig.update_layout(
updatemenus=[
dict(
buttons=list([
dict(label = 'All',
method = 'update',
args = [{'visible': [True, True, True, True]},
{'title': 'All',
'showlegend':True}]),
dict(label = 'Decision Tree Prediction',
method = 'update',
args = [{'visible': [True, True, False, False]},
{'title': 'Decision Tree Regressor',
'showlegend':True}]),
dict(label = 'Linear Regression Prediction',
method = 'update',
args = [{'visible': [True, False, True, False]},
{'title': 'Linear Regression',
'showlegend':True}]),
dict(label = 'k-NN Regressor Prediction',
method = 'update',
args = [{'visible': [True, False, False, True]},
{'title': 'k-NN Regressor',
'showlegend':True}]),
]),
direction="down",
pad={"r": 10, "t": 10},
showactive=True,
x=0.1,
xanchor="left",
y=1.1,
yanchor="top"
),
]
)
fig.update_layout(title='Predicted Values for Google For the last 500 Days',
xaxis_title='Date',
yaxis_title='Close Price', template="plotly_dark")
fig.update_layout(
autosize=False,
width=1000,
height=650,)
iplot(fig,show_link=False)
Here we compute a score for each of the three models: k-NN Regressor, Decision Tree Regressor, and Linear Regression. For a regressor, the score method returns the R-squared value on the test split, not a classification accuracy. The k-NN Regressor scores 0.9285098507660844, the Decision Tree Regressor scores 0.8827985730839002, and Linear Regression scores 0.8891613969513601. All three models score well on the test split. The k-NN Regressor is the most accurate, and for Google the Linear Regression score is slightly higher than the Decision Tree score.
confidencetree = tree.score(x_test, y_test)
confidencelr = lr.score(x_test, y_test)
confidenceknn = knn.score(x_test, y_test)
print("k-NN Regressor R-squared Score: " + str(confidenceknn))
print("Decision Tree Regressor R-squared Score: " + str(confidencetree))
print("Linear Regression R-squared Score: " + str(confidencelr))
The inference we can draw from these graphs is that, judging by the scores, the k-NN Regressor gives the most accurate predictions of the three models. There are a couple of outliers in the graphs that could affect the predictions. However, these outliers are minor and may change depending on the trading model that companies decide to adopt.
Here we will outline what we saw and learned from the analysis of all the plots:
1) The general trend of every stock's price features was upward over time. This is in line with the reason FAANG stocks are popular: they are considered "Blue Chip" stocks, meant to be held for a long time because of their upward trend. The FAANG stocks are often grouped together because they belong to large US technology companies with comparable stock performance. Our Close Prices plot shows this increase over time, which illustrates why these stocks are considered "Blue Chip".
2) The individual company price and volume plots show that the stocks with higher volume have lower prices and the ones with lower volume have higher prices. This trend can be explained in words: a company with a larger volume of stock available can price it lower, and the opposite holds for companies with lower volumes. For example, Apple has the highest volume among all companies for all years and the lowest stock price, at a median of USD 37.78 per share. On the other hand, Amazon had the lowest volume and a median stock price of USD 765.7 per share. This is a striking difference, since it makes Apple stock far more affordable than Amazon stock.
3) Based on our correlation plots and the standardized close price plot, we noticed a strong correlation between the stock prices of Amazon and Google. The correlation plot gives a value of 0.6017701, which was the highest correlation among all the pairs. On our standardized close plot, their prices also appear almost symmetrical about the y-axis: when Google's price went up, Amazon's went down, and vice versa.
4) From our regression fit plots, the Decision Tree Regressor initially appeared to be the better fit. However, after analyzing the graphs, the r-squared scores, and the regression summaries of the OLS model, we found that the k-Nearest Neighbors (k-NN) regressor gives a better fit. The OLS summaries also showed that there was a relationship between time and the close price of the stocks. The k-NN result was expected, since k-NN (as a regressor and as a classifier) is commonly used to predict stocks in the real world. However, k-NN is often combined with better strategies, and the market is narrowed before the stock prices are predicted. Our Hypothesis Testing also concluded that a relationship between time and close price exists for these stocks. Our regressions supported this, and we found that our regression analysis was valid.
5) Finally, we used what we learned from the regression fits to predict the prices of the stocks during the last 500 days of the time period we chose, which was from Jan 2013 to Aug 2020. We chose this period because Facebook did not have its IPO (initial public offering) until 2012, so to keep our data and analysis consistent we used the period between 2013 and 2020. Our prediction plots show that we were able, at least to some extent, to predict future prices with our models. The Linear Regression predictions were poor and would not work in the real world, although they did show the general increasing trend we expected. Based on a simple visual analysis, we assumed that the Decision Tree Regressor gave the better prediction, which would not be in line with real-world stock prediction. However, when we analyzed the scores of our predictions, we saw that the k-NN regressor gave the best predictions. This is in line with real-world usage, as noted in the previous point: k-NN is often used to predict stock prices, and several stock price analyses use it to predict the movement of these stocks. We should add, however, that k-NN on its own cannot be used as a "perfect" stock price predictor and must be coupled with better strategies and narrower markets to make accurate predictions.
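The prediction setup described in point 5 pairs each closing price with the closing price future_days rows later, which is what shift(-future_days) does in the notebook cells. Below is a minimal pure-Python sketch of that pairing. The function name make_supervised_pairs, the made-up prices, and the small horizon are all assumptions chosen for illustration.

```python
def make_supervised_pairs(closes, future_days):
    """Pair each close with the close `future_days` rows later.

    Mirrors shifting the Close column by -future_days and then dropping
    the last `future_days` rows, whose targets would be missing.
    """
    if future_days <= 0 or future_days >= len(closes):
        raise ValueError("future_days must be between 1 and len(closes) - 1")
    features = closes[:-future_days]  # prices we predict from
    targets = closes[future_days:]    # prices future_days rows later
    return features, targets


# Hypothetical closing prices (illustration only).
closes = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
features, targets = make_supervised_pairs(closes, future_days=2)
print(list(zip(features, targets)))
```

Each model is then trained to map a feature value to its paired target, so a prediction made from one day's close is an estimate of the close future_days rows ahead.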
Future Scope: We will examine the relationship between Google and Amazon to understand the reason behind their strong correlation. To do this, we will need to dig deeper into how these companies operate in real life. The research can be extended with more efficient algorithms to identify trends and patterns, which can be used for better predictive modelling. We can also use better strategies to predict these stock prices, which may make the predictions more accurate. Finally, we can use more factors to analyze hidden trends and narrow the market to improve our predictive power.
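As a starting point for examining the Google and Amazon relationship, the sketch below computes a correlation coefficient between two series by hand. It assumes the correlation reported earlier is a Pearson coefficient, which the analysis does not state explicitly. The function name pearson and the sample series are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Co-movement of the two series around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical standardized prices (illustration only).
rising = [1.0, 2.0, 3.0, 4.0]
also_rising = [2.0, 4.0, 6.0, 8.0]
falling = [8.0, 6.0, 4.0, 2.0]

same_direction = pearson(rising, also_rising)
opposite_direction = pearson(rising, falling)
print(same_direction, opposite_direction)
```

A positive coefficient means the two series tend to rise and fall together, while a negative coefficient means they tend to move in opposite directions.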
fig = go.Figure()
fig.add_trace(go.Candlestick(x=facebook['Date'],
open=facebook['Open'],
high=facebook['High'],
low=facebook['Low'],
close=facebook['Close'], name='FB'))
fig.add_trace(go.Candlestick(x=apple['Date'],
open=apple['Open'],
high=apple['High'],
low=apple['Low'],
close=apple['Close'], name='AAPL', visible='legendonly'))
fig.add_trace(go.Candlestick(x=amazon['Date'],
open=amazon['Open'],
high=amazon['High'],
low=amazon['Low'],
close=amazon['Close'], name='AMZN', visible='legendonly'))
fig.add_trace(go.Candlestick(x=netflix['Date'],
open=netflix['Open'],
high=netflix['High'],
low=netflix['Low'],
close=netflix['Close'], name='NFLX', visible='legendonly'))
fig.add_trace(go.Candlestick(x=google['Date'],
open=google['Open'],
high=google['High'],
low=google['Low'],
close=google['Close'], name='GOOG', visible='legendonly'))
fig.update_layout(title='Candlestick Plots for All Companies from Jan 2013 to Aug 2020',
xaxis_title='Date',
yaxis_title='Ticks', template="plotly_dark")
fig.update_layout(
updatemenus=[
dict(
buttons=list([
dict(label = 'Facebook',
method = 'update',
args = [{'visible': [True, False, False, False, False]},
{'title': 'FB',
'showlegend':True}]),
dict(label = 'Apple',
method = 'update',
args = [{'visible': [False, True, False, False, False]},
{'title': 'AAPL',
'showlegend':True}]),
dict(label = 'Amazon',
method = 'update',
args = [{'visible': [False, False, True, False, False]},
{'title': 'AMZN',
'showlegend':True}]),
dict(label = 'Netflix',
method = 'update',
args = [{'visible': [False, False, False, True, False]},
{'title': 'NFLX',
'showlegend':True}]),
dict(label = 'Google',
method = 'update',
args = [{'visible': [False, False, False, False, True]},
{'title': 'GOOG',
'showlegend':True}]),
]),
direction="down",
pad={"r": 10, "t": 10},
showactive=True,
x=0.1,
xanchor="left",
y=1.1,
yanchor="top"
),
]
)
fig.update_layout(xaxis_rangeslider_visible=False)
fig.update_layout(
autosize=False,
width=1000,
height=650,)
iplot(fig,show_link=False)
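The dropdown menus in the plots above list one visibility flag per trace, so the labels, titles, and flags have to be kept in sync by hand. A small helper could generate the button dictionaries instead. This is only a sketch, and the name make_buttons is hypothetical. It builds plain dictionaries in the same shape as the ones passed to updatemenus above, so it runs without Plotly.

```python
def make_buttons(labels, titles=None):
    """Build one 'update' button per trace that shows only that trace."""
    titles = titles if titles is not None else labels
    if len(titles) != len(labels):
        raise ValueError("labels and titles must have the same length")
    buttons = []
    for i, (label, title) in enumerate(zip(labels, titles)):
        # Only the i-th trace is visible when this button is selected.
        visible = [j == i for j in range(len(labels))]
        buttons.append(dict(label=label,
                            method='update',
                            args=[{'visible': visible},
                                  {'title': title, 'showlegend': True}]))
    return buttons


buttons = make_buttons(['Facebook', 'Apple', 'Amazon', 'Netflix', 'Google'],
                       ['FB', 'AAPL', 'AMZN', 'NFLX', 'GOOG'])
print(buttons[1]['args'])
```

The returned list could be passed as the buttons entry of the updatemenus dictionary in place of the hand-written list.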