Data Analysis on FAANG Stocks from 2013 to 2020

CMSC 320 Final Project Tutorial

Authors: Harshit Raj | Vaibhav Khetan | Yash Kalyani

FAANG LOGOS

Project Description -

This project lets readers analyze historical and recent stock data for the FAANG companies (Facebook, Apple, Amazon, Netflix and Google). To carry out a predictive, comparative and quantitative analysis, we extracted the relevant information from '.csv' files containing the stock market data. We read the '.csv' files with pandas, which splits each row on the comma delimiter and stores the result in a DataFrame, and we imported several other packages to make the analysis more efficient. Because the FAANG companies went public at different times, we plot only data from January 2013 onward for the sake of consistency. From each DataFrame we pulled out the columns needed to build graphs that depict important stock information about the company concerned. We also attempted hypothesis testing to check how well the data fits the line plots and scatter plots we made.

Dataset -

We used Kaggle - a popular website that gives users access to thousands of public datasets. One user, Aayush Mishra, uploaded a dataset named FAANG- Complete Stock Data that contains all FAANG stock ticks with the features Date, Open, Close, High, Low, Adj Close and Volume. The link for this dataset is provided below:

https://www.kaggle.com/aayushmishra1512/faang-complete-stock-data

The dataset is reliable: we used Yahoo Stocks to verify that the ticks in the dataset matched perfectly with the actual data recorded by Yahoo Stocks.

STONKS

Motivation -

Today, stocks are extremely important; they play an integral role in determining the growth of a company and can also shape an individual's wealth. People have long invested their money by buying shares of companies and trading stocks, and some have made a fortune by investing in companies that grew rapidly over the last few years. However, stocks have followed erratic patterns over the years: they have suffered during recessions and have had setbacks due to internal or external factors. Our group decided to work on a stock analysis that would help us and other readers understand more about the stock market and visualize the various attributes of a particular stock at a particular time. We also wanted to examine how stock trends changed during recessions (2001, 2008) and compare them to the trends today amidst a worldwide pandemic.

Usage of Interactive Plots -

We used Plotly - a Python tool that allows us to create interactive plots. Some of its functionality is described below:

1) Hovering over any data allows a user to see what is contained in that datapoint on the plot.

2) Click on Compare Data On Hover to compare multiple datapoints on the same plot.

3) Drop Down menus can be used to toggle functionality/visualizations on the plots.

For More Information on Plotly, go to the following link:

PLOTLY FOR PYTHON

About Imports -

Here, we import the Python packages and modules that will be useful in the project. We rely mainly on pandas for handling the data, along with Plotly and pyplot from matplotlib for plotting. The functions in these modules keep our code compact and let us add interactive graphs, making the final product visual and understandable to readers.

In [230]:
!pip3 install plotly==4.14.1
In [231]:
import json
import plotly.figure_factory as ff
import plotly.graph_objects as go
import plotly.express as px
import plotly.offline as py
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=False)
In [232]:
import pandas as pd
import numpy as np
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.api as sm
import sklearn
import math
from datetime import datetime, date
from sklearn import preprocessing
from sklearn import datasets
from sklearn import utils
from sklearn import linear_model
from sklearn.metrics import *
from sklearn.preprocessing import *
from statsmodels.formula.api import ols
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split

Data Collection -

In this part, we load all our collected data into our ipynb file so that it can be processed and analyzed.

In the cell below, we initialize a DataFrame for each of the FAANG companies. The stock data for each company is stored in a ".csv" file, meaning the values we need are stored as comma-separated values. We create a pandas DataFrame for each company by reading its ".csv" file.

In [233]:
facebook = pd.read_csv("data/Facebook.csv", sep=',')
apple = pd.read_csv("data/Apple.csv", sep=',')
amazon = pd.read_csv("data/Amazon.csv", sep=',')
netflix = pd.read_csv("data/Netflix.csv", sep=',')
google = pd.read_csv("data/Google.csv", sep=',')

Data Processing -

In this part, we process the data, a step also called Data Cleaning. We restrict the date range and format the data to our requirements.

When we read the .csv files above, each DataFrame stored the date of every stock tick as a string. A date stored as a string cannot be used reliably for date arithmetic, date-based filtering or time-axis plotting. Hence, we converted the entire Date column of each DataFrame from strings to pandas datetime values, which can be used to create graphs, compare dates and study stock trends on a particular day or over a range of dates.

Formatting Dates to Datetime:

In [234]:
facebook['Date'] = pd.to_datetime(facebook['Date'])
apple['Date'] = pd.to_datetime(apple['Date'])
amazon['Date'] = pd.to_datetime(amazon['Date'])
netflix['Date'] = pd.to_datetime(netflix['Date'])
google['Date'] = pd.to_datetime(google['Date'])
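
As a minimal sketch of what the conversion buys us, the snippet below uses a made-up two-row table (the values are placeholders, not taken from the dataset). Once the column holds datetime values, the `.dt` accessor and comparisons against a `pd.Timestamp` become available:

```python
import pandas as pd

# Hypothetical sample with the same Date/Close layout as the CSV files
sample = pd.DataFrame({
    "Date": ["2012-12-31", "2013-01-02"],
    "Close": [10.0, 11.0],
})

# Convert the string column to datetime values
sample["Date"] = pd.to_datetime(sample["Date"])

# The .dt accessor exposes date parts such as the year
years = sample["Date"].dt.year.tolist()

# Datetime values can be compared against a timestamp
after_2012 = sample[sample["Date"] >= pd.Timestamp("2013-01-01")]
```

Here `years` holds the calendar year of each row, and `after_2012` keeps only the row dated in 2013.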

Cleaning Data:

Here we start preparing the data for the interactive graphs that will depict relevant information about the stocks. Below, you will notice that we keep only those rows in each DataFrame whose year is later than 2012. We did this because the FAANG companies had very different IPO dates. IPO stands for Initial Public Offering, the point at which a company's stock first becomes available for the public to buy.

To graph the data, we first restricted each DataFrame to the stock ticks from Jan 2013 to the end of the dataset in 2020 (Aug 2020 for Facebook). After filtering, we reset the index of each DataFrame so that the rows are numbered from 0 again, making the DataFrames ready to plot.

Picking Our Date Ranges:

In [235]:
facebook = facebook[(facebook['Date'].dt.year > 2012) & (facebook['Date'].dt.year < 2021)]
apple = apple[(apple['Date'].dt.year > 2012) & (apple['Date'].dt.year < 2021)]
amazon = amazon[(amazon['Date'].dt.year > 2012) & (amazon['Date'].dt.year < 2021)]
netflix = netflix[(netflix['Date'].dt.year > 2012) & (netflix['Date'].dt.year < 2021)]
google = google[(google['Date'].dt.year > 2012) & (google['Date'].dt.year < 2021)]

facebook = facebook.reset_index(drop=True)
apple = apple.reset_index(drop=True)
amazon = amazon.reset_index(drop=True)
netflix = netflix.reset_index(drop=True)
google = google.reset_index(drop=True)

facebook
Out[235]:
Date Open High Low Close Adj Close Volume
0 2013-01-02 27.440001 28.180000 27.420000 28.000000 28.000000 69846400
1 2013-01-03 27.879999 28.469999 27.590000 27.770000 27.770000 63140600
2 2013-01-04 28.010000 28.930000 27.830000 28.760000 28.760000 72715400
3 2013-01-07 28.690001 29.790001 28.650000 29.420000 29.420000 83781800
4 2013-01-08 29.510000 29.600000 28.860001 29.059999 29.059999 45871300
... ... ... ... ... ... ... ...
1916 2020-08-12 258.970001 263.899994 258.109985 259.890015 259.890015 21428300
1917 2020-08-13 261.549988 265.160004 259.570007 261.299988 261.299988 17374000
1918 2020-08-14 262.309998 262.649994 258.679993 261.239990 261.239990 14792700
1919 2020-08-17 262.500000 264.100006 259.399994 261.160004 261.160004 13351100
1920 2020-08-18 260.950012 265.149994 259.260010 262.339996 262.339996 18677500

1921 rows × 7 columns
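
The same filter-and-reset step is repeated for all five DataFrames, so it could also be wrapped in a small helper. The sketch below is one possible way to do that; `clip_to_years` and the demo values are our own illustration, not part of the original notebook:

```python
import pandas as pd

def clip_to_years(df, first_year=2013, last_year=2020):
    """Keep rows whose Date lies in [first_year, last_year] and renumber them."""
    mask = df["Date"].dt.year.between(first_year, last_year)
    return df[mask].reset_index(drop=True)

# Made-up rows: one before the range, two inside it, one after it
demo = pd.DataFrame({
    "Date": pd.to_datetime(["2012-12-31", "2013-01-02", "2020-08-18", "2021-01-04"]),
    "Close": [1.0, 2.0, 3.0, 4.0],
})
clipped = clip_to_years(demo)
```

With such a helper, each company would need a single call, for example `facebook = clip_to_years(facebook)`.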

Exploratory Analysis and Data Visualization -

Correlation Plots for Features of Individual Companies :

The following plots are visually attractive and interactive as well. Each matrix-like graph lists the stock attributes (open price, close price, high, low, adjusted close and volume) along the left and bottom of the matrix. For each company, we first converted every attribute into its day-over-day percentage change using the .pct_change() function provided by pandas, and stored the result in a new DataFrame. We then computed the pairwise correlations of those percentage changes with .corr() and drew them as a matrix plot whose colors encode the correlation between two attributes of the same company's stock. This helps users understand how a change in one attribute of a stock tends to accompany a change in another. The color scale beside each plot serves as a legend for reading the values.

In [236]:
corr_df_fb = facebook[['Open', 'Close', 'High', 'Low', 'Adj Close', 'Volume']].copy(deep=True)

retscomp_fb = corr_df_fb.pct_change()

corr_fb = retscomp_fb.corr()
corr_fb
Out[236]:
Open Close High Low Adj Close Volume
Open 1.000000 0.401758 0.769315 0.758697 0.401758 0.016311
Close 0.401758 1.000000 0.747093 0.732999 1.000000 0.007707
High 0.769315 0.747093 1.000000 0.790089 0.747093 0.192635
Low 0.758697 0.732999 0.790089 1.000000 0.732999 -0.178854
Adj Close 0.401758 1.000000 0.747093 0.732999 1.000000 0.007707
Volume 0.016311 0.007707 0.192635 -0.178854 0.007707 1.000000
In [237]:
fig = px.imshow(corr_fb)

fig.update_layout(title='Correlation between Features of Facebook Stock')

iplot(fig,show_link=False)
In [238]:
corr_df_ap = apple[['Open', 'Close', 'High', 'Low', 'Adj Close', 'Volume']].copy(deep=True)

retscomp_ap = corr_df_ap.pct_change()

corr_ap = retscomp_ap.corr()
corr_ap
Out[238]:
Open Close High Low Adj Close Volume
Open 1.000000 0.413942 0.751016 0.768682 0.414086 -0.037703
Close 0.413942 1.000000 0.742652 0.735377 0.999469 -0.106651
High 0.751016 0.742652 1.000000 0.775474 0.741752 0.113300
Low 0.768682 0.735377 0.775474 1.000000 0.734981 -0.264364
Adj Close 0.414086 0.999469 0.741752 0.734981 1.000000 -0.107836
Volume -0.037703 -0.106651 0.113300 -0.264364 -0.107836 1.000000
In [239]:
fig = px.imshow(corr_ap)

fig.update_layout(title='Correlation between Features of Apple Stock')

iplot(fig,show_link=False)
In [240]:
corr_df_am = amazon[['Open', 'Close', 'High', 'Low', 'Adj Close', 'Volume']].copy(deep=True)

retscomp_am = corr_df_am.pct_change()

corr_am = retscomp_am.corr()
corr_am
Out[240]:
Open Close High Low Adj Close Volume
Open 1.000000 0.423578 0.786375 0.747709 0.423578 0.044092
Close 0.423578 1.000000 0.747264 0.757656 1.000000 0.058062
High 0.786375 0.747264 1.000000 0.787892 0.747264 0.236255
Low 0.747709 0.757656 0.787892 1.000000 0.757656 -0.127467
Adj Close 0.423578 1.000000 0.747264 0.757656 1.000000 0.058062
Volume 0.044092 0.058062 0.236255 -0.127467 0.058062 1.000000
In [241]:
fig = px.imshow(corr_am)

fig.update_layout(title='Correlation between Features of Amazon Stock')

iplot(fig,show_link=False)
In [242]:
corr_df_ne = netflix[['Open', 'Close', 'High', 'Low', 'Adj Close', 'Volume']].copy(deep=True)

retscomp_ne = corr_df_ne.pct_change()

corr_ne = retscomp_ne.corr()
corr_ne
Out[242]:
Open Close High Low Adj Close Volume
Open 1.000000 0.425437 0.749779 0.784565 0.425437 0.025012
Close 0.425437 1.000000 0.763005 0.728188 1.000000 0.123859
High 0.749779 0.763005 1.000000 0.774663 0.763005 0.278367
Low 0.784565 0.728188 0.774663 1.000000 0.728188 -0.109167
Adj Close 0.425437 1.000000 0.763005 0.728188 1.000000 0.123859
Volume 0.025012 0.123859 0.278367 -0.109167 0.123859 1.000000
In [243]:
fig = px.imshow(corr_ne)

fig.update_layout(title='Correlation between Features of Netflix Stock')

iplot(fig,show_link=False)
In [244]:
corr_df_go = google[['Open', 'Close', 'High', 'Low', 'Adj Close', 'Volume']].copy(deep=True)

retscomp_go = corr_df_go.pct_change()

corr_go = retscomp_go.corr()
corr_go
Out[244]:
Open Close High Low Adj Close Volume
Open 1.000000 0.384602 0.766185 0.726269 0.384602 0.025639
Close 0.384602 1.000000 0.724499 0.745961 1.000000 0.012011
High 0.766185 0.724499 1.000000 0.800660 0.724499 0.178618
Low 0.726269 0.745961 0.800660 1.000000 0.745961 -0.129019
Adj Close 0.384602 1.000000 0.724499 0.745961 1.000000 0.012011
Volume 0.025639 0.012011 0.178618 -0.129019 0.012011 1.000000
In [245]:
fig = px.imshow(corr_go)

fig.update_layout(title='Correlation between Features of Google Stock')

iplot(fig,show_link=False)
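
To make the two steps behind these matrices concrete, here is a small sketch on made-up prices (the numbers are placeholders chosen for easy arithmetic). `pct_change()` turns each column into day-over-day relative changes, and `corr()` then computes the pairwise Pearson correlation of those changes:

```python
import pandas as pd

# Hypothetical prices for four consecutive days
prices = pd.DataFrame({
    "Open":  [10.0, 11.0, 12.1, 11.0],
    "Close": [10.5, 11.55, 12.0, 11.5],
})

# Relative change from the previous row; the first row has no predecessor, so it is NaN
returns = prices.pct_change()

# Pairwise correlation of the columns; rows containing NaN are skipped
corr = returns.corr()
```

The result is a symmetric matrix with ones on the diagonal, which is why each heatmap above mirrors itself across the diagonal.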

Creating Dataframe of All Companies for All Stock Ticks:

In [246]:
facebook['Company'] = ['Facebook']*len(facebook)
apple['Company'] = ['Apple']*len(apple)
amazon['Company'] = ['Amazon']*len(amazon)
netflix['Company'] = ['Netflix']*len(netflix)
google['Company'] = ['Google']*len(google)

frames = [facebook, apple, amazon, netflix, google]

result = pd.concat(frames)
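
One detail worth knowing about `pd.concat` is that, by default, it keeps each frame's own index labels, so the combined frame contains repeated labels. The sketch below shows this on two made-up frames and contrasts it with `ignore_index=True`, which renumbers the rows:

```python
import pandas as pd

# Two tiny hypothetical frames, each indexed 0..1
a = pd.DataFrame({"Close": [1.0, 2.0]}).assign(Company="A")
b = pd.DataFrame({"Close": [3.0, 4.0]}).assign(Company="B")

stacked = pd.concat([a, b])                         # index labels: 0, 1, 0, 1
renumbered = pd.concat([a, b], ignore_index=True)   # index labels: 0, 1, 2, 3

# With repeated labels, a label lookup matches more than one row
rows_labelled_zero = stacked.loc[0]
```

This matters for any later code that selects or assigns rows by label with `.loc`, since one label can then refer to one row per company.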

Modifying Volume by Calculating Mean Volume per Year and Standardizing the Volume:

In [247]:
# Take the calendar year straight from the datetime column. Label-based
# row-by-row assignment is avoided because the concatenated frame repeats
# index labels across companies.
result['Date'] = pd.to_datetime(result['Date'])
result['Year'] = result['Date'].dt.year

comp = result.groupby(['Company', 'Year'])

# Mean traded volume for every (Company, Year) pair
vol_df = comp['Volume'].mean().reset_index(name='Volume Mean')

# Standardize the yearly means across all companies and years (z-score)
avg_vol = vol_df['Volume Mean'].mean()
stand_vol = vol_df['Volume Mean'].std()

vol_df['standard_vol'] = (vol_df['Volume Mean'] - avg_vol)/(stand_vol)
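
Standardizing here means computing a z-score: subtract the mean and divide by the standard deviation. A minimal sketch on three made-up values (pandas uses the sample standard deviation by default):

```python
import pandas as pd

# Hypothetical yearly mean volumes
yearly_mean_volume = pd.Series([1.0, 2.0, 3.0])

# z-score: distance from the mean, measured in standard deviations
z = (yearly_mean_volume - yearly_mean_volume.mean()) / yearly_mean_volume.std()
```

For these values the mean is 2 and the sample standard deviation is 1, so the z-scores are -1, 0 and 1.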

Standardizing Close Price:

In [248]:
# Per-date mean and standard deviation of Close across the companies
avg_close = result.groupby('Date')['Close'].mean().reset_index()
stand_close = result.groupby('Date')['Close'].std().reset_index()

result = result.reset_index(drop=True)

# z-score of each close price relative to the other companies on the same date.
# transform() returns one value per row, so no Python loop is needed.
# Dates covered by a single company have no standard deviation and yield NaN.
close_by_date = result.groupby('Date')['Close']
result['standard_close'] = (result['Close'] - close_by_date.transform('mean'))/close_by_date.transform('std')
    
result
Out[248]:
Date Open High Low Close Adj Close Volume Company Year standard_close
0 2013-01-02 27.440001 28.180000 27.420000 28.000000 28.000000 69846400.0 Facebook 2013 -0.663215
1 2013-01-03 27.879999 28.469999 27.590000 27.770000 27.770000 63140600.0 Facebook 2013 -0.665516
2 2013-01-04 28.010000 28.930000 27.830000 28.760000 28.760000 72715400.0 Facebook 2013 -0.659137
3 2013-01-07 28.690001 29.790001 28.650000 29.420000 29.420000 83781800.0 Facebook 2013 -0.661599
4 2013-01-08 29.510000 29.600000 28.860001 29.059999 29.059999 45871300.0 Facebook 2013 -0.661829
... ... ... ... ... ... ... ... ... ... ...
9610 2020-08-31 1643.569946 1644.500000 1625.329956 1629.530029 1629.530029 1321100.0 Google 2020 0.707107
9611 2020-09-01 1632.160034 1659.219971 1629.530029 1655.079956 1655.079956 1133800.0 Google 2020 0.707107
9612 2020-09-02 1668.010010 1726.099976 1660.189941 1717.390015 1717.390015 2476100.0 Google 2020 NaN
9613 2020-09-03 1699.520020 1700.000000 1607.709961 1629.510010 1629.510010 3180200.0 Google 2020 NaN
9614 2020-09-04 1609.000000 1634.989990 1537.970093 1581.209961 1581.209961 2792533.0 Google 2020 NaN

9615 rows × 10 columns
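
The last rows of the table show values of 0.707107 and NaN. The sketch below reproduces both cases on made-up data (companies "A" and "B" and their prices are placeholders). When only two companies have a tick on a date, their z-scores are always about -0.7071 and +0.7071, and when only one company has a tick the sample standard deviation is undefined, so the result is NaN:

```python
import pandas as pd

# Hypothetical ticks: two companies on the first date, one on the second
closes = pd.DataFrame({
    "Date": pd.to_datetime(["2020-09-01", "2020-09-01", "2020-09-02"]),
    "Company": ["A", "B", "B"],
    "Close": [500.0, 1600.0, 1700.0],
})

# Per-date z-score across companies
by_date = closes.groupby("Date")["Close"]
closes["standard_close"] = (
    (closes["Close"] - by_date.transform("mean")) / by_date.transform("std")
)
```

This suggests the trailing NaN values come from dates on which fewer companies have data, rather than from a problem in the prices themselves.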

Open, Close, Volume and Moving Averages:

In this section we have made one interactive graph for each FAANG company, each showing several attributes of that company's stock: Opening Price, Closing Price, Volume and the 14, 21 and 100 day moving averages of the closing price. All of these are important to investors, who use them to help decide whether to buy or sell a stock. We decided to calculate the different moving averages because many buyers and sellers base their actions on these averages.

We have also added the yearly mean volumes as bars in the background of the primary line plot. The volumes are divided by a constant so that they fit on the same axis as the prices. The dropdown menu includes an option to view the unscaled yearly mean volumes on their own.

As you can see, the graphs in this section are all interactive. Each attribute is a separate trace, and a user can select which one to study from the dropdown menu.
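
The moving averages are computed with the pandas `rolling` method. As a small sketch on made-up closing prices, the snippet below contrasts the default behavior with `min_periods=1`, the setting used in the cells that follow:

```python
import pandas as pd

# Hypothetical closing prices for five days
close = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Default: no value until a full window of 3 days is available
strict = close.rolling(window=3).mean()

# min_periods=1: average whatever is available, starting from the first day
lenient = close.rolling(window=3, min_periods=1).mean()
```

With `min_periods=1` the first few points of each moving average are based on fewer days than the window size, so the start of the 100 day line follows the price more closely than the rest of it.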

Open, Close, Volume and Moving Averages for Facebook:

The graph below depicts the Opening Price, Closing Price, and 14, 21 and 100 day moving averages for Facebook. In the plot, the yearly mean volume tends to fall as the price rises. Note that volume counts the shares traded in a period, not the number of shares in existence, so this is an observed pattern rather than a supply effect. Investors and users can study the graph to inform their decisions or to learn about the past and present of Facebook's stock.

In [249]:
avg_14 = facebook.Close.rolling(window=14, min_periods=1).mean()
avg_21 = facebook.Close.rolling(window=21, min_periods=1).mean()
avg_100 = facebook.Close.rolling(window=100, min_periods=1).mean()
In [250]:
x_fb = facebook['Date']
y_fb = facebook['Open']
z_fb = facebook['Close']

fig = go.Figure()

fig.add_trace(go.Scatter(x=x_fb, y=y_fb, name='Open',
                         line=dict(color='royalblue', width=1.5)))
fig.add_trace(go.Scatter(x=x_fb, y=z_fb, name = 'Close',
                         line=dict(color='firebrick', width=1.5)))
fig.add_trace(go.Scatter(x=x_fb, y=avg_14, name = '14 Day Close Avg',
                         line=dict(color='goldenrod', width=1.5)))
fig.add_trace(go.Scatter(x=x_fb, y=avg_21, name = '21 Day Close Avg',
                         line=dict(color='orangered', width=1.5)))
fig.add_trace(go.Scatter(x=x_fb, y=avg_100, name = '100 Day Close Avg',
                         line=dict(color='mediumorchid', width=1.5)))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Facebook']['Year'], 
                     y=vol_df[vol_df['Company'] == 'Facebook']['Volume Mean']/200000, name='Volume (scaled)', 
                     marker_color='slategray', opacity=0.3))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Facebook']['Year'], 
                     y=vol_df[vol_df['Company'] == 'Facebook']['Volume Mean'], name='Volume', 
                     marker_color='slategray', visible='legendonly'))


fig.update_layout(title='Open/Close prices and Volume for Facebook from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Open/Close/Volume', template="plotly_dark")

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True, True, True, False]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Open Price',
                  method = 'update',
                  args = [{'visible': [True, False, False, False, False, False, False]},
                          {'title': 'Open Price',
                           'showlegend':True}]),
             dict(label = 'Close Price',
                  method = 'update',
                  args = [{'visible': [False, True, False, False, False, False, False]},
                          {'title': 'Close Price',
                           'showlegend':True}]),
             dict(label = '14 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, True, False, False, False, False]},
                          {'title': '14 Day Moving Average',
                           'showlegend':True}]),
             dict(label = '21 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, False, True, False, False, False]},
                          {'title': '21 Day Moving Average',
                           'showlegend':True}]),
             dict(label = '100 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, True, False, False]},
                          {'title': '100 Day Moving Average',
                           'showlegend':True}]),
            dict(label = 'Volume (not scaled)',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, False, False, True]},
                          {'title': 'Volume (not scaled)',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)
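
Each dropdown button above carries a hand-written list of booleans, one per trace, in the order the traces were added to the figure. As a sketch of an alternative, the hypothetical helper `visibility_buttons` below generates those lists from the trace names. It is our own illustration, assumes the names are unique, and creates a button for every trace (including the scaled volume, which the menu above does not offer on its own):

```python
def visibility_buttons(trace_names, hidden_in_all=()):
    """Return plotly 'update' buttons: an 'All' button, then one per trace.

    trace_names must be in the same order the traces were added to the figure.
    hidden_in_all lists traces that stay hidden when 'All' is selected.
    """
    buttons = [dict(
        label='All',
        method='update',
        args=[{'visible': [name not in hidden_in_all for name in trace_names]},
              {'title': 'All', 'showlegend': True}],
    )]
    for shown in trace_names:
        # Exactly one trace is visible for each single-trace button
        buttons.append(dict(
            label=shown,
            method='update',
            args=[{'visible': [name == shown for name in trace_names]},
                  {'title': shown, 'showlegend': True}],
        ))
    return buttons

names = ['Open', 'Close', '14 Day Close Avg', '21 Day Close Avg',
         '100 Day Close Avg', 'Volume (scaled)', 'Volume']
buttons = visibility_buttons(names, hidden_in_all=('Volume',))
```

The resulting list could be passed as `buttons=buttons` inside the `updatemenus` entry, which keeps each button's title in step with its label.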

Open, Close, Volume and Moving Averages for Apple:

The graph below depicts the Opening Price, Closing Price, and 14, 21 and 100 day moving averages for Apple. In the plot, the yearly mean volume tends to fall as the price rises. Note that volume counts the shares traded in a period, not the number of shares in existence, so this is an observed pattern rather than a supply effect. Investors and users can study the graph to inform their decisions or to learn about the past and present of Apple's stock.

In [251]:
avg_14 = apple.Close.rolling(window=14, min_periods=1).mean()
avg_21 = apple.Close.rolling(window=21, min_periods=1).mean()
avg_100 = apple.Close.rolling(window=100, min_periods=1).mean()
In [252]:
x_ap = apple['Date']
y_ap = apple['Open']
z_ap = apple['Close']

fig = go.Figure()

fig.add_trace(go.Scatter(x=x_ap, y=y_ap, name='Open',
                         line=dict(color='royalblue', width=1.5)))
fig.add_trace(go.Scatter(x=x_ap, y=z_ap, name = 'Close',
                         line=dict(color='firebrick', width=1.5)))
fig.add_trace(go.Scatter(x=x_ap, y=avg_14, name = '14 Day Close Avg',
                         line=dict(color='goldenrod', width=1.5)))
fig.add_trace(go.Scatter(x=x_ap, y=avg_21, name = '21 Day Close Avg',
                         line=dict(color='orangered', width=1.5)))
fig.add_trace(go.Scatter(x=x_ap, y=avg_100, name = '100 Day Close Avg',
                         line=dict(color='mediumorchid', width=1.5)))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Apple']['Year'], 
                     y=vol_df[vol_df['Company'] == 'Apple']['Volume Mean']/3500000, name='Volume (scaled)', 
                     marker_color='slategray', opacity=0.3))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Apple']['Year'], 
                     y=vol_df[vol_df['Company'] == 'Apple']['Volume Mean'], name='Volume', 
                     marker_color='slategray', visible='legendonly'))


fig.update_layout(title='Open/Close prices and Volume for Apple from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Open/Close/Volume', template="plotly_dark")

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True, True, True, False]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Open Price',
                  method = 'update',
                  args = [{'visible': [True, False, False, False, False, False, False]},
                          {'title': 'Open Price',
                           'showlegend':True}]),
             dict(label = 'Close Price',
                  method = 'update',
                  args = [{'visible': [False, True, False, False, False, False, False]},
                          {'title': 'Close Price',
                           'showlegend':True}]),
             dict(label = '14 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, True, False, False, False, False]},
                          {'title': '14 Day Moving Average',
                           'showlegend':True}]),
             dict(label = '21 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, False, True, False, False, False]},
                          {'title': '21 Day Moving Average',
                           'showlegend':True}]),
             dict(label = '100 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, True, False, False]},
                          {'title': '100 Day Moving Average',
                           'showlegend':True}]),
            dict(label = 'Volume (not scaled)',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, False, False, True]},
                          {'title': 'Volume (not scaled)',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)

Open, Close, Volume and Moving Averages for Amazon:

The graph below depicts the Opening Price, Closing Price, and 14, 21 and 100 day moving averages for Amazon. In the plot, the yearly mean volume tends to fall as the price rises. Note that volume counts the shares traded in a period, not the number of shares in existence, so this is an observed pattern rather than a supply effect. Investors and users can study the graph to inform their decisions or to learn about the past and present of Amazon's stock.

In [253]:
avg_14 = amazon.Close.rolling(window=14, min_periods=1).mean()
avg_21 = amazon.Close.rolling(window=21, min_periods=1).mean()
avg_100 = amazon.Close.rolling(window=100, min_periods=1).mean()
In [254]:
x_am = amazon['Date']
y_am = amazon['Open']
z_am = amazon['Close']

fig = go.Figure()

fig.add_trace(go.Scatter(x=x_am, y=y_am, name='Open',
                         line=dict(color='royalblue', width=1.5)))
fig.add_trace(go.Scatter(x=x_am, y=z_am, name = 'Close',
                         line=dict(color='firebrick', width=1.5)))
fig.add_trace(go.Scatter(x=x_am, y=avg_14, name = '14 Day Close Avg',
                         line=dict(color='goldenrod', width=1.5)))
fig.add_trace(go.Scatter(x=x_am, y=avg_21, name = '21 Day Close Avg',
                         line=dict(color='orangered', width=1.5)))
fig.add_trace(go.Scatter(x=x_am, y=avg_100, name = '100 Day Close Avg',
                         line=dict(color='mediumorchid', width=1.5)))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Amazon']['Year'], 
                     y=vol_df[vol_df['Company'] == 'Amazon']['Volume Mean']/2000, name='Volume (scaled)', 
                     marker_color='slategray', opacity=0.3))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Amazon']['Year'], 
                     y=vol_df[vol_df['Company'] == 'Amazon']['Volume Mean'], name='Volume', 
                     marker_color='slategray', visible='legendonly'))


fig.update_layout(title='Open/Close prices and Volume for Amazon from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Open/Close/Volume', template="plotly_dark")

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True, True, True, False]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Open Price',
                  method = 'update',
                  args = [{'visible': [True, False, False, False, False, False, False]},
                          {'title': 'Open Price',
                           'showlegend':True}]),
             dict(label = 'Close Price',
                  method = 'update',
                  args = [{'visible': [False, True, False, False, False, False, False]},
                          {'title': 'Close Price',
                           'showlegend':True}]),
             dict(label = '14 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, True, False, False, False, False]},
                          {'title': '14 Day Moving Average',
                           'showlegend':True}]),
             dict(label = '21 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, False, True, False, False, False]},
                          {'title': '21 Day Moving Average',
                           'showlegend':True}]),
             dict(label = '100 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, True, False, False]},
                          {'title': '100 Day Moving Average',
                           'showlegend':True}]),
            dict(label = 'Volume (not scaled)',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, False, False, True]},
                          {'title': 'Volume (not scaled)',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)

Open, Close, Volume and Moving Averages for Netflix:

The graph below depicts the Opening Price, Closing Price, and 14, 21 and 100 day moving averages for Netflix. In the plot, the yearly mean volume tends to fall as the price rises. Note that volume counts the shares traded in a period, not the number of shares in existence, so this is an observed pattern rather than a supply effect. Investors and users can study the graph to inform their decisions or to learn about the past and present of Netflix's stock.

In [255]:
avg_14 = netflix.Close.rolling(window=14, min_periods=1).mean()
avg_21 = netflix.Close.rolling(window=21, min_periods=1).mean()
avg_100 = netflix.Close.rolling(window=100, min_periods=1).mean()
In [256]:
x_ne = netflix['Date']
y_ne = netflix['Open']
z_ne = netflix['Close']

fig = go.Figure()

fig.add_trace(go.Scatter(x=x_ne, y=y_ne, name='Open',
                         line=dict(color='royalblue', width=1.5)))
fig.add_trace(go.Scatter(x=x_ne, y=z_ne, name = 'Close',
                         line=dict(color='firebrick', width=1.5)))
# Use Netflix's own dates (x_ne) for its moving averages, not Facebook's (x_fb).
fig.add_trace(go.Scatter(x=x_ne, y=avg_14, name = '14 Day Close Avg',
                         line=dict(color='goldenrod', width=1.5)))
fig.add_trace(go.Scatter(x=x_ne, y=avg_21, name = '21 Day Close Avg',
                         line=dict(color='orangered', width=1.5)))
fig.add_trace(go.Scatter(x=x_ne, y=avg_100, name = '100 Day Close Avg',
                         line=dict(color='mediumorchid', width=1.5)))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Netflix']['Year'], 
                     y=vol_df[vol_df['Company'] == 'Netflix']['Volume Mean']/50000, name='Volume (scaled)', 
                     marker_color='slategray', opacity=0.3))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Netflix']['Year'], 
                     y=vol_df[vol_df['Company'] == 'Netflix']['Volume Mean'], name='Volume', 
                     marker_color='slategray', visible='legendonly'))


fig.update_layout(title='Open/Close prices and Volume for Netflix from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Open/Close/Volume', template="plotly_dark")

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True, True, True, False]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Open Price',
                  method = 'update',
                  args = [{'visible': [True, False, False, False, False, False, False]},
                          {'title': 'Open Price',
                           'showlegend':True}]),
             dict(label = 'Close Price',
                  method = 'update',
                  args = [{'visible': [False, True, False, False, False, False, False]},
                          {'title': 'Close Price',
                           'showlegend':True}]),
             dict(label = '14 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, True, False, False, False, False]},
                          {'title': '14 Day Moving Average',
                           'showlegend':True}]),
             dict(label = '21 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, False, True, False, False, False]},
                          {'title': '21 Day Moving Average',
                           'showlegend':True}]),
             dict(label = '100 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, True, False, False]},
                          {'title': '100 Day Moving Average',
                           'showlegend':True}]),
            dict(label = 'Volume (not scaled)',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, False, False, True]},
                          {'title': 'Volume (not scaled)',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)

Open, Close, Volume and Moving Averages for Google:

The graph below depicts the opening price, closing price, and the 14-, 21- and 100-day moving averages of the closing price for Google, together with the mean yearly trading volume. In this plot, volume tends to fall as price rises. This is a pattern observed in the chart rather than a general rule, since volume counts shares traded, not shares outstanding. Investors and other users can study this view to learn about the past and present behavior of Google stock and to inform their decisions.

In [257]:
avg_14 = google.Close.rolling(window=14, min_periods=1).mean()
avg_21 = google.Close.rolling(window=21, min_periods=1).mean()
avg_100 = google.Close.rolling(window=100, min_periods=1).mean()
In [258]:
x_go = google['Date']
y_go = google['Open']
z_go = google['Close']

fig = go.Figure()

fig.add_trace(go.Scatter(x=x_go, y=y_go, name='Open',
                         line=dict(color='royalblue', width=1.5)))
fig.add_trace(go.Scatter(x=x_go, y=z_go, name = 'Close',
                         line=dict(color='firebrick', width=1.5)))
# Use Google's own dates (x_go) for its moving averages, not Facebook's (x_fb).
fig.add_trace(go.Scatter(x=x_go, y=avg_14, name = '14 Day Close Avg',
                         line=dict(color='goldenrod', width=1.5)))
fig.add_trace(go.Scatter(x=x_go, y=avg_21, name = '21 Day Close Avg',
                         line=dict(color='orangered', width=1.5)))
fig.add_trace(go.Scatter(x=x_go, y=avg_100, name = '100 Day Close Avg',
                         line=dict(color='mediumorchid', width=1.5)))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Google']['Year'], 
                     y=vol_df[vol_df['Company'] == 'Google']['Volume Mean']/2000, name='Volume (scaled)', 
                     marker_color='slategray', opacity=0.3))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Google']['Year'], 
                     y=vol_df[vol_df['Company'] == 'Google']['Volume Mean'], name='Volume', 
                     marker_color='slategray', visible='legendonly'))


fig.update_layout(title='Open/Close prices and Volume for Google from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Open/Close/Volume', template="plotly_dark")

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True, True, True, False]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Open Price',
                  method = 'update',
                  args = [{'visible': [True, False, False, False, False, False, False]},
                          {'title': 'Open Price',
                           'showlegend':True}]),
             dict(label = 'Close Price',
                  method = 'update',
                  args = [{'visible': [False, True, False, False, False, False, False]},
                          {'title': 'Close Price',
                           'showlegend':True}]),
             dict(label = '14 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, True, False, False, False, False]},
                          {'title': '14 Day Moving Average',
                           'showlegend':True}]),
             dict(label = '21 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, False, True, False, False, False]},
                          {'title': '21 Day Moving Average',
                           'showlegend':True}]),
             dict(label = '100 Day Moving Average',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, True, False, False]},
                          {'title': '100 Day Moving Average',
                           'showlegend':True}]),
            dict(label = 'Volume (not scaled)',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, False, False, True]},
                          {'title': 'Volume (not scaled)',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)

Calculate the Correlation Between Companies and their Stocks:

In [259]:
df_corr = pd.DataFrame()

df_corr['Facebook'] = facebook['Close']
df_corr['Apple'] = apple['Close']
df_corr['Amazon'] = amazon['Close']
df_corr['Netflix'] = netflix['Close']
df_corr['Google'] = google['Close']

retscomp = df_corr.pct_change()

corr = retscomp.corr()
corr
Out[259]:
          Facebook     Apple    Amazon   Netflix    Google
Facebook  1.000000  0.444546  0.505884  0.345712  0.562611
Apple     0.444546  1.000000  0.431872  0.250707  0.522914
Amazon    0.505884  0.431872  1.000000  0.439284  0.601770
Netflix   0.345712  0.250707  0.439284  1.000000  0.413904
Google    0.562611  0.522914  0.601770  0.413904  1.000000
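The correlation cell assigns each company's Close column by row index. If the per-company dataframes do not cover exactly the same trading days, rows could pair prices from different dates. One possible safeguard, sketched here on small made-up frames (the data and the `frames` dictionary are hypothetical), is to index every series by its date before computing daily returns and their correlation.

```python
import pandas as pd

# Hypothetical per-company frames with slightly different date coverage.
fb = pd.DataFrame({
    "Date": ["2020-01-02", "2020-01-03", "2020-01-06",
             "2020-01-07", "2020-01-08", "2020-01-09"],
    "Close": [100.0, 102.0, 101.0, 104.0, 103.0, 106.0],
})
aapl = pd.DataFrame({
    "Date": ["2020-01-03", "2020-01-06", "2020-01-07",
             "2020-01-08", "2020-01-09", "2020-01-10"],
    "Close": [50.0, 49.5, 51.0, 50.0, 52.0, 53.0],
})
frames = {"Facebook": fb, "Apple": aapl}

# Index each Close series by date so that concat aligns on calendar days.
closes = pd.concat(
    {name: df.assign(Date=pd.to_datetime(df["Date"])).set_index("Date")["Close"]
     for name, df in frames.items()},
    axis=1,
).sort_index()

# Keep only days present for every company, then compute daily returns.
returns = closes.dropna().pct_change().dropna()
corr_by_date = returns.corr()
print(corr_by_date)
```

Days missing for any company are dropped before the returns are computed, so every return compares the same pair of dates across companies.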

Box Plots for All Companies for all Price Features:

We plotted box plots of all the price features (Open, Close, High, Low, Adj Close) for each company to see whether they are similar. As the box plots show, the price features are distributed fairly similarly, which supports the assumption we make in the next part (using the Close price exclusively for regression and ML).

In [260]:
fig = go.Figure()

fig.add_trace(go.Box(y=facebook.Close, name='Close'))
fig.add_trace(go.Box(y=facebook.Open, name='Open'))
fig.add_trace(go.Box(y=facebook.Low, name='Low'))
fig.add_trace(go.Box(y=facebook.High, name='High'))
fig.add_trace(go.Box(y=facebook['Adj Close'], name='Adj Close'))

fig.add_trace(go.Box(y=apple.Close, name='Close', visible='legendonly'))
fig.add_trace(go.Box(y=apple.Open, name='Open', visible='legendonly'))
fig.add_trace(go.Box(y=apple.Low, name='Low', visible='legendonly'))
fig.add_trace(go.Box(y=apple.High, name='High', visible='legendonly'))
fig.add_trace(go.Box(y=apple['Adj Close'], name='Adj Close', visible='legendonly'))

fig.add_trace(go.Box(y=amazon.Close, name='Close', visible='legendonly'))
fig.add_trace(go.Box(y=amazon.Open, name='Open', visible='legendonly'))
fig.add_trace(go.Box(y=amazon.Low, name='Low', visible='legendonly'))
fig.add_trace(go.Box(y=amazon.High, name='High', visible='legendonly'))
fig.add_trace(go.Box(y=amazon['Adj Close'], name='Adj Close', visible='legendonly'))

fig.add_trace(go.Box(y=netflix.Close, name='Close', visible='legendonly'))
fig.add_trace(go.Box(y=netflix.Open, name='Open', visible='legendonly'))
fig.add_trace(go.Box(y=netflix.Low, name='Low', visible='legendonly'))
fig.add_trace(go.Box(y=netflix.High, name='High', visible='legendonly'))
fig.add_trace(go.Box(y=netflix['Adj Close'], name='Adj Close', visible='legendonly'))

fig.add_trace(go.Box(y=google.Close, name='Close', visible='legendonly'))
fig.add_trace(go.Box(y=google.Open, name='Open', visible='legendonly'))
fig.add_trace(go.Box(y=google.Low, name='Low', visible='legendonly'))
fig.add_trace(go.Box(y=google.High, name='High', visible='legendonly'))
fig.add_trace(go.Box(y=google['Adj Close'], name='Adj Close', visible='legendonly'))

fig.update_layout(title='Price Features for All Companies from Jan 2013 to Aug 2020',
                  yaxis_title='Price', template="plotly_dark")

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'Facebook',
                  method = 'update',
                  args = [{'visible': [True, True, True, True, True, 
                                       False, False, False, False, False, 
                                       False, False, False, False, False, 
                                       False, False, False, False, False, 
                                       False, False, False, False, False]},
                          {'title': 'Facebook',
                           'showlegend':True}]),
             dict(label = 'Apple',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, False, 
                                       True, True, True, True, True, 
                                       False, False, False, False, False, 
                                       False, False, False, False, False, 
                                       False, False, False, False, False]},
                          {'title': 'Apple',
                           'showlegend':True}]),
             dict(label = 'Amazon',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, False, 
                                       False, False, False, False, False, 
                                       True, True, True, True, True, 
                                       False, False, False, False, False, 
                                       False, False, False, False, False]},
                          {'title': 'Amazon',
                           'showlegend':True}]),
             dict(label = 'Netflix',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, False, 
                                       False, False, False, False, False, 
                                       False, False, False, False, False, 
                                       True, True, True, True, True, 
                                       False, False, False, False, False]},
                          {'title': 'Netflix',
                           'showlegend':True}]),
             dict(label = 'Google',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, False, 
                                       False, False, False, False, False, 
                                       False, False, False, False, False, 
                                       False, False, False, False, False, 
                                       True, True, True, True, True]},
                          {'title': 'Google',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

iplot(fig,show_link=False)

Closing Price Correlation Plot:

The following plot is interactive. It is a matrix-like heatmap with the companies listed along the left and bottom borders. We computed the closing price correlation by first converting each company's closing prices to day-over-day percentage changes with pandas' .pct_change() function and storing them in a new dataframe, and then taking the pairwise correlation of those changes. Each cell of the matrix is colored according to the correlation between two companies. This helps users see how closely the rises and falls in one company's stock price track those of other companies in the same sector. The color scale next to the plot serves as a legend for reading the values.

ASSUMPTION:

We noticed that the Open, Close, High and Low prices were fairly similar. Also, when it comes to actually buying and selling stock, traders commonly refer to the close price. This is why, from this point forward, we analyze stocks based on their Close price.

In [261]:
fig = px.imshow(corr)

fig.update_layout(title='Correlation between All FAANG Stocks Close Price')

iplot(fig,show_link=False)

Graph for Closing prices for FAANG Stocks from 2013 to 2020:

The graph below shows the closing prices of the FAANG stocks. A useful feature of this graph is that it is interactive: the user can zoom into a particular timeframe within the given date range and hover over a line to read the closing price of a stock on an exact date. Furthermore, if five line plots in one figure are hard to read, the dropdown lets the user display the closing price of only one stock at a time. We took the closing price of each stock for each recorded date from 2013 to 2020 from the dataframes we created, and then used the functions provided by plotly to build the interactive graph.

In [262]:
fig = go.Figure()

fig.add_trace(go.Scatter(x=facebook.Date, y=facebook.Close, name='FB'))
fig.add_trace(go.Scatter(x=apple.Date, y=apple.Close, name='AAPL'))
fig.add_trace(go.Scatter(x=amazon.Date, y=amazon.Close, name='AMZN'))
fig.add_trace(go.Scatter(x=netflix.Date, y=netflix.Close, name='NFLX'))
fig.add_trace(go.Scatter(x=google.Date, y=google.Close, name='GOOG'))

fig.update_layout(title='Close prices for All Companies from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Close Price', template="plotly_dark")

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True, True]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Facebook',
                  method = 'update',
                  args = [{'visible': [True, False, False, False, False]},
                          {'title': 'FB',
                           'showlegend':True}]),
             dict(label = 'Apple',
                  method = 'update',
                  args = [{'visible': [False, True, False, False, False]},
                          {'title': 'AAPL',
                           'showlegend':True}]),
             dict(label = 'Amazon',
                  method = 'update',
                  args = [{'visible': [False, False, True, False, False]},
                          {'title': 'AMZN',
                           'showlegend':True}]),
             dict(label = 'Netflix',
                  method = 'update',
                  args = [{'visible': [False, False, False, True, False]},
                          {'title': 'NFLX',
                           'showlegend':True}]),
             dict(label = 'Google',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, True]},
                          {'title': 'GOOG',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)
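Each dropdown button above carries a hand-written `visible` mask with one entry per trace. As a sketch of an alternative, the same list of button dictionaries can be generated from the trace labels, which avoids a mask going out of step with the number of traces. The helper name `one_hot_buttons` is ours and is not part of plotly; it only builds plain Python dictionaries.

```python
def one_hot_buttons(labels, include_all=True):
    """Build 'update' button dicts that show one trace at a time.

    `labels` is a list of (button_label, figure_title) pairs in trace order.
    Hypothetical helper: it returns plain dicts and does not import plotly.
    """
    n = len(labels)
    buttons = []
    if include_all:
        # First button makes every trace visible.
        buttons.append(dict(label='All', method='update',
                            args=[{'visible': [True] * n},
                                  {'title': 'All', 'showlegend': True}]))
    for i, (label, title) in enumerate(labels):
        # Mask is True only at position i, i.e. only trace i is shown.
        mask = [j == i for j in range(n)]
        buttons.append(dict(label=label, method='update',
                            args=[{'visible': mask},
                                  {'title': title, 'showlegend': True}]))
    return buttons


labels = [('Facebook', 'FB'), ('Apple', 'AAPL'), ('Amazon', 'AMZN'),
          ('Netflix', 'NFLX'), ('Google', 'GOOG')]
buttons = one_hot_buttons(labels)
print(buttons[2]['args'][0]['visible'])
```

The returned list could then be passed as the `buttons` entry of the `updatemenus` dictionary in place of the literal list.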

Standardizing Closing Prices for Stocks:

The prices of individual investment securities can vary widely and thus a common reporting practice is to standardize or index these values to a baseline value. Hence, in this section we have standardized the closing prices for the stocks of each FAANG company. Furthermore, we have made an interactive graph that determines the relationship between these stocks by taking into consideration the standardized closing values. The user can hover over the graph and get the values of closing prices for each stock by using the “Compare Data on Hover” function of the graph.

In [263]:
fig = px.line(result, x="Date", y="standard_close", color='Company')

fig.update_layout(title='Standardized Close prices for All Companies from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Standardized Close Price', template="plotly_dark")

iplot(fig,show_link=False)
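The `standard_close` column plotted above is not computed in this cell. As an illustration of one common way to standardize, the sketch below applies a z-score within each company on made-up data. Treating the standardization as a per-company z-score is an assumption on our part, chosen because the volume section uses the same (value - mean) / std form.

```python
import pandas as pd

# Hypothetical long-format data: one row per company per day.
df = pd.DataFrame({
    "Company": ["A", "A", "A", "B", "B", "B"],
    "Close":   [10.0, 20.0, 30.0, 100.0, 150.0, 200.0],
})

# z-score within each company: subtract the group mean, divide by the group
# (sample) standard deviation. transform() returns values aligned row by row.
grouped = df.groupby("Company")["Close"]
df["standard_close"] = (df["Close"] - grouped.transform("mean")) / grouped.transform("std")
print(df)
```

After this step both companies sit on the same scale even though their raw prices differ by an order of magnitude, which is what makes the lines comparable on one axis.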

Standardized Volume Graphs Grouped by Year:

Volume measures the number of shares traded in a stock or contracts traded in futures or options. Volume can be an indicator of market strength, as rising markets on increasing volume are typically viewed as strong and healthy. In this part we have plotted a bar chart that compares the standardized mean yearly trading volume of each company's stock from 2013 to 2020.

In [264]:
# Extract the calendar year from each date (vectorized; no row loop needed).
result['Date'] = pd.to_datetime(result['Date'])
result['Year'] = result['Date'].dt.year

# Mean volume per company per year.
vol_df = (result.groupby(['Company', 'Year'], as_index=False)['Volume']
                .mean()
                .rename(columns={'Volume': 'Volume Mean'}))

# Standardize the yearly means across all companies (z-score).
avg_vol = vol_df['Volume Mean'].mean()
stand_vol = vol_df['Volume Mean'].std()
vol_df['standard_vol'] = (vol_df['Volume Mean'] - avg_vol) / stand_vol

fig = go.Figure()

    
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Facebook']['Year'], y=vol_df[vol_df['Company'] == 'Facebook']['standard_vol'], name='FB'))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Apple']['Year'], y=vol_df[vol_df['Company'] == 'Apple']['standard_vol'], name='AAPL'))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Amazon']['Year'], y=vol_df[vol_df['Company'] == 'Amazon']['standard_vol'], name='AMZN'))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Netflix']['Year'], y=vol_df[vol_df['Company'] == 'Netflix']['standard_vol'], name='NFLX'))
fig.add_trace(go.Bar(x=vol_df[vol_df['Company'] == 'Google']['Year'], y=vol_df[vol_df['Company'] == 'Google']['standard_vol'], name='GOOG'))

fig.update_layout(title='Standardized Volume for All Companies from Jan 2013 to Aug 2020 Grouped by Year',
                   xaxis_title='Year',
                   yaxis_title='Standard Volume', template="plotly_dark")

iplot(fig,show_link=False)

Analysis, Hypothesis Testing and Machine Learning -

Fitted Regression Graphs:

This set of graphs represents the fitted regression models for the stock prices of the FAANG companies. This is where machine learning comes into use. We split the data into two sets, training values and testing values. The training values are used to fit the model so that it learns the patterns in the data. The testing values are then used to check how well the fitted model predicts data it has not seen. We plotted the fitted curves to find the best-fitting method among Linear, k-NN and Decision Tree Regression.
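The plots compare the three models visually. As a complementary sketch, the fit can also be compared numerically by scoring each model on the held-out test set. The example below uses synthetic data (not the FAANG prices) and scikit-learn's default hyperparameters, so the numbers it prints say nothing about the real stocks.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic "price over time" data, for illustration only:
# a linear trend plus a slow oscillation plus noise.
rng = np.random.default_rng(0)
X = np.arange(500, dtype=float).reshape(-1, 1)
t = X.ravel()
y = 50 + 0.2 * t + 10 * np.sin(t / 25) + rng.normal(0, 2, size=t.size)

# Random split into training and testing sets (default 75% / 25%).
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "Linear": LinearRegression(),
    "k-NN": KNeighborsRegressor(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
}

# R^2 on the test set; a higher value means a closer fit to unseen points.
scores = {name: model.fit(X_train, y_train).score(X_test, y_test)
          for name, model in models.items()}
for name, r2 in scores.items():
    print(f"{name}: R^2 = {r2:.3f}")
```

Note that with a random split, each test point lies between training points in time, which favors k-NN and decision trees. A chronological split (train on earlier dates, test on later ones) would be a stricter check of forecasting ability.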

Fitted Regression Graph for Facebook:

This graph overlays the different regression techniques on the training and testing data for Facebook's closing price, to show which one best fits the relationship between time and price. It also includes the linear line of best fit, which makes the overall trend easier to read.

In [265]:
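# Convert each date to an integer Unix timestamp in seconds (explicit int64 avoids platform-dependent int sizes).
facebook['timestamp'] = pd.to_datetime(facebook.Date).astype('int64') // (10**9)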
X = np.array(facebook['timestamp']).reshape(-1,1)
y = np.array(facebook['Close'])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig = go.Figure()

fig.add_trace(go.Scatter(x=X_train.squeeze(), y=y_train, name='Training Data', mode='markers'))
fig.add_trace(go.Scatter(x=X_test.squeeze(), y=y_test, name='Testing Data', mode='markers'))
fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Linear Regression'))

model = KNeighborsRegressor()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig.add_trace(go.Scatter(x=x_range, y=y_range, name='kNN Regressor'))

model = DecisionTreeRegressor()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Decision Tree', marker_color='gold'))

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True, True]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Linear Regression',
                  method = 'update',
                  args = [{'visible': [True, True, True, False, False]},
                          {'title': 'Linear Regression',
                           'showlegend':True}]),
             dict(label = 'k-NN Regressor',
                  method = 'update',
                  args = [{'visible': [True, True, False, True, False]},
                          {'title': 'k-NN Regressor',
                           'showlegend':True}]),
             dict(label = 'Decision Tree Regressor',
                  method = 'update',
                  args = [{'visible': [True, True, False, False, True]},
                          {'title': 'Decision Tree Regressor',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(title='Regression Line Fit for Facebook from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Close Price', template="plotly_dark")

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)
In [266]:
results = sm.OLS(y,X).fit()
print(results.summary())
                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.856
Model:                            OLS   Adj. R-squared (uncentered):              0.856
Method:                 Least Squares   F-statistic:                          1.145e+04
Date:                Mon, 21 Dec 2020   Prob (F-statistic):                        0.00
Time:                        11:24:38   Log-Likelihood:                         -10322.
No. Observations:                1921   AIC:                                  2.065e+04
Df Residuals:                    1920   BIC:                                  2.065e+04
Df Model:                           1                                                  
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1          8.612e-08   8.05e-10    107.007      0.000    8.45e-08    8.77e-08
==============================================================================
Omnibus:                      372.431   Durbin-Watson:                   0.003
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               76.141
Skew:                           0.017   Prob(JB):                     2.93e-17
Kurtosis:                       2.025   Cond. No.                         1.00
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Hypothesis Testing:

Hypothesis Testing to Check for Relationship Between Time and Close Price for Facebook:

A hypothesis test is conducted to see if there is a relationship between time and Close Price for Facebook at a 95% confidence level.

Hypothesis -

Null Hypothesis: There is no relationship between time and Close Price for Facebook

Alternative Hypothesis: There is a relationship between time and Close Price for Facebook

Decision Rule -

The alpha value here is 0.05.

  • If the p-value is greater than the alpha value, we fail to reject the null hypothesis.
  • If the p-value is smaller than the alpha value, we reject the null hypothesis and accept the alternative hypothesis

Test Statistic -

p-value = 0.000 (the P>|t| entry for x1 in the OLS summary, rounded to three decimal places)

Decision -

The p-value we get is 0.000, which is smaller than the alpha value of 0.05, so we reject the null hypothesis and accept the alternative hypothesis.

Conclusion -

As the null hypothesis is rejected, we can conclude that there is a relationship between time and Close Price of Facebook.
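The OLS summary above reports an uncentered R-squared, which statsmodels prints when the design matrix has no constant column, so the fitted line is forced through the origin. As a hedged sketch of the same slope test with an intercept included, the example below uses `scipy.stats.linregress` on a synthetic series (the data is made up; only the procedure is the point).

```python
import numpy as np
from scipy import stats

# Synthetic daily series with an upward trend plus noise (illustrative only).
rng = np.random.default_rng(0)
days = np.arange(250, dtype=float)
close = 100 + 0.3 * days + rng.normal(0, 5, size=days.size)

# Ordinary least squares with an intercept.
# fit.pvalue is the two-sided p-value for the null hypothesis "slope == 0".
fit = stats.linregress(days, close)

alpha = 0.05
reject_null = fit.pvalue < alpha
print(f"slope={fit.slope:.3f}, intercept={fit.intercept:.1f}, p={fit.pvalue:.3g}")
print("reject H0" if reject_null else "fail to reject H0")
```

The decision rule is the one stated above: reject the null hypothesis when the p-value falls below alpha.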

Fitted Regression Graph for Apple:

This graph overlays the different regression techniques on the training and testing data for Apple's closing price, to show which one best fits the relationship between time and price. It also includes the linear line of best fit, which makes the overall trend easier to read.

In [267]:
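# Convert each date to an integer Unix timestamp in seconds (explicit int64 avoids platform-dependent int sizes).
apple['timestamp'] = pd.to_datetime(apple.Date).astype('int64') // (10**9)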
X = np.array(apple['timestamp']).reshape(-1,1)
y = np.array(apple['Close'])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig = go.Figure()

fig.add_trace(go.Scatter(x=X_train.squeeze(), y=y_train, name='Training Data', mode='markers'))
fig.add_trace(go.Scatter(x=X_test.squeeze(), y=y_test, name='Testing Data', mode='markers'))
fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Linear Regression'))

model = KNeighborsRegressor()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig.add_trace(go.Scatter(x=x_range, y=y_range, name='kNN Regressor'))

model = DecisionTreeRegressor()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Decision Tree', marker_color='gold'))

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True, True]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Linear Regression',
                  method = 'update',
                  args = [{'visible': [True, True, True, False, False]},
                          {'title': 'Linear Regression',
                           'showlegend':True}]),
             dict(label = 'k-NN Regressor',
                  method = 'update',
                  args = [{'visible': [True, True, False, True, False]},
                          {'title': 'k-NN Regressor',
                           'showlegend':True}]),
             dict(label = 'Decision Tree Regressor',
                  method = 'update',
                  args = [{'visible': [True, True, False, False, True]},
                          {'title': 'Decision Tree Regressor',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(title='Regression Line Fit for Apple from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Close Price', template="plotly_dark")

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)
In [268]:
results = sm.OLS(y,X).fit()
print(results.summary())
                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.823
Model:                            OLS   Adj. R-squared (uncentered):              0.823
Method:                 Least Squares   F-statistic:                              8980.
Date:                Mon, 21 Dec 2020   Prob (F-statistic):                        0.00
Time:                        11:24:38   Log-Likelihood:                         -8302.5
No. Observations:                1931   AIC:                                  1.661e+04
Df Residuals:                    1930   BIC:                                  1.661e+04
Df Model:                           1                                                  
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1          2.599e-08   2.74e-10     94.764      0.000    2.54e-08    2.65e-08
==============================================================================
Omnibus:                      675.510   Durbin-Watson:                   0.002
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             2247.580
Skew:                           1.755   Prob(JB):                         0.00
Kurtosis:                       6.951   Cond. No.                         1.00
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Hypothesis Testing:

Hypothesis Testing to Check for a Relationship Between Time and Close Price for Apple:

A hypothesis test is conducted to see whether there is a relationship between time and Close Price for Apple at a 95% confidence level.

Hypothesis -

Null Hypothesis: There is no relationship between time and Close Price for Apple

Alternative Hypothesis: There is a relationship between time and Close Price for Apple

Decision Rule -

The alpha value here is 0.05.

  • If the p-value is greater than the alpha value, we fail to reject the null hypothesis.
  • If the p-value is smaller than the alpha value, we reject the null hypothesis and accept the alternative hypothesis

Test Statistic -

p-value < 0.001 (shown as 0.000 in the P>|t| column of the OLS summary above)

Decision -

The p-value is reported as 0.000, which is smaller than the alpha value of 0.05, so we reject the null hypothesis and accept the alternative hypothesis.

Conclusion -

As the null hypothesis is rejected, we can conclude that there is a relationship between time and Close Price of Apple.
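
To make the decision rule concrete, here is a minimal, standard-library-only sketch of a slope test for a simple regression with an intercept. The function name slope_test and the toy data are illustrative assumptions, not part of the notebook, and the p-value uses a normal approximation to the t distribution, which is only reasonable for large samples. The notebook's own sm.OLS(y, X) call fits a model without an intercept, so its numbers will differ from this sketch.

```python
import math
from statistics import NormalDist


def slope_test(x, y, alpha=0.05):
    """Fit y = a + b*x by least squares and test H0: b = 0 (hypothetical helper).

    Assumes more than two points and a fit that is not exact (residuals > 0).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares and the standard error of the slope
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    t_stat = slope / std_err
    # Two-sided p-value, normal approximation (an assumption for large n)
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return {
        "slope": slope,
        "intercept": intercept,
        "t": t_stat,
        "p_value": p_value,
        "reject_null": p_value < alpha,
    }


x = list(range(50))
trend = slope_test(x, [2 * xi + 1 + 0.5 * (-1) ** xi for xi in x])
flat = slope_test(x, [(-1) ** xi for xi in x])
print(trend["reject_null"], flat["reject_null"])
```

The first toy series has a clear upward trend, so the null hypothesis is rejected; the second only alternates around zero, so it is not.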

Fitted Regression Graph for Amazon:

This graph overlays three regression techniques (linear regression, k-NN, and a decision tree) on the training and testing data for Amazon's closing price, so the fits can be compared directly. The linear regression trace is the line of best fit, which makes the overall trend easier to read.
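
The regression cells in this section feed time to the models as a Unix timestamp (seconds since 1970-01-01), which is why the coefficients in the OLS summaries are so small: they are expressed per second. A minimal sketch of that conversion using only the standard library, assuming the dates arrive as 'YYYY-MM-DD' strings (an assumption about the CSV, as are the helper names):

```python
from datetime import datetime, timezone


def to_unix_seconds(date_str):
    """Parse a 'YYYY-MM-DD' date (assumed format) as midnight UTC; return Unix seconds."""
    dt = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def days_since_start(date_strs):
    """Rescale dates to whole days elapsed since the first date in the list."""
    first = to_unix_seconds(date_strs[0])
    return [(to_unix_seconds(d) - first) // 86400 for d in date_strs]


dates = ["2013-01-02", "2013-01-03", "2013-01-04"]
print(to_unix_seconds(dates[0]))  # 1357084800
print(days_since_start(dates))    # [0, 1, 2]
```

Rescaling to days since the first row, as in the second helper, is one way to get coefficients that are easier to read; the notebook itself keeps raw seconds.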

In [269]:
# Convert dates to Unix seconds (datetime64[ns] -> int64 nanoseconds -> seconds)
amazon['timestamp'] = pd.to_datetime(amazon.Date).astype('int64') // (10**9)
X = np.array(amazon['timestamp']).reshape(-1,1)
y = np.array(amazon['Close'])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig = go.Figure()

fig.add_trace(go.Scatter(x=X_train.squeeze(), y=y_train, name='Training Data', mode='markers'))
fig.add_trace(go.Scatter(x=X_test.squeeze(), y=y_test, name='Testing Data', mode='markers'))
fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Linear Regression'))

model = KNeighborsRegressor()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig.add_trace(go.Scatter(x=x_range, y=y_range, name='kNN Regressor'))

model = DecisionTreeRegressor()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Decision Tree', marker_color='gold'))

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True, True]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Linear Regression',
                  method = 'update',
                  args = [{'visible': [True, True, True, False, False]},
                          {'title': 'Linear Regression',
                           'showlegend':True}]),
             dict(label = 'k-NN Regressor',
                  method = 'update',
                  args = [{'visible': [True, True, False, True, False]},
                          {'title': 'k-NN Regressor',
                           'showlegend':True}]),
             dict(label = 'Decision Tree Regressor',
                  method = 'update',
                  args = [{'visible': [True, True, False, False, True]},
                          {'title': 'Decision Tree Regressor',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(title='Regression Line Fit for Amazon from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Close Price', template="plotly_dark")

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)
In [270]:
results = sm.OLS(y,X).fit()
print(results.summary())
                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.716
Model:                            OLS   Adj. R-squared (uncentered):              0.716
Method:                 Least Squares   F-statistic:                              4845.
Date:                Mon, 21 Dec 2020   Prob (F-statistic):                        0.00
Time:                        11:24:38   Log-Likelihood:                         -15159.
No. Observations:                1919   AIC:                                  3.032e+04
Df Residuals:                    1918   BIC:                                  3.033e+04
Df Model:                           1                                                  
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1           7.01e-07   1.01e-08     69.603      0.000    6.81e-07    7.21e-07
==============================================================================
Omnibus:                      180.785   Durbin-Watson:                   0.001
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              234.645
Skew:                           0.855   Prob(JB):                     1.12e-51
Kurtosis:                       2.908   Cond. No.                         1.00
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Hypothesis Testing:

Hypothesis Testing to Check for a Relationship Between Time and Close Price for Amazon:

A hypothesis test is conducted to see whether there is a relationship between time and Close Price for Amazon at a 95% confidence level.

Hypothesis -

Null Hypothesis: There is no relationship between time and Close Price for Amazon

Alternative Hypothesis: There is a relationship between time and Close Price for Amazon

Decision Rule -

The alpha value here is 0.05.

  • If the p-value is greater than the alpha value, we fail to reject the null hypothesis.
  • If the p-value is smaller than the alpha value, we reject the null hypothesis and accept the alternative hypothesis

Test Statistic -

p-value < 0.001 (shown as 0.000 in the P>|t| column of the OLS summary above)

Decision -

The p-value is reported as 0.000, which is smaller than the alpha value of 0.05, so we reject the null hypothesis and accept the alternative hypothesis.

Conclusion -

As the null hypothesis is rejected, we can conclude that there is a relationship between time and Close Price of Amazon.
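
The summary above reports "R-squared (uncentered)" because sm.OLS(y, X) was fitted without an intercept term. The uncentered version compares the residuals against the raw sum of squares of y rather than against the variation around its mean, so it can look high even for a poor fit when prices sit far from zero. A minimal sketch of the difference (function names and toy numbers are illustrative assumptions):

```python
def r2_uncentered(y, y_hat):
    """1 - RSS / sum(y^2): the form reported for models without an intercept."""
    rss = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    return 1 - rss / sum(a ** 2 for a in y)


def r2_centered(y, y_hat):
    """1 - RSS / sum((y - mean)^2): the usual coefficient of determination."""
    mean_y = sum(y) / len(y)
    rss = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    tss = sum((a - mean_y) ** 2 for a in y)
    return 1 - rss / tss


# Toy prices near 100, "predicted" by a constant that explains none of the variation
y = [100, 101, 99, 102, 98]
y_hat = [100.0] * len(y)

print(r2_uncentered(y, y_hat))  # close to 1
print(r2_centered(y, y_hat))    # 0.0
```

Here the constant prediction carries no information about day-to-day movement, yet the uncentered value is close to 1. For comparison, statsmodels offers sm.add_constant to include an intercept.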

Fitted Regression Graph for Netflix:

This graph overlays three regression techniques (linear regression, k-NN, and a decision tree) on the training and testing data for Netflix's closing price, so the fits can be compared directly. The linear regression trace is the line of best fit, which makes the overall trend easier to read.
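
The cells in this section call train_test_split with random_state=0. By default that function shuffles the rows and holds out 25% of them for testing, and the fixed seed makes the shuffle repeatable. A minimal standard-library sketch of the same idea (the helper name is hypothetical, and it will not reproduce scikit-learn's exact shuffle order):

```python
import math
import random


def simple_train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle row indices with a fixed seed and hold out a fraction for testing."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_test = math.ceil(len(rows) * test_fraction)
    test_idx = indices[:n_test]
    train_idx = indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


rows = list(range(20))
train, test = simple_train_test_split(rows)
print(len(train), len(test))  # 15 5
```

Calling the helper again with the same seed returns the same split, which is what keeps the plotted training and testing points stable between runs.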

In [271]:
# Convert dates to Unix seconds (datetime64[ns] -> int64 nanoseconds -> seconds)
netflix['timestamp'] = pd.to_datetime(netflix.Date).astype('int64') // (10**9)
X = np.array(netflix['timestamp']).reshape(-1,1)
y = np.array(netflix['Close'])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig = go.Figure()

fig.add_trace(go.Scatter(x=X_train.squeeze(), y=y_train, name='Training Data', mode='markers'))
fig.add_trace(go.Scatter(x=X_test.squeeze(), y=y_test, name='Testing Data', mode='markers'))
fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Linear Regression'))

model = KNeighborsRegressor()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig.add_trace(go.Scatter(x=x_range, y=y_range, name='kNN Regressor'))

model = DecisionTreeRegressor()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Decision Tree', marker_color='gold'))

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True, True]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Linear Regression',
                  method = 'update',
                  args = [{'visible': [True, True, True, False, False]},
                          {'title': 'Linear Regression',
                           'showlegend':True}]),
             dict(label = 'k-NN Regressor',
                  method = 'update',
                  args = [{'visible': [True, True, False, True, False]},
                          {'title': 'k-NN Regressor',
                           'showlegend':True}]),
             dict(label = 'Decision Tree Regressor',
                  method = 'update',
                  args = [{'visible': [True, True, False, False, True]},
                          {'title': 'Decision Tree Regressor',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(title='Regression Line Fit for Netflix from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Close Price', template="plotly_dark")

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)
In [272]:
results = sm.OLS(y,X).fit()
print(results.summary())
                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.688
Model:                            OLS   Adj. R-squared (uncentered):              0.688
Method:                 Least Squares   F-statistic:                              4218.
Date:                Mon, 21 Dec 2020   Prob (F-statistic):                        0.00
Time:                        11:24:38   Log-Likelihood:                         -11892.
No. Observations:                1910   AIC:                                  2.379e+04
Df Residuals:                    1909   BIC:                                  2.379e+04
Df Model:                           1                                                  
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1          1.231e-07   1.89e-09     64.949      0.000    1.19e-07    1.27e-07
==============================================================================
Omnibus:                      396.692   Durbin-Watson:                   0.002
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              215.406
Skew:                           0.684   Prob(JB):                     1.68e-47
Kurtosis:                       2.087   Cond. No.                         1.00
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Hypothesis Testing:

Hypothesis Testing to Check for a Relationship Between Time and Close Price for Netflix:

A hypothesis test is conducted to see whether there is a relationship between time and Close Price for Netflix at a 95% confidence level.

Hypothesis -

Null Hypothesis: There is no relationship between time and Close Price for Netflix

Alternative Hypothesis: There is a relationship between time and Close Price for Netflix

Decision Rule -

The alpha value here is 0.05.

  • If the p-value is greater than the alpha value, we fail to reject the null hypothesis.
  • If the p-value is smaller than the alpha value, we reject the null hypothesis and accept the alternative hypothesis

Test Statistic -

p-value < 0.001 (shown as 0.000 in the P>|t| column of the OLS summary above)

Decision -

The p-value is reported as 0.000, which is smaller than the alpha value of 0.05, so we reject the null hypothesis and accept the alternative hypothesis.

Conclusion -

As the null hypothesis is rejected, we can conclude that there is a relationship between time and Close Price of Netflix.
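
The OLS summary above also reports a Durbin-Watson statistic of 0.002. Values near 2 suggest uncorrelated residuals, while values near 0 suggest strong positive autocorrelation, which is common for daily prices and means the reported standard errors and p-value should be read with some caution. A minimal sketch of how the statistic is computed (the function name and toy residuals are illustrative assumptions):

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    num = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    den = sum(e ** 2 for e in residuals)
    return num / den


smooth = [1.0, 1.1, 1.2, 1.1, 1.0]       # neighbours are similar: statistic near 0
alternating = [1.0, -1.0, 1.0, -1.0]     # neighbours flip sign: statistic above 2

print(durbin_watson(smooth))
print(durbin_watson(alternating))  # 3.0
```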

Fitted Regression Graph for Google:

This graph overlays three regression techniques (linear regression, k-NN, and a decision tree) on the training and testing data for Google's closing price, so the fits can be compared directly. The linear regression trace is the line of best fit, which makes the overall trend easier to read.
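
The k-NN trace in these graphs comes from KNeighborsRegressor, which by default predicts the plain average of the 5 nearest training targets. A minimal one-dimensional sketch of that rule, using only the standard library (the function name and toy data are illustrative assumptions, and ties are broken by training order here):

```python
def knn_predict(train_x, train_y, query_x, k=5):
    """Average the targets of the k training points closest to query_x."""
    by_distance = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - query_x))
    nearest = by_distance[:k]
    return sum(target for _, target in nearest) / len(nearest)


train_x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
train_y = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

print(knn_predict(train_x, train_y, 5.0, k=3))  # 50.0
print(knn_predict(train_x, train_y, 2.2, k=1))  # 20.0
```

Because each prediction is a local average, the k-NN curve follows the data closely inside the observed range but cannot extend a trend beyond it.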

In [273]:
# Convert dates to Unix seconds (datetime64[ns] -> int64 nanoseconds -> seconds)
google['timestamp'] = pd.to_datetime(google.Date).astype('int64') // (10**9)
X = np.array(google['timestamp']).reshape(-1,1)
y = np.array(google['Close'])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig = go.Figure()

fig.add_trace(go.Scatter(x=X_train.squeeze(), y=y_train, name='Training Data', mode='markers'))
fig.add_trace(go.Scatter(x=X_test.squeeze(), y=y_test, name='Testing Data', mode='markers'))
fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Linear Regression'))

model = KNeighborsRegressor()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig.add_trace(go.Scatter(x=x_range, y=y_range, name='kNN Regressor'))

model = DecisionTreeRegressor()
model.fit(X_train, y_train)

x_range = np.linspace(X.min(), X.max(), 100)
y_range = model.predict(x_range.reshape(-1, 1))

fig.add_trace(go.Scatter(x=x_range, y=y_range, name='Decision Tree', marker_color='gold'))

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True, True]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Linear Regression',
                  method = 'update',
                  args = [{'visible': [True, True, True, False, False]},
                          {'title': 'Linear Regression',
                           'showlegend':True}]),
             dict(label = 'k-NN Regressor',
                  method = 'update',
                  args = [{'visible': [True, True, False, True, False]},
                          {'title': 'k-NN Regressor',
                           'showlegend':True}]),
             dict(label = 'Decision Tree Regressor',
                  method = 'update',
                  args = [{'visible': [True, True, False, False, True]},
                          {'title': 'Decision Tree Regressor',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(title='Regression Line Fit for Google from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Close Price', template="plotly_dark")

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)
In [274]:
results = sm.OLS(y,X).fit()
print(results.summary())
                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.910
Model:                            OLS   Adj. R-squared (uncentered):              0.910
Method:                 Least Squares   F-statistic:                          1.965e+04
Date:                Mon, 21 Dec 2020   Prob (F-statistic):                        0.00
Time:                        11:24:38   Log-Likelihood:                         -13599.
No. Observations:                1934   AIC:                                  2.720e+04
Df Residuals:                    1933   BIC:                                  2.721e+04
Df Model:                           1                                                  
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1          5.899e-07   4.21e-09    140.167      0.000    5.82e-07    5.98e-07
==============================================================================
Omnibus:                      284.325   Durbin-Watson:                   0.003
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              108.466
Skew:                           0.374   Prob(JB):                     2.80e-24
Kurtosis:                       2.114   Cond. No.                         1.00
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Hypothesis Testing:

Hypothesis Testing to Check for a Relationship Between Time and Close Price for Google:

A hypothesis test is conducted to see whether there is a relationship between time and Close Price for Google at a 95% confidence level.

Hypothesis -

Null Hypothesis: There is no relationship between time and Close Price for Google

Alternative Hypothesis: There is a relationship between time and Close Price for Google

Decision Rule -

The alpha value here is 0.05.

  • If the p-value is greater than the alpha value, we fail to reject the null hypothesis.
  • If the p-value is smaller than the alpha value, we reject the null hypothesis and accept the alternative hypothesis

Test Statistic -

p-value < 0.001 (shown as 0.000 in the P>|t| column of the OLS summary above)

Decision -

The p-value is reported as 0.000, which is smaller than the alpha value of 0.05, so we reject the null hypothesis and accept the alternative hypothesis.

Conclusion -

As the null hypothesis is rejected, we can conclude that there is a relationship between time and Close Price of Google.

Predicting Values for FAANG stocks:

In this section we use decision tree, linear regression and k-NN predictors to predict closing prices. Each model takes a day's closing price as input and is trained to predict the closing price 500 rows (trading days) later. Each graph is interactive and has different “modes”: the user can select a prediction technique from the drop-down menu right above the graph. Each mode shows the actual closing price alongside the predicted closing price for the selected FAANG stock.
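
Each drop-down entry is a plotly 'update' button whose visible list holds one boolean per trace, in the order the traces were added to the figure. Writing these lists by hand makes it easy for a label, a title and a mask to drift apart, so one option is to generate them. A minimal sketch that only builds the plain dictionaries (the helper name make_buttons is hypothetical; its result would be passed as buttons= inside updatemenus):

```python
def make_buttons(always_visible, model_names):
    """Build an 'All' button plus one button per model trace.

    always_visible: names of traces shown in every mode (added to the figure first).
    model_names: names of the model traces, in the order they were added.
    """
    n_fixed = len(always_visible)
    n_models = len(model_names)
    buttons = [
        dict(
            label="All",
            method="update",
            args=[
                {"visible": [True] * (n_fixed + n_models)},
                {"title": "All", "showlegend": True},
            ],
        )
    ]
    for i, name in enumerate(model_names):
        # Keep the fixed traces on and switch on only the i-th model trace
        visible = [True] * n_fixed + [j == i for j in range(n_models)]
        buttons.append(
            dict(
                label=name,
                method="update",
                args=[{"visible": visible}, {"title": name, "showlegend": True}],
            )
        )
    return buttons


buttons = make_buttons(
    ["Actual Close"], ["Decision Tree", "Linear Regression", "k-NN Regressor"]
)
print(buttons[1]["args"][0]["visible"])  # [True, True, False, False]
```

Deriving the title from the same name as the label keeps the two in step for every mode.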

Predicting Closing Values for Facebook:

The graph below depicts the actual and predicted closing prices for Facebook. Investors and other readers can use it to compare each model's predictions with the stock's actual past and present prices before making decisions.
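
The prediction cells build their training data by shifting the Close column upward by future_days rows, so that each row pairs a day's close with the close future_days rows later; the final future_days rows have no target and are dropped. A minimal sketch of that pairing with plain lists (the helper name and toy prices are illustrative assumptions):

```python
def make_supervised(closes, future_days):
    """Pair each close with the close `future_days` rows later.

    Mirrors shift(-future_days) followed by dropping the last `future_days` rows.
    """
    if not 0 < future_days < len(closes):
        raise ValueError("future_days must be between 1 and len(closes) - 1")
    features = [[c] for c in closes[:-future_days]]  # one feature per row
    targets = closes[future_days:]
    return features, targets


closes = [10, 11, 12, 13, 14, 15]
X, y = make_supervised(closes, future_days=2)
print(X)  # [[10], [11], [12], [13]]
print(y)  # [12, 13, 14, 15]
```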

In [275]:
df = facebook[['Close']].copy(deep=True)

future_days = 500
df['Prediction'] = df[['Close']].shift(-future_days)

X = np.array(df.drop(columns=['Prediction']))[:-future_days]
y = np.array(df['Prediction'])[:-future_days]

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)

tree = DecisionTreeRegressor().fit(x_train, y_train)
lr = LinearRegression().fit(x_train, y_train)
knn = KNeighborsRegressor().fit(x_train, y_train)

x_future = df.drop(columns=['Prediction'])[:-future_days]
x_future = x_future.tail(future_days) 
x_future = np.array(x_future)

tree_prediction = tree.predict(x_future)
lr_prediction = lr.predict(x_future)
knn_prediction = knn.predict(x_future)

predictions = tree_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig = go.Figure()

fig.add_trace(go.Scatter(x=df.index.values, y=df['Close'], name='Actual Close',
                         line=dict(width=1.5)))
fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close D-Tree', 
                         line=dict(width=1.5)))

predictions = lr_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close Lin Reg', 
                         line=dict(width=1.5)))

predictions = knn_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close k-NN',
                         marker_color='gold',
                         line=dict(width=1.5)))

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Decision Tree Prediction',
                  method = 'update',
                  args = [{'visible': [True, True, False, False]},
                          {'title': 'Decision Tree Regressor',
                           'showlegend':True}]),
             dict(label = 'Linear Regression Prediction',
                  method = 'update',
                  args = [{'visible': [True, False, True, False]},
                          {'title': 'Linear Regression',
                           'showlegend':True}]),
             dict(label = 'k-NN Regressor Prediction',
                  method = 'update',
                  args = [{'visible': [True, False, False, True]},
                          {'title': 'k-NN Regressor',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(title='Predicted Values for Facebook For the last 500 Days',
                   xaxis_title='Date',
                   yaxis_title='Close Price', template="plotly_dark")

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)

Accuracy Score for the Different Models - Facebook:

Here we calculate the scores for the three models: k-NN Regressor, Decision Tree Regressor, and Linear Regressor. For regressors, scikit-learn's score method returns the coefficient of determination (R²) on the test set rather than a classification accuracy. In the run shown below, the k-NN Regressor scores about 0.898, the Decision Tree Regressor about 0.815, and the Linear Regressor about 0.814. Because train_test_split is called here without a random_state, these values change from run to run. All three models score fairly high, but the k-NN Regressor is the most accurate predicting model.

In [276]:
confidencetree = tree.score(x_test, y_test)
confidencelr = lr.score(x_test, y_test)
confidenceknn = knn.score(x_test, y_test)
print("k-NN Regressor Accuracy Score: " + str(confidenceknn))
print("Decision Tree Regressor Accuracy Score: " + str(confidencetree))
print("Linear Regressor Accuracy Score: " + str(confidencelr))
k-NN Regressor Accuracy Score: 0.8977785371429233
Decision Tree Regressor Accuracy Score: 0.8148490287980684
Linear Regressor Accuracy Score: 0.8142528762923802
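
For scikit-learn regressors, score returns the coefficient of determination (R²), so the values printed above are R² scores on the held-out test rows. A minimal sketch of that quantity (the function name and toy numbers are illustrative assumptions): 1.0 is a perfect fit, 0.0 matches always predicting the mean, and negative values are worse than the mean.

```python
def r2_score_sketch(y_true, y_pred):
    """Coefficient of determination: 1 - RSS / TSS."""
    mean_y = sum(y_true) / len(y_true)
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    tss = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - rss / tss


y_true = [1, 2, 3, 4]
print(r2_score_sketch(y_true, [1, 2, 3, 4]))          # 1.0
print(r2_score_sketch(y_true, [2.5, 2.5, 2.5, 2.5]))  # 0.0
print(r2_score_sketch(y_true, [4, 3, 2, 1]))          # -3.0
```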

Predicting Closing Values for Apple:

The graph below depicts the actual and predicted closing prices for Apple. Investors and other readers can use it to compare each model's predictions with the stock's actual past and present prices before making decisions.

In [277]:
df = apple[['Close']].copy(deep=True)

future_days = 500
df['Prediction'] = df[['Close']].shift(-future_days)

X = np.array(df.drop(columns=['Prediction']))[:-future_days]
y = np.array(df['Prediction'])[:-future_days]

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)

tree = DecisionTreeRegressor().fit(x_train, y_train)
lr = LinearRegression().fit(x_train, y_train)
knn = KNeighborsRegressor().fit(x_train, y_train)

x_future = df.drop(columns=['Prediction'])[:-future_days]
x_future = x_future.tail(future_days) 
x_future = np.array(x_future)

tree_prediction = tree.predict(x_future)
lr_prediction = lr.predict(x_future)
knn_prediction = knn.predict(x_future)

predictions = tree_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig = go.Figure()

fig.add_trace(go.Scatter(x=df.index.values, y=df['Close'], name='Actual Close',
                         line=dict(width=1.5)))
fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close D-Tree',
                         line=dict(width=1.5)))

predictions = lr_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close Lin Reg',
                         line=dict(width=1.5)))

predictions = knn_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close k-NN',
                         marker_color='gold',
                         line=dict(width=1.5)))

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Decision Tree Prediction',
                  method = 'update',
                  args = [{'visible': [True, True, False, False]},
                          {'title': 'Decision Tree Regressor',
                           'showlegend':True}]),
             dict(label = 'Linear Regression Prediction',
                  method = 'update',
                  args = [{'visible': [True, False, True, False]},
                          {'title': 'Linear Regression',
                           'showlegend':True}]),
             dict(label = 'k-NN Regressor Prediction',
                  method = 'update',
                  args = [{'visible': [True, False, False, True]},
                          {'title': 'k-NN Regressor',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(title='Predicted Values for Apple For the last 500 Days',
                   xaxis_title='Date',
                   yaxis_title='Close Price', template="plotly_dark")

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)

Accuracy Score for the Different Models - Apple:

Here we calculate the scores for the three models: k-NN Regressor, Decision Tree Regressor, and Linear Regressor. For regressors, scikit-learn's score method returns the coefficient of determination (R²) on the test set rather than a classification accuracy. In the run shown below, the k-NN Regressor scores about 0.865, the Decision Tree Regressor about 0.716, and the Linear Regressor about 0.689. Because train_test_split is called here without a random_state, these values change from run to run. The k-NN Regressor is again the most accurate predicting model, while the other two models fit noticeably less well.

In [278]:
confidencetree = tree.score(x_test, y_test)
confidencelr = lr.score(x_test, y_test)
confidenceknn = knn.score(x_test, y_test)
print("k-NN Regressor Accuracy Score: " + str(confidenceknn))
print("Decision Tree Regressor Accuracy Score: " + str(confidencetree))
print("Linear Regressor Accuracy Score: " + str(confidencelr))
k-NN Regressor Accuracy Score: 0.8649645803478626
Decision Tree Regressor Accuracy Score: 0.7156280801771819
Linear Regressor Accuracy Score: 0.688962251630823
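
train_test_split shuffles rows by default, so with time-ordered prices the test set contains days that lie between training days. One alternative, offered here only as a sketch and not as what the notebook does, is to hold out the most recent rows instead. The helper name below is hypothetical; in scikit-learn the comparable option is passing shuffle=False to train_test_split.

```python
def chronological_split(rows, test_fraction=0.25):
    """Keep the earliest rows for training and the latest rows for testing."""
    n_test = max(1, round(len(rows) * test_fraction))
    return rows[:-n_test], rows[-n_test:]


rows = list(range(8))  # stand-in for eight time-ordered observations
train, test = chronological_split(rows)
print(train)  # [0, 1, 2, 3, 4, 5]
print(test)   # [6, 7]
```

Scores from a chronological split are usually lower than shuffled ones, because the model is evaluated only on dates it has not seen around.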

Predicting Closing Values for Amazon:

The graph below depicts the actual and predicted closing prices for Amazon. Investors and other readers can use it to compare each model's predictions with the stock's actual past and present prices before making decisions.

In [279]:
df = amazon[['Close']].copy(deep=True)

future_days = 500
df['Prediction'] = df[['Close']].shift(-future_days)

X = np.array(df.drop(columns=['Prediction']))[:-future_days]
y = np.array(df['Prediction'])[:-future_days]

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)

tree = DecisionTreeRegressor().fit(x_train, y_train)
lr = LinearRegression().fit(x_train, y_train)
knn = KNeighborsRegressor().fit(x_train, y_train)

x_future = df.drop(columns=['Prediction'])[:-future_days]
x_future = x_future.tail(future_days) 
x_future = np.array(x_future)

tree_prediction = tree.predict(x_future)
lr_prediction = lr.predict(x_future)
knn_prediction = knn.predict(x_future)

predictions = tree_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig = go.Figure()

fig.add_trace(go.Scatter(x=df.index.values, y=df['Close'], name='Actual Close',
                         line=dict(width=1.5)))
fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close D-Tree',
                         line=dict(width=1.5)))

predictions = lr_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close Lin Reg',
                         line=dict(width=1.5)))

predictions = knn_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close k-NN',
                         marker_color='gold',
                         line=dict(width=1.5)))

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Decision Tree Prediction',
                  method = 'update',
                  args = [{'visible': [True, True, False, False]},
                          {'title': 'Decision Tree Regressor',
                           'showlegend':True}]),
             dict(label = 'Linear Regression Prediction',
                  method = 'update',
                  args = [{'visible': [True, False, True, False]},
                          {'title': 'Linear Regression',
                           'showlegend':True}]),
             dict(label = 'k-NN Regressor Prediction',
                  method = 'update',
                  args = [{'visible': [True, False, False, True]},
                          {'title': 'k-NN Regressor',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(title='Predicted Values for Amazon For the last 500 Days',
                   xaxis_title='Date',
                   yaxis_title='Close Price', template="plotly_dark")

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)

Accuracy Score for the Different Models - Amazon:

Here we calculate the accuracy scores (R-squared, as returned by the score method) for the three different models: k-NN Regressor, Decision Tree Regressor, and Linear Regressor. The exact values change from run to run because the train/test split is random. In the run shown below, the k-NN Regressor scores about 0.957, the Decision Tree Regressor about 0.922, and the Linear Regressor about 0.857. All three models score highly for Amazon, and the k-NN Regressor is the most accurate predicting model.

In [280]:
confidencetree = tree.score(x_test, y_test)
confidencelr = lr.score(x_test, y_test)
confidenceknn = knn.score(x_test, y_test)
print("k-NN Regressor Accuracy Score: " + str(confidenceknn))
print("Decision Tree Regressor Accuracy Score: " + str(confidencetree))
print("Linear Regressor Accuracy Score: " + str(confidencelr))
k-NN Regressor Accuracy Score: 0.9571384027386828
Decision Tree Regressor Accuracy Score: 0.9221415994532336
Linear Regressor Accuracy Score: 0.8570893018037875
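
The scores printed above change from run to run because train_test_split shuffles the rows with a new random seed each time. One way to make the numbers repeatable is to pass random_state, as in the minimal sketch below. The synthetic series, the helper name knn_score, and the seed value 42 are assumptions for illustration and are not part of the original analysis.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for a closing-price series (not real stock data)
rng = np.random.default_rng(0)
close = np.cumsum(rng.normal(0.5, 2.0, size=600)) + 100.0

future_days = 50
X = close[:-future_days].reshape(-1, 1)   # close price on a given day
y = close[future_days:]                   # close price `future_days` rows later

def knn_score(seed):
    """Fit k-NN on a seeded random split and return R^2 on the test part."""
    x_train, x_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed)
    model = KNeighborsRegressor().fit(x_train, y_train)
    return model.score(x_test, y_test)

# The same seed reproduces the same split, and therefore the same score
print(knn_score(42) == knn_score(42))
```

With a fixed seed, the values quoted in the text and the values printed by the cell stay in agreement when the notebook is re-run.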

Predicting Closing Values for Netflix:

The graph below depicts the actual closing prices for Netflix together with the closing prices predicted by each model for the last 500 days. Investors and other readers can use it to review the past and present of the Netflix stock and to inform their own decisions.

In [281]:
df = netflix[['Close']].copy(deep=True)

future_days = 500
df['Prediction'] = df[['Close']].shift(-future_days)

X = np.array(df.drop(columns=['Prediction']))[:-future_days]
y = np.array(df['Prediction'])[:-future_days]

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)

tree = DecisionTreeRegressor().fit(x_train, y_train)
lr = LinearRegression().fit(x_train, y_train)
knn = KNeighborsRegressor().fit(x_train, y_train)

x_future = df.drop(columns=['Prediction'])[:-future_days]
x_future = x_future.tail(future_days) 
x_future = np.array(x_future)

tree_prediction = tree.predict(x_future)
lr_prediction = lr.predict(x_future)
knn_prediction = knn.predict(x_future)

predictions = tree_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig = go.Figure()

fig.add_trace(go.Scatter(x=df.index.values, y=df['Close'], name='Actual Close',
                         line=dict(width=1.5)))
fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close D-Tree',
                         line=dict(width=1.5)))

predictions = lr_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close Lin Reg',
                         line=dict(width=1.5)))

predictions = knn_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close k-NN',
                         marker_color='gold',
                         line=dict(width=1.5)))

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Decision Tree Prediction',
                  method = 'update',
                  args = [{'visible': [True, True, False, False]},
                          {'title': 'Decision Tree Regressor',
                           'showlegend':True}]),
             dict(label = 'Linear Regression Prediction',
                  method = 'update',
                  args = [{'visible': [True, False, True, False]},
                          {'title': 'Linear Regression',
                           'showlegend':True}]),
             dict(label = 'k-NN Regressor Prediction',
                  method = 'update',
                  args = [{'visible': [True, False, False, True]},
                          {'title': 'k-NN Regressor',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(title='Predicted Values for Netflix For the last 500 Days',
                   xaxis_title='Date',
                   yaxis_title='Close Price', template="plotly_dark")

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)

Accuracy Score for the Different Models - Netflix:

Here we calculate the accuracy scores (R-squared, as returned by the score method) for the three different models: k-NN Regressor, Decision Tree Regressor, and Linear Regressor. The exact values change from run to run because the train/test split is random. In the run shown below, the k-NN Regressor scores about 0.828, the Decision Tree Regressor about 0.742, and the Linear Regressor about 0.658. The k-NN Regressor is again the most accurate predicting model, while the Decision Tree and Linear Regressors fit the Netflix data noticeably less well.

In [282]:
confidencetree = tree.score(x_test, y_test)
confidencelr = lr.score(x_test, y_test)
confidenceknn = knn.score(x_test, y_test)
print("k-NN Regressor Accuracy Score: " + str(confidenceknn))
print("Decision Tree Regressor Accuracy Score: " + str(confidencetree))
print("Linear Regressor Accuracy Score: " + str(confidencelr))
k-NN Regressor Accuracy Score: 0.8284935600290315
Decision Tree Regressor Accuracy Score: 0.7416468542393515
Linear Regressor Accuracy Score: 0.6582827843255206
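
The same steps are repeated for every company, so they could be wrapped in a single function. The sketch below is one possible refactoring and is not the code that produced the plots above. The name fit_and_score, its parameters, and the synthetic demo frame are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

def fit_and_score(stock, future_days=500, seed=None):
    """Shift 'Close' by `future_days`, fit three regressors, return their R^2."""
    df = stock[['Close']].copy(deep=True)
    df['Prediction'] = df['Close'].shift(-future_days)

    # Drop the trailing rows whose shifted target is NaN
    X = df[['Close']].to_numpy()[:-future_days]
    y = df['Prediction'].to_numpy()[:-future_days]

    x_train, x_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed)

    models = {
        'k-NN': KNeighborsRegressor(),
        'Decision Tree': DecisionTreeRegressor(random_state=seed),
        'Linear': LinearRegression(),
    }
    return {name: m.fit(x_train, y_train).score(x_test, y_test)
            for name, m in models.items()}

# Demo on a synthetic frame (hypothetical data, not one of the FAANG tables)
demo = pd.DataFrame({'Close': np.linspace(100.0, 300.0, 400)})
scores = fit_and_score(demo, future_days=50, seed=0)
for name, value in scores.items():
    print(name + " R^2: " + str(value))
```

Under these assumptions, a call such as fit_and_score(netflix) would replace the model-fitting and scoring lines of each cell, leaving only the plotting code to repeat.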

Predicting Closing Values for Google:

The graph below depicts the actual closing prices for Google together with the closing prices predicted by each model for the last 500 days. Investors and other readers can use it to review the past and present of the Google stock and to inform their own decisions.

In [283]:
df = google[['Close']].copy(deep=True)

future_days = 500
df['Prediction'] = df[['Close']].shift(-future_days)

X = np.array(df.drop(columns=['Prediction']))[:-future_days]
y = np.array(df['Prediction'])[:-future_days]

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)

tree = DecisionTreeRegressor().fit(x_train, y_train)
lr = LinearRegression().fit(x_train, y_train)
knn = KNeighborsRegressor().fit(x_train, y_train)

x_future = df.drop(columns=['Prediction'])[:-future_days]
x_future = x_future.tail(future_days) 
x_future = np.array(x_future)

tree_prediction = tree.predict(x_future)
lr_prediction = lr.predict(x_future)
knn_prediction = knn.predict(x_future)

predictions = tree_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig = go.Figure()

fig.add_trace(go.Scatter(x=df.index.values, y=df['Close'], name='Actual Close',
                         line=dict(width=1.5)))
fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close D-Tree',
                         line=dict(width=1.5)))


predictions = lr_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close Lin Reg',
                         line=dict(width=1.5)))

predictions = knn_prediction
valid =  df[X.shape[0]:].copy(deep=True)
valid['Predictions'] = predictions
valid[['Close','Predictions']]

fig.add_trace(go.Scatter(x=valid.index.values, y=valid['Predictions'], name='Predicted Close k-NN',
                         marker_color='gold',
                         line=dict(width=1.5)))

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(label = 'All',
                  method = 'update',
                  args = [{'visible': [True, True, True, True]},
                          {'title': 'All',
                           'showlegend':True}]),
             dict(label = 'Decision Tree Prediction',
                  method = 'update',
                  args = [{'visible': [True, True, False, False]},
                          {'title': 'Decision Tree Regressor',
                           'showlegend':True}]),
             dict(label = 'Linear Regression Prediction',
                  method = 'update',
                  args = [{'visible': [True, False, True, False]},
                          {'title': 'Linear Regression',
                           'showlegend':True}]),
             dict(label = 'k-NN Regressor Prediction',
                  method = 'update',
                  args = [{'visible': [True, False, False, True]},
                          {'title': 'k-NN Regressor',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(title='Predicted Values for Google For the last 500 Days',
                   xaxis_title='Date',
                   yaxis_title='Close Price', template="plotly_dark")

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)

Accuracy Score for the Different Models - Google:

Here we calculate the accuracy scores (R-squared, as returned by the score method) for the three different models: k-NN Regressor, Decision Tree Regressor, and Linear Regressor. The exact values change from run to run because the train/test split is random. In the run shown below, the k-NN Regressor scores about 0.943, the Decision Tree Regressor about 0.901, and the Linear Regressor about 0.892. All three models score highly for Google, and the k-NN Regressor is the most accurate predicting model.

In [284]:
confidencetree = tree.score(x_test, y_test)
confidencelr = lr.score(x_test, y_test)
confidenceknn = knn.score(x_test, y_test)
print("k-NN Regressor Accuracy Score: " + str(confidenceknn))
print("Decision Tree Regressor Accuracy Score: " + str(confidencetree))
print("Linear Regressor Accuracy Score: " + str(confidencelr))
k-NN Regressor Accuracy Score: 0.9427052268519652
Decision Tree Regressor Accuracy Score: 0.9010518449774486
Linear Regressor Accuracy Score: 0.8923695525039155

Inference from Predictive Graphs:

The inference we can draw from these graphs is that the k-NN Regressor gives the most accurate predictions of the three models, judging by the accuracy scores. There are a couple of outliers in the graphs that could possibly change the predictions. However, these outliers are minor and can always change based on the trading model that companies decide to adopt.
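
The cells above use a random train/test split, which mixes earlier and later rows. One additional check that could be tried is to hold out the most recent rows instead. The block below is only a sketch of that idea on synthetic data. It is not something the analysis above did, and shuffle=False is the only change relative to the split used in the earlier cells.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic upward-drifting series (assumed data for illustration only)
rng = np.random.default_rng(1)
close = np.cumsum(rng.normal(0.5, 2.0, size=800)) + 100.0

future_days = 50
X = close[:-future_days].reshape(-1, 1)
y = close[future_days:]

# shuffle=False keeps the row order, so the test set is the most recent 25%
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, shuffle=False)

knn = KNeighborsRegressor().fit(x_train, y_train)
print("Chronological hold-out R^2: " + str(knn.score(x_test, y_test)))
```

Comparing this score with the one from a shuffled split gives a sense of how much the evaluation depends on the way the data is divided.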

Insights and Policy Decision -

Here we will outline what we saw and learned from the analysis of all the plots:

1) The general trend of all the stocks' price features was upward over time. This is in line with the reason FAANG stocks are popular: they are considered "Blue Chip" stocks, meant to be held for a long time because of their upward trend. The FAANG stocks are often grouped together since, based on their stock prices, they relate to the larger technology companies in Silicon Valley. Our Close Prices plot shows this increase over time, which illustrates why these stocks are considered "Blue Chip".

2) It is evident from the individual company price and volume plots that the stocks with higher volume have lower prices and the ones with lower volume have higher prices. This trend can also be explained in words: if a company has a larger volume of stocks available it can price them at a lower rate, and the opposite holds for companies with lower volumes. For example, Apple has the highest volume among all companies for all years, and its stock price is the lowest, at a median price of USD 37.78 per stock. On the other hand, the lowest volume was for Amazon, which had a median stock price of USD 765.7 per stock. The difference is striking, since it means people can afford Apple stock far more easily than Amazon stock.

3) Based on our correlation plots and the standardized close price plot, we noticed a strong correlation between the stock prices of Amazon and Google. The correlation plot gives a value of 0.6017701, which was the highest correlation among all the pairs. We can also see on our standardized close plot that their prices were almost symmetrical along the y-axis: when Google's price went up, Amazon's went down, and vice versa.

4) From our regression fit plots we first thought that the Decision Tree Regressor was the better fit. After analyzing the graphs, the R-squared scores, and the regression summaries of the OLS model, we realized that the k-Nearest Neighbors (k-NN) Regressor gives a better fit. The OLS summaries also showed us that there is a relationship between time and the close price of the stocks. The k-NN result was expected, since k-NN is commonly used as a regression algorithm to predict stock prices in the real world. However, k-NN is often combined with better strategies, and the market is narrowed before the stock prices are predicted. We also concluded in our Hypothesis Testing that a relationship between time and close price exists for these stocks. We then analyzed this relationship using our regressions and found that our regression analysis was valid.

5) Finally, we used what we learned from the regression fits to predict the prices of the stocks during the last 500 days of the time period we chose, which was from Jan 2013 to Aug 2020. We chose this time period because Facebook did not have its IPO (initial public offering) until 2012, so to keep our data and analysis clean we used the period between 2013 and 2020. We saw in our prediction plots that we were able, at least to some extent, to predict the future prices based on our models. The Linear Regression predictions were poor and would not work in the real world; however, they did give us a general increasing trend, which was expected. Based on a simple visual analysis, we assumed that the Decision Tree Regressor gave us a better prediction, which would not be in line with real-world stock prediction. However, when analyzing the accuracy scores of our predictions, we saw that the k-NN Regressor gave us the best predictions. This is in line with real-world usage, as we said in the previous point: k-NN is often used in stock price analyses to predict the movement of stocks. We should add, however, that k-NN on its own cannot be used as a "perfect" stock price predictor and must be coupled with better strategies and narrower markets to make accurate predictions.
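
The correlation value mentioned in point 3 is a Pearson correlation between two closing-price series. A minimal sketch of how such a value can be computed with pandas follows. The two short series are made-up numbers, and the column names AMZN and GOOG are labels chosen for the example rather than the frames used in this project.

```python
import pandas as pd

# Made-up closing prices (hypothetical values, not from the dataset)
prices = pd.DataFrame({
    'AMZN': [100.0, 102.0, 101.0, 105.0, 107.0],
    'GOOG': [50.0, 51.5, 50.0, 52.0, 54.0],
})

# Standardize each column: subtract the mean, divide by the standard deviation
standardized = (prices - prices.mean()) / prices.std()

# Pearson correlation is unchanged by standardization
corr_raw = prices['AMZN'].corr(prices['GOOG'])
corr_std = standardized['AMZN'].corr(standardized['GOOG'])
print(corr_raw, corr_std)
```

Standardizing puts series with very different price levels on the same scale for plotting, while leaving the correlation between them the same.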

Future Scope: We will examine the relationship between Google and Amazon to understand the reason behind their strong correlation. To do this, we will need to dig deeper into how these companies work in real life. The research can be extended with more efficient algorithms to identify trends and patterns, which can be used for better predictive modelling. We can also use better strategies to predict the future of these stock prices, which may make the predictions more accurate, and we can use more factors to analyze hidden trends and narrow the market to improve our predictability.

Just For Fun -

Just to see some better "stock" plots, we decided to make candlestick plots for all the companies' stocks. These are similar to the plots one would see on a trading terminal or when googling "stocks":

In [285]:
fig = go.Figure()

fig.add_trace(go.Candlestick(x=facebook['Date'],
                open=facebook['Open'],
                high=facebook['High'],
                low=facebook['Low'],
                close=facebook['Close'], name='FB'))

fig.add_trace(go.Candlestick(x=apple['Date'],
                open=apple['Open'],
                high=apple['High'],
                low=apple['Low'],
                close=apple['Close'], name='AAPL', visible='legendonly'))

fig.add_trace(go.Candlestick(x=amazon['Date'],
                open=amazon['Open'],
                high=amazon['High'],
                low=amazon['Low'],
                close=amazon['Close'], name='AMZN', visible='legendonly'))

fig.add_trace(go.Candlestick(x=netflix['Date'],
                open=netflix['Open'],
                high=netflix['High'],
                low=netflix['Low'],
                close=netflix['Close'], name='NFLX', visible='legendonly'))

fig.add_trace(go.Candlestick(x=google['Date'],
                open=google['Open'],
                high=google['High'],
                low=google['Low'],
                close=google['Close'], name='GOOG', visible='legendonly'))

fig.update_layout(title='Candlestick Plots for All Companies from Jan 2013 to Aug 2020',
                   xaxis_title='Date',
                   yaxis_title='Ticks', template="plotly_dark")

fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
             dict(label = 'Facebook',
                  method = 'update',
                  args = [{'visible': [True, False, False, False, False]},
                          {'title': 'FB',
                           'showlegend':True}]),
             dict(label = 'Apple',
                  method = 'update',
                  args = [{'visible': [False, True, False, False, False]},
                          {'title': 'AAPL',
                           'showlegend':True}]),
             dict(label = 'Amazon',
                  method = 'update',
                  args = [{'visible': [False, False, True, False, False]},
                          {'title': 'AMZN',
                           'showlegend':True}]),
             dict(label = 'Netflix',
                  method = 'update',
                  args = [{'visible': [False, False, False, True, False]},
                          {'title': 'NFLX',
                           'showlegend':True}]),
             dict(label = 'Google',
                  method = 'update',
                  args = [{'visible': [False, False, False, False, True]},
                          {'title': 'GOOG',
                           'showlegend':True}]),
            ]), 
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

fig.update_layout(xaxis_rangeslider_visible=False)

fig.update_layout(
    autosize=False,
    width=1000,
    height=650,)

iplot(fig,show_link=False)
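
The dropdown definitions above repeat the same dictionary with a different one-hot visible list each time, which is easy to mistype. One possible way to generate them is sketched below. make_buttons is a hypothetical helper of ours, not part of Plotly, and its result is intended to be passed as buttons= inside updatemenus.

```python
def make_buttons(labels, titles=None):
    """Build one Plotly 'update' button per trace, each showing only that trace."""
    titles = titles if titles is not None else labels
    buttons = []
    for i, (label, title) in enumerate(zip(labels, titles)):
        visible = [j == i for j in range(len(labels))]  # one-hot visibility mask
        buttons.append(dict(label=label,
                            method='update',
                            args=[{'visible': visible},
                                  {'title': title, 'showlegend': True}]))
    return buttons

buttons = make_buttons(
    ['Facebook', 'Apple', 'Amazon', 'Netflix', 'Google'],
    ['FB', 'AAPL', 'AMZN', 'NFLX', 'GOOG'])

# Usage (assumes `fig` is the figure built in the cell above):
# fig.update_layout(updatemenus=[dict(buttons=buttons, direction="down")])
print(buttons[1]['args'][0]['visible'])
```

Because each label, title, and visibility mask is built from the same position in the lists, a button can no longer show one trace while displaying another trace's title.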