By Ary Bandana, DIGITS Staff Writer
Today we're going to dip our toes into data science for finance to get you interested in it. This is not a comprehensive beginner tutorial, which we will publish in a couple of weeks, but it should give you a picture of how advanced data science can be. Specifically, we will look at data science in Python through a simple project: building a Monte Carlo simulation and applying it to generate randomized future prices.
What is a Monte Carlo Simulation?
A Monte Carlo simulation is a useful tool for predicting future results by calculating a formula multiple times with different random inputs. Monte Carlo simulations are used to model the probability of different outcomes in a process that cannot easily be predicted due to the intervention of random variables. It is a technique used to understand the impact of risk and uncertainty in prediction and forecasting models.
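To make the idea of "calculating a formula multiple times with different random inputs" concrete, here is a minimal sketch. The starting price, the volatility and the number of trials are made-up values chosen only for illustration, and the seeded generator is just there so the run is repeatable.

```python
import numpy as np

# Hypothetical inputs chosen only for illustration
start_price = 100.0   # assumed starting value
daily_vol = 0.02      # assumed standard deviation of the daily return
num_trials = 10_000   # how many times we repeat the calculation

rng = np.random.default_rng(seed=42)  # seeded so the run is repeatable

# Evaluate the same formula once per trial, each time with a new random input
random_returns = rng.normal(0, daily_vol, size=num_trials)
outcomes = start_price * (1 + random_returns)

print("average outcome:", outcomes.mean())
print("5th to 95th percentile:", np.percentile(outcomes, [5, 95]))
```

No single trial tells us much, but the spread of all the outcomes together gives a picture of the uncertainty.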
What do we need?
You will need Anaconda, which is a package manager, an environment manager, a Python/R data science distribution, and a collection of over 1,500 open source packages. Anaconda is free and easy to install, and it offers free community support.
Getting the Data
We are going to use the following modules and libraries in this project:
- pandas: takes data (like a CSV or TSV file, or a SQL database) and creates a Python object with rows and columns called a DataFrame.
- pandas_datareader.data: extracts data from various Internet sources into a pandas DataFrame.
- numpy: the core library for scientific computing in Python. It provides a high-performance multidimensional array object and tools for working with these arrays.
- datetime: a date in Python is not a data type of its own, but we can import a module named datetime to work with dates as date objects.
- matplotlib: a Python 2D plotting library. We are going to use the matplotlib.pyplot module, which provides a plotting system similar to that of MATLAB, and the style module, which gives access to the pre-defined styles provided by matplotlib. For example, there’s a pre-defined style called “ggplot”, which emulates the aesthetics of ggplot.
Most of them are included in Anaconda. If one is not included in your Python distribution, simply install it with pip or conda (pip install modulename / conda install modulename). Once a module is installed, all you need to do to use it is import it.
import pandas_datareader.data as web
import pandas as pd
import datetime as dt
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import style
style.use('ggplot')
You probably want to know what as means in the code: it assigns a shorter name to the module or library, which you then use whenever you refer to it later in your code.
Now we are going to define the start and end dates for the time series data we are going to use
start = dt.datetime(2018, 1, 1)
end = dt.datetime(2019, 1, 1)
Now we are going to define our price
prices = web.DataReader('GOOGL', 'yahoo', start, end)['Close']
Because we are going to use data from Internet sources, we use the pandas_datareader.data module that we already named web. Its DataReader function needs the name of the stock (for this example we use Google's stock, GOOGL), the source of the data (here the Yahoo Finance API), and the start and end dates that we have already defined. Finally, ['Close'] is a key that tells it to keep only the Close column, because we work with the closing price.
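If you want to see what the ['Close'] key does without downloading anything, the sketch below builds a small stand-in table by hand. The dates and prices are made up, and the real table returned by DataReader may contain other columns as well.

```python
import datetime as dt
import pandas as pd

# A hand-made stand-in for the downloaded table (values are made up)
dates = pd.to_datetime(["2018-01-02", "2018-01-03", "2018-01-04"])
table = pd.DataFrame(
    {"Open": [10.0, 10.5, 10.2], "Close": [10.4, 10.3, 10.8]},
    index=dates,
)

closes = table["Close"]                 # keep only the closing prices
start = dt.datetime(2018, 1, 3)
print(closes[closes.index >= start])    # rows on or after the start date
```

Selecting a single column like this gives back a pandas Series indexed by date, which is what the rest of the project works with.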
Now we are going to calculate and define the daily returns
returns = prices.pct_change()
We do that by calling the pct_change function on the prices series from the previous code. By default, pct_change computes the percentage change from the immediately previous row. This is useful for comparing the percentage of change in a time series of elements.
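A tiny example shows what pct_change returns. The three prices below are made up so that the arithmetic is easy to check by hand.

```python
import pandas as pd

# Made-up closing prices, not real market data
toy_prices = pd.Series([100.0, 110.0, 99.0])

toy_returns = toy_prices.pct_change()
print(toy_returns)
# The first entry is NaN because there is no earlier row to compare with.
# The next two are (110 - 100) / 100 and (99 - 110) / 110.
```

So a move from 100 to 110 is a return of 0.10, and a move from 110 to 99 is a return of -0.10.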
Now we are going to define the last price for our calculations
last_price = prices.iloc[-1]
Now we are going to work on the simulations, so we need to define some variables
num_simulations = 1000
num_days = 252
simulation_df = pd.DataFrame()
What we defined is how many simulations we are going to run and how many days into the future we are going to go. We use 252 because a typical year has about 252 trading days. We also create a data frame, which we call simulation_df, to hold all the simulations.
for x in range(num_simulations):
    count = 0
    daily_vol = returns.std()
    price_series = []
    price = last_price * (1 + np.random.normal(0, daily_vol))
    price_series.append(price)
We are going to use a nested for loop here. The first thing we need to do is create a count. We keep a count because we need to break the loop when the total number of days is accounted for.
After that, we extract what the Monte Carlo simulation needs mathematically, which is the daily volatility. We get it by calling the .std() function on the returns from the previous code.
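As a quick check of what .std() computes, the sketch below compares it with the same number worked out by hand. The prices are made up. By default pandas skips the NaN in the first row and uses the sample formula, which divides by n - 1.

```python
import numpy as np
import pandas as pd

# Made-up closing prices, not real market data
toy_prices = pd.Series([100.0, 102.0, 101.0, 104.0, 103.0])
toy_returns = toy_prices.pct_change()

# pandas ignores the leading NaN and uses the sample formula (ddof=1)
daily_vol = toy_returns.std()

# The same number computed by hand with numpy for comparison
clean = toy_returns.dropna().to_numpy()
by_hand = np.sqrt(((clean - clean.mean()) ** 2).sum() / (len(clean) - 1))

print(daily_vol, by_hand)
```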
After that we need a list to collect all the prices for the year that we are simulating, so we create a variable named price_series and give it an empty list, [], as its value.
Now we need to get the list going with one initial start value, so we write the code
price = last_price * (1 + np.random.normal(0, daily_vol))
The first price is calculated by multiplying the last price from the historical data by 1 plus a random number drawn from a normal distribution with a mean of 0 and a standard deviation equal to the daily volatility. This works as a random shock, which is what generates our future stock prices randomly.
After that, we append the price to the empty list we created before
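To see the random shock on its own, the sketch below draws five shocks and applies each of them to the same starting price. The price and volatility are hypothetical values, and it uses a seeded numpy generator instead of np.random.normal only so that the output is repeatable.

```python
import numpy as np

# Hypothetical values chosen only for illustration
last_price = 1000.0
daily_vol = 0.015

rng = np.random.default_rng(seed=0)  # seeded so the run is repeatable

shocks = rng.normal(0, daily_vol, size=5)   # five random daily returns
next_prices = last_price * (1 + shocks)     # five possible prices for tomorrow

for shock, price in zip(shocks, next_prices):
    print(f"shock {shock:+.4f} -> price {price:.2f}")
```

A positive shock moves the price up and a negative one moves it down, by the same percentage as the shock.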
    for y in range(num_days):
        if count == 251:
            break
        price = price_series[count] * (1 + np.random.normal(0, daily_vol))
        price_series.append(price)
        count += 1
Because this is a nested for loop, we now move on to the inner loop. It breaks once we have reached the number of days. The list already holds the first simulated price, so we stop when count reaches 251, which leaves each simulation with 252 prices, one per trading day. Until then, each pass makes another calculation.
Essentially, we take the last price we appended to the list. From there we are not multiplying by the original price; we are multiplying by the last price calculated in the simulation. Doing this in a loop gives us all of our future stock prices.
simulation_df[x] = price_series
The last thing we need to do is store each finished price series as its own column in the data frame, so that every trial is kept.
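The nested loop is easy to follow, but the same compounding can also be done in one step with numpy. The sketch below is an alternative, not the code used in this article: simulate_paths is a hypothetical helper name, and the price and volatility passed to it are made-up values.

```python
import numpy as np
import pandas as pd

def simulate_paths(last_price, daily_vol, num_days=252, num_simulations=1000, seed=None):
    """Return a DataFrame with one row per day and one column per simulation."""
    rng = np.random.default_rng(seed)
    # One random daily return for every (day, simulation) pair
    shocks = rng.normal(0, daily_vol, size=(num_days, num_simulations))
    # Compounding the growth factors down each column gives the price path
    paths = last_price * np.cumprod(1 + shocks, axis=0)
    return pd.DataFrame(paths)

# Hypothetical inputs, not taken from real market data
sim = simulate_paths(last_price=1000.0, daily_vol=0.015, seed=1)
print(sim.shape)
```

Each column plays the role of one price_series: the first row is the last price times one shock, and every later row multiplies the row before it by a new shock.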
Now that our simulation is done, we are going to graph it
fig = plt.figure()
fig.suptitle('Monte Carlo Simulation: GOOGLE')
plt.plot(simulation_df)
plt.axhline(y = last_price, color = 'r', linestyle = '-')
plt.xlabel('Day')
plt.ylabel('Price')
plt.show()
Can we just use Excel?
This is a process you can execute in Excel but it is not simple to do without some VBA or potentially expensive third party plugins. Using numpy and pandas to build a model and generate multiple potential results and analyze them is relatively straightforward. The other added benefit is that analysts can run many scenarios by changing the inputs and can move on to much more sophisticated models in the future if the needs arise. Finally, the results can be shared with non-technical users and facilitate discussions around the uncertainty of the final results.
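As one example of how the results could be analyzed, the sketch below summarizes the last simulated day across all trials. To stay self-contained it builds a stand-in for simulation_df from made-up numbers (a starting price of 1000 and a daily volatility of 0.015), so the figures it prints say nothing about any real stock.

```python
import numpy as np
import pandas as pd

# Stand-in for simulation_df: 252 days by 1000 simulations of made-up prices
rng = np.random.default_rng(seed=7)
simulation_df = pd.DataFrame(
    1000.0 * np.cumprod(1 + rng.normal(0, 0.015, size=(252, 1000)), axis=0)
)

final_prices = simulation_df.iloc[-1]                 # last simulated day of every trial
summary = final_prices.quantile([0.05, 0.50, 0.95])   # low, middle and high outcomes
share_above_start = (final_prices > 1000.0).mean()    # fraction of trials that ended higher

print(summary)
print("share of trials ending above the start:", share_above_start)
```

A few numbers like these are often easier to discuss with non-technical readers than a chart with a thousand lines on it.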
# -*- coding: utf-8 -*-
"""
Created on Sun Jun 23 01:49:55 2019
@author: Ary
"""
import pandas_datareader.data as web
import pandas as pd
import datetime as dt
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import style

style.use('ggplot')

start = dt.datetime(2018, 1, 1)
end = dt.datetime(2019, 1, 1)

prices = web.DataReader('GOOGL', 'yahoo', start, end)['Close']
returns = prices.pct_change()
last_price = prices.iloc[-1]

#Number of Simulations
num_simulations = 1000
num_days = 252

simulation_df = pd.DataFrame()

for x in range(num_simulations):
    count = 0
    daily_vol = returns.std()

    price_series = []

    price = last_price * (1 + np.random.normal(0, daily_vol))
    price_series.append(price)

    for y in range(num_days):
        if count == 251:
            break
        price = price_series[count] * (1 + np.random.normal(0, daily_vol))
        price_series.append(price)
        count += 1

    simulation_df[x] = price_series

fig = plt.figure()
fig.suptitle('Monte Carlo Simulation: GOOGLE')
plt.plot(simulation_df)
plt.axhline(y = last_price, color = 'r', linestyle = '-')
plt.xlabel('Day')
plt.ylabel('Price')
plt.show()
Kenton, W. (2019, June 10). Monte Carlo Simulation Definition. Retrieved from https://www.investopedia.com/terms/m/montecarlosimulation.asp