# Beating the Odds: Machine Learning for Horse Racing

Inspired by the story of Bill Benter, a gambler who developed a computer model that made him close to a billion dollars betting on horse races in the Hong Kong Jockey Club (HKJC), I set out to see if I could use machine learning to identify inefficiencies in horse racing wagering.

# Histopathologic Cancer Detection with Transfer Learning

In this post we will be using a method known as transfer learning in order to detect metastatic cancer in patches of images from digital pathology scans.

    %matplotlib inline
    import pandas as pd
    import torch
    import matplotlib.pyplot as plt
    import cv2
    import numpy as np

    plt.rcParams["figure.figsize"] = (5, 3)  # (w, h)
    plt.rcParams["figure.dpi"] = 200

## Data

The data we will be using is located on Kaggle in the dataset Histopathologic Cancer Detection.

# Predicting Academic Collaboration with Logistic Regression

In my last post, we learned what Logistic Regression is, and how it can be used to classify flowers in the Iris Dataset. In this post we will see how Logistic Regression can be applied to social networks in order to predict future collaboration between researchers. As usual we’ll start by importing a few libraries:

    %matplotlib inline
    import pandas as pd
    import numpy as np
    import networkx as nx
    import matplotlib.

# Multi-Class Classification with Logistic Regression in Python

A few posts back I wrote about a common parameter optimization method known as Gradient Ascent. In this post we will see how a similar method can be used to create a model that can classify data. This time, instead of using gradient ascent to maximize a reward function, we will use gradient descent to minimize a cost function. Let’s start by importing all the libraries we need:

    %matplotlib inline
    import pandas as pd
    import numpy as np
    import matplotlib.
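To make the idea of minimizing a cost function with gradient descent concrete, here is a minimal sketch for the binary case, assuming a single feature and the usual cross-entropy cost. The function names, the toy data, the learning rate, and the step count are all my own illustrative choices rather than anything from the full post; a multi-class model could be built from several such classifiers in a one-vs-rest fashion.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, steps=5000):
    """Fit weight w and bias b by gradient descent on the mean cross-entropy cost."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean cross-entropy cost with respect to w and b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        # Step against the gradient because we are minimizing.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical toy data: small values belong to class 0, large values to class 1.
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(xs, ys)
predictions = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in xs]
print(predictions)
```

The only difference from gradient ascent is the sign of the update: the parameters move against the gradient instead of along it.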

# Trading with Reinforcement Learning in Python Part II: Application

In my last post we learned what gradient ascent is, and how we can use it to maximize a reward function. This time, instead of using mean squared error as our reward function, we will use the Sharpe ratio. We can use reinforcement learning to maximize the Sharpe ratio over a set of training data, and attempt to create a strategy with a high Sharpe ratio when tested on out-of-sample data.
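As a rough illustration of the quantity being maximized, the sketch below computes an annualized Sharpe ratio from a list of periodic returns. It assumes a zero risk-free rate, the population standard deviation, and 252 trading periods per year; the function name and the sample returns are hypothetical, and the full post may define the ratio slightly differently.

```python
import math
import statistics


def sharpe_ratio(returns, periods_per_year=252):
    """Annualized Sharpe ratio of periodic returns, assuming a zero risk-free rate."""
    mean = statistics.fmean(returns)
    std = statistics.pstdev(returns)  # population standard deviation (an assumption)
    if std == 0:
        raise ValueError("returns have zero volatility; Sharpe ratio is undefined")
    return math.sqrt(periods_per_year) * mean / std


# Two hypothetical return series with the same average return but different volatility.
steady = [0.01, 0.012, 0.009, 0.011]
choppy = [0.05, -0.03, 0.04, -0.018]

print(sharpe_ratio(steady) > sharpe_ratio(choppy))
```

Both series earn the same average return, so the steadier one scores higher. That is why the Sharpe ratio is a more useful reward than raw return when volatility matters.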

In the next few posts, I will be going over a strategy that uses Machine Learning to determine what trades to execute. Before we start going over the strategy, we will go over one of the algorithms it uses: Gradient Ascent.

## What is Gradient Ascent?

Gradient ascent is an algorithm used to maximize a given reward function. A common method to describe gradient ascent uses the following scenario: Imagine you are blindfolded and placed somewhere on a mountain.
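Before any trading is involved, the core update can be sketched on a toy problem. The example below assumes a one-dimensional reward function, `-(theta - 3)^2`, whose maximum we already know is at 3; the function names, learning rate, and step count are illustrative choices of mine.

```python
def gradient_ascent(grad, theta, learning_rate=0.1, steps=200):
    """Repeatedly step in the direction of the gradient to climb the reward surface."""
    for _ in range(steps):
        theta = theta + learning_rate * grad(theta)
    return theta


def reward_gradient(theta):
    """Gradient of the toy reward -(theta - 3)^2, which peaks at theta = 3."""
    return -2.0 * (theta - 3.0)


best = gradient_ascent(reward_gradient, theta=0.0)
print(best)
```

Each step moves `theta` a little way uphill, much like the blindfolded climber feeling for the steepest direction and taking a small step that way.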

# Momentum Strategy from "Stocks on the Move" in Python

In this post we will look at the momentum strategy from Andreas F. Clenow’s book Stocks on the Move: Beating the Market with Hedge Fund Momentum Strategy and backtest its performance using the survivorship bias-free dataset we created in my last post with Backtrader. Momentum strategies are almost the opposite of mean-reversion strategies. A typical momentum strategy will buy stocks that have been showing an upward trend in hopes that the trend will continue.
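The selection step of a momentum strategy can be sketched very simply: score each stock by how strongly it has been trending and keep the top few. The book's own ranking measure is not shown in this excerpt, so the sketch below substitutes plain trailing return as a stand-in; the function name, tickers, and prices are all hypothetical.

```python
def top_momentum(prices, lookback, top_n):
    """Rank tickers by trailing return over `lookback` periods and keep the best `top_n`."""
    scores = {}
    for ticker, series in prices.items():
        # Skip tickers without enough history to measure the lookback window.
        if len(series) > lookback:
            scores[ticker] = series[-1] / series[-1 - lookback] - 1.0
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n]


# Hypothetical closing prices, oldest first.
prices = {
    "AAA": [10.0, 11.0, 12.0, 13.0],
    "BBB": [20.0, 19.0, 18.0, 17.0],
    "CCC": [5.0, 5.1, 5.2, 5.5],
}

selected = top_momentum(prices, lookback=3, top_n=2)
print(selected)
```

A mean-reversion strategy would do roughly the opposite and favor the names at the bottom of this ranking.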

# Creating a Survivorship Bias-Free S&P 500 Dataset with Python

When developing a stock trading strategy, it is important that the backtest be as accurate as possible. In some of my previous strategies, I have noted that the backtest did not account for survivorship bias. Survivorship bias is a form of selection bias caused by only focusing on assets that have already passed some sort of selection process. A simple example would be a strategy that simply buys and holds an equal allocation of the current S&P 500 constituents.

# Improving Cross Sectional Mean Reversion Strategy in Python

In my last post we implemented a cross-sectional mean reversion strategy from Ernest Chan’s Algorithmic Trading: Winning Strategies and Their Rationale. In this post we will look at a few improvements we can make to the strategy so we can start live trading!

## Setup

We will be using the same S&P 500 dataset we created in the last post. Let’s load it in.

    from datetime import datetime
    import pandas as pd
    import backtrader as bt
    import numpy as np
    import matplotlib.

# Backtesting a Cross-Sectional Mean Reversion Strategy in Python

In this post we will look at a cross-sectional mean reversion strategy from Ernest Chan’s book Algorithmic Trading: Winning Strategies and Their Rationale and backtest its performance using Backtrader. Typically, a cross-sectional mean reversion strategy is fed a universe of stocks, where each stock has its own relative returns compared to the mean returns of the universe. A stock with a positive relative return is shorted while a stock with a negative relative return is bought, in hopes that a stock that under- or outperformed the universe will soon revert to the mean of the universe.
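The allocation rule described above can be sketched as a small weighting function. Each stock's weight is the negative of its relative return, scaled so that the absolute weights sum to one. That scaling is one common choice and an assumption on my part, as are the function name and the sample tickers and returns.

```python
def mean_reversion_weights(returns):
    """Weight each stock by the negative of its return relative to the universe mean."""
    mean = sum(returns.values()) / len(returns)
    relative = {ticker: r - mean for ticker, r in returns.items()}
    gross = sum(abs(x) for x in relative.values())
    if gross == 0:
        # Every stock matched the universe mean, so there is nothing to trade.
        return {ticker: 0.0 for ticker in returns}
    # Outperformers get negative (short) weights, underperformers positive (long) ones.
    return {ticker: -x / gross for ticker, x in relative.items()}


# Hypothetical one-day returns for a four-stock universe.
daily_returns = {"AAA": 0.03, "BBB": -0.01, "CCC": 0.00, "DDD": -0.02}

weights = mean_reversion_weights(daily_returns)
for ticker, weight in weights.items():
    print(ticker, round(weight, 3))
```

Because the relative returns sum to zero, the long and short weights cancel out and the portfolio is dollar-neutral by construction.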

# Backtesting Portfolios of Leveraged ETFs in Python with Backtrader

In my last post we discussed simulation of the 3x leveraged S&P 500 ETF, UPRO, and demonstrated why a 100% long UPRO portfolio may not be the best idea. In this post we will analyze the simulated historical performance of another 3x leveraged ETF, TMF, and explore a leveraged variation of Jack Bogle’s 60/40 equity/bond allocation. First let's import the libraries we need.

    import pandas as pd
    import pandas_datareader.

# Simulating Historical Performance of Leveraged ETFs in Python

In this post we will look at the long-term performance of leveraged ETFs, as well as simulate how they may have performed in time periods before their inception. Many people would recommend against holding a position in a leveraged ETF because of beta slippage. Let's take a look at the performance of SPY, an S&P 500 ETF, versus UPRO, a 3x leveraged S&P 500 ETF. First let's import a few libraries.
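To see where beta slippage comes from, it helps to compound a daily-rebalanced 3x position by hand. The sketch below assumes the fund delivers exactly three times each daily return and ignores fees and borrowing costs; the function name and both return series are made up for illustration and are not real SPY or UPRO data.

```python
def simulate_leveraged(returns, leverage=3.0):
    """Compound a daily-rebalanced leveraged position, ignoring fees and borrowing costs."""
    value = 1.0
    for r in returns:
        value *= 1.0 + leverage * r
    return value


# Hypothetical choppy market: alternating +5% and -5% days.
choppy = [0.05, -0.05] * 10
index_choppy = simulate_leveraged(choppy, leverage=1.0)
levered_choppy = simulate_leveraged(choppy, leverage=3.0)

# Hypothetical trending market: twenty +1% days in a row.
trending = [0.01] * 20
index_trending = simulate_leveraged(trending, leverage=1.0)
levered_trending = simulate_leveraged(trending, leverage=3.0)

print(round(index_choppy, 4), round(levered_choppy, 4))
print(round(index_trending, 4), round(levered_trending, 4))
```

In the choppy series the leveraged position loses far more than three times the index's loss, while in the steadily rising series it gains more than three times the index's gain. Daily compounding cuts both ways, which is why the long-term picture depends on the path the market takes.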