
Analyzing ‘South Park’ for Stock Market Strategies – The Data

Is a South Park Stock Market trading strategy viable?

In the following, we will use Python to build a list of all the public companies mentioned in South Park episodes, scraping Wikipedia for this purpose.

Following this, we’ll try to use this “alt-data” to generate a South Park Stock Market trading strategy.

Step 1: Get a list of all the individual episode wikis

The full list of episodes is available on Wikipedia, split across several tables: https://en.wikipedia.org/wiki/List_of_South_Park_episodes.

While manually copying every link is an option, it’s an excruciating and time-consuming one. Luckily, there is a better way!

Using the requests library alongside BeautifulSoup, we can fetch and parse the page:

import requests
from bs4 import BeautifulSoup

url = 'https://en.wikipedia.org/wiki/List_of_South_Park_episodes'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')

However, our “soup” contains a lot more than we need, so let’s try and find our specific info.

Looking at the page source, we notice that our desired information sits in elements with the following classes:

table_class = "wikitable plainrowheaders wikiepisodetable"
span_class = "bday dtstart published updated"

Using the classes above, and a bit more tweaking, we arrive at the final code: it loops through the tables on the page with the matching class, then through their rows, gathering the information in a dictionary.

dict_sp = {}
i = 1
for table in soup.find_all("table", class_=table_class):
    # Each episode row is marked with the "vevent" class
    for tr in table.find_all('tr', class_="vevent"):
        # Episode header cells carry an id like "ep1"; skip rows without one
        if 'id' in tr.find('th').attrs:
            dict_sp[i] = {'date': tr.find('span', class_=span_class).contents[0],
                          'episode_id': tr.find('th').attrs['id'],
                          'episode_wiki_link': tr.find('a', href=True, title=True).attrs['href'],
                          'episode_title': tr.find('a', href=True, title=True).attrs['title']}
            i = i + 1

An entry in our resulting dict looks like this:

{'date': '1997-08-13',  
'episode_id': 'ep1',  
'episode_wiki_link': '/wiki/Cartman_Gets_an_Anal_Probe',  
'episode_title': 'Cartman Gets an Anal Probe'}

Step 2: Crawl through each episode wiki and find all the wiki links in the pages

We first define a helper function:

import numpy as np

def find_wiki_hrefs_in_soup(soup):
    # Collect every href that points to another wiki page ('/wiki/...')
    wiki_hrefs = []
    for elem in soup.find_all("a", href=True, title=True):
        elem_href = elem.attrs['href']
        if elem_href.split("/")[1] == 'wiki':
            wiki_hrefs.append(elem_href)
    return np.unique(wiki_hrefs)

Using it, we go through each item of our dictionary, apply the function, and add the new info. We also sleep briefly between calls, so that we don’t overwhelm our environment and don’t get blocked by Wikipedia.

import time

for key in dict_sp.keys():
    url_ep = 'https://en.wikipedia.org' + dict_sp[key]['episode_wiki_link']
    page_ep = requests.get(url_ep)
    soup_ep = BeautifulSoup(page_ep.content, 'html.parser')
    dict_sp[key]['links'] = find_wiki_hrefs_in_soup(soup_ep)
    time.sleep(0.1)  # be polite to Wikipedia

Now our dictionary looks like this (when converted to a pandas DataFrame):
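For reference, the conversion to a DataFrame is a one-liner (a minimal sketch, consistent with the from_dict call used later in this post):

import pandas as pd

# One row per episode; columns come from the inner dict keys
df_sp = pd.DataFrame.from_dict(dict_sp, orient='index')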

Step 3: See if any of the links match public companies

While the most straightforward approach would be to crawl through all the links in all the episodes, this would not be ideal, as many links are repeated (e.g. almost all pages include a link to https://en.wikipedia.org/wiki/South_Park). Thus, for the sake of efficiency, we first generate a list of all unique URLs:

all_urls = []
for key in dict_sp.keys():
    current_urls = dict_sp[key]['links']
    # np.unique keeps the running list deduplicated
    all_urls = np.unique([*all_urls, *current_urls])

For this list, we can start crawling. We notice that for every public company, Wikipedia shows an infobox (“vcard”) containing both a link to https://en.wikipedia.org/wiki/Public_company and the ticker. We’ll use this to check whether a link refers to a public company, and to get the corresponding ticker. From the page source, we find the class of the vcard table and define it as below, along with the Public Company URL we need to match.

vcard_class = "infobox vcard"
key_match = '/wiki/Public_company'

In a new dictionary, we collect the vcard links for each of our 10k+ URLs (if the page has a vcard at all!).

dict_url = {}
err = []
for wiki_url in all_urls:
    url = 'https://en.wikipedia.org' + wiki_url
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    try:
        # Pages without a vcard make find() return None, which raises AttributeError
        dict_url[url] = find_wiki_hrefs_in_soup(soup.find("table", class_=vcard_class))
    except AttributeError:
        err.append("No vcard for " + wiki_url)
    time.sleep(0.1)
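With dict_url filled, flagging the public companies is a matter of checking whether key_match appears among a page’s vcard links. A minimal sketch of that filter:

# Keep only the pages whose vcard links include the Public Company page
public_company_urls = [url for url, links in dict_url.items()
                       if key_match in links]
print(len(public_company_urls), 'candidate public companies')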

We end up with a list of 49 URLs, but some manual tinkering is required for the tickers, due to the very different formats used in the vcards, differences in ticker names across regions, over-the-counter securities, and duplicate pages such as these:

'https://en.wikipedia.org/wiki/ViacomCBS'
'https://en.wikipedia.org/wiki/Viacom_(2005%E2%80%93present)'
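One way to handle such cases is a small hand-curated override map from wiki link to ticker, applied on top of the automatic matches (the entries below are illustrative, not the full mapping):

# Manual overrides for duplicates, regional listings and OTC securities (illustrative)
manual_tickers = {
    '/wiki/ViacomCBS': 'VIAC',
    '/wiki/Viacom_(2005%E2%80%93present)': 'VIAC',  # duplicate page, same company
}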

Final Results

Any South Park fan could name a few situations in which public companies were mentioned in South Park episodes. For me, these would be the three most notable examples:

Handicar – Uber, Lyft, and Tesla

Unfulfilled – Amazon

Doubling Down – Beyond Meat
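Before checking them, we need to map each episode’s links to tickers. The join isn’t shown explicitly above; here is a minimal sketch, assuming a url_to_ticker dict built from the 49 matches (with the manual fixes applied):

dict_sp_stocks = {}
for key, ep in dict_sp.items():
    # Gather the tickers of every public-company link found in this episode
    tickers = sorted({url_to_ticker[link] for link in ep['links'] if link in url_to_ticker})
    if tickers:
        dict_sp_stocks[key] = {'episode_title': ep['episode_title'],
                               'episode_date': ep['date'],
                               'tickers': tickers}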

Doing a quick check, we see that we’ve correctly mapped all three examples above!

result = pd.DataFrame.from_dict(dict_sp_stocks, orient='index')
result[result['episode_title'].isin(['Handicar', 'Doubling Down', 'Unfulfilled'])]
episode_title | episode_date | tickers
Handicar | 2014-10-15 | ['LYFT' 'TSLA' 'UBER']
Doubling Down | 2017-11-08 | ['BYND']
Unfulfilled | 2018-12-05 | ['AMZN']

Full results below:

episode_title | episode_date | tickers
The Wacky Molestation Adventure | 2000-12-13 | ['DENN']
Osama bin Laden Has Farty Pants | 2001-11-07 | ['DIS']
Red Man’s Greed | 2003-04-30 | ['CPB']
Something Wall-Mart This Way Comes | 2004-11-03 | ['WMT']
Make Love, Not Warcraft | 2006-10-04 | ['BBY']
Britney’s New Look | 2008-03-19 | ['CHH']
Over Logging | 2008-04-16 | ['SBUX']
Pandemic 2: The Startling | 2008-10-29 | ['BBY']
The Ring | 2009-03-11 | ['DIS']
Margaritaville | 2009-03-25 | ['AXP']
Dead Celebrities | 2009-10-07 | ['CMG' 'MCD' 'TWTR']
W.T.F. | 2009-10-21 | ['WWE']
The F Word | 2009-11-04 | ['HOG']
The Tale of Scrotie McBoogerballs | 2010-03-24 | ['TWTR']
It’s a Jersey Thing | 2010-10-13 | ['TWTR']
Mysterion Rises | 2010-11-03 | ['NKE']
Coon vs. Coon and Friends | 2010-11-10 | ['NKE']
Crème Fraîche | 2010-11-17 | ['PGR']
HumancentiPad | 2011-04-27 | ['AAPL' 'BBY']
T.M.I. | 2011-05-18 | ['FDX']
1% | 2011-11-02 | ['RRGB']
Raising the Bar | 2012-10-03 | ['WMT']
Insecurity | 2012-10-10 | ['AMZN' 'UPS']
A Scause for Applause | 2012-10-31 | ['NKE']
Obama Wins! | 2012-11-07 | ['DIS']
Black Friday | 2013-11-13 | ['SNE']
A Song of Ass and Fire | 2013-11-20 | ['MSFT' 'SNE']
Titties and Dragons | 2013-12-04 | ['MSFT' 'RRGB']
Handicar | 2014-10-15 | ['LYFT' 'TSLA' 'UBER']
Freemium Isn’t Free | 2014-11-05 | ['TWTR']
Grounded Vindaloop | 2014-11-12 | ['BBY']
Rehash | 2014-12-03 | ['TWTR']
HappyHolograms | 2014-12-10 | ['TWTR']
You’re Not Yelping | 2015-10-14 | ['YELP']
Skank Hunt | 2016-09-21 | ['TWTR']
The Damned | 2016-09-28 | ['TWTR']
White People Renovating Houses | 2017-09-13 | ['TWTR']
Franchise Prequel | 2017-10-11 | ['NFLX']
Sons a Witches | 2017-10-25 | ['ROST']
Doubling Down | 2017-11-08 | ['BYND']
Super Hard PCness | 2017-11-29 | ['NFLX']
Unfulfilled | 2018-12-05 | ['AMZN']
Bike Parade | 2018-12-12 | ['AMZN']
Band in China | 2019-10-02 | ['AAPL' 'DIS']
The Pandemic Special | 2020-09-30 | ['BBW' 'VIAC']

End notes

We’ve managed to meet our current goal, getting all the data we need for developing a strategy, in a fairly quick and simple way, thanks to requests and BeautifulSoup.

If you’re also interested in how this data can be used to develop the “South Park Stock Market” trading strategy, and whether or not it outperforms the S&P 500 (Standard & Poor’s) index, jump over to this post.
