How funds can generate differentiated alpha from web data

Start leveraging web data to produce granular insights

5 mins
August 29, 2024
Rohit Shenoy
CEO

Having worked with many hedge funds, we are still surprised by how infrequently fund managers use web-scraped data to generate alpha. The web is already a goldmine of information, and companies are only increasing their online presence. We are able to answer key questions and gain highly predictive insights simply by scouring the web for data.

What funds often don’t realize is that there is far more information on the web than what a user can see on the front-end of a website. Oftentimes, there is highly predictive data in the network requests and code of a website. We’ve been able to generate incredible returns and save funds from major drawdowns through this “back-end” data.
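As a rough illustration of what "back-end" data can look like, the sketch below walks a JSON payload of the kind a product page might load through a network request and lists every field whose name hints at inventory. The payload, the field names, and the `find_keys` helper are all hypothetical; real responses differ from site to site.

```python
import json


def find_keys(payload, keywords, path=""):
    """Yield (path, value) for every key whose name contains one of the keywords."""
    if isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}" if path else key
            if any(word in key.lower() for word in keywords):
                yield child, value
            # Keep descending: nested objects may hold further matches.
            yield from find_keys(value, keywords, child)
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            yield from find_keys(item, keywords, f"{path}[{index}]")


# Hypothetical response body; the front-end might show only name and price.
raw_response = """
{"product": {"name": "Example Jacket", "price": 128.0,
             "variants": [{"size": "M", "inventoryCount": 14},
                          {"size": "L", "inventoryCount": 0}]}}
"""

hidden_fields = list(find_keys(json.loads(raw_response), ["inventory", "stock"]))
for field_path, value in hidden_fields:
    print(field_path, value)
```

A keyword search like this is only a first pass for deciding which fields are worth tracking over time.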

Let’s dive into a few examples across industries that can drive alpha for your fund:

Retail

Perhaps the best industry in which to find web data is retail. The most common way we see fund managers leverage web-scraped data in retail is tracking pricing through large providers. There are two main problems with this. The first is that you’re likely not generating alpha, as other funds see the same data at the same time. The second is that large providers standardize their extraction, at the expense of missing the metrics that provide the most predictive signals.

To solve the first problem, you need to extract data at a higher frequency than the large providers. They generally release retail data monthly to reduce their costs, so beating their frequency is relatively simple. Even with weekly scrapes, your analysts and PMs will know when a company is pushing promotions and eating into its gross margins weeks before other funds. At Durable Alpha, we helped one of our clients identify a reduction in promotion depth and intensity at $LULU weeks before other funds caught on, helping them generate incredible returns.
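To make "promotion depth and intensity" concrete, here is a minimal sketch that compares two weekly scrapes. It assumes each scrape yields a list price and a current price per SKU, defines breadth as the share of SKUs on sale, and defines depth as the average discount among those SKUs. These definitions, the `PriceObservation` type, and the sample prices are illustrative assumptions, not a standard.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class PriceObservation:
    sku: str
    list_price: float
    sale_price: float


def promotion_metrics(observations):
    """Return (breadth, depth) for one scrape.

    breadth: share of SKUs priced below list.
    depth:   average discount among the discounted SKUs.
    """
    if not observations:
        return 0.0, 0.0
    discounts = [
        1 - o.sale_price / o.list_price
        for o in observations
        if o.list_price > 0 and o.sale_price < o.list_price
    ]
    breadth = len(discounts) / len(observations)
    depth = mean(discounts) if discounts else 0.0
    return breadth, depth


# Hypothetical weekly scrapes of the same four SKUs.
last_week = [
    PriceObservation("A", 100.0, 70.0),
    PriceObservation("B", 80.0, 60.0),
    PriceObservation("C", 50.0, 50.0),
    PriceObservation("D", 120.0, 120.0),
]
this_week = [
    PriceObservation("A", 100.0, 90.0),
    PriceObservation("B", 80.0, 80.0),
    PriceObservation("C", 50.0, 50.0),
    PriceObservation("D", 120.0, 120.0),
]

for label, scrape in (("last week", last_week), ("this week", this_week)):
    breadth, depth = promotion_metrics(scrape)
    print(f"{label}: breadth={breadth:.0%}, depth={depth:.1%}")
```

In this made-up data, both breadth and depth fall week over week, which is the kind of pullback in promotions described above.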

The second problem of missing metrics is a result of providers only scraping “front-end” data. There are many retail websites where we’ve found exact inventory counts in the retailer’s network requests. This provides insight into both how much volume they’re selling and whether they’re having inventory issues. For instance, we helped a fund avoid a catastrophe with a stock because we were able to identify inventory issues from back-end data before earnings. Other metrics we’ve found to be helpful include 1) number of reviews across products as a directional view for sales, 2) star rating of products as a measure of NPS, and 3) special tags for products on the back-end. For example, Home Depot labels certain products as super SKUs on the back-end, which we can use to track how well industrial companies’ products are selling through the HD channel.
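One simple way to turn scraped inventory counts into a volume estimate is sketched below. It assumes one count per SKU per scrape, treats every drop between consecutive scrapes as units sold, and treats every increase as a restock. That is a simplification: returns, transfers, and restocks that land between scrapes will distort the estimate. The function names and figures are hypothetical.

```python
def estimate_units_sold(counts):
    """Sum the drops between consecutive inventory counts; increases count as restocks."""
    return sum(max(prev - curr, 0) for prev, curr in zip(counts, counts[1:]))


def out_of_stock_share(latest_counts):
    """Share of SKUs whose latest scraped count is zero."""
    if not latest_counts:
        return 0.0
    return sum(1 for count in latest_counts.values() if count == 0) / len(latest_counts)


# Hypothetical daily counts for one SKU; the jump to 180 is assumed to be a restock.
daily_counts = [120, 112, 105, 180, 171]
units_sold = estimate_units_sold(daily_counts)

# Hypothetical latest counts across four SKUs.
stockout_share = out_of_stock_share({"A": 0, "B": 5, "C": 0, "D": 12})

print(units_sold, stockout_share)
```

The more frequently counts are scraped, the fewer sales are hidden by a restock that happens between two observations.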

Enterprise Software

This is where we’ve seen the largest data gap for funds. Traditional datasets like credit card, foot traffic, app downloads etc. can’t tell you how Snowflake is performing. Yet many of these companies have a strong online presence that can reveal valuable insights. Below are three ways we track enterprise software companies.

  1. Forum questions - All enterprise software companies will either have a public community forum or Stack Overflow page where developers ask questions. We track the history of questions, responses, upvotes, views etc. to gauge inflection points of developer adoption and alert funds before the alpha erodes. This is especially helpful for VC funds to find companies before their inboxes get flooded. 
  2. Package downloads - To make use of enterprise software tools, developers need to download SDKs/software packages that interface with the enterprise software. For example, Python has the snowflake-connector-python package to connect to Snowflake from Python. We track the number of downloads for these packages to gauge developer adoption and growth.
  3. GitHub activity - Many enterprise software companies like MongoDB will have their repositories publicly available on GitHub. Developers can suggest changes (aka create pull requests), create issues, ask questions, fork the repository, give stars to show support etc. All of this is publicly available data that reveals customer growth and NPS and can be a major source of alpha if you have the right alerts set up.
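Any of the three series above (forum questions, package downloads, repository activity) can feed a simple inflection alert. The sketch below is one possible rule, not a definitive method: it compares the latest four weeks with the four weeks before them and flags the first week in which that growth crosses a threshold. The window, the 25% threshold, and the download figures are illustrative assumptions.

```python
def trailing_growth(series, window=4):
    """Growth of the latest `window` periods over the `window` periods before them."""
    if len(series) < 2 * window:
        raise ValueError("need at least 2 * window observations")
    recent = sum(series[-window:])
    prior = sum(series[-2 * window:-window])
    if prior == 0:
        return None  # growth is undefined without a baseline
    return recent / prior - 1


def find_inflections(series, window=4, threshold=0.25):
    """Return indices of the periods where trailing growth first crosses the threshold."""
    flagged = []
    previous = None
    for end in range(2 * window, len(series) + 1):
        growth = trailing_growth(series[:end], window)
        crossed = growth is not None and growth >= threshold
        if crossed and (previous is None or previous < threshold):
            flagged.append(end - 1)
        previous = growth
    return flagged


# Hypothetical weekly download counts for a connector package.
weekly_downloads = [1000, 1010, 990, 1000, 1005, 1000, 995, 1000,
                    1200, 1400, 1600, 1800]

alerts = find_inflections(weekly_downloads)
print(alerts)
```

Flagging only the first crossing keeps a sustained run of growth from firing an alert every week.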

Unique Inventory

Any website that contains information on unique inventory can provide incredible insights. Examples of this include car inventory (Carvana), home inventory (Pulte Homes), and room inventory (WeWork). Each of these has uniquely identifiable units that can be tracked when they are added and tagged as “sold” when they disappear from the website. By tracking the website frequently, you can predict revenue extremely accurately.

  1. Car Inventory - Each car can be uniquely identified by its VIN. Furthermore, you can track average selling price, the average number of days a car is held in inventory, which makes and models are selling faster, and pricing power across marketplaces. For example, we were able to prove a key thesis for a fund around new vs. used vehicle sales on Camping World’s platform using this methodology.
  2. Home Inventory - Many homebuilders will post maps of available and sold homes across all their communities. Tracking these sitemap images and comparing them from one day to the next provides a real-time view into how many houses they’re selling per day, for what price, and in what locations.
  3. Room Inventory - You can track which rooms become available and which disappear to estimate room sales. For example, WeWork lists all available office rooms across all its locations on its website. Each room has a unique ID on the back-end along with the price and the discounts that agents can offer. This is a must-have dataset if you’re tracking WeWork.
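The add-and-disappear tracking described in this section can be sketched as a diff between consecutive snapshots. The example assumes each snapshot maps a unit ID (a VIN, lot number, or room ID) to its listed price, and that a unit missing from the next snapshot was sold at its last listed price. A listing can also vanish because it was delisted or moved, so this is an approximation. The IDs and prices are made-up placeholders.

```python
from datetime import date


def presumed_sales(snapshots):
    """Return one record per unit that disappeared between consecutive snapshots.

    `snapshots` is a date-ordered list of (snapshot_date, {unit_id: listed_price}).
    """
    first_seen = {}
    sales = []
    previous = {}
    for snapshot_date, listings in snapshots:
        for unit_id in listings:
            first_seen.setdefault(unit_id, snapshot_date)
        # Units present last time but absent now are presumed sold.
        for unit_id in sorted(previous.keys() - listings.keys()):
            sales.append({
                "unit_id": unit_id,
                "last_price": previous[unit_id],
                "days_listed": (snapshot_date - first_seen[unit_id]).days,
            })
            del first_seen[unit_id]  # a relisted unit starts a fresh clock
        previous = listings
    return sales


# Hypothetical snapshots of a small car listing page.
snapshots = [
    (date(2024, 8, 1), {"VIN001": 21000, "VIN002": 18500}),
    (date(2024, 8, 2), {"VIN001": 20500, "VIN002": 18500, "VIN003": 30000}),
    (date(2024, 8, 5), {"VIN002": 18500, "VIN003": 30000}),
    (date(2024, 8, 6), {"VIN003": 29500}),
]

sales = presumed_sales(snapshots)
estimated_revenue = sum(sale["last_price"] for sale in sales)
print(sales, estimated_revenue)
```

The same records give days in inventory and selling price per unit, which can then be averaged by make, model, community, or location.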

Conclusion

These are just a few of the dozens of approaches we’ve developed to generate alpha using web data. We’re always discovering more alongside our clients. If any of these sound interesting or if you’d simply like to bounce around ideas, we’re always open to having a conversation. Feel free to reach out to us using the link below.

© Copyright 2024 String AI
Created In New York City, NY