Real World Stats for Indie Hacking

19 Sep 2023

Tech Twitter is filled with Indie Hackers spewing their often conflicting "wisdom". Unfortunately, there is no way to verify what is true and what is marketing. So I decided to crunch some numbers and make a few hard assertions.

Gathering The Data

IndieHackers is a platform where bootstrappers post about their products and discuss the specifics of their endeavors. It has a products page where you can see data for a lot of registered products. It looks something like this:

[image: indie-hackers-pic]

I decided to scrape all the data from this page. After writing a lot of scroll-down, scraping, and proxy scripts, plus a few other hacks, I got this beautiful dataset. It contains info on 2868 bootstrapped startups.

[image: dataset-pic]

Here's the GitHub repository containing the dataset and other artifacts. You can also visit this sample link to see what kind of data I've scraped.

Revenue

| Startups earning (per month) | Count |
|---|---|
| <0$ | 36 |
| 0$ | 1516 |
| 0$-100$ | 97 |
| 100$-1000$ | 98 |
| 1000$-10k$ | 173 |
| >10k$ | 915 |
| Total | 2868 |

Inference: From the above data, it seems like either you hit it big with >10k$ MRR or it doesn't work out at all: 915 startups report >10k$ MRR and 1516 report 0$, while only 97 sit in the 0$-100$ range. Going by this data alone, pursuing 100$/month of passive income looks like a fool's errand.



...But people lie a lot, right? Luckily, IH also provides Stripe-verified revenue. Let's see how many startups actually have verification enabled:

[image: pie-chart]

| Startups earning (per month) | Stripe Verified Revenue Startups | Self Reported Revenue Startups |
|---|---|---|
| <0$ | 3 | 33 |
| 0$ | 118 | 1398 |
| 0$-100$ | 28 | 69 |
| 100$-1000$ | 25 | 73 |
| 1000$-10k$ | 37 | 136 |
| >10k$ | 74 | 841 |
| Total | 285 | 2550 |

Just from eyeballing, it seems like Stripe-verified startups also either gain no traction or a lot of it.

I'm low on time and energy, so I'm not doing any “mean, median, deviation, skewness” checks to see whether both distributions are the same.
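For anyone who wants a rough comparison anyway, here is a minimal sketch that turns the two columns of the table above into percentage shares per revenue bucket. It is not a statistical test, just the eyeballing made explicit. The names `buckets` and `shares` are made up for this example.

```python
# Bucket counts copied from the table above: (stripe_verified, self_reported).
buckets = {
    "<0$": (3, 33),
    "0$": (118, 1398),
    "0$-100$": (28, 69),
    "100$-1000$": (25, 73),
    "1000$-10k$": (37, 136),
    ">10k$": (74, 841),
}


def shares(column):
    """Return each bucket's share of its column total, as a percentage."""
    total = sum(counts[column] for counts in buckets.values())
    return {name: 100 * counts[column] / total for name, counts in buckets.items()}


stripe_share = shares(0)  # column 0: Stripe-verified startups
self_share = shares(1)    # column 1: self-reported startups

for name in buckets:
    print(f"{name:>11}  stripe {stripe_share[name]:5.1f}%   self-reported {self_share[name]:5.1f}%")
```

On these counts, about 41% of Stripe-verified startups sit at 0$ against about 55% of self-reported ones, and about 26% against 33% are above 10k$. Both groups are heaviest at the two extremes, but the shares are not identical.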

Keyword Analysis

Q1. Which fields are most pursued by startups?

Here are the top 100 keywords from the taglines of all startups in the dataset. (You can look at the dataset example linked above to see what a tagline looks like.)

[('ai', 163), ('platform', 141), ('app', 128), ('tool', 114), ('online', 113), ('software', 91), ('website', 84), ('service', 74), ('build', 73), ('management', 73), ('web', 73), ('design', 72), ('wordpress', 68), ('data', 67), ('marketing', 66), ('create', 65), ('simple', 64), ('find', 64), ('business', 63), ('content', 63), ('saas', 59), ('get', 58), ('free', 58), ('social', 55), ('digital', 53), ('email', 53), ('video', 52), ('best', 50), ('product', 49), ('tools', 48), ('analytics', 45), ('job', 45), ('one', 44), ('apps', 42), ('way', 42), ('help', 40), ('code', 39), ('easy', 37), ('community', 37), ('videos', 37), ('feedback', 37), ('people', 36), ('based', 36), ('newsletter', 35), ('work', 35), ('time', 35), ('monitoring', 33), ('tech', 32), ('developers', 32), ('world', 31), ('builder', 31), ('api', 31), ('make', 30), ('media', 30), ('remote', 30), ('google', 30), ('websites', 29), ('subscription', 29), ('without', 29), ('startups', 29), ('teams', 28), ('businesses', 28), ('using', 28), ('mobile', 28), ('solution', 28), ('better', 28), ('discover', 28), ('powered', 28), ('new', 28), ('share', 28), ('youtube', 28), ('products', 27), ('development', 27), ('fast', 26), ('sales', 25), ('startup', 25), ('customer', 25), ('user', 25), ('learn', 25), ('podcast', 25), ('track', 25), ('minutes', 24), ('made', 24), ('daily', 24), ('marketplace', 23), ('automation', 23), ('cloud', 23), ('tracking', 23), ('seo', 23), ('generate', 23), ('use', 22), ('custom', 22), ('unlimited', 22), ('browser', 22), ('turn', 22), ('search', 22), ('board', 22), ('companies', 21), ('services', 21), ('real', 21)]

Q2. Which fields are most successful?

Here are the top 100 keywords from the taglines of startups with >10k$ MRR.

[('platform', 64), ('software', 61), ('online', 52), ('marketing', 39), ('service', 39), ('design', 33), ('digital', 31), ('app', 31), ('business', 30), ('build', 30), ('wordpress', 30), ('management', 28), ('tool', 28), ('website', 27), ('data', 24), ('saas', 23), ('content', 23), ('social', 20), ('product', 19), ('get', 18), ('free', 18), ('ai', 18), ('create', 17), ('email', 16), ('apps', 16), ('media', 16), ('video', 16), ('simple', 16), ('businesses', 16), ('way', 15), ('web', 15), ('one', 15), ('tech', 15), ('help', 15), ('best', 15), ('automation', 14), ('teams', 14), ('easy', 14), ('tools', 14), ('builder', 14), ('sales', 13), ('seo', 13), ('developers', 13), ('agency', 13), ('without', 13), ('subscription', 12), ('companies', 12), ('unlimited', 12), ('analytics', 12), ('products', 11), ('work', 11), ('startup', 11), ('tracking', 11), ('solution', 11), ('code', 11), ('time', 11), ('job', 11), ('marketplace', 10), ('make', 10), ('people', 10), ('services', 10), ('api', 10), ('real', 10), ('startups', 10), ('find', 10), ('generation', 9), ('solutions', 9), ('world', 9), ('small', 9), ('grow', 9), ('development', 9), ('minutes', 9), ('themes', 9), ('ad', 8), ('cloud', 8), ('support', 8), ('creators', 8), ('use', 8), ('community', 8), ('ecommerce', 8), ('based', 8), ('life', 8), ('experts', 8), ('using', 8), ('mobile', 8), ('turn', 8), ('scheduling', 8), ('automated', 8), ('forms', 8), ('google', 8), ('plugin', 8), ('powerful', 8), ('paid', 7), ('remote', 7), ('monitoring', 7), ('company', 7), ('agencies', 7), ('1', 7), ('team', 7), ('store', 7)]

Q3. Most Overhyped fields

I also did a diff between the two lists (count across all startups minus count among >10k$ MRR startups) to get the most overhyped fields. Here are the top 30:

[('ai', 145), ('app', 97), ('tool', 86), ('platform', 77), ('online', 61), ('web', 58), ('website', 57), ('find', 54), ('simple', 48), ('create', 48), ('management', 45), ('build', 43), ('data', 43), ('get', 40), ('free', 40), ('content', 40), ('design', 39), ('wordpress', 38), ('email', 37), ('video', 36), ('saas', 36), ('social', 35), ('service', 35), ('best', 35), ('tools', 34), ('job', 34), ('business', 33), ('analytics', 33), ('videos', 33), ('newsletter', 31)]
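The numbers in this list match a per-keyword subtraction of the two earlier lists (for example 163 - 18 = 145 for 'ai'). A minimal sketch of that diff on a handful of keywords follows. The `hit_rate` column is an extra normalisation added here for illustration and is not part of the original analysis.

```python
from collections import Counter

# A few counts copied from the two keyword lists above.
all_startups = Counter({"ai": 163, "platform": 141, "app": 128, "tool": 114, "software": 91})
over_10k = Counter({"platform": 64, "software": 61, "app": 31, "tool": 28, "ai": 18})

# Raw gap: mentions that are NOT in the >10k$ MRR group.
gap = {word: all_startups[word] - over_10k[word] for word in all_startups}

# Added for illustration: share of mentions that ARE in the >10k$ MRR group.
hit_rate = {word: over_10k[word] / all_startups[word] for word in all_startups}

for word, g in sorted(gap.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word:>9}  gap {g:3d}  hit rate {hit_rate[word]:.0%}")
```

A raw gap naturally favors words that are frequent overall, so looking at the ratio alongside it can help separate "common" from "overhyped".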

Inference: AI is hugely overhyped. Only 18 of the 163 AI startups listed (about 11%) are successful. Apps and tools seem like overhyped areas as well.

I got really interesting results calculating underhyped areas. I'll keep those results with me for now.

Tech vs Non Tech Founders

| Companies with Founders who | Code | Don't Code |
|---|---|---|
| Number of Companies | 2307 | 515 |
| Number of Companies With >10k$ MRR | 638 | 267 |

Inference: This is a perfect example of "sampling bias". At first glance it seems like founders who code are less successful, with a 638/2307 = 27.7% success rate compared to 267/515 = 51.8% for non-tech founders. But that gap most likely exists because tech founders don't need extensive planning to create a product, so they can ship and list far more projects, including ones that never had much of a chance. I recommend this lesson by Eddie Woo to learn more about this.

Solo vs Multiple Founders

| | Solo Founders | Multiple Founders |
|---|---|---|
| All Companies | 1836 | 990 |
| Number of Companies With >10k$ MRR | 417 | 492 |

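Inference: Multiple founders do objectively better in this data: 492/990 = 49.7% of multi-founder companies reach >10k$ MRR, against 417/1836 = 22.7% of solo-founder ones. Solopreneurship is really difficult.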

B2B vs B2C

| | B2C | B2B |
|---|---|---|
| All Companies | 242 | 937 |
| Number of Companies With >10k$ MRR | 75 | 461 |

Inference: B2B is pursued a lot more in bootstrapping, and it kind of makes sense. When you don't have a big budget, targeting a general audience is wild. That's the mistake I made with my first startup.
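The three comparisons above (founders who code or not, solo or multiple founders, B2C or B2B) all come down to the same arithmetic: companies with >10k$ MRR divided by all companies in the group. A small sketch using the counts from the tables, with `groups` and `success_rate` as made-up names:

```python
# (all companies, companies with >10k$ MRR), copied from the tables above.
groups = {
    "founders who code": (2307, 638),
    "founders who don't code": (515, 267),
    "solo founders": (1836, 417),
    "multiple founders": (990, 492),
    "B2C": (242, 75),
    "B2B": (937, 461),
}


def success_rate(total, over_10k):
    """Percentage of companies in a group that report >10k$ MRR."""
    return 100 * over_10k / total


rates = {name: success_rate(*counts) for name, counts in groups.items()}

for name, rate in rates.items():
    print(f"{name:<24} {rate:5.1f}%")
```

These are raw rates within the IH listing only, so the sampling caveat from the tech vs non-tech section applies to all of them.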

Results

Chances of you being successful (if listed on IH): 31.9%

The "if listed on IH" part is very important.

X (my guess is 30000 startups) start bootstrapping --> 2868 list on IH --> 915 make >10k$ MRR

That puts the actual chance of success at about 3% (915 / 30000 ≈ 3.05%).
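Since X is a guess, it helps to see how much the final figure moves with it. The sketch below redoes the funnel arithmetic for a few values of X. Only 30000 comes from the text above. The other two values are hypothetical, picked just to show the sensitivity. Like the estimate above, it assumes no startup outside IH reaches >10k$ MRR.

```python
LISTED_ON_IH = 2868   # startups in the scraped dataset
OVER_10K_MRR = 915    # of those, startups reporting >10k$ MRR


def rate_if_listed():
    """Share of IH-listed startups with >10k$ MRR."""
    return OVER_10K_MRR / LISTED_ON_IH


def overall_rate(started):
    """Share of all startups that began bootstrapping (guessed) with >10k$ MRR."""
    return OVER_10K_MRR / started


print(f"if listed on IH: {rate_if_listed():.1%}")

# 30000 is the guess from the post; 20000 and 50000 are hypothetical alternatives.
for started in (20000, 30000, 50000):
    print(f"X = {started}: {overall_rate(started):.2%}")
```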

Further Thoughts

There's a lot of survivorship bias. Twitter accounts like DHH and Naval glorify this journey, and while they are successful, there are 1000 others who failed.

I also remember Tyler Cowen's talk about being suspicious of stories. In particular, DHH ran 37signals as a consultancy for 5 years before Basecamp.

At that point he had a pile of money, a network of people, and essential skills that would help him move further up the ladder of wealth creation. DHH built himself a foundation that made it much easier to not have to work “as hard” after a certain point.





Thanks to Manan for proofreading this!!

That's it!! If you want any other insights from the dataset, feel free to contact me at prakhar897@gmail.com :)