The Fall of StackOverflow

A few days ago, news spread through social media that StackOverflow was struggling against ChatGPT and that, as a result, visits, page views, and interactions had dropped sharply. Although the trend itself is undeniable given the available data, the question is whether the connection between the release of ChatGPT and the rapid decline in StackOverflow's visitors is actually causal. Or are we mostly trying to confirm our own expectations?

Let's take a look at the data

Access to StackOverflow's interaction data is normally restricted to certain members, but the data was leaked recently. We found the data here: Since the linked original source is not accessible to us, we cannot verify the validity of the data; falsification is unlikely, but it cannot be ruled out.

The data has day-by-day resolution. Plotting it directly reveals little except that a weekly cycle dominates. It is therefore useful to apply a moving average whose window width is a multiple of 7. In the graphs below, you can see the raw data (page views + visits, posts, and votes) with the respective 21-day mean overlaid as a bolder line.* We have also marked some important events that explain individual features of the data and are relevant to the causality analysis.

(*The code to plot these and the following graphs follows at the end of this article).
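Because a 21-day window contains each weekday exactly three times, a centered rolling mean with that window cancels the weekly cycle completely. A minimal sketch with synthetic stand-in data (the real series would come from the leaked dump instead):

```python
import numpy as np
import pandas as pd

# Synthetic daily series with a strong weekly cycle, standing in for one
# of the leaked metrics (e.g. page views).
dates = pd.date_range("2022-01-01", periods=120, freq="D")
weekly = np.where(dates.dayofweek < 5, 100.0, 60.0)  # weekday/weekend pattern
views = pd.Series(weekly, index=dates)

# A centered rolling mean whose window is a multiple of 7 weights every
# weekday equally, so the weekly cycle vanishes from the smoothed curve.
smooth21 = views.rolling(window=21, center=True).mean()
```

With a purely weekly signal, the smoothed series is flat at the weekly mean, (5·100 + 2·60)/7; on the real data, the same call suppresses the weekly sawtooth while preserving slower features.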

Looking at the data, the "Google" event in the left-hand graph immediately catches the eye. It seems to have a significant impact on the number of visits and page views. In fact, Google made a major change to its algorithm on this date, which caused many search hits to be weighted differently. As a consequence, the so-called "new visits" in particular drop off, so the total user base must shrink over time due to normal diffusion effects.

In the middle graph, a significant peak appears shortly after the first COVID-19 wave in Europe; it is directly related to the surge of interest in COVID-19 data analysis. Beyond that, the number of posts decreases quite steadily over time. This is hardly surprising: as StackOverflow ages, the number of "general", open questions must decrease, because many things have simply already been answered. What remains open are very specific questions that interest a narrower audience.

The graph on the right shows the votes that users can give to posts. These (especially the "accept" votes) are necessarily correlated with the number of existing questions and must therefore also decrease over time.

It gets interesting: short vs. long-term effects

So far we have talked about weekly effects. Effects on an annual cycle are also immediately noticeable, and on top of these come very long-term effects that follow no particular rhythm and instead dictate the long-term course of the curves. It therefore seems logical to separate these components in order to study them individually.

There are several ways to perform this separation: high-pass and low-pass filters, smoothing, and other specialized filters. For simplicity, we have used only a very broad smoothing of 357 days (about one year, and in particular exactly 51 weeks). Applying this smoothing to the data yields the long-term effects (top row of the following graphs). Dividing the 21-day pre-smoothed data by the long-term effects yields the short-term effects (bottom row). Division makes sense because one can assume that the short-term effects depend on the total number of visitors (i.e., the long-term mean) and therefore scale with it. Looking at the results, we find this assumption confirmed, since no relevant amplitude modulation remains in the short-term effects.
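The separation can be sketched as follows, again on synthetic stand-in data (a linear decline as the long-term component, an annual sine as a seasonal effect, and a weekly cycle on top); the real metrics would replace the constructed series:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: slow linear decline x annual cycle x weekly cycle.
dates = pd.date_range("2018-01-01", periods=3 * 357, freq="D")
trend = np.linspace(1000.0, 600.0, len(dates))                      # long-term drift
annual = 1 + 0.15 * np.sin(2 * np.pi * np.arange(len(dates)) / 365.25)
weekly = np.where(dates.dayofweek < 5, 1.10, 0.75)                  # weekly mean is 1
views = pd.Series(trend * annual * weekly, index=dates)

# 21-day pre-smoothing, as used for the plots above.
pre = views.rolling(21, center=True).mean()

# Very broad smoothing (357 days = exactly 51 weeks) isolates the
# long-term component.
long_term = views.rolling(357, center=True).mean()

# Dividing (rather than subtracting) yields the short-term component as a
# dimensionless factor around 1 -- appropriate if short-term swings scale
# with the overall traffic level.
short_term = (pre / long_term).dropna()
```

On this construction, `long_term` tracks the linear decline while `short_term` recovers the annual modulation around 1, with the weekly cycle already removed by the 21-day pre-smoothing.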

Of course, the simple smoothing method is not perfect: the sharp dip observed in the page views (as well as the COVID-19 peak) is "smoothed" by this, and at the same time strong amplitudes near this point appear in the short-term effects. However, keeping this in mind for further analysis, one can still proceed with it.

Let's look at the long-term effects in more detail: in the graph on the top left, it is clear that the overall trend of decreasing page views correlates directly with the decreasing new visits, which, as we noted earlier, were triggered by Google's algorithm change. A recovery seems to begin at the GPT3 mark, but this is merely a smoothing artifact (as noted earlier). GPT3 and GPT4 do not appear to have any effect here.

Similarly, in the other two graphs in this row, the long-term trend shows no influence from GPT3 or GPT4; the Google change, however, additionally depresses the respective metrics. One might therefore be tempted to rule out any influence of GPT3 and GPT4.

In fact, however, this is only half the truth. The releases of GPT3 and GPT4 are not yet a full year in the past, so they can only have a very small influence on the long-term trend. Looking at the second row of graphs, we see significant drops in all three panels directly after the releases of both LLMs. At the same time, we know the data still contains cyclical, seasonal effects (e.g., all metrics collapse sharply around New Year's Day). We should remove these first.

Correction of cyclical effects

To correct for cyclical effects, we must first isolate them. To do this, we use the three years in the data that are "reasonably uniform": 2019, 2020, and 2021. We average the metrics over these three years and use the result as the "typical" annual pattern (the blue curve in the graphs below). Dividing the data by this typical annual progression, determining the 5th and 95th percentiles over the reference range (i.e., 2019-2021), and plotting everything together yields the second graph in each of the following figures. There, anything outside the shaded band represents a significant deviation.
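The typical-year correction can be sketched like this, again on a synthetic stand-in series (an annual sine plus noise covering 2019-2022); the real metrics would be substituted for it:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: a repeating annual cycle plus noise over 2019-2022.
rng = np.random.default_rng(0)
dates = pd.date_range("2019-01-01", "2022-12-31", freq="D")
season = 1.0 + 0.2 * np.sin(2 * np.pi * dates.dayofyear / 365.25)
views = pd.Series(season * (1 + rng.normal(0, 0.02, len(dates))), index=dates)

# "Typical year": average each day-of-year over the reasonably uniform
# reference years 2019-2021.
ref = views.loc["2019":"2021"]
typical = ref.groupby(ref.index.dayofyear).mean()

# Divide every observation by the typical value for its day of year.
corrected = views / typical.reindex(views.index.dayofyear).values

# 5th/95th percentiles over the corrected reference range define the band
# outside which a deviation counts as significant.
band = corrected.loc["2019":"2021"]
lo, hi = band.quantile(0.05), band.quantile(0.95)
```

Points of the corrected series that fall below `lo` or above `hi` are then deviations beyond the reference years' normal spread, which is what the shaded area in the figures encodes.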

For example, the COVID-19 peak is clearly visible in posts and votes but made little difference in traffic. Google's algorithm change, on the other hand, shows up clearly in traffic while having little effect on posts and votes. Equally interesting: GPT3 appears to have no effect, but GPT4 produces a significant drop (which appears to be recovering, though that may be a smoothing artifact). Notably, the shape of the drop itself roughly follows a diffusion curve, which is what one would expect here.


In summary, the biggest contributor to "StackOverflow's fall" appears to be not one of the two LLMs, GPT3 or GPT4, but the market power of Google's algorithm: its change in May 2022 caused a drop in traffic far larger than the GPT effect. However, since traffic also drives advertising revenue, this could become a serious problem in the future.

Nonetheless, GPT4 is visible in the data as well, with a particularly strong impact on the interaction rate (i.e., posts and votes). This, too, fits the earlier analysis: most of the generally applicable questions have already been answered (and their answers can also be generated by GPT). What remains are the few questions that are very specific.

Ready to give it a shot? Here's the code:
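Since the leaked dataset itself cannot be redistributed here, the sketch below generates synthetic stand-in data with hypothetical column names (page_views, visits, posts, votes) and reproduces the structure of the first row of figures: raw curves, the 21-day mean overlaid as a bolder line, and vertical event markers. To reproduce the actual figures, replace the synthetic frame with pd.read_csv on the dump; the event dates are approximate stand-ins.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt

# Synthetic stand-in for the leaked metrics (hypothetical column names);
# replace with pd.read_csv(...) on the actual dump.
rng = np.random.default_rng(1)
dates = pd.date_range("2019-01-01", "2023-06-30", freq="D")
df = pd.DataFrame(
    {
        "page_views": 1e6 * (1 + 0.1 * rng.standard_normal(len(dates))),
        "visits": 4e5 * (1 + 0.1 * rng.standard_normal(len(dates))),
        "posts": 8e3 * (1 + 0.1 * rng.standard_normal(len(dates))),
        "votes": 4e4 * (1 + 0.1 * rng.standard_normal(len(dates))),
    },
    index=dates,
)

# Event markers; dates are approximations, adjust to the article's values.
events = {"Google": "2022-05-25", "GPT3": "2022-11-30", "GPT4": "2023-03-14"}

fig, axes = plt.subplots(1, 3, figsize=(15, 4), sharex=True)
panels = [["page_views", "visits"], ["posts"], ["votes"]]
for ax, cols in zip(axes, panels):
    for col in cols:
        ax.plot(df.index, df[col], alpha=0.3)  # raw data, faint
        smooth = df[col].rolling(21, center=True).mean()
        ax.plot(smooth.index, smooth, lw=2, label=col)  # 21-day mean, bold
    for name, day in events.items():
        ax.axvline(pd.Timestamp(day), ls="--", color="gray")
    ax.legend()
fig.tight_layout()
fig.savefig("stackoverflow_metrics.png")
```

The long/short-term separation and the typical-year correction from the sections above plug into the same frame: apply `.rolling(357, center=True).mean()` per column for the long-term row, and divide by the 2019-2021 day-of-year means for the cyclically corrected figures.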