Over the years, traffic to Hacker News (HN), “a social news website about computer hacking and startup companies” (Wikipedia), has grown consistently, to an average of 150,000 daily unique visitors. The growth in traffic may explain why load times seem increasingly variable. I couldn’t help but wonder whether some optimizations could be made to decrease both variability and load times. I’ll propose two broad approaches: the first involves migrating away from table-based layouts, while the second involves consuming a JSON API.
Approach 1: Tables to Divs
Table 1. Hacker News Resource Statistics
| Resource | Size (With Tables) | Size (With Divs) | % Change |
|---|---|---|---|
| HTML | 26KB | 15KB | -42% |
| CSS | 1.7KB | 2.3KB | +35% |
| Logo | 100B | 0 | -100% |
| Up Arrow | 111B | 0 | -100% |
| Total | 27.9KB | 17.3KB | -37.2% |
In the div version, the logo and up arrow were base64-encoded and included directly in the HTML and CSS files.
HN’s front page comprises 4 tables, 98 rows, 159 columns, 37 inline style declarations, and numerous attributes that dictate style. To reduce the markup on the front page, I created a new HN front page (GitHub link) that looks identical to the existing page but does not include tables or inline CSS. I also went a step further and base64-encoded both the logo and the up arrow to decrease the number of requests. The completed CSS file was run through a CSS minifier to yield further reductions. With those changes, only two requests are necessary: one for the HTML file and one for the CSS file. Table 1 shows that those changes yielded an overall reduction of 37%.
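To give a sense of the rewrite, here is a rough sketch of a single div-based story row with the up arrow embedded as a base64 data URI. The class names and the (truncated) data URI are illustrative, not the exact ones used in the linked rewrite:

```html
<!-- Hypothetical div-based story row; class names are illustrative only. -->
<div class="story" data-id="123">
  <span class="rank">1.</span>
  <a class="votearrow" href="vote?for=123&amp;dir=up" title="upvote"></a>
  <a class="title" href="http://example.com/">Example story title</a>
  <span class="subtext">100 points by pg 1 hour ago | 42 comments</span>
</div>
```

```css
/* The up arrow ships inside the stylesheet as a data URI, so it costs no extra request. */
.votearrow {
  display: inline-block;
  width: 10px;
  height: 10px;
  background: url("data:image/gif;base64,R0lGOD...") no-repeat; /* data URI truncated */
}
```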
I also slightly modified the JavaScript responsible for sending up-votes to the server. Instead of grabbing a vote’s id from the id of the HTML node, it gets it from the ‘data-id’ attribute; otherwise, the JavaScript remains identical. As an aside, if you have not examined the JavaScript responsible for sending votes to the server, I’ve included it below (the existing code). It’s a creative use of an image tag: an image node is created, but not added to the DOM. When the image node is assigned a ‘src’, which happens to include all the vote info, it requests the ‘image’ using the constructed URL. Thus the ‘image’ request becomes analogous to an AJAX GET request, but without a conventional response.
```javascript
function byId(id) {
  return document.getElementById(id);
}

function vote(node) {
  var v = node.id.split(/_/);   // {'up', '123'}
  var item = v[1];

  // hide arrows
  byId('up_' + item).style.visibility = 'hidden';
  byId('down_' + item).style.visibility = 'hidden';

  // ping server
  var ping = new Image();
  ping.src = node.href;

  return false; // cancel browser nav
}
```
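For comparison, the data-attribute variant described above might look roughly like this (a sketch based on the description; the exact code lives in the linked repository):

```javascript
// Sketch of the modified handler: the item id is read from a 'data-id'
// attribute rather than parsed out of the node's id; otherwise the logic
// is unchanged.
function vote(node) {
  var item = node.getAttribute('data-id');

  // hide arrows
  byId('up_' + item).style.visibility = 'hidden';
  byId('down_' + item).style.visibility = 'hidden';

  // ping server
  var ping = new Image();
  ping.src = node.href;

  return false; // cancel browser nav
}
```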
Approach 2: JSON API
Although approach 1 results in a 37% decrease in data transferred to the client, markup and data must still be transferred on every refresh. In approach 2, the markup is transferred to the client only once, and then cached, while the data is sent to the client as JSON. This approach would shrink the HTML file but no doubt grow the JavaScript file. However, both of those resources could be cached in the browser and on a CDN, drastically reducing the number of requests to HN’s server. Furthermore, the JSON representing the stories on the front page is 7.8KB, much smaller than the existing page or even the page produced by approach 1.
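As a rough sketch of what the client side might look like (the /news.json endpoint, the JSON shape, and the field names below are assumptions, not an existing HN API):

```javascript
// Hypothetical client: fetch the front-page stories as JSON and render them
// into the cached HTML shell. Real code would escape these values before
// inserting them into the page.
function loadFrontPage() {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', '/news.json');
  xhr.onload = function () {
    var stories = JSON.parse(xhr.responseText);
    var list = document.getElementById('stories');
    list.innerHTML = '';
    stories.forEach(function (story, i) {
      var row = document.createElement('div');
      row.className = 'story';
      row.setAttribute('data-id', story.id);
      row.innerHTML =
        '<span class="rank">' + (i + 1) + '.</span> ' +
        '<a class="title" href="' + story.url + '">' + story.title + '</a> ' +
        '<span class="subtext">' + story.points + ' points by ' + story.author + '</span>';
      list.appendChild(row);
    });
  };
  xhr.send();
}

loadFrontPage();
```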
Approach 2 is not without its drawbacks. It would require significant changes to both HN’s back-end and its client side: a JavaScript application and an API would have to be created. This approach would likely be incompatible with bots that do not execute the JavaScript necessary to populate the page with stories. To get around this, the user agent could be detected and a static version served to bots, as sketched below. Alternatively, the page could be pre-populated with stories and subsequent requests could use AJAX GET requests. This would simplify matters, but make caching more difficult, since the cached page would require updating every time the front page changes.
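A minimal sketch of the user-agent check, assuming a Node.js front end purely for illustration (HN’s actual server is written in Arc; file names and the bot pattern are made up):

```javascript
// Hypothetical sketch: serve a pre-rendered static page to crawlers that will
// not run the client-side JavaScript, and the cached app shell to everyone else.
var http = require('http');
var fs = require('fs');

var BOT_PATTERN = /bot|crawler|spider|slurp/i;

http.createServer(function (req, res) {
  var userAgent = req.headers['user-agent'] || '';
  var file = BOT_PATTERN.test(userAgent) ? 'static-frontpage.html' : 'app-shell.html';
  res.writeHead(200, { 'Content-Type': 'text/html' });
  res.end(fs.readFileSync(file));
}).listen(8080);
```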
Conclusions
By transitioning from tables to divs, and from inline CSS to external CSS, HN could dramatically reduce the bandwidth required to serve its web pages. The first approach would require minimal changes to HN’s back-end, making it a good candidate for adoption. While the second approach could yield even better results, it would require drastic changes to both the server and the client, making it more suitable as a long-term solution.
In addition to the two approaches above, gzip-compressing both the HTML and the CSS would further reduce transferred data. It would also be beneficial to add the appropriate headers to enable browser caching of the CSS.
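For example, the CSS could be served with headers along these lines (the max-age value is arbitrary, chosen only for illustration):

```http
HTTP/1.1 200 OK
Content-Type: text/css
Content-Encoding: gzip
Cache-Control: public, max-age=604800
```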
While Paul Graham may not have the time, or the interest, to implement some of the above changes, I suspect he knows a few individuals who would be willing to help out.