How to profile your PHP apps

Part of my job at Nomad List, Remote OK et al. is to keep the sites fast and responsive, even when an influx of traffic hits Pieter's latest app.

Launches are usually kept under control by exporting the pages to static HTML files and serving them with Nginx. However, in the interest of server CPU load, processing time and future-proofing, I've recently begun profiling our PHP code and optimising away any issues.
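As an aside, that static export is conceptually simple: render the page once and write the resulting HTML to disk for Nginx to serve. Here's a rough sketch of the idea in PHP (the script name and paths are made up, and this isn't necessarily how our setup does it):

<?php
// export-front-page.php -- run from cron (hypothetical script name)
ob_start();                       // capture everything the page would normally output
include __DIR__ . '/index.php';   // render the dynamic page as usual
$html = ob_get_clean();           // grab the rendered HTML and stop buffering

// write to a temp file, then rename, so Nginx never serves a half-written page
file_put_contents('/var/www/static/index.html.tmp', $html);
rename('/var/www/static/index.html.tmp', '/var/www/static/index.html');

That keeps launch traffic away from PHP entirely; the profiling below is mostly about making the pages, and that generation step, fast in the first place.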

This blog post is a guide/case study of how I do it and how it helps us.

1. Configuring Xdebug

Xdebug is a debugging and profiling tool for PHP. The website lists all the ways to install it; you'll probably want to use PECL or apt. Everything is well documented, so adjust the parameters to fit your setup, but we have the following in /etc/php/7.1/fpm/php.ini:

zend_extension = xdebug.so
xdebug.profiler_enable = 0
xdebug.profiler_output_name = cachegrind.out.%t
xdebug.profiler_enable_trigger = 1
xdebug.profiler_output_dir = /tmp
xdebug.profiler_enable_trigger_value = "<secret key>"

In order, the configuration does the following: load the Xdebug module, disable profiling by default, set the output filename, enable triggering via a GET/POST parameter, write the output to /tmp and only profile when the secret key is supplied.

Restart php-fpm and you should be good to go.

2. Profiling a page

Now you should be able to visit any PHP page on your site, append ?XDEBUG_PROFILE=<secret key> to the URL and see a profile file appear in /tmp. The great thing about this approach is you have total control over which pages are profiled and it requires no in-app code.
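For example (the domain here is just a placeholder), requesting

https://example.com/some-page.php?XDEBUG_PROFILE=<secret key>

will profile that single request and leave a file such as /tmp/cachegrind.out.<timestamp> behind, the timestamp coming from the %t in the output name above.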

It'll take a bit longer than usual to load the page, because Xdebug is recording every function call and its statistics, so give it a minute.

3. Analysing the breakdown

From here, you can scp a copy of the file to your machine and browse it with KCachegrind. This worked fine for me, but I kept losing track of files and wondered if there were any web app solutions. I found webgrind, a PHP web app which loads Xdebug profile files and displays them in the browser, much like KCachegrind. The advantage here is you don't need to download each file individually, and you can even profile and analyse pages from a portable device.

We host webgrind on a separate domain, protected by .htaccess, which prevents unauthorised users from hunting through our source code.

The interface is simple to use. I generally show 100% of the statistics in order to get a grasp of what is being called, and where from. The most important columns are Invocation Count and Total Self Cost.

Invocation Count is how many times that function was called. A simple way to reduce it is to pull repeated function calls out of a for-loop and into a single variable.
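Here's a contrived sketch of that tip (get_exchange_rate() is a made-up function for illustration, not something from our codebase):

<?php
// Before: count($items) and get_exchange_rate() run once per iteration
$prices = [];
for ($i = 0; $i < count($items); $i++) {
    $prices[] = $items[$i]['usd'] * get_exchange_rate('EUR');
}

// After: hoist the repeated calls into variables outside the loop
$prices = [];
$total  = count($items);
$rate   = get_exchange_rate('EUR');
for ($i = 0; $i < $total; $i++) {
    $prices[] = $items[$i]['usd'] * $rate;
}

In webgrind, the second version shows a single invocation of each helper instead of one per element.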

Total Self Cost is the total cost, as a percentage or in time, that the function itself is responsible for. You can improve this by reducing code complexity, using built-in native functions or removing unused variables.
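For instance, a hand-rolled loop can often be swapped for a native function, which does the same work inside the engine rather than in interpreted PHP (the $visits array here is just illustrative):

<?php
// Hand-rolled accumulation: every iteration runs as interpreted PHP
$total = 0;
foreach ($visits as $count) {
    $total += $count;
}

// Built-in equivalent: one call, the iteration happens in C
$total = array_sum($visits);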

Reduce either of these and you're winning.

Case Study

Pieter launched Mute this morning. The front page was static, generated by a cronjob running every minute or so. From our cronjob status page, I could see it was taking around 3 seconds to generate. I thought I'd take a look and see if anything could be improved.

Soon after, I tweeted this:

I just took performance optimisation to another level. https://t.co/nTsLSWy9Zr 175749 function calls -> 1224 (0.7%) (cc. @levelsio)
— Daniel 🗺 (@DanielLockyer), August 9, 2017

Then I was asked how I did it:

Holy cow! Care to share more details on how you did it? Sounds pretty interesting :)
— Nick van der Wildt (@nickvanderwildt), August 9, 2017

To which I replied:

Sure! Been using https://github.com/jokkedk/webgrind… a lot recently. Profiled old page, noticed loads of calls (check pic) so I checked out code [..] pic.twitter.com/EoJYTywdQ8
— Daniel 🗺 (@DanielLockyer), August 9, 2017

We had a few nested for-each loops. Pulled out anything static and turned repeated function calls into variables. Dropped heavy [..]
— Daniel 🗺 (@DanielLockyer), August 9, 2017

function call count to 1224 (see pic). Our static page cron job used to take 3 seconds, now it takes 0.16 seconds pic.twitter.com/PHIQ2jLA86
— Daniel 🗺 (@DanielLockyer), August 9, 2017

Pretty crazy huh?

Here are a few more improvements I made:

I just made https://t.co/2cHoleAXzf 54% faster by removing copious amounts of in_array and array_push... now onto @NomadList profiles
— Daniel 🗺 (@DanielLockyer), August 2, 2017

https://t.co/2e3NJZogwY is now 74.7% faster (5517ms -> 1394ms) by switching the sorting algorithm
— Daniel 🗺 (@DanielLockyer), August 6, 2017
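The in_array change above follows a common pattern: in_array() scans the array from the start on every call, while a keyed isset() lookup is effectively constant time. A rough sketch of the kind of rewrite involved (the variable names are made up; this isn't the actual Nomad List code):

<?php
// Before: every user triggers a linear scan of $blocked_ids
foreach ($users as $user) {
    if (in_array($user['id'], $blocked_ids)) {
        continue;
    }
    // ... render the profile ...
}

// After: build a lookup table once, then use cheap isset() checks
$blocked = array_flip($blocked_ids);
foreach ($users as $user) {
    if (isset($blocked[$user['id']])) {
        continue;
    }
    // ... render the profile ...
}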

If you're looking to performance-proof your server and web app before launch, email me at hi@daniellockyer.com!