If you’ve looked at my previous entry Getting WordPress Running on HHVM, you’ve already got a general idea for how to create a config.hdf file and get a web server up and running. Now you want to make sure it’s running as quickly, and efficiently as possible. But first, it’s worth understanding how HHVM runs your PHP scripts.
How HHVM serves up a page
When a request comes in, HHVM considers the hostname of the request against VirtualHost and Sandbox configuration directives to determine the right base directory to serve files from.
Once it’s identified the file, it checks a code cache stored in a SQLite database to see if it has already compiled the file. If so, and the file hasn’t changed, HHVM will execute that cached version very similarly to how APC caches PHP’s compilation process. If not, then HHVM’s analysis engine is invoked to compile the script, perform type inference, optimization, and store the result for next time.
As you might expect, there is a short warm-up period when a new server is started with a fresh, empty cache. Just like any other compilation cache, this only slows the first request of a given page. Unlike APC, however, HHVM’s cache is on disk, and therefore survives restarts.
If you’re curious about the code cache repository, you can find it in the location specified by the setting Repo.Central.Path, which defaults to ~/.hhvm.hhbc (you probably want to change that). Refer to doc/repo for more info on Repo settings.
Repo {
Central {
Path = /var/run/hhvm.hhbc.sq3
}
}
Just In Time
For production servers, you’ll almost certainly want to enable JIT (Just in Time) compilation to progressively turn the bytecode contained in hhvm.hhbc.sq3 into native code. Depending on exactly what your code base is doing, this will result in performance improvements of anywhere from 50% to 300% (100% seems to be a fairly common average).
To do so, just add the following snippet to your config.hdf file. Yes, it really is a “go faster” button.
Eval {
Jit = true
}
Bear in mind that JIT compilation is meant for long-running web servers where the extra effort of compiling to native code pays off after a few page views. For single-run uses like command line scripts, the extra work of compiling to native code will actually slow it down compared to just executing bytecode.
Note that as of Oct 18, 2012, HHVM will enable JIT by default for server, daemon, and replay modes, and disable by default for cli, debug, and translate modes. Ref: 57f063d2 e5824bea
Pre-analyzing
As stated already, all compilation from source to bytecode is cached on disk, so restarting your web server doesn’t lose any of that information and there will be no “warm-up” period on your second and later server starts; But why have that first warm-up period at all?
When you built HipHop (with USE_HHVM=1), the build process actually made two binaries. One, in src/hhvm/hhvm, is the web server and PHP engine you know and love. The other, in src/hphp/hphp (aka hhvm-analyze), is a version of HPHPc meant for HHVM use. It’s this latter binary which will allow you to pre-generate that cache before your web server ever starts up.
If you’re using our pre-built `hiphop-php` package for Ubuntu 12.04, this HHVM version of hphp has been delivered as /usr/bin/hhvm-analyze. The name has been changed to avoid confusion with /usr/bin/hphp which will be the normal HPHPc compiler in non-HHVM mode. For the purpose of this article, I’ll refer to running hhvm-analyze. Those who compiled HipHop themselves should take this to mean src/hphp/hphp built using USE_HHVM=1
To build the cache, you’ll need to identify the files you want included and invoke the analyzer:
$ cd /var/www
$ find . -name '*.php' > /tmp/files.list
$ hhvm --hphp --target hhbc --input-list /tmp/files.list --output-file hhvm.hhbc.sq3 -k1 -l3
The newly generated bytecode cache will be in /tmp/hphp_XXXXX/hhvm.hhbc.sq3 where XXXXX is a unique label produced in the output.
In addition to avoiding the initial warm-up by doing that ahead of time, hhvm-analyze will actually work a little harder than hhvm would at runtime. It’ll find a few more optimizations, be a little more sure about type inference, and produce better bytecode.
Repo.Authoritative
Even with the cache completely primed by running the analysis step above, HHVM is still going to default to checking the subject file each time it’s requested to make sure it hasn’t changed. That means disk I/O, and that means time. For a production server where source files are confidently not-changing, that’s just a waste. APC users will be familiar with this concept from the “apc.stat=0″ option.
To ignore the state of the source files during runtime on your production servers, just tell HHVM that the code cache (or Repo in HipHop parlance) is the authoritative source of all scripts on the system. With this setting enabled, if the file is found in the cache, it’s used without checking for updates. If it’s not found in the cache, the server will offer up a 404 without even trying to grab the file.
Repo {
Authoritative = true
}
Note that the bytecode cache is also tagged with the build ID of the hhvm which generated it. If a new hhvm binary is used, it will invalidate any previous cache and start fresh. So when running in Repo/Authoritative mode, be sure to regenerate your cache any time you replace your hhvm binary.