Friday, June 28, 2019

Workaround: HTTP 502 "Incomplete response received from application" in Rails app on Phusion Passenger server

I've recently been fighting an intermittent issue in a specific Ruby on Rails app running on a Phusion Passenger web server on my local macOS development machine: HTTP requests to the app would intermittently return the error "Incomplete response received from application" wrapped in a simple h2 element, instead of the desired resource.

When this occurred, I'd also see a message like the following appear in the Phusion Passenger web server log:

App 6510 output: [ 2019-06-27 12:07:49.0318 6510/0x00007f83639eb6f8(HTTP helper worker) utils.rb ]: *** Exception Errno::EBADF (Bad file descriptor) (process 6510, thread 0x00007f83639eb6f8(HTTP helper worker))

Oddly, no one else working on this application saw these errors while running the app on their machines, even though as far as we could tell, we were all running pretty much the exact same environment.

The workaround

 I found that I could get this issue to stop manifesting on my machine by appending the following parameter to my passenger server start-up command:

--spawn-method direct

This resulted in the complete Passenger start command for the application looking like:

bundle exec passenger start --port 3000 --max-pool-size 4 --min-instances 4 --spawn-method direct

Explanation

The key to coming up with this workaround was an article on the Phusion Passenger website: Spawn methods explained (for Ruby developers).

That article talks about how, in order to save memory and improve start-up time, Passenger uses a "spawn method" setting of "smart" for Ruby applications by default. That setting accomplishes that by having multiple server processes share certain objects in memory.

A caveat of the "smart" spawn mode noted by the article is unintentional file descriptor sharing. This caught my attention: Research I'd done on the "EBADF (Bad file descriptor)" error noted that it could be caused when a file descriptor -- that is, a handle pointing to a resource like a local file or a network socket -- is closed, but then accessed again after being closed. If such a handle was being unintentionally shared between web server processes, it does seem entirely plausible that one process could close a handle, then the other process could try to use it for something.

It seemed logical that configuring Passenger to not share objects between processes, via using the "direct" spawn mode setting instead of the "smart" setting, would avoid this. Sure enough, using the "direct" setting did immediately stop the "Incomplete response from application" errors I was seeing in my local environment.  Additionally, running the app on my local machine (with myself as the only user), I didn't observe any meaningful degradation in app performance.

What I'm still unsure about at this point is exactly what specific handle was being incorrectly shared between processes, and why no one else working on this application was having the same error that I was manifest in their environments. Thus, I'm considering this a "workaround" rather than a "solution" for the time being.  (And I'm not committing a change to have the app use the "direct" spawn mode in any environment other than my own.)


Other solutions attempted

Prior to finding the workaround, as all signs pointed to this being some kind of environmental issue with my local machine, I tried to "fix" my environment in quite a few different ways:
  • Use a different client browser
  • Restart the Passenger web server
  • Reboot 
  • Reinstall Ruby 2.6.3
  • Revert to an older Ruby version (2.5.3) using rvm (and uninstall 2.6.3)
  • Reinstall gems via bundle install --force
  • Clone a fresh copy of the Rails application code itself from the remote repository
  • Run the application on a different port (other than 3000)
  • Update the database (mySQL) to the latest patch version
  • Increase the limit of open files per process via ulimit -n 8192
  • Increase the database pool size in database.yml
None of those attempted fixes made a difference; the "Incomplete response received from application" response to HTTP requests, and the "Errno::EBADF (Bad file descriptor)" in the server log continued to manifest.

I had initially suspected the Ruby 2.6.3 upgrade to have something to do with the problem, as I had first started noticing the issue around the time I made that upgrade locally. However, as downgrading to 2.5.3 didn't fix the issue -- and others working on this app were running 2.6.3 with no issues -- I ended up concluding that the 2.6.3 upgrade did not seem to be the direct cause of the issue.

I'd very much like to understand what is actually causing this error in my environment. In the meantime, though, I'm very happy to finally have a viable workaround, so I can get work done again!