diff options
Diffstat (limited to 'public/java-segfault-redux.html')
-rw-r--r-- | public/java-segfault-redux.html | 218 |
1 files changed, 218 insertions, 0 deletions
diff --git a/public/java-segfault-redux.html b/public/java-segfault-redux.html new file mode 100644 index 0000000..d491dcd --- /dev/null +++ b/public/java-segfault-redux.html @@ -0,0 +1,218 @@ +<!DOCTYPE html> +<html lang="en"> +<head> + <meta charset="utf-8"> + <title>My favorite bug: segfaults in Java (redux) — Luke T. Shumaker</title> + <meta name="viewport" content="width=device-width, initial-scale=1"> + <link rel="stylesheet" href="assets/style.css"> + <link rel="alternate" type="application/atom+xml" href="./index.atom" name="web log entries"/> +</head> +<body> +<header><a href="/">Luke T. Shumaker</a> » <a href=/blog>blog</a> » java-segfault-redux</header> +<article> +<h1 id="my-favorite-bug-segfaults-in-java-redux">My favorite bug: +segfaults in Java (redux)</h1> +<p>Two years ago, I <a href="./java-segfault.html">wrote</a> about one +of my favorite bugs that I’d squashed two years before that. About a +year after that, someone posted it <a +href="https://news.ycombinator.com/item?id=9283571">on Hacker +News</a>.</p> +<p>There was some fun discussion about it, but also some confusion. +After finishing a season of mentoring team 4272, I’ve decided that it +would be fun to re-visit the article, and dig up the old actual code, +instead of pseudo-code, hopefully improving the clarity (and providing a +light introduction for anyone wanting to get into modifying the current +SmartDashbaord).</p> +<h2 id="the-context">The context</h2> +<p>In 2012, I was a high school senior, and lead programmer programmer +on the FIRST Robotics Competition team 1024. For the unfamiliar, the +relevant part of the setup is that there are 2 minute and 15 second +matches in which you have a 120 pound robot that sometimes runs +autonomously, and sometimes is controlled over WiFi from a person at a +laptop running stock “driver station” software and modifiable +“dashboard” software.</p> +<p>That year, we mostly used the dashboard software to allow the human +driver and operator to monitor sensors on the robot, one of them being a +video feed from a web-cam mounted on it. This was really easy because +the new standard dashboard program had a click-and drag interface to add +stock widgets; you just had to make sure the code on the robot was +actually sending the data.</p> +<p>That’s great, until when debugging things, the dashboard would +suddenly vanish. If it was run manually from a terminal (instead of +letting the driver station software launch it), you would see a core +dump indicating a segmentation fault.</p> +<p>This wasn’t just us either; I spoke with people on other teams, +everyone who was streaming video had this issue. But, because it only +happened every couple of minutes, and a match is only 2:15, it didn’t +need to run very long, they just crossed their fingers and hoped it +didn’t happen during a match.</p> +<p>The dashboard was written in Java, and the source was available +(under a 3-clause BSD license) via read-only SVN at +<code>http://firstforge.wpi.edu/svn/repos/smart_dashboard/trunk</code> +(which is unfortunately no longer online, fortunately I’d posted some +snapshots on the web). So I dove in, hunting for the bug.</p> +<p>The repository was divided into several NetBeans projects (not +exhaustively listed):</p> +<ul> +<li><a +href="https://gitorious.org/absfrc/sources.git/?p=absfrc:sources.git;a=blob_plain;f=smartdashboard-client-2012-1-any.src.tar.xz;hb=HEAD"><code>client/smartdashboard</code></a>: +The main dashboard program, has a plugin architecture.</li> +<li><a +href="https://gitorious.org/absfrc/sources.git/?p=absfrc:sources.git;a=blob_plain;f=wpijavacv-208-1-any.src.tar.xz;hb=HEAD"><code>WPIJavaCV</code></a>: +A higher-level wrapper around JavaCV, itself a Java Native Interface +(JNI) wrapper to talk to OpenCV (C and C++).</li> +<li><a +href="https://gitorious.org/absfrc/sources.git/?p=absfrc:sources.git;a=blob_plain;f=smartdashboard-extension-wpicameraextension-210-1-any.src.tar.xz;hb=HEAD"><code>extensions/camera/WPICameraExtension</code></a>: +The standard camera feed plugin, processes the video through +WPIJavaCV.</li> +</ul> +<p>I figured that the bug must be somewhere in the C or C++ code that +was being called by JavaCV, because that’s the language where segfaults +happen. It was especially a pain to track down the pointers that were +causing the issue, because it was hard with native debuggers to see +through all of the JVM stuff to the OpenCV code, and the OpenCV stuff is +opaque to Java debuggers.</p> +<p>Eventually the issue lead me back into the WPICameraExtension, then +into WPIJavaCV—there was a native pointer being stored in a Java +variable; Java code called the native routine to <code>free()</code> the +structure, but then tried to feed it to another routine later. This lead +to difficulty again—tracking objects with Java debuggers was hard +because they don’t expect the program to suddenly segfault; it’s Java +code, Java doesn’t segfault, it throws exceptions!</p> +<p>With the help of <code>println()</code> I was eventually able to see +that some code was executing in an order that straight didn’t make +sense.</p> +<h2 id="the-bug">The bug</h2> +<p>The basic flow of WPIJavaCV is you have a <code>WPICamera</code>, and +you call <code>.getNewImage()</code> on it, which gives you a +<code>WPIImage</code>, which you could do all kinds of fancy OpenCV +things on, but then ultimately call <code>.getBufferedImage()</code>, +which gives you a <code>java.awt.image.BufferedImage</code> that you can +pass to Swing to draw on the screen. You do this every for frame. Which +is exactly what <code>WPICameraExtension.java</code> did, except that +“all kinds of fancy OpenCV things” consisted only of:</p> +<pre><code>public WPIImage processImage(WPIColorImage rawImage) { + return rawImage; +}</code></pre> +<p>The idea was that you would extend the class, overriding that one +method, if you wanted to do anything fancy.</p> +<p>One of the neat things about WPIJavaCV was that every OpenCV object +class extended had a <code>finalize()</code> method (via inheriting from +the abstract class <code>WPIDisposable</code>) that freed the underlying +C/C++ memory, so you didn’t have to worry about memory leaks like in +plain JavaCV. To inherit from <code>WPIDisposable</code>, you had to +write a <code>disposed()</code> method that actually freed the memory. +This was better than writing <code>finalize()</code> directly, because +it did some safety with NULL pointers and idempotency if you wanted to +manually free something early.</p> +<p>Now, <code>edu.wpi.first.WPIImage.disposed()</code> called <code><a +href="https://github.com/bytedeco/javacv/blob/svn/src/com/googlecode/javacv/cpp/opencv_core.java#L398">com.googlecode.javacv.cpp.opencv_core.IplImage</a>.release()</code>, +which called (via JNI) <code>IplImage:::release()</code>, which called +libc <code>free()</code>:</p> +<pre><code>@Override +protected void disposed() { + image.release(); +}</code></pre> +<p>Elsewhere, the C buffer for the image was copied into a Java buffer +via a similar chain kicked off by +<code>edu.wpi.first.WPIImage.getBufferedImage()</code>:</p> +<pre><code>/** + * Copies this {@link WPIImage} into a {@link BufferedImage}. + * This method will always generate a new image. + * @return a copy of the image + */ +public BufferedImage getBufferedImage() { + validateDisposed(); + + return image.getBufferedImage(); +}</code></pre> +<p>The <code>println()</code> output I saw that didn’t make sense was +that <code>someFrame.finalize()</code> was running before +<code>someFrame.getBuffereImage()</code> had returned!</p> +<p>You see, if it is waiting for the return value of a method +<code>m()</code> of object <code>a</code>, and code in <code>m()</code> +that is yet to be executed doesn’t access any other methods or +properties of <code>a</code>, then it will go ahead and consider +<code>a</code> eligible for garbage collection before <code>m()</code> +has finished running.</p> +<p>Put another way, <code>this</code> is passed to a method just like +any other argument. If a method is done accessing <code>this</code>, +then it’s “safe” for the JVM to go ahead and garbage collect it.</p> +<p>That is normally a safe “optimization” to make… except for when a +destructor method (<code>finalize()</code>) is defined for the object; +the destructor can have side effects, and Java has no way to know +whether it is safe for them to happen before <code>m()</code> has +finished running.</p> +<p>I’m not entirely sure if this is a “bug” in the compiler or the +language specification, but I do believe that it’s broken behavior.</p> +<p>Anyway, in this case it’s unsafe with WPI’s code.</p> +<h2 id="my-work-around">My work-around</h2> +<p>My work-around was to change this function in +<code>WPIImage</code>:</p> +<pre><code>public BufferedImage getBufferedImage() { + validateDisposed(); + + return image.getBufferedImage(); // `this` may get garbage collected before it returns! +}</code></pre> +<p>In the above code, <code>this</code> is a <code>WPIImage</code>, and +it may get garbage collected between the time that +<code>image.getBufferedImage()</code> is dispatched, and the time that +<code>image.getBufferedImage()</code> accesses native memory. When it is +garbage collected, it calls <code>image.release()</code>, which +<code>free()</code>s that native memory. That seems pretty unlikely to +happen; that’s a very small gap of time. However, running 30 times a +second, eventually bad luck with the garbage collector happens, and the +program crashes.</p> +<p>The work-around was to insert a bogus call to this to keep +<code>this</code> around until after we were also done with +<code>image</code>:</p> +<p>to this:</p> +<pre><code>public BufferedImage getBufferedImage() { + validateDisposed(); + BufferedImage ret = image.getBufferedImage(); + getWidth(); // bogus call to keep `this` around + return ret; +}</code></pre> +<p>Yeah. After spending weeks wading through though thousands of lines +of Java, C, and C++, a bogus call to a method I didn’t care about was +the fix.</p> +<p>TheLoneWolfling on Hacker News noted that they’d be worried about the +JVM optimizing out the call to <code>getWidth()</code>. I’m not, because +<code>WPIImage.getWidth()</code> calls <code>IplImage.width()</code>, +which is declared as <code>native</code>; the JVM must run it because it +might have side effects. On the other hand, looking back, I think I just +shrunk the window for things to go wrong: it may be possible for the +garbage collection to trigger in the time between +<code>getWidth()</code> being dispatched and <code>width()</code> +running. Perhaps there was something in the C/C++ code that made it +safe, I don’t recall, and don’t care quite enough to dig into OpenCV +internals again. Or perhaps I’m mis-remembering the fix (which I don’t +actually have a file of), and I called some other method that +<em>could</em> get optimized out (though I <em>do</em> believe that it +was either <code>getWidth()</code> or <code>getHeight()</code>).</p> +<h2 id="wpis-fix">WPI’s fix</h2> +<p>Four years later, the SmartDashboard is still being used! But it no +longer has this bug, and it’s not using my workaround. So, how did the +WPILib developers fix it?</p> +<p>Well, the code now lives <a +href="https://usfirst.collab.net/gerrit/#/admin/projects/">in git at +collab.net</a>, so I decided to take a look.</p> +<p>The stripped out WPIJavaCV from the main video feed widget, and now +use a purely Java implementation of MPJPEG streaming.</p> +<p>However, the old video feed widget is still available as an extension +(so that you can still do cool things with <code>processImage</code>), +and it also no longer has this bug. Their fix was to put a mutex around +all accesses to <code>image</code>, which should have been the obvious +solution to me.</p> + +</article> +<footer> + <aside class="sponsor"><p>I'd love it if you <a class="em" + href="/sponsor/">sponsored me</a>. It will allow me to continue + my work on the GNU/Linux ecosystem. Thanks!</p></aside> + +<p>The content of this page is Copyright © 2016 <a href="mailto:lukeshu@lukeshu.com">Luke T. Shumaker</a>.</p> +<p>This page is licensed under the <a href="https://creativecommons.org/licenses/by-sa/4.0/">CC BY-SA 4.0</a> license.</p> +</footer> +</body> +</html> |