Diffstat (limited to 'public/java-segfault.html')
-rw-r--r--public/java-segfault.html121
1 files changed, 121 insertions, 0 deletions
diff --git a/public/java-segfault.html b/public/java-segfault.html
new file mode 100644
index 0000000..4da6dec
--- /dev/null
+++ b/public/java-segfault.html
@@ -0,0 +1,121 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+ <meta charset="utf-8">
+ <title>My favorite bug: segfaults in Java — Luke T. Shumaker</title>
+ <meta name="viewport" content="width=device-width, initial-scale=1">
+ <link rel="stylesheet" href="assets/style.css">
+ <link rel="alternate" type="application/atom+xml" href="./index.atom" name="web log entries"/>
+</head>
+<body>
+<header><a href="/">Luke T. Shumaker</a> » <a href=/blog>blog</a> » java-segfault</header>
+<article>
+<h1 id="my-favorite-bug-segfaults-in-java">My favorite bug: segfaults in
+Java</h1>
+<blockquote>
+<p>Update: Two years later, I wrote a more detailed version of this
+article: <a href="./java-segfault-redux.html">My favorite bug: segfaults
+in Java (redux)</a>.</p>
+</blockquote>
+<p>I’ve told this story orally a number of times, but realized that I
+have never written it down. This is my favorite bug story; it might not
+be my hardest bug, but it is the one I most like to tell.</p>
+<h2 id="the-context">The context</h2>
+<p>In 2012, I was a Senior programmer on the FIRST Robotics Competition
+team 1024. For the unfamiliar, the relevant part of the setup is that
+there are 2 minute and 15 second matches in which you have a 120 pound
+robot that sometimes runs autonomously, and sometimes is controlled over
+WiFi from a person at a laptop running stock “driver station” software
+and modifiable “dashboard” software.</p>
+<p>That year, we mostly used the dashboard software to allow the human
+driver and operator to monitor sensors on the robot, one of them being a
+video feed from a web-cam mounted on it. This was really easy because
+the new standard dashboard program had a click-and-drag interface to add
+stock widgets; you just had to make sure the code on the robot was
+actually sending the data.</p>
+<p>That was great, until, while debugging things, the dashboard would
+suddenly vanish. If it was run manually from a terminal (instead of
+letting the driver station software launch it), you would see a core
+dump indicating a segmentation fault.</p>
+<p>This wasn’t just us either; I spoke with people on other teams, and
+everyone who was streaming video had this issue. But because it only
+happened every couple of minutes, and a match is only 2:15 long, the
+dashboard didn’t need to run for very long; they just crossed their
+fingers and hoped it didn’t happen during a match.</p>
+<p>The dashboard was written in Java, and the source was available
+(under a 3-clause BSD license), so I dove in, hunting for the bug. Now,
+the program did use the Java Native Interface (JNI) to talk to OpenCV, which the
+video ran through; so I figured that it must be a bug in the C/C++ code
+that was being called. It was especially a pain to track down the
+pointers that were causing the issue, because it was hard with native
+debuggers to see through all of the JVM stuff to the OpenCV code, and
+the OpenCV stuff is opaque to Java debuggers.</p>
+<p>Eventually the issue led me back into the Java code: there was a
+native pointer being stored in a Java variable; Java code called the
+native routine to <code>free()</code> the structure, but then tried to
+feed it to another routine later. This led to difficulty again: tracking
+objects with Java debuggers was hard because they don’t expect the
+program to suddenly segfault; it’s Java code, Java doesn’t segfault, it
+throws exceptions!</p>
+<p>With the help of <code>println()</code> I was eventually able to see
+that some code was executing in an order that straight didn’t make
+sense.</p>
+<h2 id="the-bug">The bug</h2>
+<p>The issue was that Java was making an unsafe optimization (I never
+bothered to figure out whether it was the compiler or the JVM making the
+mistake; I was satisfied once I had a work-around).</p>
+<p>Java was doing something similar to tail-call optimization with
+regard to garbage collection. You see, if it is waiting for the return
+value of a method <code>m()</code> of object <code>o</code>, and code in
+<code>m()</code> that is yet to be executed doesn’t access any other
+methods or properties of <code>o</code>, then it will go ahead and
+consider <code>o</code> eligible for garbage collection before
+<code>m()</code> has finished running.</p>
+<p>That is normally a safe optimization to make… except for when a
+destructor method (<code>finalize()</code>) is defined for the object;
+the destructor can have side effects, and Java has no way to know
+whether it is safe for them to happen before <code>m()</code> has
+finished running.</p>
+<h2 id="the-work-around">The work-around</h2>
+<p>The routine that the segmentation fault was occurring in was
+something like:</p>
+<pre><code>public type1 getFrame() {
+ type2 child = this.getChild();
+ type3 var = this.something();
+ // `this` may now be garbage collected
+ return child.somethingElse(var); // segfault comes here
+}</code></pre>
+<p>Where the destructor method of <code>this</code> calls a method that
+will <code>free()</code> native memory that is also accessed by
+<code>child</code>; if <code>this</code> is garbage collected before
+<code>child.somethingElse()</code> runs, the backing native code will
+try to access memory that has been <code>free()</code>ed, and receive a
+segmentation fault. That usually didn’t happen, as the routines were
+pretty fast. However, running 30 times a second, eventually bad luck
+with the garbage collector happens, and the program crashes.</p>
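+<p>To make that ordering concrete, here is a minimal sketch in plain
+Java. Every name in it (<code>NativeBuffer</code>, <code>Owner</code>,
+<code>Child</code>, <code>cleanup()</code>) is a hypothetical stand-in,
+not the dashboard’s real code. The “native memory” is simulated with a
+flag, and the destructor is simulated by calling <code>cleanup()</code>
+by hand, so the bad ordering happens every time instead of only with bad
+garbage collector luck.</p>

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class EarlyFinalizeSketch {
    /** Hypothetical stand-in for a block of native memory. */
    static final class NativeBuffer {
        private final AtomicBoolean freed = new AtomicBoolean(false);

        void free() {
            freed.set(true);
        }

        int read() {
            // Real native code would segfault here; the sketch throws instead.
            if (freed.get()) {
                throw new IllegalStateException("use after free");
            }
            return 42;
        }
    }

    /** Hypothetical child that shares the owner's native buffer. */
    static final class Child {
        private final NativeBuffer buf;

        Child(NativeBuffer buf) {
            this.buf = buf;
        }

        int somethingElse() {
            return buf.read();
        }
    }

    /** Hypothetical owner whose cleanup frees the shared buffer. */
    static final class Owner {
        private final NativeBuffer buf = new NativeBuffer();

        Child getChild() {
            return new Child(buf);
        }

        // Stands in for finalize(); called by hand so the ordering is deterministic.
        void cleanup() {
            buf.free();
        }
    }

    /** Returns true if the child fails once the owner's cleanup has already run. */
    static boolean childFailsAfterOwnerCleanup() {
        Owner owner = new Owner();
        Child child = owner.getChild();
        owner.cleanup(); // simulate the destructor running "too early"
        try {
            child.somethingElse();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(childFailsAfterOwnerCleanup());
    }
}
```

+<p>In the real program the failure was a segfault inside native code
+rather than a tidy exception, which is a large part of why it was so
+hard to trace.</p>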
+<p>The work-around was to insert a bogus call to a method of
+<code>this</code>, to keep <code>this</code> around until after we were
+also done with <code>child</code>:</p>
+<pre><code>public type1 getFrame() {
+ type2 child = this.getChild();
+ type3 var = this.something();
+ type1 ret = child.somethingElse(var);
+ this.getSize(); // bogus call to keep `this` around
+ return ret;
+}</code></pre>
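+<p>On Java 9 and later, the standard library offers a documented way to
+express the same intent as the bogus call:
+<code>java.lang.ref.Reference.reachabilityFence()</code>. As far as I
+can tell, the Java Language Specification (§12.6.1) allows an object to
+be treated as unreachable while one of its methods is still running, so
+the fence is the sanctioned tool for this situation. The sketch below is
+only an assumption about how such a routine could look today, not what
+the dashboard did: the class names are hypothetical, the native memory
+is simulated with a flag, and the cleanup is registered with
+<code>java.lang.ref.Cleaner</code> in place of
+<code>finalize()</code>.</p>

```java
import java.lang.ref.Cleaner;
import java.lang.ref.Reference;

final class FenceSketch {
    /** Hypothetical stand-in for a block of native memory. */
    static final class NativeBuffer {
        private volatile boolean freed = false;

        void free() {
            freed = true;
        }

        int read() {
            // Real native code would segfault here; the sketch throws instead.
            if (freed) {
                throw new IllegalStateException("use after free");
            }
            return 42;
        }
    }

    /** Hypothetical child that shares the owner's native buffer. */
    static final class Child {
        private final NativeBuffer buf;

        Child(NativeBuffer buf) {
            this.buf = buf;
        }

        int somethingElse(int arg) {
            return buf.read() + arg;
        }
    }

    /** Hypothetical owner; its buffer is freed once the owner becomes unreachable. */
    static final class Owner {
        private static final Cleaner CLEANER = Cleaner.create();
        private final NativeBuffer buf = new NativeBuffer();

        Owner() {
            // The cleanup action captures only `buf`, never `this`.
            CLEANER.register(this, buf::free);
        }

        Child getChild() {
            return new Child(buf);
        }

        int something() {
            return 1;
        }

        int getFrame() {
            Child child = this.getChild();
            int arg = this.something();
            try {
                return child.somethingElse(arg);
            } finally {
                // Keeps `this` strongly reachable until the child's call has returned.
                Reference.reachabilityFence(this);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(new Owner().getFrame()); // prints 43
    }
}
```

+<p>The <code>try</code>/<code>finally</code> shape puts the fence after
+the last use of <code>child</code> on every exit path, which is the same
+job the bogus <code>getSize()</code> call was doing.</p>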
+<p>Yeah. After spending weeks wading through thousands of lines
+of Java, C, and C++, a bogus call to a method I didn’t care about was
+the fix.</p>
+
+</article>
+<footer>
+ <aside class="sponsor"><p>I'd love it if you <a class="em"
+ href="/sponsor/">sponsored me</a>. It will allow me to continue
+ <a class="em" href="/imworkingon/">my work</a> on the GNU/Linux
+ ecosystem. Thanks!</p></aside>
+
+<p>The content of this page is Copyright © 2014 <a href="mailto:lukeshu@lukeshu.com">Luke T. Shumaker</a>.</p>
+<p>This page is licensed under the <a href="https://creativecommons.org/licenses/by-sa/4.0/">CC BY-SA 4.0</a> license.</p>
+</footer>
+</body>
+</html>