Luke Shumaker's Web Log
2016-08-27T19:26:20-04:00
Luke Shumaker <lukeshu@sbcglobal.net> (https://lukeshu.com/)
https://lukeshu.com/blog/
https://lukeshu.com/blog/x11-systemd.html
2016-08-27T19:26:20-04:00
2016-02-28T00:00:00+00:00
My X11 setup with systemd
<h1 id="my-x11-setup-with-systemd">My X11 setup with systemd</h1>
<p>Somewhere along the way, I decided that using systemd user sessions to manage the various parts of my X11 environment would be a good idea. Whether or not that was a good idea... we'll see.</p>
<p>I've sort-of been running this setup as my daily driver for <a href="https://lukeshu.com/git/dotfiles.git/commit/?id=a9935b7a12a522937d91cb44a0e138132b555e16">a bit over a year</a>, though I've been continually tweaking it.</p>
<p>My setup is substantially different from the one on <a href="https://wiki.archlinux.org/index.php/Systemd/User">ArchWiki</a>, because the ArchWiki solution assumes that there is only ever one X server for a user; I like the ability to run <code>Xorg</code> on my real monitor, and also have <code>Xvnc</code> running headless, or start my desktop environment on a remote X server. Though, I would like to figure out how to use systemd socket activation for the X server, as the ArchWiki solution does.</p>
<p>This means that all of my graphical units take <code>DISPLAY</code> as an <code>@</code> argument. To get this to all work out, this goes in each <code>.service</code> file, unless otherwise noted:</p>
<pre><code>[Unit]
After=X11@%i.target
Requisite=X11@%i.target
[Service]
Environment=DISPLAY=%I</code></pre>
<p>We'll get to <code>X11@.target</code> later; what it says is "I should only be running if X11 is running".</p>
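<p>To make the template concrete: a unit for some graphical program would look something like this (<code>xclock@.service</code> is a hypothetical illustration, not one of my actual units):</p>
<pre><code># ~/.config/systemd/user/xclock@.service (hypothetical example)
[Unit]
Description=xclock on display %I
After=X11@%i.target
Requisite=X11@%i.target

[Service]
Environment=DISPLAY=%I
ExecStart=/usr/bin/xclock</code></pre>
<p>which could then be started with <code>systemctl --user start 'xclock@:0.service'</code>; the <code>%I</code> specifier expands to the (unescaped) instance name, here the <code>DISPLAY</code>.</p>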
<p>I eschew complex XDMs or <code>startx</code> wrapper scripts, opting for the simpler <code>xinit</code>, which I either run on login on some boxes (my media station), or run by typing <code>xinit</code> when I want X11 on others (most everything else). Essentially, what <code>xinit</code> does is run <code>~/.xserverrc</code> (or <code>/etc/X11/xinit/xserverrc</code>) to start the server, then once the server is started (which takes a substantial amount of magic to detect) it runs <code>~/.xinitrc</code> (or <code>/etc/X11/xinit/xinitrc</code>) to start the clients. Once <code>.xinitrc</code> finishes running, it stops the X server and exits. Now, when I say "run", I don't mean execute; it passes each file to the system shell (<code>/bin/sh</code>) as input.</p>
<p>Xorg requires a TTY to run on; if we log in to a TTY with <code>logind</code>, it will give us the <code>XDG_VTNR</code> variable to tell us which one we have, so I pass this to <code>X</code> in <a href="https://lukeshu.com/git/dotfiles.git/tree/.config/X11/serverrc">my <code>.xserverrc</code></a>:</p>
<pre><code>#!/hint/sh
if [ -z "$XDG_VTNR" ]; then
    exec /usr/bin/X -nolisten tcp "$@"
else
    exec /usr/bin/X -nolisten tcp "$@" vt$XDG_VTNR
fi</code></pre>
<p>This was the default for <a href="https://projects.archlinux.org/svntogit/packages.git/commit/trunk/xserverrc?h=packages/xorg-xinit&id=f9f5de58df03aae6c8a8c8231a83327d19b943a1">a while</a> in Arch, to support <code>logind</code>, but was <a href="https://projects.archlinux.org/svntogit/packages.git/commit/trunk/xserverrc?h=packages/xorg-xinit&id=5a163ddd5dae300e7da4b027e28c37ad3b535804">later removed</a> in part because <code>startx</code> (which calls <code>xinit</code>) started adding it as an argument as well, so <code>vt$XDG_VTNR</code> was being listed as an argument twice, which is an error. IMO, that was a problem in <code>startx</code>, and they shouldn't have removed it from the default system <code>xserverrc</code>, but that's just me. So I copy/pasted it into my user <code>xserverrc</code>.</p>
<p>That's the boring part, though. Where the magic starts happening is in <a href="https://lukeshu.com/git/dotfiles.git/tree/.config/X11/clientrc">my <code>.xinitrc</code></a>:</p>
<pre><code>#!/hint/sh
if [ -z "$XDG_RUNTIME_DIR" ]; then
    printf "XDG_RUNTIME_DIR isn't set\n" >&2
    exit 6
fi
_DISPLAY="$(systemd-escape -- "$DISPLAY")"
trap "rm -f $(printf '%q' "${XDG_RUNTIME_DIR}/x11-wm@${_DISPLAY}")" EXIT
mkfifo "${XDG_RUNTIME_DIR}/x11-wm@${_DISPLAY}"
cat < "${XDG_RUNTIME_DIR}/x11-wm@${_DISPLAY}" &
systemctl --user start "X11@${_DISPLAY}.target" &
wait
systemctl --user stop "X11@${_DISPLAY}.target"</code></pre>
<p>There are two contracts/interfaces here: the <code>X11@DISPLAY.target</code> systemd target, and the <code>${XDG_RUNTIME_DIR}/x11-wm@DISPLAY</code> named pipe. The systemd <code>.target</code> should be pretty self-explanatory; the most important part is that it starts the window manager. The named pipe is just a hacky way of blocking until the window manager exits ("traditional" <code>.xinitrc</code> files end with the line <code>exec your-window-manager</code>, so this mimics that behavior). It works by assuming that the window manager will open the pipe at startup, and keep it open (without necessarily writing anything to it); when the window manager exits, the pipe will get closed, sending EOF to the <code>wait</code>ed-for <code>cat</code>, allowing it to exit, letting the script resume. The window manager (WMII) is made to have the pipe opened by executing it this way in <a href="https://lukeshu.com/git/dotfiles/tree/.config/systemd/user/wmii@.service">its <code>.service</code> file</a>:</p>
<pre><code>ExecStart=/usr/bin/env bash -c 'exec 8>${XDG_RUNTIME_DIR}/x11-wm@%I; exec wmii'</code></pre>
<p>which just opens the file on file descriptor 8, then launches the window manager normally. The only further logic required by the window manager with regard to the pipe is that in the window manager <a href="https://lukeshu.com/git/dotfiles.git/tree/.config/wmii-hg/config.sh">configuration</a>, I should close that file descriptor after forking any process that isn't "part of" the window manager:</p>
<pre><code>runcmd() (
    ...
    exec 8>&- # xinit/systemd handshake
    ...
)</code></pre>
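<p>The pipe trick can be demonstrated in isolation (a standalone sketch, not part of my dotfiles): the reader blocks until the "window manager" exits, which closes its copy of fd 8 and delivers EOF.</p>
<pre><code>#!/usr/bin/env bash
# Standalone demo of the FIFO handshake.
fifo=$(mktemp -u)
mkfifo "$fifo"

start=$SECONDS
# "Window manager": open the FIFO on fd 8, hold it open for 2 seconds,
# then exit; exiting closes fd 8, which sends EOF to the reader.
( exec 8>"$fifo"; sleep 2; ) &

# "xinit": this blocks until the writer above closes the FIFO.
cat <"$fifo"
elapsed=$((SECONDS - start))
rm -f "$fifo"
echo "blocked for ${elapsed}s"</code></pre>
<p>Note that nothing is ever written to the pipe; only the open/close of the write end carries information.</p>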
<p>So, back to the <code>X11@DISPLAY.target</code>; I configure what it "does" with symlinks in the <code>.requires</code> and <code>.wants</code> directories:</p>
<ul class="tree">
<li>
<p><a href="https://lukeshu.com/git/dotfiles/tree/.config/systemd/user">.config/systemd/user/</a></p>
<ul>
<li><a href="https://lukeshu.com/git/dotfiles/tree/.config/systemd/user/X11@.target">X11@.target</a></li>
<li><a href="https://lukeshu.com/git/dotfiles/tree/.config/systemd/user/X11@.target.requires">X11@.target.requires</a>/
<ul>
<li>wmii@.service -> ../<a href="https://lukeshu.com/git/dotfiles/tree/.config/systemd/user/wmii@.service">wmii@.service</a></li>
</ul></li>
<li><a href="https://lukeshu.com/git/dotfiles/tree/.config/systemd/user/X11@.target.wants">X11@.target.wants</a>/
<ul>
<li>xmodmap@.service -> ../<a href="https://lukeshu.com/git/dotfiles/tree/.config/systemd/user/xmodmap@.service">xmodmap@.service</a></li>
<li>xresources-dpi@.service -> ../<a href="https://lukeshu.com/git/dotfiles/tree/.config/systemd/user/xresources-dpi@.service">xresources-dpi@.service</a></li>
<li>xresources@.service -> ../<a href="https://lukeshu.com/git/dotfiles/tree/.config/systemd/user/xresources@.service">xresources@.service</a></li>
</ul></li>
</ul>
</li>
</ul>
<p>The <code>.requires</code> directory is how I configure which window manager it starts. This would allow me to configure different window managers on different displays, by creating a <code>.requires</code> directory with the <code>DISPLAY</code> included, e.g. <code>X11@:2.target.requires</code>.</p>
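<p>Creating such a per-display override is just a matter of symlinks; e.g., to have a hypothetical <code>openbox@.service</code> own display <code>:2</code> (a sketch, using a scratch directory as a stand-in for <code>~/.config/systemd/user</code>):</p>
<pre><code># Sketch: wire a hypothetical openbox@.service to display :2 only.
unitdir=$(mktemp -d)               # stand-in for ~/.config/systemd/user
touch "$unitdir/openbox@.service"  # the unit file is assumed to already exist
mkdir "$unitdir/X11@:2.target.requires"
ln -s ../openbox@.service "$unitdir/X11@:2.target.requires/openbox@.service"</code></pre>
<p>systemd reads both the template's <code>X11@.target.requires/</code> and the instance's <code>X11@:2.target.requires/</code>, so display-specific links add to (rather than replace) the generic configuration.</p>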
<p>The <code>.wants</code> directory is for general X display setup; it's analogous to <code>/etc/X11/xinit/xinitrc.d/</code>. All of the files in it are simple <code>Type=oneshot</code> service files. The <a href="https://lukeshu.com/git/dotfiles/tree/.config/systemd/user/xmodmap@.service">xmodmap</a> and <a href="https://lukeshu.com/git/dotfiles/tree/.config/systemd/user/xresources@.service">xresources</a> files are pretty boring, they're just systemd versions of the couple lines that just about every traditional <code>.xinitrc</code> contains, the biggest difference being that they look at <a href="https://lukeshu.com/git/dotfiles.git/tree/.config/X11/modmap"><code>~/.config/X11/modmap</code></a> and <a href="https://lukeshu.com/git/dotfiles.git/tree/.config/X11/resources"><code>~/.config/X11/resources</code></a> instead of the traditional locations <code>~/.xmodmap</code> and <code>~/.Xresources</code>.</p>
<p>What's possibly of note is <a href="https://lukeshu.com/git/dotfiles/tree/.config/systemd/user/xresources-dpi@.service"><code>xresources-dpi@.service</code></a>. In X11, there are two sources of DPI information: the X display resolution, and the XRDB <code>Xft.dpi</code> setting. It isn't defined which takes precedence (to my knowledge), and even if it were (is), application authors wouldn't be arsed to actually do the right thing. For years, Firefox (well, Iceweasel) happily listened to the X display resolution, but recently it decided to only look at <code>Xft.dpi</code>, which objectively seems a little silly, since the X display resolution is always present, but <code>Xft.dpi</code> isn't. Anyway, Mozilla's change drove me to create a <a href="https://lukeshu.com/git/dotfiles/tree/.local/bin/xrdb-set-dpi">script</a> to make the <code>Xft.dpi</code> setting match the X display resolution. Disclaimer: I have no idea if it works if the X server has multiple displays (with possibly varying resolution).</p>
<pre><code>#!/usr/bin/env bash
dpi=$(LC_ALL=C xdpyinfo|sed -rn 's/^\s*resolution:\s*(.*) dots per inch$/\1/p')
xrdb -merge <<<"Xft.dpi: ${dpi}"</code></pre>
<p>Since we want XRDB to be set up before any other programs launch, we give both of the <code>xresources</code> units <code>Before=X11@%i.target</code> (instead of <code>After=</code> like everything else). Also, two programs writing to <code>xrdb</code> at the same time has the same problem as two programs writing to the same file: one might trash the other's changes. So, I stuck <code>Conflicts=xresources@%i.service</code> into <code>xresources-dpi@.service</code>.</p>
<p>And that's the "core" of my X11 systemd setup. But, you generally want more things running than just the window manager, like a desktop notification daemon, a system panel, and an X composition manager (unless your window manager is bloated and has a composition manager built in). Since these things are probably window-manager specific, I've stuck them in a directory <code>wmii@.service.wants</code>:</p>
<ul class="tree">
<li>
<p><a href="https://lukeshu.com/git/dotfiles/tree/.config/systemd/user">.config/systemd/user/</a></p>
<ul>
<li><a href="https://lukeshu.com/git/dotfiles/tree/.config/systemd/user/wmii@.service.wants">wmii@.service.wants</a>/
<ul>
<li>dunst@.service -> ../<a href="https://lukeshu.com/git/dotfiles/tree/.config/systemd/user/dunst@.service">dunst@.service</a> # a notification daemon</li>
<li>lxpanel@.service -> ../<a href="https://lukeshu.com/git/dotfiles/tree/.config/systemd/user/lxpanel@.service">lxpanel@.service</a> # a system panel</li>
<li>rbar@97_acpi.service -> ../<a href="https://lukeshu.com/git/dotfiles/tree/.config/systemd/user/rbar@.service">rbar@.service</a> # wmii stuff</li>
<li>rbar@99_clock.service -> ../<a href="https://lukeshu.com/git/dotfiles/tree/.config/systemd/user/rbar@.service">rbar@.service</a> # wmii stuff</li>
<li>xcompmgr@.service -> ../<a href="https://lukeshu.com/git/dotfiles/tree/.config/systemd/user/xcompmgr@.service">xcompmgr@.service</a> # an X composition manager</li>
</ul></li>
</ul>
</li>
</ul>
<p>For the window manager <code>.service</code>, I <em>could</em> just say <code>Type=simple</code> and call it a day (and I did for a while). But, I like to have <code>lxpanel</code> show up on all of my WMII tags (desktops), so I have <a href="https://lukeshu.com/git/dotfiles.git/tree/.config/wmii-hg/config.sh">my WMII configuration</a> stick this in the WMII <a href="https://lukeshu.com/git/dotfiles.git/tree/.config/wmii-hg/rules"><code>/rules</code></a>:</p>
<pre><code>/panel/ tags=/.*/ floating=always</code></pre>
<p>Unfortunately, for this to work, <code>lxpanel</code> must be started <em>after</em> that gets inserted into WMII's rules. That wasn't a problem pre-systemd, because <code>lxpanel</code> was started by my WMII configuration, so ordering was simple. For systemd to get this right, I must have a way of notifying systemd that WMII's fully started, and it's safe to start <code>lxpanel</code>. So, I stuck this in <a href="https://lukeshu.com/git/dotfiles/tree/.config/systemd/user/wmii@.service">my WMII <code>.service</code> file</a>:</p>
<pre><code># This assumes that you write READY=1 to $NOTIFY_SOCKET in wmiirc
Type=notify
NotifyAccess=all</code></pre>
<p>and this in <a href="https://lukeshu.com/git/dotfiles.git/tree/.config/wmii-hg/wmiirc">my WMII configuration</a>:</p>
<pre><code>systemd-notify --ready || true</code></pre>
<p>Now, this setup means that <code>NOTIFY_SOCKET</code> is set for all the children of <code>wmii</code>; I'd rather not have it leak into the applications that I start from the window manager, so I also stuck <code>unset NOTIFY_SOCKET</code> after forking a process that isn't part of the window manager:</p>
<pre><code>runcmd() (
    ...
    unset NOTIFY_SOCKET # systemd
    ...
    exec 8>&- # xinit/systemd handshake
    ...
)</code></pre>
<p>Unfortunately, because of a couple of <a href="https://github.com/systemd/systemd/issues/2739">bugs</a> and <a href="https://github.com/systemd/systemd/issues/2737">race conditions</a> in systemd, <code>systemd-notify</code> isn't reliable. If systemd can't receive the <code>READY=1</code> signal from my WMII configuration, there are two consequences:</p>
<ol type="1">
<li><code>lxpanel</code> will never start, because it will always be waiting for <code>wmii</code> to be ready, which will never happen.</li>
<li>After a couple of minutes, systemd will consider <code>wmii</code> to be timed out, which is a failure, so then it will kill <code>wmii</code>, and exit my X11 session. That's no good!</li>
</ol>
<p>Using <code>socat</code> to send the message to systemd instead of <code>systemd-notify</code> "should" always work, because it tries to read from both ends of the bi-directional stream, and I can't imagine that getting EOF from the <code>UNIX-SENDTO</code> end will ever be faster than the systemd manager handling the datagram that got sent. Which is to say, "we work around the race condition by being slow and shitty."</p>
<pre><code>socat STDIO UNIX-SENDTO:"$NOTIFY_SOCKET" <<<READY=1 || true</code></pre>
<p>But, I don't like that. I'd rather write my WMII configuration to the world as I wish it existed, and have workarounds encapsulated elsewhere; <a href="http://blog.robertelder.org/interfaces-most-important-software-engineering-concept/">"If you have to cut corners in your project, do it inside the implementation, and wrap a very good interface around it."</a>. So, I wrote a <code>systemd-notify</code> compatible <a href="https://lukeshu.com/git/dotfiles.git/tree/.config/wmii-hg/workarounds.sh">function</a> that ultimately calls <code>socat</code>:</p>
<pre><code>##
# Just like systemd-notify(1), but slower, which is a shitty
# workaround for a race condition in systemd.
##
systemd-notify() {
    local args
    args="$(getopt -n systemd-notify -o h -l help,version,ready,pid::,status:,booted -- "$@")"
    local ret=$?; [[ $ret == 0 ]] || return $ret
    eval set -- "$args"
    local arg_ready=false
    local arg_pid=0
    local arg_status=
    while [[ $# -gt 0 ]]; do
        case "$1" in
            -h|--help) command systemd-notify --help; return $?;;
            --version) command systemd-notify --version; return $?;;
            --ready) arg_ready=true; shift 1;;
            --pid) arg_pid=${2:-$$}; shift 2;;
            --status) arg_status=$2; shift 2;;
            --booted) command systemd-notify --booted; return $?;;
            --) shift 1; break;;
        esac
    done
    local our_env=()
    if $arg_ready; then
        our_env+=("READY=1")
    fi
    if [[ -n "$arg_status" ]]; then
        our_env+=("STATUS=$arg_status")
    fi
    if [[ "$arg_pid" -gt 0 ]]; then
        our_env+=("MAINPID=$arg_pid")
    fi
    our_env+=("$@")
    local n
    printf -v n '%s\n' "${our_env[@]}"
    socat STDIO UNIX-SENDTO:"$NOTIFY_SOCKET" <<<"$n"
}</code></pre>
<p>So, one day when the systemd bugs have been fixed (and presumably the Linux kernel supports passing the cgroup of a process as part of its credentials), I can remove that from <code>workarounds.sh</code>, and not have to touch anything else in my WMII configuration (I do use <code>systemd-notify</code> in a couple of other, non-essential, places too; this wasn't to avoid having to change just 1 line).</p>
<p>So, now that <code>wmii@.service</code> properly has <code>Type=notify</code>, I can just stick <code>After=wmii@.service</code> into my <code>lxpanel@.service</code>, right? Wrong! Well, I <em>could</em>, but my <code>lxpanel</code> service has nothing to do with WMII; why should I couple them? Instead, I create <a href="https://lukeshu.com/git/dotfiles/tree/.config/systemd/user/wm-running@.target"><code>wm-running@.target</code></a> that can be used as a synchronization point:</p>
<pre><code># wmii@.service
Before=wm-running@%i.target
# lxpanel@.service
After=X11@%i.target wm-running@%i.target
Requires=wm-running@%i.target</code></pre>
<p>Finally, I have my desktop started and running. Now, I'd like for programs that aren't part of the window manager to not dump their stdout and stderr into WMII's part of the journal, like to have a record of which graphical programs crashed, and like to have a prettier cgroup/process graph. So, I use <code>systemd-run</code> to run external programs from the window manager:</p>
<pre><code>runcmd() (
    ...
    unset NOTIFY_SOCKET # systemd
    ...
    exec 8>&- # xinit/systemd handshake
    exec systemd-run --user --scope -- sh -c "$*"
)</code></pre>
<p>I run them as a scope instead of a service so that they inherit environment variables, and don't have to mess with getting <code>DISPLAY</code> or <code>XAUTHORITY</code> into their units (as I <em>don't</em> want to make them global variables in my systemd user session).</p>
<p>I'd like to get <code>lxpanel</code> to also use <code>systemd-run</code> when launching programs, but it's a low priority because I don't actually use <code>lxpanel</code> to launch programs; I just have the menu there to make sure that I didn't break the icons for programs that I package (I did that once back when I was Parabola's packager for Iceweasel and IceCat).</p>
<p>And that's how I use systemd with X11.</p>
<p>The content of this page is Copyright © 2016 <a href="mailto:lukeshu@sbcglobal.net">Luke Shumaker</a>.</p>
<p>This page is licensed under the <a href="https://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA-3.0</a> license.</p>
https://lukeshu.com/blog/java-segfault-redux.html
2016-05-02T02:28:19-04:00
2016-02-28T00:00:00+00:00
My favorite bug: segfaults in Java (redux)
<h1 id="my-favorite-bug-segfaults-in-java-redux">My favorite bug: segfaults in Java (redux)</h1>
<p>Two years ago, I <a href="./java-segfault.html">wrote</a> about one of my favorite bugs that I'd squashed two years before that. About a year after that, someone posted it <a href="https://news.ycombinator.com/item?id=9283571">on Hacker News</a>.</p>
<p>There was some fun discussion about it, but also some confusion. After finishing a season of mentoring team 4272, I've decided that it would be fun to re-visit the article, and dig up the old actual code, instead of pseudo-code, hopefully improving the clarity (and providing a light introduction for anyone wanting to get into modifying the current SmartDashboard).</p>
<h2 id="the-context">The context</h2>
<p>In 2012, I was a high school senior, and lead programmer on the FIRST Robotics Competition team 1024. For the unfamiliar, the relevant part of the setup is that there are 2 minute and 15 second matches in which you have a 120 pound robot that sometimes runs autonomously, and sometimes is controlled over WiFi from a person at a laptop running stock "driver station" software and modifiable "dashboard" software.</p>
<p>That year, we mostly used the dashboard software to allow the human driver and operator to monitor sensors on the robot, one of them being a video feed from a web-cam mounted on it. This was really easy because the new standard dashboard program had a click-and-drag interface to add stock widgets; you just had to make sure the code on the robot was actually sending the data.</p>
<p>That's great, until the dashboard would suddenly vanish while you were debugging things. If it was run manually from a terminal (instead of letting the driver station software launch it), you would see a core dump indicating a segmentation fault.</p>
<p>This wasn't just us either; I spoke with people on other teams, and everyone who was streaming video had this issue. But because it only happened every couple of minutes, and a match is only 2:15, the dashboard didn't need to run very long; they just crossed their fingers and hoped it didn't happen during a match.</p>
<p>The dashboard was written in Java, and the source was available (under a 3-clause BSD license) via read-only SVN at <code>http://firstforge.wpi.edu/svn/repos/smart_dashboard/trunk</code> (which is unfortunately no longer online; fortunately, I'd posted some snapshots on the web). So I dove in, hunting for the bug.</p>
<p>The repository was divided into several NetBeans projects (not exhaustively listed):</p>
<ul>
<li><a href="https://gitorious.org/absfrc/sources.git/?p=absfrc:sources.git;a=blob_plain;f=smartdashboard-client-2012-1-any.src.tar.xz;hb=HEAD"><code>client/smartdashboard</code></a>: The main dashboard program, has a plugin architecture.</li>
<li><a href="https://gitorious.org/absfrc/sources.git/?p=absfrc:sources.git;a=blob_plain;f=wpijavacv-208-1-any.src.tar.xz;hb=HEAD"><code>WPIJavaCV</code></a>: A higher-level wrapper around JavaCV, itself a Java Native Interface (JNI) wrapper to talk to OpenCV (C and C++).</li>
<li><a href="https://gitorious.org/absfrc/sources.git/?p=absfrc:sources.git;a=blob_plain;f=smartdashboard-extension-wpicameraextension-210-1-any.src.tar.xz;hb=HEAD"><code>extensions/camera/WPICameraExtension</code></a>: The standard camera feed plugin, processes the video through WPIJavaCV.</li>
</ul>
<p>I figured that the bug must be somewhere in the C or C++ code that was being called by JavaCV, because that's the language where segfaults happen. It was especially a pain to track down the pointers that were causing the issue, because it was hard with native debuggers to see through all of the JVM stuff to the OpenCV code, and the OpenCV stuff is opaque to Java debuggers.</p>
<p>Eventually the issue led me back into the WPICameraExtension, then into WPIJavaCV--there was a native pointer being stored in a Java variable; Java code called the native routine to <code>free()</code> the structure, but then tried to feed it to another routine later. This led to difficulty again--tracking objects with Java debuggers was hard because they don't expect the program to suddenly segfault; it's Java code, and Java doesn't segfault, it throws exceptions!</p>
<p>With the help of <code>println()</code> I was eventually able to see that some code was executing in an order that straight didn't make sense.</p>
<h2 id="the-bug">The bug</h2>
<p>The basic flow of WPIJavaCV is: you have a <code>WPICamera</code>, and you call <code>.getNewImage()</code> on it, which gives you a <code>WPIImage</code>, which you could do all kinds of fancy OpenCV things on, but then ultimately call <code>.getBufferedImage()</code>, which gives you a <code>java.awt.image.BufferedImage</code> that you can pass to Swing to draw on the screen. You do this for every frame. Which is exactly what <code>WPICameraExtension.java</code> did, except that "all kinds of fancy OpenCV things" consisted only of:</p>
<pre><code>public WPIImage processImage(WPIColorImage rawImage) {
return rawImage;
}</code></pre>
<p>The idea was that you would extend the class, overriding that one method, if you wanted to do anything fancy.</p>
<p>One of the neat things about WPIJavaCV was that every OpenCV object class had a <code>finalize()</code> method (inherited from the abstract class <code>WPIDisposable</code>) that freed the underlying C/C++ memory, so you didn't have to worry about memory leaks like in plain JavaCV. To inherit from <code>WPIDisposable</code>, you had to write a <code>disposed()</code> method that actually freed the memory. This was better than writing <code>finalize()</code> directly, because it handled NULL-pointer safety and idempotency, in case you wanted to manually free something early.</p>
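<p>The shape of that pattern can be sketched like this (my reconstruction for illustration; the class names echo the article, but the bodies are not WPI's actual code):</p>
<pre><code>// Reconstruction of the WPIDisposable idea; illustrative, not WPI's code.
abstract class WPIDisposable {
    private boolean freed = false;

    // Subclasses override this with the actual native free() call.
    protected abstract void disposed();

    // Idempotent wrapper: safe to call manually (early) and again
    // later from the garbage collector via finalize().
    public final void dispose() {
        if (!freed) {
            freed = true;
            disposed();
        }
    }

    @Override
    protected void finalize() throws Throwable {
        try {
            dispose();
        } finally {
            super.finalize();
        }
    }
}

// Example subclass standing in for something like WPIImage.
class FakeImage extends WPIDisposable {
    int frees = 0;
    @Override protected void disposed() { frees++; }
}</code></pre>
<p>The idempotency is what makes an early manual <code>dispose()</code> safe: the later <code>finalize()</code> becomes a no-op instead of a double-free.</p>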
<p>Now, <code>edu.wpi.first.WPIImage.disposed()</code> called <code><a href="https://github.com/bytedeco/javacv/blob/svn/src/com/googlecode/javacv/cpp/opencv_core.java#L398">com.googlecode.javacv.cpp.opencv_core.IplImage</a>.release()</code>, which called (via JNI) <code>IplImage::release()</code>, which called libc <code>free()</code>:</p>
<pre><code>@Override
protected void disposed() {
image.release();
}</code></pre>
<p>Elsewhere, the C buffer for the image was copied into a Java buffer via a similar chain kicked off by <code>edu.wpi.first.WPIImage.getBufferedImage()</code>:</p>
<pre><code>/**
* Copies this {@link WPIImage} into a {@link BufferedImage}.
* This method will always generate a new image.
* @return a copy of the image
*/
public BufferedImage getBufferedImage() {
validateDisposed();
return image.getBufferedImage();
}</code></pre>
<p>The <code>println()</code> output I saw that didn't make sense was that <code>someFrame.finalize()</code> was running before <code>someFrame.getBufferedImage()</code> had returned!</p>
<p>You see, if the JVM is waiting for the return value of a method <code>m()</code> of object <code>a</code>, and the code in <code>m()</code> that is yet to be executed doesn't access any other methods or properties of <code>a</code>, then the JVM will go ahead and consider <code>a</code> eligible for garbage collection before <code>m()</code> has finished running.</p>
<p>Put another way, <code>this</code> is passed to a method just like any other argument. If a method is done accessing <code>this</code>, then it's "safe" for the JVM to go ahead and garbage collect it.</p>
<p>That is normally a safe "optimization" to make… except for when a destructor method (<code>finalize()</code>) is defined for the object; the destructor can have side effects, and Java has no way to know whether it is safe for them to happen before <code>m()</code> has finished running.</p>
<p>I'm not entirely sure if this is a "bug" in the compiler or the language specification, but I do believe that it's broken behavior.</p>
<p>Anyway, in this case it's unsafe with WPI's code.</p>
<h2 id="my-work-around">My work-around</h2>
<p>My work-around was to change this function in <code>WPIImage</code>:</p>
<pre><code>public BufferedImage getBufferedImage() {
validateDisposed();
return image.getBufferedImage(); // `this` may get garbage collected before it returns!
}</code></pre>
<p>In the above code, <code>this</code> is a <code>WPIImage</code>, and it may get garbage collected between the time that <code>image.getBufferedImage()</code> is dispatched, and the time that <code>image.getBufferedImage()</code> accesses native memory. When it is garbage collected, it calls <code>image.release()</code>, which <code>free()</code>s that native memory. That seems pretty unlikely to happen; that's a very small gap of time. However, running 30 times a second, eventually bad luck with the garbage collector happens, and the program crashes.</p>
<p>The work-around was to insert a bogus method call to keep <code>this</code> alive until after we were also done with <code>image</code>:</p>
<pre><code>public BufferedImage getBufferedImage() {
validateDisposed();
BufferedImage ret = image.getBufferedImage();
getWidth(); // bogus call to keep `this` around
return ret;
}</code></pre>
<p>Yeah. After spending weeks wading through thousands of lines of Java, C, and C++, a bogus call to a method I didn't care about was the fix.</p>
<p>TheLoneWolfling on Hacker News noted that they'd be worried about the JVM optimizing out the call to <code>getWidth()</code>. I'm not, because <code>WPIImage.getWidth()</code> calls <code>IplImage.width()</code>, which is declared as <code>native</code>; the JVM must run it because it might have side effects. On the other hand, looking back, I think I just shrunk the window for things to go wrong: it may be possible for the garbage collection to trigger in the time between <code>getWidth()</code> being dispatched and <code>width()</code> running. Perhaps there was something in the C/C++ code that made it safe, I don't recall, and don't care quite enough to dig into OpenCV internals again. Or perhaps I'm mis-remembering the fix (which I don't actually have a file of), and I called some other method that <em>could</em> get optimized out (though I <em>do</em> believe that it was either <code>getWidth()</code> or <code>getHeight()</code>).</p>
<h2 id="wpis-fix">WPI's fix</h2>
<p>Four years later, the SmartDashboard is still being used! But it no longer has this bug, and it's not using my workaround. So, how did the WPILib developers fix it?</p>
<p>Well, the code now lives <a href="https://usfirst.collab.net/gerrit/#/admin/projects/">in git at collab.net</a>, so I decided to take a look.</p>
<p>They stripped out WPIJavaCV from the main video-feed widget, and now use a pure-Java implementation of MJPEG streaming.</p>
<p>However, the old video feed widget is still available as an extension (so that you can still do cool things with <code>processImage</code>), and it also no longer has this bug. Their fix was to put a mutex around all accesses to <code>image</code>, which should have been the obvious solution to me.</p>
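<p>A minimal sketch of that locking discipline (my own illustration; <code>GuardedImage</code> and its methods are hypothetical, not WPI's actual code): both the free and the copy take the same lock, so a badly-timed finalizer can no longer pull the buffer out from under a copy in progress.</p>
<pre><code>// Hypothetical sketch of the mutex approach; not WPI's actual code.
class GuardedImage {
    private final Object lock = new Object();
    private long nativePtr;        // simulated native handle
    private boolean disposed = false;

    GuardedImage(long ptr) { nativePtr = ptr; }

    // What the finalizer would call: "free" the native memory.
    void dispose() {
        synchronized (lock) {
            nativePtr = 0;
            disposed = true;
        }
    }

    // Holding the same lock means dispose() can never free the
    // buffer while a copy is in progress.
    long copyPixels() {
        synchronized (lock) {
            if (disposed) throw new IllegalStateException("disposed");
            return nativePtr;      // stand-in for copying the buffer
        }
    }
}</code></pre>
<p>With this scheme, the worst a premature <code>finalize()</code> can do is make the next access throw a Java exception, rather than read freed native memory and segfault.</p>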
<p>The content of this page is Copyright © 2016 <a href="mailto:lukeshu@sbcglobal.net">Luke Shumaker</a>.</p>
<p>This page is licensed under the <a href="https://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA-3.0</a> license.</p>
https://lukeshu.com/blog/nginx-mediawiki.html
2015-05-19T23:53:52-06:00
2015-05-19T00:00:00+00:00
An Nginx configuration for MediaWiki
<h1 id="an-nginx-configuration-for-mediawiki">An Nginx configuration for MediaWiki</h1>
<p>There are <a href="http://wiki.nginx.org/MediaWiki">several</a> <a href="https://wiki.archlinux.org/index.php/MediaWiki#Nginx">example</a> <a href="https://www.mediawiki.org/wiki/Manual:Short_URL/wiki/Page_title_--_nginx_rewrite--root_access">Nginx</a> <a href="https://www.mediawiki.org/wiki/Manual:Short_URL/Page_title_-_nginx,_Root_Access,_PHP_as_a_CGI_module">configurations</a> <a href="http://wiki.nginx.org/RHEL_5.4_%2B_Nginx_%2B_Mediawiki">for</a> <a href="http://stackoverflow.com/questions/11080666/mediawiki-on-nginx">MediaWiki</a> floating around the web. Many of them don't block the user from accessing things like <code>/serialized/</code>. Many of them also <a href="https://labs.parabola.nu/issues/725">don't correctly handle</a> a wiki page named <code>FAQ</code>, since that is a name of a file in the MediaWiki root! In fact, the configuration used on the official Nginx Wiki has both of those issues!</p>
<p>This is because most of the configurations floating around basically try to pass all requests through, and blacklist certain requests, either denying them, or passing them through to <code>index.php</code>.</p>
<p>It's my view that blacklisting is inferior to whitelisting in situations like this. So, I developed the following configuration that instead works by whitelisting certain paths.</p>
<pre><code>root /path/to/your/mediawiki; # obviously, change this line
index index.php;
location / { try_files /var/empty @rewrite; }
location /images/ { try_files $uri $uri/ @rewrite; }
location /skins/ { try_files $uri $uri/ @rewrite; }
location /api.php { try_files /var/empty @php; }
location /api.php5 { try_files /var/empty @php; }
location /img_auth.php { try_files /var/empty @php; }
location /img_auth.php5 { try_files /var/empty @php; }
location /index.php { try_files /var/empty @php; }
location /index.php5 { try_files /var/empty @php; }
location /load.php { try_files /var/empty @php; }
location /load.php5 { try_files /var/empty @php; }
location /opensearch_desc.php { try_files /var/empty @php; }
location /opensearch_desc.php5 { try_files /var/empty @php; }
location /profileinfo.php { try_files /var/empty @php; }
location /thumb.php { try_files /var/empty @php; }
location /thumb.php5 { try_files /var/empty @php; }
location /thumb_handler.php { try_files /var/empty @php; }
location /thumb_handler.php5 { try_files /var/empty @php; }
location /wiki.phtml { try_files /var/empty @php; }
location @rewrite {
rewrite ^/(.*)$ /index.php?title=$1&$args;
}
location @php {
# obviously, change this according to your PHP setup
include fastcgi.conf;
fastcgi_pass unix:/run/php-fpm/wiki.sock;
}</code></pre>
<p>We are now using this configuration on <a href="https://wiki.parabola.nu/">ParabolaWiki</a>, but with an alias for <code>location = /favicon.ico</code> to the correct file in the skin, and with FastCGI caching for PHP.</p>
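<p>For completeness, the favicon alias mentioned above can be sketched like this (the skin path here is hypothetical; point it at wherever your skin actually keeps its icon):</p>
<pre><code>location = /favicon.ico {
    # obviously, change this path to match your skin
    alias /path/to/your/mediawiki/skins/YourSkin/favicon.ico;
}</code></pre>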
<p>The only thing I don't like about this is the <code>try_files /var/empty</code> bits; surely there is a better way to have it go to one of the <code>@</code> location blocks, but I couldn't figure it out.</p>
Luke Shumakerhttps://lukeshu.com/lukeshu@sbcglobal.net
<p>The content of this page is Copyright © 2015 <a href="mailto:lukeshu@sbcglobal.net">Luke Shumaker</a>.</p>
<p>This page is licensed under the <a href="https://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA-3.0</a> license.</p>
https://lukeshu.com/blog/lp2015-videos.html
2015-03-22T07:52:39-04:00
2015-03-22T00:00:00+00:00
I took some videos at LibrePlanet
<h1 id="i-took-some-videos-at-libreplanet">I took some videos at LibrePlanet</h1>
<p>I'm at <a href="https://libreplanet.org/2015/">LibrePlanet</a>, and have been loving the talks. For most of yesterday, there was a series of short "lightning" talks in room 144. I decided to hang out in that room for the later part of the day, because while most of the talks were live streamed and recorded, there were no cameras in room 144; so I couldn't watch them later.</p>
<p>Way too late in the day, I remembered that I have the capability to record videos, so I caught the last two talks in 144.</p>
<p>I apologize for the changing orientation.</p>
<p><a href="https://lukeshu.com/dump/lp-2015-last-2-short-talks.ogg">Here's the video I took</a>.</p>
Luke Shumakerhttps://lukeshu.com/lukeshu@sbcglobal.net
<p>The content of this page is Copyright © 2015 <a href="mailto:lukeshu@sbcglobal.net">Luke Shumaker</a>.</p>
<p>This page is licensed under the <a href="https://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA-3.0</a> license.</p>
https://lukeshu.com/blog/build-bash-1.html
2016-05-02T02:20:41-04:00
2015-03-18T00:00:00+00:00
Building Bash 1.14.7 on a modern system
<h1 id="building-bash-1.14.7-on-a-modern-system">Building Bash 1.14.7 on a modern system</h1>
<p>In a previous revision of my <a href="./bash-arrays.html">Bash arrays post</a>, I wrote:</p>
<blockquote>
<p>Bash 1.x won't compile with modern GCC, so I couldn't verify how it behaves.</p>
</blockquote>
<p>I recall spending a little time fighting with it, but apparently I didn't try very hard: getting Bash 1.14.7 to build on a modern box is mostly just adjusting it to use <code>stdarg</code> instead of the no-longer-implemented <code>varargs</code>. There's also a little fiddling with the pre-autoconf automatic configuration.</p>
<h2 id="stdarg">stdarg</h2>
<p>Converting to <code>stdarg</code> is pretty simple: For each variadic function (functions that take a variable number of arguments), follow these steps:</p>
<ol type="1">
<li>Replace <code>#include <varargs.h></code> with <code>#include <stdarg.h></code></li>
<li>Replace <code>function_name (va_alist) va_dcl</code> with <code>function_name (char *format, ...)</code>.</li>
<li>Remove the declaration and assignment for <code>format</code> from the function body.</li>
<li>Replace <code>va_start (args);</code> with <code>va_start (args, format);</code> in the function bodies.</li>
<li>Replace <code>function_name ();</code> with <code>function_name (char *, ...)</code> in header files and/or at the top of C files.</li>
</ol>
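<p>As a concrete sketch of those steps, here is a made-up variadic function (not taken from the actual Bash source) after conversion:</p>
<pre><code>#include <stdarg.h>  /* step 1: was <varargs.h> */
#include <stdio.h>

/* step 2: the old definition was `report (va_alist) va_dcl`, with
 * `format` declared and assigned in the body (removed per step 3) */
int report (char *format, ...)
{
  va_list args;
  int n;

  va_start (args, format);  /* step 4: name the last fixed argument */
  n = vprintf (format, args);
  va_end (args);
  return n;
}

/* step 5: headers now declare `int report (char *, ...);`
 * instead of the old `int report ();` */</code></pre>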
<p>There's one function that uses the variable name <code>control</code> instead of <code>format</code>.</p>
<p>I've prepared <a href="./bash-1.14.7-gcc4-stdarg.patch">a patch</a> that does this.</p>
<h2 id="configuration">Configuration</h2>
<p>Instead of using autoconf-style tests to detect compiler and platform features, Bash 1 used the file <code>machines.h</code> that had <code>#ifdefs</code> and a huge database of different operating systems for different platforms. It's gross. And quite likely won't handle your modern operating system.</p>
<p>I made these two small changes to <code>machines.h</code> to get it to work correctly on my box:</p>
<ol type="1">
<li>Replace <code>#if defined (i386)</code> with <code>#if defined (i386) || defined (__x86_64__)</code>. The purpose of this is obvious.</li>
<li>Add <code>#define USE_TERMCAP_EMULATION</code> to the section for Linux [sic] on i386 (<code># if !defined (done386) && (defined (__linux__) || defined (linux))</code>). What this does is tell it to link against libcurses to use curses termcap emulation, instead of linking against libtermcap (which doesn't exist on modern GNU/Linux systems).</li>
</ol>
<p>Again, I've prepared <a href="./bash-1.14.7-machines-config.patch">a patch</a> that does this.</p>
<h2 id="building">Building</h2>
<p>With those adjustments, it should build, but with quite a few warnings. Making a couple of changes to <code>CFLAGS</code> should fix that:</p>
<pre><code>make CFLAGS='-O -g -Werror -Wno-int-to-pointer-cast -Wno-pointer-to-int-cast -Wno-deprecated-declarations -include stdio.h -include stdlib.h -include string.h -Dexp2=bash_exp2'</code></pre>
<p>That's a doozy! Let's break it down:</p>
<ul>
<li><code>-O -g</code> The default value for CFLAGS (defined in <code>cpp-Makefile</code>)</li>
<li><code>-Werror</code> Treat warnings as errors; force us to deal with any issues.</li>
<li><code>-Wno-int-to-pointer-cast -Wno-pointer-to-int-cast</code> Allow casting between integers and pointers. Unfortunately, the way this version of Bash was designed requires this.</li>
<li><code>-Wno-deprecated-declarations</code> The <code>getwd</code> function in <code>unistd.h</code> is considered deprecated (use <code>getcwd</code> instead). However, if <code>getcwd</code> is available, Bash uses its own <code>getwd</code> wrapper around <code>getcwd</code> (implemented in <code>general.c</code>), and only uses the signature from <code>unistd.h</code>, not the actual implementation from libc.</li>
<li><code>-include stdio.h -include stdlib.h -include string.h</code> Several files are missing these header file includes. If not for <code>-Werror</code>, the default function signature fallbacks would work.</li>
<li><code>-Dexp2=bash_exp2</code> Avoid a conflict between the parser's <code>exp2</code> helper function and <code>math.h</code>'s base-2 exponential function.</li>
</ul>
<p>Have fun, software archaeologists!</p>
Luke Shumakerhttps://lukeshu.com/lukeshu@sbcglobal.net
<p>The content of this page is Copyright © 2015 <a href="mailto:lukeshu@sbcglobal.net">Luke Shumaker</a>.</p>
<p>This page is licensed under the <a href="https://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA-3.0</a> license.</p>
https://lukeshu.com/blog/purdue-cs-login.html
2016-03-21T02:34:10-04:00
2015-02-06T00:00:00+00:00
Customizing your login on Purdue CS computers (WIP, but updated)
<h1 id="customizing-your-login-on-purdue-cs-computers-wip-but-updated">Customizing your login on Purdue CS computers (WIP, but updated)</h1>
<blockquote>
<p>This article is currently a Work-In-Progress. Other than the one place where I say "I'm not sure", the GDM section is complete. The network shares section is a mess, but has some good information.</p>
</blockquote>
<p>Most CS students at Purdue spend a lot of time on the lab boxes, but don't know a lot about them. This document tries to fix that.</p>
<p>The lab boxes all run Gentoo.</p>
<h2 id="gdm-the-gnome-display-manager">GDM, the Gnome Display Manager</h2>
<p>The boxes run <code>gdm</code> (Gnome Display Manager) 2.20.11 for the login screen. This is an old version, and has a couple of behaviors that are slightly different from newer versions, but here are the important bits:</p>
<p>System configuration:</p>
<ul>
<li><code>/usr/share/gdm/defaults.conf</code> (lower precedence)</li>
<li><code>/etc/X11/gdm/custom.conf</code> (higher precedence)</li>
</ul>
<p>User configuration:</p>
<ul>
<li><code>~/.dmrc</code> (more recent versions use <code>~/.desktop</code>, but Purdue boxes aren't running more recent versions)</li>
</ul>
<h3 id="purdues-gdm-configuration">Purdue's GDM configuration</h3>
<p>Now, <code>custom.conf</code> sets</p>
<pre><code>BaseXsession=/usr/local/share/xsessions/Xsession
SessionDesktopDir=/usr/local/share/xsessions/</code></pre>
<p>This is important, because there are <em>multiple</em> locations that look like these files; I take it that they were used at some time in the past. Don't get tricked into thinking that it looks at <code>/etc/X11/gdm/Xsession</code> (which exists, and is where it would look by default).</p>
<p>If you look at the GDM login screen, it has a "Sessions" button that opens a prompt where you can select any of several sessions:</p>
<ul>
<li>Last session</li>
<li>1. MATE (<code>mate.desktop</code>; <code>Exec=mate-session</code>)</li>
<li>2. CS Default Session (<code>default.desktop</code>; <code>Exec=default</code>)</li>
<li>3. Custom Session (<code>custom.desktop</code>; <code>Exec=custom</code>)</li>
<li>4. FVWM2 (<code>fvwm2.desktop</code>; <code>Exec=fvwm2</code>)</li>
<li>5. gnome.desktop (<code>gnome.desktop</code>; <code>Exec=gnome-session</code>)</li>
<li>6. KDE (<code>kde.desktop</code>, <code>Exec=startkde</code>)</li>
<li>Failsafe MATE (<code>ShowGnomeFailsafeSession=true</code>)</li>
<li>Failsafe Terminal (<code>ShowXtermFailsafeSession=true</code>)</li>
</ul>
<p>The main 6 are configured by the <code>.desktop</code> files in <code>SessionDesktopDir=/usr/local/share/xsessions</code>; the last 2 are auto-generated. The reason <code>ShowGnomeFailsafeSession</code> correctly generates a Mate session instead of a Gnome session is because of the patch <code>/p/portage/*/overlay/gnome-base/gdm/files/gdm-2.20.11-mate.patch</code>.</p>
<p>I'm not sure why Gnome shows up as <code>gnome.desktop</code> instead of <code>GNOME</code> as specified by <code>gnome.desktop:Name</code>. I imagine it might be something related to the aforementioned patch, but I can't find anything in the patch that looks like it would screw that up; at least not without a better understanding of GDM's code.</p>
<p>Which of the main 6 is used by default ("Last Session") is configured with <code>~/.dmrc:Session</code>, which contains the basename of the associated <code>.desktop</code> file (that is, without any directory part or file extension).</p>
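<p>For example, a hypothetical <code>~/.dmrc</code> defaulting to the MATE session (<code>mate.desktop</code>) would look something like this, assuming the usual INI layout that GDM reads:</p>
<pre><code>[Desktop]
# the basename of /usr/local/share/xsessions/mate.desktop,
# with no directory part or file extension
Session=mate</code></pre>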
<p>Every one of the <code>.desktop</code> files sets <code>Type=XSession</code>, which means that instead of running the argument in <code>Exec=</code> directly, it passes it as arguments to the <code>Xsession</code> program (in the location configured by <code>BaseXsession</code>).</p>
<h4 id="xsession">Xsession</h4>
<p>So, now we get to read <code>/usr/local/share/xsessions/Xsession</code>.</p>
<p>Before it does anything else, it:</p>
<ol type="1">
<li><code>. /etc/profile.env</code></li>
<li><code>unset ROOTPATH</code></li>
<li>Try to set up logging to one of <code>~/.xsession-errors</code>, <code>$TMPDIR/xses-$USER</code>, or <code>/tmp/xses-$USER</code> (it tries them in that order).</li>
<li><code>xsetroot -default</code></li>
<li>Fiddles with the maximum number of processes.</li>
</ol>
<p>After that, it handles these 3 "special" arguments that were given to it by various <code>.desktop</code> <code>Exec=</code> lines:</p>
<ul>
<li><code>failsafe</code>: Runs a single xterm window. NB: This is NOT run by either of the failsafe options. It is likely a vestige from a prior configuration.</li>
<li><code>startkde</code>: Displays a message saying KDE is no longer available.</li>
<li><code>gnome-session</code>: Displays a message saying GNOME has been replaced by MATE.</li>
</ul>
<p>Assuming that none of those were triggered, it then does:</p>
<ol type="1">
<li><code>source ~/.xprofile</code></li>
<li><code>xrdb -merge ~/.Xresources</code></li>
<li><code>xmodmap ~/.xmodmaprc</code></li>
</ol>
<p>Finally, it has a switch statement over the arguments given to it by the various <code>.desktop</code> <code>Exec=</code> lines:</p>
<ul>
<li><code>custom</code>: Executes <code>~/.xsession</code>.</li>
<li><code>default</code>: Executes <code>~/.Xrc.cs</code>.</li>
<li><code>mate-session</code>: It has this whole script to start DBus, run the <code>mate-session</code> command, then clean up when it's done.</li>
<li><code>*</code> (<code>fvwm2</code>): Runs <code>eval exec "$@"</code>, which results in it executing the <code>fvwm2</code> command.</li>
</ul>
<h2 id="network-shares">Network Shares</h2>
<p>Your data is on various hosts. I believe most undergrads have their data on <code>data.cs.purdue.edu</code> (or just <a href="https://en.wikipedia.org/wiki/Data_%28Star_Trek%29"><code>data</code></a>). Others have theirs on <a href="http://swfanon.wikia.com/wiki/Antor"><code>antor</code></a> or <a href="https://en.wikipedia.org/wiki/Tux"><code>tux</code></a> (that I know of).</p>
<p>Most of the boxes with tons of storage have many network cards, each with a different IP; a single host's IPs are mostly the same, but with varying 3rd octets. For example, <code>data</code> is 128.10.X.13. If you need a particular value of X but don't want to remember the other octets, the interfaces are individually addressed as <code>BASENAME-NUMBER.cs.purdue.edu</code>. For example, <code>data-25.cs.purdue.edu</code> is 128.10.25.13.</p>
<p>They use <a href="https://www.kernel.org/pub/linux/daemons/autofs/">AutoFS</a> quite extensively. The maps are generated dynamically by <code>/etc/autofs/*.map</code>, which are all symlinks to <code>/usr/libexec/amd2autofs</code>. As far as I can tell, <code>amd2autofs</code> is custom to Purdue. Its source lives in <code>/p/portage/*/overlay/net-fs/autofs/files/amd2autofs.c</code>. The name appears to be a misnomer; it suggests that it dynamically translates from the configuration of the <a href="http://www.am-utils.org/">Auto Mounter Daemon (AMD)</a> to AutoFS, but it actually talks to NIS. It does so using the <code>yp</code> interface, which is in Glibc for compatibility, but is undocumented. For documentation of that interface, look at one of the BSDs, or Mac OS X. From the comments in the file, it appears that it once did look at the AMD configuration, but has since been changed.</p>
<p>There are 3 mountpoints using AutoFS: <code>/homes</code>, <code>/p</code>, and <code>/u</code>. <code>/homes</code> creates symlinks on-demand from <code>/homes/USERNAME</code> to <code>/u/BUCKET/USERNAME</code>. <code>/u</code> mounts NFS shares to <code>/u/SERVERNAME</code> on-demand, and creates symlinks from <code>/u/BUCKET</code> to <code>/u/SERVERNAME/BUCKET</code> on-demand. <code>/p</code> mounts on-demand various NFS shares that are organized by topic; for example, the Xinu/MIPS tools are in <code>/p/xinu</code>, and the Portage tree is in <code>/p/portage</code>.</p>
<p>I'm not sure how <code>scratch</code> works; it seems to be heterogeneous between different servers and families of lab boxes. Sometimes it's in <code>/u</code>, sometimes it isn't.</p>
<p>This 3rd-party documentation was very helpful to me: <a href="http://www.linux-consulting.com/Amd_AutoFS/" class="uri">http://www.linux-consulting.com/Amd_AutoFS/</a> It's where Gentoo points for the AutoFS homepage, as it doesn't have a real homepage. Arch just points to FreshMeat. Debian points to kernel.org.</p>
<h3 id="lore">Lore</h3>
<p><a href="https://en.wikipedia.org/wiki/List_of_Star_Trek:_The_Next_Generation_characters#Lore"><code>lore</code></a></p>
<p>Lore is a SunOS 5.10 box running on Sun-Fire V445 (sun4u) hardware. SunOS is NOT GNU/Linux, and sun4u is NOT x86.</p>
<p>Instead of <code>/etc/fstab</code>, it has <code>/etc/mnttab</code>.</p>
Luke Shumakerhttps://lukeshu.com/lukeshu@sbcglobal.net
<p>The content of this page is Copyright © 2015 <a href="mailto:lukeshu@sbcglobal.net">Luke Shumaker</a>.</p>
<p>This page is licensed under the <a href="https://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA-3.0</a> license.</p>
https://lukeshu.com/blog/make-memoize.html
2016-03-21T02:34:10-04:00
2014-11-20T00:00:00+00:00
A memoization routine for GNU Make functions
<h1 id="a-memoization-routine-for-gnu-make-functions">A memoization routine for GNU Make functions</h1>
<p>I'm a big fan of <a href="https://www.gnu.org/software/make/">GNU Make</a>. I'm pretty knowledgeable about it, and was pretty active on the help-make mailing list for a while. Something that many experienced make-ers know of is John Graham-Cumming's "GNU Make Standard Library", or <a href="http://gmsl.sourceforge.net/">GMSL</a>.</p>
<p>I don't like to use it, as I'm capable of defining macros myself as I need them instead of pulling in a 3rd party dependency (and generally like to stay away from the kind of Makefile that would lean heavily on something like GMSL).</p>
<p>However, one really neat thing that GMSL offers is a way to memoize expensive functions (such as those that shell out). I was considering pulling in GMSL for one of my projects, almost just for the <code>memoize</code> function.</p>
<p>However, John's <code>memoize</code> has a couple of shortcomings that made it unsuitable for my needs.</p>
<ul>
<li>Only allows functions that take one argument.</li>
<li>Considers empty values to be unset; for my needs, an empty string is a valid value that should be cached.</li>
</ul>
<p>So, I implemented my own, more flexible memoization routine for Make.</p>
<pre><code># This definition of `rest` is equivalent to that in GMSL
rest = $(wordlist 2,$(words $1),$1)
# How to use: Define 2 variables (of the kind you would pass to $(call)):
# `_<var>NAME</var>_main` and `_<var>NAME</var>_hash`. Now, `_<var>NAME</var>_main` is the function getting
# memoized, and _<var>NAME</var>_hash is a function that hashes the function arguments
# into a string suitable for a variable name.
#
# Then, define the final function like:
#
# <var>NAME</var> = $(foreach func,<var>NAME</var>,$(memoized))
_main = $(_$(func)_main)
_hash = __memoized_$(_$(func)_hash)
memoized = $(if $($(_hash)),,$(eval $(_hash) := _ $(_main)))$(call rest,$($(_hash)))</code></pre>
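<p>Here's a hypothetical usage sketch (the <code>upcase</code> function and file names are made up); the side-effect file exists only to show that the expensive <code>$(shell)</code> call really runs once:</p>
<pre><code># Write out a demo Makefile that memoizes a function which shells out.
cat > /tmp/memo-demo.mk <<'MK'
rest = $(wordlist 2,$(words $1),$1)
_main = $(_$(func)_main)
_hash = __memoized_$(_$(func)_hash)
memoized = $(if $($(_hash)),,$(eval $(_hash) := _ $(_main)))$(call rest,$($(_hash)))

# the "expensive" function: upcases $1, logging each real invocation
_upcase_main = $(shell echo x >> /tmp/memo-count; echo $1 | tr a-z A-Z)
_upcase_hash = $1
upcase = $(foreach func,upcase,$(memoized))

all: ; @echo $(call upcase,hello) $(call upcase,hello)
MK
rm -f /tmp/memo-count
make -s -f /tmp/memo-demo.mk  # prints: HELLO HELLO
wc -l < /tmp/memo-count       # prints: 1 (the shell-out happened once)</code></pre>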
<p>However, I later removed it from the Makefile, as I <a href="https://projects.parabola.nu/~lukeshu/maven-dist.git/commit/?id=fec5a7281b3824cb952aa0bb76bbbaa41eaafdf9">re-implemented</a> the bits that it memoized in a more efficient way, such that memoization was no longer needed, and the whole thing was faster.</p>
<p>Later, I realized that my memoized routine could have been improved by replacing <code>func</code> with <code>$0</code>, which would simplify how the final function is declared:</p>
<pre><code># This definition of `rest` is equivalent to that in GMSL
rest = $(wordlist 2,$(words $1),$1)
# How to use:
#
# _<var>NAME</var>_main = <var>your main function to be memoized</var>
# _<var>NAME</var>_hash = <var>your hash function for parameters</var>
# <var>NAME</var> = $(memoized)
#
# The output of your hash function should be a string following
# the same rules that variable names follow.
_main = $(_$0_main)
_hash = __memoized_$(_$0_hash)
memoized = $(if $($(_hash)),,$(eval $(_hash) := _ $(_main)))$(call rest,$($(_hash)))</code></pre>
<p>Now, I'm pretty sure that should work, but I have only actually tested the first version.</p>
<h2 id="tldr">TL;DR</h2>
<p>Avoid doing things in Make that would make you lean on complex solutions like an external memoize function.</p>
<p>However, if you do end up needing a more flexible memoize routine, I wrote one that you can use.</p>
Luke Shumakerhttps://lukeshu.com/lukeshu@sbcglobal.net
<p>The content of this page is Copyright © 2014 <a href="mailto:lukeshu@sbcglobal.net">Luke Shumaker</a>.</p>
<p>This page is licensed under the <a href="http://www.wtfpl.net/txt/copying/">WTFPL-2</a> license.</p>
https://lukeshu.com/blog/ryf-routers.html
2014-09-12T00:21:05-04:00
2014-09-12T00:00:00+00:00
I'm excited about the new RYF-certified routers from ThinkPenguin
<h1 id="im-excited-about-the-new-ryf-certified-routers-from-thinkpenguin">I'm excited about the new RYF-certified routers from ThinkPenguin</h1>
<p>I just learned that on Wednesday, the FSF <a href="https://www.fsf.org/resources/hw/endorsement/thinkpenguin">awarded</a> the <abbr title="Respects Your Freedom">RYF</abbr> certification to the <a href="https://www.thinkpenguin.com/TPE-NWIFIROUTER">Think Penguin TPE-NWIFIROUTER</a> wireless router.</p>
<p>This information isn't directly published up front, but it is simply a re-branded <strong>TP-Link TL-841ND</strong> modded to run <a href="http://librecmc.com/">libreCMC</a>.</p>
<p>I've been a fan of the TL-841/740 line of routers for several years now. They are dirt cheap (if you go to Newegg and sort by "cheapest," it's frequently the TL-740N), are extremely reliable, and run OpenWRT like a champ. They are my go-to routers.</p>
<p>(And they sure beat the snot out of the Arris TG862 that it seems like everyone has in their homes now. I hate that thing, it even has buggy packet scheduling.)</p>
<p>So this announcement is <del>doubly</del>triply exciting for me:</p>
<ul>
<li>I have a solid recommendation for a router that doesn't require me or them to manually install an after-market firmware (buy it from ThinkPenguin).</li>
<li>If it's for me, or someone technical, I can cut costs by getting a stock TP-Link from Newegg and installing libreCMC ourselves.</li>
<li>I can install a 100% libre distribution on my existing routers (until recently, they were not supported by any of the libre distributions, not for technical reasons, but lack of manpower).</li>
</ul>
<p>I hope to get libreCMC installed on my boxes this weekend!</p>
Luke Shumakerhttps://lukeshu.com/lukeshu@sbcglobal.net
<p>The content of this page is Copyright © 2014 <a href="mailto:lukeshu@sbcglobal.net">Luke Shumaker</a>.</p>
<p>This page is licensed under the <a href="https://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA-3.0</a> license.</p>
https://lukeshu.com/blog/what-im-working-on-fall-2014.html
2016-02-28T07:12:18-05:00
2014-09-11T00:00:00+00:00
What I'm working on (Fall 2014)
<h1 id="what-im-working-on-fall-2014">What I'm working on (Fall 2014)</h1>
<p>I realized today that I haven't updated my log in a while, and I don't have any "finished" stuff to show off right now, but I should just talk about all the cool stuff I'm working on right now.</p>
<h2 id="static-parsing-of-subshells">Static parsing of subshells</h2>
<p>Last year I wrote a shell (for my Systems Programming class); however, I went above-and-beyond and added some really novel features. In my opinion, the most significant is that it parses arbitrarily deep subshells in one pass, instead of deferring them until execution. No shell that I know of does this.</p>
<p>At first, this sounds like a really difficult but minor feature, until you think about scripting, and the maintenance of those scripts. Being able to do a full syntax check of a script is <em>crucial</em> for long-term maintenance, yet it's something that is missing from every major shell. I'd love to get this code merged into bash. It would be incredibly useful for <a href="/git/mirror/parabola/packages/libretools.git">some software I maintain</a>.</p>
<p>Anyway, I'm trying to publish this code, but because of a recent kerfuffle with a student publishing all of his projects on the web (and other students trying to pass it off as their own), I'm being cautious with this and making sure Purdue is alright with what I'm putting online.</p>
<h2 id="stateless-user-configuration-for-pamnss"><a href="https://lukeshu.com/git/mirror/parabola/hackers.git/log/?h=lukeshu/restructure">Stateless user configuration for PAM/NSS</a></h2>
<p>Parabola GNU/Linux-libre users know that over this summer, we had a <em>mess</em> with server outages. One of the servers is still out (due to things out of our control), and we don't have some of the data on it (because volunteer developers are terrible about back-ups, apparently).</p>
<p>This has caused us to look at how we manage our servers, back-ups, and several other things.</p>
<p>One thing that I've taken on as my pet project is making sure that if a server goes down, or we need to migrate (for example, Jon is telling us that he wants us to hurry up and switch to the new 64 bit hardware so he can turn off the 32 bit box), we can spin up a new server from scratch pretty easily. Part of that is making configurations stateless, and dynamic based on external data; having data be located in one place instead of duplicated across 12 config files and 3 databases... on the same box.</p>
<p>Right now, that's looking like some custom software interfacing with OpenLDAP and OpenSSH via sockets (OpenLDAP being a middle-man between us and PAM (Linux) and NSS (libc)). However, the OpenLDAP documentation is... inconsistent and frustrating. I might end up hacking up the LDAP modules for NSS and PAM to talk to our system directly, and cut OpenLDAP out of the picture. We'll see!</p>
<p>PS: Pablo says that tomorrow we should be getting out-of-band access to the drive of the server that is down, so that we can finally restore those services on a different server.</p>
<h2 id="project-leaguer"><a href="https://lukeshu.com/git/mirror/leaguer.git/">Project Leaguer</a></h2>
<p>Last year, some friends and I began writing some "eSports tournament management software", primarily targeting League of Legends (though it has a module system that will allow it to support tons of different data sources). We mostly got it done last semester, but it had some rough spots and sharp edges we need to work out. Because we were all out of communication for the summer, we didn't work on it very much (but we did a little!). It's weird that I care about this, because I'm not a gamer. Huh, I guess coding with friends is just fun.</p>
<p>Anyway, this year, <a href="https://github.com/AndrewMurrell">Andrew</a>, <a href="https://github.com/DavisLWebb">Davis</a>, and I are planning to get it to a polished state by the end of the semester. We could probably do it faster, but we'd all also like to focus on classes and other projects a little more.</p>
<h2 id="c1">C+=1</h2>
<p>People tend to lump C and C++ together, which upsets me, because I love C, but have a dislike for C++. That's not to say that C++ is entirely bad; it has some good features. My "favorite" code is actually code that is basically C, but takes advantage of a couple C++ features, while still being idiomatic C, not C++.</p>
<p>Anyway, with the perspective of history (what worked and what didn't), and a slightly opinionated view on language design (I'm pretty much a Rob Pike fan-boy), I thought I'd try to tackle "object-oriented C" with roughly the same design criteria as Stroustrup had when designing C++. I'm calling mine C+=1, for obvious reasons.</p>
<p>I haven't published anything yet, because calling it "working" would be stretching the truth. But I am using it for my assignments in CS 334 (Intro to Graphics), so it should move along fairly quickly, as my grade depends on it.</p>
<p>I'm not taking it too seriously; I don't expect it to be much more than a toy language, but it is an excuse to dive into the GCC internals.</p>
<h2 id="projects-that-ive-put-on-the-back-burner">Projects that I've put on the back-burner</h2>
<p>I've got several other projects that I'm putting on hold for a while.</p>
<ul>
<li><code>maven-dist</code> (was hosted with Parabola, apparently I haven't pushed it anywhere except the server that is down): A tool to build Apache Maven from source. That sounds easy; it's open source, right? Well, except that Maven is the build system from hell. It doesn't support cyclic dependencies, yet uses them internally to build itself. It <em>loves</em> to just get binaries from Maven Central to "optimize" the build process. It depends on code that depends on compiler bugs that no longer exist (which I guess means that <em>no one</em> has tried to build it from source after it was originally published). I've been working on-and-off on this for more than a year. My favorite part of it was writing a <a href="/dump/jflex2jlex.sed.txt">sed script</a> that translates a JFlex grammar specification into a JLex grammar, which is used to bootstrap JFlex; it's both gross and delightful at the same time.</li>
<li>Integration between <code>dbscripts</code> and <code>abslibre</code>. If you search IRC logs, mailing lists, and ParabolaWiki, you can find numerous rants by me against <a href="/git/mirror/parabola/dbscripts.git/tree/db-sync"><code>dbscripts:db-sync</code></a>. I just hate the data-flow, it is almost designed to make things get out of sync, and broken. I mean, does <a href="/dump/parabola-data-flow.svg">this</a> look like a simple diagram? For contrast, <a href="/dump/parabola-data-flow-xbs.svg">here's</a> a rough (slightly incomplete) diagram of what I want to replace it with.</li>
<li>Git backend for MediaWiki (or, pulling out the rendering module of MediaWiki). I've made decent progress on that front, but there is <em>crazy</em> de-normalization going on in the MediaWiki schema that makes this very difficult. I'm sure some of it is for historical reasons, and some of it for performance, but either way it is a mess for someone trying to neatly gut that part of the codebase.</li>
</ul>
<h2 id="other">Other</h2>
<p>I should consider doing a write-up of deterministic-<code>tar</code> behavior (something that I've been implementing in Parabola for a while; meanwhile, the Debian people have also been working on it).</p>
<p>I should also consider doing a "post-mortem" of <a href="https://lukeshu.com/git/mirror/parabola/packages/pbs-tools.git/">PBS</a>, which never actually got used, but launched XBS (part of the <code>dbscripts</code>/<code>abslibre</code> integration mentioned above), as well as serving as a good test-bed for features that did get implemented.</p>
<p>I over-use the word "anyway."</p>
Luke Shumakerhttps://lukeshu.com/lukeshu@sbcglobal.net
<p>The content of this page is Copyright © 2014 <a href="mailto:lukeshu@sbcglobal.net">Luke Shumaker</a>.</p>
<p>This page is licensed under the <a href="https://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA-3.0</a> license.</p>
https://lukeshu.com/blog/rails-improvements.html
2016-02-28T07:12:18-05:00
2014-05-08T00:00:00+00:00
Miscellaneous ways to improve your Rails experience
<h1 id="miscellaneous-ways-to-improve-your-rails-experience">Miscellaneous ways to improve your Rails experience</h1>
<p>Recently, I've been working on <a href="https://github.com/LukeShu/leaguer">a Rails web application</a>, that's really the baby of a friend of mine. Anyway, through its development, I've come up with a couple things that should make your interactions with Rails more pleasant.</p>
<h2 id="auto-reload-classes-from-other-directories-than-app">Auto-(re)load classes from other directories than <code>app/</code></h2>
<p>The development server automatically loads and reloads files from the <code>app/</code> directory, which is extremely nice. However, most web applications are going to involve modules that aren't in that directory; and editing those files requires re-starting the server for the changes to take effect.</p>
<p>Adding the following lines to your <a href="https://github.com/LukeShu/leaguer/blob/c846cd71411ec3373a5229cacafe0df6b3673543/config/application.rb#L15"><code>config/application.rb</code></a> will allow it to automatically load and reload files from the <code>lib/</code> directory. You can of course change this to whichever directory/ies you like.</p>
<pre><code>module YourApp
class Application < Rails::Application
…
config.autoload_paths += ["#{Rails.root}/lib"]
config.watchable_dirs["#{Rails.root}/lib"] = [:rb]
…
end
end</code></pre>
<h2 id="have-submit_tag-generate-a-button-instead-of-an-input">Have <code>submit_tag</code> generate a button instead of an input</h2>
<p>In HTML, the <code><input type="submit"></code> tag styles slightly differently than other inputs or buttons. It is impossible to precisely control the height via CSS, which makes designing forms a pain. This is particularly noticeable if you use Bootstrap 3, and put it next to another button; the submit button will be slightly shorter vertically.</p>
<p>The obvious fix here is to use <code><button type="submit"></code> instead. The following code will modify the default Rails form helpers to generate a button tag instead of an input tag. Just stick the code in <a href="https://github.com/LukeShu/leaguer/blob/521eae01be1ca3f69b47b3170a0548c3268f4a22/config/initializers/form_improvements.rb"><code>config/initializers/form_improvements.rb</code></a>; it will override <code>ActionView::Helpers::FormTagHelper#submit_tag</code>. It is mostly the standard definition of the function, except for the last line, which has changed.</p>
<pre><code># -*- ruby-indent-level: 2; indent-tabs-mode: nil -*-
module ActionView
module Helpers
module FormTagHelper
# This is modified from actionpack-4.0.2/lib/action_view/helpers/form_tag_helper.rb#submit_tag
def submit_tag(value = "Save changes", options = {})
options = options.stringify_keys
if disable_with = options.delete("disable_with")
message = ":disable_with option is deprecated and will be removed from Rails 4.1. " \
"Use 'data: { disable_with: \'Text\' }' instead."
ActiveSupport::Deprecation.warn message
options["data-disable-with"] = disable_with
end
if confirm = options.delete("confirm")
message = ":confirm option is deprecated and will be removed from Rails 4.1. " \
"Use 'data: { confirm: \'Text\' }' instead'."
ActiveSupport::Deprecation.warn message
options["data-confirm"] = confirm
end
content_tag(:button, value, { "type" => "submit", "name" => "commit", "value" => value }.update(options))
end
end
end
end</code></pre>
<p>I'll probably update this page as I tweak other things I don't like.</p>
<p>The content of this page is Copyright © 2014 <a href="mailto:lukeshu@sbcglobal.net">Luke Shumaker</a>.</p>
<p>This page is licensed under the <a href="https://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA-3.0</a> license.</p>
https://lukeshu.com/blog/bash-redirection.html
2014-05-08T14:36:47-04:00
2014-02-13T00:00:00+00:00
Bash redirection
<h1 id="bash-redirection">Bash redirection</h1>
<p>Apparently, too many people don't understand Bash redirection. They might get the basic syntax, but they think of the process as declarative; in Bourne-ish shells, it is procedural.</p>
<p>In Bash, streams are handled in terms of "file descriptors" or "FDs". FD 0 is stdin, FD 1 is stdout, and FD 2 is stderr. The equivalence (or lack thereof) between using a numeric file descriptor, and using the associated file in <code>/dev/*</code> and <code>/proc/*</code> is interesting, but beyond the scope of this article.</p>
<h2 id="step-1-pipes">Step 1: Pipes</h2>
<p>To quote the Bash manual:</p>
<pre><code>A 'pipeline' is a sequence of simple commands separated by one of the
control operators '|' or '|&'.
The format for a pipeline is
[time [-p]] [!] COMMAND1 [ [| or |&] COMMAND2 ...]</code></pre>
<p>Now, <code>|&</code> is just shorthand for <code>2>&1 |</code>; the pipe part happens here, but the <code>2>&1</code> part doesn't happen until step 2.</p>
<p>First, if the command is part of a pipeline, the pipes are set up. For every instance of the <code>|</code> metacharacter, Bash creates a pipe (<code>pipe(2)</code>), duplicates (<code>dup2(2)</code>) the write end of the pipe to FD 1 of the process on the left side of the <code>|</code>, and duplicates the read end of the pipe to FD 0 of the process on the right side.</p>
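<p>To see that wiring concretely, here is a tiny sketch of my own (the function name is arbitrary):</p>

```shell
#!/bin/bash
# Only FD 1 (stdout) is sent through a plain `|`; FD 2 bypasses the pipe.
f() { echo out; echo err >&2; }
result=$(f 2>/dev/null | tr 'a-z' 'A-Z')
echo "$result"   # OUT -- only stdout made it into the pipe
```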
<h2 id="step-2-redirections">Step 2: Redirections</h2>
<p><em>After</em> the initial FD 0 and FD 1 fiddling by pipes is done, Bash looks at the redirections. <strong>This means that redirections can override pipes.</strong></p>
<p>Redirections are read left-to-right, and are executed as they are read, using <code>dup2(right-side, left-side)</code>. This is where most of the confusion comes from: people think of them as declarative, which leads to them doing the first of these, when they mean to do the second:</p>
<pre><code>cmd 2>&1 >file # stdout goes to file, stderr goes to stdout
cmd >file 2>&1 # both stdout and stderr go to file</code></pre>
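<p>A runnable sketch of that difference, using throwaway temp files (my example, not from the manual):</p>

```shell
#!/bin/bash
f() { echo out; echo err >&2; }
file1=$(mktemp); file2=$(mktemp)

# 2>&1 first: FD 2 is copied from the *current* FD 1 (still the
# terminal), and only afterwards is FD 1 pointed at the file.
f 2>&1 >"$file1"   # "err" still goes to the old stdout

# >file first: FD 1 points at the file, then FD 2 is copied from it.
f >"$file2" 2>&1   # both "out" and "err" land in the file
```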
<p>The content of this page is Copyright © 2014 <a href="mailto:lukeshu@sbcglobal.net">Luke Shumaker</a>.</p>
<p>This page is licensed under the <a href="https://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA-3.0</a> license.</p>
https://lukeshu.com/blog/java-segfault.html
2016-02-28T07:12:18-05:00
2014-01-13T00:00:00+00:00
My favorite bug: segfaults in Java
<h1 id="my-favorite-bug-segfaults-in-java">My favorite bug: segfaults in Java</h1>
<blockquote>
<p>Update: Two years later, I wrote a more detailed version of this article: <a href="./java-segfault-redux.html">My favorite bug: segfaults in Java (redux)</a>.</p>
</blockquote>
<p>I've told this story orally a number of times, but realized that I have never written it down. This is my favorite bug story; it might not be my hardest bug, but it is the one I most like to tell.</p>
<h2 id="the-context">The context</h2>
<p>In 2012, I was a Senior programmer on the FIRST Robotics Competition team 1024. For the unfamiliar, the relevant part of the setup is that there are 2 minute and 15 second matches in which you have a 120 pound robot that sometimes runs autonomously, and sometimes is controlled over WiFi from a person at a laptop running stock "driver station" software and modifiable "dashboard" software.</p>
<p>That year, we mostly used the dashboard software to allow the human driver and operator to monitor sensors on the robot, one of them being a video feed from a web-cam mounted on it. This was really easy because the new standard dashboard program had a click-and-drag interface to add stock widgets; you just had to make sure the code on the robot was actually sending the data.</p>
<p>That's great, until when debugging things, the dashboard would suddenly vanish. If it was run manually from a terminal (instead of letting the driver station software launch it), you would see a core dump indicating a segmentation fault.</p>
<p>This wasn't just us either; I spoke with people on other teams, everyone who was streaming video had this issue. But, because it only happened every couple of minutes, and a match is only 2:15, it didn't need to run very long, they just crossed their fingers and hoped it didn't happen during a match.</p>
<p>The dashboard was written in Java, and the source was available (under a 3-clause BSD license), so I dove in, hunting for the bug. Now, the program did use Java Native Interface to talk to OpenCV, which the video ran through; so I figured that it must be a bug in the C/C++ code that was being called. It was especially a pain to track down the pointers that were causing the issue, because it was hard with native debuggers to see through all of the JVM stuff to the OpenCV code, and the OpenCV stuff is opaque to Java debuggers.</p>
<p>Eventually the issue led me back into the Java code--there was a native pointer being stored in a Java variable; Java code called the native routine to <code>free()</code> the structure, but then tried to feed it to another routine later. This led to difficulty again--tracking objects with Java debuggers was hard because they don't expect the program to suddenly segfault; it's Java code, Java doesn't segfault, it throws exceptions!</p>
<p>With the help of <code>println()</code> I was eventually able to see that some code was executing in an order that simply didn't make sense.</p>
<h2 id="the-bug">The bug</h2>
<p>The issue was that Java was making an unsafe optimization (I never bothered to figure out whether it was the compiler or the JVM making the mistake; I was satisfied once I had a work-around).</p>
<p>Java was doing something similar to tail-call optimization with regard to garbage collection. You see, if it is waiting for the return value of a method <code>m()</code> of object <code>o</code>, and code in <code>m()</code> that is yet to be executed doesn't access any other methods or properties of <code>o</code>, then it will go ahead and consider <code>o</code> eligible for garbage collection before <code>m()</code> has finished running.</p>
<p>That is normally a safe optimization to make… except for when a destructor method (<code>finalize()</code>) is defined for the object; the destructor can have side effects, and Java has no way to know whether it is safe for them to happen before <code>m()</code> has finished running.</p>
<h2 id="the-work-around">The work-around</h2>
<p>The routine that the segmentation fault was occurring in was something like:</p>
<pre><code>public type1 getFrame() {
type2 child = this.getChild();
type3 var = this.something();
// `this` may now be garbage collected
return child.somethingElse(var); // segfault comes here
}</code></pre>
<p>Where the destructor method of <code>this</code> calls a method that will <code>free()</code> native memory that is also accessed by <code>child</code>; if <code>this</code> is garbage collected before <code>child.somethingElse()</code> runs, the backing native code will try to access memory that has been <code>free()</code>ed, and receive a segmentation fault. That usually didn't happen, as the routines were pretty fast. However, running 30 times a second, eventually bad luck with the garbage collector happens, and the program crashes.</p>
<p>The work-around was to insert a bogus method call on <code>this</code> to keep <code>this</code> around until after we were also done with <code>child</code>:</p>
<pre><code>public type1 getFrame() {
type2 child = this.getChild();
type3 var = this.something();
type1 ret = child.somethingElse(var);
this.getSize(); // bogus call to keep `this` around
return ret;
}</code></pre>
<p>Yeah. After spending weeks wading through thousands of lines of Java, C, and C++, a bogus call to a method I didn't care about was the fix.</p>
<p>The content of this page is Copyright © 2014 <a href="mailto:lukeshu@sbcglobal.net">Luke Shumaker</a>.</p>
<p>This page is licensed under the <a href="https://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA-3.0</a> license.</p>
https://lukeshu.com/blog/bash-arrays.html
2016-02-28T07:12:18-05:00
2013-10-13T00:00:00+00:00
Bash arrays
<h1 id="bash-arrays">Bash arrays</h1>
<p>Way too many people don't understand Bash arrays. Many of them argue that if you need arrays, you shouldn't be using Bash. If we reject the notion that one should never use Bash for scripting, then thinking you don't need Bash arrays is what I like to call "wrong". I don't even mean real scripting; even these little stubs in <code>/usr/bin</code>:</p>
<pre><code>#!/bin/sh
java -jar /…/something.jar $* # WRONG!</code></pre>
<p>Command line arguments are exposed as an array; that little <code>$*</code> is accessing it, and is doing the wrong thing (for the lazy, the correct thing is <code>-- "$@"</code>). Arrays in Bash offer a safe way to preserve field separation.</p>
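<p>To make the breakage concrete, here's a little demo of my own (the function names are mine):</p>

```shell
#!/bin/bash
show() { printf ' -> %s\n' "$@"; }

broken()  { show $*; }    # WRONG: arguments are re-split on $IFS
correct() { show "$@"; }  # preserves the caller's word boundaries

broken  "one two" three   # prints three words: one / two / three
correct "one two" three   # prints two words: "one two" / three
```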
<p>One of the main sources of bugs (and security holes) in shell scripts is field separation. That's what arrays are about.</p>
<h2 id="what-field-separation">What? Field separation?</h2>
<p>Field separation is just splitting a larger unit into a list of "fields". The most common case is when Bash splits a "simple command" (in the Bash manual's terminology) into a list of arguments. Understanding how this works is an important prerequisite to understanding arrays, and even why they are important.</p>
<p>Dealing with lists is something that is very common in Bash scripts; from dealing with lists of arguments, to lists of files; they pop up a lot, and each time, you need to think about how the list is separated. In the case of <code>$PATH</code>, the list is separated by colons. In the case of <code>$CFLAGS</code>, the list is separated by whitespace. In the case of actual arrays, it's easy, there's no special character to worry about, just quote it, and you're good to go.</p>
<h2 id="bash-word-splitting">Bash word splitting</h2>
<p>When Bash reads a "simple command", it splits the whole thing into a list of "words". "The first word specifies the command to be executed, and is passed as argument zero. The remaining words are passed as arguments to the invoked command." (to quote <code>bash(1)</code>)</p>
<p>It is often hard for those unfamiliar with Bash to understand when something is multiple words, and when it is a single word that just contains a space or newline. To help gain an intuitive understanding, I recommend using the following command to print a bullet list of words, to see how Bash splits them up:</p>
<pre><code>printf ' -> %s\n' <var>words…</var><hr> -> word one
-> multiline
word
-> third word
</code></pre>
<p>In a simple command, in absence of quoting, Bash separates the "raw" input into words by splitting on spaces and tabs. In other places, such as when expanding a variable, it uses the same process, but splits on the characters in the <code>$IFS</code> variable (which has the default value of space/tab/newline). This process is, creatively enough, called "word splitting".</p>
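<p>For example, word splitting of an expansion follows whatever is in <code>$IFS</code> at the time (a quick sketch; the path value is made up):</p>

```shell
#!/bin/bash
# Unquoted expansions are split on the characters in $IFS.
IFS=:
path='/usr/bin:/bin:/usr/local/bin'
printf ' -> %s\n' $path   # three words, split on the colons
```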
<p>In most discussions of Bash arrays, one of the frequent criticisms is all the footnotes and "gotchas" about when to quote things. That's because they usually don't set the context of word splitting. <strong>Double quotes (<code>"</code>) inhibit Bash from doing word splitting.</strong> That's it, that's all they do. Arrays are already split into words; without wrapping them in double quotes Bash re-word splits them, which is almost <em>never</em> what you want; otherwise, you wouldn't be working with an array.</p>
<h2 id="normal-array-syntax">Normal array syntax</h2>
<table>
<caption>
<h1>Setting an array</h1>
<p><var>words…</var> is expanded and subject to word splitting
based on <code>$IFS</code>.</p>
</caption>
<tbody>
<tr>
<td><code>array=(<var>words…</var>)</code></td>
<td>Set the contents of the entire array.</td>
</tr><tr>
<td><code>array+=(<var>words…</var>)</code></td>
<td>Appends <var>words…</var> to the end of the array.</td>
</tr><tr>
<td><code>array[<var>n</var>]=<var>word</var></code></td>
<td>Sets an individual entry in the array, the first entry is at
<var>n</var>=0.</td>
</tr>
</tbody>
</table>
<p>Now, for accessing the array. The most important things to understand about arrays are quoting them, and the difference between <code>@</code> and <code>*</code>.</p>
<table>
<caption>
<h1>Getting an entire array</h1>
<p>Unless these are wrapped in double quotes, they are subject to
word splitting, which defeats the purpose of arrays.</p>
<p>I guess it's worth mentioning that if you don't quote them, and
word splitting is applied, <code>@</code> and <code>*</code>
end up being equivalent.</p>
<p>With <code>*</code>, when joining the elements into a single
string, the elements are separated by the first character in
<code>$IFS</code>, which is, by default, a space.</p>
</caption>
<tbody>
<tr>
<td><code>"${array[@]}"</code></td>
<td>Evaluates to every element of the array, as separate
words.</td>
</tr><tr>
<td><code>"${array[*]}"</code></td>
<td>Evaluates to every element of the array, as a single
word.</td>
</tr>
</tbody>
</table>
<p>It's really that simple—that covers most usages of arrays, and most of the mistakes made with them.</p>
<p>To help you understand the difference between <code>@</code> and <code>*</code>, here is a sample of each:</p>
<table>
<tbody>
<tr><th><code>@</code></th><th><code>*</code></th></tr>
<tr>
<td>Input:<pre><code>#!/bin/bash
array=(foo bar baz)
for item in "${array[@]}"; do
echo " - <${item}>"
done</code></pre></td>
<td>Input:<pre><code>#!/bin/bash
array=(foo bar baz)
for item in "${array[*]}"; do
echo " - <${item}>"
done</code></pre></td>
</tr>
<tr>
<td>Output:<pre><code> - <foo>
- <bar>
- <baz></code></pre></td>
<td>Output:<pre><code> - <foo bar baz><br><br><br></code></pre></td>
</tr>
</tbody>
</table>
<p>In most cases, <code>@</code> is what you want, but <code>*</code> comes up often enough too.</p>
<p>To get individual entries, the syntax is <code>${array[<var>n</var>]}</code>, where <var>n</var> starts at 0.</p>
<table>
<caption>
<h1>Getting a single entry from an array</h1>
<p>Also subject to word splitting if you don't wrap it in
quotes.</p>
</caption>
<tbody>
<tr>
<td><code>"${array[<var>n</var>]}"</code></td>
<td>Evaluates to the <var>n</var><sup>th</sup> entry of the
array, where the first entry is at <var>n</var>=0.</td>
</tr>
</tbody>
</table>
<p>To get a subset of the array, there are a few options:</p>
<table>
<caption>
<h1>Getting subsets of an array</h1>
<p>Substitute <code>*</code> for <code>@</code> to get the subset
as a <code>$IFS</code>-separated string instead of separate
words, as described above.</p>
<p>Again, if you don't wrap these in double quotes, they are
subject to word splitting, which defeats the purpose of
arrays.</p>
</caption>
<tbody>
<tr>
<td><code>"${array[@]:<var>start</var>}"</code></td>
<td>Evaluates to the entries from <var>n</var>=<var>start</var> to the end
of the array.</td>
</tr><tr>
<td><code>"${array[@]:<var>start</var>:<var>count</var>}"</code></td>
<td>Evaluates to <var>count</var> entries, starting at
<var>n</var>=<var>start</var>.</td>
</tr><tr>
<td><code>"${array[@]::<var>count</var>}"</code></td>
<td>Evaluates to <var>count</var> entries from the beginning of
the array.</td>
</tr>
</tbody>
</table>
<p>Notice that <code>"${array[@]}"</code> is equivalent to <code>"${array[@]:0}"</code>.</p>
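<p>Putting the subset syntax together in one runnable sketch:</p>

```shell
#!/bin/bash
array=(a b c d e)
echo "${array[@]:1}"     # b c d e  (from n=1 to the end)
echo "${array[@]:1:2}"   # b c      (2 entries starting at n=1)
echo "${array[@]::2}"    # a b      (2 entries from the beginning)
```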
<table>
<caption>
<h1>Getting the length of an array</h1>
<p>This is the only situation with arrays where quoting doesn't
make a difference.</p>
<p>True to my earlier statement, when unquoted, there is no
difference between <code>@</code> and <code>*</code>.</p>
</caption>
<tbody>
<tr>
<td>
<code>${#array[@]}</code>
<br>or<br>
<code>${#array[*]}</code>
</td>
<td>
Evaluates to the length of the array
</td>
</tr>
</tbody>
</table>
<h2 id="argument-array-syntax">Argument array syntax</h2>
<p>Accessing the arguments is mostly that simple, but that array doesn't actually have a variable name. It's special. Instead, it is exposed through a series of special variables (normal variables can only start with letters and underscore) that <em>mostly</em> match up with the normal array syntax.</p>
<p>Setting the arguments array, on the other hand, is pretty different. That's fine, because setting the arguments array is less useful anyway.</p>
<table>
<caption>
<h1>Accessing the arguments array</h1>
<aside>Note that for values of <var>n</var> with more than 1
digit, you need to wrap it in <code>{}</code>.
Otherwise, <code>"$10"</code> would be parsed
as <code>"${1}0"</code>.</aside>
</caption>
<tbody>
<tr><th colspan=2>Individual entries</th></tr>
<tr><td><code>${array[0]}</code></td><td><code>$0</code></td></tr>
<tr><td><code>${array[1]}</code></td><td><code>$1</code></td></tr>
<tr><td colspan=2 style="text-align:center">…</td></tr>
<tr><td><code>${array[9]}</code></td><td><code>$9</code></td></tr>
<tr><td><code>${array[10]}</code></td><td><code>${10}</code></td></tr>
<tr><td colspan=2 style="text-align:center">…</td></tr>
<tr><td><code>${array[<var>n</var>]}</code></td><td><code>${<var>n</var>}</code></td></tr>
<tr><th colspan=2>Subset arrays (array)</th></tr>
<tr><td><code>"${array[@]}"</code></td><td><code>"${@:0}"</code></td></tr>
<tr><td><code>"${array[@]:1}"</code></td><td><code>"$@"</code></td></tr>
<tr><td><code>"${array[@]:<var>pos</var>}"</code></td><td><code>"${@:<var>pos</var>}"</code></td></tr>
<tr><td><code>"${array[@]:<var>pos</var>:<var>len</var>}"</code></td><td><code>"${@:<var>pos</var>:<var>len</var>}"</code></td></tr>
<tr><td><code>"${array[@]::<var>len</var>}"</code></td><td><code>"${@::<var>len</var>}"</code></td></tr>
<tr><th colspan=2>Subset arrays (string)</th></tr>
<tr><td><code>"${array[*]}"</code></td><td><code>"${*:0}"</code></td></tr>
<tr><td><code>"${array[*]:1}"</code></td><td><code>"$*"</code></td></tr>
<tr><td><code>"${array[*]:<var>pos</var>}"</code></td><td><code>"${*:<var>pos</var>}"</code></td></tr>
<tr><td><code>"${array[*]:<var>pos</var>:<var>len</var>}"</code></td><td><code>"${*:<var>pos</var>:<var>len</var>}"</code></td></tr>
<tr><td><code>"${array[*]::<var>len</var>}"</code></td><td><code>"${*::<var>len</var>}"</code></td></tr>
<tr><th colspan=2>Array length</th></tr>
<tr><td><code>${#array[@]}</code></td><td><code>$#</code> + 1</td></tr>
<tr><th colspan=2>Setting the array</th></tr>
<tr><td><code>array=("${array[0]}" <var>words…</var>)</code></td><td><code>set -- <var>words…</var></code></td></tr>
<tr><td><code>array=("${array[0]}" "${array[@]:2}")</code></td><td><code>shift</code></td></tr>
<tr><td><code>array=("${array[0]}" "${array[@]:<var>n+1</var>}")</code></td><td><code>shift <var>n</var></code></td></tr>
</tbody>
</table>
<p>Did you notice what was inconsistent? The variables <code>$*</code>, <code>$@</code>, and <code>$#</code> behave like the <var>n</var>=0 entry doesn't exist.</p>
<table>
<caption>
<h1>Inconsistencies</h1>
</caption>
<tbody>
<tr>
<th colspan=3><code>@</code> or <code>*</code></th>
</tr><tr>
<td><code>"${array[@]}"</code></td>
<td>→</td>
<td><code>"${array[@]:0}"</code></td>
</tr><tr>
<td><code>"${@}"</code></td>
<td>→</td>
<td><code>"${@:1}"</code></td>
</tr><tr>
<th colspan=3><code>#</code></th>
</tr><tr>
<td><code>"${#array[@]}"</code></td>
<td>→</td>
<td>length</td>
</tr><tr>
<td><code>"${#}"</code></td>
<td>→</td>
<td>length-1</td>
</tr>
</tbody>
</table>
<p>These make sense because argument 0 is the name of the script—we almost never want that when parsing arguments. If argument 0 were included, you'd spend extra code skipping it to get the values these currently give you.</p>
<p>Now, for an explanation of setting the arguments array. You cannot set argument <var>n</var>=0. The <code>set</code> command is used to manipulate the arguments passed to Bash after the fact—similarly, you could use <code>set -x</code> to make Bash behave like you ran it as <code>bash -x</code>; like most GNU programs, the <code>--</code> tells it to not parse any of the options as flags. The <code>shift</code> command shifts each entry <var>n</var> spots to the left, using <var>n</var>=1 if no value is specified; and leaving argument 0 alone.</p>
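<p>A small sketch of <code>set --</code> and <code>shift</code> in action (wrapped in a function so it has its own arguments array):</p>

```shell
#!/bin/bash
demo() {
    set -- one "two three" four   # replace $1..$# wholesale
    echo "$#"                     # 3
    shift                         # drop the old $1
    echo "$1"                     # two three
    shift 2                       # drop two more
    echo "$#"                     # 0
}
demo
```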
<h2 id="but-you-mentioned-gotchas-about-quoting">But you mentioned "gotchas" about quoting!</h2>
<p>But I explained that quoting simply inhibits word splitting, which you pretty much never want when working with arrays. If, for some odd reason, you do want word splitting, then that's when you don't quote. Simple, easy to understand.</p>
<p>I think possibly the only case where you do want word splitting with an array is when you didn't want an array, but it's what you get (arguments are, by necessity, an array). For example:</p>
<pre><code># Usage: path_ls PATH1 PATH2…
# Description:
# Takes any number of PATH-style values; that is,
# colon-separated lists of directories, and prints a
# newline-separated list of executables found in them.
# Bugs:
# Does not correctly handle programs with a newline in the name,
# as the output is newline-separated.
path_ls() {
local IFS dirs
IFS=:
dirs=($@) # The odd-ball time that it needs to be unquoted
find -L "${dirs[@]}" -maxdepth 1 -type f -executable \
-printf '%f\n' 2>/dev/null | sort -u
}</code></pre>
<p>Logically, there shouldn't be multiple arguments, just a single <code>$PATH</code> value; but, we can't enforce that, as the array can have any size. So, we do the robust thing, and just act on the entire array, not really caring about the fact that it is an array. Alas, there is still a field-separation bug in the program, with the output.</p>
<h2 id="i-still-dont-think-i-need-arrays-in-my-scripts">I still don't think I need arrays in my scripts</h2>
<p>Consider the common code:</p>
<pre><code>ARGS=' -f -q'
…
command $ARGS # unquoted variables are a bad code-smell anyway</code></pre>
<p>Here, <code>$ARGS</code> is field-separated by <code>$IFS</code>, which we are assuming has the default value. This is fine, as long as <code>$ARGS</code> is known to never need an embedded space; which you can guarantee as long as it isn't based on anything outside of the program. But wait until you want to do this:</p>
<pre><code>ARGS=' -f -q'
…
if [[ -f "$filename" ]]; then
ARGS+=" -F $filename"
fi
…
command $ARGS</code></pre>
<p>Now you're hosed if <code>$filename</code> contains a space! More than just breaking, it could have unwanted side effects, such as when someone figures out how to make <code>filename='foo --dangerous-flag'</code>.</p>
<p>Compare that with the array version:</p>
<pre><code>ARGS=(-f -q)
…
if [[ -f "$filename" ]]; then
ARGS+=(-F "$filename")
fi
…
command "${ARGS[@]}"</code></pre>
<h2 id="what-about-portability">What about portability?</h2>
<p>Except for the little stubs that call another program with <code>"$@"</code> at the end, trying to write for multiple shells (including the ambiguous <code>/bin/sh</code>) is not a task for mere mortals. If you do try that, your best bet is probably sticking to POSIX. Arrays are not POSIX; except for the arguments array, which is; though getting subset arrays from <code>$@</code> and <code>$*</code> is not (tip: use <code>set --</code> to re-purpose the arguments array).</p>
<p>Writing for various versions of Bash, though, is pretty do-able. Everything here works all the way back in bash-2.0 (December 1996), with the following exceptions:</p>
<ul>
<li><p>The <code>+=</code> operator wasn't added until Bash 3.1.</p>
<ul>
<li>As a work-around, use <code>array[${#array[*]}]=<var>word</var></code> to append a single element.</li>
</ul></li>
<li><p>Accessing subset arrays of the arguments array is inconsistent if <var>pos</var>=0 in <code>${@:<var>pos</var>:<var>len</var>}</code>.</p>
<ul>
<li>In Bash 2.x and 3.x, it works as expected, except that argument 0 is silently missing. For example <code>${@:0:3}</code> gives arguments 1 and 2; where <code>${@:1:3}</code> gives arguments 1, 2, and 3. This means that if <var>pos</var>=0, then only <var>len</var>-1 arguments are given back.</li>
<li>In Bash 4.0, argument 0 can be accessed, but if <var>pos</var>=0, then it only gives back <var>len</var>-1 arguments. So, <code>${@:0:3}</code> gives arguments 0 and 1.</li>
<li>In Bash 4.1 and higher, it works in the way described in the main part of this document.</li>
</ul></li>
</ul>
<p>Now, Bash 1.x doesn't have arrays at all. <code>$@</code> and <code>$*</code> work, but using <code>:</code> to select a range of elements from them doesn't. Good thing most boxes have been updated since 1996!</p>
<p>The content of this page is Copyright © 2013 <a href="mailto:lukeshu@sbcglobal.net">Luke Shumaker</a>.</p>
<p>This page is licensed under the <a href="https://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA-3.0</a> license.</p>
https://lukeshu.com/blog/git-go-pre-commit.html
2014-01-26T17:00:58-05:00
2013-10-12T00:00:00+00:00
A git pre-commit hook for automatically formatting Go code
<h1 id="a-git-pre-commit-hook-for-automatically-formatting-go-code">A git pre-commit hook for automatically formatting Go code</h1>
<p>One of the (many) wonderful things about the Go programming language is the <code>gofmt</code> tool, which formats your source in a canonical way. I thought it would be nice to integrate this in my <code>git</code> workflow by adding it in a pre-commit hook to automatically format my source code when I committed it.</p>
<p>The Go distribution contains a git pre-commit hook that checks whether the source code is formatted, and aborts the commit if it isn't. I don't remember if I was aware of this at the time (or if it even existed at the time, or if it is new), but I wanted it to go ahead and format the code for me.</p>
<p>I found a few solutions online, but they were all missing something—support for partial commits. I frequently use <code>git add -p</code>/<code>git gui</code> to commit a subset of the changes I've made to a file; the existing solutions would end up adding the entire set of changes to my commit.</p>
<p>I ended up writing a solution that only formats the version of the file that is staged for commit; here's my <code>.git/hooks/pre-commit</code>:</p>
<pre><code>#!/bin/bash
# This would only loop over files that are already staged for commit.
# git diff --cached --numstat |
# while read add del file; do
# …
# done
shopt -s globstar
for file in **/*.go; do
tmp="$(mktemp "$file.bak.XXXXXXXXXX")"
mv "$file" "$tmp"
git checkout "$file"
gofmt -w "$file"
git add "$file"
mv "$tmp" "$file"
done</code></pre>
<p>It's still not perfect. It will try to operate on every <code>*.go</code> file—which might do weird things if you have a file that hasn't been checked in at all. This also has the effect of formatting files that were checked in without being formatted, but weren't modified in this commit.</p>
<p>I don't remember why I did that—as you can see from the comment, I knew how to only select files that were staged for commit. I haven't worked on any projects in Go in a while—if I return to one of them, and remember why I did that, I will update this page.</p>
<p>The content of this page is Copyright © 2013 <a href="mailto:lukeshu@sbcglobal.net">Luke Shumaker</a>.</p>
<p>This page is licensed under the <a href="http://www.wtfpl.net/txt/copying/">WTFPL-2</a> license.</p>
https://lukeshu.com/blog/fd_printf.html
2016-02-28T07:12:18-05:00
2013-10-12T00:00:00+00:00
`dprintf`: print formatted text directly to a file descriptor
<h1 id="dprintf-print-formatted-text-directly-to-a-file-descriptor"><code>dprintf</code>: print formatted text directly to a file descriptor</h1>
<p>This already existed as <code>dprintf(3)</code>. I now feel stupid for having implemented <code>fd_printf</code>.</p>
<p>The original post is as follows:</p>
<hr />
<p>I wrote this while debugging some code, and thought it might be useful to others:</p>
<pre><code>#define _GNU_SOURCE /* vasprintf() */
#include <stdarg.h> /* va_start()/va_end() */
#include <stdio.h> /* vasprintf() */
#include <stdlib.h> /* free() */
#include <unistd.h> /* write() */
int
fd_printf(int fd, const char *format, ...)
{
va_list arg;
int len;
char *str;
va_start(arg, format);
len = vasprintf(&str, format, arg);
va_end(arg);
write(fd, str, len);
free(str);
return len;
}</code></pre>
<p>It is a version of <code>printf</code> that prints to a file descriptor—where <code>fprintf</code> prints to a <code>FILE*</code> data structure.</p>
<p>The appeal of this is that <code>FILE*</code> I/O is buffered—which means mixing it with raw file descriptor I/O is going to produce weird results.</p>
<p>The content of this page is Copyright © 2013 <a href="mailto:lukeshu@sbcglobal.net">Luke Shumaker</a>.</p>
<p>This page is licensed under the <a href="http://www.wtfpl.net/txt/copying/">WTFPL-2</a> license.</p>
https://lukeshu.com/blog/emacs-as-an-os.html
2014-01-26T17:00:58-05:00
2013-08-29T00:00:00+00:00
Emacs as an operating system
<h1 id="emacs-as-an-operating-system">Emacs as an operating system</h1>
<p>This was originally published on <a href="https://news.ycombinator.com/item?id=6292742">Hacker News</a> on 2013-08-29.</p>
<p>Calling Emacs an OS is dubious; it certainly isn't a general-purpose OS, and it won't run on real hardware. But let me make the case that Emacs is an OS.</p>
<p>Emacs has two parts, the C part, and the Emacs Lisp part.</p>
<p>The C part isn't just a Lisp interpreter, it is a Lisp Machine emulator. It doesn't particularly resemble any of the real Lisp machines. TCP, keyboard/mouse, display support, and the filesystem are done at the hardware level (the operations to work with these things are among the primitive operations provided by the hardware). Of these, the display being handled by the hardware isn't particularly uncommon, historically; the filesystem is a little stranger.</p>
<p>The Lisp part of Emacs is the operating system that runs on that emulated hardware. It's not a particularly powerful OS; it's not a multitasking system. It has many packages available for it (though not until recently was there an official package manager). It has reasonably powerful IPC mechanisms. It has shells, mail clients (MUAs and MSAs), web browsers, web servers, and more, all written entirely in Emacs Lisp.</p>
<p>You might say, "but a lot of that is being done by the host operating system!" Sure, some of it is, but all of it is sufficiently low level. If you wanted to share the filesystem with another OS running in a VM, you might do it by sharing it as a network filesystem; this is necessary when the VM OS is not designed around running in a VM. However, because Emacs OS will always be running in the Emacs VM, we can optimize it by having the Emacs VM include processor features mapping to the native OS, and have the Emacs OS be aware of them. It would be slower and more code to do that all over the network.</p>
<p>The content of this page is Copyright © 2013 <a href="mailto:lukeshu@sbcglobal.net">Luke Shumaker</a>.</p>
<p>This page is licensed under the <a href="https://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA-3.0</a> license.</p>
https://lukeshu.com/blog/emacs-shells.html
2016-02-28T07:12:18-05:00
2013-04-09T00:00:00+00:00
A summary of Emacs' bundled shell and terminal modes
<h1 id="a-summary-of-emacs-bundled-shell-and-terminal-modes">A summary of Emacs' bundled shell and terminal modes</h1>
<p>This is based on a post on <a href="http://www.reddit.com/r/emacs/comments/1bzl8b/how_can_i_get_a_dumbersimpler_shell_in_emacs/c9blzyb">reddit</a>, published on 2013-04-09.</p>
<p>Emacs comes bundled with a few different shell and terminal modes. It can be hard to keep them straight. What's the difference between <code>M-x term</code> and <code>M-x ansi-term</code>?</p>
<p>Here's a good breakdown of the different bundled shells and terminals for Emacs, from dumbest to most Emacs-y.</p>
<h2 id="term-mode">term-mode</h2>
<p>Your VT100-esque terminal emulator; it does what most terminal programs do. Ncurses things work OK, but dumping large amounts of text can be slow. By default it asks you which shell to run, defaulting to the environment variable <code>$SHELL</code> (<code>/bin/bash</code> for me). There are two modes of operation:</p>
<ul>
<li>char mode: Keys are sent immediately to the shell (including keys that are normally Emacs keystrokes), with the following exceptions:
<ul>
<li><code>(term-escape-char) (term-escape-char)</code> sends <code>(term-escape-char)</code> to the shell (see below for the default value).</li>
<li><code>(term-escape-char) &lt;anything-else&gt;</code> equates to <code>C-x &lt;anything-else&gt;</code> in normal Emacs.</li>
<li><code>(term-escape-char) C-j</code> switches to line mode.</li>
</ul></li>
<li>line mode: Editing is done like in a normal Emacs buffer; <code>&lt;enter&gt;</code> sends the current line to the shell. This is useful for working with a program's output.
<ul>
<li><code>C-c C-k</code> switches to char mode.</li>
</ul></li>
</ul>
<p>This mode is activated with</p>
<pre><code>; Creates or switches to an existing "*terminal*" buffer.
; The default 'term-escape-char' is "C-c"
M-x term</code></pre>
<p>or</p>
<pre><code>; Creates a new "*ansi-term*" or "*ansi-term*&lt;n&gt;" buffer.
; The default 'term-escape-char' is "C-c" and "C-x"
M-x ansi-term</code></pre>
<h2 id="shell-mode">shell-mode</h2>
<p>The name is a misnomer; shell-mode is a terminal emulator, not a shell. It's called that because it is used for running a shell (bash, zsh, …). The idea of this mode is to use an external shell, but make it Emacs-y. History is not handled by the shell, but by Emacs; <code>M-p</code> and <code>M-n</code> access the history, while arrows/<code>C-p</code>/<code>C-n</code> move the point (which is consistent with other Emacs REPL-type interfaces). It ignores VT100-type terminal colors, and colorizes things itself (in the case of <code>ls</code>, it inspects words to see if they are directories). This has the benefit that it does syntax highlighting on the command currently being typed. Ncurses programs will of course not work. This mode is activated with:</p>
<pre><code>M-x shell</code></pre>
<h2 id="eshell-mode">eshell-mode</h2>
<p>This is a shell+terminal, entirely written in Emacs Lisp. (Interestingly, it doesn't set <code>$SHELL</code>, so that will be whatever it was when you launched Emacs.) This won't even be running zsh or bash; it will be running "esh", part of Emacs.</p>
<p>The content of this page is Copyright © 2013 <a href="mailto:lukeshu@sbcglobal.net">Luke Shumaker</a>.</p>
<p>This page is licensed under the <a href="https://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA-3.0</a> license.</p>
https://lukeshu.com/blog/term-colors.html
2014-01-26T17:00:58-05:00
2013-03-21T00:00:00+00:00
An explanation of common terminal emulator color codes
<h1 id="an-explanation-of-common-terminal-emulator-color-codes">An explanation of common terminal emulator color codes</h1>
<p>This is based on a post on <a href="http://www.reddit.com/r/commandline/comments/1aotaj/solarized_is_a_sixteen_color_palette_designed_for/c8ztxpt?context=1">reddit</a>, published on 2013-03-21.</p>
<blockquote>
<p>So all terminals support the same 256 colors? What about 88 color mode: is that a subset?</p>
</blockquote>
<p>TL;DR: yes</p>
<p>Terminal compatibility is crazy complex, because nobody actually reads the spec, they just write something that is compatible for their tests. Then things have to be compatible with that terminal's quirks.</p>
<p>But, here's how 8-color, 16-color, and 256-color modes work. IIRC, 88-color is a subset of the 256-color scheme, but I'm not sure.</p>
<p><strong>8 colors: (actually 9)</strong> First we had 8 colors (9 with "default", which doesn't have to be one of the 8). These are always roughly the same color: black, red, green, yellow/orange, blue, purple, cyan, and white, which are colors 0-7 respectively. Color 9 is default.</p>
<p><strong>16 colors: (actually 18)</strong> Later, someone wanted to add more colors, so they added a "bright" attribute. So when bright is on, you get "bright red" instead of "red". Hence 8*2=16 (plus two more for "default" and "bright default").</p>
<p><strong>256 colors: (actually 274)</strong> You may have noticed, colors 0-7 and 9 are used, but 8 isn't. So, someone decided that color 8 should put the terminal into 256-color mode. In this mode, it reads another byte, which indexes a 256-entry palette: the 16 basic colors, a 6×6×6 RGB color cube, and 24 shades of gray. The bright property has no effect on these colors. However, a terminal can display 256-color-mode colors and 16-color-mode colors at the same time, so you actually get 256+18 colors.</p>
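<p>You can see all three schemes in practice with ordinary SGR escape sequences ("color 8" in the foreground range corresponds to SGR code 38); a quick sketch in C:</p>
<pre><code>#include &lt;stdio.h&gt;

int
main(void)
{
	printf("\033[31mred\033[0m\n");          /* 8-color mode: SGR 31 = red foreground */
	printf("\033[1;31mbright red\033[0m\n"); /* 16-color mode: the "bright" (bold) attribute */
	printf("\033[38;5;208morange\033[0m\n"); /* 256-color mode: palette index 208 */
	return 0;
}</code></pre>
<p>On a 256-color terminal the third line shows a color that the 16-color scheme simply cannot name.</p>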
<p>The content of this page is Copyright © 2013 <a href="mailto:lukeshu@sbcglobal.net">Luke Shumaker</a>.</p>
<p>This page is licensed under the <a href="https://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA-3.0</a> license.</p>
https://lukeshu.com/blog/fs-licensing-explanation.html
2016-02-28T07:12:18-05:00
2013-02-21T00:00:00+00:00
An explanation of how "copyleft" licensing works
<h1 id="an-explanation-of-how-copyleft-licensing-works">An explanation of how "copyleft" licensing works</h1>
<p>This is based on a post on <a href="http://www.reddit.com/r/freesoftware/comments/18xplw/can_software_be_free_gnu_and_still_be_owned_by_an/c8ixwq2">reddit</a>, published on 2013-02-21.</p>
<blockquote>
<p>While reading the man page for readline I noticed the copyright section said "Readline is Copyright (C) 1989-2011 Free Software Foundation Inc". How can software be both licensed under GNU and copyrighted to a single group? It was my understanding that once code became free it didn't belong to any particular group or individual.</p>
<p>[LiveCode is GPLv3, but also sells non-free licenses] Can you really have the same code under two conflicting licenses? Once licensed under GPL3 wouldn't they too be required to adhere to its rules?</p>
</blockquote>
<p>I believe that GNU/the FSF has an FAQ that addresses this, but I can't find it, so here we go.</p>
<h3 id="glossary">Glossary:</h3>
<ul>
<li>"<em>Copyright</em>" is the right to control how copies are made of something.</li>
<li>Something for which no one holds the copyright is in the "<em>public domain</em>", because anyone ("the public") is allowed to do <em>anything</em> with it.</li>
<li>A "<em>license</em>" is basically a legal document that says "I promise not to sue you if you make copies in these specific ways."</li>
<li>A "<em>non-free</em>" license basically says "There are no conditions under which you can make copies without me suing you."</li>
<li>A "<em>permissive</em>" (type of free) license basically says "You can do whatever you want, BUT you have to give me credit", and is very similar to the public domain. If the copyright holder didn't have the copyright, they couldn't sue you to make sure that you gave them credit, and nobody would have to give them credit.</li>
<li>A "<em>copyleft</em>" (type of free) license basically says, "You can do whatever you want, BUT anyone who gets a copy from you has to be able to do whatever they want too." If the copyright holder didn't have the copyright, they couldn't sue you to make sure that you gave the source to people who got it from you, and non-free versions of these programs would start to exist.</li>
</ul>
<h3 id="specific-questions">Specific questions:</h3>
<p>Readline: The GNU GPL is a copyleft license. If you make a modified version of Readline, and give it to others without letting them have the source code, the FSF will sue you. They can do this because they have the copyright on Readline, and in the GNU GPL (the license they used) it only says that they won't sue you if you distribute the source with the modified version. If they didn't have the copyright, they couldn't sue you, and the GNU GPL would be worthless.</p>
<p>LiveCode: The copyright holder for something is not required to obey the license—the license is only a promise not to sue you; of course they won't sue themselves. They can also offer different terms to different people. They can tell most people "I won't sue you as long as you share the source," but if someone gave them a little money, they might say, "I also promise not to sue this guy, even if he doesn't give out the source."</p>
<p>The content of this page is Copyright © 2013 <a href="mailto:lukeshu@sbcglobal.net">Luke Shumaker</a>.</p>
<p>This page is licensed under the <a href="https://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA-3.0</a> license.</p>
https://lukeshu.com/blog/pacman-overview.html
2016-02-28T07:12:18-05:00
2013-01-23T00:00:00+00:00
A quick overview of usage of the Pacman package manager
<h1 id="a-quick-overview-of-usage-of-the-pacman-package-manager">A quick overview of usage of the Pacman package manager</h1>
<p>This was originally published on <a href="https://news.ycombinator.com/item?id=5101416">Hacker News</a> on 2013-01-23.</p>
<p>Note: I've over-done quotation marks to make it clear when precise wording matters.</p>
<p><code>pacman</code> is a little awkward, but I prefer it to apt/dpkg, which have sub-commands, each with their own flags, some of which are undocumented. pacman, on the other hand, has ALL options documented in one fairly short man page.</p>
<p>The trick to understanding pacman is to understand how it maintains databases of packages, and what it means to "sync".</p>
<p>There are several "databases" that pacman deals with:</p>
<ul>
<li>"the database", (<code>/var/lib/pacman/local/</code>)<br> The database of currently installed packages</li>
<li>"package databases", (<code>/var/lib/pacman/sync/${repo}.db</code>)<br> There is one of these for each repository. It is a file that is fetched over plain http(s) from the server; it is not modified locally, only updated.</li>
</ul>
<p>The "operation" of pacman is set with a capital flag, one of "DQRSTU" (plus <code>-V</code> and <code>-h</code> for version and help). Of these, "DTU" are "low-level" (analogous to dpkg) and "QRS" are "high-level" (analogous to apt).</p>
<p>To give a brief explanation of the "high-level" operations, and which databases they deal with:</p>
<ul>
<li>"Q" Queries "the database" of locally installed packages (e.g., <code>pacman -Qi linux</code> shows info about the installed <code>linux</code> package).</li>
<li>"S" deals with "package databases", and Syncing "the database" with them; meaning it installs/updates packages that are in package databases, but not installed on the local system (e.g., <code>pacman -Syu</code> refreshes the package databases, then updates everything).</li>
<li>"R" Removes packages from "the database"; removing them from the local system.</li>
</ul>
<p>The biggest "gotcha" is that "S" deals with all operations with "package databases", not just syncing "the database" with them.</p>
<p>The content of this page is Copyright © 2013 <a href="mailto:lukeshu@sbcglobal.net">Luke Shumaker</a>.</p>
<p>This page is licensed under the <a href="https://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA-3.0</a> license.</p>
https://lukeshu.com/blog/poor-system-documentation.html
2014-01-26T17:00:58-05:00
2012-09-12T00:00:00+00:00
Why documentation on GNU/Linux sucks
<h1 id="why-documentation-on-gnulinux-sucks">Why documentation on GNU/Linux sucks</h1>
<p>This is based on a post on <a href="http://www.reddit.com/r/archlinux/comments/zoffo/systemd_we_will_keep_making_it_the_distro_we_like/c66uu57">reddit</a>, published on 2012-09-12.</p>
<p>The documentation situation on GNU/Linux-based operating systems is currently a mess. In the world of documentation, there are basically three camps: the "UNIX" camp, the "GNU" camp, and the "GNU/Linux" camp.</p>
<p>The UNIX camp is the <code>man</code> page camp, they have quality, terse but informative man pages, on <em>everything</em>, including the system's design and all system files. If it was up to the UNIX camp, <code>man grub.cfg</code>, <code>man grub.d</code>, and <code>man grub-mkconfig_lib</code> would exist and actually be helpful. The man page would either include inline examples, or point you to a directory. If I were to print off all of the man pages, it would actually be a useful manual for the system.</p>
<p>The GNU camp is the <code>info</code> camp. They basically thought that each piece of software was more complex than a man page could handle. They essentially think that some individual pieces of software warrant a book. So, they developed the <code>info</code> system. The info pages are usually quite high quality, but are very long, and a pain if you just want a quick look. The <code>info</code> system can generate good HTML (and PDF, etc.) documentation. But the standard <code>info</code> reader is awkward as hell to use for non-Emacs users.</p>
<p>Then we have the "GNU/Linux" camp, they use GNU software, but want to use <code>man</code> pages. This means that we get low-quality man pages for GNU software, and then we don't have a good baseline for documentation, developers each try to create their own. The documentation that gets written is frequently either low-quality, or non-standard. A lot of man pages are auto-generated from <code>--help</code> output or info pages, meaning they are either not helpful, or overly verbose with low information density. This camp gets the worst of both worlds, and a few problems of its own.</p>
<p>The content of this page is Copyright © 2012 <a href="mailto:lukeshu@sbcglobal.net">Luke Shumaker</a>.</p>
<p>This page is licensed under the <a href="https://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA-3.0</a> license.</p>
https://lukeshu.com/blog/arch-systemd.html
2016-02-28T07:12:18-05:00
2012-09-11T00:00:00+00:00
What Arch Linux's switch to systemd means for users
<h1 id="what-arch-linuxs-switch-to-systemd-means-for-users">What Arch Linux's switch to systemd means for users</h1>
<p>This is based on a post on <a href="http://www.reddit.com/r/archlinux/comments/zoffo/systemd_we_will_keep_making_it_the_distro_we_like/c66nrcb">reddit</a>, published on 2012-09-11.</p>
<p>systemd is a replacement for UNIX System V-style init; instead of having <code>/etc/init.d/*</code> or <code>/etc/rc.d/*</code> scripts, systemd runs in the background to manage services.</p>
<p>This has the <strong>advantages</strong> that there is proper dependency tracking, easing the life of the administrator and allowing for things to be run in parallel safely. It also uses "targets" instead of "init levels", which just makes more sense. It also means that a target can be started or stopped on the fly, such as mounting or unmounting a drive, which has in the past only been done at boot up and shut down.</p>
<p>The <strong>downside</strong> is that it is (allegedly) big, bloated<a href="#fn1" class="footnoteRef" id="fnref1"><sup>1</sup></a>, and does (arguably) more than it should. Why is there a dedicated systemd-fsck? Why does systemd encapsulate the functionality of syslog? That, and it means somebody is standing on my lawn.</p>
<p>The <strong>changes</strong> an Arch user needs to worry about are that everything is being moved out of <code>/etc/rc.conf</code>. Arch users will still have the choice between systemd and SysV-init, but rc.conf is becoming the SysV-init configuration file, rather than the general system configuration file. If you will still be using SysV-init, basically the only thing in rc.conf will be <code>DAEMONS</code>.<a href="#fn2" class="footnoteRef" id="fnref2"><sup>2</sup></a> For now there is compatibility for the variables that used to be there, but that is going away.</p>
<section class="footnotes">
<hr />
<ol>
<li id="fn1"><p><em>I</em> don't think it's bloated, but that is the criticism. Basically, I discount any argument that uses "bloated" without backing it up. I was trying to say that it takes a lot of heat for being bloated, and that there may be some truth to that (the systemd-fsck and syslog comments), but that these claims are largely unsubstantiated, and more along the lines of "I would have done it differently". Maybe your ideas are better, but you haven't written the code.</p>
<p>I personally don't have an opinion either way about SysV-init vs systemd. I recently migrated my boxes to systemd, but that was because the SysV init scripts for NFSv4 in Arch are problematic. I suppose this is another <strong>advantage</strong> I missed: <em>people generally consider systemd "units" to be more robust and easier to write than SysV "scripts".</em></p>
<p>I'm actually not a fan of either. If I had more time on my hands, I'd be running a <code>make</code>-based init system based on a research project IBM did a while ago. So I consider myself fairly objective; my horse isn't in this race.<a href="#fnref1">↩</a></p></li>
<li id="fn2"><p>You can still have <code>USEDMRAID</code>, <code>USELVM</code>, <code>interface</code>, <code>address</code>, <code>netmask</code>, and <code>gateway</code>. But those are minor.<a href="#fnref2">↩</a></p></li>
</ol>
</section>
<p>The content of this page is Copyright © 2012 <a href="mailto:lukeshu@sbcglobal.net">Luke Shumaker</a>.</p>
<p>This page is licensed under the <a href="https://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA-3.0</a> license.</p>