CLI Arcana: Saved webpage

A candidate installment of the mostly-only-imagined series Command-line arcana

Perhaps the most arcane invocation I regularly direct into a terminal emulator window these last few years is:

mv !$??* !$

Those familiar with the intricacies of bash, particularly history expansion and filename expansion (globbing) patterns, probably won't be challenged by this very much. But to me, even after having used it quite a bit, it still retains a whiff of the bizarre, of dangerously powerful magic. I think what makes it so obscure compared to many other examples is that the arguments are entirely pattern-based and context-dependent.

It starts well enough with the move command, obligately abbreviated for mid twentieth-century tech dude fingers probably not that well versed in proper touch-typing. The command itself is as clear as these things usually ever are. What comes next in most complex command lines, though, usually includes some fragment or snippet recognizably part of a filename or directory name. Here, though, we see two invocations of the history expansion pattern !$, first as the source of the mv operation and next as the destination of the move. History expansion patterns in bash always start with the exclamation mark (!) so that's a helpful clue, but the second character is pretty much something that one just has to recognize. In this case, it refers to the last argument of the previous command.

I tend to use just a few history expansion patterns: the one used here, as well as a couple of others probably best left for discussing some other time. So, we have !$, which takes the last argument of the previous command, used twice. Then there's the ??*. Any given question mark (?) stands in as a pattern component matched by any single character. The asterisk (*) is probably the most recognized pattern-matching character, widely used for its voracity, the Kleene star. This matches zero or more characters.
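
To make the difference between *, ?* and ??* concrete, here is a small sketch. The file names and the count_matches helper are made up for illustration, and everything happens in a throwaway temporary directory:

```bash
#!/usr/bin/env bash
# Sketch only: count_matches and the "page" file names are hypothetical.
demo=$(mktemp -d)
: > "$demo/page"        # zero characters after "page"
: > "$demo/pageX"       # one character after "page"
: > "$demo/page.html"   # five characters after "page"
: > "$demo/page_files"  # six characters after "page"

count_matches() {
    # $1 is a directory; $2 is a glob pattern, left unquoted on purpose
    # so that the shell expands it.
    n=0
    for path in "$1"/$2; do
        [ -e "$path" ] && n=$((n + 1))
    done
    printf '%s\n' "$n"
}

any_suffix=$(count_matches "$demo" 'page*')     # zero or more extra characters
one_plus=$(count_matches "$demo" 'page?*')      # at least one extra character
two_plus=$(count_matches "$demo" 'page??*')     # at least two extra characters

printf '%s %s %s\n' "$any_suffix" "$one_plus" "$two_plus"   # prints: 4 3 2
rm -rf "$demo"
```

Each question mark insists on one more character, so every ? added in front of the * drops the shortest remaining name from the matches.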

The context in which I use this is not that arcane. At least, I don't think it is. Occasionally, I'll save a copy of a web page in the Firefox browser on my laptop or desktop computers, which can readily be done via the menu as File -> Save Page as ... or in my case the keyboard shortcut, Ctrl+S. A dialogue window appears, and if I haven't saved something just previously in a more obscure place, it'll present a file-chooser dialogue box with the Downloads folder of my home directory as the suggested location. As the base filename it will offer the title of the web page.

This is where things get a little messy for me. Web page titles generally are less concise than I prefer filenames to be. They are several words long and contain lots of spaces and other non-alphanumeric characters. What's more, a web page generally does not map well to a single file. So, on attempting to save a web page, Firefox will bin things into two general components. One is simple enough: it contains the html of the web page itself in a single unified file, as presented to the browser (often assembled, no doubt, on the backend from any of a number of combinations of data sources). Other resources associated with the web page, most notably still images, it puts in a new directory. The html file is saved in a form that refers to the contents of this directory, so that the whole can generally be loaded after saving in a form that closely approximates the original appearance of the page. We are a long way now from the days when any web page could be entirely represented by a set of static files, but in general this approach of saving a file and a directory of assets works pretty well.

What doesn't work for me is that I now have two entries in my Downloads folder that are pretty long and ungainly. One has a name that ends in .html and the other has one that ends in _files, but the rest of the name is the same. Trading a little bit of complexity for a slight shortening of the name and generally tidying up, I usually deal with these using the command above, thus:

In this example, I'm using a local copy of the Wikipedia page for the Kleene star.

Here we see the .html file and the _files directory, both named using the page title:

$ ls
'Kleene star - Wikipedia_files'  'Kleene star - Wikipedia.html'

The next bit has some additional magic, tab completion. I'll create a new directory, slightly but usefully shorter than either of the originals by 5 or 6 characters. I'll start by typing the command name to create the directory, and the first few shared characters of the existing file and folder and thus of the new folder I want to create:

$ mkdir Kl

Then I'll hit the TAB key and, if I've specified enough letters to make my intentions unambiguous to the shell, it will fill out the rest of what I want:

$ mkdir Kleene\ star\ -\ Wikipedia

In this case, the filled-out version does not unambiguously specify an existing folder or file; it could be the start of either one. Were I wanting to operate on something existing, that would not be enough. Since I want to create something new that has exactly this form, the portion shared by both file and folder, it is perfect. So, I hit the Enter key and create my new folder.
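
As a rough approximation of what the TAB key did there, the sketch below computes the longest prefix shared by every name that starts with the letters typed. It assumes plain filename completion, uses a throwaway temporary directory, and the variable names are my own; real completion in bash is handled by readline and can be configured to behave differently:

```bash
#!/usr/bin/env bash
# Sketch only: imitates filename completion with a longest-common-prefix loop.
demo=$(mktemp -d)
mkdir "$demo/Kleene star - Wikipedia_files"
: > "$demo/Kleene star - Wikipedia.html"

typed="Kl"      # the letters entered before pressing TAB
prefix=""
for path in "$demo/$typed"*; do
    name=${path##*/}            # strip the directory part
    if [ -z "$prefix" ]; then
        prefix=$name            # first candidate: take the whole name
        continue
    fi
    # Trim the candidate prefix until this name starts with it.
    while :; do
        case $name in
            "$prefix"*) break ;;
        esac
        prefix=${prefix%?}
    done
done

printf '%s\n' "$prefix"         # prints: Kleene star - Wikipedia
rm -rf "$demo"
```

The loop stops trimming exactly where the two names diverge, at the _ and the ., which is why the completed text names neither existing entry.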

Now, it is tempting here to issue an ls command to confirm that the folder was created as specified. But that would ruin the flow. So, our little bit of arcana must follow immediately after the mkdir command, because we rely on re-using its argument:

$ mv !$??* !$

The shell helpfully shows the command with the history expansion effected:

mv Kleene\ star\ -\ Wikipedia??* Kleene\ star\ -\ Wikipedia

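So, I want to move anything whose name starts with Kleene\ star\ -\ Wikipedia and continues for at least two more characters (??*) into the newly created directory with exactly that name. Requiring those extra characters matters: the new directory's own name has none, so it doesn't match the pattern and mv isn't asked to move the directory into itself.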

Inspecting the results shows at the top level only the new, more shortly-named directory remains:

$ ls
'Kleene star - Wikipedia'

Looking inside that directory, though, we see the two components of the saved page have been moved down a level into it:

$ ls Kleene\ star\ -\ Wikipedia/
'Kleene star - Wikipedia_files'  'Kleene star - Wikipedia.html'
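
History expansion is an interactive convenience and is off by default in scripts, so if I ever wanted the same tidy-up in a script, a variable would have to stand in for !$. A minimal sketch, assuming a helper I've called tidy_saved_page (not a standard command) and a throwaway temporary directory in place of my Downloads folder:

```bash
#!/usr/bin/env bash
# Sketch only: tidy_saved_page is a hypothetical helper, not a real command.
tidy_saved_page() {
    # $1 is the shared base name of the saved .html file and _files directory.
    mkdir -- "$1" || return 1
    # The quoted "$1" is taken literally; the unquoted ??* is still a glob.
    # If nothing matches, the pattern is passed through as-is and mv reports an error.
    mv -- "$1"??* "$1"/
}

# Recreate the two entries from the example in a temporary directory.
demo=$(mktemp -d)
mkdir "$demo/Kleene star - Wikipedia_files"
: > "$demo/Kleene star - Wikipedia.html"

tidy_saved_page "$demo/Kleene star - Wikipedia"

ls "$demo"
ls "$demo/Kleene star - Wikipedia"
# The temporary directory is left in place for inspection; remove it with rm -rf "$demo".
```

Quoting the base name also means the backslash-escaped spaces from the interactive version aren't needed here.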
