Renaming files, redirecting visitors

July 12, 2009 / Teaching myself the right HTTP codes to use when changing the location of site resources.

I changed a lot of my site’s image names and locations when I switched to the new domain, breaking their file paths. So even after I had set up my global redirect on the old domain there were still a lot of 404s (file not found errors): the requests got from zero2180.net to ztoe.net, but the rest of the path no longer matched anything. (And in some cases the original images were gone for good.) A few things were helpful in cleaning up this mess.

Batch renaming files

I wanted to rename my main images so that harvey001.jpg became 001.jpg, and so on, dropping the word “harvey.” One way to do this is with a batch operation using a for loop, mv and shell parameter expansion, like so:

for f in harvey* ; do mv "$f" "${f#harvey}" ; done

This loops through the files in the current directory whose names start with “harvey” and renames each one with that prefix stripped, that is, replaced with nothing. For thumbnails I wanted the word “thumb” in place of “harvey,” thus:

for f in harvey* ; do mv "$f" "thumb${f#harvey}" ; done

This time what remains of the name, once “harvey” is stripped, is preceded by a new string, “thumb.” (In a parameter expansion, # removes a matching pattern from the beginning of the value and % removes one from the end.)
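
Before running either loop for real, it can be reassuring to preview the new names. What follows is a minimal sketch, assuming bash or another POSIX shell; the preview_rename function is a hypothetical helper written for illustration, not part of any standard tool, and it only prints names without moving anything:

```shell
# Hypothetical helper: print the rename each file would get, without touching it.
# $1 = prefix to strip, $2 = new prefix (may be empty), remaining args = file names.
preview_rename() {
    strip=$1
    add=$2
    shift 2
    for f in "$@"; do
        # ${f#"$strip"} removes $strip from the start of $f, if it is there
        printf '%s -> %s\n' "$f" "$add${f#"$strip"}"
    done
}

preview_rename harvey "" harvey001.jpg       # harvey001.jpg -> 001.jpg
preview_rename harvey thumb harvey001.jpg    # harvey001.jpg -> thumb001.jpg

# The % form trims from the end instead, for example to drop an extension:
f=harvey001.jpg
echo "${f%.jpg}"                             # harvey001
```

In a real directory the file list would come from a glob, as in preview_rename harvey thumb harvey*.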

Permanent Redirects and RedirectMatch

Because I renamed the files and moved them as well I needed a way to redirect (using mod_alias) requests for each image. In some cases it was easiest to use Redirect permanent, where there was a clear one-to-one change, and in others RedirectMatch permanent, where I wanted to apply a pattern match to a set of files.

There’s probably a better way to do this, but this is what I used:

Redirect permanent /photos/0304/harvey http://ztoe.net/photos/harvey
RedirectMatch permanent images/harvey(.*)\.jpg$ http://ztoe.net/photos/harvey/$1.jpg
RedirectMatch permanent thumbs/harvey(.*)\.jpg$ http://ztoe.net/photos/harvey/thumb$1.jpg

The first line redirects any request whose path begins with the first string to the URL at the end of the line. Redirect matches by path prefix, so anything more specific than /photos/0304/harvey is also redirected, with the remainder of the path carried over. The second and third lines match the images in /photos/0304/harvey/images and /photos/0304/harvey/thumbs respectively (the files were called, for example, harvey001.jpg in both directories) and send them to the right place on the new domain, with the correct filename (e.g. 001.jpg and thumb001.jpg).
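
To sanity-check the pattern rules without touching the server, the two RedirectMatch mappings can be roughly imitated with sed. This is only a sketch: preview_redirect is a hypothetical helper, sed's extended regular expressions are not Apache's regex engine, and it ignores the plain Redirect line and the order in which Apache applies directives.

```shell
# Hypothetical helper: approximate the two RedirectMatch rules with sed -E.
# RedirectMatch replaces the whole request with the target URL, so the leading
# ^.* swallows whatever precedes the matched part of the path.
# Paths that match neither rule are printed unchanged.
preview_redirect() {
    printf '%s\n' "$1" | sed -E \
        -e 's#^.*images/harvey(.*)\.jpg$#http://ztoe.net/photos/harvey/\1.jpg#' \
        -e 's#^.*thumbs/harvey(.*)\.jpg$#http://ztoe.net/photos/harvey/thumb\1.jpg#'
}

preview_redirect /photos/0304/harvey/images/harvey001.jpg
# http://ztoe.net/photos/harvey/001.jpg
preview_redirect /photos/0304/harvey/thumbs/harvey001.jpg
# http://ztoe.net/photos/harvey/thumb001.jpg
```

The real test is still to request an old URL and look at the Location header the server sends back.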

Telling visitors that something is gone

Where images are gone for good, because I want it that way, sending a 404 can mean that Google, et al. will continue spidering the dead URL for a long time. This is because HTTP status 404 is a catch-all: it says the resource was not found here, but nothing about whether that is temporary or permanent.

To indicate that a resource is permanently gone, with no forwarding address, the correct status code is 410 Gone, which mod_alias can send like this:

Redirect gone /old/path/to/image.jpg

You can use an ErrorDocument directive to make the resulting message more palatable. I haven’t done this yet because I’d like to see if I can get WordPress to handle 410s while sending the correct header response to the client. (I’m aware of the Ask Apache plugin, but not psyched about the Google Search API Key requirement.)

A better 404 page

For WordPress users the 404 codex page is required reading. Apparently, it was only in recent versions that WordPress began to send 404 header responses when it served a 404 page (meaning that humans might know a file was “not found” but crawlers and other software didn’t). Less than ideal.

If you want to be notified whenever a client follows a link to your WordPress site that results in the 404 error page, then add the script in the “Writing Friendly Messages” section to your 404.php file. It tells the user what URL they requested that caused the 404, but it also e-mails the site administrator when the referrer variable is set (meaning that your visitor clicked on a link that brought them to the non-existent resource).

Afterthought

I use analytics software on my site, but I rarely inspect the raw server logs. The number of 404s I found in the log files recently was surprising, especially since I learned that 404s don’t necessarily result in web crawlers learning that the resource is never coming back. If you don’t set a forwarding address for resource requests or indicate that the resource is gone, the “mail” keeps coming, and I guess that makes your site look as though nobody’s home.
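
For anyone who wants to do the same digging, here is one way to pull the missing URLs out of a raw log. It is a sketch that assumes Apache's common or combined log format, where the status code is the ninth whitespace-separated field and the request path is the seventh; top_404s is a hypothetical helper name, and a custom LogFormat would need different field numbers.

```shell
# Hypothetical helper: list the URLs that returned 404, most frequent first.
# Assumes common/combined log format: field 7 is the path, field 9 the status.
# $1 = path to the access log.
top_404s() {
    awk '$9 == 404 { print $7 }' "$1" | sort | uniq -c | sort -rn
}
```

Run it as top_404s followed by the path to your access log (the location varies by host) and pipe the result through head to see the worst offenders. Changing 404 to 410 in the awk test gives the same report for resources already marked as gone.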


