HTML Proof Your Site in a CI Build Pipeline
Following up on my blog post Removing Exif Data from Images in Your Website With Rake and CI Build Pipeline, I have another nice feature that I added to this website’s CI build (for me Rake + Netlify) that I would like to share. Basically, what I wanted was automatic detection of links in my blog posts that are dead and need to be updated. I soon found the tool HTMLProofer, which does exactly this and more, such as general sanity checks of the HTML.
I wanted to integrate this into my build flow so that a deployment is stopped if there are any broken links in a new or old blog post. Daniel Sieger wrote a nice post on how to do this in Jekyll, which I built my own integration on. It’s actually really simple. Start by adding the html-proofer gem to the project’s Gemfile:
gem 'html-proofer'
Then add a new task :htmlproof in Rakefile:
desc "Validates HTML files with htmlproofer..."
task :htmlproof => :build do
  # String#bold is not core Ruby; it assumes a colorizing gem is loaded elsewhere in the Rakefile.
  puts "Checking HTML with htmlproofer...".bold
  require 'html-proofer'
  options = {
    :assume_extension => true,
    :allow_hash_href => true,
    :alt_ignore => [%r{/assets/images/teasers/.*}],
    :check_favicon => true,
    # Generated files that should not be checked at all.
    :file_ignore => [
      %r{/blog/\d+/(\d+/(\d+/)?)?index.html},
      /google.*\.html/,
    ],
    # Links that should not be followed.
    :url_ignore => [
      "/blog/blog",
      "/blog/general",
      "/blog/management",
      "/blog/tech",
      %r{.*erikw.me/page\d+},
      %r{/tags/.*},
    ],
  }
  HTMLProofer.check_directory("./_site", options).run
end
Whoa, what is going on here? A lot! Actually, not so much. Let’s break it down:
- A new task is created that depends on the project already having been built by the :build task.
- Define a few options for the tool. As you can see, there are quite a few files and URLs that I decided to ignore. Most of these are generated automatically by the theme that I use as a base for this site, and I’m OK with the tool ignoring them.
- Then we simply ask htmlproofer to check the HTML files in my _site/ directory.
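To get a feel for what the :file_ignore patterns above cover, here is a small sketch that tests them against a few paths. The ignored? helper and the sample paths are hypothetical and made up for illustration. The sketch only shows what the regular expressions themselves match, not how HTMLProofer applies them internally.

```ruby
# Patterns copied from the :file_ignore option in the Rakefile above.
FILE_IGNORE = [
  %r{/blog/\d+/(\d+/(\d+/)?)?index.html},
  /google.*\.html/,
].freeze

# Hypothetical helper (not part of HTMLProofer): true if any pattern matches the path.
def ignored?(path, patterns = FILE_IGNORE)
  patterns.any? { |pattern| pattern.match?(path) }
end

# Made-up example paths in the style of a Jekyll _site/ directory.
sample_paths = [
  "./_site/blog/2020/index.html",               # yearly archive page
  "./_site/blog/2020/05/17/index.html",         # daily archive page
  "./_site/blog/2020/05/17/my-post/index.html", # an actual blog post
  "./_site/google1234abcd.html",                # site verification file
]

sample_paths.each do |path|
  puts "#{ignored?(path) ? 'ignore' : 'check'}: #{path}"
end
```

The first pattern only matches the generated year, month, and day archive index pages, so the blog posts themselves are still checked.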
Then, since the base command I’ve instructed Netlify to use for building my project is $ bundle exec rake ci, it’s as simple as making my :ci task depend on this task.
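A minimal sketch of what such a dependency could look like. The placeholder tasks and the exact prerequisite list of :ci are assumptions for illustration, not taken from the site’s actual Rakefile.

```ruby
require 'rake' # already loaded when this code lives in a Rakefile

# Placeholder stand-ins for the real tasks defined earlier in the Rakefile.
task :build
task :htmlproof => :build

# Assumed shape of the :ci task; the real one may list other prerequisites too.
desc "Runs the checks used by the CI build"
task :ci => [:build, :htmlproof]
```

With this in place, $ bundle exec rake ci runs :build first and then :htmlproof. Rake stops at the first task that raises an error, so a failing check aborts the build.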
I found it too distracting to integrate it directly into the :test task, which is run by the default rake task when I develop locally. For me it’s fine to run the :ci task locally if I’m unsure, or rather just push to the remote and ask for forgiveness in case the CI build complains. For this personal website this makes sense, as only I have access to the git repository and a mistake is not going to block deployment for anyone but me.
That’s it – no more broken links or HTML tags; at least until the next CI build!