Removing Exif Data from Images on Your Website With Rake and a CI Build Pipeline
Exif?
You know that when you take a photo, or edit one you found on the internet with an image program, a lot of extra data gets attached to the image. This is what is known as Exif (Exchangeable Image File Format) data. So what might this be? Let’s take an example. If you have ImageMagick installed via your standard package manager, you can query any image for Exif data like this:
There’s a lot more to the funny cat than we might first think! This output is already heavily truncated, but we can learn about the camera settings that were used, that some sort of Samsung device took the photo, and even the GPS coordinates where it was taken! That’s pretty cool, because photo apps can then draw a map and show your photos on it, or show them to you in a timeline.
However, if you’re publishing images to your website, maybe you want to remove some of this information for whatever reason. Maybe you need to stay anonymous? Then this data can easily be used to profile you. When you edit an image with Gimp, Photoshop etc., the editor stamps metadata into the file saying that it was used, which you may or may not want people to know. Maybe an attacker could learn what version of an image program you use and turn this information to their advantage? What do I know, there might be any number of reasons not to share a huge amount of extra data about your photos when you publish them on your website.
So what might you be able to do about this?
Creating Rake Tasks to Detect and Remove Exif Data
Task automation using great tools to the rescue! ExifTool lets you easily view, modify and remove Exif data for one or many files. Check out the manual here. For my blog, this blog, I’m using Jekyll, a static site generator (SSG) built with Ruby and popularized by its adoption in GitHub Pages. Thus it’s natural for me to use Rake to create some build tasks in a Rakefile, just like the good old days with a Makefile! I’ll show you here how you can create rake tasks that let you detect whether any of your images have Exif data, a task for removing all of it, and how you can integrate this into your CI build pipeline.
Installing the Tooling
Given the Ruby setting, the easiest way to get ExifTool installed for my Jekyll project was to use the exiftool_vendored gem. Simply add the gem to your Gemfile and run the usual $ bundle install afterwards.
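A Gemfile entry could look roughly like this. Only the exiftool_vendored line matters here; the other entries are placeholders for whatever your project already declares.

```ruby
# Gemfile (sketch): only the last line is new, the rest stands in for
# the gems your Jekyll project already lists.
source "https://rubygems.org"

gem "jekyll"
gem "rake"
gem "exiftool_vendored" # ships a vendored copy of ExifTool with the gem
```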
Alright, so how can we use it? The manual tells us that we can require the gem in a Ruby file and then access the path to the exiftool binary via Exiftool.command.
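As a minimal sketch, the lookup can be wrapped in a small helper. The method name exiftool_command and the fallback to a system-wide exiftool on the PATH are my own assumptions; only the require and Exiftool.command come from the gem.

```ruby
# Returns the path of the exiftool binary to run.
# Prefers the copy vendored by the exiftool_vendored gem. Falling back to a
# plain "exiftool" from the PATH is an assumption of this sketch.
def exiftool_command
  require "exiftool_vendored"
  Exiftool.command
rescue LoadError
  "exiftool"
end

puts exiftool_command
```

Keeping the lookup in one place means the rest of the Rakefile never has to care where the binary actually lives.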
That’s a great first step; we can now perform Exif operations on images from Ruby code!
Creating Rake Tasks
Now let’s make use of these new powers by looking for all images in the website source directory and checking each of them for Exif data. I’m using Jekyll and thus I would like to scan asset/images, which is where you would typically put your images in a Jekyll project. There happens to be a specific folder in there, favicons-gen, which I want to leave untouched, as I want its files to stay the way realfavicongenerator.net produced them. I start by adding this utility function to my Rakefile, which we will use in rake tasks later on:
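A minimal sketch of such a utility function could look as follows. The method names, the constants and the list of file extensions are assumptions made for illustration. The check itself asks exiftool for the tags of the EXIF group and treats any output as "has Exif data".

```ruby
require "open3"

IMAGE_ROOT   = "asset/images"        # directory named in the text
EXCLUDED_DIR = "favicons-gen"        # generated favicons stay untouched
IMAGE_GLOB   = "**/*.{jpg,jpeg,png}" # assumption: the extensions worth checking

# Every image below +root+, except those inside the excluded directory.
def candidate_images(root = IMAGE_ROOT)
  Dir.glob(File.join(root, IMAGE_GLOB))
     .reject { |path| path.split("/").include?(EXCLUDED_DIR) }
     .sort
end

# True when exiftool prints at least one tag from the EXIF group for +path+.
def exif?(path, exiftool)
  output, _status = Open3.capture2(exiftool, "-EXIF:all", path)
  !output.strip.empty?
end

# Paths of all candidate images that still carry Exif data.
def images_with_exif(exiftool, root = IMAGE_ROOT)
  candidate_images(root).select { |path| exif?(path, exiftool) }
end
```

In a Rakefile that uses the gem, this would be called as images_with_exif(Exiftool.command).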
It can surely be done in many different ways, but for now this does the job. With this handy function, it’s now a breeze to create a rake task that simply prints all images in the source tree that have Exif data:
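Here is one way such a task might look, as a self-contained sketch. The EXIFTOOL constant, its fallback to an exiftool on the PATH, and the compact images_with_exif stand-in are assumptions that let the snippet run on its own; in a real Rakefile you would reuse your existing utility function instead.

```ruby
require "open3"
require "rake" # already loaded when this lives in a Rakefile

# Assumption of this sketch: fall back to an exiftool on the PATH when the
# exiftool_vendored gem is not installed.
EXIFTOOL = begin
  require "exiftool_vendored"
  Exiftool.command
rescue LoadError
  "exiftool"
end

# Compact stand-in for the utility function: images under asset/images,
# outside favicons-gen, for which exiftool prints at least one EXIF tag.
def images_with_exif(root = "asset/images")
  Dir.glob(File.join(root, "**/*.{jpg,jpeg,png}"))
     .reject { |path| path.split("/").include?("favicons-gen") }
     .reject { |path| Open3.capture2(EXIFTOOL, "-EXIF:all", path).first.strip.empty? }
end

desc "List all images that still contain Exif data"
task :exif_find do
  found = images_with_exif
  if found.empty?
    puts "No images with Exif data found."
  else
    puts "Images with Exif data:"
    found.each { |path| puts "  #{path}" }
  end
end
```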
We could have made this just 3 lines if we wanted. Now it’s as easy as typing $ bundle exec rake exif_find.
Let’s say that we have a workflow of uploading many photos, and we just always want to remove any Exif data. Then we can again use the utility function we created to find the affected images and let ExifTool strip the data from each of them. From the ExifTool manual we find that to simply remove all Exif data, we should pass the -all= argument to the program.
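A cleaning task along those lines might look like the sketch below. Note that -all= strips all writable metadata, not only the Exif group. The -overwrite_original flag is my own addition so that exiftool does not leave "_original" backup copies next to the images; drop it if you prefer to keep those backups. As before, EXIFTOOL and the compact images_with_exif stand-in are assumptions that keep the snippet self-contained.

```ruby
require "open3"
require "rake" # already loaded when this lives in a Rakefile

# Assumption of this sketch: fall back to an exiftool on the PATH when the
# exiftool_vendored gem is not installed.
EXIFTOOL = begin
  require "exiftool_vendored"
  Exiftool.command
rescue LoadError
  "exiftool"
end

# Compact stand-in for the utility function used by the tasks.
def images_with_exif(root = "asset/images")
  Dir.glob(File.join(root, "**/*.{jpg,jpeg,png}"))
     .reject { |path| path.split("/").include?("favicons-gen") }
     .reject { |path| Open3.capture2(EXIFTOOL, "-EXIF:all", path).first.strip.empty? }
end

desc "Remove metadata from every image that still has Exif data"
task :exif_clean do
  images_with_exif.each do |path|
    # -all= comes from the ExifTool manual; -overwrite_original is an
    # addition of this sketch (no backup copies are kept).
    ok = system(EXIFTOOL, "-all=", "-overwrite_original", path)
    abort "exiftool failed for #{path}" unless ok
    puts "Cleaned #{path}"
  end
end
```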
Not so hard!
CI Integration
If you want, you could even run the exif_clean task automatically as part of the build step in your CI setup. For me, however, I want to review any changes locally and commit them to my git repo only after I’m sure the files are good. Thus I can create a simple task that just fails the build if any Exif data is detected. I have a meta rake task called ci, which I extend with this check. In my case, I let Netlify build and host my site, and I have configured their build system to simply call bundle exec rake ci. This is the relevant portion from the Rakefile:
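A minimal sketch of what such a portion might look like, under a few assumptions: the task name exif_check is hypothetical, and EXIFTOOL plus the compact images_with_exif stand-in are only repeated so the snippet runs on its own. The check calls abort, which exits with a non-zero status and thereby fails the build.

```ruby
require "open3"
require "rake" # already loaded when this lives in a Rakefile

# Assumption of this sketch: fall back to an exiftool on the PATH when the
# exiftool_vendored gem is not installed.
EXIFTOOL = begin
  require "exiftool_vendored"
  Exiftool.command
rescue LoadError
  "exiftool"
end

# Compact stand-in for the utility function used by the tasks.
def images_with_exif(root = "asset/images")
  Dir.glob(File.join(root, "**/*.{jpg,jpeg,png}"))
     .reject { |path| path.split("/").include?("favicons-gen") }
     .reject { |path| Open3.capture2(EXIFTOOL, "-EXIF:all", path).first.strip.empty? }
end

desc "Fail when any image still contains Exif data"
task :exif_check do
  found = images_with_exif
  unless found.empty?
    abort "Exif data found in:\n  #{found.join("\n  ")}"
  end
end

# Declaring ci again only adds a prerequisite; Rake merges it with any
# existing definition of the task.
task ci: :exif_check
```

Because the check is a prerequisite of ci, the existing Netlify build command stays the same.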
git commit && git push
and that’s it :)