---
title: How to sort multiple photo directories
date: 2020-05-15
---
Too many backups
----------------
I have a bad habit: I fear losing data, so I make a hasty backup whenever I need
to change machines. Since it's always laborious to compare a directory of photos
with what I have already archived, I usually back up the whole directory and
deal with it later.

After a few months (sometimes years), I end up with many backup directories.
Some images are duplicated across these directories, with various sizes and
names. This makes the task of archiving even more difficult.

I recently had to do it, and I created some tiny utilities that made this task
much easier. Let's dig in!

Proper naming
-------------
The first thing to do is to give each image a proper name. I chose the format
`year-month-day_hour-minutes-seconds.ext`, e.g. `2020-05-15_14h30m05s.jpg`.

Usually this data is available in the photo's
[Exif](https://en.wikipedia.org/wiki/Exif) metadata. You can extract it with
[exiftool](https://exiftool.org/).

For instance, you can create a small script named `rename-exif-date`:
```sh
#!/bin/sh
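# Rename each file after the date and time of the shot: 2020-12-25_20h03m12s.jpg
# %%-c appends a copy number if the name is already taken,
# %%le keeps the original extension, in lowercase.
exiftool -d '%Y-%m-%d_%Hh%Mm%Ss%%-c.%%le' "-filename<CreateDate" "$@"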
```
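
Since a mass rename is tedious to undo, it can help to preview the result first. The sketch below assumes exiftool's `TestName` tag, which reports the name each file would receive without renaming anything. The function name `preview_exif_rename` is hypothetical and only used for illustration.

```sh
#!/bin/sh
# Hypothetical helper: show what rename-exif-date would do, without renaming.
# Writing to TestName instead of FileName makes exiftool report the old and
# new names and leave the files untouched.
preview_exif_rename() {
    exiftool -d '%Y-%m-%d_%Hh%Mm%Ss%%-c.%%le' '-testname<CreateDate' "$@"
}
```

Once the function is loaded in your shell, `preview_exif_rename *.jpg` lists the planned renames for the JPEG files in the current directory.
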
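Make `rename-exif-date` executable, put it somewhere in your `PATH`, and run it
with [fd](https://github.com/sharkdp/fd):

`> fd -t f -x rename-exif-date {}`

This runs the script on every regular file under the current directory and
renames each one that carries a `CreateDate` tag.

Detect thumbnails
-----------------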
If you have thumbnails cluttering your directories, you can detect and remove
them with [feh](https://feh.finalrewind.org/).

To remove images no larger than 300x300 pixels, create a script named
`remove-small-images`:
```sh
#!/bin/sh
# Run rm on every image whose width and height both fit within $1 (e.g. 300x300).
feh --recursive --list --max-dimension "$1" --action 'rm %F'
```
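
Because `rm` cannot be undone, a dry run is worth having here too. This is a minimal sketch that assumes the same feh options as the script: without `--action`, `--list` only prints the matching images. The function name `list_small_images` and its 300x300 default are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: list the images remove-small-images would delete.
# $1 is the maximum size as WIDTHxHEIGHT; 300x300 is used when it is omitted.
list_small_images() {
    feh --recursive --list --max-dimension "${1:-300x300}"
}
```

Calling `list_small_images 300x300` from the directory you want to clean prints one line per candidate, so you can check the list before deleting anything.
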
Then run it from the directory you want to clean up:

`> remove-small-images 300x300`

Detect duplicates
-----------------
Now you should have clean directories, but potentially full of duplicates.

The tool I recommend for detecting (and removing) them is
[dupeGuru](https://dupeguru.voltaicideas.net/).
It can detect perfect duplicates (basically the same file), or similar images
with a similarity threshold based on the content.
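
To make "perfect duplicates" concrete, here is a rough command-line sketch that groups byte-identical files by checksum. It is not how dupeGuru works internally and it cannot find merely similar images. It assumes `sha256sum` from GNU coreutils and file names without newlines or backslashes. The function name `list_exact_duplicates` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: print every file that has at least one byte-identical
# twin under the given directory (default: the current one).
list_exact_duplicates() {
    find "${1:-.}" -type f -exec sha256sum {} + |
        sort |
        awk '
            # The checksum is the first 64 characters of each line.
            { sum = substr($0, 1, 64) }
            # Same checksum as the previous line: print the whole group.
            sum == prev { if (!shown) print prev_line; print; shown = 1; next }
            # New checksum: remember it in case a twin follows.
            { prev = sum; prev_line = $0; shown = 0 }
        '
}
```

Identical files end up on adjacent lines with the same checksum, which is enough to review them by hand before handing the fuzzy matching over to dupeGuru.
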