
In fact, I'm using ChatGPT to help me create a script, just in case you weren't convinced. But before you leave in AI frustration: what I am attempting to do is create a script that will help me automate sorting through terabytes and terabytes of data.

 

My home server is an absolute disaster. When I was first getting into home servers, I developed a bad habit of copying everything from one server to another anytime I was scared something was about to go horribly wrong, and I never deleted those copies. So now I may have twelve copies of a file, all in different directories and possibly with different file names. I've searched and searched for an existing program that could help me sort through this mess, but the closest I've come to a turn-key solution is a program called AllDup. It allowed me to at least scan and compare all files on the server, bit by bit, and generate a table. I've been using that table to manually go through all my files, but it's becoming increasingly difficult to find the motivation to do so; I've been working on this problem, off and on, for at least five years.

 

I know there's a way (probably Python) to automate this process based on my parameters. Maybe not all of it, but some would be better than none. All I really want to do is retain a copy of each file before I wipe the volume. The problem is some, like all my image files from Clonezilla, depend on maintaining the directory and file structure, while others might be a random desktop background or text document in some random place. I worked out a process (see attached) but I can already see issues where I could end up with more duplicate files.

 

I don't want anyone to solve this problem for me, but if you could be so kind as to look over my process and comment on efficiency, possible issues, or anything else, I'd appreciate it.

Process01.png


https://linustechtips.com/topic/1574848-i-am-not-a-programmer-duplicate-file-script/

Before you proceed, I suggest you make a 1:1 cold backup (attach some drives, copy to them, disconnect them).

I know, kinda ironic, and potentially expensive (depending on how much data you have)... but if your script goes horribly wrong, you won't have to pay for data recovery and hope something important isn't lost.
And the chances of your script doing something horribly wrong just increased with:

54 minutes ago, RockerBug17 said:

In fact, I'm using ChatGPT to help me create a script

😆

 

54 minutes ago, RockerBug17 said:

I know there's a way (probably Python)

Sure, any language will do; pick the one you are most comfortable with.

 

54 minutes ago, RockerBug17 said:

I worked out a process (see attached) but I can already see issues where I could end up with more duplicate files.

Yep:
Extremely simplified example, say you have two files, work.docx and dog.jpg:

  • backup1/desktop/work.docx 
  • backup2/desktop/work.docx
  • backup2/desktop/dog.jpg
  • backup3/documents/work.docx

The end result would be:

  • unified/desktop/work.docx
  • unified/desktop/dog.jpg
  • unified/documents/work.docx

Where work.docx is a duplicate. (Obviously don't compare based on file names; I'm just keeping it simple here.)
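
To make "don't compare based on file names" a bit more concrete, here is a rough Python sketch that groups files by content instead. Treat it as an illustration rather than a finished tool: `file_digest` and `find_duplicates` are names I made up, SHA-256 is just one reasonable choice of hash, and the size pre-filter is there on the assumption that hashing terabytes is the slow part.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Map digest -> paths, keeping only content that appears more than once."""
    # Cheap first pass: files with a unique size cannot have a duplicate.
    by_size = defaultdict(list)
    for p in root.rglob("*"):
        if p.is_file() and not p.is_symlink():
            by_size[p.stat().st_size].append(p)

    # Expensive second pass: hash only the files that share a size.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for p in paths:
            by_hash[file_digest(p)].append(p)

    return {d: sorted(ps) for d, ps in by_hash.items() if len(ps) > 1}
```

It only reads files, so it should be safe to point at a copy of the data, e.g. `find_duplicates(Path("/some/mount"))` (the path is a placeholder). Renamed copies end up in the same group because names are never looked at.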

 

And to be honest, I don't see a way around these leftover duplicates; you will either have to:

  1. Go manually through duplicates in different subdirectories and pick which ones to keep.
  2. Keep only one file and create symbolic links (or shortcuts) in other directories pointing to that file.
    This would save on space, but it won't fix organisational issues.
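
For option 2, a minimal sketch of what the symbolic-link approach could look like in Python. The function name `replace_with_symlinks` and the `.linktmp` suffix are my own inventions, and I'm assuming a POSIX-like system (on Windows, creating symlinks usually needs extra privileges). It defaults to a dry run and re-checks the contents byte for byte before touching anything.

```python
import filecmp
import os
from pathlib import Path


def replace_with_symlinks(group: list[Path], dry_run: bool = True) -> list[tuple[Path, Path]]:
    """Keep group[0]; turn every other identical file into a symlink to it.

    Returns the (link, target) pairs that were, or would be, created.
    """
    keeper = group[0].resolve()
    planned = []
    for dup in group[1:]:
        if dup.is_symlink():
            continue  # already a link, leave it alone
        if not filecmp.cmp(dup, keeper, shallow=False):
            continue  # contents differ, so this is not a true duplicate
        planned.append((dup, keeper))
        if dry_run:
            continue
        # Create the link under a temporary name, then swap it over the
        # duplicate, so a failure never leaves the path missing.
        tmp = dup.with_name(dup.name + ".linktmp")
        os.symlink(keeper, tmp)
        os.replace(tmp, dup)
    return planned
```

Which copy counts as "the one to keep" is decided purely by list order here; in practice that choice is the organisational question the links don't answer.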


Now all that being said, maybe it is not as big of an issue as it might seem...
Why not create a dry run?
Create a script which would do as you've described in your (kinda) flow diagram, but without actually moving/copying anything.
Instead just log how many duplicates will remain after all the simulated work is done.
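
A dry run along those lines might look roughly like this in Python. Big caveat: the merge rule is my assumption based on the simplified example earlier in this post (drop the top-level backup folder and keep the rest of the path), not your actual flow diagram, and `sha256_of` / `simulate_merge` are made-up names. Nothing is moved or copied; it only builds the would-be result in memory and reports what is left.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file's contents in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def simulate_merge(root: Path) -> dict:
    """Simulate merging root/<backup>/<rest> into unified/<rest>."""
    unified = {}    # simulated destination path -> content digest
    conflicts = []  # same destination path, but different content
    for src in sorted(root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(root)
        if len(rel.parts) < 2:
            continue  # file sits directly in root, not inside a backup folder
        dest = Path(*rel.parts[1:])  # assumed rule: strip the backup folder
        digest = sha256_of(src)
        if dest not in unified:
            unified[dest] = digest
        elif unified[dest] != digest:
            conflicts.append((src, dest))

    # Identical content that still ends up at more than one destination.
    by_digest = defaultdict(list)
    for dest, digest in unified.items():
        by_digest[digest].append(dest)
    leftover = {d: ps for d, ps in by_digest.items() if len(ps) > 1}

    return {
        "files_kept": len(unified),
        "conflicts": conflicts,
        "leftover_duplicates": leftover,
    }
```

On the work.docx / dog.jpg example it reports three files kept and one leftover duplicate group (desktop/work.docx and documents/work.docx). The `conflicts` list is the other thing worth watching: same path, different content, which a real merge would have to rename or skip.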

If what remains is something you can go through in an afternoon, then it isn't worth solving/automating.
