hashdeep manifest/audit file handling #3
It sounds like you're currently storing them as sidecars, which makes sense to me?
Of course you wanna store them in a DB as well, but sidecars are very handy.. Unless I'm misreading..
Yeah! Ideally I would store them as sidecars along with the AIP on tape. This is a first pass, and in the current setup/folder structure there's nowhere sane for the sidecar to live without screwing up the audit. So... I have to think about it a bit. In a previous iteration I was using bagit... which I am not sure I want to do. I also looked at your manifest script, which has an option to store the manifest in the target directory. I may try to adopt that one?
I dunno if my manifest script would help.. Looks like the manifest is not part of the audit there. I'm curious to know what's happening on your end - the audit is failing cos it's picking up the manifest? I was using md5deep at the start and then moved to custom hashlib-based scripts, just to get some more control and to limit dependencies. I like your manifests via hashdeep though, never seen them like that before. I don't like bagit either..
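For context, a custom hashlib-based manifester of the kind described here might look like this minimal sketch. The `size,md5,relative-path` line format and the function names are assumptions loosely modelled on hashdeep-style output, not the actual script from the thread:

```python
import hashlib
import os

def md5_file(path, block_size=65536):
    """Stream a file through hashlib so large files never load fully into memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(block_size), b""):
            h.update(chunk)
    return h.hexdigest()

def make_manifest(target_dir, manifest_path):
    """Walk target_dir and write 'size,md5,relative-path' lines.

    Note the manifest is written to manifest_path, which can live
    anywhere, so it doesn't end up inside the tree it describes.
    """
    with open(manifest_path, "w") as out:
        for root, _dirs, files in os.walk(target_dir):
            for name in sorted(files):
                full = os.path.join(root, name)
                rel = os.path.relpath(full, target_dir)
                out.write("%d,%s,%s\n" % (os.path.getsize(full), md5_file(full), rel))
```

Rolling your own like this trades hashdeep's maturity for full control over where the manifest lives, which is exactly the pain point discussed below.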
Also @kieranjol I just read your blog post... So timely! Thanks for sharing that real-life application of this stuff.
Cheers!
Hey thanks for checking it out! The audit was failing as I first wrote it bc the manifest was getting written to the directory getting manifested... So the audit was like 'wtf is this manifest doing here??' Both the manifest and audit functions now write to the parent of the target directory. Yes, my hands are kind of tied with the built-in hashdeep behavior, but interestingly there's an open issue from 2012 in the hashdeep source that proposes an 'ignore this file' function. I just need to learn C and fix the problem myself!
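The "keep the manifest outside the tree it describes" idea can be sketched as an audit function like the one below. The line format and function name are illustrative assumptions (not hashdeep's real output); the point is that because the manifest lives elsewhere, it can never show up as an "unexpected file" in its own audit:

```python
import hashlib
import os

def audit_dir(target_dir, manifest_path):
    """Compare target_dir against a manifest of 'size,md5,relative-path'
    lines stored OUTSIDE the tree. Returns a list of problem strings;
    an empty list means a clean audit."""
    problems = []
    expected = {}
    with open(manifest_path) as f:
        for line in f:
            size, digest, rel = line.rstrip("\n").split(",", 2)
            expected[rel] = (int(size), digest)

    seen = set()
    for root, _dirs, files in os.walk(target_dir):
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, target_dir)
            seen.add(rel)
            if rel not in expected:
                problems.append("unexpected file: %s" % rel)
                continue
            size, digest = expected[rel]
            h = hashlib.md5()
            with open(full, "rb") as fh:
                for chunk in iter(lambda: fh.read(65536), b""):
                    h.update(chunk)
            if os.path.getsize(full) != size or h.hexdigest() != digest:
                problems.append("hash/size mismatch: %s" % rel)
    for rel in expected:
        if rel not in seen:
            problems.append("missing file: %s" % rel)
    return problems
```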
Ah... I'd never heard of that. I think the bagit-style way of doing things might make sense. We have a slightly bloated system like this:
and the manifest looks like this:
There's reasons why we have that parentID on its own, with the UUID folder and manifest beneath.. Basically that OE id relates to the SPECTRUM collections management procedures. Prior to accessioning, we register objects in an Object Entry register - if they are accessioned, they get an aaaXXXXX number and that OE number gets renamed. Kinda awkward, but the UUID is a sort of permanent ID that doesn't change from when the file gets OE'd. It's not the best solution, but it was a compromise that allowed us to make progress. It works very well with our systems anyhow.
Neat! Lol, thanks for your continued input! It's so appreciated!
BTW - I see that you have colons in your audit filenames. I think these might start to cause problems when writing to LTO - we definitely had issues anyhow - https://www.ibm.com/support/knowledgecenter/en/STQNYL_2.4.0/ltfs_restricted_characters.html
Thanks for the heads up! I thought they might be an issue somewhere down the line....
I restructured the SIP directory to include a parent folder that encloses the package and is named the same as the ingest UUID. This feels a little dirty to me, but maybe it's no big deal. I will close this issue once I think that name clash through a bit more.

This brings to mind another question: I previously had the SIPs rsync from the output directory to a staging area for writing to LTO. This was deemed necessary since the processing computer and the machine attached to the LTO decks were not the same. Since they are now on the same host, I should introduce logic for that and skip the pointless movement of files on the same filesystem. (*open a new issue) Note to self: check the viability of that.

One issue (is it an issue?) this SIP structure does create is that it breaks the logic that checks if a given object has been ingested and is already sitting in the output dir (based on the temp id that is formed from a hash of the filepath of the input object). I should change that logic to check something more robust (reading the db for an entry for an object with the same name? is that too strict?) or get rid of it and trust ourselves to not ingest something a million times. (*open new issue)

Here's the current SIP structure:
1a9430d4-e36e-47ae-85dd-9e7800e50ea5/
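The "skip the pointless rsync when everything is on one filesystem" idea can be sketched like this. The function names are hypothetical; comparing `st_dev` from `os.stat()` is the standard way to tell whether two paths share a device:

```python
import os
import shutil

def same_filesystem(path_a, path_b):
    """True if both paths sit on the same device, in which case copying
    to a staging area is pointless and a rename will do."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev

def stage_sip(sip_dir, staging_dir):
    """Move the SIP when staging is on the same filesystem (a rename is
    effectively instant); otherwise fall back to a real copy."""
    dest = os.path.join(staging_dir, os.path.basename(sip_dir))
    if same_filesystem(sip_dir, staging_dir):
        os.rename(sip_dir, dest)  # metadata-only operation on one device
    else:
        shutil.copytree(sip_dir, dest)
    return dest
```

A rename also has the nice property of being atomic on POSIX filesystems, so a half-staged SIP can't be left behind the way an interrupted copy can.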
I guess the audit/manifest/log could sit within that UUID folder alongside your objects and metadata dirs. It's not the prettiest, but it could work, and it would remove the need for the duplicate parent ID? Something like:
You might need to alter copy scripts and such to reflect it, but it could work...
Yeah, I think that works if I just make a manifest for, and validate, the objects directory. Also, I don't think there's a pretty way to do this at all, so meh. And thanks again! Your comments are really appreciated!
Yeah, I guess so. Actually, a lot of my recent scripts have been based around logging changes and updating manifests. I think I'd feel a bit better having checksums for the whole package, but it definitely does complicate things if you need to update stuff.
I had poked around in your update scripts and I think at some point I'll need to consider that type of process. I'm not totally sure yet how to approach it, and until I come up with a specific need I think I will lurk a bit on this question. In the meantime I think I will try out the structure mentioned above, which validates just the objects directory.
Now hashdeep manifests and audit files are written to the parent directory of the dir being hashed/audited. That's silly, so come up with a better plan.
Plan A: write the files to /tmp and then store them as blobs (and as text?) in the db.
Plan B: store the files locally in a permanent storage place (really don't like this idea, but will prob start here until the db is ready).