Question
Is making system calls to "md5sum file1" and "md5sum file2" and comparing the two return values enough in this case?
Recommended answer
Well, that will tell you whether they're definitely different or probably the same. It's possible for two files to have the same hash but not actually have the same data... just very unlikely.
In your situation, what is the impact if you get a false positive (i.e. if you think they're the same, but they're not)? MD5 is probably good enough not to worry about collisions if they would only occur accidentally... but if you've got security (or money) at stake and someone could plant a "bad" file with the same hash as a "good" file, you shouldn't rely on it.
Personally, I'd probably just read both files, comparing each byte - for a one-off comparison, both the hashing and this approach will require reading the whole of both files when they're equal; as Daniel points out in the comments, doing a byte-by-byte comparison lets you exit early as soon as you see a difference. Comparing the file sizes first is another quick optimization :)
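A minimal sketch of that approach in Python: check the sizes first, then compare the contents chunk by chunk and stop at the first difference. The name files_identical and the 64 KiB chunk size are illustrative choices, not something from the answer.

```python
import os

CHUNK_SIZE = 64 * 1024  # assumed buffer size; tune as needed


def files_identical(path_a, path_b, chunk_size=CHUNK_SIZE):
    """Return True if the two files contain exactly the same bytes."""
    # Different sizes mean different contents, so nothing needs to be read.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # exit early at the first differing chunk
            if not chunk_a:
                return True  # both files reached end-of-file together
```

The standard library's filecmp.cmp(path_a, path_b, shallow=False) performs a similar content comparison if you'd rather not write the loop yourself.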
The general advantage of hashing comes when you store the hash of the existing file somewhere, so that next time you only need to read the new file.
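As a rough sketch of that stored-hash workflow, using hashlib rather than shelling out to md5sum: the helper names file_digest and matches_stored_digest are hypothetical, and where the digest is stored is left up to you. MD5 is the default here only to match the question; pass algorithm="sha256" if deliberate collisions are a concern.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=64 * 1024):
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_stored_digest(new_path, stored_digest, algorithm="md5"):
    """Compare a new file against a digest saved earlier.

    Only the new file is read; the old file is represented by its digest.
    """
    return file_digest(new_path, algorithm) == stored_digest
```

A match still means "probably the same", while a mismatch means "definitely different", exactly as with comparing two md5sum outputs.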