Verifying that two files are identical using pure PHP?

Question

TL;DR: I have a CMS system that stores attachments (opaque files) using the SHA-1 of the file contents as the filename. How can I verify that an uploaded file really matches the one in storage, given that I already know the SHA-1 hashes of both files match? I'd like high performance.

Long version:

When a user uploads a new file to the system, I compute the SHA-1 hash of the uploaded file contents and then check whether a file with an identical hash already exists in the storage backend. PHP puts the uploaded file in /tmp before my code runs, and I run sha1sum against it to get the SHA-1 hash of its contents. I then compute the fanout from that hash and pick a storage directory under an NFS-mounted directory hierarchy. (For example, if the SHA-1 hash of the file contents is 37aefc1e145992f2cc16fabadcfe23eede5fb094, the permanent file name is /nfs/data/files/37/ae/fc1e145992f2cc16fabadcfe23eede5fb094.) In addition to saving the actual file contents, I INSERT a new row into a SQL database for the user-submitted metadata (e.g. Content-Type, original filename, datestamp, etc.).
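The fan-out step can be sketched in plain PHP. This is a minimal sketch assuming the two-level layout from the example path (two hex characters, two more, then the remaining 36 as the file name); fanoutPath() is a hypothetical name, not part of the question. PHP's built-in sha1_file() is shown in a comment as an alternative to shelling out to sha1sum; both produce a lowercase hex digest.

```php
<?php
// Hypothetical helper (not from the question): turn a 40-character SHA-1 hex
// digest into the two-level fan-out path described above.
function fanoutPath(string $root, string $sha1): string
{
    return rtrim($root, '/')
        . '/' . substr($sha1, 0, 2)   // first level: characters 1-2
        . '/' . substr($sha1, 2, 2)   // second level: characters 3-4
        . '/' . substr($sha1, 4);     // file name: the remaining 36 characters
}

// In an upload handler, sha1_file() could replace running sha1sum, e.g.
// $sha1 = sha1_file($_FILES['attachment']['tmp_name']); // 'attachment' is a hypothetical field name
$sha1 = '37aefc1e145992f2cc16fabadcfe23eede5fb094';
echo fanoutPath('/nfs/data/files', $sha1), "\n";
// prints /nfs/data/files/37/ae/fc1e145992f2cc16fabadcfe23eede5fb094
```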

The corner case I'm currently working through is a newly uploaded file whose SHA-1 hash matches an existing hash in the storage backend. I know that the chances of this happening by accident are astronomically low, but I'd like to be sure. (For the deliberate case, see https://shattered.io/)

Given two filenames $file_a and $file_b, how can I quickly check whether both files have identical contents? Assume that the files are too big to be loaded into memory. In Python I'd use filecmp.cmp(), but PHP does not seem to have anything similar. I know this can be done with fread(), aborting as soon as a non-matching byte is found, but I'd rather not write that code.

Recommended answer

If you already have one SHA1 sum, you can simply do:

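if ($known_sha1 === sha1_file($new_file)) {
    // $new_file hashes to the known value ($known_sha1 must be lowercase hex, as sha1_file() returns)
}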

Otherwise:

if (filesize($file_a) === filesize($file_b)
    && md5_file($file_a) === md5_file($file_b)
) {
    // same size and same MD5 digest
}

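Checking the file size too adds a cheap extra condition on top of the hash (a collision is already very unlikely). MD5 is used because it is generally faster than the SHA algorithms, but it is also weaker: practical MD5 collisions are known, so it gives less assurance than SHA-1 against deliberately crafted files.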
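One PHP-specific caveat when comparing hex digests: if both operands of == are numeric strings, PHP compares them as numbers, so two different digests that happen to look like 0e followed only by digits would compare equal. Using === or hash_equals() compares the exact strings. A minimal demonstration with made-up values (not real hashes):

```php
<?php
// Two made-up strings shaped like "0e" followed by digits. PHP treats both as
// numeric strings (zero times a power of ten), so == compares them as numbers.
$a = '0e1111';
$b = '0e2222';

var_dump($a == $b);             // bool(true): numeric comparison, 0.0 == 0.0
var_dump($a === $b);            // bool(false): exact string comparison
var_dump(hash_equals($a, $b));  // bool(false): exact, timing-safe comparison
```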

Update:

This is how to compare two files byte for byte:

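function compareFiles($file_a, $file_b)
{
    // Files of different sizes cannot be identical, so skip reading them.
    if (filesize($file_a) !== filesize($file_b))
        return false;

    $chunksize = 4096;
    $fp_a = fopen($file_a, 'rb');
    $fp_b = fopen($file_b, 'rb');
    if ($fp_a === false || $fp_b === false)
    {
        // Could not open one of the files; release whichever handle was opened.
        if ($fp_a !== false)
            fclose($fp_a);
        if ($fp_b !== false)
            fclose($fp_b);
        return false;
    }

    // Read both files in lockstep and stop at the first chunk that differs.
    $identical = true;
    while (!feof($fp_a) && !feof($fp_b))
    {
        $d_a = fread($fp_a, $chunksize);
        $d_b = fread($fp_b, $chunksize);
        if ($d_a === false || $d_b === false || $d_a !== $d_b)
        {
            $identical = false;
            break;
        }
    }

    fclose($fp_a);
    fclose($fp_b);

    return $identical;
}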
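Putting the pieces together, the upload flow from the question could look roughly like the following. It is a sketch under stated assumptions, not the asker's code: storeUpload() and sameBytes() are hypothetical names, sameBytes() restates the chunked comparison from compareFiles() so the block runs on its own, and refusing a same-hash, different-content upload is an assumed policy. For real uploads, move_uploaded_file() would normally replace rename().

```php
<?php
// Hypothetical sketch: store an upload under its SHA-1 fan-out path and, when a
// file already exists there, verify byte for byte before deduplicating.

// Chunked byte comparison, same approach as compareFiles().
function sameBytes(string $a, string $b, int $chunk = 8192): bool
{
    if (filesize($a) !== filesize($b)) {
        return false; // different sizes: cannot be identical
    }
    $fa = fopen($a, 'rb');
    $fb = $fa === false ? false : fopen($b, 'rb');
    if ($fb === false) {
        if ($fa !== false) {
            fclose($fa);
        }
        throw new RuntimeException("cannot open $a or $b");
    }
    try {
        while (!feof($fa)) {
            $da = fread($fa, $chunk);
            $db = fread($fb, $chunk);
            if ($da === false || $db === false || $da !== $db) {
                return false; // first differing chunk (or read error) ends the scan
            }
        }
        return true;
    } finally {
        fclose($fa);
        fclose($fb);
    }
}

// Returns the permanent path of the stored content.
function storeUpload(string $tmpPath, string $root): string
{
    $sha1 = sha1_file($tmpPath);
    if ($sha1 === false) {
        throw new RuntimeException("cannot hash $tmpPath");
    }
    $dir = rtrim($root, '/') . '/' . substr($sha1, 0, 2) . '/' . substr($sha1, 2, 2);
    $target = $dir . '/' . substr($sha1, 4);

    if (is_file($target)) {
        if (!sameBytes($tmpPath, $target)) {
            // Same SHA-1 but different bytes: a real collision (assumed policy: refuse).
            throw new RuntimeException("SHA-1 collision on $sha1");
        }
        unlink($tmpPath); // duplicate content: keep only the stored copy
        return $target;
    }

    if (!is_dir($dir) && !mkdir($dir, 0775, true) && !is_dir($dir)) {
        throw new RuntimeException("cannot create $dir");
    }
    if (!rename($tmpPath, $target)) { // move_uploaded_file() for real uploads
        throw new RuntimeException("cannot store $target");
    }
    return $target;
}
```

Only the rare case where the fan-out path is already occupied pays for the full byte comparison; every other upload costs one hash and one move.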
