Earlier today, while troubleshooting an issue, I started seeing some odd behavior in cffile. When uploading a file supplied from a form, the accept attribute was rejecting files that should have been accepted. I decided to set up a little test to get to the root of the issue.
[gist id=8482307 file=WithoutAccept.cfm bump=3]
I started by uploading a docx file created with LibreOffice 220.127.116.11.
Everything looks as it should. LO-test.docx is being reported as ‘application/vnd.openxmlformats-officedocument.wordprocessingml.document’. So, next I tweaked the code to only accept files with the MIME type of ‘application/vnd.openxmlformats-officedocument.wordprocessingml.document’.
[gist id=8482307 file=WithAccept.cfm bump=3]
Then I submitted the same docx file to the page.
We know from above that it has the MIME type of ‘application/vnd.openxmlformats-officedocument.wordprocessingml.document’. Why does cffile now think that it is ‘application/zip’?
Maybe it is just an issue with LibreOffice. Let’s see how it handles a docx created in Microsoft Office 2010. I removed the accept attribute and reran the script. This time, though, I fed in a docx from MS Word.
Everything looks OK, so far. Next, let’s see what happens when we add the accept attribute back.
So, it accepted the file as ‘application/vnd.openxmlformats-officedocument.wordprocessingml.document’, and cfdirectory is listing it as an ‘application/vnd.openxmlformats-officedocument.wordprocessingml.document’ file, but the result variable is listing it as ‘application/x-tika-ooxml’? What the heck? :/
At first, I thought that it might be that particular server configuration, so I ran the same on my local Windows 7 machine.
I get the same exact result on my local machine. I’m thinking something in ACF is borked. What do you think? I have been looking through Adobe’s bug tracker and so far, I do not see this issue. I think my next step is going to be to submit a ticket.
With my code seeing a different MIME type in different contexts, I figured that I would try to identify the actual MIME type. Let’s see what the file command returns.
Joes-MacBook-Air:desktop joe$ file --mime-type test.docx
Joes-MacBook-Air:desktop joe$ file --mime-type LO-test.docx
Let us see what coldfusion.util.MimeTypeUtils says about our two test files. I wrote a very basic little script to see what guessMimeType() says.
<cfset theFile = "#expandPath('/mimetype/')#LO-test.docx">
<cfset obj = createObject("java", "coldfusion.util.MimeTypeUtils")>
<cfset variables.contenttype = obj.guessMimeType(variables.theFile)>
I started by testing the docx from MS Office. It shows as ‘undefined’.
Is the docx file generated by LibreOffice detected any better? No.
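To compare both files in one pass, the check can be wrapped in a loop. This is only a rough sketch: it assumes both files sit in the same /mimetype/ folder, that test.docx is the MS Word file, and that the ‘undefined’ result means guessMimeType() is handing back a Java null. I have not confirmed that last point.

```cfm
<!--- Assumption: both test files live in the /mimetype/ folder used above. --->
<cfset baseDir = expandPath('/mimetype/')>
<cfset testFiles = ["test.docx", "LO-test.docx"]>
<cfset mimeUtils = createObject("java", "coldfusion.util.MimeTypeUtils")>
<cfset results = {}>

<cfloop array="#testFiles#" index="fileName">
    <!--- Same call as in the script above, once per file. --->
    <cfset guessed = mimeUtils.guessMimeType(baseDir & fileName)>
    <!--- Assumption: a null return is what shows up as 'undefined'. --->
    <cfif isNull(guessed)>
        <cfset results[fileName] = "undefined">
    <cfelse>
        <cfset results[fileName] = guessed>
    </cfif>
</cfloop>

<!--- One struct with a guessed MIME type (or "undefined") per file name. --->
<cfdump var="#results#">
```

Adding a doc or pdf file name to the testFiles array would put every result side by side in a single dump.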
Is guessMimeType() broken? Let’s try some other file type.
It seems to be able to identify a doc file and a pdf file without a problem. Am I going to have trouble finding a way of identifying a legitimate docx file at all?
Well, I have submitted bug 3695879 to Adobe’s BugBase site. It looks like cfdirectory is the only reliable way of determining the MIME type. I am a little curious how cfdirectory determines it, though, since nothing else seems to be able to.
Next week, I’ll have to start working on formulating a work-around.
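As a starting point, here is one direction I may explore. A docx file is a ZIP container, which is probably why ‘application/zip’ turned up earlier, and a Word document inside that container carries a [Content_Types].xml entry and a word/document.xml entry. The sketch below lists the archive with cfzip and looks for both. The function name looksLikeDocx is my own invention, not a built-in, and I have not run this against the uploads above, so treat it as an untested idea and not as a fix.

```cfm
<!--- Hypothetical helper: looksLikeDocx() is not part of ColdFusion. --->
<cffunction name="looksLikeDocx" returntype="boolean" output="false">
    <cfargument name="filePath" type="string" required="true">
    <cfset var entries = "">
    <cfset var entryName = "">
    <cfset var hasContentTypes = false>
    <cfset var hasDocumentXml = false>

    <cftry>
        <!--- List the archive entries; this throws if the file is not a ZIP. --->
        <cfzip action="list" file="#arguments.filePath#" name="entries">

        <cfloop query="entries">
            <!--- Normalize path separators in case they differ by platform. --->
            <cfset entryName = replace(entries.name, "\", "/", "all")>
            <cfif entryName EQ "[Content_Types].xml">
                <cfset hasContentTypes = true>
            <cfelseif entryName EQ "word/document.xml">
                <cfset hasDocumentXml = true>
            </cfif>
        </cfloop>

        <cfcatch type="any">
            <!--- Anything unreadable as a ZIP is treated as "not a docx". --->
            <cfreturn false>
        </cfcatch>
    </cftry>

    <cfreturn hasContentTypes AND hasDocumentXml>
</cffunction>

<!--- Usage: check a file already on disk (path reused from the script above). --->
<cfoutput>#looksLikeDocx(expandPath('/mimetype/') & "LO-test.docx")#</cfoutput>
```

This only checks the container structure, so it would run after the upload and would not replace the accept attribute.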