text/plain MIME Type and Python

When you do echo "x" > my_file and then check its MIME type using file --mime-type my_file it would say text/plain. But, when you do the same in Python by

with open("my_file_2", "w") as fp:
fp.write("x")

and then check the MIME type it would say application/octet-stream. What’s the difference?

For the impatient

echo adds a new line to file which tells the file utility it is a text file.

For the curious

When I saw this question on StackOverflow, I was really stumped due to the following reasons:

  1. I didn’t know the file utility can be used to get the mime-type of the file. I thought MIME Type is only relevant in the context of web server and clients. After all, MIME stands for Multipurpose Internet Mail Extensions
  2. I thought operating systems usually use the file extension to decide the file type, by extension the mime type. Don’t the OSes warn when we touch the extension part of the files while renaming, all the time? So, how does file utility do this on a files without any extension?

Adding extensions

Lets try adding extensions:

$ echo "x" > some_file.txt
$ file --mime-type some_file.txt
some_file.txt: text/plain

Okay, that’s all good. Now to the Python side:

with open("some_file_2.txt", "w") as fp:
fp.write("x")
$ file --mime-type some_file_2.txt
some_file_2.txt: application/octet-stream

What? file doesn’t recognise file extensions?

The OS conspiracy theory

Maybe echo writes the mimetype as a metadata onto the disk because echo is a system utility and it knows to do that and in Python the user (me) doesn’t know how to? Clearly the operating system utilities are a cabal of some forbidden knowledge. And I am going to uncover that today, starting with the file utility which seems to have different answers to different programs.

How does ‘file’ determine MIME Type?

Answers to this question has some useful information:

How do you change the MIME type of a file from the terminal?

  1. MIME Type is a fictional value. There is no inherent metadata field that stores MIME Types of files.
  2. Each operating system uses a different technique to decide file type. Windows uses file extension, Mac OS uses type creator & type codes and Unix uses magic numbers.
  3. The file command guesses file type by reading the content and looking for magic numbers and strings.

Time to reveal the magic

Let us peer into the souls of these files in their purest forms where there is no magic but only 1s and 0s. So, I printed the binary representation of the two files.

$ xxd -b my_file
00000000: 01111000 00001010 x.

$ xxd -b my_file_2
00000000: 01111000 x

The file generated by echo has two bytes (notice the . after the x) whereas the file I created with Python only has one byte. What is that second byte?

>>> number = int('00001010', 2)
>>> chr(number)
'n'

And it turns out like every movie on magic, there is no such thing as magic. Just clever people putting new lines to tell file it is a text file.

Creating a trick

Now that the trick is revealed, lets create our own magic trick

$ echo "<file></file>" > xml_file
$ file --mime-type xml_file
xml_file: text/plain

$ echo "<?xml version="1.0"?><file></file>" > xml_file
$ file --mime-type xml_file
xml_file: text/xml
  1. https://www.baeldung.com/linux/file-mime-types
  2. https://unix.stackexchange.com/questions/185216/file-command-apparently-returning-wrong-mime-type
  3. https://stackoverflow.com/questions/29017725/how-do-you-change-the-mime-type-of-a-file-from-the-terminal