15.10 Guessing MIME Content Types

     

If this were the best of all possible worlds, every protocol and every server would use MIME types to specify the kind of file being transferred. Unfortunately, that's not the case. Not only do we have to deal with older protocols such as FTP that predate MIME, but many HTTP servers that should use MIME either don't provide MIME headers at all or lie and provide incorrect headers (usually because the server has been misconfigured). The URLConnection class provides two static methods to help programs figure out the MIME type of some data. You can use these if the content type just isn't available, or if you have reason to believe that the content type you're given isn't correct. The first of these is URLConnection.guessContentTypeFromName():

 public static String guessContentTypeFromName(String name) [1]

[1] This method is protected in Java 1.3 and earlier, public in Java 1.4 and later.

This method tries to guess the content type of an object based on the extension in the filename portion of the object's URL. It returns its best guess about the content type as a String. This guess is likely to be correct, because people follow some fairly regular conventions when thinking up filenames.
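The following is a rough sketch of calling guessContentTypeFromName() on the filename portion of a URL. The class and method names (UrlTypeGuesser, guessFromUrl) are hypothetical. Using java.net.URI to strip the query string and fragment first is an assumption about how you might want to handle URLs such as report.pdf?download=1. The exact strings returned depend on your JDK's mapping file.

```java
import java.net.URI;
import java.net.URLConnection;

// Hypothetical helper: guesses a MIME type from the file name in a URL's path.
public class UrlTypeGuesser {

    // Returns the JDK's guess for the URL's path, or null if it has no guess.
    // URI.create() throws IllegalArgumentException for strings that aren't valid URIs.
    public static String guessFromUrl(String url) {
        // getPath() excludes the query string and fragment, so only the
        // extension of the file name itself is considered.
        String path = URI.create(url).getPath();
        if (path == null) {
            return null; // opaque URIs such as mailto: have no path component
        }
        return URLConnection.guessContentTypeFromName(path);
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.example.com/index.html",
            "http://www.example.com/images/logo.gif",
            "http://www.example.com/docs/report.pdf?download=1"
        };
        for (String url : urls) {
            System.out.println(url + " -> " + guessFromUrl(url));
        }
    }
}
```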

The guesses are determined by the content-types.properties file, normally located in the jre/lib directory. On Unix, Java may also look at the mailcap file to help it guess. Table 15-1 shows the guesses JDK 1.5 makes. These vary a little from one version of the JDK to the next.

Table 15-1. Java extension content-type mappings

Extension                                       MIME content type
No extension, or unrecognized extension         content/unknown
.saveme, .dump, .hqx, .arc, .o, .a, .z, .bin,   application/octet-stream
  .exe, .zip, .gz
.oda                                            application/oda
.pdf                                            application/pdf
.eps, .ai, .ps                                  application/postscript
.dvi                                            application/x-dvi
.hdf                                            application/x-hdf
.latex                                          application/x-latex
.nc, .cdf                                       application/x-netcdf
.tex                                            application/x-tex
.texinfo, .texi                                 application/x-texinfo
.t, .tr, .roff                                  application/x-troff
.man                                            application/x-troff-man
.me                                             application/x-troff-me
.ms                                             application/x-troff-ms
.src, .wsrc                                     application/x-wais-source
.zip                                            application/zip
.bcpio                                          application/x-bcpio
.cpio                                           application/x-cpio
.gtar                                           application/x-gtar
.sh, .shar                                      application/x-shar
.sv4cpio                                        application/x-sv4cpio
.sv4crc                                         application/x-sv4crc
.tar                                            application/x-tar
.ustar                                          application/x-ustar
.snd, .au                                       audio/basic
.aifc, .aif, .aiff                              audio/x-aiff
.wav                                            audio/x-wav
.gif                                            image/gif
.ief                                            image/ief
.jfif, .jfif-tbnl, .jpe, .jpg, .jpeg            image/jpeg
.tif, .tiff                                     image/tiff
.fpx, .fpix                                     image/vnd.fpx
.ras                                            image/x-cmu-rast
.pnm                                            image/x-portable-anymap
.pbm                                            image/x-portable-bitmap
.pgm                                            image/x-portable-graymap
.ppm                                            image/x-portable-pixmap
.rgb                                            image/x-rgb
.xbm, .xpm                                      image/x-xbitmap
.xwd                                            image/x-xwindowdump
.png                                            image/png
.htm, .html                                     text/html
.text, .c, .cc, .c++, .h, .pl, .txt, .java, .el text/plain
.tsv                                            text/tab-separated-values
.etx                                            text/x-setext
.mpg, .mpe, .mpeg                               video/mpeg
.mov, .qt                                       video/quicktime
.avi                                            application/x-troff-msvideo
.movie, .mv                                     video/x-sgi-movie
.mime                                           message/rfc822
.xml                                            application/xml


This list is by no means complete. For instance, it omits various XML applications, such as RDF (.rdf) and XSL (.xsl), that should have the MIME type application/xml. It also doesn't provide a MIME type for CSS stylesheets (.css). However, it's a good start.
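One way to fill such gaps is to wrap the JDK's default FileNameMap, obtained from URLConnection.getFileNameMap(), and install the wrapper with URLConnection.setFileNameMap(). In the JDK, guessContentTypeFromName() consults whatever map is currently installed. This is a minimal sketch that assumes a global change is acceptable in your program. The class name ExtendedFileNameMap and its list of extra extensions are hypothetical. text/css is the registered type for stylesheets, and the XML entries follow the recommendation above.

```java
import java.net.FileNameMap;
import java.net.URLConnection;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical FileNameMap that adds a few extensions missing from the
// default table and delegates every other name to the original map.
public class ExtendedFileNameMap implements FileNameMap {

    private final FileNameMap fallback;
    private final Map<String, String> extras = new HashMap<>();

    public ExtendedFileNameMap(FileNameMap fallback) {
        this.fallback = fallback;
        extras.put("css", "text/css");
        extras.put("rdf", "application/xml");
        extras.put("xsl", "application/xml");
    }

    @Override
    public String getContentTypeFor(String fileName) {
        // Treat a dot as an extension separator only if it follows the last '/'.
        int dot = fileName.lastIndexOf('.');
        if (dot > fileName.lastIndexOf('/')) {
            String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
            String type = extras.get(ext);
            if (type != null) {
                return type;
            }
        }
        return fallback.getContentTypeFor(fileName);
    }

    // Installs the extended map globally (once); later calls to
    // URLConnection.guessContentTypeFromName() will consult it.
    public static void install() {
        FileNameMap current = URLConnection.getFileNameMap();
        if (!(current instanceof ExtendedFileNameMap)) {
            URLConnection.setFileNameMap(new ExtendedFileNameMap(current));
        }
    }

    public static void main(String[] args) {
        install();
        for (String name : new String[] {"style.css", "data.rdf", "logo.gif"}) {
            System.out.println(name + " -> " + URLConnection.guessContentTypeFromName(name));
        }
    }
}
```

Because setFileNameMap() changes the map for the whole JVM, a library would probably prefer to call the wrapper's getContentTypeFor() directly rather than install it.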

The second MIME type guesser method is URLConnection.guessContentTypeFromStream():

 public static String guessContentTypeFromStream(InputStream in) throws IOException

This method tries to guess the content type by looking at the first few bytes of data in the stream. For this method to work, the InputStream must support marking, so that the method can return to the beginning of the stream after it has read the first bytes. If the stream doesn't support marking, the method returns null without guessing. Java 1.5 inspects the first 11 bytes of the InputStream, although sometimes fewer bytes are needed to make an identification. Table 15-2 shows how Java 1.5 guesses. Note that these guesses are often less reliable than the guesses made by the previous method. For example, an XML document that begins with a comment rather than an XML declaration would be mislabeled as an HTML file. Use this method only as a last resort.
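The sketch below shows one way to meet the marking requirement. It wraps any stream that doesn't support marking in a BufferedInputStream, guesses, and then lets the caller keep reading from the wrapper so the sniffed bytes aren't lost. The class and method names (StreamGuesser, markable, guess) are hypothetical. The main() method also reproduces the pitfall just described: a document that starts with an XML comment is reported as text/html.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helpers for sniffing a stream's content type.
public class StreamGuesser {

    // Returns a stream that supports mark()/reset(). Callers should keep
    // reading from the returned stream, not the original one, because the
    // wrapper buffers the bytes it reads ahead.
    public static InputStream markable(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    // Guesses the type of some text by sniffing its first bytes.
    public static String guess(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        try (InputStream in = markable(new ByteArrayInputStream(bytes))) {
            return URLConnection.guessContentTypeFromStream(in);
        } catch (IOException e) {
            // Not expected for an in-memory stream, but the API declares it.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(guess("<?xml version=\"1.0\"?><root/>")); // starts with an XML declaration
        System.out.println(guess("<!-- a comment --><root/>"));     // XML that starts with a comment
        System.out.println(guess("<html><body>Hi</body></html>"));
    }
}
```

With a network connection, you would pass the connection's input stream through markable(), call guessContentTypeFromStream() on the result, and then read the body from that same wrapped stream.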

Table 15-2. Java first bytes content-type mappings

First bytes in hexadecimal    First bytes in ASCII  MIME content type
0xACED                                              application/x-java-serialized-object
0xCAFEBABE                                          application/java-vm
0x47494638                    GIF8                  image/gif
0x23646566                    #def                  image/x-bitmap
0x2158504D32                  !XPM2                 image/x-pixmap
0x89504E470D0A1A0A                                  image/png
0x2E736E64                                          audio/basic
0x646E732E                                          audio/basic
0x3C3F786D6C                  <?xml                 application/xml
0xFEFF003C003F0078                                  application/xml
0xFFFE3C003F007800                                  application/xml
0x3C21                        <!                    text/html
0x3C68746D6C                  <html                 text/html
0x3C626F6479                  <body                 text/html
0x3C68656164                  <head                 text/html
0x3C48544D4C                  <HTML                 text/html
0x3C424F4459                  <BODY                 text/html
0x3C48454144                  <HEAD                 text/html
0xFFD8FFE0                                          image/jpeg
0xFFD8FFEE                                          image/jpeg
0xFFD8FFE1XXXX4578696600 [2]                        image/jpeg
0x52494646                    RIFF                  audio/x-wav
0xD0CF11E0A1B11AE1 [3]                              image/vnd.fpx


[2] The XX bytes are not checked. They can be anything.

[3] This actually just checks for a Microsoft structured storage document. Several other more complicated checks have to be made before deciding whether this is indeed an image/vnd.fpx document.
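To see some of the byte signatures in Table 15-2 at work, the following sketch feeds a few raw byte sequences to guessContentTypeFromStream(). These are the first bytes of a Java class file, a PNG signature, a JPEG signature, and the output of ObjectOutputStream, whose stream header always begins with 0xACED. The class and method names are hypothetical. Because the method looks at only a handful of leading bytes, the arrays are bare signatures rather than complete files.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.net.URLConnection;

// Hypothetical demo: identify data purely by its leading "magic number" bytes.
public class MagicNumbers {

    // Builds a byte array from int values so hex literals above 0x7F are easy to write.
    public static byte[] bytes(int... values) {
        byte[] result = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            result[i] = (byte) values[i];
        }
        return result;
    }

    // ByteArrayInputStream supports mark()/reset(), as guessContentTypeFromStream() requires.
    public static String sniff(byte[] data) {
        try {
            return URLConnection.guessContentTypeFromStream(new ByteArrayInputStream(data));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Serializes a String; ObjectOutputStream writes the 0xACED stream magic first.
    public static byte[] serialized(String value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(out)) {
            oos.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(sniff(bytes(0xCA, 0xFE, 0xBA, 0xBE)));
        System.out.println(sniff(bytes(0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)));
        System.out.println(sniff(bytes(0xFF, 0xD8, 0xFF, 0xE0)));
        System.out.println(sniff(serialized("hello")));
    }
}
```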

ASCII mappings, where they exist, are case-sensitive. For example, guessContentTypeFromStream() does not recognize <Html> as the beginning of a text/html file.
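If you need to recognize mixed-case tags anyway, one possible workaround is to retry the guess on a copy of the leading bytes in which ASCII capital letters have been lowercased. This is an assumption of this sketch, not a JDK feature. Only the bytes 'A' through 'Z' are changed, and the retry happens only when the first attempt returns null, so signatures that already match are unaffected. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical workaround: retry the stream guess with ASCII letters lowercased.
public class CaseInsensitiveSniffer {

    // The JDK's own case-sensitive guess for a byte prefix.
    public static String sniff(byte[] data) {
        try {
            return URLConnection.guessContentTypeFromStream(new ByteArrayInputStream(data));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static String guessIgnoringCase(byte[] prefix) {
        String type = sniff(prefix);
        if (type != null) {
            return type; // the case-sensitive check already succeeded
        }
        byte[] lowered = prefix.clone();
        for (int i = 0; i < lowered.length; i++) {
            if (lowered[i] >= 'A' && lowered[i] <= 'Z') {
                lowered[i] += 'a' - 'A'; // ASCII capitals only; other bytes are untouched
            }
        }
        return sniff(lowered);
    }

    public static void main(String[] args) {
        byte[] doc = "<Html><Body>Hi</Body></Html>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(sniff(doc));             // the JDK's case-sensitive guess
        System.out.println(guessIgnoringCase(doc)); // retried with lowercase letters
    }
}
```

Lowercasing can only make the prefix match one of the lowercase HTML or XML patterns, so the retry is still a heuristic and shares the reliability caveats of guessContentTypeFromStream() itself.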



Java Network Programming, Third Edition
ISBN: 0596007213
Year: 2003
Pages: 164
