Damian Cugley’s Archive

ASP’s Response.Write disaster

Thurs. 9 Oct. 2003

If you are foolish enough to be creating a web application with ASP, you will be used to using the Write method of the Response object to stream HTML to your client’s web browser. Today I got caught out by a serious limitation of the implementation of Response.Write. (And this is not the first time it has leapt out and bitten me, either.)

The fundamental problem is caused by a decision made when designing COM all those years ago. Microsoft intended, sensibly enough, that all character data processed by COM objects should use Unicode, which, at the time, could all be represented correctly using 16-bit integers (Unicode version 3 breaks this, but that is another story). The COM convention is that all strings are passed as UTF-16LE. Most software still represented strings as byte sequences: Europeans used one byte per character, Asians variable-length multi-byte sequences. Calling COM methods from such software always requires transcoding between the legacy encoding and UTF-16.

Thus the Response.Write method is designed to consume character data (supplied as UTF-16), even though it immediately encodes it as whatever single-byte (or MBCS) encoding is the default on your computer. An ASP page served by a British computer, say, will convert the character data to the Windows-1252 encoding:

Characters    ‘     Z     o     ë     ’
Unicode       2018  005A  006F  00EB  2019
Windows-1252  0x91  0x5A  0x6F  0xEB  0x92

So long as you only want to generate character data in Windows-1252 then that is OK. But there are times when this is not what you want.
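As a rough illustration of that limit, here is a minimal Python sketch. It assumes Python's cp1252 codec is a fair stand-in for the encoding step inside Response.Write; the variable names are made up for the example.

```python
# Assumption: Python's "cp1252" codec stands in for the Windows-1252
# conversion that Response.Write applies on a British machine.
quoted = "\u2018Zo\u00eb\u2019"        # the quoted name, with curly quotes
greek = "\u0396\u03c9\u03ae"           # a Greek word, not in Windows-1252

encoded = quoted.encode("cp1252")
print(" ".join(f"{b:02X}" for b in encoded))   # 91 5A 6F EB 92

# Greek letters have no Windows-1252 byte. Python raises an error by
# default; asking for replacement shows the data loss instead.
lossy = greek.encode("cp1252", errors="replace")
print(lossy)                                   # b'???'
```

Python refuses outright unless told to substitute; either way the Greek text cannot survive the conversion.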

On a European Union web site one may want to be able to generate text that includes both Greek and Latin scripts. The only sensible way to do this is to use UTF-8. Writing a function to convert Unicode character data to a byte sequence is simple enough (in fact, recent versions of the Win32 API have a function that does it for you):

Characters  ‘         Z     o     ë      ’
Unicode     2018      005A  006F  00EB   2019
UTF-8       E2 80 98  5A    6F    C3 AB  E2 80 99
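The conversion itself really is a one-liner in most languages. A minimal Python sketch, where the helper name to_utf8_hex is invented for the example:

```python
def to_utf8_hex(text):
    """Return the UTF-8 encoding of text as space-separated hex bytes."""
    return " ".join(f"{b:02X}" for b in text.encode("utf-8"))

quoted = "\u2018Zo\u00eb\u2019"        # curly quotes around Z, o, e-diaeresis (U+00EB)
greek = "\u0396\u03c9\u03ae"           # three Greek letters

print(to_utf8_hex(quoted))   # E2 80 98 5A 6F C3 AB E2 80 99
print(to_utf8_hex(greek))    # CE 96 CF 89 CE AE
```

Latin and Greek text go through the same function, which is the point of choosing UTF-8 for a mixed-script page.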

But, having generated a byte sequence, how do you make the Response object pass the bytes unchanged to the recipient?

Theoretically you can transmit data verbatim by passing it to Response.BinaryWrite. For this, you need to create data of type ‘variant containing array of byte’, which is something of a tall order in VBScript. By writing a COM component in a language like C++, one can create a VARIANT, and a SAFEARRAY and copy buffers around and generally write about 25 lines of code in order to create a byte sequence you can pass to VBScript, which it can then pass to Response.BinaryWrite. I have done this in the past, and it really is an unreasonable amount of work. What’s more, the VB programmers will not thank you for it, because working with byte arrays is even more inconvenient in VB than working with strings.

In practice I have used a trick that works fairly well. Remember that Response.Write applies the Win32 encoding routine (called WideCharToMultiByte) to the data you pass to it. So we convert our byte sequence to character data using the reverse function, MultiByteToWideChar:

Windows-1252  E2    80    98    5A    6F    C3    AB    E2    80    99
Unicode       00E2  20AC  02DC  005A  006F  00C3  00AB  00E2  20AC  2122
Characters    â     €     ˜     Z     o     Ã     «     â     €     ™

When this is run through the transcoder again, the UTF-8 byte sequence is restored. The web browser will decode it to generate the original character data. Or, if it does not realize the data is in UTF-8 format, it may display the gibberish characters shown above.
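The round trip can be sketched in Python. This is an approximation under a stated assumption: the cp1252 codec stands in for MultiByteToWideChar and WideCharToMultiByte. The two are not identical, since Python's codec rejects five byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) that Windows maps to control characters, so the sketch only holds for byte sequences that avoid them.

```python
# Assumption: the "cp1252" codec approximates the Win32 transcoding
# routines; it is stricter than Windows for a few undefined byte values.
original = "\u2018Zo\u00eb\u2019"

# Step 1: produce the UTF-8 bytes we actually want on the wire.
utf8_bytes = original.encode("utf-8")

# Step 2: disguise the bytes as characters (the MultiByteToWideChar step).
disguised = utf8_bytes.decode("cp1252")

# Step 3: what Response.Write would do to the disguised string
# (the WideCharToMultiByte step). The UTF-8 bytes come back out.
on_the_wire = disguised.encode("cp1252")

# Step 4: a browser that knows the page is UTF-8 recovers the text.
seen_by_browser = on_the_wire.decode("utf-8")

print(disguised)          # the gibberish form of the string
print(seen_by_browser)    # the original text again
```

The disguised string is only a carrier: it is never meant to be displayed, just to survive one more pass through the transcoder.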

In general, if you have a byte sequence you have to pass over a COM boundary, you will end up using this kludge. Mostly this works and no-one notices the continual transcodings back and forth between MBCS and Unicode. Mark Hammond’s Win32 extensions for Python use the same technique when passing Python strings (= byte sequences) through the COM barrier (Unicode strings get passed unchanged).

To extend ASP to generate PDF, I wanted to create the COM component in Python, partly because that way we can use ReportLab’s excellent PDF toolkit. It took some frustration before I worked out that the _reg_clsctx_ attribute of the Python class needs to be set to pythoncom.CLSCTX_LOCAL_SERVER. The default value causes ASP to display an error message, which is, as usual, meaningless. Apart from that, creating COM servers in Python is pretty straightforward. Having written COM servers in C++ before now, I am very impressed at how easy Mark Hammond makes it look.
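For readers who have not seen one, a Python COM server class looks roughly like the sketch below. Everything specific in it is an assumption made for illustration: the class name, the ProgID, the GUID and the Render method are hypothetical, and the method body is a placeholder rather than real PDF generation. Only the _reg_clsctx_ setting comes from the text above. The import is guarded so that the file can at least be loaded on a machine without pywin32.

```python
try:
    import pythoncom  # part of pywin32, so normally Windows only
    LOCAL_SERVER = pythoncom.CLSCTX_LOCAL_SERVER
except ImportError:
    LOCAL_SERVER = 4  # numeric value of CLSCTX_LOCAL_SERVER in the COM headers


class PdfRenderer:
    """Hypothetical component; all names here are illustrative."""

    _reg_progid_ = "Example.PdfRenderer"                      # made-up ProgID
    _reg_clsid_ = "{5E7A0C3E-8B1D-4F6A-9C2E-0D4B7A1F3C55}"    # made-up GUID, generate your own
    _reg_clsctx_ = LOCAL_SERVER       # run as a local (out-of-process) server
    _public_methods_ = ["Render"]     # methods visible to COM callers

    def Render(self, xml_text):
        # Placeholder: a real component would parse the XML and build a PDF.
        return "rendered %d characters" % len(xml_text)


def register():
    """Write the registry entries; run once on the server with pywin32 installed."""
    from win32com.server import register as com_register
    com_register.UseCommandLine(PdfRenderer)
```

Once registered, the ASP page would create the object by its ProgID and call Render like any other COM method.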

I spent a day, more or less, on working out how to do two apparently trivial things: intercept the XML data stream and pass it to my Python code (via COM). That done, it was pretty smooth sailing for a bit, using Fredrik Lundh’s elementtree package to parse the XML and ReportLab’s PLATYPUS to render it as PDF. Everything was going swimmingly.

Then suddenly it all went wrong again. First there were ASP errors (probably caused by a typo), but then I discovered that whenever I visited the PDF page, Adobe Acrobat Reader complained that the file was corrupt. Oh no! I changed the script to set its content-type to text/plain, so I could see the raw PDF data. It was strangely short. I changed the script so that it printed the number of characters (representing bytes) returned by my COM object. The total was still correct. Somehow I was feeding 52K of data into Response.Write and only 2K was coming out. My best guess is that PDF files can contain zero bytes, and that Response.Write treats its argument as a 0-terminated string (even though it is a BSTR, which includes a character count). Amateurs. I am going to have to come up with a new way of getting my bytes out of Python and on to the WWW.
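The guess is easy to illustrate, though not to prove. The Python sketch below does not reproduce ASP's code; it only contrasts the two ways of reading a string that the guess depends on. The function names and the payload are invented for the example.

```python
def read_counted(text, length):
    """Read the way a BSTR consumer should: trust the stored character count."""
    return text[:length]


def read_zero_terminated(text):
    """Read the way a C string consumer does: stop at the first U+0000."""
    end = text.find("\x00")
    return text if end == -1 else text[:end]


# Hypothetical stand-in for PDF bytes disguised as characters:
# a 9-character header, one zero, then 100 more characters.
payload = "%PDF-1.3\n" + "\x00" + "x" * 100

print(len(read_counted(payload, len(payload))))   # 110
print(len(read_zero_terminated(payload)))         # 9
```

The reported length is right in both cases, because the count is stored separately; only the zero-terminated reader loses everything after the first zero, which matches the symptom of a correct total and a short page.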

In conclusion, the trick I described above works OK, so long as your byte sequence contains no zero bytes. When your bytes originate as some form of character data that will definitely be the case, so don’t worry. Just do not expect ASP to be any good for binary formats like PDF.

Next time I could try writing the whole server from scratch in Python and hope my bosses don’t notice.