Web Informant #149, 29 March 1999:
Beware of Microsoft's XML

http://www.strom.com/awards/149.html

One of the most interesting innovations in Microsoft's latest beta of its Office 2000 suite is the way it uses XML, or Extensible Markup Language, as its common file format. This has important implications for those of us who create and exchange documents, and it will affect what we use to author web pages in the future.

Not long ago, I used to receive non-Microsoft Office documents from my correspondents. Now I rarely receive that errant WordPerfect or Lotus 1-2-3 file. When I do, I often castigate my correspondents and tell them to send me their Microsoft equivalents. Not because I love Microsoft products, but because that is what the world uses. Remember revisable-form text? Gone. Remember non-PowerPoint presentations? Nearly extinct. Microsoft Office is the default document interchange standard today.

But to truly make documents interchangeable, one burning issue remains: file compatibility. When Office 97 came out, people using Office 95 or earlier versions couldn't read the newer document formats. On the one hand, this incompatibility encourages people to switch to the new version. On the other, it makes upgrading painful for corporations that want to exchange information easily and seamlessly.

To get around the problem this time, Microsoft has chosen a standards-based format, XML, for all its Office applications. (Yes, Internet Explorer 5 also supports XML, but that isn't really the point of my discussion here.) Let's call this support MS-XML, to distinguish it from the standards effort.

To test this format, I saved a few Word and PowerPoint 2000 files in the MS-XML format and then opened them in a text editor. What I found surprised me: a single HTML file that used to be intelligible was transformed into multiple files.

With my Word and PowerPoint documents I have a huge HTML/XML header that includes all sorts of font metric definitions and meta tags and file information. The body text does some neat tricks to preserve font size, font color, justification and other font information. All this makes it easier to exchange Word and PowerPoint 2000 documents with someone else, even people who aren't running Office 2000. If they have IE 5 or another browser that supports MS-XML, they can see the style and layout of my text, which HTML has done poorly since day one.

One other feature in Word 2000 makes life easier for web authors. You can save documents directly to your web site via FTP. Once you enter the URL, username and password, the FTP site appears on your local directory tree as just another location. Nice. Very nice.

The downside is that I must make a pact with the devil. Once I go down the route of saving my pages as MS-XML, the naked code may become unintelligible. The pages also take up more room and so take a bit longer to download and view.
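
For readers who want to put a number on that size cost, here is a minimal Python sketch that reports how many bytes of a saved page sit before the body tag versus inside it. The function name head_body_split and the sample markup are my own illustrative assumptions, not anything Office 2000 actually emits.

```python
import re
from typing import Tuple


def head_body_split(html: str) -> Tuple[int, int]:
    """Return (bytes before the <body> tag, bytes from <body> onward)."""
    match = re.search(r"<body\b", html, flags=re.IGNORECASE)
    if match is None:
        # No body tag found: count the whole document as body text.
        return 0, len(html.encode("utf-8"))
    head = html[:match.start()]
    body = html[match.start():]
    return len(head.encode("utf-8")), len(body.encode("utf-8"))


# Hypothetical sample page, invented for illustration only.
sample = (
    "<html><head><meta name=Generator content=\"Example\">"
    "<style>p { font-size: 12pt; color: navy; }</style></head>"
    "<body><p>Hello</p></body></html>"
)

head_bytes, body_bytes = head_body_split(sample)
print(f"header: {head_bytes} bytes, body: {body_bytes} bytes")
```

Running this on a page before and after saving it from Word 2000 would give a rough sense of how much of the growth is header overhead rather than your own words.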

Now, I am not an XML programmer, or even any kind of programmer. I have purposely kept my web pages sparse and relatively devoid of "advanced" features, in the name of being browser agnostic and universally viewable. I fear that the more people use Word 2000, the more MS-XML will replace ordinary HTML code on the web.

What about PowerPoint 2000? Earlier versions of PowerPoint could publish presentations to the web. While easy to use, this produced rather clunky code and a long series of files. The new, XML-ized version produces a single "pointer file" containing code that enumerates the other files, which live in a separate directory and together make up your PowerPoint slide show.

Why so many files? Each is essentially a style sheet for a different purpose: one for all the XML-capable browsers (guess who?), one that uses CSS, one that uses JavaScript. Again, this makes it easy to publish your work to the web and exchange it with others.
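
If you are curious which files a given pointer file pulls in, a rough Python sketch like the one below will list every href and src it mentions. The class and function names, and the sample markup with its talk_files directory, are hypothetical stand-ins of mine; the real PowerPoint 2000 output will look different.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the value of every href or src attribute, in document order."""

    def __init__(self):
        super().__init__()
        self.files = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names before passing them in.
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.files.append(value)


def referenced_files(html):
    """Return the list of files that an HTML page points to."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.files


# Hypothetical pointer file, invented for illustration only.
pointer = """
<html><head>
<link rel="stylesheet" href="talk_files/master.css">
<script src="talk_files/script.js"></script>
</head>
<frameset><frame src="talk_files/frame.htm"></frameset>
</html>
"""

print(referenced_files(pointer))
```

This only follows what the pointer file names directly; files referenced from inside those files would need a second pass.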

If you buy this, then the idea of suing Microsoft for putting IE into the operating system becomes a minor sideshow. With Office 2000, something bigger is at stake: capturing all the current non-MS Office users, those few hardy holdouts who use Lotus and Corel tools to create their documents, spreadsheets and presentations.

And while they are at it, Microsoft also wants to capture those who use non-MS tools for writing web pages. The underlying effort is to be the single document interchange vendor for everyone, even for folks who don't run Windows on their desktop. And MS-XML will be the Trojan Horse to pull this off.

Microsoft is trying to move people away from ordinary HTML v3 documents and make Office 2000 the standard tool for web authoring. And while earlier efforts (FrontPage most memorably) haven't really caught on, I think this time Office 2000 has a solid chance.

And while I welcome the advances in file compatibility that Office 2000 brings to the party, it has a price in terms of page readability and size that you might not want to pay.

Self promotions dep't

This article is adapted from a piece I wrote for XML.com and is used with their permission. If you want to read up more on XML, there are a number of other resources at this site, along with pointers to the Web Journal issue on XML.

Speaking of adaptations, portions of a chapter from the famously useful book "Internet Messaging" by Marshall Rose and me have been printed in the Internet Protocol Journal. The piece is entitled Secure Email, and you can grab an Acrobat version of it online here.

David Strom
david@strom.com
+1 (516) 944-3407
back issues
entire contents copyright 1999 by David Strom, Inc.
Web Informant ® is a registered trademark with the U.S. Patent and Trademark Office