11 March 2007

OOXML hoax 6: ISO fasttracking requires a perfect format

In the Office document format war there is a lot of criticism of the OOXML format specification, especially now that Ecma has submitted the format for ISO fasttracking.

It seems that many people do not understand the purpose of ISO fasttracking. The fasttracking method is a standardization process meant to guide existing industry standards into ISO standards with little friction. It is not meant for creating a new standard from scratch but for bringing existing technology with a broad base into ISO.

ISO standards are meant to be used, and ISO has a pragmatic policy that it needs to provide standards that meet a market requirement. See also this post by Rick Jelliffe on ISO standards: http://www.oreillynet.com/xml/blog/2007/02/what_is_a_standard_at_iso_1.html

OOXML is a standard originating in the MS Office 2003 XML formats SpreadsheetML, WordprocessingML and PresentationML, formats which first appeared in the August 2000 Office XP beta. This means that the formats are already well established in describing the features of Office documents. Important differences between OOXML and those old formats are the use of the Open Packaging Conventions and of Markup Compatibility and Extensibility, and the storing of embedded (binary) files as separate files in the package, which removes the binary content from the XML files. Also, the old markup languages have been augmented in the OOXML spec with tons of examples and a good structure for implementers, with parent and child elements defined throughout. And finally the VML vector language format is being replaced by the new DrawingML format (although VML is kept in the spec for compatibility with the MS Office 2003 XML formats that used VML).

This is an example of why the format is not perfect. It carries with it the burden of backwards compatibility and of the extensive range of features used in MS Office. But it combines this with a spec in which a lot of possible issues have already been dealt with. It would be very hard to stamp out a completely new format that could be used by an Office suite like MS Office. Certainly a spec like ODF, which is pretty good, would not suffice, as it leaves a lot of things undefined or up to the implementation. ODF is addressing this by improving and extending its specs in newer versions and by trying to create a set of reference documents to further define its implementations. Still, it has proved very hard to produce an implementation that covers the entire ODF spec, even 20 months after it was standardized by OASIS and even without looking at formulas: http://testsuite.opendocumentfellowship.org/summary.html . This shows that implementing a complex Office spec, even when you already have a full Office suite as a basis, is a process that can take years.

This is basically what Microsoft has done with its Office format. It has taken more than 6 years to move its Office suite to a full XML implementation that is also backwards compatible with its billions of legacy documents. Then it opened up this format for everyone to use by standardizing it through Ecma. The format does carry some scars from that development, but it is also a spec that is in full use, with many of its key markup elements, like the formulas, having proven themselves over a longer period of time. In ISO fasttracking that is important, as it shows that the format has pedigree in the real world and that it has use in the market of today.

OOXML is a format that has a foundation in the market and has had most of its features proven over the last 6 years. It is not perfect, but then again neither is its competitor, which is still being worked on. For the ISO national bodies that market foundation and proven track record are important aspects in approving this format. The Ecma standard certainly does have several newer, less proven elements, which have to be considered in this process, but as of now those new elements seem to be the least criticised ones, which also shows that the development of the format is improving toward the future.

The ISO fasttracking process does not need perfection, but it relies on a standard that is based on existing technology that will be massively used in the future. This is exactly what OOXML will prove to be, and I think that is why ISO will approve this standard as an ISO standard despite the protests from the open source community and MS competitors.

The Wraith


Ben Langhinrichs said...

As someone who is neither a member of the "open source" movement nor a competitor to Microsoft (I am head of an ISV that is both an IBM and Microsoft partner), I respectfully disagree. The XML formats in Microsoft Office have not been widely used, and there were many decisions made in the formation of OOXML which were not wise, in my opinion. There was an overly developed sense of reliance on what would be easy to implement in MS Office and not on what would be good in a standard, even if it would only have required a small amount of tweaking to accomplish the latter goal. The bitmask issue is a good example. It would have required very little to implement such settings as flags, but although this was suggested to Microsoft before the finalization of the standard, they chose to ignore that advice. Some people say this was a deliberate attempt to build a format that was difficult for others to implement. I am less sure that was true, and think it was more a careless effort to get the standard out the door with an assumption that nobody would object. I think Microsoft made a series of mistakes which will prove costly, although hardly fatal.

To be clear, I just this last week joined the OASIS ODF T.C., and am looking into the possibility of joining the ECMA organization in a similar role, although it is considerably more expensive.

The Wraith said...

I agree that the bitmasks are quite an ugly feature in OOXML, although there are only a few of them and they do not really limit implementation or interoperability, as they seem to be defined correctly.

But as far as we have seen of the issues brought up by the ISO national bodies, this one was typically not among those listed in the reply by Ecma. So either the bitmasks are not really that important (which seems likely) or Ecma has not addressed them in its answers yet and could still do that later.

I think that Ecma could, for instance, give some promise or commitment that it will change certain things in future versions if ISO would prefer that an alternative method be used.

We have noticed that OpenDocument, with its flaws, also made it through standardization fairly easily while still having issues attached to it, and with a much less proven format.

gopi said...

"It seems like many people do not understand the purpose of ISO fasttracking standardization. The fasttracking method is a standardization process to easily guide existing industry standards into an ISO standard."

I don't think that OOXML counts as an existing industry standard. There's only one real implementation of the standard, Microsoft's. When I hear the phrase, "existing industry standard," I think of things that many different people have already implemented, which are in widespread use.

If something is in widespread use by multiple apps, it's reasonable to adopt it as a formal standard, even if it's got things that you might've preferred to do differently.

Widespread adoption means that it can't be _that_ bad, and also means that it's impractical to make significant changes. It also means that whatever documentation is already around is sufficient for interoperability.

In the case of OOXML, there's only one implementation. Nobody's taken the documentation and tried to implement the standard. Only one vendor's apps use it.

My understanding is that the justification for adopting an existing standard is:

1. The level of current use is significant enough to demonstrate that it is a workable standard. A bad candidate for standardization may have problems such as being tightly coupled with implementation details, or have serious efficiency problems. If OOXML passes this test, then _any_ file format you write documentation for should be standardizable.

Nobody really knows how easy it is to implement OOXML interoperability. The components I've read through sound unreasonably difficult to implement - speaking as a computer scientist.

2. It's not practical to make changes to the standard because it's already been too widely implemented. Given that OOXML is a single-vendor format right now, it really would not be challenging for Microsoft to modify the standard to achieve more consistency and ease of implementation.

It seems to me that the size and complexity of the standard are such that it really should be more fully analyzed. It's a single vendor's standard without industry support.

The Wraith said...

Firstly OOXML is based on XML formats that were already in use for years. Also at the time of standardization a full implementation of the standard was available in the leading Office product.

Secondly, ODF, which was standardized 20 months ago, still hasn't gotten a single full implementation, not even in OOo, which had already started with the format before its standardisation, let alone two fully interoperable ones.

Thirdly, I bet you a nice crate of beer that 20 months after standardization OOXML will have produced more documents and have more programs implementing it than ODF has after 20 months. Microsoft is an industry leader in Office products. Many people may not like that, but it does not change the fact that OOXML will also be a leading industry format in a very short time.

BobFolkerts said...

How do you recommend that bitmasks found in OOXML be implemented with standard XML tools? Writing XSLT to perform basic bitmask operations is possible if you make assumptions about the endianness of the computer architecture. My opinion, which I hope is widely shared, is that XML was introduced to allow documents to be exchanged between any computers that are able to exchange text documents. If my XSLT has to worry about the big vs. little endianness of the computer that runs the XSLT, then it seems that we have a perversion of XML. Therefore, bitmasks should be prohibited. Should I not expect to be able to transform between OOXML and ODF using XSLT?

The Wraith said...

Actually I would not recommend using bitmasks. With any luck they will not make it to the next version or will be deprecated. However, they are not a big issue but rather a minor imperfection, and you can actually verify them through XSLT.
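
As a rough illustration of that last point, here is a minimal sketch in Python rather than XSLT. It assumes the bitmask is serialized as a hexadecimal string in an attribute; the flag names and bit positions are hypothetical and are not taken from the OOXML spec. The bit test uses only floor division and modulo, the kind of arithmetic XSLT 1.0 also offers (div, mod, floor), and it works on the number parsed from the text.

```python
# Hypothetical flag names and bit positions, for illustration only;
# they are not taken from the OOXML specification.
FLAGS = {"first_row": 0, "last_row": 1, "first_column": 2}


def bit_is_set(value: int, bit: int) -> bool:
    """Test one bit using only floor division and modulo.

    No bitwise operators are used, mirroring the arithmetic that
    XSLT 1.0 provides (div, mod, floor).
    """
    return (value // (2 ** bit)) % 2 == 1


def decode_flags(hex_text: str) -> dict:
    """Decode a hex-encoded bitmask string into named boolean flags."""
    value = int(hex_text, 16)  # parse the text into a plain integer
    return {name: bit_is_set(value, bit) for name, bit in FLAGS.items()}
```

Calling `decode_flags` on such an attribute value yields one boolean per named flag, which is roughly what a stylesheet would have to emit to turn a bitmask into separate flag attributes.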

As you might have noticed, ODF has already revised its first version and a much bigger revision is still underway. Clearly, at the time it was standardised the ODF spec wasn't perfect. In fact it lacked some fundamental stuff. However, that was not a reason for dismissing it. Standards can grow and improve.

What I hope is that OOXML will also improve, and I would certainly approve if, during the ISO standardization, there were a commitment from Ecma and Microsoft to improve the standard in certain areas.