16 March 2007

OOXML hoax 7: The binary blob

When OOXML is discussed, it is regularly described as an XML format that can contain binary blobs. In the past, the Office 2003 XML formats did indeed contain embedded binary data within the file. This was of course due to the fact that the format consisted of a single XML file. A big change in OOXML is that embedded binary formats are now put in the package as separate files.
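To make the difference concrete, here is a minimal sketch of the idea, assuming nothing more than a plain ZIP container. The part names word/document.xml and word/media/image1.png and the sample contents are illustrative assumptions, and the result is not a valid OOXML package (a real one also needs content type and relationship parts). It only shows binary data living next to the XML as its own file instead of inside it.

```python
import io
import zipfile


def build_package(xml_part: str, binary_part: bytes) -> bytes:
    """Write one XML part and one binary part into an in-memory ZIP."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        # Hypothetical part names, chosen for illustration only.
        package.writestr("word/document.xml", xml_part)
        package.writestr("word/media/image1.png", binary_part)
    return buffer.getvalue()


def list_parts(package_bytes: bytes) -> list:
    """Return the names of all parts stored in the package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return package.namelist()


def read_part(package_bytes: bytes, name: str) -> bytes:
    """Read a single part back out of the package, byte for byte."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return package.read(name)


# Made-up sample data: a tiny XML document and a few raw bytes.
sample_xml = "<document>text only, no binary inside</document>"
sample_binary = b"\x00\x01\x02\x03"
package_bytes = build_package(sample_xml, sample_binary)
```

The XML part stays plain text, and the binary part can be read back unchanged with read_part without touching the XML at all.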

So is binary data within XML a thing of the past?
No, not really. You can still find it here:
http://docs.oasis-open.org/office/v1.1/OS/OpenDocument-v1.1.pdf
Yes, right in the middle of the OpenDocument specification.
It is called the office:binary-data element.
It holds base64-encoded binary data directly within the XML.
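As a rough illustration of what such an element looks like in practice, the sketch below wraps a few bytes in an office:binary-data element inside a draw:image element and then decodes them again. The helper names and the sample payload are assumptions made for this example, and the fragment is not a complete ODF document.

```python
import base64
import xml.etree.ElementTree as ET

# Namespace URIs used by ODF for the office: and draw: prefixes.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"


def embed_binary(payload: bytes) -> str:
    """Return an XML fragment carrying the payload as base64 text."""
    ET.register_namespace("office", OFFICE_NS)
    ET.register_namespace("draw", DRAW_NS)
    image = ET.Element(f"{{{DRAW_NS}}}image")
    data = ET.SubElement(image, f"{{{OFFICE_NS}}}binary-data")
    data.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(image, encoding="unicode")


def extract_binary(xml_text: str) -> bytes:
    """Decode the base64 text of the binary-data element back to bytes."""
    image = ET.fromstring(xml_text)
    data = image.find(f"{{{OFFICE_NS}}}binary-data")
    return base64.b64decode(data.text)


# Made-up payload: the first eight bytes of a PNG file.
sample = b"\x89PNG\r\n\x1a\n"
xml_text = embed_binary(sample)
restored = extract_binary(xml_text)
```

The binary content travels as ordinary text inside the XML, at the cost of the usual base64 size increase of about one third.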

So why does no one ever mention that ODF can contain binary data within the XML?

11 March 2007

OOXML hoax 6: ISO fasttracking requires a perfect format

In the Office document format war there is a lot of criticism of the OOXML format specifications, especially now that Ecma has submitted the format for ISO fasttracking.

It seems that many people do not understand the purpose of ISO fasttracking. The fasttracking method is a standardization process meant to guide existing industry standards into ISO standards with little friction. It is not meant for creating a new standard from scratch, but for getting existing technology with a broad basis into ISO.

ISO standards are meant to be used, and ISO has a pragmatic policy that it needs to provide standards for which there is a market requirement. See also this post by Rick Jelliffe on ISO standards: http://www.oreillynet.com/xml/blog/2007/02/what_is_a_standard_at_iso_1.html
OOXML is a standard originating in the MS Office 2003 XML formats SpreadsheetML, WordprocessingML and PresentationML, formats which first showed up in the August 2000 Office XP beta. This means that the formats are already well established in describing the features of Office documents. Important differences between those old formats and OOXML are the use of the Open Packaging Conventions and of Markup Compatibility and Extensibility, and the removal of embedded (binary) content from the XML files in favour of separate files in the package. The old markup languages have also been augmented in the OOXML specs with tons of examples and a structure that helps implementers, with parent and child elements defined everywhere. And finally, the VML vector language format is being replaced by the new DrawingML format (although VML is kept in the spec for compatibility with the MS Office 2003 XML formats that used VML).

This history is an example of why the format is not perfect. It carries the burden of backwards compatibility and of the extensive set of features used in MS Office. But it combines this with a spec in which a lot of possible issues have already been dealt with. It would be very hard to stamp out a completely new format that could be used by an Office suite like MS Office. Certainly a spec like ODF, which is pretty good, would not suffice, as it leaves a lot of things undefined or up to the implementation. ODF is addressing this by improving its specs and extending them in newer versions, and also by trying to create a set of reference documents to further define its implementations. Still, it has proved very hard to produce an implementation that covers the entire ODF spec, even 20 months after it was standardized by OASIS and even without looking at formulas: http://testsuite.opendocumentfellowship.org/summary.html . This shows that implementing a complex Office spec, even when you already have a full Office suite as a basis, is a process that can take years.

This is basically what Microsoft has done with its Office format. It has taken more than 6 years to move its Office suite towards a full XML implementation that is also backwards compatible with its billions of legacy documents. Then it opened up this format for everyone to use by standardizing it through Ecma. The format does carry some scars from that development, but it is also a spec that is in full use, with many of its key markup elements, like the formulas, having proven themselves over a longer period of time. In ISO fasttracking that is important, as it shows that the format has pedigree in the real world and that it has use in the market of today.

OOXML is a format that has a foundation in the market and has had most of its features proven over the last 6 years. It is not perfect, but then again neither is its competitor, which is still being worked on. For the ISO national bodies, that market foundation and proven track record are important aspects in approving this format. The Ecma standard certainly does have several newer, less proven elements, which have to be considered in this process. But as of now those new elements seem to be the least criticised ones, which also shows that the development of the format is improving toward the future.

The ISO fasttracking process does not need perfection, but it relies on a standard that is based on existing technology that will be massively used in the future. This is exactly what OOXML will prove to be, and I think that is why ISO will approve it as an ISO standard despite the protests from the open source community and MS competitors.

The Wraith

05 March 2007

Banned from Groklaw

A first for me in 20 years of Internet: I got banned from a public discussion.
I was banned from Groklaw because of excessive commenting and using unacceptable language. I do admit to having posted quite a few comments on Groklaw, probably more than 50 or so in the last month. How should I know what would be excessive? I consider the quality of Groklaw's information on the OOXML objections to be very poor and one-sided, and Groklaw's initiative to start a mailing campaign towards ISO to be very pathetic. That might have shown in a lot of critical commenting on Groklaw, where people often refer to the objections pages as if they were a bible. A few quick comments are easily made then.

I do have serious objections, though, to being accused of unacceptable language. The worst I remember is probably stating once or twice that something is complete nonsense or bull or something in that order. Nothing that would cause anyone a headache. I have posted on probably about 30 blogs or sites about OOXML in the last half year, and most bloggers seem very open to another opinion about OOXML. The 'not so open' blog that is Groklaw, however, isn't too pleased with pro-OOXML comments. I wish they themselves were a bit more open about that.

And if anyone from Groklaw reads this: yes, I use multiple IP addresses, since my home, my girlfriend, my parents, my friend, my work and the occasional open wireless network all have different addresses. I mostly post anonymously, as registering everywhere is a lot of effort nowadays, and Groklaw also removed quite a few posts that were clearly signed with The Wraith and a link to this blog. But PJ, it is fine if you and your minions want to censor the answers on Groklaw to be just pro-Groklaw. I'll find other platforms that do appreciate reasonable two-way discussion in the OOXML debate.

The Wraith

02 March 2007

Ecma responded to OOXML contradictions

Ecma has responded to the issues raised by the national bodies during the 30-day contradictions period in the fasttracking process.

The Ecma response in PDF form can be found here:
http://www.computerworld.com/pdfs/Ecma.pdf
It also contains the issues raised by the national bodies.

What is astounding is that the comments of the national bodies sometimes seem to be copied directly from the Grokdoc objections pages. The suggestions on those Grokdoc pages are mostly dubious at best, and comments on the discussion pages by, for instance, XML expert Rick Jelliffe are ignored.
http://www.grokdoc.net/index.php/Talk:EOOXML_objections#Bitmasks_cause_significant_validation_problems