02 December 2007

OOXML: Google search engine supports IBM FUD campaign?

IBM's Rob Weir recently wrote a blog post on the use of Office Open XML files in the real world. Strangely enough, he uses the Google search engine, searching the web, to prove that OOXML is not really used in the world. Is this because he understands that ODF is really more a semi-W3C web format than a serious Office format? A more respectable way to look at Office documents would be to ask companies that use Office software, but apparently IBM thinks differently. But it gets even Weirder.

In his blog post Rob uses the Google search engine to determine that there are fewer than 2,000 Office Open XML files available on the internet, compared to 160,000 ODF files. That is interesting because it is totally ridiculous. The Google numbers do not add up: they show no increase in OOXML files at all, and just looking at them made me very suspicious. I tried a similar search for docx, pptx and xlsx files using the Live Search engine:
http://search.live.com/results.aspx?q=contains%3Adocx&mkt=us-us (69,000)
http://search.live.com/results.aspx?q=contains%3Apptx&mkt=us-us (41,000)
http://search.live.com/results.aspx?q=contains%3Axlsx&mkt=us-us (14,000)

An example of this Google blindness is this link: http://blogpictures.members.winisp.net/saas.pptx which at this time can be found through Live Search and through Yahoo search, but not through Google search.

So where Rob, using Google, cannot find more than 2,000 actual Office Open XML files, I can easily find 124,000 pages that contain one or more Office Open XML files.
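
For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The counts are the Live Search figures quoted above; the variable names are made up for illustration, and a plain sum assumes that no page shows up under more than one of the three searches.

```python
# Live Search result counts quoted in the post (pages, not files).
# The dictionary and variable names are illustrative only.
live_search_counts = {
    "docx": 69_000,
    "pptx": 41_000,
    "xlsx": 14_000,
}

# Plain sum; assumes the three result sets do not overlap.
total_pages = sum(live_search_counts.values())
print(f"{total_pages:,}")
```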

Does it become easy to manipulate the figures when Google is on your side?

7 comments:

Chris said...

The confusion seems to come in because MSN Live Search looks for a description meta tag which contains docx (syntax contains:docx), whereas Google has a special filetype count facility that counts the number of docx files that can be pulled off websites.

So in the former case you pull up mentions of docx, and these are increasing, but you can't always download a document; in the latter, you get access to the document itself. I count 935 and it is declining.

So whatever you do, don't send a resume in .docx format to the average company, because they'll have difficulty opening it.

The Wraith said...

Chris, you are wrong.
Live Search counts pages that contain direct links to the Office Open XML files. I verified a lot of links and they all pointed to actual OOXML files.
I also checked quite a number of the links that were found. When they are put directly into the search, it shows that Google does not recognize them but that both Live Search and Yahoo search find them. So evidently Google is not indexing them whilst the other search engines are.
This is very odd, because Google indexes arbitrary file types better than this, and having only one percent of the Office Open XML files indexed would maybe suggest a deliberate filtering of Office Open XML files out of the Google results.

Joe Strange said...

Interesting. So you count links to OOXML documents instead of the documents themselves?

How do you reconcile the two numbers?

Also, why is this? Google has no motive to ignore OOXML files that I can imagine. For the file you found that was not included, did you check if robots.txt blocked it (yes, you can choose to block some spiders but not others)? It's hard to imagine anything but a new site that's not in Google's index these days, but who knows.

The Wraith said...

@joe strange
I do not just count links to OOXML files. I verified a lot of those links.
Google does not recognize the links but Live Search does.
These files can be opened and downloaded from the internet, but Google does not show them.

Evidently you did not try any of the links. If you had, you could easily have verified that they are actual files on the internet and easily downloadable. But Google does not index them...

Vexorian said...

The mere fact you require MS' covenant not to sue makes OOXML a total fraud as an open standard. So, shut up shill.

And regarding your conspiracy theory, ever thought that maybe live search is the biased engine?

The Wraith said...

@vexorian
You are aware that the OpenDocument format requires covenants not to sue from Sun and a promise from IBM to release their patent claims on ODF technology?

By your suggestion that a covenant makes a format a total fraud, I must conclude you think the ODF format is a total fraud too.

Maybe you are unaware, but standardization organizations like OASIS and Ecma International actually require members that participate in standards development to release their patent claims on the standard. So it is actually normal for organizations that work on a standard to give out a license for patents unless they do not own patents (of course the main ODF contributors have a lot more patents than Microsoft has).

Vexorian said...

Sorry for the late response; it is just that I don't care about this blog at all. I just found your reply by coincidence.

It is fun when MS people or drones claim that the patent issues with OOXML also apply to ODF. Anyway, Bruce Perens has recently made a statement regarding this that's a lot more insightful than what I would be able to write in a rush:

If I remember correctly his statement is somewhere around the comments at:

http://slashdot.org/article.pl?sid=08/03/13/1559204