Tuesday, June 30, 2009

How I think Wikipedia works

I have a mental model of how Wikipedia works and behaves. This may not reflect reality, but it is how I, as an end-user, expect Wikipedia to behave. I think these are reasonable expectations regarding things like standards of proof and balance and that if the real Wikipedia differs substantially from these expectations, then we have a problem.

Please let me know if my mental model differs from reality.

First, I assume that we deal with facts, not opinions. So an editor cannot state a personal opinion such as, "Citizen Kane is the greatest movie ever made", since there is no objective, recognized scale for cinematic greatness.

However, saying, "Citizen Kane topped the list of 'Greatest Films' according to a 2002 poll of directors and film critics by Sight & Sound magazine" would be fine. It is a factual statement, albeit a statement about an opinion, but the factual portion of it is verifiable. It is a fact about an opinion and that is OK.

But if I made the statement, "Citizen Kane is the greatest movie ever made" and cited the Sight & Sound article, this would not be proper, since that article does not establish the fact of the greatest movie, but only the fact of a poll that collected opinions on the greatest movie. A fact about the existence of an opinion (or even a polled opinion) does not assert the truth of the opinion.

Similarly, a statement, "Gone with the Wind has been criticized for its long running time" would not be properly cited by merely referencing a source that states its length as 238 minutes. That citation would merely be evidence of its length, not that its length was inordinate. You need a citation for the length being criticized.

Similarly, if another recognized expert stated, "Gone with the Wind was too short and failed to cover the entire Mitchell novel", then I would expect both opinions to be mentioned, rather than just one chosen arbitrarily.

I also expect that cited sources have recognized (not merely self-declared) expertise in the area. So, I would find it idiosyncratic if an article on cinema said, "Citizen Kane is the greatest film ever made, according to a fan blog post by Joe Blow, an ophthalmologist in Podunk, Michigan", since he would be a source cited outside any area of recognized expertise.

I also, as a user, expect Wikipedia to give a balanced view of issues. This does not mean equal time to all fringe opinions. Although I expect there to be multiple views presented on the propriety of the Iraq War, I would not expect someone who believes that Abraham Lincoln was an alien from the planet Quthbral to have a section in the Lincoln article, even if he could cite a blog post, a photocopied article, or a self-published book on the subject. Ditto for Flat Earth Society members, Holocaust deniers and those who think the Apollo moon landing was filmed on a Hollywood sound stage.

On the other hand, I don't expect that every fact requires a citation. For example, as a user, I don't expect to see citations whenever someone says "Mercury is the closest planet to the Sun". Similarly, I would find it odd if someone removed that assertion for lack of a citation.

However, I would be suspicious if someone writes something in the form, "Mercury is the hottest planet because it is closest to the Sun". Although it is well known that Mercury is the closest planet, it does not follow that it is the hottest. In fact, Venus is the hottest planet. It is a subtle form of editorializing, where an editor can inadvertently introduce personal assumptions into an article. I'm assuming Wikipedia editors are on the watch for this kind of thing.

On the other hand, some things clearly logically follow from known facts. If we know that John Brown was buried on January 23rd, 1582, then we should, absent contrary evidence, safely be able to state that his date of death was on or before January 23rd, 1582. I would not expect someone to revert such a statement as being unfounded, speculation, original research, etc. It logically follows based on our knowledge of how the world works.


Does anyone know whether the above statements have any basis in the aspirations or actual practice of Wikipedia editors and admins? Sadly, my recent reading of some articles suggests that these reasonable expectations are routinely flouted and bear little resemblance to reality.

Friday, June 26, 2009

ODF Plugfest



Although the term may be alien to some, "plugfests" have been around for about 20 years. A plugfest is when implementors of the same interface get together and test the interoperability of their products. In the beginning this was done with wired standards, USB, etc. (thus 'plug'). Over the years the term was applied to networking at all higher levels of the protocol stack. The concept is also applicable to document exchange formats like ODF.

We had an ODF Plugfest last week in the Hague. Although we've had interoperability workshops and camps before that attracted a handful of vendors, this was the first one that had nearly universal participation from ODF vendors. I'm not going to recap the details of the plugfest. Others have done that already. But I will share with you some of my conclusions, based on long discussions with other participants, from whose insights I have greatly benefited.

In an ideal world, specifications would be perfect and software applications would be bug-free and users would read the manuals and we would achieve perfect interoperability instantly by anointment of the salubrious unction of standardization. But to the extent this planet's population obdurately persists in imperfection, we are resigned to make additional efforts in pursuit of interoperability. We are not alone in this regard. The only standards that don't need to work on interoperability are those standards that no one implements.

We should use every licit technique at our disposal to give the user the best experience with ODF we can. In a competitive market you can not get away with telling your customer, "Sorry, your spreadsheet doesn't work because page 652, clause 23 says 'should' rather than 'shall'". If you did that you would not have that customer for long. (Unless, of course, you have a monopoly, in which case many seemingly irrational, anti-consumer actions can occur, seemingly without consequences.)

Further, I assert:
  1. Users want real-world interoperability, and not excuses
  2. Real-world interoperability is what users see and achieve in practice
  3. Where vendors have the will to interoperate, achieving interoperability is a known technical problem, with known engineering solutions, but where the will to interoperate is lacking, there are no technical means of compelling interoperability
  4. Interoperability lies at the intersection of technology, engineering standards, competition law, intellectual property and economics. There are no silver bullets, although there is an arsenal of proven techniques that can help to improve interoperability
  5. Achieving interoperability is facilitated by a variety of cooperative activities, including standardization, test case creation, implementation testing, online validators, plugfests, defect collection and reporting
Going forward there is a promising constellation of efforts converging around ODF interoperability:

So, we're moving in the right direction. The key thing will be to sustain the momentum from the Plugfest and transition it into an ongoing effort, a Perpetual and Virtual Plugfest where the effort and the progress are continuous.

[6/29/09: I've received some emails on the photo, so here are the details:

The picture was taken at 3:30PM on the 2nd day of the workshop.

The lens was a Pentax DA 10-17mm "fisheye" zoom at 10mm. So that explains the projection distortion. The graininess and B&W was from post-processing using Nik Software's Silver Efex Pro and Sharpener Pro.]

Tuesday, June 23, 2009

ODF TC timeline

I used a variation of this chart at the recent ODF Plugfest in the Netherlands. But the aspect ratio of a presentation slide doesn't suit this type of chart well, so here is a fuller version of what I showed there.

Those who are not familiar with standards development are sometimes amazed at how long it takes to develop a good standard. Perhaps the single-vendor, 6,000 page, 12-month escapade of OOXML in Ecma has skewed expectations. Fortunately, OOXML is the exception, not the rule. Achieving a multi-vendor consensus around a substantial technical standard will always be time-consuming, but it is time that is well spent.

OASIS standards go through several stages of development:

  1. Working Draft (WD)
  2. Committee Draft (CD)
  3. Public Review Draft
  4. Committee Specification
  5. OASIS Standard
Progressing from one step to another is by ballot. The first 4 stages are advanced by vote of the Technical Committee (TC), while the last stage (OASIS Standard) is by a ballot of all OASIS members. As a draft advances through stages 1-4, an increasing degree of consensus is required. So, a CD requires only simple majority, whereas a Committee Specification requires 2/3 approval, with no more than 1/4 disapproval. Some of these stages allow iteration. So we can, and typically do, have several WD's and several CD's.
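As a rough illustration of the two thresholds just mentioned, here is a minimal Python sketch. The function names are mine, and I am assuming that the fractions are measured against the number of eligible voting members; the OASIS TC Process is the authority on the exact base and on how abstentions are counted, so treat this as arithmetic only.

```python
def committee_draft_passes(yes: int, eligible: int) -> bool:
    """Simple majority: more than half of the eligible voters say yes.

    Assumption: the majority is taken over eligible voting members.
    """
    return 2 * yes > eligible


def committee_spec_passes(yes: int, no: int, eligible: int) -> bool:
    """At least 2/3 approval, with no more than 1/4 disapproval.

    Integer arithmetic avoids floating-point edge cases at the boundary.
    Assumption: both fractions are taken over eligible voting members.
    """
    enough_approval = 3 * yes >= 2 * eligible
    limited_disapproval = 4 * no <= eligible
    return enough_approval and limited_disapproval


if __name__ == "__main__":
    # Hypothetical TC with 12 eligible voters.
    print(committee_draft_passes(yes=7, eligible=12))
    print(committee_spec_passes(yes=8, no=2, eligible=12))
    print(committee_spec_passes(yes=8, no=4, eligible=12))
```

The point of the second check is that a strong "yes" vote is not enough on its own: a sizable block of "no" votes can still stop a Committee Specification.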

If you want the nitty-gritty details, here is a flow chart of the OASIS standards approval process.

I occasionally get a question along the lines of: "What has the ODF TC been doing for the past couple of years?" The following timeline should give you an idea. I've indicated the time spent developing ODF 1.0 and ODF 1.1, along with some other milestone activities, such as the PAS transposition of ISO/IEC 26300, the publication of ODF 1.0 Approved Errata 01 and the creation of the various ODF subcommittees. I've also indicated the dates of each of the ODF 1.2 WD's and CD's.

As you can see, we've been quite busy. After iterating on WD's during 2007 and 2008, we've now moved on to CD's. All of the planned feature work for ODF 1.2 is now completed. The remaining work is to address the various editorial and technical comments that have been submitted to our comment list, as well as comments from TC members and JTC1/SC34. The goal is to have no known defects in ODF 1.2 before we send it out for a Public Review. Of course, previously-unknown defects will likely be identified during the Public Review, and we have a process for handling these. I'll comment more on that process, and Public Reviews in general, when we get closer to that stage.

Tuesday, June 09, 2009

ODF Lies and Whispers

There is an interesting disinformation campaign being waged against ODF. You won't see this FUD splattered across the front pages of blogs or press releases. It is the kind of stuff that is spread by email and whispers, and you or I rarely will see it in the light of day. But occasionally some of it does cross my desk, and I'd like to share with you some recent examples.

First up is this instance, from a small Baltic republic, where a rather large US-based software company was recently arguing to the national standards committee for the adoption of OOXML instead of ODF. Here are some of the points made by this large company in a letter:

There is no software that currently implements ODF as approved by the ISO

(They then link to Alex Brown's comment from Wikipedia). I think this demonstrates the triangle-trade relationship among Microsoft, Alex Brown (and other bloggers) and Wikipedia, by which Microsoft FUD is laundered via intermediaries to Wikipedia for later reference as newly minted "facts". No wonder one of Microsoft's first actions during their OOXML push was to seize control of the Wikipedia articles on ODF and OOXML via paid consultants. In any case, Alex's claims were rebutted long ago.

ODF has a number (more than a hundred) of technical flaws which haven't been addressed for 3 years despite change requests addressed to OASIS by countries such as Japan and United Kingdom. There are discussions between OASIS and ISO/IEC JTC 1 SC 34 regarding true ownership of ISO ODF, which is a reason why the flaws in ISO ODF aren't being addressed. In a recent SC 34 meeting in Prague a new ISO ODF maintenance committee has been formed because ISO / IEC 26300: 2006 is not being presently maintained.

This is not true. First, the ODF TC has received zero defect reports from any ISO/IEC national body other than Japan. Second, we responded to the Japanese defect report last November. Amazingly, Alex Brown is implicated in this one as well. It was false then and it is false now. At the exact time Alex was quoted in the press as saying the ODF TC was not acting on defect reports (October 8th, 2008), we had in fact already sent our response to the defect report out to public review (August 7th, 2008) and then completed that review (August 22nd), after quite a bit of active technical discussion with the submitter of the original defect report (Murata Makoto). How Alex translated that into "Their defect reports are being shelved" and "Oasis has not been acting on reports of defects" is beyond me. It must be particularly embarrassing that Murata-san wrote to the OASIS list, within days of Alex's FUD, "I am happy with the way that the errata has been prepared." How could Alex be ignorant of these facts? Why was he lying to the press? How is this conformant with his leadership role in JTC1/SC34 and his participation in BSI? Also observe the triangle-trade route of FUD in this case from Alex to Doug Mahugh to Wikipedia, this time for negative edits in the OASIS article.

IBM currently recommends not using OASIS ODF 1.1 and to instead use OASIS ODF 1.2 which is currently not complete and will not be complete and ISO certified before 2010/2011. OASIS on the other hand have started work on ODF 2.0 which will not be backwards compatible.

This is an odd one, demonstrably false. IBM Lotus Symphony supports ODF 1.1. We have no ODF 1.2 support at present. I wonder where they came up with this one? It is totally bizarre. Although we have started to gather requirements for "ODF-Next", the contents of that version, and to what degree it will be backwards compatible, have not even been discussed by the TC, let alone determined. So this is pure FUD, trying to make ODF sound risky to adopt, and then lying about IBM's support for it, and our position on ODF 1.2.

The list goes on, including claims that no one supports ODF 1.0 or ODF 1.1, etc., but you get the gist of it. The particulars are interesting, of course, but more so the reckless disregard for the truth, and the triangle-trade relationship between notable bloggers, Wikipedia, and Microsoft's whisper campaign.

Another current example is part of Microsoft's attempt to duck and cover from criticism over their interoperability-busting ODF support in Office 2007 SP2. I've heard variations on the following from three different people in three different countries, including from government officials. So it is getting around. It goes something like this:

We (Microsoft) wanted to be more interoperable with ODF. In fact we submitted 15 proposals to the ODF TC to improve interoperability, but IBM and Sun voted them down.

Nice story, but not true. Certainly Microsoft submitted 15 proposals. But they were never voted on by the TC, because Microsoft chose not to advance them for a vote. They opted not to have these proposals considered for ODF 1.2. It was their choice alone and their decision alone not to put these items up for a vote. I would have been fine with whatever decision Microsoft wanted to make in this situation. I'm not criticizing their decision. I'm just saying we need to be clear that the outcome was entirely due to their decision, and not to blame IBM or Sun for Microsoft's choice in this matter.

I think I can trace this FUD back to a May 13th blog post from Doug Mahugh where he wrote:

We then continued submitting proposed solutions to specific interoperability issues, and by the time proposals for ODF 1.2 were cut off in December, we had submitted 15 proposals for consideration. The TC voted on what to include in version 1.2, and none of the proposals we had submitted made it into ODF 1.2.

This certainly is an interesting statement. There is nothing I can point to that is false here. Everything here is 100% accurate. However, it seems to be reckless in how it neglects the most relevant facts, namely that the proposals did not make it into ODF 1.2 at Microsoft's sole election. It is as if Lee Harvey Oswald had written a note: "Went to Dallas and saw a parade today. Tried to see a movie, but had to leave early. Heard later on the radio that the President was shot". This would have been 100% accurate as well, but not the "whole truth". In any case, the rundown of the facts in this question is on the TC's mailing list.

So what is one to do? You obviously can't trust Wikipedia whatsoever in this area. This is unfortunate, since I am a big fan of Wikipedia. I want it to succeed. But since the day when Microsoft decided they needed to pay people to "improve" the ODF and OOXML articles, these articles have been a cesspool of FUD, spin and outright lies, seemingly manufactured for Microsoft's re-use in their whisper campaign. My advice would be to seek out official information on the standards, from the relevant organizations, like OASIS, the chairs of the relevant committees, etc. Ask the questions in public places and seek a public, on-the-record response. More people are willing to lie than to face the consequences of being caught lying. That is the ultimate weakness of lies. They cannot stand the light of public exposure. Sunlight is the best antiseptic.

Sunday, May 17, 2009

The Battle for ODF Interoperability

Last year, when I was socializing the idea of creating the OASIS ODF Interoperability and Conformance TC, I gave a presentation I called "ODF Interoperability: The Price of Success". The observation was that standards that fail never need to deal with interoperability. The creation of test suites, convening of multi-vendor interoperability workshops and plugfests is a sign of a successful standard, one which is implemented by many vendors, one which is adopted by many users, one which has vendor-neutral venues for testing implementations and iteratively refining the standard itself.

Failed standards don't need to work on interoperability because failed standards are not implemented. Look around you. Where are the OOXML test suites? Where are the OOXML plugfests? Indeed, where are the OOXML implementations and adoptions? Microsoft Office has not implemented ISO/IEC 29500 "Office Open XML", and neither has anyone else. In one of the great ironies, Microsoft's escapades in ISO have left them clutching a handful of dust, while they scramble now to implement ODF correctly. This is reminiscent of their expensive and failed gamble on HD DVD on the Xbox, followed eventually by a quick adoption of Blu-ray once it was clear which direction the market was going. That's the way standards wars typically end in markets with strong network effects. They tend to end very quickly, with a single standard winning. Of course, the user wins in that situation as well. This isn't Highlander. This is economic reality. This is how the world works.

Although this may appear messy to an outside observer, our current conversation on ODF interoperability is a good thing, and further proof, to use the words of Microsoft's National Technology Director, Stuart McKee, that "ODF has clearly won".

Fixing interoperability defects is the price of success, and we're paying that price now. The rewards will be well worth the cost.

We've come very far in only a few years. First we had to fight for even the idea and acceptance of open standards, in a world dominated by a RAND view of exclusionary standards created in smoke-filled rooms, where vendors bargained about how many patents they could load up a standard with. We won that battle. Then we had to fight for ODF, a particular open standard, against a monopolist clinging to its vendor lock-in and control over the world's documents. We won that battle. But our work doesn't end here. We need to continue the fight, to ensure that users of document editors, you and I, get the full interoperability benefits of ODF. Other standards, like HTML, CSS, ECMAScript, etc., all went through this phase. Now it is our turn.

With an open standard, like ODF, I own my document. I choose what application I use to author that document. But when I send that document to you, or post it on my web site, I do so knowing that you have the same right to choose as I had, and you may choose to use a different application and a different platform than I used. That is the power of ODF.

Of course, the standard itself, the ink on the pages, does not accomplish this by itself. A standard is not a holy relic. I cannot take the ODF standard and touch it to your forehead, say "Be thou now interoperable!" and have it happen. If a vendor wants to achieve interoperability, they need to read (and interpret) the standard with an eye to interoperability. They need to engage in testing with other implementations. And they need to talk to their users about their interoperability expectations. This is not just engineering. Interoperability is a way of doing business. If you are trying to achieve interoperability by locking yourself in a room with a standard, then you'll have as much luck as trying to procreate while locked in a room with a book on human reproduction. Interoperability, like sex, is a social activity. If you're doing it alone then you're doing it wrong.

Standards are written documents -- text -- and as such they require interpretation. There are many schools of textual interpretation: legal, literary, historic, linguistic, etc. The most relevant one, from the perspective of a standard, is what is called "purposive" or "commercial" interpretation, commonly applied by judges to contracts. When interpreting a document using a purposive view, you look at the purpose, or intent, of a document in its full context, and interpret the text harmoniously with that intent. Since the purpose of a standard is to foster interoperability, any interpretation of the text of a standard which is used to argue in favor of, or in defense of, a non-interoperable implementation, has missed the mark. Not all interpretations are equal. Interpretations which are incongruous with the intent of standardization can easily be rejected.

Standards can not force a vendor to be interoperable. If a vendor wishes deliberately to withhold interoperability from the market, then they will always be able to do so, and, in most cases, devise an excuse using the text of the standard as a scapegoat.

Let's work through a quick example, to show how this can happen.

OpenFormula is the part of ODF 1.2 that defines spreadsheet formulas. The current draft defines the addition operator as:

6.3.1 Infix Operator "+"

Summary: Add two numbers.
Syntax: Number Left + Number Right
Returns: Number
Constraints: None
Semantics: Adds numbers together.

I think most vendors would manage to make an interoperable implementation of this. But if you wanted to be incompatible, there are certainly ways to do so. For example, given the expression "1+1" I could return "42" and still claim to be interoperable. Why? Because the text says "adds numbers together", but doesn't explicitly say which numbers to add together. If you decided to add 1 and 41 together, you could claim to be conformant. OK, so let's correct the text so it now reads:

6.3.1 Infix Operator "+"

Summary: Add two numbers.
Syntax: Number Left + Number Right
Returns: Number
Constraints: None
Semantics: Adds Left to Right.

So, this is bullet-proof now, right? Not really. If I want to, I can say that 1+1 =10, if I want to claim that my implementation works in base 2. We can fix that in the standard, giving us:

6.3.1 Infix Operator "+"

Summary: Add two numbers.
Syntax: Number Left + Number Right, both in base 10 representations
Returns: Number, in base 10
Constraints: None
Semantics: Adds Left to Right.

Better, perhaps. But if I want I can still break compatibility. For example, I could say 1+1=0, and claim that my implementation rounds off to the nearest multiple of 5. Or I could say that 1+1=1, claiming that the '+' sign was taken as representing the logical disjunction operator rather than arithmetic addition. Or I could do addition modulo 7, and say that the text did not explicitly forbid that. Or I could return the correct answer sometimes, but not other times, claiming that the standard did not say "always". Or I could just insert a sleep(5000) statement in my code, and pause 5 seconds every time an addition operation is performed, making a useless, but conformant, implementation. And so on, and so on.
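To make the point concrete, here is a minimal Python sketch of a few of these "creative" readings of the "+" operator, next to a tiny check of the kind a shared test suite would run. The function names are invented for illustration and do not come from any real spreadsheet implementation.

```python
import operator


def add_base2_display(left: int, right: int) -> str:
    """Adds correctly, but reports the result in base 2 ("10" for 1+1)."""
    return format(left + right, "b")


def add_round_to_5(left: int, right: int) -> int:
    """Adds, then rounds to the nearest multiple of 5 (0 for 1+1)."""
    return 5 * round((left + right) / 5)


def add_as_logical_or(left: int, right: int) -> int:
    """Treats '+' as logical disjunction (1 for 1+1)."""
    return int(bool(left) or bool(right))


def add_mod_7(left: int, right: int) -> int:
    """Adds modulo 7: right for small inputs, wrong for larger ones."""
    return (left + right) % 7


def disagreements(impl, cases):
    """Return the test cases where impl differs from ordinary addition."""
    return [(l, r) for l, r in cases if impl(l, r) != operator.add(l, r)]


if __name__ == "__main__":
    cases = [(1, 1), (4, 5)]
    for impl in (add_base2_display, add_round_to_5,
                 add_as_logical_or, add_mod_7):
        print(impl.__name__, disagreements(impl, cases))
```

Note that the modulo 7 variant passes the 1+1 case and only fails on 4+5. No amount of wordsmithing in the text of a standard catches that; a shared set of test cases does.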

The old adage holds, "It is impossible to make anything foolproof because fools are so ingenious." A standard cannot compel interoperability from those who want to resist it. A standard is merely one tool, which when combined with others, like test suites and plugfests, facilitates groups of cooperating parties to achieve interoperability.

Now is the time to achieve interoperability among ODF implementations. We're beyond kind words and empty promises. When Microsoft first announced, last May, that it would add ODF support to Office 2007 SP2, they did so with many fine words:
So the words are there, certainly. But what was delivered fell far, far short of what they promised. Excel 2007 SP2 strips out spreadsheet formulas when it reads ODF spreadsheets from every other vendor's spreadsheets, and even from spreadsheets created by Microsoft's own ODF Add-in for Excel. No other vendor does this. Spreadsheet formulas are the very essence of a spreadsheet. To fail to achieve this level of interoperability calls into question the value and relevance of what was touted as an impressive array of interoperability initiatives. What value is an Interoperability Executive Council, an Interop Vendor Alliance, a Document Interoperability Initiative, etc., if they were not able to motivate the most simple act: taking spreadsheet formula translation code that Microsoft already has (from the ODF Add-in for Office) and using it in SP2?

The pretty words have been shown to be hollow words. Microsoft has not enabled choice. Their implementation is not robust. They have, in effect, taken your ODF document, written by you by your choice in an interoperable format, with demonstrated interoperability among several implementations, and corrupted it, without your knowledge or consent.

There is no shortage of excuses from Redmond. If customers wanted excuses more than interoperability they would be quite pleased by Microsoft's prolix effusions on this topic. The volume of text used to excuse their interoperability failure exceeds, by an order of magnitude, the amount of code that would be required to fix the problem. The latest excuse is the paternalistic concern expressed by Doug Mahugh, saying that they are corrupting spreadsheets in order to protect the user. Using a contrived example, of a customer who tries to add cells containing text to those containing numbers, Doug observes that OpenOffice and Excel give different answers to the formula =1+"2". Because all implementations do not give the same answer, Microsoft strips out formulas. Better to be the broken clock that reads the correct time twice a day, than to be unpredictable, or as Doug puts it:

If I move my spreadsheet from one application to another, and then discover I can’t recalculate it any longer, that is certainly disappointing. But the behavior is predictable: nothing recalculates, and no erroneous results are created.

But what if I move my spreadsheet and everything looks fine at first, and I can recalculate my totals, but only much later do I discover that the results are completely different than the results I got in the first application?

That will most definitely not be a predictable experience. And in actual fact, the unpredictable consequences of that sort of variation in spreadsheet behavior can be very consequential for some users. Our customers expect and require accurate, predictable results, and so do we. That’s why we put so much time, money and effort into working through these difficult issues.

This bears a close resemblance to what is sometimes called "Ben Tre Logic", after the Vietnamese town whose demise was excused by a US General with the argument, "It became necessary to destroy the village in order to save it."

Doug's argument may sound plausible at first glance. There is that scary "unpredictable consequences". We can't have any of that, can we? Civilization would fall, right? But what if I told you that the same error with the same spreadsheet formula occurs when you exchange spreadsheets in OOXML format between Excel and OpenOffice? Ditto for exchanging them in the binary XLS format. In reality, this difference in behavior has nothing to do with the format, ODF or OOXML or XLS. It is a property of the application. So, why is Microsoft not stripping out formulas when reading OOXML spreadsheet files? After all, they have exactly the same bug that Doug uses as the centerpiece of his argument for why formulas are stripped from ODF documents. Why is Microsoft not concerned with "unpredictable consequences" when using OOXML? Why do users seem not to require "accurate, predictable results" when using OOXML? Or to be blunt, why is Microsoft discriminating against their own paying customers who have chosen to use ODF rather than OOXML? How is this reconciled with Microsoft's claim that they are delivering "choice, interoperability and innovative solutions to the marketplace"?

Thursday, May 07, 2009

A follow-up on Excel 2007 SP2's ODF support

Wow. My previous post seems to have attracted some attention. When I woke up on Monday morning, made my coffee and logged in to my email, I found out that my geeky little analysis of Office 2007 SP2's ODF support had sparked some interest. I did not intend it to be more than an update for the handful of the "usual suspects" who regularly follow ODF issues via various blogs, many of which you see listed to your right. If I had any foreknowledge or expectation that this post would end up being on Slashdot, Groklaw, ZDNet, IDG, Reuters, CNET, etc., I would have done a better job spell checking, and maybe toned down the rhetoric a little (just a little).

But this widespread interest in the topic tells me one thing: ODF is important. People care about it. People want it to succeed, and when this success is threatened, whether for deliberate or accidental reasons, they are upset. Although Office 2007 SP2 also added PDF and XPS support, you don't see many stories on that at all.

I've been trying to respond to the many comments by anonymous FUDsters and Fanboys on various web sites where my post is being discussed. However, it is getting rather laborious swatting all the gnats. They obviously breed in stagnant waters, and there is an awful lot of that on the web. Since all links lead back here anyways, it will be much simpler to do a recap here and address some of the more widespread errors.

The talking points from Redmond seem to be consistent, along the lines of:
We did a 100% perfect and conforming implementation of ODF 1.1 to the letter of the standard. If it is not interoperable, then it is the fault of the standard or the other applications or some guy we saw sneaking around back on the night of the fire. In any case, it is not our fault. We just design, write, test and sell software to users, businesses, governments and educational institutions. We have no influence over whether our products are interoperable or not. What effect SP2 has on users or the market -- that's not our concern. Come back in 50 years when you have a 100% perfect standard and maybe we'll talk.

In other words, all of those Interoperability Directors and Interoperability Architects at Microsoft seem to have (hopefully temporarily) switched into Minimal Conformance Directors and Minimal Conformance Architects, and are gazing at their navels. I hope they did not suffer a reduction in salary commensurate with the reduction in their claimed responsibilities.

In any case, their argument might be challenged on several grounds. First up is the question of whether the ODF documents written by Excel 2007 SP2 indeed conform to the ODF 1.1 standard. This is not a hard question to answer, but please excuse this short technical diversion.

Let's see what the ODF 1.1 standard says in section 8.1.3 (Table Cell):
Addresses of cells that contain numbers. The addresses can be relative or absolute, see section 8.3.1. Addresses in formulas start with a “[“ and end with a “]”. See sections 8.3.1 and 8.3.1 for information about how to address a cell or cell range.

And the referenced section 8.3.1 further says:

To reference table cells so called cell addresses are used. The structure of a cell address is as follows:

  1. The name of the table.

  2. A dot (.)

  3. An alphabetic value representing the column. The letter A represents column 1, B represents column 2, and so on. AA represents column 27, AB represents column 28, and so on.

  4. A numeric value representing the row. The number 1 represents the first row, the number 2 represents the second row, and so on.

  5. This means that A1 represents the cell in column 1 and row 1. B1 represents the cell in column 2 and row 1. A2 represents the cell in column 1 and row 2.

    For example, in a table with the name SampleTable the cell in column 34 and row 16 is referenced by the cell address SampleTable.AH16. In some cases it is not necessary to provide the name of the table. However, the dot must be present. When the table name is not required, the address in the previous example is .AH16
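To make the addressing rule concrete, here is a minimal Python sketch of a parser for the simple form quoted above. The names `CellAddress`, `column_number` and `parse_cell_address` are my own, not anything from the standard, and the sketch deliberately ignores absolute-reference markers, quoted table names and cell ranges.

```python
import re
from typing import NamedTuple, Optional


class CellAddress(NamedTuple):
    table: Optional[str]  # None when the table name is omitted
    column: int           # 1-based: A -> 1, AA -> 27
    row: int              # 1-based


# Hypothetical pattern: optional table name, mandatory dot, column letters, row digits.
_ADDRESS = re.compile(r"^(?P<table>[^.\[\]]*)\.(?P<col>[A-Z]+)(?P<row>[1-9][0-9]*)$")


def column_number(letters: str) -> int:
    """Convert column letters to a 1-based number (A=1, Z=26, AA=27)."""
    n = 0
    for ch in letters:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def parse_cell_address(text: str) -> CellAddress:
    """Parse 'Table.AH16', '.AH16' or the bracketed formula form '[.AH16]'."""
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]
    match = _ADDRESS.match(text)
    if match is None:
        # The dot is mandatory, so a bare 'AH16' is rejected here.
        raise ValueError(f"not a cell address in the quoted form: {text!r}")
    return CellAddress(
        table=match.group("table") or None,
        column=column_number(match.group("col")),
        row=int(match.group("row")),
    )


if __name__ == "__main__":
    print(parse_cell_address("SampleTable.AH16"))
    print(parse_cell_address("[.AH16]"))
```

Note that an address without the dot is rejected by this sketch, which is the point of the sentence "However, the dot must be present."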

So, going back to my test spreadsheets from all of the various ODF applications, how do these applications encode formulas with cell addresses:
I'll leave it as an exercise to the reader to determine which one of these seven is wrong and does not conform to the ODF 1.1 standard.

Next is the question of the relationship between interoperability and conformance. So we are not building skyscrapers in the air, let's start with a working definition of interoperability, say that given by ISO/IEC 2382-01, "Information Technology Vocabulary, Fundamental Terms":

The capability to communicate, execute programs, or transfer data among various functional units in a manner that requires the user to have little or no knowledge of the unique characteristics of those units

I think we probably have a better sense of what conformance is. Something conforms when it meets the requirements defined by a standard.

So let's explore the relationship between conformance to a standard and interoperability.

First, does interoperability require a standard? No. There have been interoperable systems without formal standards. For example, there is a degree of interoperability among spreadsheet vendors on the basis of the legacy Excel binary file format (XLS), even though the binary format was never standardized and never defined spreadsheet formulas. Another example is the SAX XML parsing API. Widely implemented, but never standardized. We may call them informal or de facto standards.

Additionally, many standards start out as informal technical agreements and specifications that achieve interoperability among a small group of users, who then move it forward to standardization so that a broader audience can benefit. But the interoperability came first and the formal standard came second. See the history of the Atom syndication format for a good example.

Second, is interoperability possible in the presence of non-conformance? Yes. For example, it is well known that the vast majority of web pages (93% by one estimate) on the web today do not conform to the HTML standard. But there is a not unsubstantial degree of interoperability on the web today in spite of this lack of conformance. Generally, interoperability does not require perfection. It requires good faith and hard work. If perfection were required, nothing would work in this world, would it?

Third, if a standard does not define something (like spreadsheet formulas) then I am allowed to do whatever I want, right? This is true. But further, even if ODF 1.1 did define spreadsheet formulas you would still be allowed to do whatever you want. Remember, these are voluntary standards. We can't force you to do anything, whether we define it or not.

So what then is the precise relationship between conformance and interoperability? I'd state it as:
In other words, the relationship is due to the efficiency of this configuration to those who wish to interoperate. Conformance is neither necessary nor sufficient to achieve interoperability in general, but interoperability is most efficiently achieved when conformance guarantees interoperability. When I talk about "standards-based interoperability" I'm talking about the situation when you are in the neighborhood of that optimal point.

The inefficiency of other orientations is seen with HTML and Web browsers. Because of the historically low level of HTML conformance by authoring tools and users who hand-edit HTML, browsers today are much more complex than they would otherwise need to be. They need to handle all sorts of malformed HTML documents. This complexity extends to any tool that needs to process HTML. Sure, we have a pretty good grip on this now, with tools like HTML Tidy and other robust parsers, but this has come at a cost. Complexity eats up resources, both for coders and testers, and also runtime resources, memory and processing cycles. More complex code is harder to maintain and secure and tends to have more bugs. Greater conformance would have led to a more efficient relationship between conformance and interoperability.

Similarly, the many years of non-conformance in browsers, most notably Internet Explorer, to the CSS2 standard have resulted in an inefficiency there. From the perspective of web designers, tool authors and competing browser vendors, the lack of conformance to the standards has increased the cost needed to achieve interoperability, a cost transferred from a dominant vendor who chose not to conform to the standards, to other vendors who did conform.

The efficiency of conformance to open standards in particular is the clarity and freedom it provides around access to the standard and the contingent IP rights needed to implement the standard.

So back to ODF 1.1. What is the relationship between conformance and interoperability there? Clearly, it is not yet at that optimal point (which few standards ever achieve) where interoperability is most-efficiently achieved. We're working on it. ODF 1.2 will be better in that regard than ODF 1.1, and the next version will improve on that, and so on.

Does this mean that you cannot create interoperable solutions with ODF? No, it just means that, like most standards in IT today, you need to do some interoperability testing with other vendors' products to make sure your product interoperates, and make conformant adjustments to your product in order to achieve real-world interoperability. Most vendors who don't have a monopoly would do this naturally and in fact have done this, as my chart indicated. Complaining about this is like complaining about gravity or friction or entropy. Sure, it sucks. Deal with it. Although it may not pay as much as being a professional mourner, work as a programmer is more regular. And giving value to customers will always bring more satisfaction than standing there weeping about how code is hard.

In any case, this comes down to why you implement a standard. What are your goals? If your goal is to be interoperable, then you perform interoperability testing and make those adjustments to your product necessary to make it be both conformant and interoperable. But if your goal is to simply fulfill a checkbox requirement without actually providing any tangible customer benefit, then you will do as little as needed. However, if your goal is to destroy a standard, then you will create a non-conformant, non-interoperable implementation, automatically download it to millions of users and sow confusion in the marketplace by flooding it with millions of incompatible documents. It all depends on your goals. Voluntary standards do not force, or prevent, one approach or another.

To wrap this up, I stand on the table of interoperability results in the previous post. SP2 has reduced the level of interoperability among ODF spreadsheets, by failing to produce conforming ODF documents, and failing to take note of the spreadsheet formula conventions that had been adopted by all of the other vendors and which are working their way through OASIS as a standard.

If we note the arguments used by Microsoft in the recent past, they have argued that OOXML must be exactly what it is -- flaws and all -- in order to be compatible with legacy binary Office documents. Then they argued that OOXML cannot be changed in ISO, because that would create incompatibility with the "new legacy" documents in Office 2007 XML format. But when it comes to ODF, they have disregarded all legacy ODF documents created by all other ODF vendors and take an aloof stance that looks with disdain on interoperability with other vendors' documents, or even documents produced by their own ODF Add-in. The sacrosanctness of legacy compatibility appears to be reserved, for strategic reasons, for some formats but not others. We'll redefine the Gregorian calendar in ISO to be interoperable with one format if we need to, but we won't deign, won't stoop, won't dirty ourselves to use the code we already have from the ODF Add-in for Microsoft Office, to make SP2 formulas interoperable with the other vendors' products, to benefit our own users who are asking for ODF support in Office. As I said before, this ain't right.


Tuesday, May 05, 2009

OpenDocument Format: The Standard for Office Documents

A belated note that an article of mine on ODF was recently published in IEEE Internet Computing, called "OpenDocument Format: The Standard for Office Documents". I think it is a good introduction to ODF, what it is, where it came from and why it is important. They allow authors to post a copy on their websites. So feel free to link to it, but any redistribution will need to be negotiated with the publisher.

At the same time I've taken the opportunity to put together a new web page of some of my other publications, workshop and conference presentations. I have a few others that I want to add, once I find them. But this is a start.


Sunday, May 03, 2009

Update on ODF Spreadsheet Interoperability

[2009/05/07 -- I've posted a follow up article on this topic which you may want to read]

A couple of months ago I did some experiments on the interoperability of ODF spreadsheets, the theory and practice. In that earlier post I looked at the then-current ODF implementations, including:

  1. OpenOffice.org 2.4
  2. Google Spreadsheets
  3. KOffice KSpread 1.6.3
  4. IBM Lotus Symphony 1.1
  5. Microsoft Office 2003 with the Microsoft-sponsored CleverAge Add-in version 2.5
  6. Microsoft Office 2003 with Sun's ODF Plugin
I created a test document in each of those editors and then loaded each test document in each of the other editors. I showed what worked, what didn't, and made some suggestions on how interoperability could be improved. I found only two notable failures, when the Microsoft/CleverAge Add-in for Excel loaded KSpread and Symphony documents. The other scenarios I tested were OK:



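| Read In \ Created In | CleverAge | Google | KSpread | Symphony | OpenOffice | Sun Plugin |
| --- | --- | --- | --- | --- | --- | --- |
| CleverAge | OK | OK | Fail | Fail | OK | OK |
| Google | OK | OK | OK | OK | OK | OK |
| KSpread | OK | OK | OK | OK | OK | OK |
| Symphony | OK | OK | OK | OK | OK | OK |
| OpenOffice | OK | OK | OK | OK | OK | OK |
| Sun Plugin | OK | OK | OK | OK | OK | OK |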


A lot has happened in the two months since I did that analysis. Several of the applications I tested have been updated:
I haven't been able to get the release candidate of KOffice installed, so I'm still including KSpread 1.6.3 in my tests, but for the rest I have created new test files in each editing environment, saved them to ODF format and then loaded the resulting documents into each of the other editors. From these test documents I was able to perform 42 different test combinations.
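For anyone wondering where the number 42 comes from: with seven producing applications, each document is loaded into each of the six other editors. A small Python sketch of that count follows; the list of names is taken from the results table, and the function name is just mine for illustration.

```python
from itertools import permutations

# The seven ODF producers/consumers that appear in the results table.
APPLICATIONS = [
    "Google",
    "KSpread",
    "Symphony",
    "OpenOffice",
    "Sun Plugin",
    "CleverAge",
    "MS Office 2007 SP2",
]


def cross_application_pairs(applications):
    """Return every (created_in, read_in) pair where the two applications differ."""
    return list(permutations(applications, 2))


if __name__ == "__main__":
    pairs = cross_application_pairs(APPLICATIONS)
    print(len(pairs))  # 7 * 6 ordered pairs
```

The results table also shows the seven same-application cases on the diagonal, which are not counted among the 42.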

I'll explain a bit more how I tested, then give you the table of results, and finally make some observations and recommendations.

The test scenario I used was a simple wedding planner for a fictional user, Maya, who is getting married on August 15th. She wants to track how many days are left until her wedding, as well as track a simple ledger of wedding-related expenses. Nothing complicated here. I created this spreadsheet from scratch in each of the editors, by performing the following steps:

The resulting spreadsheet looks something like this:




Feel free to download a zip of all of the test spreadsheet files. The file names should be self-explanatory.

Here is what I found when I tested the various scenarios:



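| Read In \ Created In | Google | KSpread | Symphony | OpenOffice | Sun Plugin | CleverAge | MS Office 2007 SP2 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Google | OK | OK | OK | OK | Fail | OK | Fail |
| KSpread | OK | OK | OK | Fail | Fail | OK | Fail |
| Symphony | OK | OK | OK | OK | OK | Fail | Fail |
| OpenOffice | OK | OK | OK | OK | OK | OK | Fail |
| Sun Plugin | OK | OK | OK | OK | OK | OK | Fail |
| CleverAge Plugin | OK | OK | OK | OK | Fail | OK | OK |
| MS Office 2007 SP2 | Fail | Fail | Fail | Fail | Fail | Fail | OK |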
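If you would rather count than squint at colors, the results above can be tallied with a few lines of Python. This is only a sketch: the data is transcribed by hand from the table, I use the short name "CleverAge" for both the row and the column, and the function names are my own.

```python
PRODUCERS = ["Google", "KSpread", "Symphony", "OpenOffice",
             "Sun Plugin", "CleverAge", "MS Office 2007 SP2"]

# Keys are the reading application; each list follows the PRODUCERS order.
RESULTS = {
    "Google":             ["OK", "OK", "OK", "OK", "Fail", "OK", "Fail"],
    "KSpread":            ["OK", "OK", "OK", "Fail", "Fail", "OK", "Fail"],
    "Symphony":           ["OK", "OK", "OK", "OK", "OK", "Fail", "Fail"],
    "OpenOffice":         ["OK", "OK", "OK", "OK", "OK", "OK", "Fail"],
    "Sun Plugin":         ["OK", "OK", "OK", "OK", "OK", "OK", "Fail"],
    "CleverAge":          ["OK", "OK", "OK", "OK", "Fail", "OK", "OK"],
    "MS Office 2007 SP2": ["Fail", "Fail", "Fail", "Fail", "Fail", "Fail", "OK"],
}


def failures_by_producer(results, producers):
    """Count, per creating application, how many *other* readers failed."""
    counts = {name: 0 for name in producers}
    for reader, row in results.items():
        for producer, outcome in zip(producers, row):
            if reader != producer and outcome == "Fail":
                counts[producer] += 1
    return counts


def failures_by_reader(results, producers):
    """Count, per reading application, how many *other* producers' files failed."""
    return {
        reader: sum(
            1
            for producer, outcome in zip(producers, row)
            if producer != reader and outcome == "Fail"
        )
        for reader, row in results.items()
    }


if __name__ == "__main__":
    print(failures_by_producer(RESULTS, PRODUCERS))
    print(failures_by_reader(RESULTS, PRODUCERS))
```

Counting by producer shows whose documents cause trouble for others; counting by reader shows who has trouble with everyone else's documents.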


So what is happening here?

CleverAge appears to have heeded the advice from my earlier blog post and now correctly processes KSpread and Symphony spreadsheets. This is great news and they deserve credit for that work. But this is a small bit of good news in a table that now shows an awful lot of red. Let's see if we can figure this out.

First, some combinations that worked previously, when I tested two months ago, are now not working:

The new entry to the mix is Microsoft Office 2007 SP2, which has added integrated ODF support. Unfortunately this support did not fare well in my tests. The problem appears to be how it treats spreadsheet formulas in ODF documents. When reading an ODF document, Excel SP2 silently strips out formulas. What is left is the last value that cell had, when previously saved.

This can cause subtle and not so subtle errors and data loss. For example, in the test document I presented above, the current date is encoded using the TODAY() spreadsheet function. If the formulas are stripped, then this cell no longer updates, and will return the wrong value. Similarly, if Maya tries to continue her ledger of expenses by copying the formula cells from column E down a row, this will cause incorrect calculations, since there is no longer a formula to copy, so she would just be copying the prior balance. In general, SP2 converts an ODF spreadsheet into a mere "table of numbers" and any calculation logic is lost.
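To illustrate what "the last value that cell had" means, here is a rough Python sketch. An ODF spreadsheet cell carries both a formula (the `table:formula` attribute) and a cached result (for a date, `office:date-value`). The sample cell below is hand-written for illustration, not copied from any of my test files, and `strip_formulas` is a hypothetical stand-in for what a reader that discards formulas effectively does.

```python
import xml.etree.ElementTree as ET

TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
FORMULA = f"{{{TABLE_NS}}}formula"
DATE_VALUE = f"{{{OFFICE_NS}}}date-value"

# Hand-written sample: a cell holding TODAY() plus the value cached at save time.
SAMPLE_ROW = f"""
<table:table-row xmlns:table="{TABLE_NS}" xmlns:office="{OFFICE_NS}">
  <table:table-cell table:formula="oooc:=TODAY()"
                    office:value-type="date"
                    office:date-value="2009-05-03"/>
</table:table-row>
"""


def strip_formulas(row):
    """Drop every formula attribute, keeping only cached values. Returns the count removed."""
    removed = 0
    for cell in row.iter(f"{{{TABLE_NS}}}table-cell"):
        if cell.attrib.pop(FORMULA, None) is not None:
            removed += 1
    return removed


if __name__ == "__main__":
    row = ET.fromstring(SAMPLE_ROW)
    cell = row[0]
    print(cell.get(FORMULA), cell.get(DATE_VALUE))
    strip_formulas(row)
    # The cached date survives, but nothing remains to recompute it tomorrow.
    print(cell.get(FORMULA), cell.get(DATE_VALUE))
```

After stripping, the cell still looks fine on the day it was saved, which is exactly why this kind of data loss is easy to miss.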

In the other direction, when writing out spreadsheets in ODF format, Excel 2007 SP2 does include spreadsheet formulas but places them into an Excel namespace. This namespace is not what OpenOffice and other ODF applications use. It is not the ODF 1.2 namespace. It isn't even the OOXML namespace. I have no idea what it is or what it means. Not every ODF application checks the namespace of formulas when loading documents, but the ones that do reject the SP2 documents altogether. And the ones that do not check the namespace try and fail to load a formula since it is syntactically different than what they expected. The applications essentially display a corrupted document that shows neither the formula nor the value correctly. For example, a SP2 document, loaded in MS Office using the Sun ODF Plugin looks like this:




Similar corruption occurs when loading the Excel 2007 SP2 spreadsheet into KSpread, Symphony and OpenOffice. Google doesn't import the document at all.

I must admit that I'm disappointed by these results. This is not a step forward compared to where we were two months ago. This is a big step backwards. Spreadsheet interoperability is not hard. This is not rocket science. Everyone knows what TODAY() means. Everyone knows what =A1+A2 means. To get this wrong requires more effort than getting it right. It is especially frustrating when we know that the underlying applications support the same fundamental formula language, or something very close to it, and are tripped up by lack of namespace coordination. Whether it is accidental or intentional I don't know or care. But I cannot fail to notice that the same application -- Microsoft Excel 2007 -- will process ODF spreadsheet documents without problems when loaded via the Sun or CleverAge plugins, but will miserably fail when using the "improved" integrated code in Office 2007 SP2. This ain't right.

I have some suggestions for how to move things forward again. There will be a lot less red on the above table if two simple changes are made:
  1. Sun should write out formulas in ODF 1.1 format, using the legacy "oooc" namespace prefix that the other vendors are using. Remember, the other vendors are using that namespace specifically for compatibility with OO's ODF documents. This is the current convention. To unilaterally switch, without notice or coordination, to a new namespace, is not cool. When ODF 1.2 is an approved standard, then we all can move there in a coordinated fashion, to cause users minimal inconvenience. But the above table clearly shows the confusion that results if this move is not coordinated. I know OO 3.01 has an option to save in ODF 1.0/1.1 format. IMHO, this setting should be the default. I'm not sure if the Sun Plugin has a similar configuration option, but I hope it does.
  2. In addition to writing out compatible formulas as per the above comments on the Sun Plugin, Microsoft should remove the code in SP2 that causes it to reject every other vendor's spreadsheet documents. Give the user a warning if you need to, but let them have the choice.
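The "warn, but let the user choose" policy in the second suggestion might look something like the following Python sketch. It assumes a formula attribute of the form `prefix:=expression`, treats only the legacy "oooc" prefix as known, and uses a made-up prefix "vendorx" as the unfamiliar case. `FormulaDecision` and `read_formula` are hypothetical names, not code from any real ODF application.

```python
import re
from typing import NamedTuple, Optional

# Assumption: the only prefix this imaginary reader knows is the legacy one.
LEGACY_PREFIXES = frozenset({"oooc"})

_PREFIXED = re.compile(r"^(?P<prefix>[A-Za-z_][A-Za-z0-9_.-]*):(?P<expr>=.*)$", re.DOTALL)


class FormulaDecision(NamedTuple):
    prefix: Optional[str]    # None when the formula carries no prefix
    expression: str          # formula text, always kept
    warning: Optional[str]   # None when nothing needs the user's attention


def read_formula(attribute: str, known_prefixes=LEGACY_PREFIXES) -> FormulaDecision:
    """Split a formula attribute and warn, rather than reject, on an unknown prefix."""
    match = _PREFIXED.match(attribute)
    if match is None:
        return FormulaDecision(None, attribute, None)
    prefix, expression = match.group("prefix"), match.group("expr")
    if prefix in known_prefixes:
        return FormulaDecision(prefix, expression, None)
    warning = (
        f"formula uses unrecognized prefix {prefix!r}; "
        "keeping it so the user can decide what to do"
    )
    return FormulaDecision(prefix, expression, warning)


if __name__ == "__main__":
    print(read_formula("oooc:=[.A1]+[.A2]"))
    print(read_formula("vendorx:=A1+A2"))
```

The design choice here is that the expression is never thrown away. Whether it can then be evaluated is a separate question, but at least the user is told and nothing is silently lost.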
Finally, let me try to anticipate and debunk some of the counter-arguments which might be raised to argue against interoperability.

First, we might hear that ODF 1.1 does not define spreadsheet formulas and therefore it is not necessary for one vendor to use the same formula language that other vendors use. This is certainly true if your sole goal is to claim conformance. If your business model requires only conformance and not actually achieving interoperability, then I wish you well. But remember that conformance and interoperability are not mutually exclusive options. An application can be conformant to a standard and also be interoperable, if you use the legacy formula namespace and syntax. So the desire to be conformant is not an excuse for not also being interoperable, or at least not a valid excuse. One might also wryly note that Microsoft has several Directors of Interoperability, not Directors of Minimal Conformance, and their workshops are called Document Interoperability Initiatives, not Minimal Conformance Initiatives. The difference between minimal conformance and interoperability is well illustrated in these tests.

Remember, it is not particularly difficult or clever to take an adverse reading of a standard to make an incompatible, non-interoperable product. Take HTML, for example. It does not define the attributes of unstyled (default) text. So I could create a perfectly conformant browser implementation that makes all default text be 4-point Zapf Dingbats, white text on a white background. It would conform with the standard, but it would be perfectly unusable by anyone. If you try hard enough you can create 100% conformant, but non-interoperable, implementations of most standards. Standards are voluntary, written to help coordinate multiple parties in their desires for interoperability. Standards are not written to compel interoperability by parties who do not wish to be interoperable.

(A side point is that SP2's implementation of ODF spreadsheets does not, in fact, conform to the requirements of the ODF standard, but that is another story, for another blog post.)

We might also hear concerns that supporting other vendors' ODF spreadsheet formulas cannot be done because this formula language is undocumented. The irony here is that the formula language used by OpenOffice (and by other vendors) is based on that used by Excel, which itself was not fully documented when OpenOffice implemented it. So an argument, by Microsoft, not to support that language because it is not documented is rather hypocritical. Excel supports 1-2-3 files and formulas and legacy Excel versions (back to Excel 4.0) neither of which have standardized formula languages. Why are these supported? Also, the fact that the Microsoft/CleverAge add-in correctly reads and writes the legacy ODF formula syntax shows not only that it can be done, but that Microsoft already has the code to do it. The inexplicable thing is why that code never made it into Excel 2007 SP2.

We'll probably also hear that 100% compatibility with legacy documents is critical to Microsoft users and that it is dangerous to try to save Excel formulas into interoperable ODF formulas because there are no guarantees that OpenOffice or any other ODF application will interpret them the same as Excel does. So one might try to claim that Microsoft is protecting their customers by preventing them from saving interoperable spreadsheet formulas. But we should note that fully-licensed Microsoft Office users have already been creating legacy documents in ODF format, using the Microsoft/CleverAge ODF Add-in. These paying Microsoft Office customers will now see their existing investment in ODF documents, created using Microsoft-sanctioned code, get corrupted when loaded in Excel 2007 SP2. Why are paying Microsoft customers who used ODF less important than Microsoft customers who used OOXML? That is the shocking thing here, the way in which users of the ODF Add-in are being sacrificed.

If you are cynical, you might observe that if Excel 2007 SP2 allowed Microsoft/CleverAge ODF Add-in formulas to work correctly, then SP2 would need to allow all vendors' formulas to work, since the other vendors are using the same legacy namespace. The only way for Microsoft to make their legacy ODF documents work and to exclude other vendors would be (hypothetically) to specifically look in the document for the name of the application that created the document, and allow their ODF Add-in but reject OpenOffice, etc. IANAL, but I think something like that would look very, very bad to competition authorities. So the only way out, if your goal (hypothetically) is to avoid interoperability, is to sacrifice your existing Office customers who are using the Microsoft/CleverAge ODF Add-in. It serves them right for not sticking to the party line in the first place. This'll teach 'em good.

Of course, I am not that cynical. I was taught to never assume malice where incompetence would be the simpler explanation. But the degree of incompetence needed to explain SP2's poor ODF support boggles the mind and leads me to further uncharitable thoughts. So I must stop here.

As I mentioned before, this is a step backwards. But it is just one step on the journey. Let's look forward (and move forward). This is just code. Code can be fixed. We know exactly what is needed to have good interoperability of spreadsheet formulas. In fact most of the code already exists for this. The only thing we need now is to actually go do it and not get too far ahead of, or lag too far behind, the other implementations. This is more a question of timing and coordination than hard technical problems.

[2009/05/07 -- For more on this topic, see my "A follow-up on Excel 2007 SP2's ODF Support"]


Tuesday, April 21, 2009

Shooting Daffodils

I like daffodils. I've been planting a couple hundred additional bulbs each fall, so that now I have a lovely spring-time display, right around this time.

In past years I would walk through the garden and take a photo here and there, mainly while standing, shooting straight down, not paying particular attention to the lighting or the composition. Flower "mug shots" I'd call them. Then last year, I started doing macro (close-up) photography. Although the results were technically adequate -- sharp, detailed closeups -- they were...well... rather dull, symmetrical and artless.

This year I've decided to try something different. I realized that a flower can be posed like a person. I guess that is obvious in retrospect, but it never occurred to me before that the poses of classical portraiture, like the 2/3 view, over-the-shoulder, profile view, etc., apply to flowers as well as people. And you don't need to show all of the flower. A close up of part of it can also be interesting.

I've also worked to improve my technique, shooting with a tripod and remote trigger, using the McClamp to steady and isolate the blossoms in the field, using smaller apertures to get greater depth of field, locking the mirror up before shooting to reduce any residual camera shake, shooting on days and at times where harsh shadows can be avoided, etc.

Here are three examples, intimate portraits, all shot on location in my garden. You can view more on my Flickr page.

[Three daffodil photographs]


Monday, April 13, 2009

A time for decision

April 15th is Tax Day in the United States, the day by which we must file our income tax returns for 2008 and pay any balance due.

The day before, April 14th, is also a day of reckoning, with another outstretched hand asking for money. This is the day which marks the end of "mainstream support" for Microsoft Windows XP and Office 2003. After this date, licensed owners of these products will no longer receive free support and updates.




Depending on how consumers respond, one of three things will result from this end of life.

  1. Users migrate to Vista/Office 2007
  2. Users stay on unsupported Microsoft products for the near term and wait for Windows 7/Office 14 to come out in 2010.
  3. Users take the opportunity to evaluate the available alternatives, including open source.
Since Windows XP is the most widely deployed version of Windows, and Windows is the most widely deployed operating system in the world, a lot of licenses will be up for grabs as IT shops decide what to do next. Especially in these tough economic times, upgrading to Vista just to see Vista become obsolete in less than a year doesn't make sense. But neither does remaining on an unsupported version of Windows.

This is a significant opportunity for alternatives, such as Linux and other open source applications, to increase their representation on the desktop. We should spend the next nine months making it especially easy for Microsoft's seemingly unwanted and expendable Windows XP and Office 2003 customers to migrate to better alternatives. The Windows/Office release calendar and economic conditions have combined to make this a huge upgrade cycle. In 2010 almost everyone will be looking to upgrade. An opportunity like this does not come every year. Let's make the most of it!

