Radovan Semančík's Weblog

Tuesday, 23 February 2010

Simplicity is an important architectural feature. Simple systems are easier to understand. A simple component has fewer moving parts and therefore fewer reasons to break. Simple systems are easier to change and maintain. Yet our world is a complex thing and we, naive human beings, often make it even worse. We are creating legislation that no common citizen can understand unless he spends at least 5 years of his life attending a law school. We are creating all kinds of rules and regulations, ranging from non-formal social norms all the way to multi-national treaties.

Software systems are here to help us deal with the complexity. But that necessarily makes them complex. It is not the programming language or operating system that makes software complex. Novice programmers can handle technological problems quite quickly (ever seen "Teach yourself Java in 21 days"?). The difficult problem is not "how to build it". The real problem is "what to build". The environment, user expectations, past and future - these are the primary sources of complexity.

What happens if you try to solve a complex problem with a simple solution? It may work, if you are a genius and you figure out something that generations before you missed. But honestly, most of us are not geniuses. A more likely outcome is that a simple solution for a complex problem will not work. It will break in spectacular and unexpected ways. Why is that?

You start to analyse the problem and find the important pieces, the "essence". Once you have that, you design it, implement it, test it, reimplement it, test it, deploy it ... and you discover that the "essence" was not enough to solve the problem. I mean the real problem of human users, not the usual problem of "how to pass acceptance testing and get the money". Therefore you rethink the problem and discover that the essence is much more complex than you expected. Most projects now just start to complicate the thing that they have already implemented. The system gets more and more difficult to develop and no amount of refactoring seems to be enough. The system looks more and more like some devious creation of Dr. Frankenstein. An abomination. Such a project will run far over budget and miss all the deadlines. And the project goals change as well. The original goal of solving the real problem of system users quickly turns into "just finish it, get the money and get out of here".

The initial definition of the problem and the scope of the project is essential. The specification needs to have the correct breadth. Specifying a problem that is too complex will make the project extremely expensive, only to find out that users are using just 20% of the system functionality. Specifying a problem that is too simple will lead to a system that fails to satisfy the users and is unusable in practice. The specification also needs to have the correct depth. Waterfallist-like 1000 pages of analytical documentation is pure nonsense. Nobody will read it and it creates an analysis-paralysis situation. An Agilist-like one-liner "the system must work" will not do either. It will send the project into an endless loop of refactoring, competing project goals and divergence.

Small problems, inaccuracies and omissions in the initial project specification are easy to fix during the project. But if the breadth and depth of the specification are wrong, the project is doomed to failure from the very beginning. If the goals are wrong, is it a success when such goals are reached?

Posted by rsemancik at 9:16 AM in Software
Tuesday, 26 January 2010

SOAP is one of the prominent protocols for remote procedure calls (RPC). It can do more than that, but it is used almost exclusively for RPC. More specifically, it is used for RPC across the Web, both internally and externally. It is used on the Web so frequently that most people working with SOAP do not even realize that it can be used without HTTP and in a non-RPC way.

SOAP by itself is a quite simple XML-based message format. However, it is accompanied by an army of profiles, recommendations and especially a set of so-called WS-* specifications. Together they create a "SOAP stack" that is quite complex. This labyrinth of specifications is an attempt to address qualities such as security, reliability, addressing, distribution of policies, etc. It makes SOAP quite a flexible mechanism. But ...
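
To show how small the core format is, here is a minimal sketch that builds a bare SOAP 1.1 envelope with Python's standard xml.etree.ElementTree. The envelope namespace is the standard SOAP 1.1 one; the getBalance operation, its urn:example:bank namespace and the accountId element are hypothetical placeholders. Note that no Header element is produced at all: that is where the WS-* blocks (security, addressing, reliability) would go, and those are what make the full stack complex.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical application namespace, for illustration only.
APP_NS = "urn:example:bank"

ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("bank", APP_NS)


def build_envelope(operation: str, params: dict[str, str]) -> str:
    """Wrap one RPC-style call in a minimal SOAP envelope (Body only, no Header)."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(build_envelope("getBalance", {"accountId": "12345"}))
```

Everything the business cares about (what getBalance means, what accountId identifies) lives outside this format, which is the part the envelope cannot express.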

Flexibility does not come for free. Until very recently the price was paid in all kinds of interoperability problems. They were so severe that a special organization was established to improve interoperability. Now basic SOAP implementations work together acceptably well, but the situation is not that good for the various WS-* extensions. It will take a lot of effort to make implementations fully interoperable. But that is not a problem of SOAP itself. It is an inherent cost of complexity and distribution.

SOAP is not the first "fabric" of distributed systems. There was CORBA before SOAP and Sun-RPC before CORBA, to name just a few of many existing mechanisms. However, the designers of SOAP failed to learn from the past. The intent of SOAP was to simplify things. But the SOAP stack is now of almost the same complexity as CORBA was ten years ago. SOAP is XML-based and with HTTP it can pass easily through firewalls (which are broken anyway). But that's almost its only advantage over CORBA. And now about the drawbacks ...

The most serious failure of SOAP design is the lack of support for object orientation. SOAP is not about invoking methods on objects, it is about invoking operations of static services. Objects cannot be arguments in SOAP messages, cannot be returned from operations and there is no support for object references. All of that was a fundamental part of CORBA, yet there is no concept of objects in SOAP. In fact it is an odd joke to call it the Simple Object Access Protocol, as it is definitely not about objects, and it is either not simple or not a protocol (depending on your point of view).

SOAP is also not fully compatible with the World Wide Web architecture. The Web is based on the REST style, which defines a few basic operations that should be common to all services. SOAP services can use arbitrary operations without any link to the operations of REST. REST architecture also naturally assumes object orientation: web resources are (almost) objects. SOAP does not deal with objects at all. Therefore the applicability of SOAP at Internet scale is quite a controversial topic.

SOAP is good in the enterprise and in quite closed environments where interoperability can be assured by testing. SOAP with WSDL has quite a strong interface definition mechanism. That is a rare trait for a technology born on the Internet, and it is a necessary condition for composing complex systems. SOAP is also almost the only option for integration, as CORBA is dead and asynchronous mechanisms are seen as too complex and unnecessary by buzzword-driven integrators. If we are lucky enough, SOAP may eventually get to the state where CORBA was a decade ago ... I can almost hear the melody ... just little bits of history repeating.

Posted by rsemancik at 9:34 AM in Software
Monday, 25 January 2010

RESTful web services are regarded by many (especially young) developers with almost religious awe. Such services are built using the standard HTTP protocol with the usual HTTP methods as operations. RESTful web services have no arguments; they GET, PUT, POST and DELETE resource representations. The resources are identified by URLs that are also used for links among resources. Such an approach requires a fundamental change of mindset when compared to the more traditional RPC style of building services. But that is not really a problem: most simple services can be modeled acceptably well using the RESTful approach. The problem is not in the functional aspect.
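
A minimal sketch of that uniform interface, assuming an in-memory store and hypothetical /accounts/... URLs. No HTTP server is involved; the handle function merely stands in for one. Every resource is manipulated through the same four methods, and the URL path is the only identifier.

```python
from typing import Optional, Tuple

# In-memory resource store keyed by URL path (hypothetical paths).
resources: dict[str, str] = {}
_next_id = 0


def handle(method: str, path: str, body: Optional[str] = None) -> Tuple[int, Optional[str]]:
    """Dispatch one request to the uniform interface; returns (status, body)."""
    global _next_id
    if method == "GET":
        if path in resources:
            return 200, resources[path]
        return 404, None
    if method == "PUT":
        # PUT stores a representation at a client-chosen URL (idempotent).
        status = 200 if path in resources else 201
        resources[path] = body or ""
        return status, None
    if method == "POST":
        # POST asks a collection to create a subordinate resource;
        # the server chooses the new URL and returns it.
        _next_id += 1
        new_path = f"{path.rstrip('/')}/{_next_id}"
        resources[new_path] = body or ""
        return 201, new_path
    if method == "DELETE":
        if resources.pop(path, None) is None:
            return 404, None
        return 204, None
    # Anything outside the uniform interface is simply not allowed.
    return 405, None
```

For example, handle("POST", "/accounts", '{"owner": "alice"}') creates /accounts/1, which can then be read with GET, replaced with PUT and removed with DELETE. An RPC-style operation such as "transfer money between two accounts" has no direct place here and has to be remodeled as a resource of its own.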

The problem is, as usual, in the tricky non-functional aspects. Web services are a mechanism for communication between computers, but the Web was designed for human-to-computer interaction. Many issues appear out of the blue if the Web is used for something that it was not really designed for. Let's have a look at the security aspect of RESTful web services as an example.

It is difficult to authenticate the invoker of the service to the provider. There are two authentication mechanisms for HTTP (basic and digest), but these are designed for interactive human-to-computer authentication. HTTPS in mutual authentication mode provides another solution. It can be non-interactive, but it is tightly bound to X.509. Under normal circumstances it can authenticate two sites to each other. What a service would need is to authenticate a user to the site. If you want to authenticate a user on the client side to the server, you can still do that with a somewhat atypical use of X.509. In that case each client site must be a certificate authority. However, as certificate constraints are not well supported, root certificate authorities are not likely to issue certificates that allow clients to create subordinate certificate authorities.
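
The Basic scheme makes the human orientation visible: all it carries is a user-id and a password, base64-encoded (not encrypted) into the Authorization header as defined in RFC 7617. The sketch below builds and parses such a header; the function names are illustrative, not part of any library. Nothing in the header can express "service A calls service B on behalf of user C".

```python
import base64


def basic_auth_header(user_id: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value (RFC 7617).

    The credentials are only base64-encoded, so the scheme is acceptable
    only over TLS. RFC 7617 leaves the character encoding open unless the
    server advertises charset="UTF-8"; UTF-8 is assumed here.
    """
    if ":" in user_id:
        raise ValueError("user-id must not contain a colon")
    token = base64.b64encode(f"{user_id}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def parse_basic_auth_header(value: str) -> tuple[str, str]:
    """Decode a Basic header value back into (user_id, password)."""
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    user_id, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user_id, password
```

With the example credentials from RFC 7617, basic_auth_header("Aladdin", "open sesame") yields "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==".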

But even if HTTPS/SSL/X.509 could be fixed, it would most likely not solve the problem. I doubt that X.509 can be flexible enough to support the broad variety of security schemes that an Internet-wide technology requires. And flexibility comes at a cost: interoperability. People working with enterprise PKI know how difficult it is to achieve interoperability of different X.509 implementations, and that is miles away from Internet scale. There has been only slight improvement in two decades of X.509 existence, therefore there is little hope that X.509 will be the right solution for the Internet.

There are (relatively) new security mechanisms out there, but these apply more to RPC-style web services. WS-Security and SAML are good examples. WS-Security specifies a SOAP header that contains security credentials. SAML specifies protocols and security tokens applicable to various scenarios, including Internet-scale single sign-on and federation. However, it is difficult to use SAML with RESTful web services. SAML tokens are usually many lines of XML. In SOAP there is a place for the token in the message header, but there is no such place in HTTP. I don't think that placing a few kilobytes of XML data in a custom HTTP request header is an ideal solution. If it worked at all, it would be a non-standard hack. And there is no other place in an HTTP GET request for such data. There is a way to shorten a SAML token into a few bytes of SAML artifact. But artifact resolution requires an additional round trip. In fact several round trips, as a new TCP connection (and most likely also an SSL handshake) is usually required. It also requires an active client that is able to listen for connections and maintain state on the client side. There is also the question of how to pass the artifact to the server. The usual way of putting it in the query string is a violation of REST principles, therefore the result will likely be a non-standard solution or a broken architecture.
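
To make the size argument concrete: the SAML 2.0 bindings specification defines a type 0x0004 artifact of only 44 bytes (60 characters once base64-encoded). It consists of a 2-byte type code, a 2-byte endpoint index, a 20-byte SourceID (typically the SHA-1 hash of the issuer's entity ID) and a 20-byte random message handle. The sketch below builds and parses one; the entity ID used in the test is hypothetical. The artifact is only a reference, so whoever receives it still has to resolve it against the issuer's artifact resolution service to get the actual assertion, which is the extra round trip described above.

```python
import base64
import hashlib
import secrets

TYPE_CODE_0004 = b"\x00\x04"


def make_saml2_artifact(issuer_entity_id: str, endpoint_index: int) -> str:
    """Build a SAML 2.0 type 0x0004 artifact.

    Layout: TypeCode (2) | EndpointIndex (2) | SourceID (20) | MessageHandle (20).
    """
    source_id = hashlib.sha1(issuer_entity_id.encode("utf-8")).digest()
    message_handle = secrets.token_bytes(20)  # random, unguessable handle
    raw = TYPE_CODE_0004 + endpoint_index.to_bytes(2, "big") + source_id + message_handle
    return base64.b64encode(raw).decode("ascii")


def parse_saml2_artifact(artifact: str) -> dict:
    """Split a type 0x0004 artifact back into its fields."""
    raw = base64.b64decode(artifact)
    if len(raw) != 44 or raw[:2] != TYPE_CODE_0004:
        raise ValueError("not a type 0x0004 SAML 2.0 artifact")
    return {
        "endpoint_index": int.from_bytes(raw[2:4], "big"),
        "source_id": raw[4:24],
        "message_handle": raw[24:44],
    }
```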

The situation is quite similar for many other non-functional aspects. It is difficult to guarantee consistency, atomicity and coordination of RESTful web services (e.g. make them part of a transaction). As URLs are both service endpoints and object identifiers, it is difficult to move a service without breaking compatibility. There is no practical interface definition language and there are no interoperability guidelines. Each definition of a RESTful service is free-form text for humans to read and implement, with very limited possibility for code generation ...

I'm not trying to say that everything RESTful is useless. Both REST and RESTful web services can be very useful, especially for services that aim for Internet scale. RESTful web services undoubtedly have many advantages but also many limitations. Standard RESTful web services are not yet ready for anything but very simple public services; for those, a RESTful solution could be ideal. However, the RESTful approach fails if service quality is important. Custom non-standard solutions can help a bit, but these have their own dangers, especially if the goal is to create interoperable Internet-scale services.

Engineering is not religion and technologies should be assessed with a sceptical eye. An engineer who designs anything RESTful should be well aware of the limitations of REST and the Web instead of blindly following the hype.

Posted by rsemancik at 12:43 AM in Software
Monday, 18 January 2010

The world is not an objective place. There seems to be no single point of view, no absolute truth. There is only a small piece of information that could be regarded as reliable, and it is well summarized by the famous cogito, ergo sum. All the rest is, more or less, speculation.

Consider some quite distant land, for example Antarctica. Have you been there? Have you observed it personally? Most probably you haven't. All you know about Antarctica is second-hand information. They say that strange birds that cannot fly live in Antarctica. Penguins, that's how they are called. Would you believe that? Yes, you probably would. Have you heard that Yeti recently moved to Antarctica? Would you believe that? You probably wouldn't. Both "here be penguins" and "here be Yeti" are information. These are not facts, but mere information. It is the belief that makes them into facts.

But even things and phenomena that you personally observe cannot be automatically regarded as true. Think of David Copperfield and the Statue of Liberty. People have witnessed how the statue disappeared. Yet, if you were one of them, would you believe that the huge steel-and-copper statue really ceased to exist for those few moments? Probably not. How many times have you seen pretty ladies sawn in half, disassembled into pieces and reconnected again, or levitating freely in the air? What we see may not be what it appears to be. I'm sure you will be amazed by this excellent performance by my favorite duo Penn and Teller.

Think about your date of birth. Do you think that the date of your birth is an unquestionable fact? Not really. You were there when you were born, but you probably do not remember it. And you were quite incapable of checking the date for yourself at that time. Therefore your date of birth is just information. It comes from quite a trusted source but, strictly speaking, it is not unquestionable.

Any information must be regarded with an appropriate level of confidence. You will probably not really doubt your date of birth, therefore the level of confidence is very high. You believe that information. But you will probably doubt that Yeti lives in Antarctica (everybody knows that Yeti lives in the Himalayas). Therefore the level of confidence in that information is low. You do not believe it. However, you may slowly increase the level of confidence as more and more expeditions report encounters with Yeti in Antarctica. Once it goes beyond a certain threshold, you may start believing it. And once the popular press brings convincing evidence that what was considered to be Yeti was in fact a mutated giant rat from Mars, transported to Antarctica for the sole amusement of penguins by the four-headed hyper-intelligent lizards of Sirius IV, you may stop believing in Yeti altogether.

Seems pretty obvious, doesn't it? Now how is it related to software?

Software is all about information. However, the overwhelming majority of software systems have no ability to be "somewhat inclined to believe in Yeti" or to "rather doubt that Yeti has recently moved". Most software systems have only one level of confidence: fact. That was not a problem when information systems were small and disconnected. A user working with a specific system was somehow aware of how reliable the information coming from that system was. The user either knew how the system worked or slowly learned how reliable the information was by confronting it with reality. The user, as a thinking human being, compensated for the inability of the computer system to deal with uncertainty.

But such a simple approach will fail in the case of a global, distributed, hyper-connected information super-highway such as the Web or the Semantic Web. Users don't know how the displayed information was acquired and processed, and usually have no time to spend a few years confronting the information with reality. Users of the Web have no way to assess the reliability of the information they see. The simple binary true-false model will not work in this environment. Any system that uses such a binary model and includes computer-to-computer communication on a large scale is doomed to failure.

I quite believe that the future is not really bright for Internet-scale web services and semantic web. Unless they can learn how to doubt.
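
One conventional way to give software more than a single level of confidence is to keep a probability for each statement, update it with Bayes' rule as evidence arrives, and treat it as "believed" only above a chosen threshold. The sketch below replays the Yeti story in odds form. The prior, the likelihood ratios and the 0.9 threshold are arbitrary illustrative numbers, and the Belief class is a hypothetical name; it is one possible model of doubt, not the only one.

```python
from dataclasses import dataclass


@dataclass
class Belief:
    """A statement held with a probability instead of a plain true/false flag."""
    statement: str
    probability: float

    def __post_init__(self) -> None:
        if not 0.0 < self.probability < 1.0:
            raise ValueError("probability must be strictly between 0 and 1")

    def update(self, likelihood_ratio: float) -> None:
        """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio.

        likelihood_ratio = P(evidence | statement true) / P(evidence | statement false);
        values above 1 support the statement, values below 1 undermine it.
        """
        if likelihood_ratio <= 0.0:
            raise ValueError("likelihood ratio must be positive")
        odds = self.probability / (1.0 - self.probability) * likelihood_ratio
        self.probability = odds / (1.0 + odds)

    def believed(self, threshold: float = 0.9) -> bool:
        """Treat the statement as a 'fact' only above the chosen threshold."""
        return self.probability >= threshold


# Illustrative numbers only: a sceptical prior and weak, independent reports.
yeti = Belief("Yeti lives in Antarctica", probability=0.01)
reports = 0
while not yeti.believed():
    yeti.update(3.0)  # each expedition report assumed 3x likelier if Yeti is really there
    reports += 1
print(f"believed after {reports} reports, p = {yeti.probability:.3f}")

yeti.update(1e-4)  # the convincing "giant rat from Mars" evidence
print(f"still believed: {yeti.believed()}, p = {yeti.probability:.4f}")
```

With these numbers the belief crosses the threshold after seven reports and collapses again after the debunking evidence. The point is not the particular formula but that the system keeps a degree of confidence it can revise, instead of a single bit.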

Posted by rsemancik at 1:29 PM in Software
Wednesday, 13 January 2010

I was surprised to find out that not many people can create good abstractions. Many people are good at thinking about concrete objects and problems, but only a few of them can think abstractly. We in the software industry are forced to think abstractly from the very beginning, as software itself is somewhat abstract. However, when it comes to creating higher-level software abstractions, people often fail.

Interfaces are probably the most significant abstractions in software. Interfaces are formed from programming language constructs, network protocol messages, states and sequences, signals, file formats, XML tags and many other elements. Interfaces provide a basic mechanism that an architect can use to exercise control over the system. Interfaces are a powerful tool to contain change, to enhance reusability, and to make the system more understandable and manageable. Yet too many interface definitions are weak, imprecise, incomplete or outright misleading.

Over the course of several years I found myself gradually compiling a list of items that need to be included in a good interface definition. Recently I found the time to put it into a document and add some explanations and examples. The result is here:

Interface Definition, Guidelines and Recommendations

I have decided to publish it under the Creative Commons Attribution license (CC-BY), so you can freely use it in your project as long as you give me proper credit. I recommend copying and pasting parts of the document to create guidelines suitable for your project. I hope that this helps many people improve their skill at creating abstractions.

Posted by rsemancik at 3:57 PM in Software
Friday, 16 October 2009

Now I will disclose one of the most secret of secrets of software business: All software is bad. Except maybe for very rare specimens that are even worse. It does not matter whether it is commercial or open-source, young or mature, big or small, it is bad. The quality is universally low. I'm not talking just about the bugs, but about all software qualities: performance, scalability, understandability, flexibility, visibility, reliability and security.

I cannot remember seeing good software in my entire career. I mean software that was appropriate for its purpose. Software that worked as expected. Worked not only on day one, but even 10 years later. Software that could be evolved without undermining its basic architectural principles. Software that was intuitive and easy to use, well documented, secure, ...

The reason for this situation is not technological. It is not that we software engineers are ... ehm ... idiots. We are not. We are doing our work well, considering the circumstances. The reasons are purely economic. It is just not profitable to create good software. Bad software can be very successful on the market. Quality is not a high priority when making software purchasing decisions, but features are. Quality is difficult to understand and it is usually not directly visible. Features, however, are outright visible and can be presented in an impressive way. Quality can usually be seen only after the system survives its first few years under production load. That's the point where the defects will manifest themselves, usually in a spectacular way. But at that point the software is already purchased and strongly hardwired in place.

From this point of view it is just a plain waste of money to invest in quality. Increased quality will not increase software sales and therefore not increase profits of software companies. In fact it may even harm their business: higher quality means lower motivation to purchase support services. Quality is not a competitive advantage. Spending more money on quality is a competitive disadvantage.

Therefore, caveat emptor.

Posted by semancik at 12:01 PM in Software
Tuesday, 2 June 2009

Just a few days ago Google launched Wave. The demo is fun to watch. The technology seems quite impressive, even at this early stage. I went through all the documents and here are my impressions.

First of all, it is obvious that Google Wave is still in an early development stage. The most obvious signs of that are in the area of architecture and its documentation. The terminology is inconsistent and often quite confusing:

  • It is not clear what the difference is between a wavelet and a wavelet copy. For example, there is a statement "... local wavelets are those created at that provider ...". Local wavelets are in fact wavelet copies; can they be "created" at a provider? Or can only a wavelet be "created", with the copy being just a side effect of that?
  • Local wavelet: is it local to the client? Local to the provider? And it is a wavelet copy after all. It should be called "local wavelet copy" or "provider-local wavelet copy". Or is a "local wavelet" really a "wavelet", and only a remote wavelet is a wavelet copy?
  • What does "processing a wavelet operation" mean? Is it changing the state of a wavelet? Or of a wavelet copy?
  • The "frontend" component that is mentioned in the federation description is not mentioned in the "client-server protocol" document, although the federation document refers to that document for more details on it.
  • It is not defined what "WSP" means (in the protocol specification).
  • The developer's guide mentions a conceptual hierarchy: wave-wavelet-blip-document. Other documents do not mention blips (almost) at all. This terminology or consistency problem needs to be cleaned up.

Some documentation sections are far from clear. For example, the sentence "In the same way a user can submit operations to a remote wavelet, namely by letting the federation proxy connect to the remote federation proxy and submit the operation to its wave server." should obviously be "... letting the federation proxy connect to the remote federation gateway ...". Or not? The previous text states that the proxy connects to the gateway; there is no mention of a proxy connecting to a proxy or a gateway connecting to a gateway. However, the next paragraph also describes gateways connecting to each other. This needs to be cleaned up or clarified. Conceptual sequence diagrams would help a lot.

The protocol and architecture description needs more pictures. Many more pictures. I would suggest creating figures to illustrate at least the following concepts:

  • Architecture big picture, showing all the high-level system components and illustrating their roles and interactions. This is almost mandatory for any architectural description. I'm surprised it is missing.
  • Some kind of deployment diagram: How do the wavelet store, wave server, federation proxy and gateway relate to each other? Are they under the same organizational control (site) or not? It is important to understand that the federation is really a federation.
  • Sequence diagrams that illustrate basic communication exchanges. The single attempt at a sequence diagram that I've found is not really sufficient to describe a massively parallel, distributed, federated, real-time, open and insert-your-favorite-buzzword-here system.

After reading all the documents I could understand how the system works. But I have worked on a somewhat similar system, so this whole concept is not new to me. However, the description is far from easy to read. If Google's intent is to gain community support, the readability and understandability of the architecture have to improve significantly.

Apart from formal inconsistencies and difficulties in understanding the architecture, there are deeper concerns. The architecture seems to be problematic in a few aspects.

The Google Wave architecture does not adhere to architectural best practice. It is not minimal. The robots are described as communicating with Wave over HTTP/JSON-RPC (the robot is the server), the client apparently communicates over HTTP (as an AJAX application?), while the wave federation protocol is described as XMPP-based. Why do we need so many protocols? Is there any reason why the robot protocol and the client-server protocol need to be different? The non-minimalistic approach can be seen in the OT operations as well. The antidocumentelementstart and endantidocumentelementstart operations seem redundant to me. If they are not redundant, their existence should be explained in the architectural documents.
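
For readers unfamiliar with operational transformation (OT): the core idea is that concurrent edits are transformed against each other, so that every replica converges to the same document regardless of the order in which it receives them. The sketch below shows the textbook transformation for two concurrent character insertions, with a site identifier as the tie-breaker. It is a generic illustration of the technique, not Wave's operation set, which works on structured documents and includes the element operations mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int    # index at which the text is inserted
    text: str
    site: int   # originating site id, used only to break ties


def apply_op(doc: str, op: Insert) -> str:
    """Apply an insertion to a plain-text document."""
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can be applied after the concurrent `against` op.

    If `against` inserted before op's position (or at the same position
    from a lower site id), op's position shifts right by the inserted length.
    """
    if against.pos < op.pos or (against.pos == op.pos and against.site < op.site):
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op
```

Two sites starting from "abc" that concurrently insert "X" and "Y" at position 1 both end up with "aXYbc": each applies its own operation first and then the transformed remote one. Every additional operation type multiplies the number of such transformation pairs that must be defined and proven correct, which is why a minimal operation set matters.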

I'm a bit afraid of Google Wave scalability. Persistent queues are used in federation gateways. This may mean too much state to maintain, too many I/O operations, too many context switches in the implementation. It may scale to several hundred interconnected nodes, but scalability to Internet scale is questionable. A similar concern applies to the use of digital signatures for authenticating wavelet operations, which may be too expensive. Even though hash trees are used, I wonder how this could scale with millions of users writing in real time. It would be nice to have empirical data on the scalability of these mechanisms before going on with the prototype, especially considering that these mechanisms determine some of the basic properties of the system and the protocols.
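
Hash trees are what let one signature cover many operations: the serialized operations are hashed pairwise up to a single root, and only the root is signed. A minimal Merkle root computation with SHA-256 is sketched below. The hash function, the 0x00/0x01 domain-separation prefixes (similar in spirit to those in RFC 6962) and the rule of carrying an unpaired node up unchanged are assumptions for illustration, not details taken from the Wave specification. Signing is omitted; it still costs one public-key operation per signed root, which is the expense the scalability concern is about.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Compute the Merkle root of a list of serialized operations.

    Leaves and inner nodes use different prefixes so that a leaf hash can
    never be mistaken for an inner-node hash.
    """
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [_h(b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        nxt = [_h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # unpaired node is carried up unchanged
        level = nxt
    return level[0]
```

Changing any single operation changes the root, so a valid signature over the root vouches for the whole batch.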

The documents do not mention failure cases. When designing a distributed system of this scale, the failure cases are as important as the positive use cases. How will the system be affected if one of the wavelet-hosting servers becomes unavailable? What happens if the master server for a wave goes down? And can the system work reliably over Internet links with quite high latency and low reliability?

There is a question of trust infrastructure. The trust infrastructure is not considered in the Google documents or in the draft paper by Kissner and Laurie. The XMPP specification (RFC 3920) also pushes the trust infrastructure outside of the specification scope. I can feel that a TLS/X.509/DNS combination is somehow (almost silently) assumed. But for Wave to be used as a ubiquitous system, such an infrastructure must exist and be universally available. Will CAs offer Wave (XMPP) certificates? Which CAs will Google accept? Could that lead to monopolization? How much will such a certificate cost? Would that not be a kind of ransom that a site must pay to be able to participate?

Wave is changing paradigms. People can no longer take back what is released. Even if someone deletes part of a document, the deleted part can be seen in playback. While this "permanent memory" has been there almost since the beginning of the Internet, it was never before real-time. How could we take information back from a Wave? Imagine you have accidentally typed your password into the wave instead of the password input box. It will always be visible. OK, I could change my password, but what about an unfortunate copy&paste event with a credit card number?

But the worst architectural deficiencies of Google Wave go even deeper: Wave is not aligned with WWW architecture and the specific nature of user identities is not considered.

Let's for a while abstract from all the deficiencies of the WWW architecture itself and agree that, for better or worse, the WWW architecture is still useful. According to the WWW architecture, agents should provide URIs as identifiers for resources. Waves, wavelets, blips and documents can definitely be regarded as resources; however, the Wave architecture does not assign URIs to them. The Wave specification uses QNames (XML namespaces) a lot, but it does not provide a QName-to-URI mapping as it should. Some problems of the Wave architecture are caused by XMPP, such as violation of URI opacity and URI reuse. The very nature of Wave goes against REST. REST assumes stateless interactions, while Wave is inherently stateful. I don't blame Wave for all those problems. The WWW architecture, XMPP and REST can be guilty as well (and they are). But I would expect a discussion of these problems in the Wave architectural description, together with the reasoning behind the decisions that were taken while designing Wave.
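
For illustration, one widely used QName-to-URI convention is the one RDF/XML applies: concatenate the namespace name and the local part. The sketch below applies it to ElementTree's QName in Clark notation ("{namespace}local"); the http://example.org/wave# namespace in the test is hypothetical, since Wave itself defines no such mapping.

```python
from xml.etree.ElementTree import QName


def qname_to_uri(qname: QName) -> str:
    """Map a Clark-notation QName ('{ns}local') to a URI by concatenation.

    This is the RDF/XML convention. It yields unambiguous URIs only when
    the namespace name ends with a separator such as '#' or '/'.
    """
    text = qname.text
    if not text.startswith("{"):
        raise ValueError("QName has no namespace")
    namespace, local = text[1:].split("}", 1)
    return namespace + local
```

The weakness of the convention (two different QNames can collapse into one URI if the namespace lacks a separator) is one more reason why the mapping should be stated explicitly in a specification rather than left to implementers.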

Wave does very little to consider user identities. The demo seems to use only simple drag&drop from a contact list. But how will these contact lists be constructed and maintained? All the documents seem to assume the use of email-like identifiers for users. Will they be global? With all the unfortunate consequences? Could Wave avoid linking user activities at different sites? Does it support pseudonyms, pair-wise identifiers, user privacy controls, anonymous groups, or anything that can support user control over their personal data? How does Wave plan to defend against spam and phishing? Or do they expect that this will not apply to Wave? That would be really naïve.

My last concern is about Wave maintainability. Creating a system is only half of the success; the second half is keeping it operational and efficient. How could Wave handle a change of the domain names of participants? Would they lose all old waves and drop off their friends' contact lists? Domain names are human-readable, and they do change occasionally. Can Wave handle changes in network topology? For example, moving wave servers here and there without service interruptions? Merging two servers or splitting a server into several boxes? Into several organizations? Mergers, acquisitions and re-orgs happen all the time ...

Even though I may be giving the impression that I do not like Wave, that's not true. Quite the contrary. I had fun watching the demo, I like the idea and I think that the overall concept is good. However, creating a useful, reliable, real-time and Internet-scale system is really a major challenge. The Wave team is obviously up to that challenge. But there is still a long way to go. I would conclude that it is critical for the basic architectural concepts of Wave to be sorted out as soon as possible, especially the alignment with Web architecture and the concerns related to user identities. Wave technology is undoubtedly attractive and it will most probably be very successful. However, it could be really harmful for the whole Internet if Wave were deployed in this form, with all its architectural deficiencies. If that happened, the whole system would fall apart in a few years or decades and it would block further innovation in this area.

Posted by semancik at 7:31 PM in Software
Friday, 6 February 2009

I used to read about it only rarely. But it has spread like an epidemic lately. I can see it now in almost any text about SOA, WOA, REST or SOAP. I'm talking about a phenomenon that I call the fallacy of mixing abstraction layers.

Have you ever heard the proposition that if we all implemented SOAP, we could all communicate without problems? Such a proposition is only partially true. And it is altogether false when talking about the business services layer of the architecture. Why is that?

SOAP is just another network protocol. In the OSI model it would belong somewhere in the presentation layer. Because SOAP is just a protocol, it does not assign any business meaning to individual operations or arguments. It can define that an argument is a floating-point number. It can even define that the argument is named "balance". But is it the balance of a checking account? Or of some other financial product? Or is it a subjective value of the balance of good and evil in the world? And even if we could somehow figure out that it is the balance of a checking account, many questions still remain: what are the precision and rounding rules? Can it be negative? Are there any upper and lower bounds? And even if those can be answered, we get to the problems with the operation using that argument: what are the consistency guarantees of the operation, what is its synchronicity? Are there any parallelism limitations? Side effects? Undefined states?
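
The questions above are exactly what a business-level interface definition has to answer and what a protocol-level type such as xsd:double cannot. A minimal sketch in Python; every rule in it (two decimal places, banker's rounding, the overdraft and upper bounds) and the CheckingAccountBalance name are hypothetical choices, made only to show that such choices have to be stated somewhere.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_EVEN

# Hypothetical business rules for a checking-account balance.
CENTS = Decimal("0.01")                # precision: two decimal places
OVERDRAFT_LIMIT = Decimal("-1000.00")  # may be negative, but not below this
UPPER_BOUND = Decimal("10000000.00")


@dataclass(frozen=True)
class CheckingAccountBalance:
    currency: str    # ISO 4217 code, e.g. "EUR"
    amount: Decimal  # exact decimal, never a binary float

    def __post_init__(self) -> None:
        # Round to cents with banker's rounding, then enforce the bounds.
        rounded = self.amount.quantize(CENTS, rounding=ROUND_HALF_EVEN)
        if not OVERDRAFT_LIMIT <= rounded <= UPPER_BOUND:
            raise ValueError(f"balance {rounded} outside allowed range")
        object.__setattr__(self, "amount", rounded)
```

Two parties that both "implement SOAP" but choose different rounding modes or bounds will exchange well-formed messages and still disagree about the money.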

Expecting that SOAP will guarantee application interoperability is almost the same as expecting IP to guarantee it. If IP could do that, all interoperability problems would have been solved a few decades ago. SOAP is closer to the application (on a higher level of abstraction), but it is still not the application itself. Applications use SOAP, and we must make sure that they use it in interoperable ways. This is in no way specific to SOAP. RESTful services and other low-formality mechanisms exhibit even more interoperability problems, as they lack a sufficiently widespread means of schema definition.

Only a good definition of the interface on the appropriate layer of abstraction can guarantee interoperability of applications. In most cases the appropriate layer is the application layer of the OSI reference model (not the application layer of the TCP/IP reference model, which is in fact just a catch-all layer). The interface should define entities and concepts that are meaningful from the business perspective, not only some XML tags and types. The interface definition should also include a definition of bindings: it should define how the business concepts are translated to SOAP (or to a RESTful service), that SOAP in this particular case uses the HTTP binding, and so on.
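To make the difference concrete, here is a minimal Python sketch of what a business-level definition of the "balance" argument might pin down that a bare floating-point type declaration does not. The class name `CheckingAccountBalance`, the two-decimal precision, banker's rounding and the overdraft rule are all hypothetical assumptions chosen for illustration, not part of any real standard; `to_wire()` stands in for the binding part of the contract.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_EVEN


@dataclass(frozen=True)
class CheckingAccountBalance:
    """Hypothetical business-level contract for a checking account balance.

    Every rule below (precision, rounding, lower bound, currency format) is an
    illustrative assumption; the point is that the interface states them.
    """

    amount: Decimal                              # exact decimal, never a binary float
    currency: str                                # three-letter code such as "EUR"
    overdraft_limit: Decimal = Decimal("0.00")   # how far below zero it may go

    def __post_init__(self) -> None:
        if len(self.currency) != 3 or not self.currency.isupper():
            raise ValueError(f"invalid currency code: {self.currency!r}")
        # Rounding rule is part of the contract: banker's rounding to cents.
        rounded = self.amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)
        # Frozen dataclass: bypass the generated __setattr__ to store the rounded value.
        object.__setattr__(self, "amount", rounded)
        # The lower bound is part of the contract too.
        if rounded < -self.overdraft_limit:
            raise ValueError("balance is below the permitted overdraft")

    def to_wire(self) -> dict:
        """Binding: how the business concept maps onto a message payload.

        The amount travels as a string so that no precision is lost in transit.
        """
        return {"amount": str(self.amount), "currency": self.currency}
```

With this contract, `CheckingAccountBalance(Decimal("10.005"), "EUR").to_wire()` yields `{"amount": "10.00", "currency": "EUR"}`, and a negative amount is rejected unless an overdraft limit is given. Whatever notation is used for the real interface, rules like these have to be stated at the business layer; the wire type alone cannot express them.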

The usual rules apply: Architecture first, technologies second. There is no panacea, no magic solution for interoperability. The sooner people realize that, the sooner the SOA/SOAP/WOA/REST hype will be over.

Posted by semancik at 12:34 PM in Software
Wednesday, 28 January 2009

I met a good friend today, entirely by chance. We had a coffee and, as neither of us has time for a beer-talk anymore, we started the usual rant over coffee. We talked about software quality, especially the quality of software design. We agreed that the situation is worse than bad. If software really works, it is mostly a miracle. And even if it works, it works unreliably, insecurely, does not perform, does not scale, cannot be operated and maintained ... or (most frequently) all of that at the same time.

My friend proposed quite a disturbing vision: he imagined that the users will eventually get really angry, drag all the software engineers out, put them up against the wall and have them shot. While I can really understand that such a solution may appeal to many (most?) software users, I don't think this affair will get that far. Hopefully.

We also talked about why civil engineering and architecture work quite well while software architectures fail. I think there are two principal reasons:

  1. Civil engineers have hundreds of years more experience than software engineers. They have transformed experience into knowledge and fed knowledge back into practice, gaining more experience. They went through that cycle many times, many more than we could even hope for. And even though there has been major progress in materials and building technologies in the last hundred or so years, there were only a few really revolutionary changes. In software, we have a major "revolution" every decade. Time sharing, parallelism, networking, the Internet, the Web ... construction technology has only a few equivalents, such as the development of cheap steel. We don't have enough time to learn and gain experience before a new revolutionary technology appears.
  2. Civil engineers and architects have responsibility. We don't. If a civil engineer makes a mistake and a big building is about to collapse, he would rather kill himself than face the consequences. If a software engineer makes a mistake, endangering the security of millions of people, the company employing that engineer does not even take the time to apologize.

I think that Bruce Schneier is right. We should take responsibility. Yes, software will get more expensive. And yes, progress will slow down. But the results may be better: more reliable, secure and efficient software. Maybe we could even get to ... er ... sustainable progress?

Posted by semancik at 10:55 PM in Software
Monday, 8 December 2008

Last week I presented an introduction to Pragmatic Software Architecture at a JavaTeam (Slovak Java User Group) event. My presentation went well, but I was surprised by a reaction to a different presentation that described the basics of Enterprise Service Bus. One guy stated that SOAP is an obsolete technology and that REST is much better. He based that especially on the performance advantage of REST.

This is not an isolated statement. I've noticed that young engineers especially tend to prefer REST. These RESTafarians somehow assume that it will save the world and solve all the existing problems in computing, starting with integration problems. And with a bit of luck it might even lead to a decent go-playing algorithm. I can understand that. I was similarly enthusiastic and short-sighted when I was younger. Now, as I'm getting older, I'm only short-sighted. And quite skeptical. I don't believe in panaceas any more.

Neither SOAP-based SOA nor REST-based WOA is perfect. They both have their problems. Comparing them is like discussing whether a stone or a hammer is the better tool for driving in a screw. SOA and WOA are designed for different purposes and different environments. Trying to apply the holy principles of REST in a typical enterprise will be a disaster that can be matched only by an attempt to push current SOA to Internet scale.

The key to a good architecture is appropriateness. There is no one-size-fits-all architecture. Different approaches have different properties and limits. Therefore please stop these religious debates, holy wars and blind defense of sacred positions. We need to improve both SOA and WOA, as neither of them is really usable now. We have a lot to learn from both of them.

Posted by semancik at 3:22 PM in Software
Wednesday, 19 November 2008

It has been in my head for quite a long time. And I've been working hard on it for the last few months. The first result came last week. And it was a success.

I'm talking about the Pragmatic Software Architecture course that I've created. I taught it for the very first time last week. The attendees were great. There were engineers from both software development companies and customers, so we discussed both sides of the barricade. This was a very enlightening discussion. I was wondering what the students' feedback would be at the end of the course. Although I had taken great care to be perfectly prepared (254 pages of student guide!), the course was undergoing its first pressure test, and that is when defects usually pop up. But all went perfectly and the feedback was excellent. Much better than expected. Thanks guys!

Why have I done such a foolish thing as creating a software architecture training? First of all, because there is a market niche. There is no training or methodology suitable for small-to-medium projects. Therefore I've compiled the knowledge that can help guide them, as I've spent the last 10 years working on such projects. Secondly, I've found out that surprisingly few people are aware of the basic architectural principles. And even those few do not understand why and how these principles work. This leads to cargo cult architecture or even worse monstrosities, resulting in software abominations instead of software systems. I lead my students to pragmatism: to understand how the principles work, why they are helpful, what effect they will have on the result and especially what their limits are and when not to use them.

The course is organized by our very good partner SunEd Consulting. If you are interested and you happen to live in Europe please contact SunEd for information about course availability.

Posted by semancik at 9:48 PM in Software
Tuesday, 26 August 2008
"You know when we were flying and I was worried we might hit something in the storm and you said the only thing we could possibly hit at this height was a cloud stuffed with rocks?"
"Well?"
"How did you know?"
-- Terry Pratchett, The Light Fantastic

I'm a bit worried about all these proponents of The Cloud. Are they OK? Do they know what they are talking about? Maybe they haven't looked at the sky recently.

What I've seen in the sky are just levitating stone slabs piloted by well-trained teams of druids from Amazon and Google. If you are up to flying on a broomstick in the storm, be careful to avoid those clouds stuffed with rocks.

Posted by semancik at 11:50 AM in Software
Friday, 1 August 2008

Networks. Enabling interactions between any two nodes. Gaining value from "the Net Effect". There should be no channels in networks. There should be no "third parties" .... or ... should they?

This is an ongoing discussion in the blogosphere. Let's set aside for a while the question whether people need hierarchies or not and whether they are good or evil. Let's look at the problem from a purely technical point of view. Let's think for a while about the implementation of one basic use case without a need for channels, hierarchies and third parties.

Use case: I want to share photos from my vacation with friends.

Solution 1: What I'm doing now is uploading the photos to my private server that I'm running in a broomstick closet. I have a Perl script that takes the photos and creates a nice HTML photo album from them. I send a link to that album to my friends.

The Storage Problem: Not many people are crazy enough to run their own servers in broomstick closets. Most of them don't even know or care what a server is. Where are they going to "upload" the photos? They may keep them on their laptop or home PC. But then it is too bad for any friend who wants to look at them while the PC is down. The core of the problem: where will the user's state (data) be kept?

Solution 2: OK, let's change the paradigm and think of "sending" the photos to friends instead of publishing them.

The Communication Problem: Then how would I (in Europe) send photos to someone in Japan if our computers are online only for a few hours a day (and those hours do not overlap)? We are so used to the e-mail system that we forget that e-mail is queued by third parties to make it kind of reliable. And how would I even find out the address of my friend's computer? He may be in Japan today and in Australia tomorrow. How will I know where to send the message? The current system is based on organizations under central control (IANA for IP addresses, ICANN for DNS names) and we are so used to it that we tend to overlook that.
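The timing part of this problem can be sketched in a few lines of Python. This is a toy model under stated assumptions: time is counted in whole UTC hours, each computer's daily online window is a set of hours, and the relay (a hypothetical "third party" playing the role of a mail server) is always on. The function names and the online windows are made up for illustration.

```python
from typing import Optional, Set

HOURS_PER_DAY = 24


def next_online(hour: int, online: Set[int]) -> Optional[int]:
    """Earliest hour >= `hour` (within the next day) at which a node is online."""
    for h in range(hour, hour + HOURS_PER_DAY):
        if h % HOURS_PER_DAY in online:
            return h
    return None


def direct_delivery(send_hour: int, sender: Set[int], recipient: Set[int]) -> Optional[int]:
    """Peer-to-peer transfer: both computers must be online in the same hour."""
    for h in range(send_hour, send_hour + HOURS_PER_DAY):
        if h % HOURS_PER_DAY in sender and h % HOURS_PER_DAY in recipient:
            return h
    return None  # the windows never overlap, so the photos never arrive


def relayed_delivery(send_hour: int, sender: Set[int], recipient: Set[int]) -> Optional[int]:
    """Store-and-forward: hand the message to an always-on relay, which queues
    it until the recipient next comes online."""
    handoff = next_online(send_hour, sender)
    if handoff is None:
        return None
    return next_online(handoff, recipient)


# Hypothetical evening windows: Europe 18-20 UTC, Japan 10-12 UTC.
europe = {18, 19, 20}
japan = {10, 11, 12}
print(direct_delivery(18, europe, japan))   # None
print(relayed_delivery(18, europe, japan))  # 34, i.e. 10:00 UTC the next day
```

The relay solves the timing problem only because someone keeps it running around the clock, which is exactly the kind of third party the "no channels" argument wants to get rid of.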

Solution 3: Let's change the paradigm again. Let's think peer-to-peer now. That usually means DHT-based networks that allow communication and data storage without substantial centralized coordination. Let's store the data in the network "fabric" itself.

The P2P Problem: Seems perfect. I will store my data in the "network" and retrieve it anytime and anywhere. But ... who will pay for that service? I hear the idealists say: all users will contribute part of their disk space and CPU power for the good of everybody. Having been raised under communist rule, I'm naturally suspicious of such claims. But even if people were willing to contribute disk space, would it work? Consider that most "terminal" devices of the network will be laptops, mobile "phones", TV sets, etc. These are not always-on devices. And with all the emphasis on energy saving, such devices will probably be "mostly-off". So even if they have disk space to share, such storage will be quite impractical. The inaccessibility of the devices will need to be balanced by massive replication of data. That could mean difficult synchronization of different copies of the same datum. Will such a system still be practical? Our current empirical data from peer-to-peer systems are based on networks that are usually used to smuggle illegal data, networks that give participants a good incentive to be up and running most of the time. Can reliable peer-to-peer data storage be practical in a different environment? This is yet to be answered.
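A back-of-the-envelope calculation shows why "mostly-off" devices call for massive replication. The Python sketch below assumes, optimistically, that each replica is online independently with probability `p` (in reality devices that are switched off at night are strongly correlated); the probability that at least one of `k` replicas is reachable is then 1 - (1 - p)^k. The function names and the example figures are illustrative assumptions, not measurements.

```python
def availability(p: float, k: int) -> float:
    """Probability that at least one of k independent replicas is online,
    when each replica is online with probability p."""
    return 1 - (1 - p) ** k


def replicas_needed(p: float, target: float) -> int:
    """Smallest number of replicas k such that availability(p, k) >= target."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    if not 0 <= target < 1:
        raise ValueError("target must be in [0, 1)")
    k = 1
    while availability(p, k) < target:
        k += 1
    return k


# A device that is online about 10% of the time (hypothetical figure):
print(replicas_needed(0.1, 0.99))  # 44
# A device that is online half of the time:
print(replicas_needed(0.5, 0.99))  # 7
```

Under these assumptions a mostly-off device needs dozens of copies of each photo to be reachable 99% of the time, and every one of those copies has to be kept in sync whenever the datum changes.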

(Bonus) The Trust and Privacy Problem: Let's pretend that the problems above can be solved. Now I can publish my photos to some reliable P2P network and send a link to my friend over the same P2P network. My friend will like them and, not realizing that some of these photos are quite personal, will forward the link to his friends. And the link will spread ... violating privacy. How would I make sure that only my friends can see the photos? Maintain accounts for them and force all of them to authenticate? Bad idea. Go for some kind of Single Sign-On? Then you need to trust someone telling you that "this is really your friend". Maybe public-key cryptography can be used for that. But now you need a hierarchy (X.509) or a web of trust. And that needs state again: the keys. I don't think people will take their keys with them wherever they go. Store the keys in this ideal P2P storage of ours? Well, then how would you authenticate to the storage itself? And what happens if you lose the keys? Would the entire social network that you have been building for the last few years be lost? And ... the best of all: if I see a message from person X whom I haven't had any previous interaction with, how do I know that I can trust him? (Remember: we want no third party to make any statements about his credibility.) I will need to ask my friends whether their friends have some information about person X. But this actually means that all my friends are "third parties"!
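The last step of that argument can be sketched as a small graph search. In the toy Python model below (all names are hypothetical), `knows` maps each person to the people they know directly; to judge stranger X, I look for chains with at most two intermediaries between me and X. Every intermediary on such a chain is exactly the kind of "third party" the argument points at.

```python
from typing import Dict, List, Set


def trust_paths(me: str, stranger: str, knows: Dict[str, Set[str]]) -> List[List[str]]:
    """Return chains me -> friend [-> friend-of-friend] -> stranger."""
    paths = []
    for friend in sorted(knows.get(me, set())):
        # A friend who knows the stranger directly.
        if stranger in knows.get(friend, set()):
            paths.append([me, friend, stranger])
        # A friend-of-friend who knows the stranger.
        for fof in sorted(knows.get(friend, set())):
            if fof not in (me, stranger) and stranger in knows.get(fof, set()):
                paths.append([me, friend, fof, stranger])
    return paths


def third_parties(paths: List[List[str]]) -> Set[str]:
    """Everyone standing between me and the stranger on some chain."""
    return {person for path in paths for person in path[1:-1]}


knows = {
    "me": {"anna", "boris"},
    "anna": {"carl"},
    "boris": {"x"},
    "carl": {"x"},
}
paths = trust_paths("me", "x", knows)
print(paths)  # [['me', 'anna', 'carl', 'x'], ['me', 'boris', 'x']]
print(third_parties(paths))
```

Even in this tiny example, my opinion of X rests entirely on what Anna, Boris and Carl tell me: the chain of friends does not remove third parties, it only replaces one central authority with many small ones.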

The bottom line is that with great freedom comes great responsibility. Are people ready to take that responsibility? Is technology ready to support it?

I think that the answer to both questions is: No, not yet. Not anytime soon.

Posted by semancik at 12:34 PM in Software
Tuesday, 6 May 2008

I've just been listening to the Identity Bus discussion of five men. It just goes around in circles. It reminds me of all the discussions about system integration difficulties that end up with the concept of almighty web services and the Enterprise Service Bus. I cannot help myself, but I'm naturally suspicious of all panacea-like solutions.

All the "buses" remind me of a fix that I've made to my spout, which will be completely replaced in a few weeks anyway. I wanted to connect two rectangular segments at an unusual angle. After a bit of hammering (and a lot of duct tape) the segments held together.

All the "buses" are just that. Duct tape. The best product for temporary fixes ever made. But you cannot really build an infrastructure on duct tape, can you? How would you build a water supply system for a big city out of duct tape? How long would it last? Can you duct-tape an electricity distribution system?

My question is whether all these buses go anywhere. What is the systemic solution that we want to achieve? What is our vision? Where do we want to go? As the Cheshire Cat observed, if we do not know where we want to go, it does not matter which road we take.

Posted by semancik at 5:55 PM in Software
Tuesday, 25 March 2008

Many years ago I saw an old woman at a big flea market in a little Polish town. She was selling some kind of potion and announcing: "It can heal all the diseases in the world, except for stupidity".

Recently I've been trying to catch up on the blogosphere after yet another long absence. I've noticed that the trend to oversell partial solutions is much stronger than it was a few months ago. I just want to make something clear:

  • No "Identity Technology" by itself can heal your enterprise of problems that were swept under the carpet for too long. These problems have to be dealt with, no matter what magical technology you use. I'm talking about chaos in organizational structure, roles and entitlements, incompatible information systems, inconsistent and unreliable data that are corrected on the fly by manual processes, etc. Admit that you have these problems and do not expect (or pretend) that deploying some software product will make them disappear.
  • No "User-Centric Identity" will magically turn the Internet into a better and more secure place where people's privacy is honored. No. Technology can seldom do that. Only proper business motivation and legislation can help to achieve that. But even that cannot solve it completely. The world is not a kind place. Get over it.
  • No "Web Service Technology" can make everyone cooperate. All the technologies that dance around SOAP are not that new. There is not much improvement over CORBA, Sun-RPC and many other similar technologies. Only standardization and interoperability effort can help, not the technology per se.

What you should do is think for a while about what you really need. And try to think about the solution that you want without all the marketing nonsense. Think about the architecture first. And only then go down to individual technologies. As the world is far from perfect, you will need to adapt your ideas and your architecture to the technologies that are available. But always keep your original goal in mind.

The problem with most technology companies is that their goal is to make a profit, not to solve your problems. The goal is set by the marketing department, not by technologists. And that's natural. They are commercial companies. Therefore you must be the navigator and keep your direction. And be strong.

Think about the Architecture ... and avoid buying magic potions that claim to heal all the diseases in the world - they will not heal stupidity.

Posted by semancik at 12:39 PM in Software