Radovan Semančík's Weblog
Friday, 8 July 2011
I was a very young student when I came across a book named Programátorské poklesky (Programmer's Misdemeanours) by Ivan Kopeček and Jan Kučera. The authors describe in a humorous way the results of programming errors. It was probably my very first book about programming that was not a programming language manual. It was a year after our country woke up from the communist era and programming books were difficult to come by. I think the book influenced me more than I anticipated or was willing to admit at the time.
One of the parts that I particularly remember was the software "psychology". The authors observed four temperaments of programs:
- Sanguine programs provide readable and helpful error messages, have useful help texts, try to recover from errors and try to communicate reasonably in general. Yet, user interaction is maybe the only useful part of such programs.
- Choleric programs do their job well. Such programs do not crash, but the error messages are very dense and cryptic. They do not provide any additional information and there is no help text. They do not try to recover from errors - they expect that the user will know what to do. Experts find these programs easy to use, but all other people hate them.
- Melancholic programs get very sad when they encounter the smallest of problems. The program just crashes and does not provide any message or description. Such programs refuse to communicate about the problem any further and usually do not even provide a way to resolve it.
- Phlegmatic programs ignore any errors. They just carry on no matter the cost. No error message, no indication, they just work on. Of course they may provide wrong results from time to time, but they run. That's the most important thing.
All of that came to my mind as I was discussing the error handling approach in mainstream programming languages (mostly Java). It usually boils down to handling exceptions.
The original approach in Java was to use checked exceptions. The programmer has to either catch them or declare them to be thrown. The authors of Java hoped that it would lead to better error handling. But it looks like there is a glitch: error handling is very difficult to do right. It takes a lot of time and the error handling code may well be a significant part of the system. This leads to sanguine programs: they provide good information about errors, but they do little else. There is just not enough time and resources to do everything right.
Laziness is one of the three great virtues of the programmer. Therefore programmers soon started to focus on the "meat" and simplified the exception handling. The easiest way at hand was to ignore all the exceptions. Catch all exceptions and handle them with an empty code block. This obviously leads to a phlegmatic program. It will run no matter what happens. But the results may not be the best.
The current trend is to switch all the exceptions to runtime exceptions. These do not enforce checking and handling. The usual outcome is that nobody checks or handles them. Any exception will bubble up through the call stack to the upper layers until it is caught by the framework. That may be an application server that will display a nicely formatted error message that essentially says "something somewhere went wrong" and terminate the request. The user has no idea what went wrong and where, or how to recover from the problem. This is a melancholic program.
Luckily, some programmers display at least the exception type and message to the user. But what will the user do if presented with the message "ConsistencyException: Consistency constraint violated"? It is not really helpful. Most programmers also display or log a complete stack trace. But that won't help the user a bit. Even members of a core programming team have problems understanding that, so the user does not stand a chance. That gives us a choleric program.
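The four styles described above can be put side by side in a few lines of Java. This is only a minimal sketch: the class `TemperamentsSketch`, the `parsePort` operation and all the messages are invented for illustration and do not come from the book or from any real system.

```java
import java.io.IOException;

// Hypothetical example: every name here is invented for illustration.
class TemperamentsSketch {

    // Stand-in for any operation that reports failure with a checked exception.
    static int parsePort(String text) throws IOException {
        try {
            return Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            throw new IOException("Invalid port value: '" + text + "'", e);
        }
    }

    // Phlegmatic: catch the exception with an empty block and carry on.
    static int phlegmatic(String text) {
        int port = 0;
        try {
            port = parsePort(text);
        } catch (IOException e) {
            // empty on purpose: the error is silently ignored
        }
        return port; // may be a wrong result (0) if parsing failed
    }

    // Melancholic: wrap into a runtime exception and let it bubble up.
    static int melancholic(String text) {
        try {
            return parsePort(text);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // Choleric: report the exception type and message, nothing more.
    static String choleric(String text) {
        try {
            return "port=" + parsePort(text);
        } catch (IOException e) {
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    // Sanguine: explain the problem, recover with a default, suggest a fix.
    static String sanguine(String text, int defaultPort) {
        try {
            return "Using port " + parsePort(text) + ".";
        } catch (IOException e) {
            return "The port setting '" + text + "' is not a number. "
                    + "Using the default port " + defaultPort + " instead. "
                    + "Set a value such as 8080 to change it.";
        }
    }

    public static void main(String[] args) {
        System.out.println(phlegmatic("abc"));
        System.out.println(choleric("abc"));
        System.out.println(sanguine("abc", 80));
    }
}
```

Note how the sanguine variant is the longest of the four even in this toy case, which is the cost discussed in the text.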
Obviously, one size does not fit all. There is no single right way to do it. If good error handling is required then a sanguine approach is needed. But there is a cost to pay: either reduced functionality or much more effort to do the "same" thing. A robust system asks for a somewhat phlegmatic approach while cheap code is best done melancholic. However, the usual approach is choleric code. Errors are reported, but nobody really understands them. You just can't always win.
Friday, 3 June 2011
I'm still quite young and my "professional memory" does not even span two decades. But I just cannot help seeing some recurring patterns. Quite scary patterns.
I was a student when Sun RPC was the cool thing. It had all that a C programmer needed at that time to create a distributed system. But obviously it was too simple.
CORBA was taking the place of the "cool thing" as I was finishing university. It had all that a C++ programmer might wish for to create a distributed system. Interfaces, object-orientation, "interoperable" references, ... But it was obviously too complicated to use.
XML took over during the dot-com bubble. Or, better to say, it was XML-RPC as a mechanism for Internet-scale distributed systems. It had all that a PHP programmer would want. It had the "feature" of seamlessly passing firewalls. It was the cool thing for the Internet. But obviously, it was too simple.
SOAP came shortly after that. The mechanism by which Java and .NET architectures promised to bridge the enterprise and the Internet. Originally designed as a simple thing to do something with objects, it ended up as a maze of WS-WhatEver specifications that are far from being simple and actually have nothing to do with objects. This is obviously too complex to use.
RESTful religion is the current trend with JSON as its holy prophet, worshiped by the scripting crowd. It is based on the idealistic and internally inconsistent principles of Web Architecture with a loud promise of simplicity. But obviously, this is too simple to be practical.
Now we see JSON schema, namespaces, security and actually all the things that we have already seen in SOAP/WS-* and CORBA. I expect we will see formal RESTful interface definitions soon. Will this be too complex to use, again?
What we see are cycles. Each new generation of engineers is re-inventing what the previous generation has invented, making all the mistakes all over again. Can this eventually converge? How long are the customers going to tolerate this? And what do we really know about distributed systems?
Sorry guys. I just refuse to participate in this insanity.
Tuesday, 26 April 2011
Service Oriented Architecture is not a bad idea. Quite the contrary. What is bad about it is the way it is implemented and the unbelievable hype.
A usual SOA implementation starts with the purchase of an "infrastructure", which is usually some combination of an ESB and a process manager (usually BPEL-based). The next step is an attempt to connect existing services to the ESB and "orchestrate" them using BPEL. While attempting to connect the very first real service it becomes quite clear that this is much harder than one would expect from product datasheets. Existing services - even though they are based on "web services" - are not quite ready for SOA. They need to be tweaked, schemas need to be modified to conform to the requirements of the ESB, new headers need to be added and so on. This usually results in wrapping existing services with yet another service layer to connect them to "SOA". But the ridiculous part just begins here. Now the services need to be "orchestrated", e.g. the result of one service needs to be passed as an input to another service. But they have incompatible data formats. Now the deployment usually takes one of two courses. The first is an attempt to abuse BPEL to transform data formats. The result is an unreadable, complex and unmaintainable "integration" mess expressed in BPEL. The second usual course is to create a special "conversion" service that just transforms data formats. This results in an explosion of services: for each pair of cooperating services there is a new "adaptation" service. A complex and unmaintainable integration mess again. Even if this task miraculously succeeds, there is another major problem ahead. The original services were neither independent nor idempotent, and they usually have a lot of (undocumented) side effects. Therefore they just cannot be freely reused and recombined into business processes. So the next necessary step is to re-engineer all the services (while maintaining backwards compatibility, of course). This is a very costly and slow process, but until it is done the benefits of SOA are more than questionable.
What is the real benefit of such a "SOA" deployment? I can't think of any. The mess is still there. It may seem that it is just in one place and therefore is easier to maintain. But that's just an illusion. There are service wrappers and transformation services that are usually not in one "place". And even if they are, the result is so difficult to navigate that any real benefit of centralization is lost.
What are the downsides of such a "SOA" deployment? First of all there is an investment to purchase the SOA "infrastructure" and to set it up. Secondly, there is an investment to convert the existing non-SOA integration mess to a SOA integration mess. Thirdly, the "infrastructure" is yet another moving part that can fail. It becomes a business-critical piece and needs additional (substantial) cost to set up high availability and resilience. It also takes a lot of energy and attention that could be invested in a much better way, it disrupts the usual business and builds a false hope of a better future. And I haven't even mentioned "advanced" problems such as synchronicity and consistency. Clearly and plainly, such a SOA deployment is a waste of time. A waste of a huge amount of time.
How to make it better? Just remember what SOA is: Service-Oriented Architecture. The focus should be on services, not the infrastructure. The better way of SOA deployment is to start from there. Try to assess what services are already there, how well they are defined, whether they can be reused and whether they actually need to be reused at all. Every experienced developer knows that reusability does not come for free. It is actually quite a costly quality. Therefore focus on services that need to be reused and re-engineer them to be reusable. This is best done as part of natural system upgrades and replacements. Try to gradually develop a common data model for your business so there will be fewer requirements to convert data from one service dialect to another. Some services may need more than one "upgrade" to get into a reusable form. This is not a fast process, so the number of services in SOA will grow very slowly. At the beginning of such a SOA initiative the easiest way of "orchestrating" the services is to use any way that you are familiar with, e.g. a simple Java web application. Just make sure you can modify the orchestration code quite easily, as many adjustments will be needed as things slowly settle down. Having an internal employee or a very flexible partner to do that job would probably be a good choice. As the number of services will be initially low and the ability to reuse them will be quite limited, such "orchestration" code will be acceptably maintainable even if some things are hardcoded. This may be a good solution for the first few years. Once the number of services grows, it is a good time to think about ESB and BPEL (or alternative technologies that will most likely be available in the future). At that time there will already be a considerable number of services, therefore the cost of infrastructure could be easily justified.
This service-oriented approach to SOA deployment will be less expensive, less disruptive and will continually bring benefits proportional to investments.
Service-Oriented Architecture is nothing new. It is just an ordinary architecture. The architect works with systems instead of components. The architect works with services instead of component interfaces. Apart from that, it is still just a software architecture. The vast majority of principles and experiences applicable to intra-system architecture are also reusable for extra-system SOA. And that is probably the most important part of Service-Oriented Architecture.
Tuesday, 19 April 2011
I'm frequently exploring and evaluating new products. It is part of my job. The information overload is huge and it is important to know where not to look, how to quickly rule out products that are not worth looking at. I'm also reviewing architectures and designs, consulting, commenting, advising and maintaining architectures. Over the years I've noticed that it is quite easy to roughly evaluate a system by just looking at a few places.
The first place to look is a web page section called Architecture or System Overview. If there is no such page or document, you can be pretty sure that the system is using the popular and well-established big ball of mud architectural pattern. Looking anywhere else is just a waste of time. The only option you have is to try the system for yourself or look at the sources. But that's very time-consuming and usually the result is not worth the effort. Scratch such a system.
If you have found the architectural description, it is usually not worth starting to read yet. Just skip ahead and look at the first figure in the document. It usually looks like the "diagram" below. What does this creative depiction tell about the system? Well, it is a three-layer system. Or at least that's what the architect meant. There is a bunch of stuff in the middle layer, but the picture does not provide any information about the structure inside. Dependencies are not shown so system maintainability is unknown. Interfaces are not even hinted at, so there is probably no interface at all or there is a jungle of competing and redundant interfaces. The system structure may not be set yet. Or maybe the architect does not understand the system well enough to draw a better picture. Or maybe the team is afraid to show the structure to the public. None of that is a good sign. The argument that "we don't want to clutter the picture with too many details" is usually just an excuse. Reading through the text is mostly a waste of time as it will most likely be just marketing-oriented nonsense (MON). If there is no better figure anywhere nearby the best strategy is to ... run away.
If there is a picture similar to the following one, you are almost there. The system is a good candidate for further exploration. Such a picture gives a reasonable level of detail and indicates that the architect has quite a clear idea about the system structure. It is not just the form itself, it is the content of the diagram that you should focus on. Look at the figure below. You can see that there is a Repository component. The two data stores below it indicate that the component is supposed to act as an abstraction for various data storage mechanisms. Although it is not shown in the picture, you can pretty much expect an interface on top of the Repository component (and other components as well). The same goes for the Integration component. You can also see that a similar approach is used for two subsystems (Repository and Integration) unified by a common Model component. The user interface is placed on top of Model, which clearly isolates it from the low-level details. The most important dependencies are shown in the diagram, which provides some hints about the maintainability of such an architecture. You can get quite a good idea of how such a system works by just looking at the diagram.
Now it is the right time to read through the text and other documents. Look especially for an explanation of motives, not just the structure. For example, look for an explanation of the reasons why the Repository and Integration components are not unified into a single component. That gives confidence that the team actually thought over several alternatives before committing to the current architecture. Look for links to papers, books and other sources. That gives a hint that the team spent some time "in the library" instead of trying to save time by blindly re-inventing things in the laboratory. Especially look for buzzwords. If you find a buzzword used without a deeper explanation of motives, it is a serious warning sign that the architecture may be buzzword-driven and therefore not sound. Look also for hints that the architecture was done all the way down. That means that the architect thought about deployment and usage of the system. The presence of a deployment diagram or a typical usage scenario is a good sign that this happened.
It is a curious thing how much can be learned from a single picture. The picture is really worth a thousand words.
Wednesday, 13 April 2011
This diagram can be found in almost any marketing document dealing with integration problems. It illustrates a concept that is known as hub and spoke. The difficult O(n²) problem of full mesh is reduced to a simpler O(n) problem by introducing a central hub. That's what the marketing guys say (although they don't usually use the O() notation).
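The arithmetic behind that claim is easy to check. The sketch below only counts links: it assumes one bidirectional link per pair of systems in the full mesh, and one link per system to the hub. The class and method names are made up for this illustration.

```java
// Hypothetical helper for illustration; not taken from any product.
class TopologySketch {

    // Full mesh: every pair of systems needs its own link, n(n-1)/2 in total.
    static long fullMeshLinks(int systems) {
        return (long) systems * (systems - 1) / 2;
    }

    // Hub and spoke: every system needs a single link to the hub.
    static long hubAndSpokeLinks(int systems) {
        return systems;
    }

    public static void main(String[] args) {
        for (int n : new int[] {2, 5, 10, 50}) {
            System.out.println(n + " systems: mesh=" + fullMeshLinks(n)
                    + ", hub=" + hubAndSpokeLinks(n));
        }
    }
}
```

For 10 systems this gives 45 links against 10, and for 50 systems 1225 against 50. The count says nothing about how hard each link is to build or maintain.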
What the marketing guys don't say is that this approach usually does not work. Most hubs are just simple message routers. This does not simplify anything. The hub just passes the message from sender to receiver. Oh yes, the hub may use abstract addresses instead of concrete ones, but that's yet another indirection thing that can be easily done without a hub. Oh yes, the hub may do some basic protocol adaptation, but a well-chosen data representation library will do essentially the same thing. That won't justify the cost of the hub. And what's really the cost? Apart from a pretty big pile of money to pay, there is an impact on systemic qualities. The hub is inherently a single point of failure. The hub introduces latencies. The hub can cause additional problems as it is yet another moving part in the system. And the communication with the hub makes the code more difficult to understand (just have a look at JMS).
This is a typical anti-pattern for many SOA-motivated deployments. The Enterprise Service Bus (ESB) is usually the first component that gets deployed in a SOA initiative. But it initially brings no substantial benefit, as there are no services to put on the bus. And if there are services, they are not independent and definitely not idempotent. Such services just cannot be efficiently reused in any other way than they were before the hub came into play. In fact, the ESB should be among the last components that get deployed in a SOA initiative, not the first. SOA is about the services, not about the bus.
Yet, there are a few cases when the hub-and-spoke approach works:
- Asynchronous system: The hub is used to break the timing dependencies of the communicating systems. As in well-designed systems based on Message-Oriented Middleware, the sending system does not need to wait for the receiver to process the message. But such a system needs to be designed for asynchronous operation, which is usually much more difficult to do than a simple synchronous system. Also, the hub needs to be up all the time.
- Visibility: The hub is used to audit messages passing among the systems. But the hub needs to understand the protocol quite deeply to be of any real use.
- Common data model: Used to merge data from disparate data sources, converting each of them to and from a common format. The hub forms a "common denominator" communication interface. But, the interface needs to be designed very carefully, as it needs to satisfy the requirements of many communicating parties. Maintenance of such an interface may be in itself a much more difficult problem than the original full mesh thing.
Yet, most of these are very difficult to do right. And if done wrong, they may bring much more pain than benefit. Handle with care. The hub is a dragon egg and the spokes are venomous snakes.
Wednesday, 12 January 2011
The more I learn about distributed systems, the more I start to think that nobody has any idea how to make them well. It is unbelievable how little we know about distributed systems design and development. And how many failures do we still need to suffer to learn at least something?
As far back as I remember, there was Sun RPC, CORBA, DCOM, RMI, ... And now there are Web Services and REST. This is at least 20 years of development, yet all of these approaches seem to manifest the same fallacy: they try to hide the network from the developer. I will use Java JAX-WS as an example, but this is in no way specific to Java. JAX-WS provides a runtime for a web service and generates a web service client. Both are designed to use local Java calls, effectively hiding the network boundary. It goes to great lengths to make the programmer feel comfortable. For example, it hides the network exceptions from the programmer (by making them runtime exceptions). This may seem like a good approach, but it is a bad thing in principle.
Most people would be surprised how applicable the theory of relativity is to the design of software systems. And most developers would be really surprised how slow light is. Light travels only approx. 300 km in one millisecond. If the system needs a response in under 1 millisecond, it just cannot communicate with anything that is more than 150 km away. Add a TCP three-way handshake and you are pretty much down to the size of a city. Assuming Einstein was right, there is nothing, absolutely nothing, that can be done about this. No amount of technology or money can speed it up.
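The numbers above follow from simple arithmetic, sketched here in Java. The sketch assumes the rounded speed of 300 km per millisecond from the text and counts the TCP handshake as one extra round trip before the request itself (an assumption of this example: the request is sent together with the final ACK). Processing time is ignored, and signals in real cables are slower than light in vacuum, so actual distances are shorter. `LightSpeedSketch` and `maxDistanceKm` are invented names.

```java
// Hypothetical calculation for illustration only.
class LightSpeedSketch {

    // Speed of light in vacuum, rounded as in the text: about 300 km per millisecond.
    static final double KM_PER_MS = 300.0;

    // Largest one-way distance at which the given number of round trips
    // still fits into the time budget, ignoring all processing delays.
    static double maxDistanceKm(double budgetMs, int roundTrips) {
        return budgetMs * KM_PER_MS / (2.0 * roundTrips);
    }

    public static void main(String[] args) {
        // One round trip: request and response only.
        System.out.println("1 ms, 1 round trip:  " + maxDistanceKm(1.0, 1) + " km");
        // Two round trips: handshake first, then request and response.
        System.out.println("1 ms, 2 round trips: " + maxDistanceKm(1.0, 2) + " km");
    }
}
```

Under these assumptions a 1 ms budget allows 150 km with a single round trip and 75 km once the handshake is added.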
One millisecond is an unbelievably long time for a computer system. Even a cheap computer can execute more than a million instructions in 1 millisecond. Local memory access is a bit slower, but even with that slow-down a local call is incomparably faster than a call over a fast network. Faster by several orders of magnitude. How could anyone hope to hide such a difference?
Reliability is another problem. While a local call cannot (reasonably) fail, a network call can and often does. Hiding the errors from the engineer does a disservice. It usually means that the network problems are ignored or handled at the top level of an application. It means that any serious error in network communication means a sudden death of the application. Applications that are written a little bit better still manifest serious usability problems in the presence of network failures. I can see that well enough on my iPhone.
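One way to keep the boundary visible is to leave the failure in the method signature, so that the caller has to decide what happens when the network misbehaves. The following is a minimal sketch of that idea and not how JAX-WS or any other framework works: `RemoteCall` and `callWithRetry` are invented names, and blindly retrying is only reasonable for operations that are safe to repeat.

```java
import java.io.IOException;

// Hypothetical sketch: RemoteCall and callWithRetry are invented names.
class NetworkBoundarySketch {

    // The checked IOException keeps the network boundary visible in the signature.
    interface RemoteCall<T> {
        T invoke() throws IOException;
    }

    // Tries the call up to maxAttempts times and rethrows the last failure,
    // so the caller is forced to decide what to do when the network is down.
    static <T> T callWithRetry(RemoteCall<T> call, int maxAttempts) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.invoke();
            } catch (IOException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated flaky service: fails twice, then answers.
        RemoteCall<String> flaky = () -> {
            if (++calls[0] < 3) {
                throw new IOException("connection reset");
            }
            return "42";
        };
        try {
            System.out.println("Answer: " + callWithRetry(flaky, 5));
        } catch (IOException e) {
            // The application stays alive and can degrade gracefully.
            System.out.println("Service unreachable, showing cached data: " + e.getMessage());
        }
    }
}
```

The point of the sketch is the `catch` in `main`: the failure is handled next to the call, where the application still knows what a sensible fallback is, rather than at the top level.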
The network is a significant boundary. A boundary that just cannot be swept under the carpet of a software development framework. Anyone trying to do that will most likely fail. Fail miserably.
Friday, 24 September 2010
During the last few days Facebook went down and up and down and up again. It was not nice, but it was also not unexpected.
It was not nice because I suddenly did not have a place where I could post my usual sarcastic remark (Twitter and all the rest just don't matter any more). That was painful.
It was not unexpected because Facebook is still just a technology (even though it sometimes looks like magic). Technology fails.
Facebook changed the communication paradigm. Apart from private(*) point-to-point communication there are options for semi-private "multicast" status updates, groups, events, etc. I believe that the communication paradigm that Facebook popularized is a good one. However, the Facebook failure disrupted the communication of too many people at the same time. That failure is a good demonstration that the Facebook architecture is wrong. Totally wrong. Because it is centralized, both from a technological and from an organizational point of view.
Centralization means that one party has control over everything. They can see everything, change anything, even rewrite history (that actually almost happened with terms of service and privacy settings). Think nineteen eighty-four.
Centralization means a single point of failure. Good engineers know that this should be avoided, especially in Internet-scale systems.
Centralization has an economic impact as well. Who will pay for all these boxes (and electricity, cooling, space, staff, ...) that are needed to handle the communication of more than a billion people?
The future of Facebook in its current form is more than questionable. Yet very few people can see it. I would not recommend buying Facebook shares.
*) If a communication that must be shared with a strange multi-national corporation can be considered private.
Monday, 12 July 2010
Architecture and design of software systems is quite an adventure. There are very few hard constraints in software and even fewer in software architecture. Almost anything can be designed. And the vast majority of the designs will look good and feasible, even if quite an intensive review process is applied. It is extremely difficult to find the mistakes in software architecture just by talking about it. As a consequence I dare to speculate that all non-trivial software architectures contain at least one error.
Software architecture needs to be put into conflict with reality as soon as possible. Only reality can uncover the problems. The architecture needs to be quickly applied to design. Key concepts should be designed down to the details early in the project. The design needs to rapidly lead to implementation of prototypes. Prototypes need to be immediately tested. Problems need to be addressed as soon as possible. Solutions to the problems of prototypes will feed back into the design. Changes in the design will influence the architecture. Changes in the architecture will need new prototypes ... and we have a loop here. This loop had better be convergent and finite. All architects need this kind of loop for the architecture to be of any practical use. The difference between a good and a bad architect is the speed of convergence. Bad architects will need many iterations and most of them will happen during the project implementation phase. Changing the architecture during implementation is really expensive. Good architects will settle the architecture in a small number of iterations and will have pretty stable basic concepts before the full-scale implementation starts. A few adjustments to the architecture during implementation are always necessary, but these should not fundamentally change the basic idea. Such projects can usually be delivered with reasonable costs.
Architecture that is not validated by implementing parts of it is just a theoretical exercise. It may be a good first step, but it definitely cannot be presented as a final, practical result. Untested architecture may be good for experiments and research, but it is almost worthless from an engineering point of view.
This principle applies to standardization even more intensively than to software architecture. Standards influence a lot of engineers. Standards can make entire families of technologies either succeed or fail. Good standards are based on working software. Only working software can provide assurance that a standard does not have any major flaws. IETF standards are based on working software. That's the approach that contributed to the success of the Internet as a whole. But too many of the standardization bodies do not follow this practice. Some of us can well remember the infamous example of CORBA, but it looks like most people have already forgotten. The WS-* stack seems to be heading in the same direction. And there is one particular example that I would like to mention: Service Provisioning Markup Language (SPML). SPML defines a (web) service specified using XML schema (XSD). However, the XML schema for the current version of the SPML standard does not even pass validation. It violates the Unique Particle Attribution (UPA) rule. Therefore the standard SPML schema is unusable for many implementations. For example, Java JAXB cannot process it and therefore it cannot be a JAX-WS service. I have seen that people who use it are modifying the schema to make it usable - but then, what's the point of a "standard" there?
There is very little space for innovation in the standardization process. Almost none. Innovation should happen in engineering and experimental projects and only the working results of such projects should be standardized. However, design by committee is a well known and widely used anti-pattern. Avoid using standards that are not based on working software. And especially avoid creating such standards.
Friday, 7 May 2010
From time to time I win the privilege to develop some code. I usually try to do that using the same tools and environment that the development team is using. That way I can feel all the troubles that the developers are going through and maybe I figure out how to make the work more efficient. Therefore I have experienced a few development environments during my career, all the way from vi to Eclipse. And that's exactly the damned thing I want to rant about today. No, it is not vi. Eclipse, that's the one to blame. I've used Eclipse several times before. It worked, but it actually never worked perfectly for me. There were some glitches all the time. But a few days ago it culminated. To make a long story short: I wasted many hours trying to make Eclipse Galileo work on Ubuntu with Subclipse and Maven2 plugins. I have failed. It just does not work. But the time was not entirely wasted, as now I can make a really bad example out of Eclipse.
Eclipse is a multi-platform system. There are flavors for Windows, Mac and Linux. Yet Eclipse Galileo SR1 somehow did not really work on my Ubuntu Linux. I could click wherever I wanted, but sometimes it just did nothing. Maybe that is a hidden usability feature that makes programmers think and not just blindly click? Or maybe to train them to use the keyboard instead of the mouse? Anyway, it makes this specific version really useless. Moral: If you claim you have a multi-platform system, take the time to really test it on all supported platforms.
The way Eclipse is composed of plugins makes it very flexible. Just pick and choose plugins as you wish. But it also makes the system very complex. There are uncountable*) combinations of Eclipse core and plugins, too many ways they can influence each other and too many things to go wrong. And the result is that something goes wrong most of the time. The good news is that the glitches are usually negligible on mainstream platforms, but users of less popular platforms usually suffer. Moral: Don't make your system too flexible. You will not be able to test it and to maintain it. If you go beyond a reasonable amount of flexibility, user satisfaction will go down instead of up.
Eclipse Galileo (that is the most recent version) for Java Enterprise Edition does not come with support for Subversion and Maven. If you started a new Java project today, what would you choose to build it? It won't be make and probably not ant. I could understand that Maven support was not part of the core distribution 5 years ago. But now it is a must for a development environment. The same applies to Subversion support. It is probably the most popular version control system ever, yet Eclipse does not come with support for it. Yes, you can install them as plugins ... but ... see above. Moral: If you must have a flexible system, make sure that it comes with a reasonable initial configuration. Flexibility is a difficult concept and problems with it are extremely hard to diagnose. Make sure that your users will not experience problems caused by flexibility before they know your system well and can deal with the problems.
Now let's have a look at the competition. Get a recent NetBeans, download it, install it and - surprisingly - start developing. There's not only support for Subversion and Maven out of the box, there is also a very reasonable set of Java EE wizards and plugins. You can have a skeleton of a pretty complex Java EE project completed literally in minutes. Productivity and usability, that should be the primary focus of a development environment. It is not engineering to make something work. That's science. Engineering is to make something work better and especially at least a dollar cheaper than the next best competing company. Productivity is essential.
*) Figuratively speaking. They are in fact countable and even finite - as one of my favorite professors commented when I used that phrase during my dissertation defense.
Technorati Tags:
software
development
eclipse
IDE
usability
productivity
Tuesday, 23 February 2010
Simplicity is an important architectural feature. Simple systems are easier to understand. A simple component has fewer moving parts and therefore fewer reasons to break. Simple systems are easier to change and maintain. Yet our world is a complex thing and we, naive human beings, are often making it even worse. We are creating legislation that no common citizen can understand unless he spends at least 5 years of his life attending a law school. We are creating all kinds of rules and regulations ranging from informal social norms all the way to multi-national treaties.
Software systems are here to help us deal with the complexity. But that necessarily makes them complex. It is not the programming language or operating system that makes software complex. Novice programmers can handle technological problems quite quickly (ever seen "Teach yourself Java in 21 days"?). The difficult problem is not "how to build it". The real problem is "what to build". The environment, user expectations, past and future - these are the primary sources of complexity.
What happens if you try to solve a complex problem with a simple solution? It may work - if you are a genius and you figure out something that generations before you missed. But honestly most of us are not geniuses. The more likely outcome is that a simple solution for a complex problem will not work. It will break in spectacular and unexpected ways. Why is that?
You start to analyse the problem, find the important pieces, the "essence". Once you have that you design it, implement it, test it, reimplement it, test it, deploy it ... and you discover that the "essence" was not enough to solve the problem. I mean the real problem of human users, not the usual problem of "how to pass acceptance testing and get the money". Therefore you rethink the problem and discover that the essence is much more complex than you expected. Most projects now just start to complicate the thing that they have already implemented. The system gets more and more difficult to develop and no amount of refactoring seems to be enough. The system looks more and more like some devious creation of Dr. Frankenstein. An abomination. Such a project will run high over the budget and miss all the deadlines. And project goals change as well. The original goal of solving the real problem of system users is quickly transformed to "just finish it, get the money and get out of here".
The initial definition of the problem and scope of the project is essential. The specification needs to have the correct breadth. Specification of a problem that is too complex will make the project extremely expensive just to find out that users are using only 20% of system functionality. Specification of a problem that is too simple will lead to a system that fails to satisfy the users and is unusable in practice. The specification also needs to have the correct depth. Waterfallist-like 1000 pages of analytical documentation is pure nonsense. Nobody will read it and it creates an analysis-paralysis situation. An agilist-like one-liner "the system must work" will not do either. It will send the project into an endless loop of refactoring, competing project goals and divergence.
Small problems, inaccuracies and omissions in the initial project specification are easy to fix during the project. But if the breadth and depth of the specification are wrong, the project is doomed to failure from the very beginning. If the goals are wrong, is it a success if such goals are reached?
Tuesday, 26 January 2010
SOAP is one of the prominent protocols for remote procedure calls (RPC). It can do more than that, but it is used almost exclusively for RPC. More specifically it is used for RPC across the Web, both internally and externally. It is used on the Web so frequently that most people working with SOAP do not even realize that it can be used without HTTP and in a non-RPC way.
SOAP by itself is quite a simple XML-based message format. However it is accompanied by an army of profiles, recommendations and especially a set of so-called WS-* specifications. That creates a "SOAP stack" that is quite complex. This labyrinth of specifications is an attempt to address the qualities of SOAP such as security, reliability, addressing, distribution of policies, etc. It makes SOAP quite a flexible mechanism. But ...
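To make the "simple XML-based message format" concrete, here is a minimal sketch that builds a bare SOAP 1.1 envelope with Python's standard library. The application namespace `urn:example:weather` and the operation `GetForecast` are hypothetical names chosen only for illustration; the point is merely that the core message is an Envelope with a Header (where extensions such as WS-Security place their data) and a Body.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:weather"  # hypothetical application namespace (assumption)


def build_envelope(operation: str, params: dict[str, str]) -> bytes:
    """Build a minimal SOAP 1.1 message: Envelope -> Header + Body -> operation."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    # The header stays empty here; WS-* extensions would add their elements to it.
    ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # The body names an operation of a static service, not a method on an object.
    op = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(op, f"{{{APP_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="utf-8")


if __name__ == "__main__":
    print(build_envelope("GetForecast", {"city": "Bratislava"}).decode("utf-8"))
```

Everything beyond this skeleton (security tokens, addressing, reliability) lives in separate specifications layered on top of it, which is where the complexity comes from.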
Flexibility does not come for free. Until very recently the price was paid by suffering all kinds of interoperability problems. It was so severe that a special organization was established to improve interoperability. Now basic SOAP implementations work together acceptably well, but the situation is not that good for various WS-* extensions. It will take a lot of effort to make implementations fully interoperable. But that is not a problem of SOAP itself. It is an inherent cost of complexity and distribution.
SOAP is not the first "fabric" of distributed systems. There was CORBA before SOAP and Sun-RPC before CORBA to name just a few of many existing mechanisms. However, the designers of SOAP failed to learn from the past. The intent of SOAP was to simplify things. But the SOAP stack is now almost as complex as CORBA was ten years ago. SOAP is XML-based and with HTTP it can pass easily through firewalls (that are broken anyway). But that's almost the only advantage over CORBA. And now about the drawbacks ...
The most serious failure of SOAP design is the lack of support for object orientation. SOAP is not about invoking methods on objects, it is about invoking operations of static services. Objects cannot be arguments in SOAP messages, cannot be returned from operations and there is no support for object references. All of that was a fundamental part of CORBA, yet there is no concept of objects in SOAP. In fact it is an odd joke to call it Simple Object Access Protocol - as it is definitely not about objects and either not simple or not a protocol (depends on your point of view).
SOAP is also not outright compatible with the World Wide Web architecture. The Web is based on the REST style, which defines a few basic operations that should be common for all services. SOAP services can use arbitrary operations without any link to the operations of REST. The REST architecture also naturally assumes object orientation - web resources are (almost) objects. SOAP does not deal with objects at all. Therefore the applicability of SOAP at Internet scale is quite a controversial topic.
SOAP is good in the enterprise and in quite closed environments where interoperability can be assured by testing. SOAP with WSDL has quite a strong interface definition mechanism. It is a rare trait for a technology born on the Internet and it is a necessary condition for composing complex systems. SOAP is also almost the only option for integration, as CORBA is dead and asynchronous mechanisms are seen as too complex and unnecessary by buzzword-driven integrators. If we are lucky enough, SOAP may eventually get to the state where CORBA was a decade ago ... I can almost hear the melody ... just little bits of history repeating.
Technorati Tags:
software
SOAP
integration
Web
HTTP
REST
Monday, 25 January 2010
RESTful web services are seen by many (especially young) developers with almost religious awe. Such services are built using the standard HTTP protocol with the usual HTTP methods as operations. RESTful web services have no arguments, they GET, PUT, POST and DELETE resource representations. The resources are identified by URLs that are also used for links among resources. Such an approach requires a fundamental change of mindset when compared to the more traditional RPC style of building services. But that is not really a problem: most simple services can be acceptably well modeled using the RESTful approach. The problem is not in the functional aspect.
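The "fixed set of operations on resource representations" idea can be sketched without any network at all. The following is an illustrative in-memory model, not a real HTTP server; the class name `ResourceStore`, the URL layout and the choice of status codes are assumptions made for this example.

```python
from typing import Optional


class ResourceStore:
    """Toy model of a RESTful service: URLs identify resources, and the only
    operations are the uniform ones (GET, PUT, POST, DELETE)."""

    def __init__(self) -> None:
        self._resources: dict[str, str] = {}
        self._next_id = 1

    def handle(self, method: str, url: str,
               body: Optional[str] = None) -> tuple[int, Optional[str]]:
        """Return (status code, payload) for one request."""
        if method == "GET":
            if url in self._resources:
                return 200, self._resources[url]
            return 404, None
        if method == "PUT":
            # PUT stores the representation at exactly this URL.
            status = 200 if url in self._resources else 201
            self._resources[url] = body or ""
            return status, None
        if method == "POST":
            # POST to a collection creates a new resource; the payload is its URL.
            new_url = f"{url.rstrip('/')}/{self._next_id}"
            self._next_id += 1
            self._resources[new_url] = body or ""
            return 201, new_url
        if method == "DELETE":
            if self._resources.pop(url, None) is None:
                return 404, None
            return 204, None
        return 405, None  # any other operation is not part of the uniform interface
```

For example, `store.handle("POST", "/orders", "two coffees")` returns the URL of the new order, and every later interaction with that order is a GET, PUT or DELETE on that URL rather than a call to a custom operation.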
The problem is, as usual, in the tricky non-functional aspect. Web services are a mechanism for communication between computers, but the Web was designed for human-to-computer interactions. Many issues appear out of the blue if the Web is used for something that it was not really designed for. Let's have a look at the security aspect of RESTful web services as an example.
It is difficult to authenticate the invoker of the service to the provider. There are two authentication mechanisms for HTTP (basic and digest), but these are designed for interactive human-to-computer authentication. HTTPS in mutual authentication mode provides another solution. This can be non-interactive, but is quite hardcoded to X.509. Under normal circumstances it can authenticate two sites to each other. What a service would need is to authenticate a user to the site. If you want to authenticate a user on the client side to the server, you can still do that with a somewhat atypical use of X.509. In that case each client site must be a certificate authority. However, as certificate constraints are not well supported, root certificate authorities are not likely to issue certificates that allow clients to create subordinate certificate authorities.
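As a reminder of how little HTTP basic authentication actually does, here is a small sketch of how its `Authorization` header value is formed and read back. The helper names are made up for this example; the encoding itself (Base64 of `username:password`) is what the basic scheme defines. Note that this is encoding, not encryption: the scheme simply carries a user's password, which fits a person typing into a browser dialog far better than it fits one service calling another on a user's behalf.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of the Authorization header for HTTP basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def parse_basic_auth(header: str) -> tuple[str, str]:
    """Recover (username, password) from a basic Authorization header value."""
    scheme, _, token = header.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a basic authentication header")
    # Anyone who sees the header can do this; no secret key is involved.
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password
```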
But even if HTTPS/SSL/X.509 can be fixed, it will most likely not solve the problem. I doubt that X.509 can be flexible enough to support the broad variety of security schemes that an Internet-wide technology requires. And the flexibility comes with a cost: interoperability. The people working with enterprise PKI know how difficult it is to achieve interoperability of different X.509 implementations, and that is miles away from Internet scale. There was only a slight improvement in two decades of X.509 existence, therefore there is little hope that X.509 will be the right solution for the Internet.
There are (relatively) new security mechanisms out there, but these apply more to the RPC-style web services. WS-Security and SAML are good examples. WS-Security specifies a header for a SOAP request that contains security credentials. SAML specifies protocols and security tokens applicable to various scenarios, including Internet-scale single sign-on and federation. However it is difficult to use SAML with RESTful web services. SAML tokens are usually many lines of XML code. In SOAP there is a place for the token in the message header, but there is no such place in HTTP. I don't think that placing a few kilobytes of XML data in a custom HTTP request header is an ideal solution. If that worked at all, it would be a non-standard hack. And there is no other place in an HTTP GET request for such data. There is a way to shorten a SAML token into a few bytes of a SAML artifact. But artifact resolution requires an additional round trip. In fact several round trips, as a new TCP connection (and most likely also an SSL handshake) is usually required. It also requires an active client that is able to listen for connections, and maintenance of state on the client side. There is also the question of how to pass the artifact to the server. The usual way of putting it in the query string is a violation of REST principles, therefore the result will likely be a non-standard solution or a broken architecture.
The situation is quite similar for many other non-functional aspects. It is difficult to guarantee consistency, atomicity and coordination of RESTful web services (e.g. make them part of a transaction). As URLs are both service endpoints and object identifiers, it is difficult to move a service without breaking compatibility. There is no practical interface definition language and there are no interoperability guidelines. Each definition of a RESTful service is a free-form text for humans to read and implement, with very limited possibility for code generation ...
I'm not trying to say that everything RESTful is useless. Both REST and RESTful web services can be very useful, especially with services that shoot for Internet scale. RESTful web services undoubtedly have many advantages but also many limitations. Standard RESTful web services are not yet ready for anything but very simple public services - for those a RESTful solution could be ideal. However, the RESTful approach fails if service quality is important. Custom non-standard solutions can help a bit, but these have their own dangers, especially if the goal is to create interoperable Internet-scale services.
Engineering is not religion and technologies should be assessed with a sceptical eye. An engineer who designs anything RESTful should be well aware of the limitations of REST and the Web instead of blindly following the hype.
Technorati Tags:
software
REST
RESTful web services
web services
security
Monday, 18 January 2010
The world is not an objective place. There seems to be no single point of view, no absolute truth. There is only a small piece of information that could be regarded as reliable - information that is well summarized by the famous cogito, ergo sum. All the rest is, more or less, speculation.
Consider some quite distant land, for example Antarctica. Have you been there? Have you observed it personally? Most probably you haven't. All you know about Antarctica is second-hand information. They say that strange birds that cannot fly live in Antarctica. Penguins, that's what they are called. Would you believe that? Yes, you probably would. Have you heard that the Yeti recently moved to Antarctica? Would you believe that? You probably wouldn't. Both "here be penguins" and "here be Yeti" are information. These are not facts, but mere information. It is the belief that makes them into facts.
But even things and phenomena that you personally observe cannot be automatically regarded as true. Think of David Copperfield and the Statue of Liberty. People have witnessed how the statue disappeared. Yet, if you were one of them, would you believe that a huge steel-and-copper statue has really ceased to exist for those few moments? Probably not. How many times have you seen pretty ladies sawn in half, disassembled into pieces and reconnected again or levitating freely in the air? What we see may not be what it appears to be. I'm sure you will be amazed by this excellent performance by my favorite duo Penn and Teller.
Think about your date of birth. Do you think that the date of your birth is an unquestionable fact? Not really. You were there when you were born, but you probably do not remember it. And you were quite incapable of checking the date for yourself at that time. Therefore your date of birth is just information. It comes from quite a trusted source but, strictly speaking, it is not unquestionable.
Any information must be regarded with an appropriate level of confidence. You will probably not really doubt your date of birth, therefore the level of confidence is very high. You believe in that information. But you will probably doubt that the Yeti lives in Antarctica (everybody knows that the Yeti lives in the Himalayas). Therefore the level of confidence for that information is low. You do not believe in that. However, you may slowly increase the level of confidence as more and more expeditions report encounters with the Yeti in Antarctica. As it goes beyond a certain threshold you may start believing that. And once the popular press brings convincing evidence that what was considered to be the Yeti was in fact a mutated giant rat from Mars transported to Antarctica for the sole amusement of penguins by the four-headed hyper-intelligent lizards of Sirius IV, you may quite stop believing in the Yeti.
Seems pretty obvious, doesn't it? Now how is it related to software?
Software is all about information. However, the overwhelming majority of software systems have no ability to be "somehow inclined to believe in Yeti" or "quite doubt that Yeti has moved recently". Most software systems have only one level of confidence: fact. That was not a problem when the information systems were small and disconnected. A user working with a specific system was somehow aware of the reliability of information coming from that system. The user either knew how the system worked or slowly learned how reliable the information is by confronting it with reality. The user, as a thinking human being, corrects the inability of the computer system to deal with uncertainty.
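What "more than one level of confidence" could look like in code can be sketched very simply. This is only an illustration under stated assumptions: the class `Claim`, the update rule (move the confidence a fixed fraction of the way towards each new observation) and the belief threshold of 0.7 are all invented for this example and are not a proper probabilistic model.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    """A piece of information together with how much we trust it."""
    statement: str
    confidence: float  # 0.0 = certainly false, 1.0 = certainly true

    def observe(self, supports: bool, weight: float = 0.2) -> None:
        """Shift confidence towards 1.0 or 0.0; weight says how convincing the report is."""
        target = 1.0 if supports else 0.0
        self.confidence += weight * (target - self.confidence)

    def believed(self, threshold: float = 0.7) -> bool:
        """Information is treated as fact only above the threshold."""
        return self.confidence >= threshold


yeti = Claim("Yeti lives in Antarctica", confidence=0.05)
for _ in range(10):                      # ten expeditions report an encounter
    yeti.observe(supports=True)
print(yeti.believed())                   # belief has crossed the threshold
yeti.observe(supports=False, weight=0.9) # the giant-rat-from-Mars exposé
print(yeti.believed())                   # and it is gone again
```

A system that stores only the statement and drops the confidence is exactly the "fact only" software described above.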
But such a simple approach will fail in the case of a global distributed hyper-connected information super-highway such as the Web or the Semantic Web. Users don't know how the displayed information was acquired and processed and usually have no time to spend a few years confronting the information with reality. Users of the Web have no way to assess the reliability of the information they see. The simple binary model of true-false will not work in this environment. Any system using such a binary model that includes computer-to-computer communication on a large scale is doomed to failure.
I quite believe that the future is not really bright for Internet-scale web services and semantic web. Unless they can learn how to doubt.
Technorati Tags:
software
distributed systems
web
semantic web
philosophy
software engineering
Wednesday, 13 January 2010
I was surprised to find out that not many people can create a good abstraction. Many people are good at thinking about concrete objects and problems, but only a few of these can think abstractly. We in the software industry are forced to think abstractly from the very beginning, as software itself is somewhat abstract. However when it comes to creating higher-level software abstractions, people often fail.
Interfaces are probably the most significant abstractions in software. Interfaces are formed from programming language constructs, network protocol messages, states and sequences, signals, file formats, XML tags and many other elements. Interfaces provide a basic mechanism that an architect can use to exercise control over the system. Interfaces are a powerful tool to contain change, to enhance reusability, to make the system more understandable and manageable. Yet too many interface definitions are weak, imprecise, incomplete or outright misleading.
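As a small illustration of the difference between a weak and a more complete interface definition, consider the sketch below. The `KeyValueStore` interface and its in-memory implementation are hypothetical examples invented here; the point is that the definition states types, error behaviour and guarantees instead of leaving them to the implementer's imagination.

```python
from abc import ABC, abstractmethod


class KeyValueStore(ABC):
    """Hypothetical interface: a store of byte values addressed by string keys.

    Guarantees: a value written by put() is returned unchanged by a later get()
    with the same key, until it is overwritten by another put().
    """

    @abstractmethod
    def put(self, key: str, value: bytes) -> None:
        """Store value under key, replacing any previous value.

        Raises ValueError if key is empty.
        """

    @abstractmethod
    def get(self, key: str) -> bytes:
        """Return the value stored under key.

        Raises KeyError if nothing is stored under key.
        """


class InMemoryStore(KeyValueStore):
    """One possible implementation; callers depend only on KeyValueStore."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        if not key:
            raise ValueError("key must not be empty")
        self._data[key] = value

    def get(self, key: str) -> bytes:
        return self._data[key]  # a missing key raises KeyError, as documented
```

Because the contract is written down, the implementation can later be replaced (by a file, a database, a remote service) without touching the code that uses it.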
During the course of several years I found myself gradually compiling a list of items that need to be included in a good interface definition. Recently I have found the time to put it into a document, add some explanation and examples. The result is here:
Interface Definition, Guidelines and Recommendations
I have decided to publish it under the Creative Commons Attribution license (CC-BY) so you can freely use it in your project as long as you give me proper credit. I recommend that you copy and paste parts of the document to create guidelines suitable for your project. I hope that this helps many people to improve the skill of creating abstractions.
Technorati Tags:
software
architecture
abstractions
programming
design
interface
Friday, 16 October 2009
Now I will disclose one of the most secret of secrets of the software business: All software is bad. Except maybe for very rare specimens that are even worse. It does not matter whether it is commercial or open-source, young or mature, big or small, it is bad. The quality is universally low. I'm not talking just about the bugs, but about all software qualities: performance, scalability, understandability, flexibility, visibility, reliability and security.
I cannot remember if I have seen good software in my entire career. I mean software that was appropriate for the purpose. Software that worked as expected. Worked not only on day one, but even 10 years later. Software that could be evolved without undermining its basic architectural principles. Software that was intuitive and easy to use, well documented, secure, ...
The reason for this situation is not technological. It is not that we software engineers are ... ehm ... idiots. We are not. We are doing our work well, considering the circumstances. The reasons are purely economic. It is just not profitable to create good software. Bad software can be very successful on the market. Quality is not a high priority when making software purchasing decisions, but features are. Quality is difficult to understand and it is usually not directly visible. However, features are outright visible and can be presented in an impressive way. Quality can usually be seen only after the system survives the first few years under production load. That's the point where the defects will manifest themselves, usually in a spectacular way. But at that point the software is already purchased and strongly hardwired in place.
From this point of view it is just a plain waste of money to invest in quality. Increased quality will not increase software sales and therefore not increase profits of software companies. In fact it may even harm their business: higher quality means lower motivation to purchase support services. Quality is not a competitive advantage. Spending more money on quality is a competitive disadvantage.
Therefore, caveat emptor.
Technorati Tags:
software
business
quality