Radovan Semančík's Weblog
Wednesday, 18 May 2016
Test-Driven Development (TDD) tells us to write the tests first and only then develop the code. It may seem like a good idea: a way to force lazy developers to write tests, a way to make sure that the code is good and does what it should do. But there's a problem. If you are doing something new, something innovative, how the hell are you supposed to know what the code should do?
If you are doing something new you probably do not know what the final result will be. You are experimenting, improving the code, changing the specification all the time. If you try to use TDD for that you are going to fail miserably. You will have no idea how to write the tests. And if you manage to write them somehow, you will have to change them all the time. This is wasted effort. A lot of wasted effort. But we need the tests, don't we? And there is no known force in the world that will make a developer write good and complete tests for the implementation once the implementation is finished. Or ... is there?
What we are using in the midPoint project is Test-Driven Bugfixing (TDB). It works like this:
- You find a bug.
- You write an (automated) test that replicates the bug.
- You run the test and you check that the test is failing as expected.
- You fix the bug.
- You run the test and you check that the test is passing.
That's it. The test remains in the test suite to avoid future regressions. It is a very simple method, but a very efficient one. The crucial part is writing the test before you try to fix the bug, even if the bugfix is a one-liner and the test takes 100 lines to write. Always write the test first and see that it fails. If you do not see the test fail, how can you be sure that the test replicates the bug?
We have been following this method for more than 5 years. It works like a charm. The number of tests is increasing and we currently have several times more tests than our nearest competition. The subjective quality of the product is also steadily increasing. And the effort to create and maintain the tests is more than acceptable. That is one of the things that make midPoint great.
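The steps above can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: the bug (a name normalizer that forgets to trim whitespace), the class and all method names are hypothetical and are not taken from midPoint, and a real project would use its regular test framework instead of a `main` method.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

public class TestDrivenBugfixing {

    // Hypothetical code with the reported bug: surrounding whitespace
    // is not removed, so "  Alice " becomes "  alice ".
    static String normalizeBuggy(String name) {
        return name.toLowerCase(Locale.ROOT);
    }

    // The fix, written only after the test below was seen failing.
    static String normalizeFixed(String name) {
        return name.trim().toLowerCase(Locale.ROOT);
    }

    // The automated test that replicates the bug. It returns true when
    // the given implementation behaves the way the bug report expects.
    static boolean testNormalizeTrimsWhitespace(UnaryOperator<String> normalize) {
        return "alice".equals(normalize.apply("  Alice "));
    }

    public static void main(String[] args) {
        // Run the test against the unfixed code: it has to fail.
        System.out.println("before fix: "
                + testNormalizeTrimsWhitespace(TestDrivenBugfixing::normalizeBuggy));
        // Run the same test after the fix: it has to pass.
        System.out.println("after fix:  "
                + testNormalizeTrimsWhitespace(TestDrivenBugfixing::normalizeFixed));
    }
}
```

The same test is run twice on purpose. The first run, against the unfixed code, is the only proof that the test really replicates the bug.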
(Reposted from Evolveum blog)
Tuesday, 17 March 2015
There was a nice little event in Bratislava called Open Source Weekend. It was organized by the Slovak Society for Open Information Technologies. It had been quite a long time since I gave a public talk, so I decided that this was a good opportunity to change that. I gave quite an unusual presentation for this kind of event. The title was: How to Get Rich by Working on Open Source Project?
This was a really unusual talk for an audience that is used to talks about Linux hacking and Python scripting. It was also an unusual talk for me, as I still consider myself to be an engineer and not an entrepreneur. But it went very well. For all of you that could not attend here are the slides.
The bottom line is that it is very unlikely that you will ever get really rich by working on open source software. I also believe that the usual "startup" method of funding based on venture capital is not very suitable for open source projects (I have written about this before). A self-funded approach looks much more appropriate.
(Reposted from https://www.evolveum.com/get-rich-working-open-source-project/)
Friday, 24 October 2014
Evolveum is a successful open source company now. We develop open source Identity and Access Management (IAM) software. We legally established Evolveum in 2011, but the origins of Evolveum date back to the mid-2000s. In 2014 we are getting out of the startup stage into a sustainable stage. But it was a long way to get there. I would like to share our experiences and insights in the hope that this will help others in their attempts to establish an open source business.
The basic rules of the early game are these: It all starts with an idea. Of course. You need to figure out something that is not yet there and people need it. That is the easy part. Then you have to prototype it. Even the brightest idea is almost worthless until it is implemented. You have to spend your own time and money to implement the prototype. You cannot expect anyone to commit the time or money to your project until you have a working prototype. So prepare for that - psychologically and financially. You will have to commit your savings, secure a loan or sacrifice your evenings and nights for approx. 6 months. If you spend less than that then the prototype is unlikely to be good enough to impress others. And you cannot have a really successful project without the support of others (I will get to that). Make sure that the prototype has proper open source license from day one, that it is published very early and that it follows open source best practice. Any attempt to twist and bend these rules is likely to backfire when you are at the most vulnerable.
Then comes the interesting part. Now you have two options. The choice that you make now will determine the future of your company for good. Therefore be very careful here. The options are:
Fast growth: Find an investor. This is actually very easy to do if you have a good idea, good prototype and you are a good leader. Investors are hungry for such start-up companies. You have to impress the investor. This is the reason why the prototype has to be good. If you started alone find an angel investor. If you already have a small team find a venture capitalist. They will give you money to grow the company. Quite a lot of money actually. But there is a catch. Or better to say a whole bunch of them. Firstly, the investor will take majority of shares in your company in exchange for the money. You will have to give away the control over the company. Secondly, you will need to bind yourself to the company for several years (this may be as long as 10 years in total sometimes). Which means you cannot leave without losing almost everything. And thirdly and most importantly: you must be able to declare that your company can grow at least ten times as big in less than three years. Which means that your idea must be super-bright ingenious thingy that really everyone desperately needs and it also needs to be cheap to produce, easy to sell and well-timed - which is obviously quite unlikely. Or you must be inflating the bubble. Be prepared that a good part of the investment will be burned to fuel marketing, not technology. You will most likely start paying attention to business issues and there will be no time left to play with the technology any more. Also be prepared that your company is likely to be sold to some mega-corporation if it happens to be successful - with you still inside the company and business handcuffs still on your hands. You will get your money in the end, but you will have almost no control over the company or the product.
Self-funded growth: Find more people like you. Show them the prototype and persuade them to work together. Let these people become your partners. They will get company shares in exchange for their work and/or the money that they invest in the company. The financiers have a very fitting description for this kind of investment: FFF, which means Friends, Family and Fools. This is the reason for the prototype to be good. You have to persuade people like you to sacrifice an arm and a leg for your project. They have to really believe in it. Use this FFF money to make a product out of your early prototype. This will take at least 1-2 years and there will be almost no income. Therefore prepare the money for this. Once you are past that stage the crucial part comes: use your product to generate income. No, do not sell support or subscriptions or whatever. This is not going to work at this stage. Nobody will pay enough money for the product until it is well known and proven in practice. You have to use the product yourself. You have to eat your own dogfood. You have to capitalize on the benefits that the product brings, not on the product itself. Does your product provide some service? Set up a SaaS and provide the service for money. Sell your professional services and mix in your product as an additional benefit. Sell a solution that contains your product. Does your product improve something (performance, efficiency)? Team up with a company that does this "something" and agree on sharing the revenue or savings generated by your product. And so on. You have to put your own skin in the game. Use the early income to sustain product development. Do not expect any profit yet. Also spend some money and time on marketing. But most of the money still needs to go to the technology. If the product works well then it will eventually attract attention. And then, only then, you will get enough money from subscriptions to fund the development and make a profit.
Be prepared that it can take 3-6 years to get to this stage. And a couple more years to repay your initial investment. This is a slow and patient business. In the end you will retain your control (or significant influence) over the product and company. But it is unlikely to ever make you a billionaire. Yet, it can make a decent living for you and your partners.
Theoretically there is also a middle way. But that depends on a reasonable investor: an investor that cares much more about the technology than about money, valuations and market trends. And this is an extremely rare breed. You can also try crowdfunding. But this seems to work well only for popular consumer products, which are not very common in the open source world. Therefore it looks like your practical options are either bubble or struggle.
And here are a couple of extra tips:
- Do not start with an all-engineer team. You need at least one business person in the team. Someone that can sell your product or services. Someone that can actually operate the company.
- You also need one visionary in the team. Whatever approach you choose, it is likely that your company will reach its full potential in 8-12 years. Not any earlier. If you design your project just for the needs of today you are very likely to end up with an obsolete product before you can even capitalize on it.
- You also need a person who has both feet firmly on the ground. The product needs to start working almost from day one, otherwise you will not be able to gain momentum. Balancing the vision and the reality is a tough task.
- Also be prepared to rework parts of your system all the time. No design is ever perfect. Ignoring the need for refactoring and just sprinting for features will inevitably lead to a development dead-end. You cannot afford that. It ruins all your investment.
- The software is never done. Software development never really ends. If it does end then the product itself is essentially dead. Plan for a continuous and sustainable development pace during the entire lifetime of your company.
- Do not copy any existing product. Especially not another open source product. It is pointless. The existing product will always have a huge head start and you cannot realistically ever make that up unless the original team makes some huge mistake. If you need to do something similar to what another project already does, then team up with them. Or make a fork and start from that. If you really start on a green field you have to use a very unique approach to justify your very existence.
I really wish that someone explained this to me five years ago. We have chosen to follow the self-funded way of course. But we had to explore many business dead-ends to get there. It was not easy. But here we are, alive and well. Good times are ahead. And I hope that this description helps other teams that are just starting their companies. I wish them to have a much smoother start than we had.
(Reposted from https://www.evolveum.com/start-open-source-company/)
Monday, 30 July 2012
When I was a young university student I learned TCP/IP by reading RFCs. It gave me an exact idea of how the network worked. It trained me to recognize a good specification. And it also somehow persuaded me to believe in standards. I maintained that belief for most of my professional life. However, it started to vanish a few years ago. And recently I have lost that faith completely. There were two "last drops" that sent my naïveté down the drain.
The first drop was SCIM. I was interested in that protocol as I hoped that having a standard interface in midPoint would be a good thing. But as I went through the specification I noticed quite a lot of issues. This is a clear telltale sign of an interface which is still under development and is not suitable for real-world use, let alone for becoming a standard. I concluded that SCIM is a premature standardization effort and was ready to forget about it. But there was a suggestion to post the comments on the SCIM mailing list, and in an attempt to be a good netizen I did just that. There was some discussion on the mailing list. But it was in vain. What I figured out is that there is no will to improve the protocol, to make the specification more concrete and useful. SCIM is not a protocol, it is not an interface. It is a framework that can be changed almost beyond recognition and one can still call it SCIM. All hopes for practical interoperability are lost. Well, there is some public interoperability testing. But I have checked the scenarios that were actually tested. And these are the most basic, simplest cases. They are miles away from reality. The folks on the SCIM mailing list argue that most of the "advanced" features are to be done as protocol extensions, which most likely requires "profiling" the protocol for a specific use case. Which means practically no interoperability out of the box. Every real-world deployment will need some coding to make it work. I believe that SCIM is lost both as a protocol and as a standard.
The other drop was OAuth 2. I was not watching that one so closely, but recently a friend pointed me to Eran Hammer's blog entry. Eran describes a situation that is very similar to SCIM: a specification that does not really specify anything and a lack of will to fix it. That was the point when I realized that I had seen this scenario in various other cases during the last few years. It looks like premature standardization is the method and vague specifications are the tools of current standardization efforts. I no longer believe in standards. They just don't work.
But we need interoperability. We need protocols and interfaces. How can we do that without standards? I think that open specifications are the way to take. Specifications that are constructed outside of the standardization bodies. Specifications backed by (open source) software that really work in practical situations before they are fixed and "standardized". Specifications based on something that really works. That seems to be the only reasonable way.
But there is also a danger down this road. Great care should be taken to do the design responsibly, to specify it well, to reuse (if possible) instead of reinvent and to learn from the experiences of others. To avoid creating abominations such as OpenID.
Monday, 16 July 2012
Clouds are everywhere. We got pretty much used to that buzzword. Open API Economy is quite new. But it is almost the same. What seems to be the mantra behind "Cloud" and "Open API Economy" is: Do not do it yourself. Scrap whatever solution you have now and replace it with the magic service from the cloud. It is a perfect, easy, cheap and simple solution. Or ... is it?
What most of the proponents of cloud APIs have in mind is this:
The cloud companies publish an API that makes their services available to the consumers. Consumers do not need to understand the intricacies of how the service is implemented. They just consume the API which is far simpler. So far so good. It is quite easy to do for one service. Or two. But how about eight?
Poor little Alice will need to create (and maintain) a lot of client software. Oh yes, it is still easier than hosting all these services internally. Unless, of course, the internal implementation can be customized to specific needs that Alice has. And unless the internal implementation can expose a more suitable interface. Anyway, the complexity will not magically go away with migration to the cloud. It can even complicate the things much more, especially if somehow each of the cloud services has a different mechanism for security, consistency, redundancy, ...
There is also one deadly trap in the clouds: vendor lock-in. It is not the regular vendor lock-in as it is known today. This is much worse. If you have software and you stop paying your annual support fee, nothing really happens. You still have the right to use the software. The software may break or you may need to change it. But there may be several companies that can do this for you. But the situation is quite different in the cloud. If you stop paying for the cloud the service will stop. Immediately. You have no right to use the service any more. You may own the data, but how do you migrate it to a different service? The APIs are not compatible, data formats are not compatible and processes are not compatible. Actually, a cloud service is inherently difficult to customize, therefore the usual software replacement strategy is almost useless. Once the idea of a cloud sinks in, the service fee may quite easily become a ransom.
To make things even worse current cloud services are not really cloud at all. They are not lightweight, not omnipresent and they cannot really move that well either. They are more like petrocumuli. They are dangerous.
The problem behind all of this is a basic misunderstanding of the purpose of the interface in software systems. Interface, that's the "I" in API, yet too many people designing APIs do not understand the principles well enough. One of the purposes of the interface is to hide the implementation from the clients. That's what the API folks get right. But there is more. Much more. The reason why we want to hide the implementation is that we want to have the freedom to change that implementation. The changes may happen in time, e.g. a new version of the service implementation. But they may also happen in space, e.g. a switch to an alternative service implementation. And it is the latter case that cloud providers seem to ignore. Accidentally? Or is there a purpose?
For an example see how Microsoft reinvents semantic web using its own Graph API. The experience taught me that whatever Microsoft does it does it with a purpose.
So, what is the solution? It is too early to tell. We do not know enough about distributed systems yet. But one thing is almost certain. The use of cloud APIs should be similar to the use of interfaces in any well-designed software systems. When applied to cloud it might look like this:
We know this concept well. It is the concept of a protocol: an agreement between communicating parties that abstracts the actual implementation. And that's it. The cloud APIs should not really look like APIs. They should be protocols.
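In ordinary software terms the idea might be sketched like this. It is only a minimal illustration, assuming a made-up document storage service: `DocumentStore`, `ProviderA`, `ProviderB` and `migrate` are hypothetical names, and in-memory maps stand in for real cloud services.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class CloudProtocolSketch {

    // The agreement between the consumer and any provider (hypothetical).
    interface DocumentStore {
        void put(String name, String content);
        String get(String name);
        Set<String> names();
    }

    // Two independent implementations of the same agreement.
    static class ProviderA implements DocumentStore {
        private final Map<String, String> documents = new HashMap<>();
        @Override public void put(String name, String content) { documents.put(name, content); }
        @Override public String get(String name) { return documents.get(name); }
        @Override public Set<String> names() { return documents.keySet(); }
    }

    static class ProviderB implements DocumentStore {
        private final Map<String, String> documents = new TreeMap<>();
        @Override public void put(String name, String content) { documents.put(name, content); }
        @Override public String get(String name) { return documents.get(name); }
        @Override public Set<String> names() { return documents.keySet(); }
    }

    // A change "in space": because both sides follow the same agreement,
    // moving the data to another provider needs no provider-specific code.
    static void migrate(DocumentStore from, DocumentStore to) {
        for (String name : from.names()) {
            to.put(name, from.get(name));
        }
    }

    public static void main(String[] args) {
        DocumentStore oldProvider = new ProviderA();
        oldProvider.put("invoice-1", "100 EUR");

        DocumentStore newProvider = new ProviderB();
        migrate(oldProvider, newProvider);

        System.out.println(newProvider.get("invoice-1")); // prints 100 EUR
    }
}
```

The consumer code depends only on `DocumentStore`, so switching the provider is a one-line change and the data can follow. That is the freedom that a provider-specific API does not give.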
Wednesday, 16 May 2012
All software is bad and it is not likely to change anytime soon. There is not a substantial difference between open source and commercial software when it comes to product quality. Both are difficult to use, very hard to diagnose and unsuitable for any practical purpose without a good deal of ugly hacking. But there is one little detail that actually makes a huge difference: source code.
I have spent most of today fighting with a code generation plugin that is part of our build. The code gave all kinds of helpful error messages such as "Index out of bounds: -1" and "null". There were no logs and no diagnostics output. The -verbose option was most likely provided just for the sake of completeness and had no practical effect. It was simply a dream of every engineer. A very bad dream.
I have been in such a situation numerous times, mostly with commercial software. It was a nasty experience in the vast majority of cases. Usually I had to spend many hours reading the useless documentation provided with the product and trying to diagnose the problem using any available tool ... just to fail miserably. Then I would file a trouble ticket and play a long ping-pong match with the support team. If I was really lucky, a few weeks later, after many exchanges (and with my nerves almost lost in the process), I might receive a hint of what the solution might look like. But the most likely outcome was that the support team provided no useful information and I needed to create an ugly workaround all by myself. This has happened too many times already.
But today the situation was different. The package that I was using was not commercial software. It was open source. So I downloaded the source code, fought with it for a few minutes and finally had a fresh build of my own. I navigated the labyrinth of ugly uncommented code and dropped a few debug messages here and there. After many attempts and failures I figured out what was wrong. And I solved the problem with only a minimal amount of ugly hacking. In just one day.
A few weeks compared to one day. That looks like a huge difference to me. That's one of many reasons why I have stopped using almost all commercial software. It is just not worth the time. If you don't have buildable and modifiable source code you have nothing. Nothing at all.
May the Source be with you.
Friday, 11 May 2012
I see evidence in favor of this all the time. My colleagues, who work on a variety of projects and with quite a wild assortment of products, also agree that it holds. It looks like this might be a law:
No matter what it is, no matter how big it is, no matter how many people work on it, it always takes at least two years to create a working software product.
Friday, 8 July 2011
I was a very young student when I came across a book named Programátorské poklesky (Programmer's Misdemeanours) by Ivan Kopeček and Jan Kučera. The authors describe in a humorous way what the results of programming errors are. It was probably my very first book about programming that was not a programming language manual. It was a year after our country woke up from the communist era and programming books were difficult to come by. I think the book influenced me more than I anticipated or was willing to admit at the time.
One of the parts that I particularly remember was the software "psychology". Authors observed four temperaments of programs:
- Sanguine programs provide readable and helpful error messages, have useful help texts, try to recover from errors and try to communicate reasonably in general. Yet, user interaction is maybe the only useful part of such programs.
- Choleric programs do their job well. Such programs do not crash, but the error messages are very dense and cryptic. They do not provide any additional information and there is no help text. They do not try to recover from errors - they expect that the user will know what to do. Experts find these programs easy to use, but all other people hate them.
- Melancholic programs get very sad when they encounter the smallest of problems. The program just crashes and does not provide any message or description. Such programs refuse to communicate about the problem any further and usually do not even provide a way to resolve it.
- Phlegmatic programs ignore any errors. They just carry on no matter the cost. No error message, no indication, they just keep working. Of course they may provide wrong results from time to time, but they run. That's the most important thing.
All of that came to my mind as I was discussing the error handling approach in mainstream programming languages (mostly Java). It usually boils down to handling exceptions.
The original approach in Java was to use checked exceptions. The programmer has to either catch them or declare them to be thrown. The authors of Java hoped that it would lead to better error handling. But it looks like there is a glitch: error handling is very difficult to do right. It takes a lot of time and the error handling code may well be a significant part of the system. This leads to sanguine programs: they provide good information about errors, but they do little else. There is just not enough time and resources to do everything right.
Laziness is one of the three great virtues of the programmer. Therefore programmers soon started to focus on the "meat" and simplified the exception handling. The easiest way at hand was to ignore all the exceptions. Catch all exceptions and handle them with an empty code block. This obviously leads to a phlegmatic program. It will run no matter what happens. But the results may not be the best.
The current trend is to switch all the exceptions to runtime exceptions. These do not enforce checking and handling. The usual outcome is that nobody checks or handles them. Any exception will bubble up through the call stack to the upper layers until it is caught by the framework. That may be an application server that will display a nicely formatted error message that essentially says "something somewhere went wrong" and terminate the request. The user has no idea what went wrong and where, or how to recover from the problem. This is a melancholic program.
Luckily, some programmers display at least the exception type and message to the user. But what will the user do if presented with the message "ConsistencyException: Consistency constraint violated"? It is not really helpful. Most programmers also display or log a complete stack trace. But that won't help the user a bit. Even members of the core programming team have problems understanding that, so the user does not stand a chance. That gives us a choleric program.
Obviously, one size does not fit all. There is no single right way to do it. If good error handling is required then a sanguine approach is needed. But there is a cost to pay: either reduced functionality or much more effort to do the "same" thing. A robust system asks for a somewhat phlegmatic approach, while cheap code is best done melancholic. However, the usual approach is choleric code. Errors are reported, but nobody really understands them. You just can't always win.
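The four temperaments can be put side by side in a small Java sketch. This is an illustration only: `parsePort`, the class name and the default port 8080 are made up for the example, and the `catch` block in the melancholic variant merely stands in for a framework-level handler.

```java
public class ErrorTemperaments {

    // Hypothetical operation that fails on bad input:
    // Integer.parseInt throws NumberFormatException (a runtime exception).
    static int parsePort(String text) {
        return Integer.parseInt(text);
    }

    // Phlegmatic: catch everything, ignore it and carry on,
    // possibly with a wrong result.
    static int phlegmatic(String text) {
        try {
            return parsePort(text);
        } catch (Exception e) {
            // empty handler: the error is silently lost
        }
        return 0;
    }

    // Melancholic: nobody handles the exception where it happens; this catch
    // block plays the role of the framework that gives up with a generic message.
    static String melancholic(String text) {
        try {
            return "port " + parsePort(text);
        } catch (RuntimeException e) {
            return "Something somewhere went wrong";
        }
    }

    // Choleric: report the raw exception type and message. Accurate, but cryptic.
    static String choleric(String text) {
        try {
            return "port " + parsePort(text);
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    // Sanguine: explain the problem in the user's terms and recover.
    // The explanation and the recovery are extra code that somebody has to write.
    static String sanguine(String text) {
        try {
            return "port " + parsePort(text);
        } catch (NumberFormatException e) {
            return "'" + text + "' is not a valid port number, using the default port 8080";
        }
    }

    public static void main(String[] args) {
        String badInput = "eighty";
        System.out.println(phlegmatic(badInput));
        System.out.println(melancholic(badInput));
        System.out.println(choleric(badInput));
        System.out.println(sanguine(badInput));
    }
}
```

Even in this tiny case the sanguine variant is the only one that needs to know what the input means to the user, which is where the extra effort goes.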
Friday, 3 June 2011
I'm still quite young and my "professional memory" does not even span two decades. But I just cannot help seeing some recurring patterns. Quite scary patterns.
I was a student when Sun RPC was the cool thing. It had all that a C programmer needed at that time to create a distributed system. But obviously it was too simple.
CORBA was taking the place of the "cool thing" as I was finishing university. It had all that a C++ programmer might wish for to create a distributed system. Interfaces, object-orientation, "interoperable" references, ... But it was obviously too complicated to use.
XML took over during the dot-com bubble. Or better to say it was XML-RPC as a mechanism for Internet-scale distributed systems. It had all that a PHP programmer would want. It had the "feature" of seamlessly passing firewalls. It was the cool thing for the Internet. But obviously, it was too simple.
SOAP came shortly after that. The mechanism by which Java and .NET architectures promised to bridge the enterprise and the Internet. Originally designed as a simple thing to do something with objects, it ended up as a maze of WS-WhatEver specifications that are far from being simple and actually have nothing to do with objects. This is obviously too complex to use.
The RESTful religion is the current trend, with JSON as its holy prophet, worshiped by the scripting crowd. It is based on the idealistic and internally inconsistent principles of Web Architecture, with a loud promise of simplicity. But obviously, this is too simple to be practical.
Now we see JSON schema, namespaces, security and actually all the things that we have already seen in SOAP/WS-* and CORBA. I expect we will see formal RESTful interface definitions soon. Will this be too complex to use, again?
What we see are cycles. Each new generation of engineers is re-inventing what the previous generation invented, making all the mistakes all over again. Can this eventually converge? How long are the customers going to tolerate this? And what do we really know about distributed systems?
Sorry guys. I just refuse to participate in this insanity.
Tuesday, 26 April 2011
Service Oriented Architecture is not a bad idea. Quite the contrary. What is bad about it is its way of implementation and an unbelievable hype.
The usual SOA implementation starts with the purchase of an "infrastructure", which is usually some combination of an ESB and a process manager (usually BPEL-based). The next step is an attempt to connect existing services to the ESB and "orchestrate" them using BPEL. While attempting to connect the very first real service it becomes quite clear that this is much harder than one would expect from product datasheets. Existing services - even though they are based on "web services" - are not quite ready for SOA. They need to be tweaked, schemas need to be modified to conform to the requirements of the ESB, new headers need to be added and so on. This usually results in wrapping existing services with yet another service layer to connect them to "SOA". But the ridiculous part just begins here. Now the services need to be "orchestrated", e.g. the result of one service needs to be passed as an input to another service. But they have incompatible data formats. Now the deployment usually takes one of two courses. The first is an attempt to abuse BPEL to transform data formats. The result is an unreadable, complex and unmaintainable "integration" mess expressed in BPEL. The second usual course is to create a special "conversion" service that just transforms data formats. This results in an explosion of services: for each pair of cooperating services there is a new "adaptation" service. A complex and unmaintainable integration mess again. Even if this task miraculously succeeds, there is another major problem ahead. The original services were neither independent nor idempotent, and they usually have a lot of (undocumented) side effects. Therefore they just cannot be freely reused and recombined into business processes. So the next necessary step is to re-engineer all the services (while maintaining backwards compatibility, of course). This is a very costly and slow process, but until it is done the benefits of SOA are more than questionable.
What is the real benefit of such a "SOA" deployment? I can't think of any. The mess is still there. It may seem that it is just in one place and therefore easier to maintain. But that's just an illusion. There are service wrappers and transformation services that are usually not in one "place". And even if the mess is in one place, it is so difficult to navigate that any real benefit of centralization is lost.
What are the downsides of such a "SOA" deployment? First of all there is the investment to purchase the SOA "infrastructure" and to set it up. Secondly, there is the investment to convert the existing non-SOA integration mess to a SOA integration mess. Thirdly, the "infrastructure" is yet another moving part that can fail. It becomes a business-critical piece and needs additional (substantial) cost to set up high availability and resilience. It also takes a lot of energy and attention that could be invested in a much better way, it disrupts the usual business and it builds a false hope of a better future. And I haven't even mentioned "advanced" problems such as synchronicity and consistency. Clearly and plainly, such a SOA deployment is a waste of time. A waste of a huge amount of time.
How to make it better? Just remember what SOA is: Service-Oriented Architecture. The focus should be on services, not the infrastructure. The better way of SOA deployment is to start from there. Try to assess what services are already there, how well they are defined, whether they can be reused and whether they actually need to be reused at all. Every experienced developer knows that reusability does not come for free. It is actually quite a costly quality. Therefore focus on services that need to be reused and re-engineer them to be reusable. This is best done as part of natural system upgrades and replacements. Try to gradually develop a common data model for your business so there will be fewer requirements to convert data from one service dialect to another. Some services may need more than one "upgrade" to get into a reusable form. This is not a fast process, so the number of services in SOA will grow very slowly. At the beginning of such a SOA initiative the easiest way of "orchestrating" the services is to use any way that you are familiar with, e.g. a simple Java web application. Just make sure you can modify the orchestration code quite easily, as many adjustments will be needed as things slowly settle down. Having an internal employee or a very flexible partner do that job would probably be a good choice. As the number of services will initially be low and the ability to reuse them will be quite limited, such "orchestration" code will be acceptably maintainable even if some things are hardcoded. This may be a good solution for the first few years. Once the number of services grows, it is a good time to think about ESB and BPEL (or alternative technologies that will most likely be available in the future). At that time there will already be a considerable number of services, therefore the cost of the infrastructure can be easily justified.
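Such early, hardcoded "orchestration" might be as plain as the following sketch. Everything in it is an assumption made for illustration: the two services (`HrService`, `DirectoryService`), their data formats and the `onboard` method are hypothetical, and lambdas stand in for real remote calls.

```java
import java.util.Locale;

public class SimpleOrchestration {

    // Two hypothetical existing services with incompatible data formats.
    interface HrService {
        String fetchEmployeeName(String employeeId); // returns "Family, Given"
    }

    interface DirectoryService {
        String createAccount(String givenName, String familyName); // returns the new login
    }

    // Hardcoded orchestration: call one service, convert the data in place
    // and pass the result to the next service. No ESB, no BPEL.
    static String onboard(String employeeId, HrService hr, DirectoryService directory) {
        String fullName = hr.fetchEmployeeName(employeeId);
        String[] parts = fullName.split(",\\s*", 2);
        if (parts.length != 2) {
            throw new IllegalStateException("Unexpected name format from HR: " + fullName);
        }
        return directory.createAccount(parts[1], parts[0]);
    }

    public static void main(String[] args) {
        // Stand-ins for the real services.
        HrService hr = id -> "Smith, Alice";
        DirectoryService directory =
                (given, family) -> (given + "." + family).toLowerCase(Locale.ROOT);

        System.out.println(onboard("42", hr, directory)); // prints alice.smith
    }
}
```

The conversion between the two name formats is hardcoded in `onboard`, which is acceptable while there are only a handful of services and the code is easy to change.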
This service-oriented approach to SOA deployment will be less expensive, less disruptive and will continually bring benefits proportional to investments.
Service-Oriented Architecture is nothing new. It is just an ordinary architecture. The architect works with systems instead of components. The architect works with services instead of component interfaces. Apart from that, it is still just a software architecture. The vast majority of principles and experiences applicable to intra-system architecture are also reusable for extra-system SOA. And that is probably the most important part of Service-Oriented Architecture.
Tuesday, 19 April 2011
I'm frequently exploring and evaluating new products. It is part of my job. The information overload is huge and it is important to know where not to look, how to quickly rule out products that are not worth looking at. I'm also reviewing architectures and designs, consulting, commenting, advising and maintaining architectures. Over the years I've noticed that it is quite easy to roughly evaluate a system by just looking at a few places.
The first place to look is a web page section called Architecture or System Overview. If there is no such page or document, you can be pretty sure that the system is using the popular and well-established big ball of mud architectural pattern. Looking anywhere else is just a waste of time. The only option you have is to try the system for yourself or look at the sources. But that's very time-consuming and usually the result is not worth the effort. Scratch such a system.
If you have found the architectural description, it is usually not worth starting to read it yet. Just skip ahead and look at the first figure in the document. It usually looks like the "diagram" below. What does this creative depiction tell us about the system? Well, it is a three-layer system. Or at least that's what the architect meant. There is a bunch of stuff in the middle layer, but the picture does not provide any information about the structure inside. Dependencies are not shown, so system maintainability is unknown. Interfaces are not even hinted at, so there is probably no interface at all or there is a jungle of competing and redundant interfaces. The system structure may not be set yet. Or maybe the architect does not understand the system well enough to draw a better picture. Or maybe the team is afraid to show the structure to the public. None of that is a good sign. The argument that "we don't want to clutter the picture with too many details" is usually just an excuse. Reading through the text is mostly a waste of time as it will most likely be just marketing-oriented nonsense (MON). If there is no better figure anywhere nearby, the best strategy is to ... run away.
If there is a picture similar to the following one, you are almost there. The system is a good candidate for further exploration. Such a picture gives a reasonable level of detail and indicates that the architect has quite a clear idea about the system structure. It is not just the form itself, it is the content of the diagram that you should focus on. Look at the figure below. You can see that there is a Repository component. The two data stores below it indicate that the component is supposed to act as an abstraction for various data storage mechanisms. Although it is not shown in the picture, you can pretty much expect an interface on top of the Repository component (and other components as well). The same goes for the Integration component. You can also see that a similar approach is used for two subsystems (Repository and Integration) unified by a common Model component. The user interface is placed on top of Model, which clearly isolates it from the low-level details. The most important dependencies are shown in the diagram, which provides some hints about the maintainability of such an architecture. You can get quite a good idea how such a system works by just looking at the diagram.
Now it is the right time to read through the text and other documents. Look especially for an explanation of motives, not just the structure. For example, look for an explanation of the reasons why the Repository and Integration components are not unified into a single component. That gives confidence that the team actually thought over several alternatives before committing to the current architecture. Look for links to papers, books and other sources. That gives a hint that the team spent some time "in the library" instead of trying to save time by blindly re-inventing things in the laboratory. Especially look for buzzwords. If you find a buzzword used without a deeper explanation of motives, it is a serious warning sign that the architecture may be buzzword-driven and therefore not sound. Look also for hints that the architecture was done all the way down. That means that the architect thought about deployment and usage of the system. The presence of a deployment diagram or a typical usage scenario is a good sign that this happened.
It is a curious thing how much can be learned from a single picture. The picture is really worth a thousand words.
Wednesday, 13 April 2011
This diagram can be found in almost any marketing document dealing with integration problems. It illustrates a concept that is known as hub and spoke. The difficult O(n²) problem of full mesh is reduced to a simpler O(n) problem by introducing a central hub. That's what the marketing guys say (although they don't usually use the O() notation).
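The arithmetic behind that claim can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and it counts connections, not the effort needed to make each connection work.

```python
def full_mesh_links(n: int) -> int:
    """Point-to-point links needed when each of n systems talks to every other one."""
    return n * (n - 1) // 2


def hub_and_spoke_links(n: int) -> int:
    """Links needed when each of n systems talks only to a central hub."""
    return n


if __name__ == "__main__":
    for n in (3, 10, 50):
        print(f"{n} systems: full mesh = {full_mesh_links(n)}, "
              f"hub and spoke = {hub_and_spoke_links(n)}")
```

For 10 systems that is 45 links against 10, which is the whole of the marketing argument. The count says nothing about what the hub actually does with the messages.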
What the marketing guys don't say is that this approach usually does not work. Most hubs are just simple message routers. This does not simplify anything. The hub just passes the message from sender to receiver. Oh yes, the hub may use abstract addresses instead of concrete ones, but that's yet another indirection that can easily be done without a hub. Oh yes, the hub may do some basic protocol adaptation, but a well-chosen data representation library will do essentially the same thing. That won't justify the cost of the hub. And what is the real cost? Apart from a pretty big pile of money to pay, there is an impact on systemic qualities. The hub is inherently a single point of failure. The hub introduces latencies. The hub can cause additional problems as it is yet another moving part in the system. And the communication with the hub makes the code more difficult to understand (just have a look at JMS).
This is a typical anti-pattern for many SOA-motivated deployments. Enterprise Service Bus (ESB) is usually the first component that gets deployed in a SOA initiative. But it initially brings no substantial benefit, as there are no services to put on the bus. And if there are services, they are not independent and definitely not idempotent. Such services just cannot be efficiently reused in any other way than they were before the hub came into play. In fact, the ESB should be among the last components that get deployed in a SOA initiative, not the first. SOA is about the services, not about the bus.
Yet, there are a few cases when the hub-and-spoke approach works:
- Asynchronous system: The hub is used to break the timing dependencies of the communicating systems. As in well-designed systems based on Message-Oriented Middleware, the sending system does not need to wait for the receiver to process the message. But such a system needs to be designed for asynchronous operation, which is usually much more difficult to do than a simple synchronous system. Also, the hub needs to be up all the time.
- Visibility: The hub is used to audit messages passing among the systems. But the hub needs to understand the protocol quite deeply to be of any real use.
- Common data model: Used to merge data from disparate data sources, converting each of them to and from a common format. The hub forms a "common denominator" communication interface. But, the interface needs to be designed very carefully, as it needs to satisfy the requirements of many communicating parties. Maintenance of such an interface may be in itself a much more difficult problem than the original full mesh thing.
Yet, most of these are very difficult to do right. And if done wrong, they may bring much more pain than benefits. Handle with care. The hub is a dragon egg and the spokes are venomous snakes.
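To make the common data model case a bit more concrete, here is a minimal Python sketch. Everything in it is hypothetical: the `CommonPerson` format, the "HR" and "CRM" record layouts and the adapter names are invented for illustration only. The point is that each system needs only a pair of adapters to and from the common format instead of one converter per peer system.

```python
from dataclasses import dataclass


@dataclass
class CommonPerson:
    """Hypothetical common format shared by all systems connected to the hub."""
    given_name: str
    family_name: str
    email: str


def hr_to_common(record: dict) -> CommonPerson:
    """Adapter from an invented HR record layout to the common format."""
    given, _, family = record["fullName"].partition(" ")
    return CommonPerson(given, family, record["mail"].lower())


def common_to_crm(person: CommonPerson) -> dict:
    """Adapter from the common format to an invented CRM record layout."""
    return {
        "first": person.given_name,
        "last": person.family_name,
        "contact_email": person.email,
    }


if __name__ == "__main__":
    hr_record = {"fullName": "Jane Doe", "mail": "Jane@Example.com"}
    print(common_to_crm(hr_to_common(hr_record)))
```

Even this toy shows where the pain comes from. The naive name split already assumes something about names that not every connected system will agree with, and every such decision in the common format has to suit all parties at once.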
Wednesday, 12 January 2011
The more I learn about distributed systems, the more I start to think that nobody has any idea how to make them well. It is unbelievable how little we know about distributed systems design and development. And how many failures do we still need to suffer to learn at least something?
As far back as I remember, there was Sun RPC, CORBA, DCOM, RMI, ... And now there are Web Services and REST. This is at least 20 years of development, yet all of these approaches seem to manifest the same fallacy: they try to hide the network from the developer. I will use Java JAX-WS as an example, but this is in no way specific to Java. JAX-WS provides a runtime for a web service and generates a web service client. Both are designed to use local Java calls, effectively hiding the network boundary. It goes to great lengths to make the programmer feel comfortable. For example, it hides the network exceptions from the programmer (by making them runtime exceptions). This may seem like a good approach, but it is a bad thing in principle.
Most people would be surprised how applicable the theory of relativity is to the design of software systems. And most developers would be really surprised how slow light is. Light travels only approx. 300 km in one millisecond. If the system needs a response in under 1 millisecond, it just cannot communicate with anything that is more than 150 km away, because the signal has to travel there and back. Add a TCP three-way handshake and you are pretty much down to the size of a city. Assuming Einstein was right, there is nothing, absolutely nothing, that can be done about this. No amount of technology or money can speed it up.
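The numbers above are easy to check. The following Python sketch uses the rounded figure of 300 km per millisecond from the text; the function name and the way the handshake is counted (one extra round trip before the request can be sent) are simplifying assumptions of this example.

```python
# Rounded figure used in the text: light covers roughly 300 km per millisecond.
LIGHT_KM_PER_MS = 300.0


def max_distance_km(budget_ms: float, round_trips: int = 1) -> float:
    """Upper bound on the distance to a peer if `round_trips` full
    request/response exchanges must fit into `budget_ms` milliseconds."""
    if budget_ms <= 0 or round_trips < 1:
        raise ValueError("budget must be positive and at least one round trip is needed")
    one_way_trips = 2 * round_trips  # every round trip is there and back
    return LIGHT_KM_PER_MS * budget_ms / one_way_trips


if __name__ == "__main__":
    print(max_distance_km(1.0))                 # a single request/response
    print(max_distance_km(1.0, round_trips=2))  # handshake first, then request/response
```

With a 1 ms budget this gives 150 km for a single exchange and 75 km once a handshake round trip is added. These are optimistic upper bounds: signals in real cables travel slower than light in vacuum, and routers and the endpoints themselves need time too.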
One millisecond is an unbelievably long time for a computer system. Even a cheap computer can execute more than a million instructions in 1 millisecond. Local memory access is a bit slower, but even with that slow-down a local call is incomparably faster than a call over a fast network. Faster by several orders of magnitude. How could anyone hope to hide such a difference?
Reliability is another problem. While a local call cannot (reasonably) fail, a network call can and often does. Hiding the errors from the engineer does a disservice. It usually means that network problems are ignored or handled at the top level of an application. It means that any serious error in network communication leads to a sudden death of the application. Applications that are written a little bit better still manifest serious usability problems in the presence of network failures. I can see that well enough on my iPhone.
The network is a significant boundary. A boundary that just cannot be swept under the carpet of a software development framework. Anyone trying to do that will most likely fail. Fail miserably.
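What the opposite of hiding might look like can be sketched in Python. This is only one possible shape, not a recommendation of a particular library: `RemoteResult`, `call_remote` and the injectable `fetch` parameter are names invented for this example. The only real API used is `urllib.request.urlopen` with its `timeout` argument. The idea is that the call site gets a result that can explicitly be a failure, so the network error has to be dealt with where the call is made.

```python
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RemoteResult:
    """Outcome of a remote call: either a value or an error description."""
    value: Optional[bytes] = None
    error: Optional[str] = None

    @property
    def ok(self) -> bool:
        return self.error is None


def http_get(url: str, timeout_s: float) -> bytes:
    """Plain HTTP GET with an explicit timeout."""
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return response.read()


def call_remote(url: str, timeout_s: float = 2.0, retries: int = 2,
                fetch: Callable[[str, float], bytes] = http_get) -> RemoteResult:
    """Call a remote service and report network failures to the caller.

    Retrying is only safe if the remote operation is idempotent.
    """
    last_error = "no attempt made"
    for attempt in range(1 + retries):
        try:
            return RemoteResult(value=fetch(url, timeout_s))
        except OSError as exc:
            # URLError, timeouts and connection errors are all OSError subclasses.
            last_error = f"attempt {attempt + 1} failed: {exc!r}"
    return RemoteResult(error=last_error)
```

A caller then has to write something like `result = call_remote(url)` followed by a check of `result.ok`, and decide on the spot what the application should do when the network is not there.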
Friday, 24 September 2010
During last few days Facebook went down and up and down and up again. It was not nice, but it was also not unexpected.
It was not nice because I suddenly did not have a place where I could post my usual sarcastic remark (Twitter and all the rest just don't matter any more). That was painful.
It was not unexpected because Facebook is still just a technology (even though it sometimes looks like magic). Technology fails.
Facebook changed the communication paradigm. Apart from private(*) point-to-point communication there are options for semi-private "multicast" status updates, groups, events, etc. I believe that the communication paradigm that Facebook popularized is a good one. However, the Facebook failure disrupted the communication of too many people at the same time. That failure is a good demonstration that the Facebook architecture is wrong. Totally wrong. Because it is centralized, both from a technological and from an organizational point of view.
Centralization means that one party has control over everything. They can see everything, change anything, even rewrite history (that actually almost happened with terms of service and privacy settings). Think nineteen eighty-four.
Centralization means single point of failure. Good engineers know that this should be avoided, especially in Internet scale systems.
Centralization has an economic impact as well. Who will pay for all these boxes (and electricity, cooling, space, staff, ...) that are needed to handle the communication of more than a billion people?
The future of Facebook in its current form is more than questionable. Yet very few people can see it. I would not recommend buying Facebook shares.
*) If a communication that must be shared with a strange multi-national corporation can be considered private.
Monday, 12 July 2010
Architecture and design of software systems is quite an adventure. There are very few hard constraints in software and even fewer in software architecture. Almost anything can be designed. And the vast majority of the designs will look good and feasible. Even if quite an intensive review process is applied. It is extremely difficult to find the mistakes in software architecture just by talking about it. As a consequence I dare to speculate that all non-trivial software architectures contain at least one error.
Software architecture needs to be put into conflict with reality as soon as possible. Only reality can uncover the problems. The architecture needs to be quickly applied to design. Key concepts should be designed down to the details early in the project. The design needs to rapidly lead to implementation of prototypes. Prototypes need to be immediately tested. Problems need to be addressed as soon as possible. Solutions to the problems of prototypes will feed back into the design. Changes in the design will influence the architecture. Changes in the architecture will need new prototypes ... and we have a loop here. This loop had better be convergent and finite. All architects need this kind of loop for the architecture to be of any practical use. The difference between a good and a bad architect is the speed of convergence. Bad architects will need many iterations and most of them will happen during the project implementation phase. Changing the architecture during implementation is really expensive. Good architects will settle the architecture in a small number of iterations and will have pretty stable basic concepts before the full-scale implementation starts. A few adjustments to the architecture during implementation are always necessary, but these should not fundamentally change the basic idea. Such projects can usually be delivered with reasonable costs.
Architecture that is not validated by implementing parts of it is just a theoretical exercise. It may be a good first step, but it definitely cannot be presented as a final, practical result. Untested architecture may be good for experiments and research, but it is almost worthless from an engineering point of view.
This principle applies to standardization even more strongly than to software architecture. Standards influence a lot of engineers. Standards can make entire families of technologies either succeed or fail. Good standards are based on working software. Only working software can provide assurance that a standard does not have any major flaws. IETF standards are based on working software. That's the approach that contributed to the success of the Internet as a whole. But too many of the standardization bodies do not follow this practice. Some of us can well remember the infamous example of CORBA, but it looks like most people have already forgotten. The WS-* stack seems to be heading in the same direction. And there is one particular example that I would like to mention: Service Provisioning Markup Language (SPML). SPML defines a (web) service specified using XML schema (XSD). However, the XML schema for the current version of the SPML standard does not even pass validation. It violates the Unique Particle Attribution (UPA) rule. Therefore the standard SPML schema is unusable for many implementations. E.g. Java JAXB cannot process it and therefore it cannot be a JAX-WS service. I have seen that people who use it are modifying the schema to make it usable - but then, what's the point of a "standard" there?
There is very little space for innovation in the standardization process. Almost none. Innovation should happen in engineering and experimental projects, and only the working results of such projects should be standardized. However, design by committee is a well-known and widely used anti-pattern. Avoid using standards that are not based on working software. And especially avoid creating such standards.