Radovan Semančík's Weblog

Friday, 29 January 2010

Quite an interesting scam appeared on Facebook. It was just a matter of time before something like that popped up, yet I was quite surprised when I actually saw it. The scam works like this: there is a simple HTML page that promises to provide nude photos in a zip file if you click on the button. However, if you click on the button you will see no butts and tits. A link to the tricky page will be posted to your Facebook profile instead. If you want to try it, go to http://homeslices.org/f2.html (if the page is still around). But you have been warned.

The trick is simple. The page creates an iframe containing a pretty standard Facebook form to share a link. However, the frame is almost invisible, therefore you cannot see it. But the browser still thinks you can see it and keeps processing it. The tricky page has a "View" button in the same location as the "Share" button on the invisible Facebook page. You think you are clicking on the "View" button but instead you are clicking on the "Share" button on Facebook. The iframe is fetched by your browser, therefore it is your identity that is used on Facebook to post the link.

This page is pretty innocent. All it does is a bit of humiliation for the victims, amusement for experts and undoubtedly a lot of fun for the author. But imagine that this very same method is used to subvert your Internet banking. I guess that the method could be adapted to subvert many current Internet banking applications. It won't be that funny any more.

This is the price we pay for flexible presentation formats. There are two basic principles of the trick:

  1. Mix the content from two sites in one window. Content from Facebook is displayed in a page where you do not expect it, in a wrong context, with a wrong URL in the URL bar.
  2. Create an ambiguous display of information. The browser thinks you can see the "Share" button. It has 1% opacity, therefore it is still somehow opaque and ergo visible. Therefore the browser thinks that if you click on the place where the "Share" button is, you want to submit information to Facebook. But in fact you do not see the "Share" button, because it has only 1% opacity and therefore is almost invisible. You are clicking on that area because you see the "View" button that is behind it.

The first problem is a specific problem of HTML. It could be fixed quite easily, if there were enough "political will" to do it. But the second problem is the real problem. How opaque should something be to be considered opaque enough? Should 1% grey text on a white background be considered visible? Or can a 2pt font be considered readable?
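The first problem, at least, can be approached from the server side. As a minimal sketch, assuming a Python WSGI application (the names `no_framing` and `share_form` are hypothetical and have nothing to do with the actual Facebook code), a site can ask browsers not to render its pages inside somebody else's frame by sending the `X-Frame-Options` and `Content-Security-Policy: frame-ancestors` response headers:

```python
def no_framing(app):
    """Wrap a WSGI app so that every response asks the browser not to frame it."""
    def wrapped(environ, start_response):
        def start_with_headers(status, headers, exc_info=None):
            # Append the anti-framing headers to whatever the app already set.
            headers = list(headers) + [
                ("X-Frame-Options", "DENY"),
                ("Content-Security-Policy", "frame-ancestors 'none'"),
            ]
            return start_response(status, headers, exc_info)
        return app(environ, start_with_headers)
    return wrapped


def share_form(environ, start_response):
    # Hypothetical stand-in for a "share a link" form page.
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<form>...</form>"]


protected_share_form = no_framing(share_form)
```

This only helps with browsers that honor these headers, and it does nothing about the second problem: the question of what counts as "visible" remains open.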

Probably the most serious implication of this problem is a bit independent of the Web. Presentation formats are very dangerous when used in a legally binding way, for example if you sign a document with a digital signature. If you sign a contract and it contains a paragraph written in light grey text on a white background, should such a text be considered part of the contract or not? Some devices may display that text as perfectly readable, while on other devices it cannot be seen. This opens a huge door to scams of all sizes.

This problem applies universally to any data format that includes rich presentation features: HTML, Microsoft Word documents, RTF, OpenDocument and many more. But maybe the worst aspect of all of this is that our government as well as many other governments in Europe explicitly allows such data formats for legally binding documents signed by "guaranteed digital signature". I'm really lucky that I have no qualified certificate to create such a signature.

Posted by rsemancik at 7:05 PM in security
Tuesday, 26 January 2010

SOAP is one of the prominent protocols for remote procedure invocation (RPC). It can do more than that, but it is used almost exclusively for RPC. More specifically, it is used for RPC across the Web, both internally and externally. It is used on the Web so frequently that most people working with SOAP do not even realize that it can be used without HTTP and in a non-RPC way.

SOAP by itself is quite a simple XML-based message format. However, it is accompanied by an army of profiles, recommendations and especially a set of so-called WS-* specifications. That creates a "SOAP stack" that is quite complex. This labyrinth of specifications is an attempt to address the qualities of SOAP such as security, reliability, addressing, distribution of policies, etc. It makes SOAP quite a flexible mechanism. But ...

Flexibility does not come for free. Until very recently the price was paid by suffering all kinds of interoperability problems. It was so severe that a special organization was established to improve interoperability. Now basic SOAP implementations work together acceptably well, but the situation is not that good for various WS-* extensions. It will take a lot of effort to make implementations fully interoperable. But that is not a problem of SOAP itself. It is an inherent cost of complexity and distribution.

SOAP is not the first "fabric" of distributed systems. There was CORBA before SOAP and Sun-RPC before CORBA, to name just a few of many existing mechanisms. However, the designers of SOAP failed to learn from the past. The intent of SOAP was to simplify things. But the SOAP stack is now almost as complex as CORBA was ten years ago. SOAP is XML-based and with HTTP it can pass easily through firewalls (that are broken anyway). But that's almost the only advantage over CORBA. And now about the drawbacks ...

The most serious failure of SOAP design is the lack of support for object orientation. SOAP is not about invoking methods on objects, it is about invoking operations of static services. Objects cannot be arguments in SOAP messages, cannot be returned from operations and there is no support for object references. All of that was a fundamental part of CORBA, yet there is no concept of objects in SOAP. In fact it is an odd joke to call it Simple Object Access Protocol - as it is definitely not about objects and either not simple or not a protocol (depends on your point of view).
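To illustrate what "invoking operations of static services" means, here is a rough sketch of an RPC-style SOAP request built with Python's standard library. The envelope namespace is the SOAP 1.1 one; the service namespace, the operation name `GetQuote` and the argument `symbol` are made up for this example. Note that the message carries an operation name and plain data values. There is nothing in it that would refer to an object instance.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SVC_NS = "http://example.com/stock"                    # hypothetical service namespace


def build_request(operation, **arguments):
    """Build a SOAP envelope that invokes one operation with plain-value arguments."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")  # WS-* extensions would go here
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SVC_NS}}}{operation}")
    for name, value in arguments.items():
        # Arguments are serialized as data; there are no object references.
        ET.SubElement(call, f"{{{SVC_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


print(build_request("GetQuote", symbol="ACME"))
```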

SOAP is also not outright compatible with the World Wide Web architecture. The Web is based on the REST style, which defines a few basic operations that should be common to all services. SOAP services can use arbitrary operations without any link to the operations of REST. The REST architecture also naturally assumes object orientation - web resources are (almost) objects. SOAP does not deal with objects at all. Therefore the applicability of SOAP at Internet scale is quite a controversial topic.

SOAP is good in the enterprise and in quite closed environments where interoperability can be assured by testing. SOAP with WSDL has quite a strong interface definition mechanism. It is a rare trait for a technology born on the Internet and it is a necessary condition for composing complex systems. SOAP is also almost the only option for integration, as CORBA is dead and asynchronous mechanisms are seen as too complex and unnecessary by buzzword-driven integrators. If we are lucky enough, SOAP may eventually get to the state where CORBA was a decade ago ... I can almost hear the melody ... just little bits of history repeating.

Posted by rsemancik at 9:34 AM in Software
Monday, 25 January 2010

RESTful web services are seen by many (especially young) developers with almost religious awe. Such services are built using the standard HTTP protocol with the usual HTTP methods as operations. RESTful web services have no arguments, they GET, PUT, POST and DELETE resource representations. The resources are identified by URLs that are also used for links among resources. Such an approach requires a fundamental change of mindset when compared to the more traditional RPC style of building services. But that is not really a problem: most simple services can be modeled acceptably well using the RESTful approach. The problem is not in the functional aspect.

The problem is, as usual, in the tricky non-functional aspect. Web services are a mechanism for communication between computers, but the Web was designed for human-to-computer interactions. Many issues appear out of the blue if the Web is used for something that it was not really designed for. Let's have a look at the security aspect of RESTful web services as an example.

It is difficult to authenticate the invoker of the service to the provider. There are two authentication mechanisms for HTTP (basic and digest), but these are designed for interactive human-to-computer authentication. HTTPS in mutual authentication mode provides another solution. This can be non-interactive, but it is quite hardcoded to X.509. Under normal circumstances it can authenticate two sites to each other. What a service would need is to authenticate a user to the site. If you want to authenticate a user on the client side to the server, you can still do that with a somewhat atypical use of X.509. In that case each client site must be a certificate authority. However, as certificate constraints are not well supported, root certificate authorities are not likely to issue to clients certificates that allow creating subordinate certificate authorities.
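To see why the basic mechanism is a poor fit for computer-to-computer use, consider what it actually carries. The following is a small sketch (the function name `basic_auth_header` is just an illustrative helper) of how the `Authorization` header is formed for HTTP basic authentication: a username and a password joined by a colon and base64-encoded, which is exactly what a human would type into a browser dialog.

```python
import base64


def basic_auth_header(username, password):
    """Return the value of the Authorization header for HTTP basic authentication."""
    # The credentials are only encoded, not encrypted: anyone who sees the
    # header can decode the password.
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


headers = {"Authorization": basic_auth_header("Aladdin", "open sesame")}
```

There is no room in this scheme for anything other than a name and a shared secret, so tokens, delegation or federation have to be bolted on in some other way.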

But even if HTTPS/SSL/X.509 can be fixed, it will most likely not solve the problem. I doubt that X.509 can be flexible enough to support the broad variety of security schemes that an Internet-wide technology requires. And the flexibility comes with a cost: interoperability. The people working with enterprise PKI know how difficult it is to achieve interoperability of different X.509 implementations, and that is miles away from Internet scale. There has been only a slight improvement in the two decades of X.509's existence, therefore there is little hope that X.509 will be the right solution for the Internet.

There are (relatively) new security mechanisms out there, but these apply more to the RPC-style web services. WS-Security and SAML are good examples. WS-Security specifies a SOAP request header that contains security credentials. SAML specifies protocols and security tokens applicable to various scenarios, including Internet-scale single sign-on and federation. However, it is difficult to use SAML with RESTful web services. SAML tokens are usually many lines of XML code. In SOAP there is a place for the token in the message header, but there is no such place in HTTP. I don't think that placing a few kilobytes of XML data in a custom HTTP request header is an ideal solution. If that worked at all, it would be a non-standard hack. And there is no other place in an HTTP GET request for such data. There is a way to shorten a SAML token into a few bytes of SAML artifact. But artifact resolution requires an additional round trip. In fact several round trips, as a new TCP connection (and most likely also an SSL handshake) is usually required. It also requires an active client that is able to listen for connections, and maintenance of state on the client side. There is also the question of how to pass the artifact to the server. The usual way of putting it in the query string is a violation of REST principles, therefore the result will likely be a non-standard solution or a broken architecture.

The situation is quite similar for many other non-functional aspects. It is difficult to guarantee consistency, atomicity and coordination of RESTful web services (e.g. to make them part of a transaction). As URLs are both service endpoints and object identifiers, it is difficult to move a service without breaking compatibility. There is no practical interface definition language and there are no interoperability guidelines. Each definition of a RESTful service is a free-form text for humans to read and implement, with very limited possibility for code generation ...

I'm not trying to say that all that is RESTful is useless. Both REST and RESTful web services can be very useful, especially for services that shoot for Internet scale. RESTful web services undoubtedly have many advantages but also many limitations. Standard RESTful web services are not yet ready for anything but very simple public services - for those a RESTful solution could be ideal. However, the RESTful approach fails if service quality is important. Custom non-standard solutions can help a bit, but these have their own dangers, especially if the goal is to create interoperable Internet-scale services.

Engineering is not religion and technologies should be assessed with a sceptical eye. An engineer who designs anything RESTful should be well aware of the limitations of REST and the Web instead of blindly following the hype.

Posted by rsemancik at 12:43 AM in Software
Monday, 18 January 2010

The world is not an objective place. There seems to be no single point of view, no absolute truth. There is only a little piece of information that could be regarded as reliable - information that is well summarized by the famous cogito, ergo sum. All the rest is, more or less, speculation.

Consider some quite distant land, for example Antarctica. Have you been there? Have you observed it personally? Most probably you haven't. All you know about Antarctica is second-hand information. They say that strange birds that cannot fly live in Antarctica. Penguins, that's what they are called. Would you believe that? Yes, you probably would. Have you heard that the Yeti recently moved to Antarctica? Would you believe that? You probably wouldn't. Both "here be penguins" and "here be Yeti" are information. These are not facts, but mere information. It is the belief that makes them into facts.

But even things and phenomena that you personally observe cannot be automatically regarded as true. Think of David Copperfield and the Statue of Liberty. People have witnessed how the statue disappeared. Yet, if you were one of them, would you believe that a huge steel-and-copper statue really ceased to exist for those few moments? Probably not. How many times have you seen pretty ladies sawn in half, disassembled into pieces and reconnected again, or levitating freely in the air? What we see may not be what it appears to be. I'm sure you will be amazed by this excellent performance by my favorite duo Penn and Teller.

Think about your date of birth. Do you think that the date of your birth is an unquestionable fact? Not really. You were there when you were born, but you probably do not remember it. And you were quite incapable of checking the date for yourself at that time. Therefore your date of birth is just information. It comes from quite a trusted source but, strictly speaking, it is not unquestionable.

Any information must be regarded with an appropriate level of confidence. You will probably not really doubt your date of birth, therefore the level of confidence is very high. You believe in that information. But you will probably doubt that the Yeti lives in Antarctica (everybody knows that the Yeti lives in the Himalayas). Therefore the level of confidence for that information is low. You do not believe in that. However, you may slowly increase the level of confidence as more and more expeditions report encounters with the Yeti in Antarctica. As it goes beyond a certain threshold you may start believing that. And once the popular press brings convincing evidence that what was considered to be the Yeti was in fact a mutated giant rat from Mars, transported to Antarctica for the sole amusement of penguins by the four-headed hyper-intelligent lizards of Sirius IV, you may quite stop believing in the Yeti.

Seems pretty obvious, doesn't it? Now how is it related to software?

Software is all about information. However, the overwhelming majority of software systems have no ability to be "somehow inclined to believe in Yeti" or to "quite doubt that Yeti has moved recently". Most software systems have only one level of confidence: fact. That was not a problem when the information systems were small and disconnected. A user working with a specific system was somehow aware of the reliability of information coming from that system. The user either knew how the system worked or slowly learned how reliable the information was by confronting it with reality. The user, as a thinking human being, corrects the inability of the computer system to deal with uncertainty.

But such a simple approach will fail in the case of a global, distributed, hyper-connected information super-highway such as the Web or the Semantic Web. Users don't know how the displayed information was acquired and processed, and usually have no time to spend a few years confronting the information with reality. Users of the Web have no way to assess the reliability of the information they see. The simple binary model of true-false will not work in this environment. Any system using such a binary model that includes computer-to-computer communication on a large scale is doomed to failure.
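As a toy illustration of the alternative, here is a sketch of a piece of information that carries a level of confidence instead of a true-false flag. The class name `Claim`, the update rule and the threshold of 0.8 are all assumptions made up for this example, not an established model of belief; the only point is that confidence moves gradually with each report and belief is a matter of crossing a threshold.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    statement: str
    confidence: float  # 0.0 = certainly false, 1.0 = certainly true

    def support(self, source_trust):
        """A source trusted to the given degree (0..1) confirms the claim."""
        # Move part of the remaining distance towards certainty.
        self.confidence += (1.0 - self.confidence) * source_trust

    def contradict(self, source_trust):
        """A source trusted to the given degree (0..1) disputes the claim."""
        # Give up a proportional part of the current confidence.
        self.confidence -= self.confidence * source_trust

    def believed(self, threshold=0.8):
        return self.confidence >= threshold


yeti = Claim("Yeti lives in Antarctica", confidence=0.05)
for _ in range(5):
    yeti.support(0.3)      # five expeditions report an encounter
print(yeti.believed())
yeti.contradict(0.9)       # the popular press explains it was a rat from Mars
print(yeti.believed())
```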

I quite believe that the future is not really bright for Internet-scale web services and semantic web. Unless they can learn how to doubt.

Posted by rsemancik at 1:29 PM in Software
Wednesday, 13 January 2010

I was surprised to find out that not many people can create a good abstraction. Many people are good at thinking about concrete objects and problems, but only a few of them can think abstractly. We in the software industry are forced to think abstractly from the very beginning, as software itself is somehow abstract. However, when it comes to creating higher-level software abstractions, people often fail.

Interfaces are probably the most significant abstractions in software. Interfaces are formed from programming language constructs, network protocol messages, states and sequences, signals, file formats, XML tags and many other elements. Interfaces provide a basic mechanism that an architect can use to exercise control over the system. Interfaces are a powerful tool to contain change, to enhance reusability, and to make the system more understandable and manageable. Yet too many interface definitions are weak, imprecise, incomplete or outright misleading.

During the course of several years I found myself gradually compiling a list of items that need to be included in a good interface definition. Recently I have found the time to put it into a document, add some explanation and examples. The result is here:

Interface Definition, Guidelines and Recommendations

I have decided to publish it under the Creative Commons Attribution license (CC-BY), so you can freely use it in your project as long as you give me proper credit. I recommend that you copy and paste parts of the document to create guidelines suitable for your project. I hope that this helps many people to improve the skill of creating abstractions.

Posted by rsemancik at 3:57 PM in Software
Friday, 8 January 2010

The Web and especially OpenID have yet to learn an important lesson: nothing is permanent. Will Norris mentions it in his post. To make his long story short, the problem is that OpenID relies on DNS, and DNS names can be reassigned. With a change of control of a DNS name, the control of the associated OpenID identifier changes as well. Therefore a user may be required to pay for a domain that he does not want any longer, just to avoid losing control over the OpenID identifier. The root of the problem is that DNS is not really an identification mechanism, but rather an addressing mechanism. The OpenID design does not account for that.

The purpose of an address is to locate an object, therefore it contains information about the object's location - directly or indirectly. An address must change if the location of the object changes. DNS uses a level of indirection to reduce the number of changes needed if the object location changes, but it does not reduce them to zero. You may be forced to pay for a domain forever if you want to make a DNS name a permanent identifier - assuming you can do that at all. For example, the rules for the sk top-level domain will force you to yield your domain in case someone registers a trademark that is the same as your existing domain name. Therefore making a DNS name persistent may be quite costly. A DNS domain is an address. Get over it.

The purpose of an identifier is to distinguish the object from other similar objects. Well-designed identifiers do not need to change. The identifier may identify an object that does not exist any longer, but it should never identify a different object. Think of ASN.1 OIDs, ISBNs or similar identifiers. For identifiers to be efficient, their assignment should be very cheap and their maintenance must be extremely cheap or entirely free.

It is not wrong per se to use an address in your system. But it is a mistake to use an address and assume that it has the properties of an identifier. It is a failure to assume that an address will not change - almost as serious a mistake as the assumption that an identifier can always be resolved.
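The distinction can be sketched in a few lines of code. This is only an illustration under my own assumptions: the `Registry` class and the example URLs are hypothetical and have nothing to do with how OpenID or DNS actually work. The identifier is assigned once, costs nothing to keep and is never reused; the address is a separate, changeable piece of data, and resolution is allowed to fail.

```python
import uuid


class Registry:
    """Maps permanent identifiers to the current address of an object."""

    def __init__(self):
        self._addresses = {}

    def register(self, address):
        # The identifier carries no location information, so it never has to change.
        identifier = uuid.uuid4().hex
        self._addresses[identifier] = address
        return identifier

    def move(self, identifier, new_address):
        if identifier not in self._addresses:
            raise KeyError(identifier)
        self._addresses[identifier] = new_address  # only the address changes

    def retire(self, identifier):
        # The identifier stays reserved; it just no longer resolves to anything.
        self._addresses[identifier] = None

    def resolve(self, identifier):
        # May return None: an identifier cannot always be resolved.
        return self._addresses.get(identifier)


registry = Registry()
alice = registry.register("http://old.example.com/alice")
registry.move(alice, "http://new.example.org/alice")
```

If somebody else later takes over the old address, they get a fresh identifier from `register`, so nothing that refers to `alice` can be silently redirected to a different object.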

Posted by rsemancik at 6:42 PM in Identity