
Radovan Semančík's Weblog

Tuesday, 6 April 2010

GET, PUT, DELETE and POST are the four basic operations of HTTP. Many people think these are operations of REST, but Roy Fielding does not mention them in the dissertation where the REST architectural style is defined. The only constraint of REST regarding operations is the uniform interface constraint (which is problematic in itself). Nevertheless, these four operations are the de facto uniform interface of the Web. Because this interface lacks a satisfactory definition, it is frequently misused. And no wonder ...

Today I want to talk about the PUT operation. The current definition of PUT makes it almost unusable. PUT is used for both resource creation and resource modification, and both use cases are problematic.

The problem with resource creation is that the client invoking the PUT operation must specify the resource URL. But how could the client know the correct URL for a new resource? The server should maintain the URL namespace, not the client. URLs should be cool, should not be reused and should follow application conventions. Believing that clients will keep the URL namespace consistent is like believing in any of The Eight Fallacies of Distributed Computing. Clients are out of control. They can be buggy, written by ignorant coders or just outright malicious. The server should maintain order in the URL space, but the PUT operation does not allow that. It places the responsibility for URL creation on the client. Oh yes, the server can check the client's request and enforce the correct URL form. But such an approach has many drawbacks. Firstly, it may be much more difficult to check a URL than to assign it. Secondly, the URL assignment logic needs to be in many places (all the clients and the server). Thirdly, the client must be able to detect a URL assignment problem and retry, which complicates the client. It may not be able to succeed at all if the client's and the server's URL assignment policies are incompatible. And lastly, the URL assignment may take many round-trips. This makes the PUT operation almost useless for resource creation. How much easier would it be to make the server responsible for URL assignment?
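The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a real HTTP server: `ResourceStore`, `put_create`, `post_create`, `client_create_with_put` and the `/items/<number>` naming convention are all hypothetical names invented for this example. The store plays the role of the server, and the numeric return values mimic HTTP status codes.

```python
import itertools
import re


class ResourceStore:
    """In-memory stand-in for a server that owns a URL namespace (hypothetical)."""

    # Assumed application convention: /items/<positive integer>
    URL_PATTERN = re.compile(r"^/items/[1-9][0-9]*$")

    def __init__(self):
        self._resources = {}
        self._retired = set()  # URLs used in the past; they must not be reused
        self._next_id = itertools.count(1)

    def put_create(self, url, body):
        """Client-chosen URL: the server can only check it and reject it."""
        if not self.URL_PATTERN.match(url):
            return 400, None  # URL violates the naming convention
        if url in self._resources or url in self._retired:
            return 409, None  # URL is taken or was used before
        self._resources[url] = body
        return 201, url

    def post_create(self, body):
        """Server-assigned URL: the naming logic lives in exactly one place."""
        while True:
            url = f"/items/{next(self._next_id)}"
            if url not in self._resources and url not in self._retired:
                break
        self._resources[url] = body
        return 201, url

    def delete(self, url):
        """Remove a resource and retire its URL so that it is never reused."""
        if url not in self._resources:
            return 404
        del self._resources[url]
        self._retired.add(url)
        return 204


def client_create_with_put(store, body, candidate_urls):
    """A PUT client has to guess URLs and retry until the server accepts one."""
    attempts = 0
    for url in candidate_urls:
        attempts += 1
        status, created = store.put_create(url, body)
        if status == 201:
            return created, attempts
    return None, attempts
```

With server-side assignment a single `post_create` call is enough. The PUT-style client needs its own copy of the naming rules plus a retry loop, and every rejected guess is an extra round-trip.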

If PUT is used to update a resource, it is assumed that the body of the PUT request is the new absolute state of the resource. There is no locking, neither optimistic nor pessimistic. There are no mechanisms that would enable consistency. Therefore you just cannot have consistency with PUT. Don't we need consistency in an open world-wide distributed information space? In fact we do need it, especially now as the Web becomes "writable". Yes, the server could still check the acceptability of the new resource state, e.g. by making the resource version part of the resource state and checking it on each PUT (let's pretend for a while that mixing data and meta-data is a good idea). But if we accept such an approach, what is the difference between PUT and POST? An interface is good if there is nothing to remove, if there is no redundancy. If PUT and POST are the same, one of them should go.
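The workaround mentioned above, a version number carried inside the resource state and checked on each PUT, might look roughly like the following Python sketch. It is an assumption-laden illustration, not a real HTTP implementation: `VersionedStore`, its methods and the `version` field are hypothetical, and the return values only mimic HTTP status codes.

```python
class VersionedStore:
    """In-memory stand-in for a server that checks a version on each PUT (hypothetical)."""

    def __init__(self):
        self._state = {}

    def get(self, url):
        """Return a copy of the full resource state, including its version."""
        return dict(self._state[url])

    def put(self, url, new_state):
        """Replace the whole state, but only if the client saw the latest version."""
        current = self._state.get(url)
        if current is None:
            # First write creates the resource at version 1.
            self._state[url] = {**new_state, "version": 1}
            return 201
        if new_state.get("version") != current["version"]:
            # The client worked from stale (or missing) version information.
            return 409
        self._state[url] = {**new_state, "version": current["version"] + 1}
        return 200
```

If two clients read the same version and both try to write, the first PUT succeeds and the second is rejected, which is exactly the optimistic check described above. Note that the version (meta-data) has to travel inside the state (data), and that PUT no longer means "unconditionally store this state".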

The PUT operation should either change or die. I strongly recommend not using it in its present form.

Posted by rsemancik at 11:52 AM in Web architecture
Tuesday, 1 December 2009

World Wide Web Architecture, and the REST architectural style as well, deal with resources. The resource is one of the central concepts of the Web. Web pages are just representations of resources, resources are identified by URIs, the Web is all about resources. But what is a resource? Now, that's a mystery.

The World Wide Web Architecture document provides a rather vague and indirect definition:

By design a URI identifies one resource. We do not limit the scope of what might be a resource. The term "resource" is used in a general sense for whatever might be identified by a URI. It is conventional on the hypertext Web to describe Web pages, images, product catalogs, etc. as “resources”. The distinguishing characteristic of these resources is that all of their essential characteristics can be conveyed in a message. We identify this set as “information resources.” [...] However, our use of the term resource is intentionally more broad. Other things, such as cars and dogs (and, if you've printed this document on physical sheets of paper, the artifact that you are holding in your hand), are resources too. They are not information resources, however, because their essence is not information. Although it is possible to describe a great many things about a car or a dog in a sequence of bits, the sum of those things will invariably be an approximation of the essential character of the resource.
That means that anything can be a resource. Dogs, houses, books, a specific version of a book, a specific paper copy of a book, a photograph of the book, files containing data scanned from that book in pixmap format, data containing the content of that book in ASCII format, the HTML-formatted content of that book, the web page that contains the HTML-formatted content of that book and even a web page describing that book in an electronic shop: all of these could be resources. But wait, isn't a web page containing the HTML-formatted content of the book in fact a resource representation? Yes, it is. Many of the objects and concepts mentioned above may be resource representations, and they may at the same time be resources themselves. In fact there seems to be no difference between a representation and a resource (maybe except for non-information resources). The world of the Web is not black-and-white, with abstract resources and concrete representations (as seems to be at least partially assumed by REST). There are many shades of gray between abstract and concrete. And maybe pure abstractness and pure concreteness are just theoretical extremes that cannot be reached in practice. Such fuzziness of meaning is one of the most difficult parts of Web architecture to understand.

However, allowing real-world things to be resources creates an awful lot of problems. The panorama of these issues starts with the problem of who is authorized to assign a URI to the star known as "Sirius" (as it obviously can be a resource and it should have a single URI). Then it goes through the problem of completeness, as it is quite difficult to imagine that an "information resource" would capture all aspects, characteristics, features and (potentially conflicting) viewpoints that concern a specific real-world thing. Many more problems follow and I'm sure we do not yet see most of them. I've tried to capture the obvious problems in my paper. The Semantic Web activity is trying to address some of these issues, but so far it seems that the result is to make the problems machine-processable and efficiently distributable at Internet scale. I have seen no real solution so far.

Therefore I have proposed to limit the definition of resource to include only so-called "information resources". Information resources may indirectly refer to real-world things and concepts, but the Web in fact does not need to (and cannot) deal with the real world directly.

Posted by rsemancik at 2:41 PM in Web architecture
Friday, 27 November 2009

The Web, created more than 15 years ago, has changed everything. Many people cannot imagine how they could possibly work efficiently without the Web, how they would kill time or how they would find their way around. The Web, from a user's perspective, is a huge success. However, looking under the hood uncovers a surprising truth: the Web is just a technology, with all the drawbacks. The deeper one goes into understanding Web concepts, the more one is surprised that the Web works.

The current Web architecture seems to be reflected in two documents:

  • Architecture of the World Wide Web, Volume One is a W3C recommendation that tries to put all the basic principles of Web architecture into a single document. It is a must-read for any technologist who tries to create anything on the Web.
  • Architectural Styles and the Design of Network-based Software Architectures is the dissertation of Roy Fielding, one of the authors of the HTTP/1.1 specification. The REST architectural style is defined in this document with the intent to guide the development of Web architecture and protocols (although I would bet that most web developers have no idea that REST is an architectural style, or what an architectural style is).

Both of these documents were created in a retrospective fashion: to document and formalize what was already created and working. Both documents describe an ideal state and do not entirely match reality, but that is to be expected from high-level architectural work. However, what I would also expect from architectural work is consistency and the ability to evolve. While Fielding's work is consistent and exact, the WWW Architecture document does not feature such qualities. The definitions are vague, there are unexplained concepts and, although there is no obvious major internal inconsistency, the document contradicts other documents published by the W3C Technical Architecture Group (TAG). See this paper for all the details.

It looks like the very foundation of the Web is quite weak and confusing. No wonder that Web developers create all kinds of abominations, no wonder they are confused. They don't know what is right and what is wrong, what will bring the Web closer to the ideal and what will be just another nail in its coffin. It is a natural outcome that most developers choose to ignore Web architecture, as it is mostly useless anyway. And that is going to hurt us badly in the long run.

However, there is an important lesson to be learned. Fifteen years ago the Web was just a simple hypertext system that slowly spread across the small Internet. It had only a minimal design, almost no documentation and no protocol specifications. If the Web had had to pass an architectural review or any major project scrutiny, it would probably have been canceled. The Web architecture team has not managed to get the architecture right in all these 15 years. Yet the Web is successful, maybe one of the most successful technologies ever. Does it mean that quality and technological merit are far less important than popularity, courage and luck?

Posted by semancik at 4:56 PM in Web architecture