Wednesday, October 03, 2007

Is the Internet inevitable?

October 4th is the 50-year anniversary of Sputnik, the first man-made satellite. The creation of DARPA was a direct response to the launch of Sputnik. DARPA funded a number of projects, most notably the ARPANET. The packet-switching protocols developed under DOD grants form the foundation of the Internet. Were Space Race politics essential for constructing a unified global data network, or would the Internet have been built anyway—with or without Sputnik?

As a thought experiment, imagine an alternate universe—without the Soviet Union. Without the Cold War. Without the motivation to create DARPA. With no ARPANET project, and no backbone to connect to. With research on packet-switched networks distributed among a number of entities, some of them commercial. Would a unified set of protocols emerge, or would the network evolve as a set of "walled gardens" that are loosely connected and weakly interoperable?

On one hand, the benefit of a unified network to its users is tremendous. On the other hand, the short-term interests of commercial entities often favor closed networks. Consider the mobile phone networks: most of the world standardized on GSM, while cellphone users in the United States are stuck with incompatible protocols. Recent events surrounding the iPhone demonstrate how commercial interests fight the users who want open networks.

If you were transported to the alternate universe, would you be able to convince its residents to build the unified Internet? What arguments would you use? This is not an idle question. In our universe, there is plenty of convincing to be done—for example, there is no consolidated repository for social networking data.

Maybe we shouldn't take the Internet for granted. If politics played out differently, we could indeed have "the Internets" instead of the single integrated network.

Friday, October 20, 2006

The humane network

One Laptop per Child is an initiative designed to push one billion ultra-cheap laptops to the children of the world. As radical as this sounds, it's certainly a plausible path for the evolution of computing. In any event, the network's unceasing appetite for more nodes will generate them as quickly and cheaply as possible.

But what about payment for network services? The current model is advertising, but that can plausibly be seen as an artifact of this stage of the Web, much as it was for early broadcast television. At this point, would anyone seriously advocate an advertising-only model for television? PVR users will not even tolerate the constraints of fixed scheduling, let alone time lost to explicit advertising.

The next billion computer users will be pulled onto the network by governments seeking to make the delivery of information services like education and training as cheap as possible. These young, predominantly third-world users will be valuable to the network not because of their incomes but simply because they are human. As the network becomes increasingly able to use human feedback, the value of screen real estate and user attention will vastly outweigh the inefficient response rates of online advertising. As more and more user-generated content is consumed, trends and patterns of identifiably human feedback will be the most valuable information resource in the digital universe.

Tuesday, September 12, 2006

Requiem for bytes

The Internet is simultaneously a battleground and a triumph of cooperation. Many businesses are built around the Internet, and their competitive edge is often maintained using proprietary technology. This results in browser wars, portal wars, search engine wars and so on. However, the most important changes are not single products from a single company but entire new protocols that create a new ecosystem for both cooperative and competitive activity. The bottleneck in the evolution of the Internet is the creation of these protocols—a breakthrough in the standardization process will lead to a breakthrough in the rate of development of the global computing infrastructure.

The Internet and the Web have been built around wire protocol standards such as HTTP and HTML. Agreeing on a common interpretation of a sequence of bytes enables data interchange and wide-scale collaboration. However, time and time again, wire protocols fail to keep up with the evolution of the systems they describe, and in the long term they become a barrier to true interoperability.

Useful interactions between systems require complex interfaces; these interfaces cannot be completely described by a wire protocol, even an extensible one. Consider any of the existing image formats. They support simple extensions—for example, new metadata that specifies camera settings. However, adding a more complex extension—such as annotating a location in an image—is impossible, since programs that scale, crop or rotate an image won't know how to update the annotation coordinates. An "annotated image" must be described by a new image format, and all applications must be upgraded to support it.
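Here is a minimal sketch of the failure mode, using invented types rather than any real image format: a crop routine written against the base format copies extension metadata it does not understand, and the annotation quietly becomes wrong.

    from dataclasses import dataclass, field

    @dataclass
    class Image:
        width: int
        height: int
        pixels: bytes = b""
        extras: dict = field(default_factory=dict)  # extension data travels as opaque metadata

    def crop(img: Image, x: int, y: int, w: int, h: int) -> Image:
        # A crop routine written against the base format: it copies the
        # extension metadata verbatim because it has no idea what it means.
        return Image(width=w, height=h, pixels=b"", extras=dict(img.extras))

    # An "annotated image" stores coordinates in the extension slot.
    photo = Image(640, 480, extras={"annotation": {"label": "lighthouse", "x": 500, "y": 100}})
    cropped = crop(photo, 200, 0, 320, 240)

    # The annotation now points outside the 320x240 frame: the wire-level
    # extension survived the operation, but its meaning did not.
    print(cropped.extras["annotation"])  # {'label': 'lighthouse', 'x': 500, 'y': 100}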

Typically, the adoption of a new wire protocol is followed by a period of euphoria as new applications flourish. As the protocol is used widely, its limitations become apparent. The protocol is extended in ways that increasingly depart from the spirit of the original; the need for backwards compatibility becomes a growing burden. Eventually another generation of standards is produced that better addresses the requirements at the expense of backwards compatibility, pushing the infrastructure into a difficult and expensive phase of replacement.

At any given time, you can see wire protocols in every stage of this lifecycle. Today there is a great deal of activity around Atom, with new applications being built on it. The more mature HTML has evolved a lot since its early versions, and the need to remain backwards compatible with old browsers has become a significant problem. The foundation of the Internet, IPv4, is slowly being replaced by the redesigned IPv6—which requires an upgrade of the worldwide infrastructure.

As the world increasingly comes to resemble a single large computer, the need for myriad wire protocols will go away. Already the Java and .NET virtual machines hide such details from application programmers, presenting only higher-level abstractions such as objects and interfaces. Wire protocols are used for persistence and for communication outside the machine; will they always be necessary? If the system handles persistence transparently, communication to the outside is not required—if everything is inside the world-wide computer, then there is no outside. The application developer never has to care about the proper sequence of bytes.

An alternative to wire protocols is to standardize on programmatic interfaces implemented using mobile code. For example, image objects include both data and code that performs operations such as scaling or rendering to the framebuffer. A document knows how to display itself interactively, print itself, or convert itself to plain text. The high-level interface is more compact and easier to get right than a wire protocol. Since objects of the same type share code, transfer and persistence can be implemented efficiently, while extensibility is practically unlimited.
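For contrast with the sketch above, here is a hedged illustration of the interface-centric approach; the class names and methods are invented for this post, not taken from any existing standard. Because the object carries its own code, an annotated variant can extend behavior without forcing a new format on every application.

    class Image:
        # Objects of this type travel with their code, so consumers call
        # methods instead of parsing a byte layout.
        def __init__(self, width, height):
            self.width, self.height = width, height

        def crop(self, x, y, w, h):
            return Image(w, h)

        def render(self):
            return f"{self.width}x{self.height} bitmap"

    class AnnotatedImage(Image):
        # An extension that overrides behavior: cropping now updates (or drops)
        # annotations, and no existing consumer has to be upgraded.
        def __init__(self, width, height, annotations):
            super().__init__(width, height)
            self.annotations = annotations  # list of (label, x, y)

        def crop(self, x, y, w, h):
            kept = [(label, ax - x, ay - y) for label, ax, ay in self.annotations
                    if x <= ax < x + w and y <= ay < y + h]
            return AnnotatedImage(w, h, kept)

    photo = AnnotatedImage(640, 480, [("lighthouse", 500, 100)])
    print(photo.crop(200, 0, 320, 240).annotations)  # [('lighthouse', 300, 100)]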

The development tools must be the first to embrace the new mentality. Today's standards for describing interfaces and code—traditional text-based programming languages and bytecode specifications—are wire protocols themselves, and as wire protocols their adaptability and extensibility are strictly limited! A program fragment is an object that can be evaluated, edited and transformed—it should be treated by the system as an object, not as a sequence of source bytes. Language-Oriented Programming is an emerging paradigm that approaches software development without exposing wire protocols.
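As a hedged illustration, with node types invented for this post rather than taken from any Language-Oriented Programming tool, here is a program fragment represented as an object tree: it can be evaluated and transformed directly, without ever round-tripping through source text.

    from dataclasses import dataclass

    @dataclass
    class Num:
        value: int
        def evaluate(self):
            return self.value

    @dataclass
    class Add:
        left: object
        right: object
        def evaluate(self):
            return self.left.evaluate() + self.right.evaluate()

    def simplify(node):
        # A transformation over the fragment itself: fold constant additions.
        if isinstance(node, Add):
            left, right = simplify(node.left), simplify(node.right)
            if isinstance(left, Num) and isinstance(right, Num):
                return Num(left.value + right.value)
            return Add(left, right)
        return node

    fragment = Add(Num(2), Add(Num(3), Num(4)))
    print(fragment.evaluate())   # 9
    print(simplify(fragment))    # Num(value=9)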

A successful standard for describing interfaces and mobile code can be compared to the invention of the alphabet. The alphabet allows different innovations to proceed simultaneously and take advantage of each other: new words are created from letters independently of new representations of letters such as fonts or Morse code or Braille. The move away from wire protocols will lead to independent development of richly interoperable software components and faster rate of evolution of the global computing infrastructure as a whole.

Tuesday, April 04, 2006

Phases of the Sun

Sun, returning to "The Network is the Computer" after briefly being "The Dot in Dotcom", recently launched "an easy, affordable utility computing service available to anyone with an Internet connection for $1/CPU-hr." Jonathan Schwartz stresses that Sun's Grid addresses the Long Tail of computing services: "My view—most computing will be purchased by that tail." However, Sun's Grid Compute Utility as it exists today is not truly general-purpose: it offers CPU capacity but not storage, and most applications need both.

A search engine, for example, can't take advantage of Sun's network.com service to process large amounts of data, since the FAQ says: "Please note that any time your storage usage exceeds 10 GB, your job will be terminated until you comply with the 10 GB storage limit." Most likely, the PC that you use to read this post offers ten times more space than Sun's Grid: clearly, network.com is no supercomputer along at least one dimension of "super."

Yet for certain applications, Sun's service is a good fit. If you want to render high-quality 3D animations from your basement, now you can do it with minimal startup cost. Sun's Grid and Amazon S3 suggest that both the hardware and the software for the world computer are being actively developed. Sun and Amazon are not just showcasing technology, but pioneering the way to the "pay-per-use" business models of the future.

Monday, March 27, 2006

Emergent convergence

Steven Johnson in Wired: "there is reason to believe that the divided metaverse is merely a transitional phase, and that its component worlds will coalesce."

Wednesday, March 15, 2006

Gigabytes for centidollars

Storage will soon become a commodity, with companies competing on price and service levels. Amazon has entered the market with S3 (Simple Storage Service), which offers 99.99% availability and a straightforward fee structure:
  • $0.15 per GB-Month of storage used.
  • $0.20 per GB of data transferred.
Yet often you need to do more than just store and retrieve your data: for example, what if you want to search through it, or modify it in place? Downloading the information is slow and incurs transfer costs. The code is often smaller than the dataset, so it is more efficient and cheaper to send the program to the data rather than the other way around.
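A back-of-the-envelope comparison using the transfer price quoted above; the dataset and program sizes are made up for illustration.

    TRANSFER_PER_GB = 0.20   # S3 transfer fee quoted above, in dollars
    dataset_gb = 100.0       # hypothetical dataset already stored with the provider
    program_gb = 0.001       # hypothetical 1 MB analysis program

    pull_data_to_code = dataset_gb * TRANSFER_PER_GB   # download everything, process locally
    push_code_to_data = program_gb * TRANSFER_PER_GB   # upload the program, return only results

    print(f"move the data to the code: ${pull_data_to_code:.2f}")    # $20.00
    print(f"move the code to the data: ${push_code_to_data:.4f}")    # $0.0002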

Current storage services underutilize processing resources; providers would not have to make significant investments to offer advanced data-manipulation interfaces. What's delaying this is the lack of a standard for mobile code. Today there is no protocol accepted as the definitive standard for program execution, in the way that HTTP has become ubiquitous for state transfer. Once such a standard is developed, you will see an extra bullet point in the price plan: a fee per trillion instructions executed by the service provider.

Thursday, February 23, 2006

Dating tips for single points of failure

A typical office PC that mounts a drive over the network is not very reliable. A temporary failure of either the network or the file server makes the computer unusable. Failure of the local disk will at best require reinstallation of the OS and all the applications, and at worst can lead to irrecoverable loss of data. The overall system is less fault-tolerant than a hard drive or a network taken separately. Is it possible to improve the robustness of today's primary tools?

Increasing the reliability of individual parts leads to slow, incremental progress; breakthroughs are achieved by putting the parts together in smarter ways. For example, RAID storage introduces a component that redundantly writes information to multiple disks. The system is prepared for the failure of any single disk and can continue functioning without downtime.
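A toy sketch of the idea behind mirroring (in the spirit of RAID 1), with in-memory dictionaries standing in for real disks.

    class MirroredStore:
        # Writes go to every replica; reads fall back to any replica that still has the data.
        def __init__(self, replicas):
            self.replicas = replicas

        def write(self, key, value):
            for disk in self.replicas:
                disk[key] = value

        def read(self, key):
            for disk in self.replicas:
                if key in disk:          # a failed disk is simply skipped
                    return disk[key]
            raise KeyError(key)

    store = MirroredStore([{}, {}])
    store.write("report.txt", b"quarterly numbers")
    store.replicas[0].clear()            # simulate losing one disk
    print(store.read("report.txt"))      # the data survives: b'quarterly numbers'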

A breakdown shouldn't be a catastrophe. The set of data that a computer user accesses is predictable; it is easy to copy it to a local disk and keep it up-to-date. If the network goes down, some information can be read and edited locally. When a network connection is reestablished, the data can be automatically synchronized. The Coda research project at CMU developed a distributed filesystem that works this way.

With disconnected operation comes the possibility that two users modify the same information, creating a conflict. Conflicts can be resolved either automatically by the system or by a component that is aware of the structure of the data. Ideally, reconciling versions should be the responsibility of application developers, who in the worst case can ask the user to merge the versions.
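A hedged sketch of that division of labor, with data and function names invented for this post: the synchronization layer detects the conflict, and an application-supplied merge function resolves it; in the worst case that function could simply prompt the user.

    def sync(base, local, remote, merge):
        # Return the reconciled value; `merge` is supplied by the application.
        if local == remote:
            return local
        if local == base:
            return remote                      # only the remote side changed
        if remote == base:
            return local                       # only the local side changed
        return merge(base, local, remote)      # true conflict: defer to the application

    def merge_contact(base, local, remote):
        # Field-by-field merge; a real application could prompt the user instead.
        return {k: local[k] if local[k] != base[k] else remote[k] for k in base}

    base   = {"name": "Ada", "phone": "555-0100", "email": "ada@example.org"}
    local  = {"name": "Ada", "phone": "555-0199", "email": "ada@example.org"}
    remote = {"name": "Ada", "phone": "555-0100", "email": "ada@example.net"}
    print(sync(base, local, remote, merge_contact))
    # {'name': 'Ada', 'phone': '555-0199', 'email': 'ada@example.net'}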

Investment in a more reliable network or a more reliable disk drive doesn't offer nearly as good a return as an investment in fault-tolerant software. Software can utilize local storage to work around network downtime, and use the network to avoid data loss in case of a drive failure.

Wednesday, February 15, 2006

Atomic age

Cindy Cohn writes in EFF Deep Links:
Email being basically free isn't a bug. It's a feature that has driven the digital revolution. It allows groups to scale up from a dozen friends to a hundred people who love knitting to half-a-million concerned citizens without a major bankroll.
Ironically, this misses an important trend: email no longer drives communities; mailing lists are being replaced by blogs and feeds. Readers originate requests to blogs—so if a micropayment system is introduced, everyone will pitch in, starting a microfundraiser.

Monday, January 09, 2006

Executable Internet

George F. Colony: "The Web will fade."

Wednesday, January 04, 2006

Happy 2106!

Contrarian investing often pays off; let's try contrarian software engineering. Instead of thinking about the deadline that's a month away, consider the time frame of a century. The following Gedankenexperiment is important because it's usually easier to scale the problem down than to scale it up.
The Exotic Contract.
You are leading a team of software engineers, and you are offered an unusual contract: develop, maintain and enhance a software system over its 100-year lifespan. Here's the kicker: you are paid a lump sum (a really big lump sum: say, a billion dollars), not by the man-hour. (For the purposes of this thought experiment, how the feature set is negotiated, how the deadlines are set, and the impact of inflation are not of interest.) How you allocate the money over time is completely up to you. You have to optimize the software development process over a period of a century, and every dollar you save in the long run goes directly to your bottom line. What is the most profitable course of action?
The industry-standard approach is to develop a system and rewrite it, usually component by component, every several years. This makes sense when you are coding for a start-up that can go out of business in six months. But if you sign the 100-year contract, it is guaranteed to be a long-term project. Is throwing away code every five years really the optimal way to proceed?

Let's look at the world of computing from a 100-year perspective. Will the Win32 API be relevant? No, it will be forgotten within decades at most. What about HTML? Chances are, another standard will come along. On the other hand, the concepts of 'an integer' or 'a string of characters' or 'a transaction' will still be useful. With a bit of foresight, entities can be categorized as either timeless or fleeting.

Here's a different approach to the exotic contract problem. First, select abstractions that make up a "timeless" platform. Then run two parallel efforts: one to develop applications on the timeless platform, and another to map the timeless platform onto existing software platforms such as Windows and the Web (possibly with some application-specific hints).
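A hedged sketch of the two-track structure; the names below are invented for illustration. Applications are written against a small timeless abstraction, while separate, replaceable mappings bind it to whatever platforms exist in a given decade.

    class Document:
        # A "timeless" abstraction: applications depend only on this.
        def __init__(self, title, paragraphs):
            self.title, self.paragraphs = title, paragraphs

    class WebRenderer:
        # One mapping of the timeless platform onto today's Web.
        def render(self, doc):
            body = "".join(f"<p>{p}</p>" for p in doc.paragraphs)
            return f"<html><h1>{doc.title}</h1>{body}</html>"

    class DesktopRenderer:
        # Another mapping; it can be rewritten decades later without touching applications.
        def render(self, doc):
            return doc.title + "\n" + "\n\n".join(doc.paragraphs)

    doc = Document("Happy 2106!", ["A century is a long time.", "Plan accordingly."])
    for renderer in (WebRenderer(), DesktopRenderer()):
        print(renderer.render(doc))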

For the world of databases, the relational model serves as the timeless platform: application programmers use it, and DBAs provide the hints by optimizing the logical and physical data layout for maximum performance. However, the relational model is not suitable for general-purpose programming: it lacks important abstractions such as 'a sequence' or 'an object'. There is no universal timeless platform available today; if there were, everybody would be using it!

Even if no one's knocking at your door offering an exotic contract, the exotic contract problem itself is an important one. Companies come and go, but the software industry will be around for a long time. Looking at all software development efforts as one enormous project, it makes sense to take a bird's-eye view and optimize it for the long run: there is a huge economic incentive to do so.

Monday, December 26, 2005

Magic-update

Tim Bray: "There are two kinds of software: the kind that offers to update itself when appropriate, and the kind that's broken." Ideally, once users select a trade-off between features and code maturity (the range is from "bleeding edge beta" to "proven and stable"), the software is then updated automatically. An option to easily revert to a previous version should be provided if feasible.

This goes for both web-based and installed software: the deployment models of all software systems are converging.
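A hedged sketch of what such an update policy might look like; the channel names and version numbers are invented.

    CHANNELS = {"bleeding-edge": "2.1.0-beta3", "proven-and-stable": "2.0.4"}  # hypothetical feeds

    class Updater:
        def __init__(self, channel="proven-and-stable", installed="2.0.3"):
            self.channel, self.history = channel, [installed]

        def check_and_update(self):
            latest = CHANNELS[self.channel]
            if latest != self.history[-1]:
                self.history.append(latest)      # keep old versions around for easy rollback
            return self.history[-1]

        def revert(self):
            if len(self.history) > 1:
                self.history.pop()
            return self.history[-1]

    u = Updater()
    print(u.check_and_update())  # 2.0.4
    print(u.revert())            # 2.0.3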

Tuesday, December 20, 2005

When will then be now?

Even though the question of whether art imitates life or life imitates art is a cliché, and a Black-or-White Fallacy at that, it can lead to interesting inquiries. Consider the art of science fiction, which has a complex relationship with the life of technology. Sci-fi writers build the foundation for their work by observing existing and emerging technology. This foundation keeps their imagination from taking over completely and ending up in the fantasy genre. Individual writers may see technology progressing in different ways, yet a common technological trajectory stands out in the work of writers of a single generation.

Scientists, engineers, and inventors are influenced by that trajectory. The little kid that lives inside each one of them thinks it’d be really neat to turn fiction into reality. Some dedicate their lives to doing just that. Examples of fiction made into reality are numerous; the bigger ones are Čapek’s robots, Wells’ lasers, and Clarke’s geostationary satellites.

While cyberpunk and its corresponding science of computing are still in their teenage years (complete with rebellion against The Man or The Machine), they enjoy the same relationship. So, what is the vision of the future of computing as described by today's cyberpunk art? What is that common trajectory? Stephenson’s Metaverse. Gibson’s Cyberspace. Even the Wachowskis’ Matrix. Is there any doubt as to where The Net is going to end up?

Monday, December 19, 2005

Breaking news

"The Internet Is Broken" is MIT Technology Review's current cover story. "We are at an inflection point, a revolution point," Internet elder statesman and onetime Chief Protocol Architect David D. Clark now argues. What's next? "It's time to rethink the Internet's basic architecture, to potentially start over with a fresh design." A few more high-profile revolutionaries—and people will start paying attention.

Tuesday, December 13, 2005

The sky ahead

Computing revolutions are driven by economic factors. Computers and networks continuously become faster and cheaper. These quantitative changes periodically create qualitative improvements for users and developers.

The personal computer revolution was brought on by dropping microprocessor prices: individuals could now afford computers. Economies of scale could be applied to hardware and software, and the shrink-wrap software industry was born: software became a product.

The Web revolution happened because of faster and cheaper networking. Internet infrastructure exploded, and users could access their data and applications from any place with a Net connection. Developers could deploy software in minutes to an audience of millions, and software became a service.

What's next? With falling prices on hardware and bandwidth, management costs start to dominate. Many of the problems that today cost hours of human attention and a lot of money are preventable: incompatibilities and interoperability issues, software catastrophes (serious bugs and viruses), hardware catastrophes (lost data due to hard drive crashes).

PCs and the Web will evolve into the Cloud: a collection of computing resources that presents a uniform user experience and minimizes administration costs. The power grid has enormous complexity, but it is all hidden behind the interface of a power outlet: the right interfaces to computing will similarly simplify it and decrease the total cost of ownership.

Once the management problems go away and programs can be hosted in the Cloud, the barrier to entry in the software field will be dramatically lowered. Given the right tools, the right programming model and the right economic model, the market for small components is going to take off. The line between advanced users and software developers will be erased, transforming software into a medium.

Wednesday, December 07, 2005

Simply better

Personal computing and network computing each have two major advantages over the other.

For personal computing (Windows), they are:

  • Disconnected operation. Your laptop is still useful when the network is inaccessible.
  • Performance. Local applications don't have to transfer data back and forth over the network, and therefore can be much more responsive.
The advantages of network computing (the Web) are:
  • Synchronized state. Your data and your applications are centrally managed, and they are up-to-date no matter where you access them from.
  • Ease of deployment. An application on a website is deployed to millions of users who just need to follow a link to access it.
There is no reason why you can't have a system that offers all four. For example, Mac OS X with its .Mac and Software Update components simplifies synchronization and deployment for some applications. There are obvious limitations: you can't take a worldwide search engine with you into the offline world, but you can keep a copy of your mailbox in case your webmail is inaccessible.

Maintaining synchronized state is a tricky problem in the programming models popular today, but the right framework can make writing "autosynced" applications easy. In combination with a deployment framework similar to Java Web Start, a platform can offer the best of both worlds—"either or" trade-offs will become a thing of the past.
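A hedged sketch of how an "autosynced" programming model might feel to an application developer; the framework API below is invented. Writes always succeed locally, and pending changes are replayed when connectivity returns.

    class AutoSyncedStore:
        # Invented framework API: local writes always succeed; pending changes
        # are replayed against the server whenever the network comes back.
        def __init__(self):
            self.local, self.pending, self.online = {}, [], False

        def put(self, key, value):
            self.local[key] = value
            self.pending.append((key, value))
            if self.online:
                self.flush()

        def flush(self):
            while self.pending:
                key, value = self.pending.pop(0)
                print(f"synced {key!r} to server")   # stand-in for a network call

        def reconnect(self):
            self.online = True
            self.flush()

    store = AutoSyncedStore()
    store.put("draft", "offline edit")   # works with no network at all
    store.reconnect()                    # prints: synced 'draft' to server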

Friday, December 02, 2005

The three-letter word

The new top-level domain, .xxx, is in the news. But is its addition technically necessary? Branding arguments aside, why can't .xxx be replaced with a subdomain, such as .xxx.us?

Domain names are used for global identification of resources and for navigation. Since they have to be typed only once, three extra characters don't make a significant difference. For machine-generated content, there is zero difference: after the site is bookmarked, it doesn't matter how long its URL is. There will be no perceptible loss of productivity if the address is slightly lengthened. (This is especially true in the context of the .xxx domain.)

There is no point in abolishing .com today, but there is also absolutely no good technical reason to add new generic top level domains. Introducing .xxx will not make filtering adult content any easier, since adult sites will continue to operate under .com, with new ones springing up daily. What it will do is create a windfall of profits for registrars as another land grab starts—and not surprisingly it's the registrars who lobby ICANN.

Real progress on restricting children's access to adult content can be made if robust age verification becomes part of the protocol, which requires solving the Identity paradox. As for battles over the root zone, they can be resolved by simply freezing it in the current state, removing unnecessary central control and thereby starting to address the Responsibility paradox.

Thursday, December 01, 2005

Flat-fee world

Market research shows that people want ad-free services, but won't pay for them. They will pay for quality content instead: "People are not buying HBO because it doesn't have ads, they are buying it because they want to watch the Sopranos."

People buy World of Warcraft not because it doesn't have ads, but because it has quality content. Since Blizzard gets paid, it can hire professionals to develop quality content and publish it without ads. There is a feedback loop here; the main problem is bootstrapping—achieving a critical mass of subscribers. Bootstrapping can be accomplished given enough initial investment, assuming people indeed want to play the game.

Blizzard has a monopoly on authoring content for WoW, but there is no reason an open system can't work, with multiple competing content providers. A neutral party can collect money from subscribers—a flat fee, for example—and distribute it among content providers according to usage patterns. As long as the fee distribution method is fair, this model is attractive to both users and publishers. The main problem, again, is bootstrapping and gaining a critical mass of users: this can be done by producing or licensing quality seed content.
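A back-of-the-envelope sketch of one possible distribution rule; the subscriber count, fee and usage figures are invented. Each provider receives a share of the monthly pool proportional to the time users spent on its content.

    subscribers = 100_000
    monthly_fee = 15.00                      # hypothetical flat fee, in dollars
    pool = subscribers * monthly_fee

    usage_hours = {                          # hypothetical aggregate usage
        "ProviderA": 400_000,
        "ProviderB": 250_000,
        "ProviderC": 100_000,
    }
    total_hours = sum(usage_hours.values())

    for provider, hours in usage_hours.items():
        payout = pool * hours / total_hours  # pro-rata share of the pool
        print(f"{provider}: ${payout:,.2f}")
    # ProviderA: $800,000.00  ProviderB: $500,000.00  ProviderC: $200,000.00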

Wednesday, November 30, 2005

Les 5 paradoxes du web

(Translation of Five paradoxes of the Web by anGel)

The Web is a great platform for delivering services and information, but it is showing its age. Fundamental design choices that were right at the Web's beginning are now starting to break down. Here we attempt to identify the problems that cannot be solved with the Web as it exists today.

The Web's problems are well known and, unfortunately, taken for granted. Last month, as a Web user, I had to deal with spam (in my mailboxes and on my blog), denial of service and identity theft, not to mention assorted bugs. As a Web developer, I had to work around browser incompatibilities and spend effort out of all proportion to the tasks I was trying to finish. And it keeps getting worse.

A great deal of resources goes into fighting the problems that Web use creates, but that does not resolve the platform's core paradoxes. Many industries thrive today purely on the Web's imperfections. That is a good indication that the computing community should invest in a new global infrastructure and attack fundamental problems with fundamental solutions.

  • Everything is free, yet nothing is free (the Compensation paradox)

    Many Web services are free to users because there is no practical way to charge for them, yet providing those services costs money. This makes business models unsustainable and forces providers to ration the resources used. Even without explicit limits, the provider's hardware and bandwidth constraints often lead to denial of service during peak hours, or during attacks.

  • We don't know who you are, yet there is no privacy (the Identity paradox)

    There is no global identity system: a website cannot greet you by name unless you have previously filled in a form. Identity-management mechanisms are clumsy and sometimes lead to identity theft. At the same time, there are roundabout ways to invade a user's privacy: IP addresses, cookies, referrer headers, 1-pixel GIFs in email.

  • The same lines of code, yet different rendering (the Compatibility paradox)

    Developing Web applications requires sacrificing one of three important ingredients: capability, compatibility, or speed of development. Running compatibility tests against every browser version is a luxury few can afford. It doesn't matter whether one browser is more "compliant" than the others; in practice, you must be compatible or lose users.

  • Code travels across the network, yet it is not mobile (the Boundary paradox)

    The Web is asymmetric: there is a client and there is a server. The client speaks one language (Javascript), the server speaks another (usually not Javascript). To cross the boundary between client and server, code must be translated into another language. No matter how fast the network is, code mobility is limited by the speed at which a programmer can port code between client and server.

  • The Web is not decentralized enough, yet it is not centralized enough (the Responsibility paradox)

    DNS management is centralized; certificate authorities are also mostly centralized. This centralization grants monopolies to the controlling organizations and at the same time makes them global points of vulnerability. Yet there is no one to turn to when an entity misbehaves (spam, for example), as long as the Web's authorities do not take on the responsibility of holding the platform's "citizens" accountable.

Identifying the problems is the necessary first step. This blog will try to explore possible solutions, often rather radical ones. Thinking outside the "Web" bubble is the only way to make real progress. The good news is that the next winning platform will resolve each of the five paradoxes of the Web.

In the trenches

Browser wars are still raging. Explorer Destroyer distributes scripts to harass IE users, and Google is paying a $1 bounty for every person who switches to Firefox because of this campaign:
IE users will see a friendly message stating that they need to download and install Firefox to view your site. This script will do the most good and make you the most money. Can you handle it? (At least try it for a day to see how good it feels.)
How good does it feel to deny users access to content based on their choice of Web client, a choice that in corporate environments is often forced on them by system administrators who control every piece of installed software?

The Web browser market today is a zero-sum game: market share must be pried away, and even small gains now come with a big price tag. In stark contrast, competition for the next platform's users has not even begun: a relatively small investment buys a monopoly. Who will take advantage of this opportunity?

Tuesday, November 29, 2005

One percent of a hundred billion dollars

Computer crime is a big business:
"Last year was the first year that proceeds from cybercrime were greater than proceeds from the sale of illegal drugs, and that was, I believe, over $105 billion," McNiven told Reuters.
We lose billions of dollars to Web security issues, and billions more are spent fighting them. It will not get any better, because of the fundamental design flaws in the platform; it will only get worse and cost more next year. Why not take one percent of the money that goes into patching the latest problems and do something proactive instead of reactive: hire the best security professionals in the world, peer review like mad, and come up with a state-of-the-art platform?

Let's not waste any time. The later the design of the Web's successor starts, the more security problems it will have and the more money will be lost.