Wednesday, March 14, 2007

With mashups, webapps are becoming legacy

It might have all started with screen scraping of legacy systems. Screen scraping is a technique that uses a legacy system's user interface as the interface for feeding input into a newly developed system.

If it is meant for legacy systems, then why is it being applied to web sites these days? With the fast pace of growth in application development, web applications themselves have become legacy!

Web scraping could be termed as the process of extracting a piece of information of interest from a webpage online. Recently, significant work has been going on around the things that would take webapps to the next level.

Web scraping is a lot easier than screen scraping of legacy systems. The output of a web app is HTML, which can be represented as a DOM tree and navigated easily by machines/bots.

Yes, it's easier to navigate, but is it easier to locate an item of interest? Not really. HTML code is mostly about styling, saying how the data may appear to the user. Usually a page contains less data and more styling demarcations added for proper presentation, like &lt;b&gt; for bold and &lt;u&gt; for underline. Beyond these, a lot of styling code is mixed in with the actual data the webpage is showing. So it's tougher for a machine to separate the data from the style information.
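
To picture that, here's a small sketch in TypeScript (in a browser, where DOMParser is available) of scraping a made-up consultant's appointment page. The markup, the id, and the idea that bold means "date" and underline means "time" are all assumptions the scraper has to hard-code.

```typescript
// Hypothetical page markup the scraper expects to find online.
const html = `
  <html><body>
    <div id="appointments">
      <p><b>14-Mar-2007</b> <u>10:30</u> Client review</p>
      <p><b>15-Mar-2007</b> <u>14:00</u> Budget meeting</p>
    </div>
  </body></html>`;

// Parse the page into a DOM tree, then navigate it the way a bot would.
const doc = new DOMParser().parseFromString(html, "text/html");
const rows = doc.querySelectorAll("#appointments p");

rows.forEach((p) => {
  // <b> and <u> only say how the data looks; the scraper has to *guess*
  // that bold means "date" and underline means "time".
  const date = p.querySelector("b")?.textContent;
  const time = p.querySelector("u")?.textContent;
  console.log(`appointment on ${date} at ${time}`);
});
```

The moment the site owner swaps &lt;b&gt; for a styled &lt;span&gt;, this guesswork falls apart.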

GreaseMonkey might be the first tool released that helps people customize a webpage on the client side. For example, the next time you don't like the blue background on the MSN home page, you can change it before the page renders in your browser. It's simple in functionality, but you need to know the DOM structure (the tree representation of the web page). Later, people started posting their scripts on the web (http://userscripts.org/).
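
For a feel of it, here's a minimal GreaseMonkey-style sketch. GreaseMonkey itself runs plain JavaScript, so treat this TypeScript as what you'd compile it down from; the #header selector is just an assumed example of why you need to know the page's DOM tree.

```typescript
// ==UserScript==
// @name        Recolor MSN home page
// @include     http://www.msn.com/*
// ==/UserScript==
//
// Runs in the browser after the page loads and restyles it on the client side.
window.addEventListener("load", () => {
  // Swap the disliked blue background for something else.
  document.body.style.backgroundColor = "#f5f5dc";

  // Deeper changes need knowledge of the page's DOM structure,
  // e.g. an assumed header element with id="header".
  const header = document.querySelector<HTMLElement>("#header");
  if (header) {
    header.style.backgroundColor = "#ffffff";
  }
});
```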

Chickenfoot is another recent tool on the rise. Writing a script here doesn't need knowledge of the DOM representation. Read my earlier post on this. I too tried my hands at these two some time back.

These are just the start of the road that leads to our dream (the Semantic Web). The Semantic Web is all about adding meaning to data, which today is mingled with style information in various web sites. If a consultant puts his appointment list online, a web crawler scanning it should make sense of it rather than just seeing numbers and text; it should know that it is calendar data and that it belongs to him.

A webpage is seen as proprietary information of the owner of the website. Extracting a part of it and using it elsewhere is a copyright or legal issue. But lately this outlook is changing; at least sites are willing to share, even if not give it away for free. Websites like Google Maps, Flickr, del.icio.us, and Amazon are providing an alternative API that fetches the information you usually get only by browsing their web pages.

These alternate APIs are the way for bots to extract the data they want out of a website. This is one step towards the Semantic Web, where data is presented on the web in a directly machine-readable form; here, an alternate route for reading the data is provided as an API service. These API calls are generally SOAP calls, as part of a web service, and the debate about REST architecture versus SOAP RPC goes on. This kind of API interaction within an enterprise system, when rightly modeled and built, is called SOA (Service Oriented Architecture).
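
Just to show the shape of it, here's a sketch of a REST-style call in TypeScript. The endpoint api.example.com, its parameters, and the response fields are all made up (each real service defines its own); a SOAP version would wrap the same request in an XML envelope instead.

```typescript
// Shape of the structured data the hypothetical service returns.
interface PhotoResult {
  id: string;
  title: string;
  owner: string;
}

// REST style: the request is a plain URL, the response is machine-readable JSON.
async function searchPhotos(tag: string): Promise<PhotoResult[]> {
  const url = `https://api.example.com/photos?tag=${encodeURIComponent(tag)}&format=json`;
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`API call failed: ${response.status}`);
  }
  // No scraping of <b>/<u> tags here: the fields arrive already labelled.
  return (await response.json()) as PhotoResult[];
}

searchPhotos("sunset").then((photos) =>
  photos.forEach((p) => console.log(`${p.id}: ${p.title} by ${p.owner}`))
);
```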

As more websites exposed their data as web services via APIs, a re-mix style of applications came online. They were called mashups. Mashups are applications formed by mixing up data from various other applications. They generally don't have data of their own; they rather mix up data from others and form a complete view.

With APIs it is easier to extract data than with the previously used method, web scraping, which is heavily dependent on the current structure of the site and breaks even for minor changes in layout or style.

Mashups have grown tremendously now; you can see new mashups forming almost every day. See this page, ProgrammableWeb. According to this source, right now there are 1668 mashup applications, 395 services are available as APIs, and almost 3 mashups are constructed every day.

Most of these APIs are free, but some need a paid license. Amazon requires a special license if you need to use their book search API. Still, if you can bring reasonable revenue to Amazon via orders placed through your site, then you can make some money too.

On the marketing front, exposing your site's data as API services definitely gives you a better chance of higher revenue than selling all of it by yourself. Say a local Chinese portal takes your global data and shows translated versions to its users; this increases your global reach. On a popular site for classical music discussions, related artists' tracks sold right there have a higher chance of selling than on a showcase site of the record company.

A site that shows books listed by a user's personal interests is more lucrative than a huge common-to-all showcase site. This kind of site is now easier to build with two API services: one from a site maintaining the user's personal interests (maybe manually collected preferences, or even preferences collected automatically from the user's web browsing tastes), and the other making API calls to the Amazon book store.
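
Here's a rough sketch of that two-API mashup in TypeScript. Both endpoints (interests.example.com and bookstore.example.com) are hypothetical stand-ins; the real Amazon API has its own URLs, parameters and license terms. Note that the mashup owns no data of its own, it only combines the two responses.

```typescript
interface Book {
  title: string;
  author: string;
  price: string;
}

// API call #1: the site that keeps the user's personal interests.
async function fetchInterests(userId: string): Promise<string[]> {
  const res = await fetch(`https://interests.example.com/users/${userId}/topics`);
  return (await res.json()) as string[];
}

// API call #2: a book-search service in the spirit of Amazon's.
async function searchBooks(topic: string): Promise<Book[]> {
  const res = await fetch(
    `https://bookstore.example.com/search?keywords=${encodeURIComponent(topic)}`
  );
  return (await res.json()) as Book[];
}

// The mashup: stitch the two together into one personalized book shelf.
async function personalBookShelf(userId: string): Promise<Book[]> {
  const topics = await fetchInterests(userId);
  const results = await Promise.all(topics.map((t) => searchBooks(t)));
  return results.flat();
}

personalBookShelf("user-42").then((books) =>
  books.forEach((b) => console.log(`${b.title} by ${b.author} (${b.price})`))
);
```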

If you are planning to launch a GPS website that pinpoints your position on the globe, you don't need to build the map of the world all by yourself, which is of course very tedious work. The alternative would be borrowing the map service from Google and then overlaying your positions on the map.
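
Something like this sketch, assuming the Maps JavaScript API script is already loaded on the page. The google.maps.Map and google.maps.Marker calls reflect a later version of that API (back then it was GMap2), so check Google's current docs and key setup before relying on these exact calls; the coordinates and the "map" element id are just placeholders.

```typescript
declare const google: any; // provided by Google's Maps script tag, not bundled here

interface GpsFix {
  label: string;
  lat: number;
  lng: number;
}

// Overlay your own GPS positions on Google's map instead of drawing the world yourself.
function showPositions(fixes: GpsFix[]): void {
  const map = new google.maps.Map(document.getElementById("map"), {
    center: { lat: fixes[0].lat, lng: fixes[0].lng },
    zoom: 12,
  });

  // One marker per GPS fix: your data, Google's map.
  for (const fix of fixes) {
    new google.maps.Marker({
      position: { lat: fix.lat, lng: fix.lng },
      map,
      title: fix.label,
    });
  }
}

showPositions([{ label: "Office", lat: 12.9716, lng: 77.5946 }]);
```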

So mashups are fun, faster, and fruitful too. :)

Monday, March 12, 2007

Blink!

Today I completed reading the book Blink by Malcolm Gladwell, which I bought some months back.

Blink is about the "power of thinking without thinking":
It's a book about rapid cognition, about the kind of thinking that happens in a blink of an eye. When you meet someone for the first time, or walk into a house you are thinking of buying, or read the first few sentences of a book, your mind takes about two seconds to jump to a series of conclusions. Well, "Blink" is a book about those two seconds, because I think those instant conclusions that we reach are really powerful and really important and, occasionally, really good.

The book explains how our brain, without much conscious effort, rapidly analyzes information and favors some decision. This is the reason we judge people by their looks. Sometimes what we judge is right, and sometimes it is wrong; we are mostly unable to explain why we had such a gut feeling.
Believe it or not, it's because I decided, a few years ago, to grow my hair long. If you look at the author photo on my last book, "The Tipping Point," you'll see that it used to be cut very short and conservatively. But, on a whim, I let it grow wild, as it had been when I was teenager. Immediately, in very small but significant ways, my life changed. I started getting speeding tickets all the time--and I had never gotten any before. I started getting pulled out of airport security lines for special attention.

- Author

After reading this book, you may have a clearer view of when to use this blink positively.

The Tipping Point is a previous book by the author. That too is a wonderful book, about social interactions and how a piece of news becomes a hit or a miss. Who makes it a hit? How does it spread? The tipping point is basically the point at which, if the influence lies with the right person, it becomes a hit; if it lies with the wrong one, it goes the other way.

Both books are full of real social situations and analyses of the reasons behind them. That makes reading them interesting: What made the crime rate increase or decrease? How does graffiti on the trains influence mugging (the broken window theory)? How do the least obvious things play a greater role in the result?

Finally, before finishing this note, I just remembered how I started reading this author's books. It started with a forward of this article from a friend: The Art of Failure (Why some people choke and others panic).
