Wednesday, December 05, 2007

How to get friendly URLs

URLs are the face of a website: they are indexed by search engines, and they are what other sites use to link to yours. If your URLs change, you lose any search ranking advantage you have built up.

The technology behind websites changes often; what happens to the URLs then?

The item-list page of a shopping site might change over time like this:

www.shopping.com/shoppinglist.html

www.shopping.com/shoppinglist.jsp

www.shopping.com/shoppinglist.php

www.shopping.com/shoppinglist.jsf

www.shopping.com/shoppinglist.seam

Also, most URLs are not readable, or are too long once parameters get added to them: http://shopping.com/list.jsp?itemid=1234

It would be more readable if it were: http://shopping.com/list/item/1234

The advantage here is not just readable URLs; you are also abstracting your URLs from the implementation. That insulates them from technology changes, and hides whether the page is served via HTTP GET (parameters in the URL) or HTTP POST.

Samples:

http://mail.google.com/mail/#inbox

http://mail.google.com/mail/#sent

You can also have permalinks like http://shopping.com/deals/today

Enough talk about the advantages; let's see how we can get this working.

Apache HTTP Server has a module named mod_rewrite (http://httpd.apache.org/docs/1.3/mod/mod_rewrite.html) which transforms URLs by rewriting them. It can match any pattern in an incoming URL and replace it with a different one, using regular expressions to find and replace patterns.
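
For example, a single rule can expose the friendly URL from earlier while the page itself stays a JSP. A minimal sketch in Apache configuration, using the hypothetical shopping.com paths from above:

    # Rewrite the friendly path onto the real JSP and its query parameter.
    RewriteEngine On
    RewriteRule ^/list/item/([0-9]+)$ /list.jsp?itemid=$1 [L]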

There is a similar implementation in Java: http://tuckey.org/urlrewrite/
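
With UrlRewriteFilter, the same mapping lives in WEB-INF/urlrewrite.xml. A minimal sketch, again with the hypothetical paths from above:

    <urlrewrite>
        <rule>
            <!-- friendly URL in, real JSP out (an internal forward) -->
            <from>^/list/item/([0-9]+)$</from>
            <to>/list.jsp?itemid=$1</to>
        </rule>
    </urlrewrite>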

These modules are efficient, but at the same time complex to learn. So you could always implement your own URL rewrite module.

The web.xml can have a servlet mapping with url-pattern / which acts as a front controller and forwards to the appropriate URL, as sketched below.
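
Here is a minimal sketch of such a home-grown front controller in Java. The class name and the /list/item pattern are my own illustration, not a standard API:

    import java.io.IOException;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class RewriteServlet extends HttpServlet {
        // With url-pattern "/", the whole request path arrives as the servlet path.
        private static final Pattern ITEM = Pattern.compile("^/list/item/(\\d+)$");

        @Override
        protected void service(HttpServletRequest req, HttpServletResponse resp)
                throws ServletException, IOException {
            Matcher m = ITEM.matcher(req.getServletPath());
            if (m.matches()) {
                // Forward internally; the browser keeps seeing the friendly URL.
                req.getRequestDispatcher("/list.jsp?itemid=" + m.group(1))
                   .forward(req, resp);
            } else {
                resp.sendError(HttpServletResponse.SC_NOT_FOUND);
            }
        }
    }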

There are some limitations in this approach.

to be continued..

Friday, September 14, 2007

Feeds and REST-ful URL Schemes

Feeds are also known as RSS. They were once an extra feature, but now they are an integral part of content sharing between sites. With improvements to the Google Reader interface, it's getting easier to read everything in one place rather than scattering across multiple browser tabs. Further, you can share your favorite links as RSS feeds, so integrating that content into your blog, or sharing it with friends, is easier.

All this seems like a step towards the semantic web.

While on the subject of feeds, full-story feeds are what I prefer, as I don't have to leave my feed reader to get the full information; having just the topic in the feed is not worth the subscription. As Scoble said, Facebook is becoming a huge aggregator. Google is also catching up, with a lot of feed integration between their apps; their own social network site (Orkut) is catching up with feeds and other Facebook-like changes. Yahoo has a service named Pipes, which helps create custom feeds out of web pages.

As the number of feeds grows, one thing I noticed is that REST-style URLs are being used commonly. This might be because a REST-style URL supports natural customization of feeds: adding words to the URL is intuitive, unlike a complex URL:

somesite.com/feed/all

somesite.com/feed/history

somesite.com/feed/today

somesite.com/feed/userName/tag

All of them use this kind of URL rather than the complex query-parameter style. Check the Google Picasa Web feeds, the del.icio.us feeds, etc.

Though REST as an architectural style is not widely adopted, REST-style URLs are widely accepted and are solving part of the problem on the web.

Tuesday, June 12, 2007

Faking the performance

Read this article to see how applications fake their performance reports:

http://blogs.msdn.com/oldnewthing/archive/2005/03/11/394249.aspx

Applications that push part of their work into system start-up slow the start-up terribly. This is evil: the user loses time even if he is not going to use that application. Applications that want such a performance benefit should at least consider using idle CPU cycles to do the work, so the user waits less before he can start working.

The application could have a process that monitors CPU usage and triggers the appropriate pre-loader program so the application itself starts fast. But if more applications want to preload this way, there will be that many more processes running just to watch for free CPU cycles.
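
A minimal sketch of such a monitor in Java; the class name, the load threshold, and the poll interval are my own illustration:

    import java.lang.management.ManagementFactory;
    import java.lang.management.OperatingSystemMXBean;

    public class IdlePreloader implements Runnable {
        private static final double IDLE_LOAD = 0.2;   // assumed "idle" threshold
        private final Runnable preloadTask;

        public IdlePreloader(Runnable preloadTask) {
            this.preloadTask = preloadTask;
        }

        public void run() {
            OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
            try {
                // Wait for a quiet moment instead of slowing down start-up.
                // (getSystemLoadAverage() returns -1 where unsupported, so this
                // sketch would then just preload immediately.)
                while (os.getSystemLoadAverage() > IDLE_LOAD) {
                    Thread.sleep(5000);
                }
                preloadTask.run();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // give up quietly
            }
        }

        public static void main(String[] args) {
            new Thread(new IdlePreloader(new Runnable() {
                public void run() { System.out.println("warming caches..."); }
            })).start();
        }
    }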

If the OS provided an asynchronous loader (which loads registered components when it finds idle CPU time), it would be easier still. Shifting this work from each application to the OS would give the benefit without a monitor process per application, as in the earlier case.

Wednesday, March 14, 2007

With mashups, webapps are becoming legacy

It might have all started with screen scraping of legacy systems. Screen scraping is a technique that reads a legacy system's user interface and uses it as the input interface to a newly developed system.

If it's for legacy systems, why is it being applied to web sites these days? With the fast growth in application development, web applications themselves become legacy!

Web scraping could be termed the process of extracting a piece of information of interest from a web page online. Recently, significant work has been going on around the things needed to take webapps to the next level.

Web scraping is a lot easier than screen scraping of legacy systems. The output of a web app is HTML, which can be represented as a DOM tree and navigated easily by machines/bots.

Yes, it's easier to navigate, but is it easier to locate an item of interest? Not really. HTML is mostly about styling: it says how the data should appear to the user. Usually a page contains little data and many styling demarcations added for presentation, like <b> for bold and <u> for underline. Beyond these, a lot of other styling code is mixed in with the actual data the page shows, so it's tough for a machine to separate data from style information.
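
A minimal sketch of that fragility: a scraper keyed to today's markup (the class name and the price format here are made up) silently stops matching the moment the styling changes.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class PriceScraper {
        // Assumes the page currently renders prices as <b class="price">$12.99</b>.
        private static final Pattern PRICE =
                Pattern.compile("<b class=\"price\">\\$([0-9.]+)</b>");

        public static String extractPrice(String html) {
            Matcher m = PRICE.matcher(html);
            return m.find() ? m.group(1) : null; // null once the layout changes
        }

        public static void main(String[] args) {
            System.out.println(extractPrice("Sale! <b class=\"price\">$12.99</b>"));
        }
    }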

GreaseMonkey might be the first tool released that helps people customize web pages on the client side. The next time you don't like the blue background on the MSN home page, you can change it before the page renders in your browser. It's simple in functionality, but you need to know the DOM structure (the tree representation of the web page). Later, people started posting their scripts on the web (http://userscripts.org/).

Chickenfoot is another recent tool on the rise. Writing scripts for it doesn't require knowledge of the DOM representation. Read my earlier post on this; I tried both of these out some time back.

These are just the start of the road to our dream, the Semantic Web. The Semantic Web is all about adding meaning to the data that is mingled with style information across web sites. If a consultant puts his appointment list online, a web crawler scanning it should make sense of it rather than just seeing numbers and text: that it is calendar data, and that it belongs to him.

A web page is seen as proprietary information of the site's owner, so extracting part of it and using it elsewhere is a copyright or legal issue. But lately this outlook is changing: owners are at least willing to share, even if not for free. Sites like Google Maps, Flickr, del.icio.us, and Amazon provide an alternative API that fetches the information you would usually get only by browsing their web pages.

These alternate APIs are the way for bots to extract the data they want from a website. This is one step towards the semantic web, where data is published on the web in a directly machine-readable form; here the alternate route to the data is provided as an API service. These API calls are generally SOAP calls, as part of a web service, though the debate between REST architecture and SOAP RPC goes on. This kind of API interaction within an enterprise system, when rightly modeled and built, is called SOA (Service-Oriented Architecture).
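
By contrast with scraping, an API or feed call hands the bot structured data directly. A minimal sketch, assuming a hypothetical feed endpoint that returns RSS-style <item> elements:

    import java.io.InputStream;
    import java.net.URL;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.NodeList;

    public class FeedClient {
        public static void main(String[] args) throws Exception {
            URL feed = new URL("http://somesite.com/feed/today"); // hypothetical
            InputStream in = feed.openStream();
            try {
                Document doc = DocumentBuilderFactory.newInstance()
                        .newDocumentBuilder().parse(in);
                // Every <item> is addressable by name; no styling to fight through.
                NodeList items = doc.getElementsByTagName("item");
                for (int i = 0; i < items.getLength(); i++) {
                    System.out.println(items.item(i).getTextContent().trim());
                }
            } finally {
                in.close();
            }
        }
    }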

As more websites exposed their data as web services via APIs, the remix style of application came online. They were called mashups. Mashups are applications formed from a mix of data from various other applications. They generally have no data of their own; they mix data from others into a complete view.

With APIs it's easier to extract data than with the previously used method, web scraping, which depends heavily on the current structure of the site and breaks with even minor changes in layout or style.

Mashups have grown enormously now; you can see new mashups forming almost every day. See this page, the Programmable Web. According to this source, right now there are 1668 mashup applications and 395 services available as APIs, and almost 3 mashups are constructed every day.

Most of these APIs are free, but some need a paid license. Amazon requires a special license if you want to use their book search API. But if you can bring reasonable revenue to Amazon via orders placed through your site, you can make some money too.

On the marketing front, exposing your site's data as API services definitely increases your chance of higher revenue over selling all of it yourself. Say a local Chinese portal takes your global data and shows translated versions to its users: that increases your global reach. On a popular site for classical music discussions, selling related artists' tracks right there has a higher chance of a sale than a showcase site run by the record company does.

A site that shows books matched to a user's personal interests is more lucrative than a huge common-to-all showcase site. Such a site is now easier to build with two API services: one from a site maintaining the user's personal interests (manually collected preferences, or even tastes collected automatically from the user's web browsing), and the other calling Amazon's book store.

If you are planning to launch a GPS website that pinpoints your position on the globe, you don't need to build a map of the world all by yourself; that would of course be very tedious work. The alternative is borrowing the map service from Google and overlaying your positions on its map.

So mashups are fun, faster, and fruitful too. :)

Monday, March 12, 2007

Blink!

Today I finished reading the book Blink by Malcolm Gladwell, which I bought some months back.

Blink is about the "power of thinking without thinking".
It's a book about rapid cognition, about the kind of thinking that happens in a blink of an eye. When you meet someone for the first time, or walk into a house you are thinking of buying, or read the first few sentences of a book, your mind takes about two seconds to jump to a series of conclusions. Well, "Blink" is a book about those two seconds, because I think those instant conclusions that we reach are really powerful and really important and, occasionally, really good.

Our brain, without much conscious effort, rapidly analyzes information and favors some decision. This is why we judge people by their looks. Sometimes that judgment is right, sometimes wrong, and we are mostly unable to explain why we had such a gut feeling.
Believe it or not, it's because I decided, a few years ago, to grow my hair long. If you look at the author photo on my last book, "The Tipping Point," you'll see that it used to be cut very short and conservatively. But, on a whim, I let it grow wild, as it had been when I was a teenager. Immediately, in very small but significant ways, my life changed. I started getting speeding tickets all the time--and I had never gotten any before. I started getting pulled out of airport security lines for special attention.

- Author

After reading this book, you may have a clearer view of when to use this blink positively.

The Tipping Point is a previous book by the author. That too is a wonderful book, about social interactions and how a piece of news becomes a hit or a miss. Who makes it a hit? How does it spread? The tipping point is basically the point at which, if the influences are with the right person, it becomes a hit; under the wrong one, it goes the other way.

Both books are full of real social situations and analysis of the reasons behind them. That makes reading them interesting: What made the crime rate increase or decrease? How does graffiti on trains influence mugging (the theory of the broken window)? How do the least obvious things play the greater role in the result?

Finally, before finishing this note, I remember how I started reading this author's books: it started with a friend forwarding this article - The Art of Failure (Why some people choke and others panic).

Thursday, February 22, 2007

Net Neutrality

http://www.google.com/help/netneutrality.html
Today the Internet is an information highway where anybody – no matter how large or small, how traditional or unconventional – has equal access. But the phone and cable monopolies, who control almost all Internet access, want the power to choose who gets access to high-speed lanes and whose content gets seen first and fastest. They want to build a two-tiered system and block the on-ramps for those who can't pay.

Eric Schmidt (Google)

Many industry leaders, including Tim Berners-Lee, have spoken on this.

So far only some of the states in the USA have approved this bill (Maryland, California, etc.).
