Technology Trends

Google

After PageRank, Here Comes LexRank

Today, if you want to know what’s going on in the world, you can watch TV, read your newspaper or use the Internet to browse news sites. But imagine a day when you just have to enter a few words on your computer, such as “Olympic Games,” push a button, and read an automatic, and accurate, summary of what major sources say about this specific subject. This is the goal of a project which started at the University of Michigan and is explained by Technology Research News in “Summarizer ranks sentences.” This new multi-document summarization technique, named LexRank, looks for similarities among sentences and rates them via a concept of ‘prestige score’ analogous to the one used by Google’s PageRank. “In a sense, sentences vote for each other just by virtue of being similar to each other,” said one of the researchers. This algorithm may also be applied to automatic translation and question answering in a year or two.


Let’s start with a description of the project.


Researchers from the University of Michigan have developed a multi-document summarization technique that compares sentences and has the effect of sentences voting for the most important among them. The method, dubbed LexRank, combines the content-sorting concepts of prestige and lexical similarity to find the most important sentences in a group of documents on the same subject.

Algorithms that use prestige to sort information have been around since the ’90s. It is possible to find the most prestigious, or popular member of a network by analyzing the relationships among network members. In a social network, for example, the most prestigious individual can be identified by analyzing the social relations among all pairs of members of the group.
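To make the idea of prestige in a network more concrete, here is a minimal sketch of the simplest possible measure: counting how many other members point to each member. The names and the "who nominates whom" relations are hypothetical, and this plain count is only an illustration, not the measure used by the researchers.

```python
from collections import Counter

# Hypothetical "who nominates whom" relations in a small social network.
# Each pair reads: the first member names the second as a contact.
nominations = [
    ("ann", "bob"), ("carl", "bob"), ("dina", "bob"),
    ("bob", "ann"), ("dina", "ann"), ("ann", "carl"),
]

def degree_prestige(edges):
    """Count how many nominations each member receives."""
    return Counter(target for _, target in edges)

prestige = degree_prestige(nominations)
most_prestigious, votes = prestige.most_common(1)[0]
print(most_prestigious, votes)
```

A simple count like this treats every nomination equally. PageRank-style methods go one step further and give more weight to nominations coming from members who are themselves prestigious.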

Now, let’s look in more detail at how the LexRank algorithm uses similarities among sentences.


The researchers’ lexical centrality algorithm compares the lexical similarity of sentences. “Lexical similarity can be thought of as a measure of the word overlap between two sentences,” said Gunes Erkan [, one of the researchers.] “For example, ‘Bush went to China’ and ‘George Bush visited China’ are fairly similar in a lexical way [but] ‘Bush visited China’ and ‘Blair is the prime minister of the United Kingdom’ have no overlap at all,” he said.
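The word-overlap idea in the quote above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: it uses a plain Jaccard overlap on lowercase word sets, and the function names `tokens` and `word_overlap` are mine. The LexRank papers describe a cosine-based similarity measure, so treat this as a simpler stand-in.

```python
import re

def tokens(sentence):
    """Lowercase the sentence and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))

def word_overlap(a, b):
    """Jaccard overlap: shared words divided by all distinct words."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

similar = word_overlap("Bush went to China", "George Bush visited China")
unrelated = word_overlap(
    "Bush visited China",
    "Blair is the prime minister of the United Kingdom",
)
print(similar, unrelated)
```

The first pair shares two words ("Bush" and "China") out of six distinct ones, while the second pair shares none, which matches the intuition in the example given by Erkan.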

The researchers’ system considers a sentence important if it is similar to many other sentences and if those other sentences are themselves important. “In a sense, sentences vote for each other just by virtue of being similar to each other,” said Dragomir Radev [, an assistant professor at the University of Michigan.] “The sentences with the highest scores… are considered to contain the gist of the document and are presented as the multi-document summary,” he said.
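Here is a rough sketch of how this kind of voting could work, assuming a PageRank-style iteration over a graph that links sentences whose word overlap exceeds a threshold. The threshold, the damping factor, the number of iterations and the sample sentences are all my own assumptions, and the similarity function is a simple Jaccard overlap rather than the measure described in the papers.

```python
import re

def tokens(sentence):
    """Lowercase the sentence and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))

def overlap(a, b):
    """Jaccard word overlap between two sentences (assumed measure)."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta and tb else 0.0

def rank_sentences(sentences, threshold=0.1, damping=0.85, iterations=50):
    """Score sentences by letting similar sentences vote for each other."""
    n = len(sentences)
    # neighbors[i] lists the sentences similar enough to i to vote for it.
    neighbors = [
        [j for j in range(n)
         if j != i and overlap(sentences[i], sentences[j]) >= threshold]
        for i in range(n)
    ]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbor splits its current score among its own neighbors.
            votes = sum(scores[j] / len(neighbors[j]) for j in neighbors[i])
            new_scores.append((1 - damping) / n + damping * votes)
        scores = new_scores
    return scores

sentences = [
    "Bush visited China on Monday",
    "George Bush went to China",
    "Bush met leaders in China",
    "Blair is the prime minister of the United Kingdom",
]
scores = rank_sentences(sentences)
for score, sentence in sorted(zip(scores, sentences), reverse=True):
    print(round(score, 3), sentence)
```

In this toy example, the three sentences about Bush and China vote for each other and end up with higher scores than the isolated sentence about Blair, which receives no votes. A sentence with no neighbors simply keeps the small baseline score, which is a simplification a real implementation would probably handle more carefully.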

This algorithm is already used for a Web-based news summarization site, NewsInEssence. Please note that this is an experimental site and that it is not always online. If you cannot access it from the previous link, try this one.


LexRank could have other uses.


The researchers are also looking for other uses of the lexical centrality algorithm. Possibilities include automatic translation and question answering, said Radev. The method could potentially find sentences that are likeliest to contain the answer to a given natural language question, or, in the biomedical domain, sentences that are most likely to contain important facts like particular protein interactions, said Radev.

The research work was presented in July 2004 during the Empirical Methods in Natural Language Processing (EMNLP 2004) conference held in Barcelona, Spain. Please check the EMNLP 2004 Proceedings if you’re interested in the subject.


And for more information, here are links to two technical documents about LexRank, “LexPageRank: Prestige in Multi-Document Text Summarization” (PDF format, 7 pages, 84 KB) and “LexRank: Graph-based Lexical Centrality as Salience in Text Summarization” (PDF format, 23 pages, 272 KB).


Will LexRank one day become as popular as PageRank is today? We’ll know in a year or two.


Sources: Kimberly Patch, Technology Research News, April 20/27, 2005; and various websites


Related stories can be found in the following categories.



  • Databases

  • Google

  • Internet

  • Search

  • Software


Quarter Pounder Mac mini Cluster (with cheese)

Big companies usually don’t like it when people such as you and me criticize their products or make fun of their brands. You probably remember the two British guys who, some weeks ago, posted a spoof advertisement on the Web suggesting Volkswagen’s cars were so tough they could withstand a suicide bombing. They were threatened by the company, which later agreed to drop action against them. Now, a group of Italians has designed a single Web page making fun of three of the biggest brands on the planet: Apple, Google and McDonald’s.


Let’s start with the image created by the people at red-lobster.it



So how did these Italians design this Quarter Pounder Mac mini Cluster, or ‘Photoshop’ cluster? For just $2,000, they put together four Mac minis to create the slowest supercomputer in the world, still reaching one teraflops, which is obviously impossible, but funny.


Recently, they connected their cluster to a Google Mini search appliance — which is also known online for looking like a block of cheese.


And so the “Quarter Pounder Mac mini Cluster (with cheese)” was born.


What will happen next? Will armies of U.S. lawyers sue these Italian guys? Or will Apple, Google and McDonald’s smile? It’s hard to know.


And just for fun, here is a link to a list of trademarks owned by McDonald’s Corporation. These trademarks include “Hamburger University” and “Super Size.”


Sources: Various websites


Related stories can be found in the following categories.



  • Apple

  • Google

  • Humor

