Cool online enterprise search demos

A new site called Open Test Search has set up demos of several enterprise search engines and lets you try them out online. Have a look at http://www.opentestsearch.com/ .

Posted in Uncategorized | Leave a comment

China launching state-run Google competitor “Panguso”

China is definitely going its own way when it comes to search. Data from the Chinese research firm iresearch.cn already showed that the local search engine Baidu took 73% of China’s search engine revenue in Q4 2010.

Today the plot thickens: the state-run China Mobile and the state-run news agency Xinhua have together launched Panguso, a new Chinese internet-scale search engine.

This comes only days after the Chinese government agency “State Administration for Industry and Commerce” opened what appears to be China’s first anti-monopoly investigation into Baidu’s business practices, following a request from the Chinese encyclopedia website Hudong.com. Baidu may have lost the government’s favor.

Posted in Web search | Comments Off

Google Search Appliance going for the cloud

Google has released a new version of the Search Appliance with a “Cloud Connect” feature that enables unified search of your Google Docs, Google Sites and Twitter from the GSA. The results can then be merged with information about the people in the organization taken from LDAP/Microsoft AD, and with more traditional sources like files and email.

The Google Sites function is especially interesting because it allows you to build vertical search engines on top of the Google index. For example, you can create a collection of blogs and industry websites and see the results in the GSA, all without having to crawl them yourself. You could probably also add the whole Google index as a Google Site and display Google results together with your own. That is great when searching for technical documentation that may have newer versions on the web.

Read more at http://googleenterprise.blogspot.com/2010/10/new-google-search-appliance-bridge-to.html

Posted in Enterprise search | Comments Off

Is Exalead moving forward? Got money from Dassault Systèmes

I have been a fan of Exalead for a long time. They have great technology that is normally not available in web search, such as support for queries with regular expressions, wildcards, phonetic search and proximity search. See http://www.exalead.com/search/web/search-syntax/ for the full list.

Exalead is one of the few organizations that could compete with Google if they wanted to. They have recently been acquired by Dassault Systèmes. Let’s cross our fingers and hope this means they will have the funding to make a run at the global search market. Even if they only managed to grab a 1% market share, they would have 1.5% of Google’s money, which is a lot.

Unfortunately there are few public statistics about the size of their market share, but they appear to be big in France.

Read more about what is happening to Exalead here: http://blog.exalead.com/2010/07/26/exalead-at-the-forefront-of-search-and-innovation-in-europe/

Posted in Enterprise search | 1 Comment

Markov Chain spam

Markov chain software is getting popular among search engine spammers as a way to create pseudo-random text that is unique. It uses a random process with the property that the next state depends only on the current state.
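
To make the "next state depends only on the current state" idea concrete, here is a minimal sketch of a word-level Markov chain text generator. It is an illustration only, not the software the spammers use: the function names `build_chain` and `generate` and the toy corpus are made up for this example, and real tools typically use longer word histories and far larger corpora.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def generate(chain, start, length, rng=random):
    """Walk the chain: each next word is picked using only the current word."""
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:  # dead end: this word is never followed by anything
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


# Toy corpus (hypothetical); a spammer would feed in scraped articles instead.
corpus = "the crawler fetches the page and the indexer parses the page"
print(generate(build_chain(corpus), "the", 8))
```

Because every adjacent word pair in the output also occurs somewhere in the source text, the result tends to look locally fluent even though it has no overall meaning.
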
They can then hire an Indian “SEO” company from eBay to create thousands of blogs that contain these texts.

The Markov chain software has recently gotten quite good. For example, the paragraph below is automatically generated.

Captcha generally (but not always)solve the problem of comment and other spam. But this comes at a price. Users with low visibility and other disablities find solving captcha hard. And blind users cant solve it unless you provide an alternative audio captcha. Why, even Seth hates it!

I am not sure even humans can decide with 100% assurance that this is spam. I have been fighting this for a while. I recently came across this thesis written by Ben O’Connor: http://maths.dur.ac.uk/Ug/projects/library/CM3/000424248r.pdf (short version: http://www.fmnetwork.org.uk/files/spam.pdf ). We are looking into implementing it. If we have any luck I will post an update here.

Further reading about Markov chains and their use in search engine spam:

Markov Chains [Spam that Search Engines like - Pt 1]: http://en.kerouac3001.com/markov-chains-spam-that-search-engines-like-pt-1-5.htm
Posted in Web search | Comments Off

AlltheWeb going down

Those who have been in the search industry for a while probably remember AlltheWeb, the internet arm of the Norwegian enterprise search company Fast Search & Transfer. AlltheWeb never really took off, but it did give Google a run for its money. Since 2004 it has been owned by Yahoo, but it has had a somewhat independent life of its own, apparently using the Yahoo index but displaying different search results and offering some more tools.

Today it appears that this is coming to an end. All searches on AlltheWeb.com are now being redirected to search.yahoo.com .

I think it is sad to see the last remnant of this internet pioneer disappear.

Posted in Web search | Comments Off

Detecting metasearch

From time to time I see someone proclaiming their “new” search engine. For me, working with search technology, it is interesting to know whether this is a genuinely new search engine based on their own technology, or just a metasearch of Google/Yahoo/Bing.

To test for this you can search for “your ip” in the search engine. The search results will then include pages that show the IP address of the visitor. In a search result snippet, that is the IP address of the crawler bot that fetched the page.

Lo and behold, the IP belongs to Google Inc: http://whois.domaintools.com/66.249.71.77 . This means it is a metasearch of Google, which is normally not so interesting to me.
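
The manual whois lookup can be partly automated. Below is a rough sketch that pulls IPv4 addresses out of a result snippet and checks them against a table of crawler networks. The table, the names `KNOWN_CRAWLER_NETWORKS` and `find_crawler_owners`, and the single network range listed are assumptions made for this example (the range was chosen to cover the address above); verify any range against current whois data before relying on it.

```python
import ipaddress
import re

# Illustrative table only: this network is an assumption picked to cover the
# address mentioned in the post. Check whois for the real, current ranges.
KNOWN_CRAWLER_NETWORKS = {
    "Google": [ipaddress.ip_network("66.249.64.0/19")],
}

# Loose pattern for dotted quads; validity is checked by ipaddress below.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_crawler_owners(snippet_text):
    """Return {ip: owner} for each IPv4 address in the snippet inside a known network."""
    matches = {}
    for candidate in IPV4_PATTERN.findall(snippet_text):
        try:
            address = ipaddress.ip_address(candidate)
        except ValueError:  # looks like an IP but is not valid, e.g. 999.1.1.1
            continue
        for owner, networks in KNOWN_CRAWLER_NETWORKS.items():
            if any(address in network for network in networks):
                matches[candidate] = owner
    return matches


print(find_crawler_owners("Your IP address is 66.249.71.77"))
```

An empty result does not prove the engine runs its own crawler; it only means none of the addresses in the snippets fell inside the ranges you listed.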

Posted in Tip | Comments Off

Is it hard to write your own search engine?

From time to time I meet developers who are contemplating writing their own search engine from scratch.

Writing a successful web search engine, at least, is hard. It is like doing many startups at once. There are currently startups working on spam labeling, data clusters, cloud storage, distributed search and hardware monitoring.

As a search company you will have to do all of these parts, and preferably be as good as the big players. Anna Patterson has written a good article about this. It is from 2004, but I still feel it is relevant.

Anna Patterson, Why Writing Your Own Search Engine Is Hard:
http://queue.acm.org/detail.cfm?id=988407

There is also a thread on this at Sirdf: http://www.sirdf.com/forum/viewtopic.php?t=5

Posted in Search technology | Comments Off

Microsoft: Drops *nix support in next Fast ESP release

Bjørn Olstad, CEO at Fast, posted yesterday that the next version of Fast ESP will not run on Linux or Unix:
http://blogs.msdn.com/b/enterprisesearch/archive/2010/02/04/innovation-on-linux-and-unix.aspx .

This leaves a lot of users without an upgrade path, and can be a great opportunity for the competition.

Posted in Enterprise search | Comments Off