As part of the Microsoft Scholars Program, I am asked to offer some suggestions for Bing’s products. I have noticed that most people at MSRA seem to use Google. Why don’t they use Microsoft’s own Bing? Here are some technical comparisons.

Exact Matching Capability

The query is “b4: experience with a globally-deployed software defined wan”, the title of a SIGCOMM 2013 paper that was announced around June.

In Google, autocomplete suggests the full title after only three words have been typed. The first result is the paper itself, the second is the official SIGCOMM website, and all 10 results on the first page are relevant.

[Screenshot: b4 experience with a globally-deployed software defined wan - Google Search]

Bing cannot autocomplete this query. After typing the full query and pressing Enter, the first result is the same as Google’s, but only 3 of the 10 results on the first page are relevant.

Adding “SIGCOMM 2013” to the query makes 9 of the 10 results on Bing’s first page relevant. This shows that Bing has indexed these pages but could not retrieve them, because it lacks precise matching for long sentences.

[Screenshot: b4 experience with a globally-deployed software defined wan - Bing]

In search engines, long sentence matching is a difficult task because it is hard to build an efficient index for it. In traditional search engine technology, articles and queries are split into “words”. Although the relevance ranking algorithm can take the proximity of keywords into account, achieving the effect of a whole-sentence search is still difficult. I haven’t done research in this area, so I will leave it to my colleagues at Bing to study :)
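To make the word-based indexing idea concrete, here is a minimal sketch of a positional inverted index with exact phrase lookup. It is only an illustration of the general technique, not how Bing or Google actually work; the function names and the two sample documents are made up for this example.

```python
from collections import defaultdict

def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (hyphens split words)."""
    out, word = [], []
    for ch in text.lower():
        if ch.isalnum():
            word.append(ch)
        elif word:
            out.append("".join(word))
            word = []
    if word:
        out.append("".join(word))
    return out

def build_positional_index(docs):
    """Map each word to {doc_id: [positions of that word in the document]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][doc_id].append(pos)
    return index

def phrase_search(index, phrase):
    """Return sorted doc ids in which the phrase's words occur consecutively."""
    words = tokenize(phrase)
    if not words:
        return []
    # Candidate documents must contain every word of the phrase.
    candidates = set(index.get(words[0], {}))
    for w in words[1:]:
        candidates &= set(index.get(w, {}))
    hits = []
    for doc_id in candidates:
        # Keep only start positions that are followed by each later word.
        starts = set(index[words[0]][doc_id])
        for offset, w in enumerate(words[1:], start=1):
            positions = set(index[w][doc_id])
            starts = {s for s in starts if s + offset in positions}
        if starts:
            hits.append(doc_id)
    return sorted(hits)

# Hypothetical documents: the second contains the same words in a different order.
docs = {
    "paper": "B4: Experience with a Globally-Deployed Software Defined WAN",
    "other": "Experience deployed globally: a software WAN, defined by B4",
}
index = build_positional_index(docs)
print(phrase_search(index, "globally-deployed software defined wan"))  # ['paper']
```

Both documents contain every query word, so a pure bag-of-words match cannot tell them apart. Only the position check separates them, and its cost grows with the length of the phrase, which hints at why long-sentence matching is expensive.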

Complex Page Crawling Capability

Taking the same paper as an example, the following link is the SIGCOMM 2013 program, which lists all accepted papers.

http://conferences.sigcomm.org/sigcomm/2013/program.php

This page ranks second in Google’s results, but I could not find it in Bing no matter how I searched. Searching for the URL directly shows that Bing has indexed the page, but the cached copy contains only the navigation bar and no content. The problem is clear: the main content of this page is loaded by Ajax, so the crawler needs to simulate user clicks and execute JavaScript to obtain it. Bing evidently did not do this. As more and more web pages load their content with Ajax, failing to crawl them effectively will seriously hurt the completeness of the search engine’s index.
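As a rough illustration, a crawler could use a cheap check like the one below to decide which pages to hand to a JavaScript-rendering step. This is only a sketch of one possible heuristic: the class name, the 200-character threshold, and the sample HTML are all assumptions, and it does not execute any JavaScript itself.

```python
from html.parser import HTMLParser

class StaticTextProbe(HTMLParser):
    """Count visible text characters and <script> tags in static HTML."""
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0     # > 0 while inside <script> or <style>
        self.script_count = 0
        self.text_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_count += 1
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Script and style bodies are not visible text, so they are skipped.
        if self.skip_depth == 0:
            self.text_chars += len(data.strip())

def needs_rendering(html, min_text_chars=200):
    """Guess whether the page's real content is filled in by scripts.

    The threshold is an arbitrary assumption for this sketch.
    """
    probe = StaticTextProbe()
    probe.feed(html)
    probe.close()
    return probe.script_count > 0 and probe.text_chars < min_text_chars

# Hypothetical page: a navigation bar plus an empty container filled by a script.
ajax_page = (
    '<html><body><ul><li>Home</li><li>Program</li></ul>'
    '<div id="content"></div>'
    '<script>loadProgram("#content");</script></body></html>'
)
print(needs_rendering(ajax_page))  # True
```

Pages flagged this way would then be fetched again with a headless browser so that the script-generated content can be indexed.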

“Box Computing”

Search for “University of Science and Technology of China” in Google and Bing respectively:

[Screenshots: University of Science and Technology of China - Google Search; University of Science and Technology of China - Bing]

Google’s results page clearly looks much better than Bing’s. On the right there is structured information about the University of Science and Technology of China, and the results section also has a “News for University of Science and Technology of China” block that aggregates news about the university. Google also shows structured information for formulas, stocks, weather, sports events and so on, which is very convenient. Here I borrow the term “box computing” from Baidu: in principle, it means displaying structured information when the query matches certain specific keywords. Baidu combines the display of structured information with paid search promotion, which is convenient for users, brings in revenue, and does not give people grounds to accuse it of “manipulating search results”. If Bing displayed more structured information, its pages would not look so monotonous.
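The principle described above, matching specific query patterns and returning structured information, can be sketched as a small rule table. The two rules, the card format, and all names here are hypothetical and only meant to show the shape of the idea.

```python
import operator
import re

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def weather_card(m):
    # A real system would query a weather service; this sketch only echoes the city.
    return {"type": "weather", "city": m.group("city")}

def calculator_card(m):
    a, b = float(m.group("a")), float(m.group("b"))
    if m.group("op") == "/" and b == 0:
        return None  # no card for division by zero
    return {"type": "calculator", "result": OPS[m.group("op")](a, b)}

# Hypothetical trigger rules: (pattern over the normalized query, card builder).
RULES = [
    (re.compile(r"^weather in (?P<city>[a-z ]+)$"), weather_card),
    (re.compile(r"^(?P<a>\d+(?:\.\d+)?)\s*(?P<op>[-+*/])\s*(?P<b>\d+(?:\.\d+)?)$"),
     calculator_card),
]

def structured_card(query):
    """Return a structured card for the query, or None to show plain results."""
    normalized = " ".join(query.lower().split())
    for pattern, build in RULES:
        m = pattern.match(normalized)
        if m:
            return build(m)
    return None

print(structured_card("Weather in  Hefei"))  # {'type': 'weather', 'city': 'hefei'}
print(structured_card("12 * 3"))             # {'type': 'calculator', 'result': 36.0}
print(structured_card("b4 sigcomm 2013"))    # None
```

Queries that match no rule simply fall through to the ordinary result list, so adding a new kind of box only means adding a rule.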

Query Expansion

Continuing with the “University of Science and Technology of China” example: in the result snippets, Google highlighted both “University of Science and Technology of China” and “USTC”, which shows that Google treats them as the same term and performs query expansion. Bing highlighted only the query term itself. Digging deeper reveals that Bing also does query expansion, treating University of Science and Technology of China and USTC as synonyms, but at least it did not do so on this page.

Without query expansion, Bing does not have enough information when extracting page summaries, so the summaries it displays look like randomly extracted sentences.
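The link between query expansion and summary quality can be shown with a small sketch. The synonym table, the scoring rule, and the sample page text are assumptions made up for illustration; real engines learn synonyms from data and score snippets in far more elaborate ways.

```python
import re

# Hypothetical synonym table.
SYNONYMS = {
    "university of science and technology of china": ["ustc"],
    "ustc": ["university of science and technology of china"],
}

def expand_query(query):
    """Return the normalized query followed by its known synonyms."""
    q = " ".join(query.lower().split())
    return [q] + SYNONYMS.get(q, [])

def best_snippet(page_text, terms):
    """Pick the sentence mentioning the most terms, or '' if none matches."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", page_text) if s.strip()]

    def score(sentence):
        lowered = sentence.lower()
        return sum(lowered.count(t) for t in terms)

    best = max(sentences, key=score, default="")
    return best if best and score(best) > 0 else ""

def highlight(snippet, terms):
    """Wrap every occurrence of any term in [brackets], longest terms first."""
    terms = [t for t in terms if t]
    if not terms:
        return snippet
    pattern = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    return re.sub(pattern, lambda m: "[" + m.group(0) + "]", snippet, flags=re.IGNORECASE)

page = "Admissions open in spring. USTC is located in Hefei. Contact the office for details."
query = "University of Science and Technology of China"

expanded = expand_query(query)
print(highlight(best_snippet(page, expanded), expanded))  # [USTC] is located in Hefei.
print(repr(best_snippet(page, [query.lower()])))          # ''
```

With only the literal query, no sentence on this page matches at all, so an engine has nothing to anchor the summary on and must fall back to an arbitrary sentence.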

Page Weight and Content Quality

Continuing with the above example: Google gives higher weight to wiki and encyclopedia-type websites, so the results it returns are more informative. As for why such websites get a higher weight, I don’t know… I also don’t know whether Bing uses PageRank (the patent, as far as I know, belongs to Stanford and is licensed to Google) or a similar algorithm.
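For reference, the basic idea behind PageRank can be sketched as a power iteration over the link graph. This is the textbook formulation only; nothing here is known to reflect how Google or Bing actually weight encyclopedia sites, and the three-page graph is invented for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    if n == 0:
        return {}
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank

# Hypothetical graph: two sites link to a wiki page, which links back to one of them.
ranks = pagerank({"blog": ["wiki"], "forum": ["wiki"], "wiki": ["blog"]})
print(max(ranks, key=ranks.get))  # wiki
```

In this toy graph the wiki page ranks highest simply because more pages link to it, which is one plausible reason widely cited encyclopedia pages end up with high weight.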

Chinese Word Segmentation

Below is a query made of a full Chinese sentence. It is said that someone searching for it got some “discordant” content.

[Screenshots: Can MP3 be used after being washed with water - Google Search; Can MP3 be used after being washed with water - Bing]

For sentence queries like this, the quality of word segmentation is crucial. Bing segmented the sentence into “MP3 / was / washed / afterwards / can / not / can / use”. The only effective query words are “MP3” and “washed”, while several common words interfere (it seems that Bing did not treat these common words as stop words), so the search results are naturally poor.

To show how much word segmentation matters, I entered the sentence into Google segmented the way Bing segments it, and the quality of the results dropped sharply.

[Screenshot: MP3 was washed afterwards can not can use - Google Search]

As long as “can or cannot” is considered a word (i.e., “mp3 was washed afterwards can or cannot use”), Google’s search results basically return to their original state:

[Screenshot: mp3 was washed afterwards can or cannot use - Google Search]
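The effect of treating “can or cannot” as one word can be sketched with greedy forward maximum matching, a common dictionary-based segmentation method. For readability this sketch works on the English glosses used above instead of Chinese characters; the lexicon entry and the stop-word list are assumptions, and this is not a description of Bing’s or Google’s segmenter.

```python
# Hypothetical lexicon of multi-unit words, written with the English glosses
# from the text; a real segmenter would work on Chinese characters.
LEXICON = {
    ("can", "not", "can"): "can-or-cannot",
}
STOP_WORDS = {"was", "afterwards"}  # assumed stop-word list
MAX_WORD_LEN = max(len(key) for key in LEXICON)

def merge_longest_match(units):
    """Greedy forward maximum matching: prefer the longest lexicon entry."""
    out, i = [], 0
    while i < len(units):
        for size in range(min(MAX_WORD_LEN, len(units) - i), 1, -1):
            key = tuple(u.lower() for u in units[i:i + size])
            if key in LEXICON:
                out.append(LEXICON[key])
                i += size
                break
        else:
            # No multi-unit word starts here; keep the single unit.
            out.append(units[i])
            i += 1
    return out

def effective_terms(tokens):
    """Drop stop words so that only informative terms reach the index lookup."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

units = "MP3 was washed afterwards can not can use".split()
tokens = merge_longest_match(units)
print(tokens)                   # ['MP3', 'was', 'washed', 'afterwards', 'can-or-cannot', 'use']
print(effective_terms(tokens))  # ['MP3', 'washed', 'can-or-cannot', 'use']
```

Once the three units are merged into one word and the stop words are dropped, the query is reduced to a handful of meaningful terms instead of a string of very common words.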

Bing Dictionary

A bit off-topic, I also want to mention a small problem with Bing Dictionary. The advertisement at the bottom sometimes fails to display, and what appears instead looks like a cropped piece of the IE error page. I recommend replacing it with a decorative image, or simply leaving the space blank, when the advertisement image cannot be loaded.