Azure vs. inhouse SQL Server - the Wiki project setup

Recently I have been trying to make sense of cloud computing, and specifically of Windows Azure.

I ran some performance tests of SQL Azure and identified some bottlenecks.

As luck would have it, during a chat with Alan Smith we decided to take on a project comparing the capabilities of an in-house SQL Server with those of SQL Azure.

To be honest, I felt very happy to accept the challenge!

Finally, there would be a practical comparison of Azure and in-house SQL Server.

So here is the challenge:

I wrote to Alan: What is the main goal with the wiki project we are doing? Is it to have the best performing searches through the wiki articles for the best cost? Can we write a quick project objective just so we know we are on the same page?
Alan responded: My plan was to create a very basic text search by indexing the word counts in the articles, so that if you search for “computer” the top result will be the article with the most instances of the word “computer” in it. I also plan to create a tag cloud, where if you select “computer” it will get the top 10 pages with “computer” in them, and then create tags for other words based on popularity.
The goal is to have a quick and accurate search, so I can have a website in Azure that will show the search query time in seconds, like Google does. It will also serve as a good scenario where table storage can be used for large data sets and give an idea of best practices when working with “big data” in Azure.
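To make Alan's description concrete, here is a minimal in-memory sketch of the word-count search and the tag cloud. It is only an illustration under stated assumptions: the function names (`tokenize`, `build_index`, `search`, `tag_cloud`), the sample articles, and the simple lowercase tokenizer are all hypothetical, and nothing here reflects how the project actually stores data in SQL Server or Azure table storage.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into runs of ASCII letters (an assumed, naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(articles):
    """Build word -> Counter({title: occurrences}) from a dict of title -> article text."""
    index = defaultdict(Counter)
    for title, text in articles.items():
        for word, count in Counter(tokenize(text)).items():
            index[word][title] = count
    return index


def search(index, word, limit=10):
    """Return up to `limit` (title, count) pairs, article with the most occurrences first."""
    return index.get(word.lower(), Counter()).most_common(limit)


def tag_cloud(index, articles, word, pages=10, tags=5):
    """Take the top `pages` articles for `word`, then rank the other words found in them."""
    totals = Counter()
    for title, _ in search(index, word, pages):
        totals.update(tokenize(articles[title]))
    totals.pop(word.lower(), None)  # the selected word itself is not a tag
    return totals.most_common(tags)


# Hypothetical sample data, standing in for real wiki articles.
sample_articles = {
    "Computer": "A computer is a machine. A computer runs programs.",
    "Program": "A program runs on a computer.",
    "Banana": "A banana is a fruit.",
}
sample_index = build_index(sample_articles)
print(search(sample_index, "Computer"))
print(tag_cloud(sample_index, sample_articles, "Computer", tags=3))
```

On real articles, very common words such as "a" or "the" would dominate the tag cloud, so a practical version would probably filter out stop words before counting.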

The idea is to take the entire data dump of the English Wikipedia: http://dumps.wikimedia.org/enwiki/latest/. The articles are split across 27 files, ranging from

enwiki-latest-pages-articles1.xml-p000000010p000010000.bz2
to
enwiki-latest-pages-articles27.xml-p029625017p037804211.bz2

The files vary in size.
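Since the dump files are bz2-compressed XML, they can be read as a stream without unpacking them to disk first. The following is a rough sketch using only the Python standard library. The helper names `local_name` and `iter_articles` are made up for illustration, and the code assumes the usual MediaWiki export layout in which each `page` element contains a `title` and a revision `text`. It is not the import method used in this project.

```python
import bz2
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_articles(path):
    """Yield (title, text) pairs from a bz2-compressed dump file, one page at a time."""
    with bz2.open(path, "rb") as stream:
        title, text = None, ""
        # iterparse reads the decompressed stream incrementally instead of loading it all.
        for _, elem in ET.iterparse(stream, events=("end",)):
            name = local_name(elem.tag)
            if name == "title":
                title = elem.text
            elif name == "text":
                text = elem.text or ""
            elif name == "page":
                yield title, text
                title, text = None, ""
                elem.clear()  # drop the page's children to limit memory use
```

A call such as `for title, text in iter_articles("enwiki-latest-pages-articles1.xml-p000000010p000010000.bz2"): ...` would then walk the first file page by page, assuming it has been downloaded into the working directory.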

I am using a dual-core Acer desktop with 8 GB of RAM (4 GB of which is used by SQL Server 2012) and a lousy home-use HDD; I think it is a Seagate at 7,200 RPM.

Let’s see how the challenge goes.

Stay tuned for more details. And here is the next part of Azure vs. inhouse SQL Server: setting up the database and importing the data.