Project Hercules: The source database setup

As mentioned earlier, the project Hercules is about building an incredibly well-performing Datawarehouse from a transactional database of a fictitious company.

In short, the transactional database model looks like this:

SchemaProd

How much and what kind of data is there:

Bids – 15 million bids by 1 million users

Categories – 300 with subcategories up to 3rd level

SoldItems – 8 million of the listing items

Users – 5 million unique ones

Countries – 10 countries

ListingItems – 10 million unique ones, 4 million allow bidding, 6 million do not.

 

And here is how the schema creation script looks like:

 

USE [HerculesProd]
GO
 
CREATE TABLE [dbo].[Bids]
    (
      [BidID] [int] IDENTITY(1, 1)
                    NOT NULL ,
      [ListingID] [int] NOT NULL ,
      [UserID] [int] NOT NULL ,
      [IP] [varchar](15) NULL ,
      [BidAmount] [money] NOT NULL ,
      [CreatedDate] [datetime] NOT NULL ,
      [DeletedDate] [datetime] NULL ,
      [MaxBid] [int] NULL ,
      CONSTRAINT [PK_Bids] PRIMARY KEY CLUSTERED ( [BidID] ASC )
    )
 
GO
 
CREATE TABLE [dbo].[Categories]
    (
      [CategoryID] [int] IDENTITY(1, 1)
                         NOT NULL ,
      [ParentCategoryID] [int] NULL ,
      [RootCategoryID] [int] NOT NULL ,
      [CategoryName] [varchar](255) NOT NULL ,
      [CategoryLevel] [smallint] NOT NULL ,
      CONSTRAINT [PK_Categories] PRIMARY KEY CLUSTERED ( [CategoryID] ASC )
    )
 
GO
 
CREATE TABLE [dbo].[Countries]
    (
      [CountryID] [smallint] IDENTITY(1, 1)
                             NOT NULL ,
      [CountryName] [varchar](50) NOT NULL ,
      CONSTRAINT [PK_Countries] PRIMARY KEY CLUSTERED ( [CountryID] ASC )
    )
 
GO
 
CREATE TABLE [dbo].[ListingItems]
    (
      [ListingID] [int] IDENTITY(1, 1)
                        NOT NULL ,
      [UserID] [int] NOT NULL ,
      [ShortDesc] [varchar](80) NULL ,
      [UsersRef] [varchar](600) NULL ,
      [CategoryID] [int] NOT NULL ,
      [StartDate] [datetime] NOT NULL ,
      [EndDate] [datetime] NOT NULL ,
      [Duration] [smallint] NULL ,
      [TotalBids] [smallint] NULL ,
      [Visits] [smallint] NOT NULL ,
      [ListingCreated] [datetime] NOT NULL ,
      [IP] [varchar](15) NULL ,
      [DeletedDate] [datetime] NULL ,
      [HasAuctionOption] [bit] NOT NULL ,
      [HasFixedPriceOption] [bit] NOT NULL ,
      [Quantity_start] [int] NOT NULL ,
      [Quantity_remaining] [int] NOT NULL ,
      [LastSoldItemDate] [datetime] NOT NULL ,
      CONSTRAINT [PK_ListingItems] PRIMARY KEY CLUSTERED ( [ListingID] ASC )
    )
 
GO
 
CREATE TABLE [dbo].[SoldItems]
    (
      [SaleTransactionID] [int] IDENTITY(1, 1)
                                NOT NULL ,
      [ListingID] [int] NOT NULL ,
      [SellerUserID] [int] NOT NULL ,
      [BuyerUserID] [int] NOT NULL ,
      [FinalValue] [money] NOT NULL ,
      [CreatedDate] [datetime] NOT NULL ,
      [DeletedDate] [datetime] NULL ,
      [IsSoldAsAuction] [bit] NOT NULL ,
      [FeeCreated] [bit] NULL ,
      CONSTRAINT [PK_SaleTransactionId] PRIMARY KEY CLUSTERED
        ( [SaleTransactionID] ASC )
    )
 
GO
 
CREATE TABLE [dbo].[Users]
    (
      [UserID] [int] IDENTITY(1, 1)
                     NOT NULL ,
      [Alias] [varchar](50) NOT NULL ,
      [Email] [varchar](255) NOT NULL ,
      [FullName] [varchar](50) NOT NULL ,
      [Address] [varchar](50) NOT NULL ,
      [City] [varchar](36) NOT NULL ,
      [Zipcode] [varchar](20) NOT NULL ,
      [Total_Rating] [int] NULL ,
      [DateCreated] [datetime] NOT NULL ,
      [CountryID] [smallint] NULL ,
      [Bid_Rating] [smallint] NULL ,
      [DeletedDate] [datetime] NULL ,
      [FirstName] [varchar](30) NOT NULL ,
      [LastName] [varchar](30) NOT NULL ,
      CONSTRAINT [PK_UserInfo] PRIMARY KEY CLUSTERED ( [UserID] ASC )
    )
 
GO
 
ALTER TABLE [dbo].[Bids]  WITH CHECK ADD  CONSTRAINT [FK_Bids_ListingItems] FOREIGN KEY([ListingID])
REFERENCES [dbo].[ListingItems] ([ListingID])
GO
ALTER TABLE [dbo].[Bids] CHECK CONSTRAINT [FK_Bids_ListingItems]
GO
ALTER TABLE [dbo].[Bids]  WITH CHECK ADD  CONSTRAINT [FK_Bids_UserInfo] FOREIGN KEY([UserID])
REFERENCES [dbo].[Users] ([UserID])
GO
ALTER TABLE [dbo].[Bids] CHECK CONSTRAINT [FK_Bids_UserInfo]
GO
ALTER TABLE [dbo].[ListingItems]  WITH CHECK ADD  CONSTRAINT [FK_ListingItems_Categories] FOREIGN KEY([CategoryID])
REFERENCES [dbo].[Categories] ([CategoryID])
GO
ALTER TABLE [dbo].[ListingItems] CHECK CONSTRAINT [FK_ListingItems_Categories]
GO
ALTER TABLE [dbo].[ListingItems]  WITH CHECK ADD  CONSTRAINT [FK_ListingItems_UserInfo] FOREIGN KEY([UserID])
REFERENCES [dbo].[Users] ([UserID])
GO
ALTER TABLE [dbo].[ListingItems] CHECK CONSTRAINT [FK_ListingItems_UserInfo]
GO
ALTER TABLE [dbo].[SoldItems]  WITH CHECK ADD  CONSTRAINT [FK_SoldItems_BuyerUserInfo] FOREIGN KEY([BuyerUserID])
REFERENCES [dbo].[Users] ([UserID])
GO
ALTER TABLE [dbo].[SoldItems] CHECK CONSTRAINT [FK_SoldItems_BuyerUserInfo]
GO
ALTER TABLE [dbo].[SoldItems]  WITH CHECK ADD  CONSTRAINT [FK_SoldItems_ListingItems] FOREIGN KEY([ListingID])
REFERENCES [dbo].[ListingItems] ([ListingID])
GO
ALTER TABLE [dbo].[SoldItems] CHECK CONSTRAINT [FK_SoldItems_ListingItems]
GO
ALTER TABLE [dbo].[SoldItems]  WITH CHECK ADD  CONSTRAINT [FK_SoldItems_SellerUserInfo] FOREIGN KEY([SellerUserID])
REFERENCES [dbo].[Users] ([UserID])
GO
ALTER TABLE [dbo].[SoldItems] CHECK CONSTRAINT [FK_SoldItems_SellerUserInfo]
GO
ALTER TABLE [dbo].[Users]  WITH CHECK ADD  CONSTRAINT [FK_Users_Countries] FOREIGN KEY([CountryID])
REFERENCES [dbo].[Countries] ([CountryID])
GO
ALTER TABLE [dbo].[Users] CHECK CONSTRAINT [FK_Users_Countries]
GO

As a next step, I will be using the SQL Data Generator by Redgate to populate data in the database for the past 24 months.

 

Work on itWell, now...OKGoodGood job! (No Ratings Yet)

If you give a rating less than the maximum, you must leave a comment with suggestions on how to improve the post. :)

Loading ... Loading ...

 

Project Hercules: delivering value to business, part 1

It has been quiet for a while on this blog, however a new project is on the way: project Hercules.

The idea is to develop a sample project of analysis techniques by using SQL Server 2012 (the BI stack) and show different ways of effectively adding value to business.

The data setup is simple (here are some business rules):

  • a fictitious company with its data is created for the purpose of blogging.
  • the company owns a retail site where several million users have login profiles
  • users can sell items to each other
  • the items can be sold by bidding, by fixed price or by volume (there are volume discounts)

The goal of this blog series is to show how an effective data analysis can provide insights into business decision making and how to track and channel user behaviour.

Some of the data analysis techniques described will be:

  • cohort analysis – segregating the users by their activity over a defined time periods and comparing the results
  • RFM segmentation – RecencyFrequencyMonetary segmentation based on user activity and before and after analysis from marketing promos
  • predictive analysis of user behaviour
  • financial reporting over various time periods with sliding windows, medians and aggregations
  • others…

The main focus of the blog series will be the technology part, and specifically the business insights available by using SSAS in SQLServer 2012. Also, data visualization techniques will be a main task.

In the next blog post of the series I will describe the database model of the transactional database and some techniques for offloading data to the DW.

Later on I will discuss the DW model itself, the pre-aggregations and the building of the cubes.

Work on itWell, now...OKGoodGood job! (1 votes, average: 5.00 out of 5)

If you gave a rating less than the maximum, you must leave a comment with suggestions on how to improve the post. :)

Loading ... Loading ...

 

The Server is too old, really?

image001

 

This is one of the greatest SQL Server notifications I have ever seen in the past years. The funny thing is that I got this notification while I was using SSMS 2012 and was trying to replicate some data from 2012 to 2012.

Work on itWell, now...OKGoodGood job! (No Ratings Yet)

If you give a rating less than the maximum, you must leave a comment with suggestions on how to improve the post. :)

Loading ... Loading ...