SCALABLE SCHEDULING OF UPDATES IN STREAMING DATA WAREHOUSES

We discuss update scheduling in streaming data warehouses, which combine the features of traditional data warehouses and data stream systems. In our setting, external sources push append-only data streams into the warehouse with a wide range of interarrival times. While traditional data warehouses are typically refreshed during downtimes, streaming warehouses are updated as new data arrive. We model the streaming warehouse update problem as a scheduling problem, where jobs correspond to processes that load new data into tables, and whose objective is to minimize data staleness over time (at time t, if a table has been updated with information up to some earlier time r, its staleness is t minus r). We then propose a scheduling framework that handles the complications encountered by a stream warehouse: view hierarchies and priorities, data consistency, inability to preempt updates, heterogeneity of update jobs caused by different interarrival times and data volumes among different sources, and transient overload. A novel feature of our framework is that scheduling decisions do not depend on properties of update jobs (such as deadlines), but rather on the effect of update jobs on data staleness. Finally, we present a suite of update scheduling algorithms and extensive simulation experiments to map out factors which affect their performance.
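The staleness definition above (at time t, a table loaded with data up to time r has staleness t minus r) can be written down directly. The following is a minimal sketch, not code from the paper: the function names, the optional per-table priority weights, and the choice to sum staleness across tables are assumptions made for illustration.

```python
from typing import Dict, Optional


def staleness(now: float, freshness: float) -> float:
    """Staleness at time `now` of a table that holds data up to `freshness`."""
    if freshness > now:
        raise ValueError("freshness cannot lie in the future")
    return now - freshness


def total_staleness(
    now: float,
    freshness_by_table: Dict[str, float],
    priority: Optional[Dict[str, float]] = None,
) -> float:
    """Priority-weighted sum of staleness over all tables.

    The weights are a hypothetical way to express table priorities;
    a table without an entry in `priority` gets weight 1.0.
    """
    priority = priority or {}
    return sum(
        priority.get(name, 1.0) * staleness(now, fresh)
        for name, fresh in freshness_by_table.items()
    )
```

For example, a table last loaded with data up to time 7 has staleness 3 at time 10. A scheduler in this style would compare candidate updates by how much they lower this quantity rather than by per-job deadlines.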

Existing System:

Streaming data warehouses enable real-time decision support for business-critical applications that receive streams of append-only data from external sources. Applications include:
• Online stock trading, where recent transactions generated by multiple stock exchanges are compared against historical trends in nearly real time to identify profit opportunities;
• Credit card or telephone fraud detection, where streams of point-of-sale transactions or call details are collected in nearly real time and compared with past customer behavior;
• Network data warehouses maintained by Internet Service Providers (ISPs), which collect various system logs and traffic summaries to monitor network performance and detect network attacks.

Proposed System:

There has also been work on supporting various warehouse maintenance policies, such as immediate (update views whenever the base data change), deferred (update views only when queried), and periodic.
However, there has been little work on choosing which table to update next among all the tables that are out of date because new data have arrived. This is exactly the problem we study in this paper.
Immediate view maintenance may appear to be a reasonable solution for a streaming warehouse (deferred maintenance increases query response times, especially if high volumes of data arrive between queries, while periodic maintenance delays updates that arrive in the middle of the update period).
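To make the "which table next" question concrete, here is a hedged sketch of one plausible greedy rule: pick the out-of-date table whose update yields the largest priority-weighted staleness reduction per unit of estimated run time. This is an illustration under stated assumptions, not necessarily one of the paper's algorithms; the `Table` fields and the scoring formula are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Table:
    name: str
    priority: float        # hypothetical importance weight
    freshness: float       # time up to which the table has been loaded
    newest_arrival: float  # timestamp of the newest data waiting to be loaded
    est_run_time: float    # estimated update duration, assumed > 0


def staleness_reduction(table: Table) -> float:
    """How much staleness drops if all pending data are loaded now."""
    return max(0.0, table.newest_arrival - table.freshness)


def pick_next(tables: Sequence[Table]) -> Optional[Table]:
    """Return the out-of-date table with the best benefit per unit time."""
    candidates = [t for t in tables if staleness_reduction(t) > 0]
    if not candidates:
        return None  # every table is up to date
    return max(
        candidates,
        key=lambda t: t.priority * staleness_reduction(t) / t.est_run_time,
    )
```

Because updates cannot be preempted, dividing by the estimated run time keeps a long job from being chosen merely because it has accumulated a large backlog; whether that trade-off is right depends on the workload.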

Software Requirements:

.NET
Front End – ASP.NET
Language – C#.NET
Back End – SQL Server
Windows XP

Hardware Requirements:

RAM : 512 MB
Hard Disk : 80 GB
Processor : Pentium IV

FUTURE ENHANCEMENT:

The main feature of our framework is the ability to reserve resources for short jobs that often correspond to important frequently refreshed tables, while avoiding the inefficiencies associated with partitioned scheduling techniques.
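One way to picture reserving resources for short jobs without strict partitioning is sketched below. It assumes a fixed set of execution tracks, some of which are reserved: short jobs may run on any free track, while long jobs are limited to unreserved ones. The function name, the track model, and the run-time threshold are assumptions for illustration and are not taken from the paper.

```python
from typing import Optional, Set


def assign_track(
    run_time: float,
    free_tracks: Set[int],
    reserved_tracks: Set[int],
    short_threshold: float,
) -> Optional[int]:
    """Pick a free track for a job, or return None if it must wait.

    Short jobs try reserved tracks first and spill over to general
    tracks, so general capacity is not left idle while short jobs queue.
    Long jobs never occupy a reserved track.
    """
    free_reserved = sorted(free_tracks & reserved_tracks)
    free_general = sorted(free_tracks - reserved_tracks)
    if run_time <= short_threshold:
        choices = free_reserved + free_general
    else:
        choices = free_general
    return choices[0] if choices else None
```

With track 0 reserved and a threshold of 5.0, a job with run time 1.0 is placed on track 0 when it is free and on a general track otherwise, while a job with run time 10.0 waits if only track 0 is free.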

