
Data Warehouse Manager

The Data Warehouse (DW) Manager leads all DW activities, including oversight of the design and development of the new warehouse, management of current and future reporting requirements, and oversight of the Extract, Transform and Load (ETL) processes.

The perceived strength of data warehousing within an organization will be the sum of the strengths of its Project Managers. Project Managers must deliver on their commitments and must deliver on time. They do this by drawing resources from within the data warehouse team, and from consultancies as necessary, and by establishing partnerships with the other internal support organizations needed to support a data warehouse iteration. A Project Manager delivers by:

• Maintaining a highly detailed plan and obsessively caring about the progress on it.
• Applying personal skill and judgment to everything on the project. This is a real value-add of the Project Manager. It is the Project Manager’s job to exercise relevant discretion.
• Matching team members’ skills and aspirations as closely as possible to tasks on the plan.
• Tracking all relevant metrics for each iteration:
– Project Plan milestones
– Issues list
– Adherence to change control practices
– Adherence to source code control practices
– Documentation fit for users and support personnel
– Adherence of architectural components to standards and fitness for purpose
– Regression testing performed and tests updated based on changes
– Team members well matched to their tasks and developing their careers

What do you mean by data warehouse? Explain the query manager.

Data Warehouse :

In computing, a data warehouse (DW or DWH), or an enterprise data warehouse (EDW), is a system used for reporting and data analysis. Integrating data from one or more disparate sources creates a central repository of data, the data warehouse. Data warehouses store current and historical data and are used to create trend reports for senior management, such as annual and quarterly comparisons.

The data stored in the warehouse is uploaded from the operational systems (such as marketing and sales). The data may pass through an operational data store for additional processing before it is used in the DW for reporting.

A data warehouse maintains a copy of information from the source transaction systems. This architectural complexity provides the opportunity to:

  • Consolidate data from multiple sources into a single database so a single query engine can be used to present data.
  • Mitigate the lock contention that arises in transaction processing systems when large, long-running analysis queries are run against transaction processing databases.
  • Maintain data history, even if the source transaction systems do not.
  • Integrate data from multiple source systems, enabling a central view across the enterprise. This benefit is always valuable, but particularly so when the organization has grown by merger.
  • Improve data quality, by providing consistent codes and descriptions, flagging or even fixing bad data.
  • Present the organization’s information consistently.
  • Provide a single common data model for all data of interest regardless of the data’s source.
  • Restructure the data so that it makes sense to the business users.
  • Restructure the data so that it delivers excellent query performance, even for complex analytic queries, without impacting the operational systems.
  • Add value to operational business applications, notably customer relationship management (CRM) systems.
  • Make decision-support queries easier to write.
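
Several of these points (consolidating sources, consistent codes, keeping history) can be sketched in a few lines of Python using the standard sqlite3 module. This is only a minimal illustration under stated assumptions: the source extracts, the customer_fact table, its columns, and the gender codes are all invented for the example and do not come from any particular warehouse product.

```python
import sqlite3

# Hypothetical source extracts; the system names, fields, and codes are
# invented for this illustration.
SALES_ROWS = [("C001", "M", 120.0), ("C002", "F", 80.0)]
CRM_ROWS = [("C001", "male", "gold"), ("C003", "female", "silver")]

# Map each source's gender coding onto one consistent warehouse code.
GENDER_CODES = {"M": "M", "F": "F", "male": "M", "female": "F"}


def build_warehouse(load_date):
    """Load both sources into one in-memory table, keeping the load date."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_fact ("
        "customer_id TEXT, gender TEXT, source TEXT, "
        "amount REAL, tier TEXT, load_date TEXT)"
    )
    for cust, gender, amount in SALES_ROWS:
        conn.execute(
            "INSERT INTO customer_fact VALUES (?, ?, 'sales', ?, NULL, ?)",
            (cust, GENDER_CODES[gender], amount, load_date),
        )
    for cust, gender, tier in CRM_ROWS:
        conn.execute(
            "INSERT INTO customer_fact VALUES (?, ?, 'crm', NULL, ?, ?)",
            (cust, GENDER_CODES[gender], tier, load_date),
        )
    conn.commit()
    return conn


conn = build_warehouse("2024-01-31")
# One query engine now answers a question that spans both sources.
rows = conn.execute(
    "SELECT gender, COUNT(DISTINCT customer_id) FROM customer_fact "
    "GROUP BY gender ORDER BY gender"
).fetchall()
print(rows)
```

Storing a load_date with every row is what would let the warehouse keep history across repeated loads, even if the source systems overwrite their own records.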

Query Manager : 

The query manager is the system component that performs all the operations necessary to support the query management process. The system is typically constructed using a combination of user access tools, data warehouse monitoring tools, native database facilities and shell scripts.

The query manager performs the following operations:

– Direct queries to the appropriate table

– Schedule the execution of the user queries
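
The two operations above can be illustrated with a toy Python class. This is a rough sketch, not how any real query manager is built: the QueryManager class, the table names, and the numeric priority scheme are all assumptions made for the example. Routing is modelled as picking a pre-aggregated summary table when one exists, and scheduling as a simple priority queue.

```python
import heapq
import itertools


class QueryManager:
    """Toy query manager: routes queries to a table and schedules them."""

    def __init__(self, summary_tables):
        # Maps a grouping level to a pre-aggregated table (hypothetical names).
        self.summary_tables = summary_tables
        self._queue = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order

    def route(self, level):
        """Pick the summary table for this level, else the detail table."""
        return self.summary_tables.get(level, "sales_detail")

    def submit(self, user, level, priority):
        """Queue a query; a lower priority number runs earlier."""
        table = self.route(level)
        heapq.heappush(self._queue, (priority, next(self._counter), user, table))

    def run_all(self):
        """Pop queries in scheduled order and return (user, table) pairs."""
        order = []
        while self._queue:
            _, _, user, table = heapq.heappop(self._queue)
            order.append((user, table))
        return order


qm = QueryManager({"month": "sales_by_month", "region": "sales_by_region"})
qm.submit("analyst", "day", priority=5)     # no summary table: goes to detail
qm.submit("director", "month", priority=1)  # served from the monthly summary
qm.submit("planner", "region", priority=5)
schedule = qm.run_all()
print(schedule)
```

The director's query runs first because of its priority, and the two priority-5 queries keep their submission order.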

What is data mining? What are the functions of data mining? Write about association analysis with an example.

Data Mining :

Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information – information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.

Data mining is primarily used today by companies with a strong consumer focus – retail, financial, communication, and marketing organizations. It enables these companies to determine relationships among “internal” factors such as price, product positioning, or staff skills, and “external” factors such as economic indicators, competition, and customer demographics. It also enables them to determine the impact on sales, customer satisfaction, and corporate profits. Finally, it enables them to “drill down” into summary information to view detailed transactional data.
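
As a small illustration of finding a relationship between two fields, the Python sketch below computes the Pearson correlation between price and units sold. The figures are invented for the example, and Pearson correlation is just one simple measure chosen here; real data mining tools apply many other techniques.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented figures: an "internal" factor (price) against sales volume.
prices = [1.0, 2.0, 3.0, 4.0]
units_sold = [40, 30, 20, 10]

r = pearson(prices, units_sold)
print(r)
```

A value close to -1 indicates that sales fall as price rises in this made-up data set; a value near 0 would indicate no linear relationship.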

Functions of Data Mining :

1. Class Description
2. Association
3. Classification
4. Prediction
5. Clustering
6. Time-series analysis

Association Analysis :

The purpose of association analysis is to find patterns, particularly in business processes, and to formulate suitable rules of the form “If a customer buys product A, that customer also buys products B and C”.

Example: If a customer buys mozzarella at the supermarket, that customer also buys tomatoes and basil.

Association analysis also helps you to identify cross-selling opportunities. For example, you can use the rules resulting from the analysis to place associated products together in a catalog, in the supermarket, or in the Web shop, or apply them when targeting a marketing campaign for product C at customers who have already purchased product A.
Association analysis determines these rules by using historical data to train the model. You can display and export the resulting association rules.
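
The mozzarella rule can be checked against historical baskets with two standard measures, support and confidence. These measures are not named in the text above and are introduced here as the usual way of scoring an association rule. The following Python sketch uses invented shopping baskets and hypothetical helper names, so treat it as an illustration rather than the method of any specific tool.

```python
# Invented shopping baskets for illustration only.
BASKETS = [
    {"mozzarella", "tomatoes", "basil"},
    {"mozzarella", "tomatoes", "basil", "bread"},
    {"mozzarella", "tomatoes"},
    {"bread", "butter"},
    {"tomatoes", "basil"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets with the antecedent, the share that also hold the consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


# Rule: "if mozzarella, then tomatoes and basil".
rule_support = support({"mozzarella", "tomatoes", "basil"}, BASKETS)
rule_confidence = confidence({"mozzarella"}, {"tomatoes", "basil"}, BASKETS)

print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

In this made-up data, two of the five baskets contain all three products, and two of the three mozzarella baskets also contain tomatoes and basil. A rule would normally be kept only if both values exceed thresholds chosen by the analyst.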