Oracle Data Warehousing Unleashed


- 4 -

Surefire Ways to Make Your Warehouse Fail


Do you need to read this chapter? Try this simple formula: If you think you don't, then you do! "It can't happen to me" is so overused it is a cliché. "It can't happen to me" is what everyone thinks before it happens to them. "It can't happen to me" is the first step down the road to project disaster.

On the other hand, a healthy respect for the potential for project failure and a little knowledge about where others have (repeatedly) run into trouble should help you choose a path to project success. You do not have to go down any of the nasty paths to project failure that have already been explored by others before you.

Just think of this chapter as a set of highway warning signs (see Figure 4.1) on some of the branches on the road to success: They read "dead end road," "bridge out," "swamp road," "ambush and carjacking ahead." Why would you want to go that way?

FIGURE 4.1. The data warehouse road to success.

On the other hand, maybe you would like to sour the career and/or take over the project of a colleague you don't like. Or maybe you have an uppity subordinate you can't get rid of? See if you can find a way to get them to try a few of the things in this chapter. (It might be best if you are subtle and can get them to think these ideas are their own, since they are really going to hate the results.)

But whatever you do, don't go here yourself, and be very suspicious of the motivations of anyone who seems to be trying to nudge you into making any of the mistakes mentioned in this chapter; maybe they just don't know any better--or maybe they read this chapter, too!

Take a look at Figure 4.2; this could be you!

FIGURE 4.2. This could be you.


NOTE: Some of the material in this chapter is presented tongue in cheek. I got the idea one day while I was thinking about how to write about data warehouse methodology. This led me to think about the fact that many data warehouse project failures can ultimately be traced back to some fairly common mistakes in methodology and practices--technology itself usually isn't the real problem.
Then I thought: "Some of these mistakes get repeated so often, you'd think people wanted to fail!" This led me to think: "Great, I'll write a chapter on how to make a warehouse fail. This chapter can underscore in a light-hearted way the 'dark side' of the previous chapter on methodologies and project management." Please read on and have some fun; but notice, there is a serious message here as well. The real failure rate for data warehousing is very high, and I'd hate to see that happen to you. Please, please avoid the pitfalls in this chapter, have a successful data warehouse, and "live long and prosper."


CAUTION: Realize from the note that this is tongue in cheek--DO NOT do these things!!! This chapter is all about what NOT to do!!!
If you are actually thinking of doing any of the nasty, negative things I suggest in this chapter to kill off a perfectly good project and ruin someone's career, shame on you! Not only that, but if you get caught doing it, guess who is going to do the suffering? Remember, I have let the cat out of the bag with this book, so you can no longer plead ignorance...
Remember, too, Master Yoda's excellent advice: "Beware the Dark Side, Luke. Once you start down that path, forever will it dominate your destiny."

What Is a Fatal Error (or What Is Project Success)?

Before we can talk about project failure, we need to talk about project success. A successful data warehouse project includes the following elements:

Obviously, there are many other criteria of success you might want your data warehouse to include. If you think about it, some of those criteria are actually restatements of the above general goals, or are details that support these more general goals. There may be some additional goals that are intrinsic to your particular situation (and some general ones I have no doubt overlooked).

In any case, a successful data warehouse should at least have all the above characteristics. Conversely, a warehouse project that fails to meet any of the above criteria is a failure, no matter what other qualities it may possess.

It really isn't difficult to create a failure. It may be painful and expensive, but not difficult. This chapter lists some surefire ways to squeeze out the positive characteristics listed above--if failure is your goal.

On the other hand, if you would like your data warehouse project to succeed, the mistakes discussed in this chapter should be avoided at all cost.

This chapter provides you a (relatively) short list of particularly virulent mistakes to avoid, not an exhaustive catalog of all the mistakes that were ever made or could be made. If you do avoid the mistakes in this chapter, your chances for project success are dramatically improved.

Good luck!

Fatal Errors--A Non-Exhaustive List

In this section of the chapter, I list and discuss a number of errors that seem to just keep getting repeated on one data warehouse project after another. Most of these errors have, historically, tended to be fatal. Despite the positive press being given to data warehouse success stories, there is an extremely high failure rate in data warehouse projects. Many of these failures are attributable to the errors listed here.

Neglect Executive Sponsorship

For a data warehouse project to succeed, it must have active, informed executive sponsorship. Executive sponsorship is not optional. Executive sponsorship is not "just for big projects." It is essential to all data warehouse projects.

For more information on executive sponsorship, see Chapter 3, "Methodology and Project Management," and see the discussion on the Magilla Gorilla Factor in Chapter 6, "Defining Your Data."

Avoid Sharing the Project with an End-User Project Manager

If you want your data warehouse project to fail, avoid sharing responsibility for the project between a technical project manager and an end-user project manager. If an end-user project manager is appointed in spite of your best efforts to the contrary, try to make sure they function in name only. Keep him or her as ignorant as possible, because that will help you keep the reins of power in your own able hands.

Many end-user managers will help you implement this strategy, because they often have been assigned this job on top of their existing one, and have limited time to invest in this "extracurricular activity."


TIP: To get the end user to opt out of active participation, sympathize with how much they have to do back at their usual job and offer to "take care of things" for them.


Because this person may be clever enough to recognize an effort to cut them out when they see it, you probably want to do this on a case-by-case basis, rather than proposing it as a blanket strategy. That way you are "just being helpful" rather than trying to make a run on their turf.


It may also be helpful to drop a lot of computer acronyms and generally spout out a lot of computing gobbledygook. You will know you are starting to succeed when the end user's eyes glaze over and/or they start to fidget and look at their watch.


Sharing project responsibility with an end-user project manager who has equal responsibility for and stake in the project success will go a long way toward creating project success. The end-user project manager will:


CAUTION: If the organization has been wise enough to appoint a capable individual as end-user project manager and has made it his full-time job, it will be difficult to induce project failure by cutting him out of the loop.


In the first place, this is likely to be an individual savvy enough to recognize an end-run when he sees it and capable enough to deal with it (that would mean trouble for you). In the second place, a person who has this as his full-time job will be measured by its success; such a person will be motivated to make it succeed and is, therefore, less likely to become passive and leave it in the hands of others.


In a case like this, you might be best advised to look elsewhere in this chapter for other ways to cause the project trouble, or failing that, you might just go ahead and let it succeed!


Limit End-User Involvement

Here's a mistake in which both IT/IS and the end-user community can share equally (and often do): Neither one really wants to deal with the other.

On the IT/IS side of the house, end users are a nuisance. They persist in speaking in business terms, and, worse, they talk about boring business events (instead of the latest technology). They can never make up their minds, and are always running off to deal with some business problem. Worst of all, they can't get their computer terms right. (NO! A mainframe is NOT a CPU!)

On the end-user side of the house, computer people are more than a nuisance, they are a pain. They persist in speaking in acronyms, and worse, they talk about (yawn) computer stuff (instead of how to run the business). They are constantly waffling on everything, and are always running off to fix some system that has crashed. Worst of all, they can't get their business terms right. (NO! Profit is NOT a nifty new software package!)

Let's step back for a moment and gain some perspective. The purpose of the data warehouse is to provide end-user decision makers with access to data that will facilitate their making decisions--at least that ought to be the goal. Data that facilitates end-user decision making is by definition relevant, timely, comprehensible, and readily explored and manipulated--by the end users.

How, exactly, will a project team determine what is relevant, timely, comprehensible, and readily explored/manipulated by end users without their intense, continual involvement?

Virtually every successful data warehouse (or decision support application) I have ever seen involved intense end-user participation throughout the entire life of the project.

Many of the serious project problems that I have observed can be traced to failing to get hands-on user involvement in some aspect of the project.


NOTE: Project success is particularly affected by the degree to which end users are involved with their data and their tools from the outset. When users can see and handle the implementation of their requirements in prototype form and in pilot projects before committing to large scale development, the results tend to reflect real end-user requirements. That makes for happy customers and successful projects.

Use a Classic "Waterfall" Project Methodology

Want your warehouse project to fail? Want to be the object of derision and/or hatred by your clients? The classic "waterfall" type project methodologies are a perfect vehicle to achieve this dubious condition.

Insist on the project team using a classic "waterfall" type project methodology instead of the less formal, interactive, cyclical methodology recommended in this book (and used by most competent data warehouse practitioners today).

If you have operational experience that goes back for a while, the waterfall approach is probably what you are used to and understand. The waterfall is easier to represent in project management tools like Microsoft Project. If the classic waterfall life-cycle methodology was good enough for building an operational system (payroll or whatever), then it is good enough for data warehousing! Besides, you don't understand anything else, and neither does your manager....

The immediate result of using a waterfall methodology is that the project bogs down in endless quarrels with the clients, a blizzard of change requests, and protracted delivery, as unending meetings are held to "get things under control" and programmers struggle to implement nonsensical specifications that are not worth the paper on which they are printed.

The end result will likely be that the warehouse will not meet end-user needs--if it is ever deployed at all.


CAUTION: If you have followed the waterfall sign-off process rigorously, you will have the end-user's signature on meticulously detailed written specifications that turn out to not represent what the end user (and the business) need at all. Because you have their signature, you think you have them (that's what counts, isn't it?), but they will hate you, and the business needs of the organization will not be met.


Guess what this leads to on projects with the cost and visibility of a data warehouse?


Now suppose you would like a project that looks modern while still retaining the substance, disadvantages, and failure potential of a waterfall methodology. A good way to achieve this is to include on your project plan a few cosmetic "JAD sessions" and "prototyping" activities, while weighing the entire project down with excessive formal deliverables and formal sign-offs (including for the prototyping). This will create a lot of focus on process rather than on product, which will make all the less technically adept politicians and "meeting goers" on the team happy. They will have something to do that they think they understand, and it will give them a sense--although not necessarily the substance--of being in control. It will also regularly disrupt project progress with overhead to formalize the innumerable deliverables, delays for formal reviews, and then more delays because some key player or players are on vacation, in the hospital, or just too busy to sign off and get on with things.

Use a Technical Project Manager without Data Warehouse Experience

One of the surest ways to induce project failure is to appoint a project manager who does not understand and has no prior experience with data warehousing concepts and methodologies. Usually such a person will have managed operational system projects using a "waterfall" project methodology.

The considerations, methods, and CSFs (critical success factors) endemic to data warehousing are significantly different from those in operational systems. A veteran of operational projects will never figure out the differences in time to save the project. If you are trying to kill both the data warehouse and this guy's career, appointing him to do this job without extensive retraining and without step-by-step mentoring by someone who has gone this way before is a sure way to do it.

Freeze the Specification as Early as Possible

For poor results and truly dissatisfied clients, freeze the specifications as early as possible. Then limit or bar changes until the warehouse is delivered. If you permit changes at all, use a formal change control mechanism that requires sign-off by numerous persons and is as cumbersome as possible. This will "cover" you later when (not if) the end users don't like what they get. They signed off on it, didn't they? What's the problem?

Early specification freezing is a characteristic of the waterfall methodology discussed previously in this chapter. It originated to limit change in an era of slow coding and batch compiling with low productivity programming tools. (And in a business climate that was less dynamic than the one we have today!)

Specification freezing is mentioned as a separate topic here because it is so ingrained in the thinking of "classic" project managers. It crops up again and again in project plans that have supposedly been built on "new methodologies." Specification freezing contributes a great deal to what is fundamentally wrong with the waterfall methodology for data warehousing.

Work Multiple Projects in Parallel

If you would like to make a mess out of your data warehouse, and create some new "islands of automation" while you are doing it, try implementing multiple data warehouse subject areas in parallel. Novice project teams will have enough challenge trying to find out how to do their own job, without having to coordinate with other teams. Implement your whole warehouse as a set of parallel projects and you will achieve nearly total chaos.

On the other hand, once you have a team that has successfully worked together to implement several data warehouse projects by doing one subject area at a time, you may be able to successfully work several small subject areas in parallel. Until then, do them one at a time. Your team is learning. One must learn to crawl before walking, and walk before running.

Simultaneously Implement Operational Systems and Related Warehouses

Implementing a new operational system? Design and implement the data warehouse that will draw data from that operational system at the same time.

At this point some readers are probably thinking: "You're making that up--nobody would attempt something that obviously doomed to failure!" Other readers are thinking: "But that makes sense: Implementing the operational system and the decision making system at the same time will provide full system capabilities at the outset."

Unfortunately, both parties would be wrong. People really do make decisions like that, and it really is catastrophic when they do.

Just for the record, the same end users are going to be key to both projects, and they cannot possibly spread themselves widely enough to play significant roles in both projects and do their normal jobs.

Furthermore, until the operational system is up and running, there is no data to extract and feed to the data warehouse. How are you going to do interactive exploration of end-user requirements and prototyping against data in a system that is not yet deployed? How are you going to understand the new coding structures that will be used (these are often not fully worked out until the operational system is fully deployed--so you will be shooting at a moving target), and translate them into a form that is usable in the warehouse? How are the end users going to give you insights into a system that they are not yet using? (Oh! You'll find out from the manuals? Not!)

Remember, not all operational systems (or the data they contain) are alike. Until an organization has used an operational system for at least a year, it is not ready to implement any new system that depends on the new operational one.

Postpone or Avoid Metadata

The data warehouse environment is complex. To manage it, you need a central repository of information about the source and target databases, about the data cleaning and transformation rules, about the programs and processes that extract, move, clean, transform, sort, load, and analyze the data (including scheduling, versioning, data dependencies, process dependencies, automated error procedures, etc.), and much, much more. You MUST be able to quickly analyze the impact of changes in the source, transformation, and target systems. You MUST be able to quickly analyze the impact of component failure on downstream systems and databases.
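To make the idea of impact analysis concrete, here is a minimal sketch in Python of what a metadata repository lets you ask: "if this source changes or this job fails, what is affected downstream?" The lineage table and every object name in it (src.orders, job.load_sales_fact, and so on) are hypothetical and purely illustrative; a real metadata tool stores far more than this and exposes its own interfaces.

```python
from collections import deque

# Hypothetical lineage metadata: each key feeds the objects listed as its value.
# Every name below is made up for illustration only.
LINEAGE = {
    "src.orders": ["job.extract_orders"],
    "src.customers": ["job.extract_customers"],
    "job.extract_orders": ["stage.orders"],
    "job.extract_customers": ["stage.customers"],
    "stage.orders": ["job.load_sales_fact"],
    "stage.customers": ["job.load_customer_dim"],
    "job.load_sales_fact": ["dw.sales_fact"],
    "job.load_customer_dim": ["dw.customer_dim"],
    "dw.sales_fact": ["report.monthly_sales"],
    "dw.customer_dim": ["report.monthly_sales"],
}


def downstream_impact(lineage, changed):
    """Return, sorted by name, every object reachable downstream of `changed`."""
    seen = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:  # visit each object once, even if fed by several paths
                seen.add(child)
                queue.append(child)
    return sorted(seen)


if __name__ == "__main__":
    # Which jobs, tables, and reports are touched if the customer source changes?
    for name in downstream_impact(LINEAGE, "src.customers"):
        print(name)
```

The point of the sketch is only that the question becomes a simple graph walk once the dependencies are recorded in one place. Without that record, someone has to reconstruct the same graph by hand every time something changes or fails.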

If you do not use a metadata tool from the outset to capture and maintain your metadata, your project will bog down in information overload before you have even finished implementing, and it will be impossible to maintain it efficiently in production.

Of course, you can try to keep up with your metadata using a word processor or spreadsheet, but you will find this will cost a great deal in time without achieving satisfactory results. In the end, lack of a suitable metadata tool (and its disciplined use from day one) will inevitably result in cost overruns and an unmaintainable data warehouse.

Focus on Technology First

Although this mistake is most popular with IT/IS professionals, some consultants and end users get caught in it as well. There seems to be a fascination with technology evaluation (and with getting all that cool new technology on your resume).

If you want your data warehouse to succeed, this is a serious mistake. Focus first on the business needs the warehouse is to meet, and then on the technology and system architectures to meet these needs. There are several reasons for this:

If you have relatively little experience with data warehousing projects and technology, you may feel that you are in a catch-22: You must concentrate first on business requirements, yet you are supposed to use prototyping to establish those requirements, and prototyping requires tools. However, it is not as hopeless as all that.

The prototyping tools you need to explore business requirements need not be the tools that you use later in the project. You can use relatively inexpensive personal computer-based tools like Personal Oracle, Personal Express, Oracle Discoverer, and Sagent for early exploration of business requirements. Then you can move on to tools with larger capabilities (Oracle Server, Express Server) for pilot projects and larger data warehouses.


TIP: You can frequently get evaluation copies of data mart class tools from software vendors. These tools are typically characterized by a relatively high level of user-friendliness and productivity. They may be easy to install, learn, and use. This could be perfect for an early exploration of your business requirements based on sample data from the subject source systems.


Using the tool in this way will give both the technical and end-user side of the project a good feel for the issues they must consider in data warehousing while providing a good view of how well that particular tool might support those needs.
Even if these tools will not support the volume of a full-sized data warehouse, they may still prove useful for implementing data marts that are ancillary to the larger data warehouse.




CAUTION: If possible, avoid a 30-day trial license. Try for 60- to 120-day licenses instead. A 30-day trial is not enough to get a particularly good feel for the product or for your first cut at exploring requirements.


Even if you have a schedule that says you will be finished in 30 days, there are just too many things that can happen to introduce delay. Then you will be high and dry at a critical phase; and at the very least, there will be further delays while you deal with the problems of getting a license extension. I have seen nice, tidy little projects go into a budget tailspin over nothing more than waiting (in some cases weeks) for vendors to provide a new "unlock key" for an evaluation copy of their software that had reached its too-short expiration date. Don't think it can take weeks to get a new key? Think again! The person in charge of it is on vacation (or too busy selling to someone else), the fallback person is in Canada doing who knows what, and no one is sure who has the authority to back them up. Then voice mails and conference calls get missed, e-mails are lost, time zone differences create misunderstandings, and so on, ad nauseam. Trust me, this is not the voice of theory speaking; it is the voice of experience.


Create Your Own Data Warehouse Software

This is a special case of "putting technology first." It turns out that a lot of IT/IS personnel (and some consultants too!) are budding systems software developers who would far rather create a metadata repository, extract and transform package, or OLAP front-end tool than use an existing one.

There is always some reason for this, such as the perennial favorite: "our needs are so unique." Baloney. This is just the not-invented-here syndrome at work.

The other motivation for rolling one's own seems to be the wistful idea that it will save money. "The programmers are already on staff and think they can knock it out in a week or two, and there isn't anything extra in the budget for buying the software...."

When engaged in a buy-versus-build decision, you should consider the following:

This is not to say that building is always the wrong answer. Sometimes it is necessary, but this is rare indeed.


TIP: A good compromise between buying and building might be to buy an extensible product instead of the tool you are thinking of building.
For example, if you need some custom transformation algorithms for the data going into your data mart and no available tool supports it, consider a tool like Sagent (from Sagent Technology, Inc.) that will let you provide custom add-on algorithms written in C++, Visual Basic, etc., rather than creating an entire new tool from scratch.


In the same way, if you need more metadata capabilities than are available from existing repositories, pick one that is extensible and that uses a standard RDBMS (preferably Oracle!) as a repository engine. This way, you might be able to add support for the additional metadata you need without having to build an entire metadata repository yourself. An alternative might be to pick another class of tool (such as a scrub/transform tool) that has an extensible, embedded metadata repository. Many of these tools are establishing partnerships with third-party products such as front-end query tools that extend their capabilities. For instance, Carleton and Informatica have these alliances. Both have extensible metadata repositories that can be accessed by a variety of front-end tools. This means that you can customize their repository by adding Oracle data structures, and be able to browse it from a standard front end, such as from Business Objects or Cognos.




TIP: Maybe you have been dying to get rid of that pesky IT person or CIO that you can't stand. Maybe you can talk them into a really bad decision to build instead of buy. You need to be careful about this, because you don't want to go down with him (or her). However, perhaps you can get them to grind the data warehouse project to a halt with an in-house software development effort that takes months while the CEO stews. Perhaps if you could do it and still come up smelling like a rose yourself, it would be worth it.

Rely on Unproved Technology

Rely on unproved technology--it will be the wave of the future (someday). Make a new vendor with an unproved product and a limited or non-existent customer base the centerpiece of your project. Or, at the very least, put a beta version of an existing product on your critical path.

Rely on "Silver Bullet" Technology

If you can just find the right "silver bullet" technology, you will be saved from having to deal with the need for data warehouse specific methodology, planning, resources, executive sponsorship and communication, rigorous and daily work with end users, personnel with real data warehouse experience, etc.

The right "silver bullet" may even save you from having to plan for and put in place adequate hardware and software infrastructure during the development phase of the project.

The right "silver bullet" will soon sweep the industry due to its ability to work these miracles, and there you will be, riding the wave with it on your resume!

Everybody knows this is a fallacy, but people fall for it every day. They would rather believe the latest P.T. Barnum (read: the latest technology fad, or maybe a technology salesperson with a mission) than they would Frederick Brooks.


WAR STORY: A few years ago, I got to witness a particularly serious case of the technology "silver bullet" syndrome. (I am pleased to say that I was only an observer in this fiasco, not a participant.)


A new technology fad was raging, and one of the more outspoken technologists in a large company really caught the bug. The particular technology with which he had become infatuated would save us from all ills and from all effort. No more tedious application of sound system engineering principles, no more worries about everything from business requirements to capacity planning. Every meeting that attempted to deal systematically with hard system issues was sidetracked by evangelical calls to adopt this miracle technology and be saved.


Eventually this wonderful technology and its proponent (some people called him "Mr. Silver Bullet") found a powerful sponsor in the IT/IS political arena. The deal was done. Over a million dollars was spent in direct software costs, and additional money was spent on hardware, training, and consulting. Vendor people and consultants were everywhere, new projects were staffed, and years passed.
To the best of my knowledge, all this never resulted in a single completed system ever being deployed on a production basis. The miracle technology was compatible with almost nothing that was really strategic to the company. This led to costly in-house programming efforts to force interfaces that were technically and semantically dubious (at best). It also led to continued delays while vendors and internal sponsors made each other and the world promises that no one could keep.


In the end, years passed and the company missed many windows of opportunity while this fiasco drained off money, time, and resources that could have been spent on solving real problems. Eventually a lot of people lost their jobs, but "Mr. Silver Bullet" stayed on to make other contributions.


Folks, I have news for you. The werewolves discussed in this chapter are real; but silver bullets are a myth.

If you want your data warehouse to fail, put your faith in a "silver bullet" technology. If you want it to succeed, use technology liberally to achieve your ends, but rely on a lot of hard work coupled with data warehouse specific experience and the judicious use of the information you found in this book.

Otherwise, when the werewolves come for you, and you pull out your technology gun that's loaded with your latest silver bullet, you will find that you are shooting blanks.

Run OLAP on Top of the Operational Database

This error was the reason data warehousing was invented in the first place. If you want your data warehouse to fail in as spectacular a way as possible, repeat this mistake now.

Believe it or not, people are still trying this, usually with the motivation of saving on disk storage. "All we want is a little OLAP processing, why should we have to double our storage space to do that?" (Well, actually, you might wind up using more than double the storage to do it right, but that is another story.)

The problems with this plan are numerous, and include:

Again, if you want your data warehouse to fail, this is a perfect way to ensure it. If you want your data warehouse to succeed, then avoid this mistake.


NOTE: It is possible that you might actually use a copy of an operational system (e.g., Oracle Financials) database as a temporary measure during early requirements gathering. The way you would do this is to copy the schema to another system, load a subset of the data into this "new" database, and use an OLAP front-end tool like BusinessObjects against it. BusinessObjects has the capability to map between an entity-relationship database schema in Oracle (and some other RDBMSs) and its own dimensional view of the data.


The problem with this approach is that extracting an appropriate and consistent subset of the production database would be a challenge, and performance might be pretty erratic. After all, the underlying database schema was designed for OLTP, not OLAP.


Because you will soon need to move on to a better design, you should consider carefully whether using a copy of the production database for requirements modeling will be worth it or not.


Run the Data Warehouse Database on the Server for the Operational System

This mistake is similar to running an OLAP front-end tool on your production database. Usually, the plan is to add more disk drives to an existing production system in order to save money on hardware. This seldom works, and usually leads to serious problems. Consider the following:

Use Normalized Data Structures

Using complex data structures that end users do not understand is a sure way to prevent them from getting value from the data warehouse.

When using a relational database like Oracle, experience has shown that a relatively simple star schema is usually the best approach to providing data structures that end users can use.
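For readers who have not seen one, the following is a minimal sketch of a star schema: one central fact table of measures, joined by simple keys to small descriptive dimension tables. It uses Python's built-in sqlite3 module as a stand-in so that it runs anywhere; it is not Oracle DDL, and the table names, column names, and sample rows are all hypothetical.

```python
import sqlite3


def build_star_schema():
    """Create a tiny in-memory star schema with made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Dimension tables: short, descriptive, in business vocabulary.
        CREATE TABLE product_dim (
            product_key  INTEGER PRIMARY KEY,
            product_name TEXT,
            category     TEXT
        );
        CREATE TABLE time_dim (
            time_key INTEGER PRIMARY KEY,
            month    TEXT,
            year     INTEGER
        );
        -- Fact table: one row per product per period, holding the measures.
        CREATE TABLE sales_fact (
            product_key INTEGER REFERENCES product_dim(product_key),
            time_key    INTEGER REFERENCES time_dim(time_key),
            units_sold  INTEGER,
            revenue     REAL
        );
        INSERT INTO product_dim VALUES
            (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware'), (3, 'Manual', 'Books');
        INSERT INTO time_dim VALUES (1, 'Jan', 1997), (2, 'Feb', 1997);
        INSERT INTO sales_fact VALUES
            (1, 1, 10, 100.0), (2, 1, 5, 75.0), (3, 2, 2, 20.0), (1, 2, 4, 40.0);
    """)
    return conn


def revenue_by_category(conn, year):
    """A typical end-user question: revenue per product category for one year."""
    cursor = conn.execute(
        """
        SELECT p.category, SUM(f.revenue)
        FROM sales_fact f
        JOIN product_dim p ON p.product_key = f.product_key
        JOIN time_dim t ON t.time_key = f.time_key
        WHERE t.year = ?
        GROUP BY p.category
        ORDER BY p.category
        """,
        (year,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    print(revenue_by_category(build_star_schema(), 1997))
```

Notice that the query reads almost like the business question itself: every join goes from the fact table straight out to a dimension, with no chains of intermediate tables for the end user to navigate.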


WAR STORY: I once attended a seminar in database design for a decision support tool that was new to me. The case study at the end of the seminar was intended to test our understanding of what we had (hopefully) learned. Other design teams produced complex, elegant designs--at least I guess they were elegant; they were so complicated and difficult to understand that it was hard to tell.


My schema design was totally different. I had simply studied the sample queries that "end users were likely to use" and designed the schema to make it as easy as possible for end users to pose those queries.


After our schema designs were all complete, they were judged by the resident chief technical giant in this product. Much to my surprise, I was awarded first prize. The reason: He judged that the queries would run the fastest against a schema that reflected end-user access patterns. I don't know if a schema design that is highly legible to end users will always turn in the best performance, but it is certainly the right place to start. The query the end user can't pose will never run to completion, and that is the longest response time of all.


Again, for warehouse success, design your schemas for ready use by end users; they will love it.

For warehouse failure, create a complex, technical design that takes an IT expert to figure out and use; the end users will stay away in droves.

Define a Complex Warehouse Architecture

It has become popular to define complex data warehouse architectures based on consensus management among the technical members of the team. This is where every techie gets to squeeze their favorite technology into the warehouse, whether or not it is needed and works well with the other technologies present. I call this the "kitchen sink" architecture, because it includes everything available but the kitchen sink.

This kind of excess is usually justified as "selecting the best of breed." While it is, obviously, a good thing to have the best tools possible, complexity is hazardous and costly. As the number of elements in a design rises linearly, the risk of interoperability problems and failure goes up exponentially.

It is best to minimize the number of components in a design to the fewest number that can do the job well. That means the technical team must discipline itself to the KIS principle: Keep It Simple.
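One rough way to see why each added component costs more than the last is to count how many ways components can interact. The sketch below is only an illustration under my own assumptions, not a measured risk model: it counts the distinct pairs of components that might have to interoperate, and the number of combinations of two or more components that could be involved in a problem together. The function names are hypothetical.

```python
def pairwise_interfaces(n):
    """Number of distinct component pairs that might have to interoperate."""
    return n * (n - 1) // 2


def multi_component_combinations(n):
    """Number of subsets of two or more components.

    This is 2**n minus the empty set and the n single-component subsets.
    """
    return 2 ** n - n - 1


# Print the counts for a few architecture sizes.
for n in (3, 5, 10, 15):
    print(n, pairwise_interfaces(n), multi_component_combinations(n))
```

Under these assumptions, going from 5 components to 10 doubles the parts list but raises the possible pairs from 10 to 45 and the possible multi-component combinations from 26 to 1,013.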


CAUTION: The above call for simplicity in data warehouse architecture is not a call to deny the warehouse team the tools it legitimately needs to get its job done. The practice of data warehousing is characterized by heavy dependence on numerous tools. The opposite (and equally dangerous) extreme from too many technology components is too few. Failing to acquire adequate tools soon enough tends to produce slow-moving projects with poor-quality deliverables, not to mention eroded morale for all concerned.


Like so many other things in systems practice, what is needed is a reasonable balance between two undesirable extremes.


Skimp on Disk Storage

The rate at which data warehousing eats up disk storage is staggering, but these days disk storage is relatively cheap. Skimping on disk storage may seriously damage the warehouse project by not supporting the level of detail needed by the end users, by creating slow, awkward system maintenance and database backup/recovery scenarios, by making it more difficult to get required data movement and processing through the required time window, and so on.

A properly sized data warehouse will accommodate more than the data needed for OLAP query processing. It will support rapid roll-back if a load fails to complete, or if data quality assurance after the load indicates that there is something wrong with the data. A properly sized warehouse will also support detective work on previously loaded source data if problems are detected after the fact.

Remember, too, that indexes are usually required for data warehousing applications--lots of them. The only exception is when you want to only run parallel query, and you haven't migrated to Oracle8 (Oracle8 has parallel index scans). So many indexes are the rule, and they require lots of space. It is not uncommon to see all the indexes for a given table taking up more space than the table itself.
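The sizing considerations above can be turned into a back-of-the-envelope estimate. The sketch below is a rough illustration only. The function name, its parameters, and every number in the example are assumptions of mine, not Oracle sizing rules; substitute figures measured on your own system.

```python
def estimate_storage_gb(table_gb, index_ratio, load_gb, retained_loads, workspace_ratio):
    """Rough total disk need in GB, under purely illustrative assumptions."""
    indexes = table_gb * index_ratio        # indexes can rival or exceed the table size
    retained = load_gb * retained_loads     # previously loaded source data kept for detective work
    rollback = load_gb                      # room to back out a failed or suspect load
    workspace = table_gb * workspace_ratio  # sort/temp space and experimental copies
    return table_gb + indexes + retained + rollback + workspace


# Hypothetical figures: 100 GB of table data, indexes 1.5 times the table size,
# 5 GB per load with four prior loads retained, and 50 percent extra workspace.
total = estimate_storage_gb(
    table_gb=100, index_ratio=1.5, load_gb=5, retained_loads=4, workspace_ratio=0.5
)
print(total)
```

With these made-up inputs, the 100 GB of queryable table data accounts for less than a third of the total, which is the point of the exercise: size for the whole operation, not just the tables.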

A data warehouse lives on its stored data. If you skimp on disk storage, it may not be able to fulfill its mission.


NOTE: The need for adequate disk storage applies to the requirements definition and development stages of a warehouse project as much as it does to the deployed warehouse. You may need less storage in the early stages than in the deployed warehouse, but you still need enough to get the job done. I have seen requirements definition and/or development bogged down (and missing schedule dates) on more than one warehouse project simply due to inadequate disk storage. The developers spent more time loading files to and from tape, compressing and uncompressing files, and looking for lost files that had been stashed somewhere (Or maybe deleted? Who knows?) than they did developing the system. (Needless to say, morale suffered some.) The reason? Someone had gotten too "cost conscious" on disk storage.


On the other hand, I have seen projects speed along, assisted by availability of plenty of free storage to do things like keeping multiple experimental data files on line simultaneously.


Ignore Data Movement Metrics

Want to really blow the credibility and viability of your data warehouse? Do capacity planning that focuses on disk storage and ignores data movement.

Capacity planning for data movement includes the following:

If you cannot move the amount of data you need at the speed at which you need to move it, your data warehouse will be a spectacular failure.

It is typical for there to be a relatively tight time window in which data can move off the source systems into the warehouse and from one warehouse stage to another. These deadlines must be met, or the data will not be available to the end users.
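The time-window test described above is simple arithmetic, and it is worth doing before any hardware is bought. The sketch below is a minimal illustration. The function names and the sample figures are hypothetical, and it assumes that parallel streams scale linearly, which real connectivity often will not do.

```python
def transfer_hours(volume_gb, throughput_mb_per_sec, parallel_streams=1):
    """Hours needed to move volume_gb at the given per-stream throughput.

    Assumes (optimistically) that parallel streams scale linearly.
    """
    total_mb = volume_gb * 1024
    seconds = total_mb / (throughput_mb_per_sec * parallel_streams)
    return seconds / 3600


def fits_window(volume_gb, throughput_mb_per_sec, window_hours, parallel_streams=1):
    """True if the data movement completes within the available time window."""
    return transfer_hours(volume_gb, throughput_mb_per_sec, parallel_streams) <= window_hours


# Hypothetical nightly load: 50 GB over a link sustaining 2 MB/sec per stream,
# with a six-hour window.
print(round(transfer_hours(50, 2), 2))
print(fits_window(50, 2, 6))
print(fits_window(50, 2, 6, parallel_streams=4))
```

In this made-up case a single stream needs a little over seven hours and misses the six-hour window, while four streams would fit, but only if the source side can actually feed four streams at once.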


WAR STORY: One time a team on which I was working was called in to solve a problem with a data warehouse load being unable to make it through the required time window. The diagnosis turned out to be relatively simple: The client had bought some very expensive data warehouse hardware with more than adequate disk storage for the database, but had failed to properly size the connectivity to the source system. The source data simply could not make it through the available time window and onto the disk farm of the data warehouse using this connectivity scheme. Investigation revealed that for technical reasons with this connectivity setup the data movement was being piped through a single process on the source side, and this process was, essentially, single threaded. A good deal of money had been wasted on this particular connectivity setup, some of which had to be replaced at additional cost.

The client in this War Story was lucky. I have seen situations where clients have purchased hardware that had impressive specifications on paper but, because of design issues too complicated to discuss here, would never come anywhere close to its potential. In some of these cases, this was the top of the line for the vendor--there was nowhere else to go without starting over and changing vendors.

Defer Disaster Planning

The title really ought to read: "Defer Planning for Component Failure, System Failure, Operational Failure, and complete Operational Disaster." But who could live with a title like that?

The temptation in designing a data warehouse is to focus on cool new front-end tools, and on getting something implemented as soon as possible. Unless care is taken to ensure otherwise, the technical members of the team will come predominantly from applications programming and database backgrounds, and will lack the operations and network backgrounds needed for complete disaster planning.

To avoid this mistake, you should include operations and network people on the team from the start. They do not have to be full-time team members in order to contribute in this area; but if you do not want your warehouse to fail after it is deployed, plan for disaster.

Mistakes That Cause Pain

We have already discussed a variety of surefire ways to make your data warehouse project fail, and there has been something there for you, whether you are an executive on the business side of the house, an end user, an IT/IS professional, or whatever.

Now I will briefly present some things you can do that will (probably) not be fatal to the project (unless you do too many of them, or have bad luck...), but can surely cause some pain.


TIP: Always investigate the economics of getting the vendor to bring the training on-site. Sometimes on-site training will not pay off, but often it will reduce both tuition and travel costs. As a bonus, you will probably have more flexibility in scheduling.


Having said this, don't sabotage your on-site training with inadequate space and equipment, demanding "accelerated" training schedules, or permitting attendees' regular jobs to intrude on their training time, etc.--unless, of course, you want it to fail....



NOTE: While end users will want Web browser access for other reasons (like ease-of-use and dial-in access), it is also the case that browser access enables a "thin client" architecture that may prove valuable for a lot of reasons (like ease of administration and less powerful and expensive end-user client equipment than is possible with the "fat client" OLAP front ends available today).

How You Can Contribute to Project Success

I have already discussed some surefire ways to get your data warehouse project into serious trouble. Now I will discuss some ways you can contribute to its success instead. This discussion is organized by functional area:

The purpose of this discussion is to point out how your own functional area can contribute to data warehouse success. While it is worthwhile to understand the contributions needed from other functional areas, it is most important that you understand and commit to doing those things that are needed from your own area.

For the most part, the recommendations in this section can only be carried out by the functional area for which they are made. If your functional area does not make these specific, positive steps toward success, no one else can do it for you.

If your functional area does not take these positive steps, it is contributing to data warehouse failure, not success. Why would you want it to fail?

The Executive Suite

The End-User Department(s)

Project Managers and Project Leaders

The day of the minimally involved project manager who administers project budgets and milestones against frozen time lines and lists of deliverables is over.

As we have discussed elsewhere, this ("waterfall") methodology will not work in data warehousing. That means the appropriate project management style must change to fit the new methodology.

The needed style, both for the project methodology and for the associated project management is much more dynamic. Change is the rule, rather than the exception.

This means that data warehouse project managers must be more actively involved in the project's progress than with older methodologies. You must understand both the business and technical issues that are the CSFs (critical success factors) for the project, and be able to articulate them clearly at both the technical and the executive level. You must understand your client's strategic business drivers and understand how this project is to contribute to them.

Monitor the JAD/RAD sessions closely and compare progress with the project's budget in terms of both time and cost. While an interactive, iterative, hands-on methodology is a requirement for data warehouse success, this approach can quickly get away from you if it is not watched.

It is essential that you constantly review progress and priorities with the end-user community. Have their priorities and perspectives changed, based on what they have discovered since your last review? Is the project still on track with their current priorities? Note that this requires a project management style that is very end user oriented. You will need to interact with and understand the end-user community much better than has often been the case with many classic IT/IS project management styles.

Remember that project change does not necessarily mean that you have to blow your budget. If you began the data warehouse project by setting the expectation that direction and emphasis may change and that what gets implemented for the current work (and budget) cycle is driven by the end users' business priorities, then you can negotiate as you go.


WAR STORY: Once I was managing a decision support project (we'd call it a data mart today) for a functional area of a large bank holding company. Early on we had agreed to cover a certain number of topical areas within a certain budget. We were working the areas in priority order; we were right on budget (well, a little ahead, actually). Everything was going fine, except, based on the experience we had gained, I could foresee that the last, lowest priority area was going to be a lot harder than we had thought, an absolute budget buster. There was no way we could do it and stay within budget.


I was dreading our next status meeting. We could drop the last, lowest priority area, but that wasn't going to look all that good. We could go back to the well for more money, but my sense was that my client wasn't going to like doing that.


When we met, each found the other had the same problem. The client had, based on his experience with the first few areas, discovered something much more important that needed to be done. No, he didn't want to go back to the well. No, he didn't really need the nasty, low priority budget buster we hadn't worked yet; this new thing was much more important. It was also a lot easier to do.


We all left thrilled; the client had his priorities met, and the project finished on time and in budget.


In the above war story, note the positive results from end-user interaction, priorities, negotiation (informal), and compromise. Notice how working with the clients saved the project budget instead of blowing it. In the old days, we used to call this kind of thing "common sense" and "getting along with the customer." Now we need fancier terms for the same thing, but the idea is the same. Ignore them to your peril; help them, and they will help you.

IT/IS Management (Also for Consulting Managers)

Data warehouse projects should only be undertaken after obtaining deep, serious executive commitment and the accompanying acquisition of a specific, committed executive sponsor who is going to apply the influence, budget, and patience to see the project through.

It is essential that you appoint a project manager who understands and has prior experience with data warehousing concepts and methodology.

Do not give way to the temptation to appoint "good old Joe" as project manager because he's available or because he needs to be rewarded for years of faithful service on OLTP systems.

The considerations, methods, and CSFs (critical success factors) endemic to data warehousing are significantly different from those in operational systems. Others have already blazed the trail on what does and does not work in data warehousing; it was hard, painful, expensive experience. The price of exploration has already been paid by others, and there is no need for you to pay it again.


TIP: If you want a project manager who lacks experience to become proficient in data warehousing, the answer is to obtain data warehouse mentoring services from a consulting firm that is experienced in data warehousing and has a well-laid out program for knowledge transfer.


Do not permit the consultants to do the project for you. Require that they train your staff and walk them (including "good old Joe") through every step of the process.


At present, rigorous mentoring combined with a lot of reading in good data warehouse books (like, ahem, this one) and articles is the best way to build usable experience without paying the cost of failure.




NOTE: Do not expect to save money on consulting fees for your first data warehouse project by having the consultants mentor your internal IT/IS staff. On a reasonably sized data warehouse pilot project, it will take as much consulting time to teach your people to do it as it will for the consultants to do it themselves--maybe more.


Furthermore, you are asking the consultants to divulge valuable information, practices, and skills that may have the long-term effect of reducing their market opportunities. This is not a project where qualified consultants will be motivated to cut pricing to the bone. Remember, by definition, you are asking them to forgo most, if not all, repeat business with you in this area.


Be prepared to pay a reasonable, even premium, fee for data warehouse mentoring and knowledge transfer. Then insist on a quality result.


Insist on beginning data warehousing in your organization with a pilot project that:

This will be a learning experience. Be prepared (that means set expectations and allocate budgets) to rework significant parts of that first data warehouse effort within the first 12 to 18 months of its initial deployment. Because the first attempt will be a learning experience, expect to have to improve on it. To repeat what has already been said, play it smart: Learn to crawl before you walk, and walk before you run.

Insist on beginning data warehousing in your organization by putting together an overall, strategic plan that is driven out of the client organization's strategic mission and strategic challenges. Do not permit this exercise to last more than six to eight weeks, at most, and require the plan to be focused on defining business needs and challenges, while scoping the technology issues that are inherent in the source systems of interest. Do not, under any circumstances, permit this effort to focus on data warehouse technologies and architectures. (Regarding technologies and architectures, see the next paragraph below.) Even if you intend to do the entire data warehousing project in-house, it may be prudent to retain an experienced, high profile, data warehousing consultant who is accustomed to drawing up quick, economical, high level, strategic data warehouse plans (and can show you sanitized samples of past work). Internal IT/IS professionals are not noted for their ability to move at the top of organizations' management hierarchy and express themselves in clear, succinct, business terms. This is an area where a little external help can get you a long way.

Do not permit the warehousing project to begin with technology evaluations and data warehouse architecture designs. Do this only after developing the business requirements for the data warehouse. Otherwise, you are focusing on tools without having defined the business problem you have to solve, and you will certainly try the patience of your executive team and client sponsors, if you do not lose them altogether.


NOTE: Resist the temptation to build any tool you can buy. Building will gratify your technical staff, drain your budget, infuriate your clients, and make your project so late that it will be canceled (or outsourced). Your clients are interested in rapid resolution of their business problems, not in providing an exciting infrastructure programming experience for your staff. Further, the estimated cost of software infrastructure development is always the tip of the iceberg--the other 90 percent of the cost is lurking under the surface, just waiting to sink your data warehouse. (And if the development cost doesn't get you, the maintenance cost will.)

Make sure the data warehouse project staff gets the tools and training it needs early on. Rapid, economical, successful data warehouse projects are partially predicated on heavily leveraging appropriate skills and tools from the earliest stages of requirements definition. Many of the tools, perspectives, and design skills needed in data warehousing are quite different from those needed in OLTP systems and client/server programming. If you are going to use internal staff without significant prior experience in data warehousing, give the project the opportunity to succeed that it deserves: get its staff the best tools and training available, and consider finding them some mentoring from consultants who have been down the road before. Unless, of course, you have an agenda to see the project fail....

Technical Team Members (Whether in IT/IS or in a Consulting Organization)

Become end-user oriented: focus on solving the client organization's business problems, not on beefing up your resume.

Never assume you know more than your clients, or know what they need: you don't.

Communicate with your end-user counterparts often. If you have worked for an entire week without reviewing results with them, you are probably working in a vacuum.

Be prepared for change, and be patient with your clients. They are having to understand a new paradigm for analysis and decision making. They will have to make some course corrections as they go, and that will affect your design, data model, applications, everything.

Create designs that are simple, and are oriented toward end-user comprehension, rather than technical elegance.

Design your processes and procedures with cascading component failure in mind: it will happen. Cascading failure is a situation that arises in complex systems when one failure, or perhaps several simultaneous, unrelated failures, encounter or lead to other failures and start a chain reaction that goes places you would never have thought possible. Jurassic Park is an example of several simultaneous unrelated failures interacting in unexpected ways that lead to cascading, catastrophic results. Data warehousing is by definition a complex system: multiple technologies are typically being used to integrate (and reinterpret) data from potentially numerous, previously unrelated systems and/or technologies. You do not want your data warehouse project to be remembered as another Jurassic Park. To minimize the chances and the effects of cascading component failure:

Remember, Murphy is out there and he is hunting you and your warehouse. Your alternative to planning for failure avoidance and containment is to hang up a "Welcome to Jurassic Park" sign over your desk and then try to get the T-rex back in her pen while the raptors try to eat you.

Always suspect the data. It will be dirtier and more convoluted than you think. Neither the end users nor the source system DBAs and programmers understand the data as well as they think they do. There are traps in there, just waiting for you to assume something.
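One practical way to act on that suspicion is to profile a source column before trusting anyone's description of it. The sketch below is a minimal illustration, not a complete data quality tool. The function name, the column name, and the sample rows are all hypothetical.

```python
from collections import Counter


def profile_column(rows, column):
    """Count missing values, distinct values, and repeated values in one column.

    A value counts as missing if it is None or blank after stripping whitespace.
    """
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None and str(v).strip() != ""]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "duplicates": {v: c for v, c in counts.items() if c > 1},
    }


# Hypothetical extract in which customer_id is "known" to be unique and never empty.
sample_rows = [
    {"customer_id": "C001"},
    {"customer_id": "C002"},
    {"customer_id": "C002"},
    {"customer_id": ""},
    {"customer_id": None},
]

report = profile_column(sample_rows, "customer_id")
print(report)
```

Running a check like this against every key and code column, early, tends to surface the traps while they are still cheap to deal with.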

Summary

In summary, there are a number of things you can do to kill your data warehouse project, or at least harm it significantly (and your career with it). Some of these are a matter of common sense, but it is astonishing how fast common sense falls by the wayside and rationalization sets in when ambition meets lack of experience and/or pressure resulting from time and resource constraints.

I have seen or read about every one of the fatal mistakes discussed in this chapter. I have seen most of these mistakes repeated more than once. The results are almost invariably the same: project failure, dramatic cost overruns, furious users (many with substantial influence), damaged careers, and outsourcing (or switching to a new consulting organization).

At the rate some of these mistakes are being made, you would almost think the goal for the data warehouse project was failure, instead of success.

On the chance that project failure is not your goal, the list below summarizes the positive steps you can take to avoid some surefire ways to make your project fail.

So, now you've read a lot of tips on how to make your data warehouse fail. I surely hope that you will avoid these mistakes yourself, and that you won't take me seriously and do them to someone else. Meanwhile, I am aspiring to maybe someday be the Scott Adams of data warehousing. (Yeah, yeah, I know: You know Scott Adams, he is your friend, and I am no Scott Adams. I said aspiring, didn't I?)


© Copyright, Macmillan Publishing. All rights reserved.