Intelligent Business Systems


   How Well Do You Know Your Customers? 
   Intelligent Business Systems: A Three-Level Approach 
   Intelligent Data Analysis 
      The Intelligent Processing Difference 
      Intelligent Processing Asks Better Questions, Offers Better Answers 
   The Power Behind the Intelligence 
      The Scalable Disk Array: The Heart of Every Intelligent Business System 
      SuperSPARC Processing Nodes 
      Fast Access to Other Corporate Systems 
      A Powerful Array of Standards-Based Software 
         Oracle 
         Decision/SQL 
         Parasort 
         Integrating Familiar Client Applications 
   Summary 


1.0 How Well Do You Know Your Customers?

This question, more than any other, crystallizes the challenge of doing business in the 1990s.
Understanding the preferences, habits, and trends of the world in which you operate is critical --
because these behaviors will determine your future success.

Getting close to customers means anticipating what they will do next. It means asking a lot of
questions -- questions whose answers once depended on somebody's hunch, a committee
consensus, or an outdated rule-of-thumb. Who will be next year's big spenders? Which clients will
default on their accounts before the end of this year? Which product will capture the greatest
market share in the third quarter? Why are some people buying a product while others are not?
Thinking Machines now offers you comprehensive answers to these questions from the most
objective source possible: your data. 

If you have a lot of data, you have the answers to these and many other business questions buried
deep inside it. Good answers. Answers that come from the actual transactions of your business.
Answers that can make you efficient, productive, and profitable.

So how do you get them out?

Don't count on mainframes or warehouse environments: mainframes lack the power and
warehouses lack the analytic capabilities necessary to find subtle patterns in billions of pieces of
information. What you need is a system that can tell you where your customers are headed next
week, next month, next year. Right now. 

What you need is an Intelligent Business System from Thinking Machines Corporation.


2.0 Intelligent Business Systems: A Three-Level Approach

Intelligent Business Systems from Thinking Machines combine the expandable power of massively
parallel processing, the compatibility of standards-based relational database technology, and the
predictive capabilities of intelligent algorithms. This combination provides a return on
information-system investments that grows with your data, because that data will yield answers
whose business value rises faster than its storage and processing costs.

   1. Our expandable parallel hardware is essential to business computing because it keeps
   your information system affordable at each growth step. Intelligent Business Systems grow
   with your company's data. Adding capacity and performance is as easy as plugging in a
   module. These modules are cost-effective because they use mass-produced components
   such as microprocessors and small disks -- the same components that are in your
   workstations. 
   2. To manage your data, Thinking Machines business systems offer standard database
   environments, including Oracle. Our systems interconnect with existing platforms using
   standard hardware and software protocols. For example, you can easily feed transaction
   data from dedicated OLTP systems into our system for intelligent decision support.
   3. Intelligent algorithms make the transition from retrospective to predictive uses of data.
   Predictive power comes from data set size. The larger the data set, the more telling the
   patterns that are embedded in it. But traditional algorithms can't discover these patterns.
   They were designed in the mainframe era to summarize comparatively small amounts of
   data. Thinking Machines intelligent algorithms can predict what your customers will do
   tomorrow. That's what we mean when we say that our Intelligent Business Systems get you
   closer to your customers. 


3.0 Intelligent Data Analysis

Early on, Thinking Machines recognized that transaction databases would require, and benefit
from, a new generation of intelligent prediction tools. Since 1984, we have been developing and
testing just such a tool set. Back when large datasets were rare, we sought out the biggest
available, such as those at the U.S. Census Bureau. We have worked with some of the most
successful companies in America, refining our techniques for intelligent database processing.
Every new experience honed the accuracy and predictive capabilities of these tools.

Today, as the data-rich business environment becomes the norm, Thinking Machines is ready. Our
suite of tools recognizes subtle shifts among billions of pieces of data, then uses these patterns to
predict product preferences, buying habits, and market tendencies. These are not black-box tools
that give you conclusions without explanations. Our tools are engineered to give you both the
answer and the reasons why. Because of our ten-year head start, we are the only company in the
industry to offer these capabilities.


3.1 The Intelligent Processing Difference

Traditional statistical methods of data analysis work well with small amounts of data. Given files
with a dozen or so fields per record, these mainframe-era techniques can find important
correlations efficiently. 

But when the amount of data goes up -- as it already has in virtually every business -- traditional
methods bog down. Circumventing this problem is generally left to the interpreter, or data analyst,
who picks and chooses a few fields to analyze. Out of one thousand fields, an analyst might have to
discard 980, then try to work with the remaining twenty. That's not very intelligent.

Intelligent methods adapt to the data, without preconceived notions about what the answer will
look like. Instead of discarding 980 fields before it even begins, for example, an intelligent algorithm
keeps all thousand in play, and gradually learns which ones contribute positively to the analysis.
Only when it understands which fields of data really matter does it rule out the rest. And it does this
for every individual question, because fields that are irrelevant to answering one business question
may hold the key to answering the next one. The quality of the answer you get from an intelligent
algorithm is based on the amount of data you analyze: the more clues you have, the better it gets.
The quality you get from a traditional approach depends much more directly on the analyst doing
the work.
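The source does not say how Darwin decides which fields matter. As a minimal sketch of "keeping every field in play" for one question, the Python below ranks all fields of a record set by information gain, a common relevance measure from decision-tree learning. The function names and the toy fields (`region`, `initial`, `renewed`) are hypothetical illustrations, not Darwin's method.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(records, field, target):
    """Drop in target entropy obtained by grouping records on one categorical field."""
    groups = {}
    for r in records:
        groups.setdefault(r[field], []).append(r[target])
    remainder = sum(len(g) / len(records) * entropy(g) for g in groups.values())
    return entropy([r[target] for r in records]) - remainder


def rank_fields(records, target):
    """Score every field for this one question; nothing is discarded up front."""
    fields = [f for f in records[0] if f != target]
    scores = {f: information_gain(records, f, target) for f in fields}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


records = [
    {"region": "N", "initial": "A", "renewed": "yes"},
    {"region": "N", "initial": "B", "renewed": "yes"},
    {"region": "S", "initial": "A", "renewed": "no"},
    {"region": "S", "initial": "B", "renewed": "no"},
]
print(rank_fields(records, "renewed"))  # [('region', 1.0), ('initial', 0.0)]
```

Asking a different question means passing a different `target`, which reruns the ranking from scratch, so a field that scores zero for one question can still rank first for the next.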


3.2 Intelligent Processing Asks Better Questions, Offers Better Answers 

Thinking Machines intelligent processing is breakthrough technology that leverages large, rapidly
expanding databases to bring companies close to their customers. Our intelligent processing toolkit,
Darwin, incorporates sophisticated prediction and classification algorithms, some inspired by
neurobiology. Darwin currently includes four tools: StarMatch, StarNet, StarTree, and StarGene.
StarMatch, which uses memory-based reasoning (MBR) technology, compares in parallel the
characteristics of one database record to all others in the database to find similar situations that
can be used to predict outcomes. StarNet uses artificial neural network (ANN) technology to
create the rules for defining a record group; StarTree performs the same function with a parallel
implementation of techniques similar to Classification and Regression Trees (CART).
StarGene uses evolutionary techniques drawn from genetics to optimize existing prediction
algorithms. 

The power of Darwin's techniques lies in their generality. If the answer is in the data, Darwin will
find it. Here are just a few examples of questions that Thinking Machines clients have asked us to
answer, using actual business data. They are questions that traditional approaches were not
designed to address. These are the kinds of questions that put you close to your customers, not just
close to your business.


"Give me a breakdown of all customers likely to default in the coming year."

Predicting which customers are likely to default, before they actually do, is a concern for every
business. That's why a leading consumer company asked Thinking Machines to use its data to
attack this problem. 


Measured against the baseline of defaulters that random guessing would pick out, Darwin
identified twice as many additional defaulters as traditional statistical techniques did.

Darwin analysis started with a historical database of customers already known to have defaulted or
not defaulted. Our task was to predict which customers would default during the next six months,
based on this data. We applied Darwin's StarTree tool to the data. 

StarTree, which uses techniques similar to Classification and Regression Trees (CART),
surveyed the large historical database. It considered every field of data as potentially
important, choosing the most relevant fields to compose a system of rules for predicting future
defaulters. 
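StarTree's internals are not described here. The classic CART recipe it is compared with grows a tree by repeatedly choosing the field and threshold whose split leaves the purest groups, usually measured by Gini impurity. A minimal sketch of that single step, assuming numeric fields and two hypothetical columns (age, balance):

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the group is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(rows, labels):
    """Try every field and every observed threshold; return the lowest weighted Gini.

    Returns (field index, threshold, impurity); field and threshold are None if no split helps.
    """
    best = (None, None, gini(labels))
    n = len(rows)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best


rows = [[25, 1200], [40, 300], [35, 4000], [50, 150]]  # hypothetical (age, balance)
labels = ["default", "ok", "default", "ok"]
print(best_split(rows, labels))  # (0, 35, 0.0)
```

Here age at 35 separates the toy defaulters perfectly; balance at 300 would too, but a tie keeps the first split found. A full tree applies `best_split` recursively to each side, and each root-to-leaf path reads as one rule.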

After creating rules for finding likely defaulters, Darwin then applied StarMatch. This tool, which
uses techniques comparable to the k-nearest-neighbor (KNN) method, is able to apply rules to
individual records in ways that are too complex to carry out on a mainframe. StarMatch compared,
in parallel, the characteristics of each new customer with those of every customer in the historical
database. When the nearest matching customer turned out to be a defaulter, the system had
powerful evidence that the new customer might also default.
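StarMatch's exact matching logic is not given. The sketch below shows the textbook k-nearest-neighbor idea the passage names: measure the distance from a new customer to every historical customer and let the closest ones vote. It assumes numeric features already scaled to comparable ranges; the two toy features are hypothetical.

```python
from collections import Counter
from math import dist


def knn_predict(history, outcomes, query, k=1):
    """Predict an outcome for `query` by majority vote of its k nearest historical records."""
    # Distance from the new record to every historical record, closest first.
    ranked = sorted(range(len(history)), key=lambda i: dist(history[i], query))
    votes = Counter(outcomes[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


history = [(0.2, 0.9), (0.8, 0.1), (0.25, 0.8), (0.9, 0.2)]
outcomes = ["default", "paid", "default", "paid"]
print(knn_predict(history, outcomes, (0.3, 0.85)))         # default
print(knn_predict(history, outcomes, (0.85, 0.15), k=3))   # paid
```

Each distance is computed independently of the others, which is why this kind of comparison spreads naturally across many processors.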

It took Darwin just an hour to build the rules for predicting defaulters -- and even less time to
make its predictions. When all was said and done, Darwin's tools were twice as effective at zeroing
in on likely defaulters when compared with standard statistical techniques. What's more, the
Intelligent Business System provided a 192X performance and a 55X price/performance
advantage over the company's mainframe. 


"Do three things: predict year-end demand, tell me which customers will fuel that demand, and
tell me why."

Companies with seasonal sales patterns are always trying to develop strategies to even out their
revenue curves and make more efficient use of their production capacity. Recently, Thinking
Machines helped a division of a Fortune 100 client look behind the peaks and valleys of its sales
charts and see how customer subgroups affected its bottom line. 


Based on its predictions of year-end demand, Darwin discovered three distinct classes of
customer behavior.

The division's business was quite seasonal -- sales were high in the first quarter, but they dropped
at mid-year, before growing back in the third and fourth quarters. Giving us its
January-November sales data, the division asked Thinking Machines to predict December
revenue, and then to identify the customers who would drive that year-end demand. 

StarTree scoured the data for subtle patterns. It worked its way back through the historical
records, evaluating every field along the way. It examined customer spending as many as six
months prior to December to refine its end-of-year predictions. 
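One plausible way to let a tree learner look back six months, as described above, is to give each customer one input column per preceding month. The helper below is a hypothetical sketch, not Darwin's feature encoding; it assumes months are numbered 1-12 within a single year and treats a month with no purchases as zero spend.

```python
def lag_features(monthly_spend, target_month, lags=6):
    """Return a customer's spend in each of the `lags` months before `target_month`.

    monthly_spend maps month number (1-12) to spend; missing months count as 0.0.
    The most recent month comes first.
    """
    return [monthly_spend.get(target_month - k, 0.0) for k in range(1, lags + 1)]


spend = {m: 100.0 * m for m in range(1, 12)}  # hypothetical January-November history
print(lag_features(spend, 12))  # [1100.0, 1000.0, 900.0, 800.0, 700.0, 600.0]
```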

Once it had forecast December revenue, Darwin used these predictions to generate complete
annual spending profiles for all the division's customers. In so doing, it uncovered multiple distinct
buying patterns. The top ten percent of all customers bought at a rate consistent with, though less
pronounced than, the division's overall seasonal variations. Of the remaining customers, most
bought steadily through August. But Darwin isolated one large subgroup whose sales were quite
different from the rest. 

Why did this segment virtually stop buying for three months? What other tendencies did this group
exhibit? Was there any way to boost their spending? Darwin's intelligent processing had taken the
company from the realm of hypothesis testing into that of true data exploration. 

To verify the accuracy of its results, the client then applied Darwin's model to a new, untested
dataset; its predictions proved accurate within five percent. 


"Define a highly concentrated micromarket within our database."

Many businesses want to perform more tightly targeted marketing, often referred to as
"micromarketing." Micromarketing involves making special offers to customer segments that meet
very select criteria. To make this approach effective, companies need a fast, efficient way of
identifying unique customer subgroups within their databases. Intelligent Business Systems help
make that possible.


Late last year, a nationwide company asked Thinking Machines to find and define highly
concentrated subgroups within its database that would respond favorably to future targeted
mailings. Its goal was straightforward: to reach a higher percentage of customers likely to
respond, while minimizing mailings to those unlikely to do so. The Intelligent Business System had
to produce a quantitatively defensible profile of each micromarket it defined. 

We applied Darwin's StarTree tool against a large database with over 1,000 fields per record. The
tool's task was to create rules that could identify responders and non-responders to future
mailings. StarTree's search was exhaustive -- it scoured every field in the record, considering
millions of combinations that might characterize customer behavior. 

By analyzing historical records, Darwin zeroed in on a concentrated 5% segment of the database
that would account for nearly sixty percent of all potential responses. Targeting only these
customers cut mailing costs by a factor of twenty, and yielded a six-fold improvement in response
rate. 
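Results like these are usually judged by two numbers: the share of all responders that the targeted segment captures, and the lift, which is the segment's response rate divided by the rate for a full mailing. A minimal sketch of both, assuming each record already carries a model score and a known historical response; the data below are made up for illustration.

```python
def capture_and_lift(scores, responded, top_fraction):
    """Rank records by model score and target the top `top_fraction` of them.

    Returns (share of all responders captured, lift over mailing everyone).
    """
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    cutoff = max(1, round(len(ranked) * top_fraction))
    hits = sum(1 for _, r in ranked[:cutoff] if r)
    total = sum(1 for r in responded if r)
    capture = hits / total
    lift = (hits / cutoff) / (total / len(responded))
    return capture, lift


scores = list(range(20, 0, -1))  # hypothetical model scores, best prospect first
responded = [True, True] + [False] * 8 + [True] + [False] * 9
capture, lift = capture_and_lift(scores, responded, 0.10)
print(f"captured {capture:.0%} of responders at {lift:.1f}x the base response rate")
```

In this toy case, mailing only the top 10% reaches two of the three responders at about 6.7 times the response rate of a full mailing.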


"Create a model that really explains why some customers renew their subscriptions and others
don't."

Why do good customers suddenly take their business elsewhere? Every year, companies around
the world confront this question, and struggle to find cost-effective methods for accurately
predicting customer behavior. Intelligent processing now makes such predictions possible.


Because of their limited capacity for data, traditional methods (left) often operate on only
1-2% of the data available in each record. Darwin techniques (right) look at every bit of
data in the record.

Thinking Machines worked with a major U.S. service bureau to build a system for predicting
customer attrition, or non-renewal of membership. There were significant challenges: to salvage
an account, the company had to know in advance it might lose it; and to reach only those it was in
danger of losing, the attrition prediction had to be painstakingly precise.

Thinking Machines used StarNet to implement a neural network. Our massively parallel processing
made for quick modification and improvement of the neural net's behavior and predictions. After
analyzing all the records in the database, Darwin identified a set of subscribers ten times more
likely than the average customer to cancel their accounts within the next few months.
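StarNet's architecture and training settings are not described. As a hedged illustration of the general technique, the sketch below trains a one-hidden-layer neural network by back-propagation to score cancellation risk. The features, layer size, learning rate, and epoch count are all assumptions chosen for a tiny example.

```python
import math
import random


def sigmoid(z):
    """Logistic activation, squashing any input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_mlp(X, y, hidden=4, epochs=2000, lr=0.5, seed=0):
    """Train a one-hidden-layer network with per-record back-propagation.

    X: list of numeric feature vectors; y: list of 0/1 targets (1 = cancelled).
    Returns predict(x), an estimated probability that x cancels.
    """
    rng = random.Random(seed)
    n_in = len(X[0])
    # Each weight row ends with a bias term.
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(hidden + 1)]

    def forward(x):
        # zip() stops at the shorter list, so the trailing bias is added separately.
        h = [sigmoid(sum(w * v for w, v in zip(row, x)) + row[-1]) for row in w_hidden]
        out = sigmoid(sum(w * v for w, v in zip(w_out, h)) + w_out[-1])
        return h, out

    for _ in range(epochs):
        for x, target in zip(X, y):
            h, out = forward(x)
            # Error signal at the output unit (squared-error loss, sigmoid output).
            d_out = (out - target) * out * (1 - out)
            # Propagate the error back to each hidden unit before any weight changes.
            d_hid = [d_out * w_out[j] * h[j] * (1 - h[j]) for j in range(hidden)]
            for j in range(hidden):
                w_out[j] -= lr * d_out * h[j]
            w_out[-1] -= lr * d_out
            for j in range(hidden):
                for i in range(n_in):
                    w_hidden[j][i] -= lr * d_hid[j] * x[i]
                w_hidden[j][-1] -= lr * d_hid[j]

    return lambda x: forward(x)[1]


# Hypothetical features scaled to 0-1: (months since last use, service complaints).
X = [[0.1, 0.0], [0.2, 0.1], [0.15, 0.2], [0.9, 0.8], [0.8, 1.0], [0.95, 0.7]]
y = [0, 0, 0, 1, 1, 1]
predict = train_mlp(X, y)
print(round(predict([0.85, 0.9]), 3))  # estimated cancellation probability
```

The network "trains itself" in the sense the passage describes: no rules are programmed by hand; the weights are adjusted from the examples.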

Using conventional programming techniques, this investigation could easily have taken a year,
without the accuracy delivered by Darwin. Because Darwin's modules train themselves, we were
able to produce superior results in just two weeks. 


"Suggest ways of regrouping our customers into new market segments."

Intelligent Darwin technology is a powerful aid for companies engaged in direct marketing. Darwin
supports, for example, the design and targeting of unique customer catalogs, each with its own
customer subset and products known to appeal to that particular group.


Darwin can cluster individual customers into market segments based on actual similarities
in the data, rather than using guesswork or last year's segmentation.

Conceptually, we start by creating a custom catalog for each customer. These individual catalogs
provide the starting point for optimization. The system then randomly clusters pairs of customers
into the same catalog. 

When the clustering increases predicted net revenue, the system keeps the pair together and
eliminates a catalog. If it does not, because the pair has too little in common, Darwin returns each
customer to its original catalog. The system applies this approach again and again -- millions and
millions of times -- until it discovers the best balance of catalogs and overall net revenue. Using
this method, Darwin can find profitable groupings that no one knew existed, because the new
market-segment definitions emerge from the data itself.
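The procedure above can be sketched as a randomized merge search. The revenue model here is entirely hypothetical (each customer's expected revenue per item, a fixed cost per catalog, and a page cost per item per copy mailed), and this simplified version only merges catalogs; it never splits them again.

```python
import random


def net_revenue(catalogs, prefs, fixed_cost, page_cost):
    """Hypothetical objective: predicted revenue minus catalog production and page costs.

    catalogs: list of sets of customer ids.
    prefs[c]: dict mapping item -> revenue expected from customer c if the item is offered.
    Each catalog carries every item that any of its members values.
    """
    total = 0.0
    for members in catalogs:
        items = set().union(*(prefs[c] for c in members))
        revenue = sum(prefs[c].get(item, 0.0) for c in members for item in items)
        total += revenue - fixed_cost - page_cost * len(items) * len(members)
    return total


def merge_catalogs(prefs, fixed_cost, page_cost, iterations=10_000, seed=0):
    """Start with one catalog per customer, then randomly try merging pairs of catalogs.

    A merge is kept only if total predicted net revenue rises; otherwise it is discarded.
    """
    rng = random.Random(seed)
    catalogs = [{c} for c in prefs]
    best = net_revenue(catalogs, prefs, fixed_cost, page_cost)
    for _ in range(iterations):
        if len(catalogs) < 2:
            break
        a, b = rng.sample(range(len(catalogs)), 2)
        trial = [cat for k, cat in enumerate(catalogs) if k not in (a, b)]
        trial.append(catalogs[a] | catalogs[b])
        value = net_revenue(trial, prefs, fixed_cost, page_cost)
        if value > best:
            catalogs, best = trial, value  # keep the merge: one fewer catalog
    return catalogs, best


prefs = {
    "ann": {"tents": 50, "boots": 30},
    "bob": {"tents": 40, "boots": 20},
    "cy": {"silk": 60},
    "di": {"silk": 45},
}
catalogs, value = merge_catalogs(prefs, fixed_cost=20, page_cost=15)
print(sorted(sorted(c) for c in catalogs), value)  # [['ann', 'bob'], ['cy', 'di']] 115.0
```

The two segments are never specified in advance; they appear because those merges are the only ones that raise net revenue under this toy model.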


"Find some customer-preference patterns I might not be aware of." 

Intelligent Business Systems cannot make people buy your product, but they can discover
undetected, and perhaps unusual, purchasing patterns. The generality of Darwin and the sheer
power of its underlying hardware let you make the most open-ended kinds of requests possible:
"Go find me something interesting."


Darwin highlights patterns and correlations that it discovers by performing an
unstructured analysis of the data.

With a few rules that define a minimum acceptable correlation, Darwin can investigate every field
in every database record and compare them to all others. You might, for example, run such
analyses overnight, using the full power of the system without interrupting prime shift processing.
Or you can analyze a database on a time-shared basis, using system resources only when they are
available.

You can then decide which correlations to explore further. You may discard a purported correlation
between a person's middle initial and their account balance. But you may want to follow up on an
unexpected relationship between their account balance and the dates of their monthly payments, or
differences between their mid-week and weekend buying patterns. Only Intelligent Business
Systems from Thinking Machines let you explore this full range of opportunities.
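The source does not name the statistic Darwin uses for "a minimum acceptable correlation." Assuming Pearson correlation and a hypothetical threshold of 0.8, an exhaustive field-against-field screen looks like this; the field names echo the examples above but the values are invented.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant field cannot correlate with anything
    return sxy / sqrt(sxx * syy)


def strong_correlations(columns, min_abs_r=0.8):
    """Compare every field with every other field; keep pairs with |r| >= min_abs_r."""
    hits = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= min_abs_r:
            hits.append((a, b, r))
    return sorted(hits, key=lambda hit: -abs(hit[2]))


columns = {
    "balance": [100, 200, 300, 400, 500],
    "payment_day": [28, 22, 15, 10, 3],
    "middle_initial_code": [3, 1, 4, 1, 5],
}
for a, b, r in strong_correlations(columns):
    print(f"{a} ~ {b}: r = {r:.3f}")
```

Only the balance/payment-day pair clears the threshold here (r is about -0.999); the middle-initial field is screened out automatically, leaving the analyst to judge which surviving correlations deserve follow-up.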


"I need this answer in an hour."

Not all questions are discovery-type questions. When you know exactly what you want, the
system's Decision/SQL facility can provide the exact answer 50-500 times faster than a
traditional mainframe. Decision/SQL means you can ask more complex questions about bigger
datasets, and still get the answer fast.


Consider these four businesses: a credit union, a department store chain, a regional telephone
company, and a nationwide credit-card firm. Each company uses more data than the previous one,
and the complexity of their queries grows along with the size of their datasets. 

The credit union, for example, may ask a very simple question, "How many members have written
over $1,000 worth of bad checks in any given month?" Questions like these can be answered in
about an hour on a workstation.

Meanwhile, at the department store, the level of query complexity increases. A manager might ask,
"How many customers made at least one third of their purchases in a single department?" In a
typical retail environment, answering this question entails searching through roughly 30 million
transaction records, or a 3-Gbyte database, and takes an hour using a mainframe
computer.
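The department-store question reduces to a group-by followed by a per-customer test. A minimal in-memory sketch, assuming "purchases" means a count of transactions rather than dollars spent and using made-up data; it illustrates the query logic, not Decision/SQL itself.

```python
from collections import Counter, defaultdict


def customers_concentrated(transactions):
    """Count customers who made at least one third of their purchases in a single department.

    transactions: iterable of (customer_id, department) pairs, one pair per purchase.
    """
    per_customer = defaultdict(Counter)
    for customer, department in transactions:
        per_customer[customer][department] += 1  # like GROUP BY customer, department
    # Integer comparison avoids floating-point rounding at exactly one third.
    return sum(
        1
        for counts in per_customer.values()
        if 3 * max(counts.values()) >= sum(counts.values())
    )


transactions = [
    ("c1", "shoes"), ("c1", "toys"), ("c1", "books"), ("c1", "garden"),
    ("c2", "shoes"), ("c2", "shoes"), ("c2", "toys"),
]
print(customers_concentrated(transactions))  # 1
```

Because each customer's tally is independent, the work divides cleanly across processors when the transactions are partitioned by customer.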

At a regional telephone company, a company official might ask, "How many customers made more
than $100 worth of phone calls from Minneapolis to Chicago in November?" The phone traffic
between the two cities could easily generate 50 Gbytes of data. A small Intelligent Business
System running Decision/SQL could process this query in an hour.

Of the four businesses, the credit-card firm would have the most data, as every customer purchase
would create a new transaction record. At this level, queries become very complex. A marketing
manager might ask, "How many accounts have total charges of $10,000 or more with airlines in
calendar years 1990-1993? Exclude accounts that were not active for the entire period, break the
results down by state, and give me the average airline charges for all accounts." This kind of query
involves about 100 Gbytes of transactional data and 1.2 Gbytes of customer data. Yet processing
all this information using Decision/SQL on a large Intelligent Business System takes just one hour.


"Predict how external changes like currency rates might affect my business."

While working to better understand their customers, companies must also keep an eye on all the
other variables that affect their business. The power of intelligent processing lies in its
generality. Whether it's fashion trends or exchange rates, if you have a lot of data about the past,
Darwin can analyze it for the telltale patterns that hold clues to the future.


In a competition sponsored by the Santa Fe Institute, Darwin out-predicted all other
entrants.

The Santa Fe Institute recently held a Time Series Prediction and Analysis Competition. The
contest included leading computing firms and universities from around the world, and challenged
participants to predict an unspecified business trend; in fact, contestants did not even know that one
of the files contained exchange-rate data. Each participant received a set of 30,000 data points.
Contestants then had to predict six future values of this data.

Using Darwin's StarNet tool, Thinking Machines constructed an artificial neural network for
making our predictions. The network employed a back-propagation algorithm that used the 30,000
data points to learn historical data patterns and behaviors. The CM-5 sped through this
learning process, updating hundreds of millions of network connections per second. Developed in
just two weeks, the Darwin model received first prize in the competition.
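Before a network can learn from a single long series, the series has to be cut into training pairs, and predicting six future values needs a forecasting loop. The source does not say whether the six values were produced one step at a time or all at once; the sketch below assumes one-step-ahead prediction with each output fed back in, and uses a trivial linear extrapolation as a hypothetical stand-in for the trained network.

```python
def make_windows(series, width):
    """Turn a series into (previous `width` values, next value) training pairs."""
    return [(series[i - width:i], series[i]) for i in range(width, len(series))]


def forecast(model, history, width, steps=6):
    """Predict `steps` future values, feeding each prediction back in as input."""
    window = list(history[-width:])
    out = []
    for _ in range(steps):
        nxt = model(window)
        out.append(nxt)
        window = window[1:] + [nxt]
    return out


series = list(range(10))
print(make_windows(series, 3)[0])  # ([0, 1, 2], 3)


def extrapolate(w):
    """Stand-in model: continue the most recent step (a trained network would go here)."""
    return 2 * w[-1] - w[-2]


print(forecast(extrapolate, series, 3))  # [10, 11, 12, 13, 14, 15]
```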


4.0 The Power Behind the Intelligence

Answers to tough questions like those on the previous pages illustrate how Darwin can take large
datasets and turn them into clear answers and trustworthy predictions. The more data detail there
is, the clearer the answers and the more trustworthy the predictions. Darwin tools can do all this
because they operate in a very advanced hardware and software environment. Indeed, the job of
the underlying Intelligent Business System hardware and software is to provide an economical and
expandable home for the data, and an equally powerful environment for analyzing it.


4.1 The Scalable Disk Array: The Heart of Every Intelligent Business System 

At the heart of every Intelligent Business System is an advanced form of the RAID (Redundant
Arrays of Inexpensive Disks) storage architecture pioneered by Thinking Machines (our patents
go back to 1985) and since adopted by every major manufacturer of commercial data processing
systems. As you know, RAID uses large numbers of mass-produced workstation disks to provide
storage that is more reliable, more expandable, and higher performance than traditional drives.
Data is always recorded redundantly across multiple drives. If an individual drive fails, it is
electronically replaced by a spare. The redundancy information is used to reconstruct every bit of
data that was on the disk that failed and copy it to the spare. 
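The reconstruction described above rests on parity. The exact redundancy scheme of the Scalable Disk Array is not specified here; parity-based RAID levels commonly store the XOR of the data blocks in a stripe, and a minimal sketch of that idea follows. The stripe contents are illustrative.

```python
from functools import reduce


def parity(blocks):
    """XOR equal-length blocks byte by byte; the result is the parity block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_blocks, parity_block):
    """Reconstruct the single missing data block from the survivors plus parity."""
    return parity(list(surviving_blocks) + [parity_block])


stripe = [b"ACCT", b"1994", b"DATA"]  # one stripe spread across three data drives
p = parity(stripe)                     # stored on a fourth (parity) drive
# The drive holding b"1994" fails; its contents are rebuilt onto a spare.
print(rebuild([stripe[0], stripe[2]], p))  # b'1994'
```

Because XOR is its own inverse, any one lost block in a stripe can be recovered, which is why a single drive failure costs no data.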

We were the first company in the industry to ship a high-performance RAID disk system, our
DataVault, which we announced in 1987. Among our DataVault customers, Dow Jones &
Company has been in continuous 7-by-24 production operation for over five years. 

Our Scalable Disk Array uses mass-produced, 3-1/2" disk drives that can be removed and
replaced without powering down the system. Each disk holds approximately 2 Gbytes of data. An
array of 256 drives, which fits within a single cabinet, holds half a terabyte. Thinking Machines
routinely operates systems with multiple cabinets; we are experts in the support of very large
system configurations. In fact, of the 100 most powerful computers currently installed in the world,
we have installed 31 -- more than anyone else, including IBM. 

The system is designed to mix multiple generations of disk technology in the same cabinetry. Those
who use 2-Gbyte drives today, for example, can expand with the next generation of 5-Gbyte
drives when they move into volume production.

A single expandable data network interconnects all Intelligent Business System components: disk
modules, processing modules, and input-output interfaces. The capacity of this network grows
with the total number of modules connected to it; in large configurations it routinely sustains total
data rates of many Gbytes/sec. So-called Symmetric Multi-processor (SMP) systems do not
expand in this critical way -- their processors and disks attach to a fixed bus whose performance
does not expand. 

The Scalable Disk Array offers unique flexibility as you grow. You can add a combined storage
module and network interface to the system, or simply attach a storage module to an existing
interface. Either way, the data capacity increases. But when you add a new interface, the
performance of the disk system increases as well. Only Thinking Machines business systems offer
this ability to tune capacity and performance independently. Our installed base includes systems
that sustain disk-transfer rates from 9 Mbytes/sec up to 250 Mbytes/sec, the highest rate in the
computer industry.

A second unique advantage of the Scalable Disk Array is that you can flexibly specify the level of
RAID redundancy you need. An individual file system can use hundreds of drives, or just a few.
Each file system has its own site-specified complement of parity and spare drives. The file system
heals itself if there is a component failure by reconstructing the information and writing it to a spare
drive. You can then remove the failed drive for repair or replacement.


4.2 SuperSPARC Processing Nodes

Every processing node inside an Intelligent Business System is, in essence, a SuperSPARC
workstation, compatible with the other SPARC workstations in your organization. Rated at 64 MIPS
apiece, these processing nodes connect to the same data network that the disks connect to. 

Because processing nodes and storage nodes attach to the same data network, their numbers can
grow independently in any proportion. As the amount of data grows, additional storage nodes
accommodate it. As the daily volume, or complexity, of the questions grows, additional processing
nodes accommodate them. Thinking Machines is the industry leader in scalable computing. Our
current customer base represents a performance spectrum of more than a factor of sixty, all using
the same hardware components and the same applications software. The scalability of Intelligent
Business Systems makes capacity planning easy. You buy only what you need, knowing the
expansion headroom is there when you need it. 


4.3 Fast Access to Other Corporate Systems

A centralized Intelligent Business System continually receives data from other corporate systems
and distributes extracts of its data to them. Powerful input/output subsystems move this data at very
high speeds. The CM-5 business system provides mainframe connections using standard
mainframe channels. Alternatively, you can connect to a high-performance gateway such as an
RS/6000, communicating using industry-standard interfaces with peak rates of up to 100
Mbytes/sec. You can also use FDDI, Ethernet, and Token Ring connections.


4.4 A Powerful Array of Standards-Based Software

Intelligent Business Systems feature Oracle7 relational database management software; optimized
Decision/SQL query facilities; and Parasort by MRJ, Inc., a high-speed sorting utility. They also
offer links to familiar, standard client applications.


4.4.1 Oracle

Oracle's open architecture integrates Oracle and non-Oracle DBMSs and one of the industry's
most comprehensive collections of tools, applications, and third-party software into an
industry-standard environment. Oracle Corporation was a pioneer in offering a true relational database
management system commercially, and has continually led innovations in the database field. Oracle
is portable. Applications developed for Oracle can be ported to other platforms with little or no
modification. Oracle is compatible with industry standards, including most industry standard
operating systems. Oracle is connectable. It allows different types of computers and operating
systems to share information across networks. The capabilities of Oracle result in a comprehensive
and powerful system for information storage and retrieval.

Oracle7 Version 7.1 establishes Oracle as a clear leader in parallel processing, programmable
server technology, and large database support. It introduces the parallel query option, which
significantly improves performance for lengthy data-intensive operations. The parallel query option
allows Oracle7 to split up query execution, data loading, and index creation tasks, and execute
them concurrently on multiple processing nodes. The combination of CM-5 computing power and
Oracle7 parallel features extends Oracle capability into the multi-terabyte range.
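The parallel query option is described only at a high level. The general pattern it names, splitting one scan across nodes, computing partial results locally, then combining them, can be sketched as follows. Python threads stand in for processing nodes purely for illustration (they do not give real CPU parallelism here), and nothing below reflects Oracle's internal implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum_by_key(rows):
    """Work done on one node: aggregate its own partition of the table."""
    totals = {}
    for key, amount in rows:
        totals[key] = totals.get(key, 0) + amount
    return totals


def parallel_group_sum(rows, nodes=4):
    """Split a scan across `nodes` workers, then combine their partial results."""
    partitions = [rows[i::nodes] for i in range(nodes)]
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = list(pool.map(partial_sum_by_key, partitions))
    combined = {}
    for part in partials:
        for key, subtotal in part.items():
            combined[key] = combined.get(key, 0) + subtotal
    return combined


rows = [("MN", 10), ("IL", 5), ("MN", 7), ("WI", 3), ("IL", 1)]
print(sorted(parallel_group_sum(rows, nodes=2).items()))  # [('IL', 6), ('MN', 17), ('WI', 3)]
```

The combine step handles only one small partial result per node, so the expensive scan work grows with the data while the merge stays cheap.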

The CM-5 is the ideal parallel environment for Oracle7, because both processing nodes and disk
storage nodes connect as peers to the same expandable network. Any processing node can fetch
data from any disk storage node without having to go through any other processors. The network
grows in performance to match the total number of nodes (processing and storage) in the system.


4.4.2 Decision/SQL 

For users processing high volumes of very complex queries on the non-transaction versions of
their data, Thinking Machines offers Decision/SQL. Decision/SQL provides a query-only version
of the Structured Query Language (SQL) standard and a set of load utilities to optimize load
performance. It takes full advantage of the CM-5 architecture: all system processors work
cooperatively on a query. Knowing the data will not be changing on the fly, the system organizes it
in a manner that is optimized for retrieval. Decision/SQL uses few locking mechanisms or indices.
Instead, it employs a vertical-partitioning approach and advanced parallel algorithms to accelerate
processing. 
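Vertical partitioning stores a table column by column rather than record by record, so a query reads only the fields it needs. The sketch below illustrates the general idea with hypothetical column names; it is not Decision/SQL's storage format.

```python
def vertical_partition(rows, columns):
    """Store a table column by column: one list per field instead of one tuple per record."""
    return {name: [row[i] for row in rows] for i, name in enumerate(columns)}


def sum_where(store, value_col, filter_col, predicate):
    """Scan only the two columns a query needs; every other field is left unread."""
    return sum(v for v, f in zip(store[value_col], store[filter_col]) if predicate(f))


rows = [
    ("acct1", "MN", "airline", 1200.0),
    ("acct2", "IL", "hotel", 300.0),
    ("acct3", "MN", "airline", 800.0),
]
store = vertical_partition(rows, ["account", "state", "merchant_type", "amount"])
print(sum_where(store, "amount", "merchant_type", lambda t: t == "airline"))  # 2000.0
```

Since the data is read-only, columns never have to be reassembled into records for updates, which is what makes this layout attractive for decision support.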

Decision/SQL fully utilizes the underlying I/O bandwidth of the hardware. It accesses the data in
bulk for queries that do full-table scans and for loads. Performance on GROUP BY, JOIN, and
ORDER BY operations is exceptionally fast. Decision/SQL can process complex queries on
databases ranging in size from tens of gigabytes to terabytes. It outperforms traditional DB2
systems by factors of 50 to 500, depending on the number of processing and disk nodes in the
system.


4.4.3 Parasort 

To deliver the full processing capabilities of our Intelligent Business Systems, Thinking Machines
offers Parasort, a parallel sorting package developed by MRJ, Inc. Parasort supports databases of
hundreds of gigabytes in size, and offers performance that grows with the size of a CM-5 business
system. Parasort features include support for multiple keys of differing data types, the ability to
process variable- and fixed-length records, checkpointing, and the ability to handle database
merge operations.
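Parasort's algorithm is not described. A common pattern for sorts larger than one processor's memory is to sort independent runs, then k-way merge them, and the sketch below shows that pattern with a multi-key sort over fields of differing types (state ascending, balance descending). The record layout is hypothetical.

```python
import heapq


def sort_runs(records, key, run_size):
    """Phase 1: cut the input into runs and sort each one independently."""
    return [sorted(records[i:i + run_size], key=key) for i in range(0, len(records), run_size)]


def merge_runs(runs, key):
    """Phase 2: k-way merge of the sorted runs into one sorted sequence."""
    return list(heapq.merge(*runs, key=key))


records = [
    {"state": "MN", "balance": 50},
    {"state": "IL", "balance": 20},
    {"state": "MN", "balance": 90},
    {"state": "IL", "balance": 70},
    {"state": "WI", "balance": 10},
]
key = lambda r: (r["state"], -r["balance"])  # state ascending, then balance descending
merged = merge_runs(sort_runs(records, key, run_size=2), key)
print([(r["state"], r["balance"]) for r in merged])
# [('IL', 70), ('IL', 20), ('MN', 90), ('MN', 50), ('WI', 10)]
```

Each run can be sorted on a different node, and the same merge step combines presorted files, which is the shape of the database merge operations mentioned above.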


4.4.4 Integrating Familiar Client Applications 

Intelligent Business Systems let you employ familiar third-party applications, including graphical
user interfaces (GUIs), fourth-generation languages, cross-tabulation programs, application
generators, and data analysis packages. Decision/SQL supports the Sybase Open Client/Open
Server protocol, Microsoft's ODBC protocols, and such popular tools as GQL from Andyne
Computing, Powersoft's PowerBuilder, Impromptu from KnowledgeWare, and Trinzic's Forest &
Trees. Similarly, Oracle connects to client applications through the SQL*Net protocol, giving
access to all tools that conform to that standard.


5.0 Summary 

Back in the mainframe era, when data was scarce, companies could gain competitive advantage
simply by having data when their competitors didn't. In today's data-rich business environment,
the mere possession of lots of data no longer gives a competitive edge, because everybody has lots
of data. 

Today, competitive advantage comes from what you do with your data: 

   Companies whose computer systems grow smoothly and economically to take in the most
   recent data have an advantage over those whose systems are slow and painful to expand.
   Companies that use their data to predict the future have an advantage over those whose
   systems only report the past. 

Thinking Machines is the only company in the industry with the breadth of systems and applications
experience to deliver both of these competitive advantages in a single system. To some, the
combination will seem a breakthrough. To others it is simply intelligent. 

Thinking Machines Corporation. All rights reserved. Thinking Machines and Connection Machine
are trademarks of Thinking Machines Corporation. All other trademarks are the property of their
respective owners. Photography: Steve Grohe