Expert systems are traditional AI tools used for a variety of diagnosis and recognition tasks. These systems are data-intensive: they rely heavily on rules, patterns, and, above all, large amounts of data.
A typical credit card fraud detection system will consist of a knowledge base and a rule base. The main point here is that an expert system is not really a "smart" system but rather a system with a knowledge base holding a large number of pattern cases that fall within its domain of expertise, together with extensive data about cases it has already come across or potentially may come across in the future.
Here is a rule-based example:
if (A and B) or (C and D) or (A and D) then occurrence of (B and C) raises red-flag with certainty 0.75,
where A, B, C, and D are various situations.
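To make the rule concrete, here is a minimal sketch of how such a certainty-weighted rule might be evaluated in code. The `rule_fires` and `evaluate` functions, and the idea of passing situations as a dictionary of booleans, are illustrative assumptions, not part of any particular fraud system.

```python
# Illustrative evaluation of the rule above; the function names and the
# dictionary-of-booleans representation are assumptions for this sketch.

def rule_fires(facts):
    """Antecedent of the rule: (A and B) or (C and D) or (A and D)."""
    A, B, C, D = (facts.get(k, False) for k in "ABCD")
    return (A and B) or (C and D) or (A and D)

def evaluate(facts):
    """If the antecedent holds and (B and C) also occur, raise a
    red flag with certainty 0.75; otherwise report no flag."""
    if rule_fires(facts) and facts.get("B") and facts.get("C"):
        return ("red-flag", 0.75)
    return ("no-flag", 0.0)

print(evaluate({"A": True, "B": True, "C": True, "D": False}))
# A and B holds, and B and C occur -> ('red-flag', 0.75)
```

In a fuller system the certainty factor would typically be combined with certainties from other rules rather than reported directly.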
There are two types of rules that the system can support. Certainly, the primary rules should be those that detect any fraudulent use of the credit card.
After setting up both the knowledge and rule bases, we can look at issues of parallelism. Since parallel computing plays a quite useful role here, we can parallelize our rule-based system as follows: suppose we have a reasonably large network of workstations or processors. We start by assigning one or more rules to each processor. With condition testing performed in parallel, the time to find rules that can fire is greatly reduced. If the action part of a rule consists of subsections, it may be possible to parallelize their execution as well.
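The parallel condition-testing step can be sketched as follows. This is a single-machine stand-in for the network of workstations described above, using a thread pool to test each rule's antecedent concurrently; the rule names and conditions are assumptions.

```python
# Sketch of parallel condition testing: each worker checks one rule's
# antecedent against the current facts, and rules that can fire are
# collected into the conflict set. Rules here are toy assumptions.
from concurrent.futures import ThreadPoolExecutor

RULES = {
    "r1": lambda f: f["A"] and f["B"],
    "r2": lambda f: f["C"] and f["D"],
    "r3": lambda f: f["A"] and f["D"],
}

def fireable(facts):
    """Test every rule's condition in parallel; return the names of
    rules whose antecedents are satisfied."""
    with ThreadPoolExecutor(max_workers=len(RULES)) as pool:
        results = pool.map(lambda item: (item[0], item[1](facts)),
                           RULES.items())
    return sorted(name for name, ok in results if ok)

facts = {"A": True, "B": False, "C": True, "D": True}
print(fireable(facts))  # r2 (C and D) and r3 (A and D) fire -> ['r2', 'r3']
```

On a real cluster each processor would hold its assigned rules locally, but the structure of the match phase is the same.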
A neural network (as shown below) would also require a large number of fraudulent and non-fraudulent pattern cases for training. This is particularly true for feedforward nets. Before training begins, a proper representation of these cases needs to be put together, and the network weights are set up to reflect that representation.
Throughout training, the net's connectivity or connection weights change from one state to another reflecting the case (fraud, non-fraud) at hand.
Some disadvantages: the user will certainly need a large network, though perhaps with no more than two hidden layers. Also, the most important task is coming up with a representation of the cases: mapping from symbolic to numeric will need to be done, as well as the other way around.
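The symbolic-to-numeric mapping (and its inverse) might look like the following sketch. The feature names, merchant categories, and scaling constants are assumptions made purely for illustration.

```python
# Sketch of symbolic <-> numeric mapping for a feedforward net's
# input and output. Categories and scaling factors are assumptions.

MERCHANT_TYPES = ["grocery", "electronics", "jewelry", "online"]

def encode_case(merchant, amount, hour):
    """One-hot encode the merchant type and scale the numeric fields
    into [0, 1] so the case can feed the network's input layer."""
    one_hot = [1.0 if merchant == m else 0.0 for m in MERCHANT_TYPES]
    return one_hot + [min(amount / 10_000.0, 1.0), hour / 23.0]

def decode_decision(output):
    """Map the net's numeric output back to a symbolic verdict."""
    return "fraud" if output >= 0.5 else "non-fraud"

vec = encode_case("jewelry", 2500.0, 23)
print(vec)                   # [0.0, 0.0, 1.0, 0.0, 0.25, 1.0]
print(decode_decision(0.8))  # fraud
```

Getting this representation right is, as noted above, the hardest part; a poor encoding can hide exactly the clues the net needs.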
Perhaps, if putting together a large knowledge base with an equally large set of rules is not a problem, then expert systems would be more suitable for this domain. I will go further and say that a hybrid, or multi-phase, approach would be the most suitable for this kind of application. First, pre-processing is done on the incoming cases to infer any new properties or clues about which direction we might be heading with the case; that is, so far, does it look like a case of fraud or not? Second, armed with this information, we move on to the next phase, using systems such as neural nets (feedforward or feedback) to come up with the conclusive result. The feedback neural net paradigm will require a different training technique than the feedforward net above. It is probably simpler to stick with the feedforward approach and leave the feedback network for the physical optimization approach discussed next.
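The two-phase idea above can be sketched roughly as follows: a cheap rule-based pre-processor tags the incoming case, and its verdict is handed to the second-phase classifier as an extra feature. The thresholds, field names, and toy scoring function stand in for the real rule base and neural net and are all assumptions.

```python
# Sketch of the multi-phase approach: phase 1 is a rule-based
# pre-processor, phase 2 is a stand-in for the neural net.
# All thresholds and weights are illustrative assumptions.

def preprocess(case):
    """Phase 1: cheap rule-based screening that infers a preliminary
    direction (does it look fraudulent so far?) as a derived clue."""
    suspicious = (case["amount"] > 5000
                  or case["country"] != case["home_country"])
    return {**case, "pre_flag": 1.0 if suspicious else 0.0}

def classify(case):
    """Phase 2: toy score standing in for the trained net; it weighs
    the pre-processor's flag heavily alongside the raw amount."""
    score = 0.6 * case["pre_flag"] + 0.4 * min(case["amount"] / 10_000, 1.0)
    return "fraud" if score >= 0.5 else "non-fraud"

case = {"amount": 7000, "country": "XX", "home_country": "YY"}
print(classify(preprocess(case)))  # flagged by phase 1 -> fraud
```

The point of the split is speed: phase 1 is cheap and filters or enriches cases before the more expensive phase 2 runs.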
One of the most widely used AI-based heuristics is Tabu Search (TS). It does quite well in areas where searches and comparisons are performed at high frequency. For more on this method, please see my answers to problem 1. In order to apply the TS algorithm of Figure 1.2, we need to map the representation of the problem into a structure that can easily be handled by the search process.
The main ingredients of Figure 1.2 can be set up as follows: f is the objective function measuring how close we are to detecting whether the case at hand is fraudulent or not; it will be either minimized or maximized, depending on the direction of our search. T, the tabu list, is initially empty, but as the search progresses, cases or patterns that have been searched and compared are added to it and become tabu. V*, at every iteration, holds solutions, or partial solutions, of the case that may help in forming the final solution. X is the set of feasible solutions. f* acts as the lower bound for f. Finally, N(S), for a possible solution S, is the set of plausible neighboring solutions obtained by some variation on a rule or a test case in the knowledge base.
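These ingredients can be sketched as a generic TS loop. The neighborhood here simply flips one boolean "clue" at a time and the objective is a toy distance to a target clue pattern; both are assumptions standing in for the rule/test-case variations described above.

```python
# Skeleton of the TS loop using the ingredients above (f, T, V*, N(S)).
# The toy objective and bit-flip neighborhood are assumptions.

def tabu_search(f, start, neighbors, iters=100, tabu_len=10):
    s, best, f_best = start, start, f(start)   # f* tracks the best f seen
    tabu = []                                  # T: recently visited solutions
    for _ in range(iters):
        V = [n for n in neighbors(s) if n not in tabu]  # V*: admissible moves
        if not V:
            break
        s = min(V, key=f)                      # best admissible neighbor
        tabu.append(s)
        if len(tabu) > tabu_len:
            tabu.pop(0)                        # forget the oldest tabu entry
        if f(s) < f_best:
            best, f_best = s, f(s)
    return best, f_best

# Toy setup: minimize the mismatch with a target clue pattern.
target = (1, 0, 1, 1)
f = lambda s: sum(a != b for a, b in zip(s, target))
flip = lambda s: [tuple(b ^ (i == j) for j, b in enumerate(s))
                  for i in range(len(s))]
print(tabu_search(f, (0, 0, 0, 0), flip))  # -> ((1, 0, 1, 1), 0)
```

Note that the tabu list lets the search walk away from a local optimum without immediately revisiting it, which is the mechanism behind the intensification and diversification properties discussed next.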
As we stated previously, TS will perform search intensification quite well. That is, it will explore more thoroughly portions of the search space that seem "promising". Therefore, if we encounter some promising leads during the search, they will be looked at more closely to see if they can lead to a partial or an overall solution.
The other main property of TS is search diversification, which is a mechanism to "force" the search into previously unexplored areas. For the fraud detection case, it is quite possible to overlook certain clues about the way the card is used, the type of merchandise purchased, the frequency with which purchases are made, and so on. Some of these overlooked clues may be of great help in partially or fully shaping the final picture of the case. Using search diversification, TS will explore these clues if they are available somewhere in the search space.
Certainly, TS is an interesting search approach that we can use for tackling the detection problem. To speed up the process of detection, we could adopt probabilistic TS. This will be faster because it deals with only a part of the overall search space, but the disadvantage is that good solutions can be missed.
One possible system would be a feedback network with a non-binary encoding, since the final decision of any detection would be made in one-of-K fashion with K > 2. To simulate the network, we can use the Potts spin model, first deriving the mean field equations corresponding to K-state Potts spins. One advantage of this method is that, by using the mean field approximation, continuous variables "feel their way" in a fuzzy manner toward the neighborhood of the desired solution. This is in contrast to the expert systems approach, where moves are performed within the discrete solution space.
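For reference, the mean field equations for K-state Potts spins commonly take the following form (the notation here is a standard convention, not taken from the text):

$$
v_i^a \;=\; \frac{e^{\,u_i^a/T}}{\sum_{b=1}^{K} e^{\,u_i^b/T}},
\qquad
u_i^a \;=\; -\,\frac{\partial E}{\partial v_i^a},
$$

where $v_i^a$ is the continuous mean field variable approximating the thermal average of spin $i$ being in state $a$ (so $\sum_a v_i^a = 1$), $E$ is the network's energy function, and $T$ is the temperature, which is gradually lowered during annealing until each $v_i$ settles near a one-of-K corner.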
On the surface, credit card fraud detection does not look like an optimization problem, but since it deals with a large number of test cases, rules, data, constraints, etc., it can be viewed as one. Essentially, what we try to optimize is the degree of detection of the case at hand: does it look more fraudulent than non-fraudulent, or is it the other way around? Answering this question is quite similar to optimizing a set of figures either up or down. Using this neural net/mean field approach, here is roughly how we go about doing the simulations:
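A minimal mean-field annealing loop, for a single K-state Potts spin representing the K possible verdicts, might look like the sketch below. The constant "evidence" vector standing in for the energy gradient, and all temperature-schedule parameters, are assumptions; in a real network the input to the softmax update would come from the energy function's gradient at each step.

```python
# Rough sketch of mean-field annealing for one K-state Potts spin
# (K decision classes). The fixed bias vector is a toy stand-in for
# -dE/dv; schedule parameters are illustrative assumptions.
import math

def mean_field_anneal(K=3, T=10.0, T_min=0.01, cooling=0.9):
    v = [1.0 / K] * K                # start at the symmetric point
    bias = [1.0] + [0.0] * (K - 1)   # toy "evidence" favoring class 0
    while T > T_min:
        u = [bias[a] for a in range(K)]              # stand-in for -dE/dv
        Z = sum(math.exp(ua / T) for ua in u)
        v = [math.exp(ua / T) / Z for ua in u]       # softmax update
        T *= cooling                                 # cool the temperature
    return v

v = mean_field_anneal()
print(max(range(3), key=lambda a: v[a]))  # the spin settles on class 0 -> 0
```

At high temperature the variables stay near the fuzzy symmetric point; as T drops they harden toward a one-of-K decision, which is the "feeling their way" behavior described above.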
Perhaps a better approach is to couple a case-based or rule-based expert system with simulated annealing (SA), resulting in a multi-phase processor. Since fast responses are quite essential when dealing with fraud detection problems, pre-processing can be performed by the first phase, perhaps producing new information or relations that can be used in the second phase to help come up with the final solution. This will surely not only speed up the process but also produce a more accurate result than using SA without the preprocessor.