Comparisons Between Various Optimization Methods

Expert Systems for Detection of Credit Card Fraud

Expert systems are traditional AI tools used for various diagnosis and recognition tasks. Certainly these systems are data-intensive and rely heavily on rules, patterns, and of course, data and lots of it.

A typical credit card fraud detection system will consist of:

Set of rules (assuming the system is a rule-based)
various histories and patterns of both legal and illegal transactions. These patterns have to be present in the knowledge base, otherwise detection may not occur. The more of these case patterns present, the more the likelihood of detecting similar cases. Also, it is always possible that any detected case of fraud may not be an identical case to any single pattern but perhaps partially overlapping one or more patterns already present in the knowledge base. This, of course, totally depends on the rules as well as the available data.

Therefore, the main point here is that an expert system is really not a "smart" system by a system with a knowledge base of large number of pattern cases that fall within its domain of expertise, as well as lots of data about various cases that it already come across or potentially may come across in the future.

Here is a rule-based example:
if (A and B) or (C and D) or (A and D) then occurrence of (B and C) raises red-flag with certainty 0.75.
A, B, C, and D are various situations.

There are two types of rules that can be supported by the systems:

The standard If-Then rules: when the condition is true, permit the assertion with some certainty.
The query rules: also has an If-Then structure, in addition, provided the rule's preconditions are satisfied, return to the user for more information. User's responses represent the assertion.

Probability in the assertion represent the degree of plausibility of permitting the assertion. An example of this would be if the credit card is used for a purchase with an amount much higher than the average of, let say, the last six months, and used with a higher frequency in a different geographical region than the holder's region, then probability of fraud can be a bit higher than normal. Here, normal is the use of the card in the same geographical region of the holder with proper verification of ID (unfortunately, this is almost never done) and an amount that falls within a certain average.

Certainly, the primary rules in the system should be those that detect any fraudulent cases on the use of credit card.

After setting up both the knowledge and rule bases, perhaps we could take a look at issues of parallelism. Since parallel computing plays some quite useful role here, we can parallelize our rule-based system as follows: suppose we have a reasonably large network of workstations or processors. We start by assigning one or more rules to each processor. So, with condition-testing performed in parallel, the time to find rules that can fire is greatly reduced. If the action part of a rule consists of subsections, it may be possible to parallelize their execution as well.

Neural Nets for Detection of Credit Card Fraud

A neural network (as shown below) would also require a large number of fraudulent and non-fraudulent pattern cases for training. This is particularly true for feedforward nets. before training begins, a proper representation of these cases need to be put together and the network weights are set up to reflect that representation.

Throughout training, the net's connectivity or connection weights change from one state to another reflecting the case (fraud, non-fraud) at hand.

Some disadvantages: the user certainly will need a huge network with perhaps no more than 2 hidden layers. Also, the most important thing to come up with is the representation of the cases. Mapping from symbolic to numeric will need to be done as well as the other way around.

Perhaps, if putting together a large knowledge base with an equally large set of rules is not a problem then expert systems would be more suitable for this domain. I'll go further and say that a hybrid or a multi-phase approach would be the most suitable for this kind of application. First, pre-processing is done for the incoming cases to infer any new properties or clues about which direction that we might be heading with case, that is, so far does it look like a case of fraud or not. Second, armed with this information, we go on to the next phase using systems such as neural nets (feedforward or feedback) to come up with the conclusive result. The feedback neural net paradigm will require a different training technique than the above feedforward net. It is probably simpler to stick with the feedforward approach and leave the feedback network to be used in the physical optimization approach discussed next.

Heuristic Methods for Detection of Credit Card Fraud

One of the most widely used AI-based heuristic is Tabu Search (TS). It does quite well in areas where search and comparisons are done at a high frequency. For more on this method, please see my answers to problem 1. In order to apply the TS algorithm of Figure 1.2, we need to map the representation of the problem into a structure that can easily be handled by the search process.

The main ingredients of Figure 1.2 can be set up as follows: f is the objective function of measuring how close we are in detecting whether the case at hand is fraudulent or not. This function will either be minimized or maximized, depending on the direction of our search. T, which is the tabu list, initially will be empty but as the search progresses, cases or patterns that have been searched and compared will be added to become tabu. V^*, at every iteration, will hold solutions, or partial solutions of the case that at the end may possibly help in forming the final solution. X is the set feasible solutions. f^* will act as the lower bound for f. and N(S): for a possible solution S would be the set of other plausible neighboring solutions that are obtained by some variation on a rule or a test case in the knowledge base.

As we stated previously, TS will perform search intensification quite well. That is, it will explore more thoroughly portions of the search space that seem "promising". Therefore, if we encounter some promising leads during the search, they will be looked at more closely to see if they can lead to a partial or an overall solution.

The other main property of TS is search diversification, which is a mechanism to "force" the search into previously unexplored areas. For the fraud detection case, it is quite possible to overlook certain clues about either the way the card is used or the type of merchandise purchased or the time frequencies the purchases are made, etc. Perhaps, some of these overlooked clues may be of great help in shaping up partially or fully the final picture about the case. Using search diversification, TS will explore these clues if they are available somewhere in the search space.

certainly, TS is an interesting search approach that we can use for tackling the detection problem. To speed the process of detection, we could adapt probabilistic TS. This will be faster because it only deal with a part of the overall search space. But the disadvantage is that good solutions can be missed.

Physical Optimization Methods for Detection of Credit Card Fraud

One possible system would be a feedback network with a non-binary encoding since the final decision of any detection would be made in one-of-K with K > 2 . To simulate the network, we can use the Potts spins model by first deriving the mean field equations corresponding to K-state Potts spins. One advantage of this method is that by using the mean field approximation, continuous variables "feel their way" in a fuzzy manner towards the neighborhood or the area of the desired solution. This is in contrast to expert systems approach, where moves are performed within the discrete solution space.

On the surface, credit card fraud detection does not look like an optimization problem, but dealing with large number of test cases, rules, data, constraints, etc., it can be viewed as an optimization problem. Essentially, what we try to optimize is the degree of detection of the case at hand. Does it look more of a fraudulent than non-fraudulent or is it the other way around? Answering this question is quite similar to optimizing a set of figures either up or down, etc. Using this neural net/mean field approach, here is roughly how we go about doing the simulations:

Map the problem onto a feedback network by a suitable encoding of the solution space and an appropriate choice of an "energy" function.
Utilize prior knowledge (if it exist) about phase transition properties from analyzing the linearized dynamics. MF equation also need to be set up with some care in cases when the penalty terms are nonpolynomial, as in the case of inequality constraints.
While annealing solve the corresponding mean field equations iteratively.
When the MF equation have converged, check the solution of their legality, whether or not they satisfy the basic rules, etc. If not, use method such as greedy heuristics for post-processing, or re-anneal the system with modified constraints coefficients.

As we all know mean field annealing is only a deterministic approximation to simulated annealing, by attempting to average over the statistics of the annealing process. This will result in a faster execution time at the expense of solution quality. Therefore, the alternative is to adapt simulated annealing (SA) itself to this problem. First, we need to come up with a suitable method of moves and comparisons on the search space, as well as with a good adaptive annealing schedule. Second, we have to put together a representation of our knowledge base that would be mappable into SA search space. Without having really a good mapping, SA won't be able to explore the search space in any efficient way, resulting on solutions with unacceptably high costs.

Perhaps, a better approach is to couple a case-based or a rule-based expert system with SA resulting in a multi-phase processor. Since fast responses are quite essential when dealing fraud detection problems, pre-processing can be performed by the first phase, perhaps producing new information or relations that can be used in the second phase to help come up with the final solution. This surely will not only speed up the process but also produce a more accurate result than using SA without the preprocessor.

Saleh Elmohamed