Factoring on the Web A Prototype of WebWork Pervasive Implementation of HPCC Supercomputing 95 December 3-8,1995 San Diego Convention Center Boston University, Cooperating Systems, NPAC Presentation by Geoffrey Fox Syracuse University 111 College Place Syracuse NY 13244-4100 Abstract of Supercomputing 95 Fafner Presentation This short presentation has some notes on factoring on the Web prepared by Jim Cowie from material from Lenstra and Leyland Also a summary of key features of Fafner as a World Wide Computing System Go to http://http.npac.syr.edu/factoring.html for a full description and pointers to other collaboraters and further instructions on how to get your software and get started on breaking the Bank of England We describe why RSA security status is equivalent to factoring large numbers into two large primes A very handwavy description is given of the strategy to find factors with the GNFS -- Generalized Number Field Sieve and its predecessors Key Features of FAFNER/WebWork Dec 4,1995 -- I See also discussion in webworksept95 and (computing part of) sc95tutorial presentations Currently RS130 Ongoing execution rate is a sustained 5 gigaflops per second RSA130 needs a few teraop-hours and should finish in next two months RSA130 is running on a range of machines from an IBM 386 laptop to a SP2 all over the world. Web approach "beta release" at moment and has run without serious complaint for two weeks including one software upgrade. Will release generally soon Key Features of FAFNER/WebWork Dec 4,1995 -- II Computer resource allocation performed with a combination of human and computer agents. RSA155 -- a key of size used by significant enterprises -- is expected to require about 2 teraop-months There is an interesting memory -- CPU tradeoff Algorithms are more efficient with large memory but Small memory machines cheaper and more common Settop boxes would have 4-8 megabytes of memory each We requires a 4 megabyte workspace as minimum Key Features of FAFNER/WebWork Dec 4,1995 -- III Fafner is part of an incremental strategy of understanding use of Web Technologies in Computing Fafner addresses embarassingly parallel computations Dataflow is next big class with "WebFlow" a bit like a Web version of AVS Eventually we need to tackle major "loosely synchronous/Data Parallel" class Here we start with "Global Arrays" from Pacific NorthWest Laboratory as these applied to many interesting chemistry problems involving large grain size matrix computations which should be efficient on the Web with high overheads at present This will lead to WebHPF perhaps built around HPVRML! RSA: Public Key Cryptosystem RSA Security Systems built around numbers RSA-N = Prime1 * Prime2 A Product of two large primes which are secret whereas "everybody" in principle is meant to be able to know public modulus RSAm which can be used by anybody to encrypt messages Only the recipient is meant to be able to decrypt message using secret prime factor Secrecy of the cryptosystem is that of the secret key Secrecy of the secret key is no more than the factorization of the public modulus How hard is Factoring the Public Modulus? Rivest, Shamir and Adleman (RSA!) posed RSA-129 in August 1977 Scientific American, predicting 40 quadrillion years as a solution time Actual time to crack RSA-129 was about eight months on the Internet (April 1994) A nice illustration of importance of algorithm improvements as well as VLSI based compute performance improvements! Still, many sites still use keys as small as 512 bits long (155 decimal digits) to protect enormous amounts of code and data Using the latest factoring technology it will only take a few tera-op months to factor a 512 binary bit number! General Number Field Sieve -- GNFS: The latest (mathematical) factoring technology Sieving over sets of "Q-values" (i.e., very large ranges of integers) Task specifications are small (the integer bounds of the Q-range) Task results are large (collections of "partial relations") These results must be accumulated, sorted, and run through a neat graph algorithm before they are fed into the final bitmatrix reduction step (done on Maspar in previous factoring efforts but any reasonable machine will handle) Forms an ideal, embarrassingly parallel application for Web-wide coordination: very large, naturally distributed database computation which can be done with PC's up. Note we build on and still support factoring by email preferred by users in secure/uncommunicative environments! Sieving (RSA-129 and MPQS, RSA-130 and GNFS) RSA129 used the double partial variation of the Multiple Polynomial Quadratic Sieve (MPQS), the predecessor to the Number Field Sieve (NFS), in turn the predecessor to the General Number Field Sieve (GNFS) General factoring challenge: find x,y such that x^2 == y^2 mod RSA-N. This would give good odds that gcd(x+y, RSA-N) is not 1 or RSA-N. Therefore, we'd have a factor of RSA-N. BASIC OBSERVATIONS AND TERMS Numbers, "a", for which we can find an integer "b" where a == b^2 (modulo rsa129) are called quadratic residues modulo RSA-N . Roughly half the integers are quadratic residues Values of "a" which factorize into only small primes out of a short list (the "factor base") plus at most two more primes < 2^30 are "special" If there are 2 more such primes in the factorization of "a", then "a" if a "double partial", 1 such prime, a "partial", and none, a "full" relation. SIEVING THE Q-INTERVAL Quadratic residues given by a certain polynomial and divisible by a particular prime are regularly spaced. (Don't ask why unless answerer is Math Whiz). In sieving you add the log of the prime every such regular distance through initially zeroed memory, for each prime in the factor base. When done, we pick out places where the accumulation exceeds a threshold; that's where you find quadratic residues divisible by lots of small primes. These residues are therefore more likely to be double-partial, partial, or full relations! Extracting The Relations For places that satisfy threshold constraint, the client workers emit the prime factorizations for the "a"s and print out the matching "b"s. If there's a set of "a"'s whose product's factorization consists of primes raised to EVEN powers, then the product is a square, and so is the product of all the matching b^2 terms (trivially). This would give the happy result that x^2==y^2 (modulo RSA-N), and we'd have our factorization! FIRST Backend Processing STAGE GRAPH REDUCTION Finding such a set of "a"'s with prime factors raised to an even power, is pretty non-trivial, and is not performed by the sieving clients or servers. Initially wee derive a network in which nodes are primes and edges are drawn between the two large primes in a double-partial relation, or between 1 and the single prime of a partial relation. Then we run around looking for cycles in the network, which represent large primes that appear as squares when the two corresponding residues are multiplied together. Multiply all relations together, divide by all squared large primes, and you turn lots of partials and double partials into full relations. SECOND Backend Processing STAGE GRAPH REDUCTION Take the processed partials/double-partials and the true "fulls" and now we want to find a new "a" which on combining current a's is a proper square number! Perform a Gaussian elimination looking for a linear dependency module two of a bitmatrix of F columns and R rows, where F is the size of the small-primes factor base and R is the number of quadratic residues. This is a big but not tera-op (for RSA155) computation. But at the end, if a dependency falls out, we know the factors of RSA-N. Notes on Current -- Dec 2 -- Status We are about half way through he RSA130 project with some 40 million relations -- mainly at this stage from email initial implentation Total computing power used is a few teraop hours -- RSA155 will use teraop-months! Computational resources are drawn from USA, Canada, UK, The Netherlands, Germany, Australia and Taiwan and so this is the world wide metacomputer! Currently using spare cycles on 100's of workstations -- main platform Should be complete in a month or so