-
Notifications
You must be signed in to change notification settings - Fork 8
Imported from svn://scm.gforge.inria.fr/svnroot/ecm/
License
GPL-3.0, LGPL-3.0 licenses found
Licenses found
GPL-3.0
COPYING
LGPL-3.0
COPYING.LIB
sethtroisi/gmp-ecm
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
This is the README file for GMP-ECM. (See INSTALL-ecm for installing GMP-ECM and the ecm library, and README.lib for using the ecm library.) Table of contents of this file: 1. Basic usage. 2. How to use P-1, P+1, and ECM efficiently? 3. Extra factors and Brent-Suyama's extension. 4. Memory usage. 5. Expression syntax reference for GMP-ECM's syntax parser. 6. Options -param, -sigma, -x0, -y0, -A, -torsion. 7. Options -save, -resume and -chkpnt. 8. How to get the best of GMP-ECM? 9. GMP-ECM and GPU. 11. Record factors. 11. Known problems. ############################################################################## 1. Basic usage GMP-ECM reads the numbers to be factored from stdin (one number on each line) and requires a numerical parameter, the stage 1 bound B1. A reasonable stage 2 bound B2 for the given B1 is chosen by default, but can be overridden by a second numerical parameter. By default, GMP-ECM uses the ECM factoring algorithm. Example: To run one curve of ECM with B1=1000000 on each number in the file "composites", run ecm 1000000 < composites To use a B2 value of ~5*10^8 instead of the default value of ~10^9, run ecm 1000000 5e8 < composites Scientific notation is accepted for B1 and B2 values. The actual B2 value used may be larger than the specified value to let parameters satisfy some conditions imposed by the stage 2 algorithm. To run one curve with B1=11e7 on M1061, simply do: echo "2^1061-1" | ecm 11e7 To run more than one ECM curve on each input number, use the -c parameter. Example: to run 100 curves with B1=1000000 and default B2 on each number in "composites", run ecm -c 100 1000000 < composites To use the P-1 or P+1 factoring methods, use the -pm1 or -pp1 parameter, respectively. Example: to use the P-1 method with B1=10^9 on all numbers in the file "composites", run ecm -pm1 1e9 < composites Note that, unlike for ECM, using the same B1,B2 bounds on one number is quite useless for P-1, and of limited use for P+1. See "2. How to use P-1, P+1, and ECM efficiently?" ############################################################################## 2. How to use P-1, P+1, and ECM efficiently? The P-1 method works well when the input number has a prime factor P such that P-1 is "smooth", i.e., has all its prime factor less or equal the step 1 bound B1, except one which may be less or equal the second step bound B2. For P=67872792749091946529, we have P-1 = 2^5 * 11 * 17 * 19 * 43 * 149 * 8467 * 11004397, so this factor will be found as long as B1 >= 8467 and B2 >= 11004397: $ echo 67872792749091946529 | ./ecm -pm1 -x0 2809890345 8467 11004397 GMP-ECM ... [powered by GMP ...] [P-1] Input number is 67872792749091946529 (20 digits) Using B1=8467, B2=6710-19370830, x0=2809890345 Step 1 took 3ms Step 2 took 14ms ********** Factor found in step 2: 67872792749091946529 Found input number N There is no need to run P-1 several times with the same B1 and B2 as there is for ECM, since a factor found with one seed will (almost always) be found by another one. Note: if a factor of P-1 if known, you can specify it with -go. For example it is known that the Fermat number 2^(2^n)+1 has factors of the form k*2^(n+2)+1, thus 2^(n+2) is a known factor of P-1. The -go expression has a special placeholder 'N' which stands for the input number, for example: $ echo "2^(2^12)+1" | ecm -go "N-1" -pm1 1e6 The P+1 method works well when the input number has a prime factor P such that P+1 is "smooth". For P=4190453151940208656715582382315221647, we have P+1 = 2^4 * 283 * 2423 * 21881 * 39839 * 1414261 * 2337233 * 132554351, so this factor will be found as long as B1 >= 2337233 and B2 >= 132554351: $ echo 4190453151940208656715582382315221647 | ./ecm -pp1 -x0 7 2337233 132554351 GMP-ECM ... [powered by GMP ...] [P+1] Input number is 4190453151940208656715582382315221647 (37 digits) Using B1=2337233, B2=2324738-343958122, x0=7 Step 1 took 750ms Step 2 took 120ms ********** Factor found in step 2: 4190453151940208656715582382315221647 Found input number N However not all seeds will succeed: only half of the seeds 'x0' work for P+1 (namely those where the Jacobi symbol of x0^2-4 and P is -1.) Unfortunately, since P is usually not known in advance, there is no way to ensure that this holds. However, if the seed is chosen randomly, there is a probability of about 1/2 that it will give a Jacobi symbol of -1 (i.e., the factor P will be found if P+1 is smooth enough). A rule of thumb is to run 3 times P+1 with different random seeds. The seeds 2/7 and 6/5 have a slightly higher chance of success than average as they lead to a group order divisible by 6 or 4, respectively. When factoring Fibonacci numbers F_n or Lucas numbers L_n, using the seed 23/11 ensures that the group order is divisible by 2n, making other P+1 (and probably P-1) work unnecessary. As of version 6.2, a new stage 2 for the P-1 and P+1 algorithms is implemented. It uses less memory and is faster than the previous code, thus allowing larger B2 values. If GMP-ECM is configured with the "--enable-openmp" flag and is compiled with a compiler that implements OpenMP, it uses multi-threading for computation of polynomial roots and NTT multiplication. When not using the NTT, it benefits from multi-threading only in the computation of roots phase. The number of threads to use can be controlled with the OMP_NUM_THREADS environment variable. Unlike the previous generic stage 2, the new stage 2 cannot use the Brent-Suyama extension (-power and -dickson parameters). Specifying these options on the command line forces use of the generic stage 2. Note: the notation of the parameters follows that in the paper, the number of multi-point evaluations (similar to "blocks") is given by s_2. You can specify a lower limit for s_2 by the -k command line parameter. The ECM method is a probabilistic method, and can be viewed in some sense as a generalization of the P-1 and P+1 method, where we only require that P+t+1 is smooth, where t depends on the curve we use and satisfies |t| <= 2*P^(1/2) (Hasse's theorem). The optimal B1 and B2 bounds have to be chosen according to the (usually unknown) size of P. The following table gives a set of nearly optimal B1 and B2 pairs, with the corresponding expected number of curves to find a factor of given size (column "-power 1" does not take into account the extra factors found by Brent-Suyama's exten- sion, whereas column "default poly" takes them into account, with the poly- nomial used by default: D(n) means Dickson's polynomial of degree n): digits D optimal B1 default B2 expected curves N(B1,B2,D) -power 1 default poly 20 11e3 1.9e6 74 74 [x^1] 25 5e4 1.3e7 221 214 [x^2] 30 25e4 1.3e8 453 430 [D(3)] 35 1e6 1.0e9 984 904 [D(6)] 40 3e6 5.7e9 2541 2350 [D(6)] 45 11e6 3.5e10 4949 4480 [D(12)] 50 43e6 2.4e11 8266 7553 [D(12)] 55 11e7 7.8e11 20158 17769 [D(30)] 60 26e7 3.2e12 47173 42017 [D(30)] 65 85e7 1.6e13 77666 69408 [D(30)] Table 1: optimal B1 and expected number of curves to find a factor of D digits with GMP-ECM. After performing the expected number of curves from Table 1, the probability that a factor of D digits was missed is exp(-1), i.e., about 37%. After twice the expected number of curves, it is exp(-2), i.e., about 14%, and so on. Example: after performing 8266 curves with B1=43e6 and B2=2.4e11 (or 7553 curves with -dickson 12), the probability to miss a 50-digit factor is about 37%. From version 6.0 on, GMP-ECM prints the expected number of curves and expected time to find factors of different sizes in verbose mode (option -v). This makes it easy to further optimize parameters for a certain factor size if desired: simply try to minimize the expected time. (lengthy NOTE: The order of an elliptic curve with Montgomery parameterization is known to be divisible by 4. Depending on the parametrization that is used (see Section 6) this number is increased in different ways. Let us call T the known torsion (depending on the parametrization). One can assume that the probability that the order is B1,B2 smooth should be about as great as for a random integer 1/Tth in value. However, Montgomery observed that the order behaves even nicer than that: heuristically, it seems that the order is as more likely to be smooth than a random integer about 1/Tth. In "Finding ECM-Friendly Curves through a Study of Galois Properties", Barbulescu, Bos, Bouvier, Kleinjung and Montgomery showed how to computed the average valuation that should be used instead of the average valuation of random number. This is the values we use in GMP-ECM and the computed probabilities match those observed in experiments very well. This however means that the so computed values for the expected number of curves for given B1,B2 values and factor sizes may not match those published in the previous literature. The factor GMP-ECM uses is defined as ECM_EXTRA_SMOOTHNESS in rho.c, EXTRA_SMOOTHNESS_SQUARE and EXTRA_SMOOTHNESS 32BITS_D in ecm.c depending on the parametrization. You can change it to ECM_EXTRA_SMOOTHNESS 3.0 EXTRA_SMOOTHNESS_SQUARE 0.333333333 EXTRA_SMOOTHNESS 32BITS_D 0.333333333 if you want to reproduce the more pessimistic values found in the literature.) In summary, we advise the following method: 0 - choose a target factor size of D digits 1 - choose optimal B1 and B2 values to find factors of D digits (cf Table 1) 2 - run once P-1 with 10*B1, and the default B2 chosen by GMP-ECM 3 - (optional) run 3 times P+1 with 5*B1, and the default B2 4 - run N(B1,B2,D) times ECM with those B1 and B2, where N(B1,B2,D) is the expected number of ECM curves with step 1 bound B1, step 2 bound B2, to find a factor of D digits (cf above table). 5 - if no factor is found, either increase D by 5 digits and go to 0, or use another factorization method (MPQS, NFS) Note: if a factor is found in steps 2, 3 or 4, simply continue the current step with the remaining cofactor (if composite). There is no need to start again from 0, since the cofactor was already tested, too. ############################################################################## 3. Extra factors and Brent-Suyama's extension. GMP-ECM may sometimes find some "extra" factors, such that one factor of P-1, P+1 or P+t+1 exceeds the step 2 bound B2, thanks to Brent-Suyama's extension. Let's explain how it works for P-1, since it's simpler. The classical step 2 (without Brent-Suyama's extension) considers s^(j*d) mod N and s^i mod N, where N is the number to factor, and s is the residue computed in stage 1. Here, d is fixed, and the integers i and j vary in two sets so that j*d-i covers all primes in [B1, B2]. Now consider a polynomial f(x), and compute s^f(j*d) and s^f(i) instead of s^(j*d) and s^i [thus the classical step 2 corresponds to f(x)=x^1]. Then P will be found whenever all but one of the factors of P-1 are <= B1, and one factor divides some f(j*d) - f(i): $ echo 1207946164033269799036708918081 | ./ecm -pm1 -k 3 -power 12 286493 25e6 GMP-ECM ... [powered by GMP ...] [P-1] Input number is 1207946164033269799036708918081 (31 digits) Using B1=286493, B2=30806172, polynomial x^12, x0=1548711558 Step 1 took 320ms Step 2 took 564ms ********** Factor found in step 2: 1207946164033269799036708918081 Found input number N Here the largest factor of P-1 is 83957197, which is 3.35 times larger than B2. Warning: these "extra" factorizations may not be reproducible from one version of GMP-ECM to another one, since they depend on some internal parameters that may change. For P-1 with the generic stage 2, the degree of the Brent-Suyama polynomial should be even. Since i^2k - (j*d)^2k = (i^k - (j*d)^k)(i^k + (j*d)^k), this allows testing two values symmetric around a multiple of d simultaneously, halving the amount of computation required in stage 2. P+1 with the generic stage 2 and ECM do this inherently. The new fast stage 2 for P-1 and P+1 does not support the Brent-Suyama extension. By default, the fast stage 2 is used for P-1 and P+1; giving a -power or -dickson parameter on the command line forces use of the previous, generic stage 2. It is recommended to use the new stage 2 (from version 6.2) for P-1 and P+1, which is the default: it is so much faster that it largely compensates the few extra factors that are not found because Brent-Suyama's extension is not available. The default polynomial used for ECM with a given B2 should be near optimal, i.e., give only a marginal overhead in step 2, while enabling extra factors. ############################################################################## 4. Memory usage. Step 1 does not require much memory: O(n) for an input number of n digits. Step 2 may be quite memory expensive, especially for large B2, since its efficient algorithms use some large tables. To reduce the memory usage of step 2, you may increase the 'k' parameter, which controls the number of "blocks" performed in step 2. Multiplying the default value of k by 4 will decrease the memory usage by a factor of 2. For example with B2=1e10 and a 155-digit number, step 2 requires about 55MB with the default k=4, but only 27MB with k=16. Increasing k does, however, slightly increase the time required for step 2 (see section "How to get the best of GMP-ECM?"). An estimation of the memory usage is given at the start of stage 2: $ ecm -v -k 4 10 1e10 < c155 ... Estimated memory usage: 55M ... Step 2 took 18649ms $ ecm -v -k 16 10 1e10 < c155 ... Estimated memory usage: 27M ... Step 2 took 26972ms Another way is to use the -treefile parameter, which causes some of the tables to be stored on disk instead of in memory. Using the option "-treefile /var/tmp/ecmtree" will create the files "/var/tmp/ecmtree.1", "/var/tmp/ecmtree.2" etc. The files are deleted upon completion of stage 2: $ ecm -v -treefile /tmp/ecmtree -k 4 10 1e10 < c155 ... Estimated memory usage: 36M ... Step 2 took 18648ms Due to time consuming disk I/O, this will cause stage 2 to take somewhat longer. How much memory is saved depends on stage 2 parameters, but a typical value is that memory use is reduced by a factor of about 1.5. Increasing the number of blocks with -k also reduces the amount of data that needs to get written to disk, thus reducing disk I/O time. Combining these parameters is a very effective way of reducing memory use. Up from version 6.1, there is still another (better) possibility, with the -maxmem option. The command-line -maxmem nnn option tells GMP-ECM to use at most nnn MB in stage 2. It is better than -k because it takes into account the size of the number to be factored, and automatically adjusts the number of blocks to use: $ ./ecm -v -maxmem 40 10 1e10 < c155 ... dF=8192, k=15, d=79170, d2=11, i0=-10 ... Estimated memory usage: 27M ... Step 2 took 25456ms ############################################################################## 5. Expression syntax reference for GMP-ECM's syntax parser. GMP-ECM can handle several kinds of expressions as input numbers. Here is the syntax that is handled: 1. Raw decimal numbers like 123456789 2. Comments can be placed in the file. The C++ "one line comment" // is used. Everything after the // on a line (including the //) is ignored. Warning: no input number should appear on such a comment line. 3. Line continuation. If a line ends with a backslash character '\', it is considered it continues on the next line (ignoring the '\'). 4. Any white space (space, tab, end of line) is ignored. However, the "end of line" is used to end the expression (unless of course there is a '\' character before the end of line). For example, processing this: 1 2 3 4 5 6 7 8 9 would be the same as processing 123456789 5. "common" arithmetic expressions (* / + - %), the period '.' might be used in place of * for multiply, and - can be unary minus (e.g., -55555551). Example: echo "3*5+2" | ./ecm 100 6. Grouping ( [ { for start of group (which symbol is used does not matter) and ) ] } to end a group (again all 3 symbols mean the SAME thing). 7. Exponentiation with the ^ character (i.e., 2^24 is the same as 16777216). Example: echo "2^24+1" | ./ecm 100 8. Simple factorial using the exclamation point ! character. Example is 53! == 1*2*3*4...*52*53. Example: echo '53!+1' | ./ecm 1e2 9. Multi-factorial as in: n!m with an example: 15!3 == 15.12.9.6.3. 10. Simple Primorial using the # character with example of 11# == 2*3*5*7*11 11. Reduced Primorial n#m with example of 17#5 == 5.7.11.13.17 12. Functions are possible with the expression parser. Currently, the only available function is Phi(x,n), however other functions should be easy to add in the future. Note: Expressions are maintained as much as possible (even if the expression becomes longer than the decimal expansion). Expressions are output as cofactors (if the input was an expression), and are stored into save/resume files (again if and only if the original input was an expression, and not an expanded decimal number). When a factor is found, the cofactor expression is of the form (original_expression)/factor_found (see however option -cofdec): $ echo "3*2^210+1" | ./ecm -sigma 4007218240 2500 GMP-ECM ... [powered by GMP ...] [ECM] Input number is 3*2^210+1 (64 digits) Using B1=2500, B2=186156, polynomial x^1, sigma=4007218240 Step 1 took 16ms Step 2 took 16ms ********** Factor found in step 2: 1358437 Found probable prime factor of 7 digits: 1358437 Probable prime cofactor (3*2^210+1)/1358437 has 58 digits ############################################################################## 6. Options -param, -sigma, -x0, -y0, -A, -torsion. This section explains the behavior of the options 'param', 'sigma', 'x0', 'y0', 'A' and 'torsion' when GMP-ECM is used to run ECM. The parameters A, x0 and y0 can have integer or rational values, for example -A 22/7 -x0 1/3 -y0 2/7. a) with -param <= 4 (default value is 0): curves in Montgomery form ------------------------------------------------------------------- In stage 1, the elliptic curves used are of the following Montgomery form: b*y^2 = x^3 + A*x^2 + x. The starting point in ( x0 : : 1). In this case we do not need the value of y0. Either sigma is given, in which case the values of A and x0 are deduced from sigma and the params option (see below). Or A and x0 are given (in which case both must be given). The -param option is used to choose the parametrization from which the curves are taken. It can take 4 values (s is the 'sigma' parameter): 0: use Suyama parametrization. It was the parametrization used previous versions of GMP-ECM. u = s^2-5, v = 4*s, A = (v-u)^3*(3*u+v)/(4*u^3*v)-2, x0=u^3/v^3. 1: (only for 64-bit machine) A = 4*s^2-2, x0 = 2. 2: use an elliptic parametrization to compute d(s) such that the curve has a 6-torsion point. A = 4*d(s)-2, x0 = 2. 3: it is mostly used for the gpu computation. A = 4*s-2, x0 = 2. Generally, the parameter s is chosen randomly but the -sigma option can be used to choose its value. In the case where only "-sigma s" is used, "-param 0" is assumed in order to be compatible with older version of GMP-ECM. The above is also true when resuming from a file (see Section 7). In the case where only "-param p" is used, the parameter is chosen at random by the ecm library. In the case where neither "-sigma" nor "-param" is used, the choice of the parametrization is left to ecm library and the parameter is chosen at random. In the case where the two options are used, "-param p -sigma s" can be shorten in "-sigma p:s". The curve can also be chosen with the -A option which directly specifies the values of the coefficient a of the curve. In this case the -x0 option is mandatory. The -x0 option can be used to override the x0 value in the case where param=0 and is mandatory in the case the curve is given with the -A option. In the case where param=1, param=2 or param=3, x0 must be equal to 2, so this should not be used. b) with param = 5: curves in Weierstrass form --------------------------------------------- In this case we use curves in Weierstrass form: E: y^2 = x^3 + A*x + b The user must provide all of A, x0 and y0. GMP-ECM will consider that (x0, y0) is a point on E, where b = y0^2-x0^3-A*x0 mod N. Step 2 is performed as usual, since it already uses the Weierstrass form. The options of Section 7 apply. Applications include using curves with some exotic prescribed torsion subgroup (see d below), or CM (Complex Multiplication) curves. c) with param = 6: curves in Hessian form ----------------------------------------- This is to use curves in Hessian form: X^3+Y^3+1=3*A*X*Y, A^3 <> 1 mod N. These curves have torsion group Z3xZ3 over Q(sqrt(-3)) and are useful when the prime factor p we are looking for is known to satisfy p = 1 mod 3. It should be used with -A <D> -x0 <x0> -y0 <y0> where (x0, y0) is a base point on the curve. Since p = 1 mod 3 is quite frequent, a try at this version is not unthinkable in general. d) -torsion: ------------ This option enables the user to ask for a given curve with specific torsion group (over the rationals). For instance -torsion Z5 -sigma 2 will generate a curve E over the rationals with torsion group Z/5Z and a point of infinite order, both generated from parameter sigma = 2. Note that in the present case, -param is overriden by whatever "good" form the curves are built with that prescribed torsion. Available torsion groups (over the rationals) are: * Z5: for sigma != 0, -1/2, -1/3, -1/4, cf. [1] * Z7: [1] * Z9: [1] * Z10: [1] * Z2xZ8: [1] * Z3xZ3 * Z4xZ4 References: [1] Atkin/Morain, Math. Comp., 1993. ############################################################################## 7. Options -save, -resume and -chkpnt. These -save and -resume options are useful to save the current state of the computation after step 1, or to exchange data with other software. It allows to perform step 1 with GMP-ECM, and step 2 with another software (or vice-versa). Note: the residue from the end of stage 1 gets written to the file only after stage 2, if stage 2 is performed in the same program run. This way, if a factor is found, the save file entry will contain the new cofactor (if composite) or will be omitted (if cofactor is a probable prime). For periodic saving during stage 1 for crash recovery, use -chkpnt, described below. Here is an example how to reuse some P-1 computation: $ cat c71 13155161912808540373988986448257115022677318870175067553764004308210487 $ ./ecm -save toto -pm1 -mpzmod -x0 2 5000000 < c71 GMP-ECM ... [powered by GMP ...] [P-1] Input number is 13155161912808540373988986448257115022677318870175067553764004308210487 (71 digits) Using B1=5000000, B2=352526802, polynomial x^24, x0=2 Step 1 took 3116ms Step 2 took 2316ms The file "toto" now contains some information about the method used, the step 1 bound, the number to factor, the value X at the end of step 1 (in hexa- decimal), and a checksum to verify that no data was corrupted: $ cat toto METHOD=P-1; B1=5000000; N=13155161912808540373988986448257115022677318870175067553764004308210487; X=0x12530157ae22ae14d54d6a5bc404ae9458e54032c1bb2ab269837d1519f; CHECKSUM=2287710189; PROGRAM=GMP-ECM 6.2; X0=0x2; [email protected]; TIME=Sat Apr 12 13:41:01 2008; Then one can resume the computation with larger B1 and/or B2 as follows: $ ./ecm -resume toto 1e7 GMP-ECM ... [powered by GMP ...] [ECM] Resuming P-1 residue saved by [email protected] with GMP-ECM 6.2 on Sat Apr 12 13:41:01 2008 Input number is 13155161912808540373988986448257115022677318870175067553764004308210487 (71 digits) Using B1=5000000-10000000, B2=9946848-1326917772, Step 1 took 3076ms Step 2 took 4304ms ********** Factor found in step 2: 1448595612076564044790098185437 Found probable prime factor of 31 digits: 1448595612076564044790098185437 Probable prime cofactor 9081321110693270343633073697474256143651 has 40 digits The second run only considered the primes in [5e6-10e6] in step 1, which saved half the time of step 1. The format used is the following: - each line corresponds to a composite (expression ARE saved in the save file) - a line contains assignments <id>=<value> separated by semi-colons ';' - possible values for <id> are - METHOD (value = ECM or P-1 or P+1) - PARAM (value = ECM parametrization) [ECM only] (see Section 6) - SIGMA (value = ECM sigma parameter) [ECM only] (see Section 6) - B1 (first step bound) - N (composite number to factor) - X (value at the end of step 1) - A (A-parameter of the elliptic curve) [ECM only] - CHECKSUM (internal value to check correctness of the format) - PROGRAM (program used to perform step 1, useful for factor credits) - X0 (initial point for ECM, or initial residue for P-1/P+1) [optional] - WHO (who performed step 1) - TIME (date and time of first step) PARAM, SIGMA and X0 would be optional, and would be mainly be used in case of a factor is found, to be able to reproduce the factorization. For ECM, one of the SIGMA or A values must be present, so that the computation can be continued on the correct curve. The B1 and X values satisfy the condition that X is a lcm(1,2,...,B1)-th power in the (multiplicatively written) group. If consecutive lines in a save file being resumed contain the same number to be factored, say when many ECM curves on one number have been saved, factors discovered by GMP-ECM are carried over from one attempt to the next so that factors will be reported only once. If the cofactor is a probable prime, or if the -one option was given and a factor was found, the remaining consecutive lines for that number will be skipped. Note: it is allowed to have both -save f1 and -resume f2 for the same run, however the files f1 and f2 should be different. Remark: you should not perform in parallel several -resume runs on the same input with the same B1/B2 values, since those runs will do the same computations. Options -save/-resume are useful in the following cases: (a) somebody did a previous step 1 computation with another software which is faster than GMP-ECM, and wants to perform Step 2 with GMP-ECM which is faster for that task. (b) somebody did a previous step 1 for P-1 or P+1 up to a given bound B1, and you want to extend that computation with B1' > B1, without restarting from scratch. Note: this does not apply to ECM, where the smoothness property depends on the (random) curve chosen, not only on the input number. (c) you did a huge step 1 P-1 or P+1 computation on a given machine, and you want to perform a huge step 2 in parallel on several machines. For example machine 1 tests the range B2_1-B2_2, machine 2 tests B2_2-B2_3, ... This also decreases the memory usage for each machine, which is function of the range width B2max-B2min. For the same reason as (b), this does not apply to ECM. The -chkpnt option causes GMP-ECM to write the current residue periodically during the stage 1 computation. This is useful as a safeguard in case the GMP-ECM process is terminated, or the computer loses power, etc. The checkpoint is written every ten minutes, when a signal (SIGINT, SIGTERM) is received, and at the end of stage 1. The format of the checkpoint file is very similar to that of regular save files, and checkpoints can be resumed with the -resume option. For example: $ ecm -chkpnt pm1chkpoint -pm1 1e10 1 < largenumber.txt [Computer crashes during computation] $ ecm -resume pm1chkpoint 1e10 1 Note: if an existing file is specified as the checkpoint file, it will be silently overwritten! Note 2: When resuming a checkpoint file, additional small primes may be processed in stage 1 when the checkpoint file is resumed, so the end-of-stage 1 residues of an uninterrupted run and a checkpointed run may not match. The extra primes do not reduce the probability of finding factors, however. ############################################################################## 8. How to get the best of GMP-ECM? [this part has to be rewritten, beginning of new part] After configuring GMP-ECM, and "make", do: $ ./bench_mulredc # or bench_mulredc.exe under Windows This will output at the end two lines as follows: /* 0:mulredc 1:mul+redc_1 2:mul+redc_2 3:mul+redc_n */ #define TUNE_MULREDC_TABLE {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0} /* 0:mulredc 1:sqr+redc_1 2:sqr+redc_2 3:sqr+redc_n */ #define TUNE_SQRREDC_TABLE {0,0,0,0,0,0,0,0,1,1,3,1,1,3,1,1,1,1,2,1,2} Then put those lines in the ecm-params.h file (which is a symbolic link), or replace the corresponding values in that file if already present, and recompile GMP-ECM with "make". Once you have optimized the base-case modular arithmetic, do: $ ./tune and replace the corresponding thresholds (#define ...) in ecm-params.h. [end of new part] Choice of modular multiplication. The ecm program may choose between 4 kinds of modular arithmetic: (1) Montgomery's REDC algorithm at the word level (option -modmuln). It is quite fast for small numbers, but has quadratic asymptotic complexity. (2) classical GMP arithmetic (option -mpzmod). Has some overhead with respect to (1) for small sizes, but wins over (1) for larger sizes since it has quasi-linear asymptotic complexity. (3) Montgomery's REDC algorithm at high level (option -redc). This essentially replaces each division by two multiplications. Slower than (1) and (2) for small inputs, but better for large or very large inputs. (4) base-2 arithmetic for numbers dividing 2^n+1 or 2^n-1. Each division has only linear time, but the multiplication are more expensive since they are done on larger numbers. The ecm program automatically selects what it thinks is the best arithmetic for the given input number. If that choice is not optimal, you may force the use of a certain arithmetic by trying options -modmulm, -mpzmod, -redc. (The best choice should depend on B1 and B2 only very little, so long as B1 is not too small, say >= 1000.) Number of step 2 blocks. The step 2 range [B1,B2] is divided into k "big blocks". The default value of k is chosen to be near to optimal. However, it may be that for a given (B1,B2) pair, another value of k may be better. Try for this to give the option -k <value> to ecm, where <value> is 1, 2, 3, ... This will force ecm to divide step 2 in at least <value> blocks. Changing the value of the number of blocks will not modify the chance of finding a factor (except for extra factors, but some will be lost, and some will be won, so the balance should be nearly even). However it will change the time spent in Step 2 and modify the memory used by Step 2 (see the section "Memory usage"). Optimal thresholds. The thresholds for the algorithms used in ecm are defined in ecm-params.h. Several ecm-params.h.* files are included in the distribution and the configure script will select one matching your machine if it exists. If there is no ecm-params.h.* for your machine then you can either compile with default values (not recommended) or you can generate ecm-params.h first with "make ecm-params; make". Stage 2 now uses Number-Theoretic Transforms (NTT) for polynomial arithmetic by default for numbers of at most 30 machine words (NTT_SIZE_THRESHOLD in ecm-ecm.h). The NTT code forces dF to be a power of 2; it can be disabled by passing the command-line option -no-ntt and unconditionally enabled by -ntt. Performance of NTT is dependent on: - Architecture. NTT seems to give the greatest improvement on Athlons, and the least improvement on Pentiums without SSE2. - Thresholds. It is vital to have ecm-params.h properly tuned for your machine. - C compiler. The SSE2 assembly code for 32 bit and the assembly code for 64 bit only work for x86 using gcc or Intel cc, so it is compiler dependent. Note on factoring Fermat numbers: GMP-ECM features Schönhage-Strassen multiplication for polynomials in stage 2 when factoring Fermat numbers (not in the new, fast stage 2 for P+1 and P-1. This is to be implemented.) This greatly reduces the number of modular multiplications required, thus improving speed. It does, however, restrict the length of the polynomials to powers of two, so that for a given number of blocks (-k parameter), the B2 value can only increase by factors of approximately 4. For the number of blocks, choices of 2, 3 or 4 usually give best performance. However, if the polynomial degree becomes too large, relatively expensive Karatsuba or Toom-Coom methods are required to split the polynomial before Schönhage-Strassen's method can handle them. That can make a larger number of blocks worthwhile. When factoring the m-th Fermat number F_m = 2^(2^m)+1, degrees up to dF=2^(m+1) can be handled directly. If your B2 choice requires a degree much larger than this (dF is printed with the -v parameter), try increasing the number of blocks with -k and see if performance improves. The Brent-Suyama extension should not be used when factoring Fermat numbers, it is more efficient to simply increase B2. Therefore, -power 1 for P+1 and ECM, and -power 2 for P-1 are the default for Fermat numbers. (Larger degrees for Brent-Suyama may possibly become worthwhile for P-1 runs on smaller Fermat numbers and extremely large B2, when Karatsuba and Toom-Cook are used extensively.) Factoring Fermat numbers uses a lot of memory, depending on the size of the Fermat number and on dF. For dF=65536 and F_12, the memory used is about 1700MB. If your system does not have enough memory, you will have to use a larger number of blocks to reach the desired B2 value with a smaller poly degree dF, which sacrifices some performance. Additionally, you may use the -treefile option (see 4. Memory usage) k=1 k=2 k=3 k=4 dF=256 582132 1222002 1864182 2504052 dF=512 2443992 5008092 7572192 10131672 dF=1024 10016172 20263332 30519732 40766892 dF=2048 42634420 85689250 128744080 171798910 dF=4096 173259252 347500242 521780502 696021492 dF=8192 711738310 1425139180 2138540050 2851940920 dF=16384 2850278350 5703881830 8557643650 11411247130 dF=32768 11702792020 23412731170 35122670320 46832609470 dF=65536 48071333326 96165459406 144259585486 192353711566 dF=131072 194020810630 388069884940 582118959250 776168033560 Table 2: Stage 2 interval length B2-B2min, for dF a power of 2 and small values of k. For example, if you'd like to run stage 2 on F_12 with B2 ~= 40G, try parameters "-k 1 <B1> 48e9", "-k 3 <B1> 35e9" or "-k 4 <B1> 46e9". ############################################################################## 9. GMP-ECM and GPU The stage 1 of ECM can be computed on a Nvidia GPU. More details are given in README.gpu. ############################################################################## 10. Record factors. If you find a very large factor, the program will print a message like: Report your potential champion to <email address> (see <url for record factors>) This means that your factor might be a champion, i.e., one of the top-ten largest factors ever found by the corresponding method (P-1, P+1 or ECM). Cf the following URLs: ECM: http://wwwmaths.anu.edu.au/~brent/ftp/champs.txt P-1: http://www.loria.fr/~zimmerma/records/Pminus1.html P+1: http://www.loria.fr/~zimmerma/records/Pplus1.html ############################################################################## 11. Known problems. On some machines, GMP-ECM uses the clock() function to measure the cpu time used by step 1 and 2. Since that function returns a 32-bit integer, there is a possible wrap-around effect when the clock() value goes back from 2^32-1 to 0, which may produce negative timings. The NTT code uses primes that fit in one machine word and that are congruent to 1 (mod l), where l is the largest transform length required for the desired stage 2 parameters. For very large B2 on 32-bit machines, there may not be enough suitable primes, which may limit the possible transform length to less than what available memory would permit. This problem occurs mostly in the fast stage 2 for P-1 and P+1, as the generic stage 2 uses far more memory for a given polynomial degree, so that memory on a 32-bit machine will be exhausted long before suitable NTT primes are. The maximal transform length depends on the size of the input number. For a transform length l on a 32 bit machine, N must satisfy: l=2^11:N<2^756200, l=2^12:N<2^379353, l=2^13:N<2^190044, l=2^14:N<2^94870, l=2^15:N<2^47414, l=2^16:N<2^23322, l=2^17:N<2^11620, l=2^18:N<2^5891, l=2^19:N<2^2910, l=2^20:N<2^1340, l=2^21:N<2^578, l=2^22:N<2^228. Since log(N)*l is approximately constant, this limits the amount of memory that can be used to about 600MB for P-1, and 1200MB for P+1.
About
Imported from svn://scm.gforge.inria.fr/svnroot/ecm/
Resources
License
GPL-3.0, LGPL-3.0 licenses found
Licenses found
GPL-3.0
COPYING
LGPL-3.0
COPYING.LIB
Stars
Watchers
Forks
Packages 0
No packages published