
Research Article


A New CMMNAF Modular Exponentiation Algorithm by using a New Modular Multiplication Algorithm 

Abdalhossein Rezai
and
Parviz Keshavarzi



ABSTRACT

This study presents a novel modular multiplication algorithm based on a new multiplier NonAdjacent Form (NAF). In this new multiplier representation, the sliding window method is applied on the canonical recoded multiplier to reduce the average number of multiplier digits and thereby the average number of the required clock cycle for the modular multiplication algorithm. Moreover, a new and efficient modular exponentiation algorithm is presented based on this new modular multiplication algorithm, CommonMultiplicandMultiplication (CMM) method and parallel structure. In this new CMMNAF modular exponentiation algorithm, not only the common part of the modular multiplication is computed once rather than several times but also the modular multiplication and modular squaring operations are performed in parallel. Our analysis shows that the average number of required clock cycle in the proposed CMMNAF modular exponentiation algorithm is reduced in comparison with other modular exponentiation algorithms considerably.





Received: November 03, 2011;
Accepted: January 31, 2012;
Published: February 25, 2012


INTRODUCTION
Recently, cryptography algorithms and protocols have been used to provide the
information security (Wang, 2011; Sharma
et al., 2006; Zaidan et al., 2010;
Jaberi et al., 2012; Nikooghadam
et al., 2008). PublicKey Cryptography (PKC) is an important component
of the cryptography (Olorunfemi et al., 2007;
Xia et al., 2010; Azeez et
al., 2012; Salem et al., 2011). The modular
exponentiation with large modulus is a crucial operation in many PKCs such as
RSA and DSA (Rabah, 2006). This operation is implemented
by repeating modular multiplication. So, the efficiency of many PKCs is determined
by the efficiency of the modular multiplication algorithm and the number of
required modular multiplication in the modular exponentiation algorithm (Nedjah
and Murelle, 2009a; Wu, 2009a).
Montgomery modular multiplication algorithm (Montgomery,
1985) is an efficient algorithm for the modular multiplication because it
avoids division by the modulus (Wu, 2009a; Nedjah
and Murelle, 2009a; Xiangyan et al., 2009).
There are many research efforts to speed up the performance of the Montgomery
modular multiplication algorithm such as highradix design (Pinckney
and Harris, 2008; Tawalbeh et al., 2005),
scalable design (Shieh and Lin, 2010; Pinckney
and Harris, 2008), parallel calculation quotient and partial result (Keshavarzi
and Harrison, 1998) and signeddigit recoding (Koc and
Hung, 1992; Philips and Burgess, 2004).
Moreover, there are many research efforts to reduce the number of modular multiplication
such as sliding window method (Nedjah and Murelle, 2009a,
b), signeddigit recoding (Wu, 2009a;
Wu et al., 2008) and CommonMultiplicandMultiplication
(CMM) method (Ha and Moon, 1998; Wu,
2009a, b; Rezai and Keshavarzi,
2011).
This study presents a new modular multiplication algorithm based on the NonAdjacent Form (NAF) and sliding window method. In this new algorithm, sliding window method is applied on the canonical recoded multiplier to reduce the average number of the multiplier digits and thereby, the average number of the required clock cycle (multiplication step) for the modular multiplication algorithm. Moreover, this study presents a novel modular exponentiation algorithm, called CMMNAF modular exponentiation algorithm, by using this new modular multiplication, the CMM method and the parallel structure. Present analysis shows that the average number of required clock cycle in the CMMNAF exponentiation algorithm is reduced in comparison with other CMM modular exponentiation algorithms considerably. PRELIMINARIES
The Montgomery modular multiplication algorithm: Montgomery
(1985) proposed a modular multiplication algorithm which speeds up the modular
multiplication and modular exponentiation algorithm. The Montgomery modular
multiplication algorithm replaces the trial division by the modulus with a simple
right shift. Algorithm 1 shows the radix2 Montgomery modular
multiplication algorithm.
Algorithm 1: 
The radix2 Montgomery modular multiplication algorithm(Mont(X,Y)) 

In this algorithm, the inputs are nbit X, Y and M. The output is S (n) = XY2^{n} mod M. This algorithm computes S(n) = XY2^{n} mod M in nclock cycle. So, it is a timeconsuming operation.
The modular exponentiation algorithm: As modular exponentiation consists
of series of the modular multiplications, the performance of the modular multiplication
is determined by the efficiency of the implementation of the modular multiplication
(Wu, 2009a; Nedjah and Murelle,
2009a). The RighttoLeft (R2L) Montgomery modular exponentiation algorithm
is shown in Algorithm 2:
Algorithm 2: 
The R2L Montgomery modular exponentiation algorithm 

In this algorithm, the inputs are A, E, R = 2^{n} and N. The output
is C = A^{E} mod N. In Algorithm 2, when the exponent
bit is nonzero, both Mont (S, C) and Mont (S, S), are performed. Ha
and Moon (1998) proposed the common part in Mont (S, C) and Mont (S, S)
can be computed once rather than twice. There are many attempts (Ha
and Moon, 1998; Wu et al., 2008; Wu,
2009a, b) to speed up the performance of the modular
exponentiation algorithm based on this idea. One of the recent attempts is the
parallel CMMCSD Montgomery algorithm (Wu, 2009b) which
is shown in Algorithm 3.
Algorithm 3: 
The parallel CMMCSD Montgomery modular exponentiation algorithm 

In this algorithm, the inputs are M, E_{CSD}, N and R = 2^{n}
where, E_{CSD} is canonical recoded E. the outputs are
mod N and
mod N. In Algorithm 3, the modular squaring and modular multiplication
operations are executed in parallel. Moreover, the registers C and D are used
to store the operation results based on the positive digit and negative digits
in exponent E_{CSD}, respectively. This algorithm uses CMM method to
compute the common part of three multiplications just once.
THE PROPOSED CMMNAF MODULAR EXPONENTIATION
In serialparallel multiplication, partial result shifts one bit per iteration.
Moreover, multiplication by zero bit results in zero but this multiplication
by zero is performed and implemented per iteration. In this study, we proposed
a new modular multiplication by recoding and then by partitioning the multiplier.
This new modular multiplication performs multiplication by zero partition with
any length in only one clock cycle instead of several clock cycles. The proposed
modular multiplication algorithm is shown in Algorithm 4.
Algorithm 4: 
The proposed modular multiplication algorithm 


In this algorithm, the inputs are nbit integer X, Y and M. The output is P = XY2^{n} mod M. m is the number of partitions in the multiplier, l_{i} is the length (i.e., the number of digits) of ith partition and V_{i} is the ith partition value.
In the recoding phase of this new algorithm, the canonical recoding is performed
on the multiplier. In the partitioning phase, the partitioning is performed
on the resulted signeddigit multiplier. The partitioning strategy instrumented
in this algorithm scans the multiplier from the least significant digit to the
most significant digit according to Algorithm 5. In this strategy,
the zero windows are allowed to have an arbitrary length but the maximum length
of the nonzero windows should be the exacted value of d digit.
Algorithm 5: 
The partitioning algorithm 

In the precomputation phase of Algorithm 4, the least significant
digit of the nonzero partition is either 1 or
which implies that the nonzero partition value is always an odd number. So,
we require only precomputation of V_{i}Y for odd value of V_{i}.
It should be noted that, the precomputation phase and the partition phase are
performed independently in parallel. This also speeds up the modular multiplication.
The multiplication phase of Algorithm 4 is performed m times.
Recall that, m denotes the number of partitioning in the signeddigit multiplier.
In each iteration of the multiplication phase of algorithm 5, l_{i}
bits of the multiplier and nbit multiplicand are processed.
We also proposed using this new modular multiplication algorithm in order to
speeding up the CMMCSD Montgomery exponentiation algorithm (Wu,
2009b) as shown in Algorithm 6.
Algorithm 6: 
The proposed CMMNAF modular exponentiation algorithm 


In this algorithm, the inputs are M, E_{CSD}, N and R = 2^{n}
where, E_{CSD} is canonical recoded E. the outputs are C = M^{E[1]}
mod N; D = M^{E[1]} mod N. In this algorithm, for e_{i} = 1,
both Z.C mod N and Z.S mod N are computed in parallel. Similarly, for e_{i}
= ,
both Z.D mod N and Z.S mod N are computed in parallel. In addition by using
CMM method, the common part of two operations is computed once rather than twice.
In step 1 of this algorithm, S is computed by using Algorithm
4. In step 2, Z is computed by executing steps 25 of Algorithm
4 on S after computing m_{0}. R. In steps 3 and 4 of Algorithm
6,
and
are computed based on the value of the e_{i}. These values are computed
by executing steps 610 of Algorithm 4. In step 5 of Algorithm
6, the partial result S is computed by performing steps 610 of Algorithm
4. In step 6 of this algorithm Z is computed by performed steps 24 of Algorithm
4. This step is computed after computing Z_{0}.S. In this algorithm,
the exponentiation operation
is depicted as
= CxD^{1}.
EVALUATION
In the proposed modified modular multiplication algorithm, sliding window method
is performed on the canonical recoded multiplier. So, the Hamming weight of
the multiplier is 3n/3d+4. Therefore, based on the computational analyses of
Ha and Moon (1998), for nbit modulus, both operations
Z.C mod N, and Z.D mod N require:
Multiplication steps (clock cycles). Moreover, the operation Z.S mod N requires multiplication steps:
In the proposed CMMNAF modular exponentiation for e_{i} = 1, both
Z.C mod N and Z.S mod N are performed in parallel. Similarly, for e_{i}
= ,
both Z.D mod N and Z.S mod N are performed in parallel. In addition, the radix2
signeddigit exponent is used. Thus, the probability of digits “0”,
“1” and “1” is 2/3, 1/6 and 1/6, respectively. Therefore,
the proposed modular exponentiation algorithm for kbit exponent takes following
multiplication steps:
However, the Dusse and Kaliski’s exponentiation algorithm (Dusse
and Kaliski, 1990), the HaMoon’s improved Montgomery exponentiation
algorithm (Ha and Moon, 1998), the CMMMSD algorithm
(Wu et al., 2008), the Wu’s improved CMMMSD
exponentiation algorithm (Wu, 2009a) and the Wu’s
parallel CMMCSD exponentiation algorithm (Wu, 2009b)
require 1.5k (2n^{2}+n), 0.5k(5n^{2}+4n), 0.5k(2n^{2}+2n+0.75),
1.833k(n^{2}n2) and 0.5k(2n^{2}+2n+5.33) multiplication steps,
respectively. So, the proposed modular exponentiation algorithm reduces the
required multiplication steps (clock cycles) in comparison with other similar
work as follow:
Figure 1 summarizes these multiplication steps (clock cycles)
improvement for the proposed CMMCSD modular exponentiation algorithm in comparison
with exponentiation algorithm by Dusse and Kaliski (1990),
Ha and Moon (1998), Wu (2009a,b)
and Wu et al. (2008) for various window widths.
The results show that this new CMMNAF exponentiation algorithm reduces the
average number of required clock cycles by about 69.288.2, 63.185.9, 7.764.7,
49.680.7 and 7.764.7% in comparison with Dusse and Kaliski
(1990), Ha and Moon (1998), Wu
et al. (2008) and Wu (2009a, b),
respectively for d = 310. In addition, the algorithm complexity of the proposed
CMMNAF modular exponentiation algorithm is reduced in comparison with Wu
(2009a) and Rezai and Keshavarzi (2011).

Fig. 1: 
The clok cycle (multiplication step) improvement of the proposed
CMMNAF modular exponentaition algorithm 
CONCLUSIONS
This study presents a novel modular exponentiation algorithm based on a novel
modular multiplication, called CMMNAF modular exponentiation algorithm. The
CMMNAF modular exponentiation uses other techniques such as CMM method and
parallel processing. In the proposed modular multiplication algorithm, the sliding
window method is performed on the signeddigit multiplier to reduce the average
number of multiplier digits and thereby, to reduce the average number of the
required clock cycles for the modular multiplication algorithm. Using the CMM
method in the CMMNAF modular exponentiation algorithm, the common part of the
modular multiplication is computed once rather than several times. Moreover,
by using the parallel processing, the speed of the proposed modular exponentiation
increases considerably. The results show that the average number of required
clock cycles (multiplication steps) in the proposed CMMNAF exponentiation algorithm
is reduced by 69.288.2, 63.185.9, 7.764.7, 49.680.7 and 7.764.7% in comparison
with Dusse and Kaliski (1990), Ha
and Moon (1998), Wu et al. (2008) and Wu
(2009a, b), respectively for d = 310.

REFERENCES 
1: Azeez, N.A., A.P. Abidoye, K.K. Agbele and A.O. Adesina, 2012. A dependable model for attaining maximum authentication security procedure in a grid based environment. Trends Appl. Sci. Res., 7: 7886. CrossRef 
2: Dusse, S.R. and B.S. Kaliski, 1990. A Cryptographic Library for the Motorola DSP 56000. In: Advance in Cryptology, Damgard, I.B. (Eds.). SpringerVerlag, Berlin, Germany, pp: 230244
3: Ha, J.C. and S.J. Moon, 1998. A commonmultiplicand method to the Montgomery algorithm for speeding up exponentiation. Inf. Proc. Lett., 66: 105107. CrossRef 
4: Jaberi, A., R. Ayanzadeh and A.S.Z. Mousavi, 2012. Twolayer cellular automata based cryptography. Trends Applied Sci. Res., 7: 6877. CrossRef  Direct Link 
5: Keshavarzi, P. and C. Harrison, 1998. A new modular multiplication algorithm for VLSI implementation of publickey cryptography. Proceedings of the 1st International Symposium on Communication Systems and Digital Signal Processing, April 68, 1998, Sheffield Hallam University Press Learning Centre, UK, pp: 516519
6: Koc, C.K. and C.Y. Hung, 1992. Adaptive mary segmentation and canonical recoding algorithms for multiplication of large binary numbers. Comput. Math. Appl., 24: 312.
7: Montgomery, P.L., 1985. Modular multiplication without trial division. Math. Comput., 44: 519521. Direct Link 
8: Nedjah, N. and L.M. Mourelle, 2009. A hardware/software codesign versus hardwareonly implementation of modular exponentiation using the slidingwindow method. J. Circuts, Syst. Comput., 18: 295310.
9: Nedjah, N. and L.M. Mouller, 2009. Highperformance hardware of the slidingwindow method for parallel computation of modular exponentiations. Int. J. Para. Prog., 37: 537555. CrossRef  Direct Link 
10: Nikooghadam, M., M.R. Bonyadi, E. Malekian and A. Zakerolhosseini, 2008. A protocol for digital signature based on the elliptic curve discrete logarithm problem. J. Applied Sci., 8: 19191925. CrossRef  Direct Link 
11: Olorunfemi, T.O.S., B.K. Alese, S.O. Falaki and O. Fajuyigbe, 2007. Implementation of elliptic curve digital signature algorithms. J. Software Eng., 1: 112. CrossRef  Direct Link 
12: Phillips, B. and N. Burgess, 2004. Minimal weight digit set conversions. IEEE Trans. Comput., 53: 666677. CrossRef 
13: Pinckney, N. and D. Harris, 2008. Parallelized radix4 scalable montgomery multipliers. J. Integr. Circuits Syst., 3: 3945.
14: Rabah, K., 2006. Implementing secure RSA cryptosystems using your own cryptographic JCE provider. J. Applied Sci., 6: 482510. CrossRef  Direct Link 
15: Rezai, A. and P. Keshavarzi, 2011. Highperformance modular exponentiation algorithm by using a new modified modular multiplication algorithm and commonmultiplicandmultiplication method. Proceedings of the World Congress on Internet Security, February 2123, 2011, London, United Kingdom, pp: 192197 Direct Link 
16: Salem, Y., M. Abomhara, O.O. Khalifa, A.A. Zaidan and B.B. Zaidan, 2011. A review on multimedia communications cryptography. Res. J. Inform. Technol., 3: 146152. CrossRef  Direct Link 
17: Sharma, S., R.C. Jain and S. Bhadauria, 2006. A power efficient encryption algorithm for multimedia data in mobile Ad hoc network. Trends Applied Sci. Res., 1: 416425. CrossRef  Direct Link 
18: Shieh, M. and W. Lin, 2010. Wordbased Montgomery modular multiplication algorithm for lowlatency scalable architectures. IEEE Trans. Comput., 59: 11451151.
19: Tawalbeh, L.A., A.F. Tenca and C.K. Koc, 2005. A radix4 scalable design. IEEE. Potentials, 24: 1618.
20: Wang, Y., 2011. Unconditional security of cryptosystem: A review and outlook. Trends Applied Sci. Res., 6: 554562. CrossRef  Direct Link 
21: Wu, C.L., 2009. An efficient commonmultiplicandmultiplication method to the Montgomery algorithm for speeding up exponentiation. Inf. Sci., 179: 410421. CrossRef 
22: Wu, C., 2009. Fast parallel montgomery binary exponentiation algorithm using canonical signeddigit recoding technique. Proceedings of the 9th International Conference on Algorithms and Architectures for Parallel Processing, June 811, 2009, Taipei, Taiwan, pp: 428438
23: Wu, C.L., D.C. Lou and T.J. Chang, 2008. An efficient montgomery exponentiation algorithm for publickey cryptosystem Proceedings of the International Conference on Intelligence and Security Information, June 1720, 2008, Taipei, Taiwan, pp: 284285 CrossRef 
24: Xia, Y., L. Kuang and M. Zhu, 2010. A hierarchical access control scheme in cloud using HHECC. Inform. Technol. J., 9: 15981606. CrossRef  Direct Link 
25: Xiangyan, F., Z. Jiahang, X. Tinggang and Y. Youguang, 2009. The researcher and implement of highspeed modular multiplication algorithm basing on parallel pipelining. Proceedings of the AsiaPacific Conference on Information Processing, July 1819, 2009, China, pp: 398403 CrossRef  Direct Link 
26: Zaidan, A.A., B.B. Zaidan, A.K. AlFrajat and H.A. Jalab, 2010. An overview: Theoretical and mathematical perspectives for advance encryption standard/rijndael. J. Applied Sci., 10: 21612167. CrossRef  Direct Link 



