-------------------------------------------------------------------------------
United States Patent                                                  5,999,627
Lee ,   et al.                                                 December 7, 1999
-------------------------------------------------------------------------------
Method for exponentiation in a public-key cryptosystem

                                   Abstract                                    

The present invention relates to an improved method for performing modular
exponentiation to a fixed base element. The method includes exponentiating a
first digital input signal g by a second digital input signal R, where g is a
fixed signal unique to a cryptographic system and R is a randomly generated
digital signal, to compute a third digital signal g.sup.R. The exponentiating
includes pre-computing and storing a plurality of values depending only upon
the fixed signal g in a plurality of memory locations within a computing device
and then speeding up the computation of g.sup.R using the stored values. The
invented exponentiation method can substantially reduce the amount of
computation required to compute the value for g.sup.R. Exponentiation methods
according to embodiments of the present invention may be used in a variety of
cryptographic systems, e.g., Schnorr identification scheme, Digital Signature
Standard (DSS), and Diffie-Hellman key agreement scheme, etc.
-------------------------------------------------------------------------------
Inventors: Lee; Pil-joong (Pohang, KR); Lim; Chae-hoon (Kyungsangnam-do, KR)    
Assignee:  Samsung Electronics Co., Ltd. (Kyungki-do, KR)                       
Appl. No.: 003875                                                               
Filed:     January 7, 1998                                                      
                       Foreign Application Priority Data                       
                -----------------------                
         Jan 07, 1995[KR]                                   95-224             

Current U.S. Class:                                     380/30; 380/28; 708/606
Intern'l Class:                                        H04K 001/00; G06E 001/04
Field of Search:                                  380/28,30 708/620-632,606,491
-------------------------------------------------------------------------------
                                References Cited
-------------------------------------------------------------------------------
                             U.S. Patent Documents                             
5299262             Mar., 1994          Brickell et al.                 380/28.
5870478             Feb., 1999          Kawamura                        380/30.

                                                                       
                                Other References
Brickell et al., "Fast Exponentiation with Precomputation," Advances in
Cryptology-Eurocrypt '92, pp. 200-207, May 1992.
Lim et al., "More Flexible Exponentiation with Precomputation," Advances in
Cryptology-Crypto '94, Aug. 1994.

Primary Examiner: Hayes; Gail O.
Assistant Examiner: Sayadian; Hrayr A.
Attorney, Agent or Firm: Sughrue, Mion, Zinn, Macpeak & Seas, PLLC
-------------------------------------------------------------------------------
                               Parent Case Text                                
-------------------------------------------------------------------------------


This disclosure is a continuation-in-part of U.S. patent application Ser. No.
08/467,310, filed Jun. 6, 1995, now abandoned.
-------------------------------------------------------------------------------
                                    Claims                                     
-------------------------------------------------------------------------------


We claim:

1. In a cryptographic system, a computer-implemented method for transforming a
first signal into a second signal in a manner infeasible to invert, the method
comprising:

exponentiating a first digital input signal g by a second digital input signal
R, wherein g is a fixed signal unique to a cryptographic system and R is a
randomly generated digital signal of n bits, n being an integer specified by
the cryptographic system, to obtain a third digital signal g.sup.R, wherein the
exponentiating comprises:

(a) determining integers h and v, and calculating a and b such that a is the
least integer not less than a fractional number n/h and b is the least integer
not less than a fractional number a/v, whereby an arbitrary n bit number can be
divided into h blocks of size a and each divided block can be further
subdivided into v smaller blocks of size b;

(b) precomputing and storing, in a plurality of memory locations G[j][f]
indexed by j and f, 0.ltoreq.j.ltoreq.v-1 and 0.ltoreq.f.ltoreq.2.sup.h -1,
within a computing device, a plurality of values

(g.sub.h-1.sup.e.sbsp.h-1 g.sub.h-2.sup.e.sbsp.h-2 . . . g.sub.1.sup.e.sbsp.1
g.sub.0.sup.e.sbsp.0).sup.2.spsp.jb

where g.sub.i =g.sup.2.spsp.ia for 0.ltoreq.i.ltoreq.h-1, and f=e.sub.h-1
e.sub.h-2 . . . e.sub.1 e.sub.0 with e.sub.i =0 or 1, wherein the index f
corresponds to the decimal value of the binary representation e.sub.h-1
e.sub.h-2 . . . e.sub.1 e.sub.0 and ranges from 0 to 2.sup.h -1;

(c) dividing the exponent R into h blocks R.sub.i of size a for i=0, 1, . . . ,
h-1 and subdividing each of the blocks R.sub.i into v sub-blocks R.sub.i,j of
size b for j=0, 1, . . . , v-1;

(d) computing the second signal x=g.sup.R, using the stored values from the
memory locations G[j][I.sub.j,k ], as ##EQU26## where I.sub.j,k denotes an
h-bit number formed by taking the k-th bits from the sub-blocks indexed by j in
the second subscript, the sub-blocks including R.sub.h-1,j, . . . , R.sub.1,j,
R.sub.0,j ; and

(e) using the second signal x to encrypt or decrypt data.

2. The computer-implemented method of claim 1, wherein computing g.sup.R
according to step (d) includes utilizing p processors operating in parallel
wherein the i-th processor, for 0.ltoreq.i<p, i and p being integers, computes
the value ##EQU27## using the stored values from the memory locations
G[j][I.sub.j,k ] for iw.ltoreq.j<iw+w and 0.ltoreq.I.sub.j,k <2.sup.h, to
produce p computational results, where w is the least integer not less than a
fractional number v/p and, when v is not equal to a product pw, the index j in
the (p-1)-th processor ranges from j=(p-1)w to v-1, and then multiplying
together said p computational results to compute the value of g.sup.R.

3. In a cryptographic system, a computer-implemented method for transforming a
set of input digital signals into an output digital signal, said method
comprising:

exponentiating each input signal x.sub.i by another input signal K.sub.i of b
bits for i=0, 1, . . . , m-1, wherein x.sub.i and K.sub.i are variable values
processed in a cryptographic system, and multiplying the exponentiated values
together to produce an output digital signal ##EQU28## wherein the
exponentiating comprises: (a) arranging m exponents K.sub.i into an h.times.v
matrix form, h and v being integers such that a product hv is not less than the
integer m, and z being expressed as ##EQU29## (b) computing and temporarily
storing a plurality of values X[j][f] in a plurality of memory locations within
a computing device, wherein X[j][f] denotes a value stored at a two dimensional
array X indexed by row j and column f in a memory of the computing device
defined by an expression: ##EQU30## for each j (0.ltoreq.j.ltoreq.v-1) and f
(0.ltoreq.f.ltoreq.2.sup.h -1), where f=e.sub.h-1 e.sub.h-2 . . . e.sub.1
e.sub.0 with e.sub.i =0 or 1, and wherein f denotes the decimal value of the
binary representation e.sub.h-1 e.sub.h-2 . . . e.sub.1 e.sub.0 ;

(c) computing the output signal z using the values stored in step (b) as
##EQU31## where I.sub.j,k denotes an h-bit integer formed by the k-th bit from
each element of the j-th column in the h.times.v matrix consisting of b-bit
sub-blocks; and

(d) using the output signal z to encrypt or decrypt data.

4. The computer-implemented method of claim 3, wherein step (c) is performed in
parallel using p processors wherein the i-th processor, for 0.ltoreq.i<p, i and
p being integers, computes ##EQU32## using the stored values from the memory
locations X[j][I.sub.j,k ] for iw.ltoreq.j<iw+w and 0.ltoreq.I.sub.j,k
<2.sup.h, to produce p computational results, where w is the least integer not
less than a fractional number v/p and, in case that v is not equal to a product
pw, the index j in the (p-1)-th processor ranges from j=(p-1)w to v-1, and then
multiplying together said p computational results to compute the value for z.

5. A computer-implemented method for verifying the identity of an individual or
a device, the method comprising:

(a) determining integers h and v, and calculating a and b such that a is the
least integer not less than a fractional number n/h, n being an exponent size
in bits specified by a particular identification scheme, and b is the least
integer not less than a fractional number a/v, whereby an arbitrary n bit
number can be divided into h blocks of size a and each divided block can be
further subdivided into v smaller blocks of size b;

(b) precomputing and storing, in a plurality of memory locations G[j][f]
indexed by j and f, 0.ltoreq.j.ltoreq.v-1 and 0.ltoreq.f.ltoreq.2.sup.h -1,
within a computing device, a plurality of values

(g.sub.h-1.sup.e.sbsp.h-1 g.sub.h-2.sup.e.sbsp.h-2 . . . g.sub.1.sup.e.sbsp.1
g.sub.0.sup.e.sbsp.0).sup.2.spsp.jb

where g.sub.i =g.sup.2.spsp.ia for 0.ltoreq.i.ltoreq.h-1, g being a fixed
signal unique to said identification scheme, and f=e.sub.h-1 e.sub.h-2 . . .
e.sub.1 e.sub.0 with e.sub.i =0 or 1, that is the index f corresponds to the
decimal value of the binary representation e.sub.h-1 e.sub.h-2 . . . e.sub.1
e.sub.0 and ranges from 0 to 2.sup.h -1;

(c) exponentiating the signal g by a first randomly generated digital signal R
of size n bits to obtain a second digital signal x=g.sup.R, wherein the
exponentiating comprises:

dividing said exponent R into h blocks R.sub.i of size a for i=0, 1, . . . ,
h-1;

subdividing each of the blocks R.sub.i into predetermined v sub-blocks
R.sub.i,j of size b for j=0, 1, . . . , v-1; and

computing the second signal x=g.sup.R, using the stored values from the memory
locations G[j][I.sub.j,k ], as ##EQU33## where I.sub.j,k denotes an h-bit
number formed by the k-th bits from said sub-blocks indexed by j in the second
subscript, the sub-blocks including R.sub.h-1,j, . . . , R.sub.1,j, R.sub.0,j ;

(d) transmitting the second signal x to a verifier;

(e) receiving a third signal e from the verifier;

(f) computing a fourth signal y using the second signal x, said third signal e
and other input signals specified by the identification scheme;

(g) transmitting the fourth signal y to the verifier; and

(h) validating the fourth signal by the verifier as originating from an
individual or a device.

6. A computer-implemented method for generating a digital signature to assure
the integrity and origin of a message, the method comprising:

(a) determining integers h and v, and calculating a and b such that a is the
least integer not less than a fractional number n/h, n being an exponent size
in bits specified by a particular digital signature scheme, and b is the least
integer not less than the fractional number a/v, whereby an arbitrary n bit
number can be divided into h blocks of size a and each divided block can be
further subdivided into v smaller blocks of size b;

(b) precomputing and storing, in a plurality of memory locations G[j][f]
indexed by j and f, 0.ltoreq.j.ltoreq.v-1 and 0.ltoreq.f.ltoreq.2.sup.h -1,
within a computing device, a plurality of values

(g.sub.h-1.sup.e.sbsp.h-1 g.sub.h-2.sup.e.sbsp.h-2 . . . g.sub.1.sup.e.sbsp.1
g.sub.0.sup.e.sbsp.0).sup.2.spsp.jb

where g.sub.i =g.sup.2.spsp.ia for 0.ltoreq.i.ltoreq.h-1, g being a fixed
signal unique to the signature scheme, and f=e.sub.h-1 e.sub.h-2 . . . e.sub.1
e.sub.0 with e.sub.i =0 or 1, that is the index f corresponds to the decimal
value of the binary representation e.sub.h-1 e.sub.h-2 . . . e.sub.1 e.sub.0
and ranges from 0 to 2.sup.h -1;

(c) exponentiating the signal g by a first randomly generated digital signal R
of size n bits to obtain a second digital signal x=g.sup.R, wherein the
exponentiating comprises:

dividing the exponent R into h blocks R.sub.i of size a for i=0, 1, . . . ,
h-1;

subdividing each of the blocks R.sub.i into predetermined v sub-blocks
R.sub.i,j of size b for j=0, 1, . . . , v-1; and

computing the second signal x=g.sup.R, using the stored values from the memory
locations G[j][I.sub.j,k ], as ##EQU34## where I.sub.j,k denotes an h-bit
number formed by the k-th bits from the sub-blocks indexed by j in the second
subscript, the sub-blocks including R.sub.h-1,j, . . . , R.sub.1,j, R.sub.0,j ;

(d) generating a digital signature for a message m using the second signal x
and other input signals specified by the particular signature scheme;

(e) sending the digital signature and message to a verifier; and

(f) checking the validity of the signature for the message by the verifier as
originating from a legitimate user.

7. A computer-implemented method for sharing a common secret key by
Diffie-Hellman key exchange scheme wherein a first computer communicates with a
second computer to agree upon a common secret key K, the method comprising:

at a first computer:

(a) determining integers h and v, and calculating a and b such that a is the
least integer not less than a fractional number n/h, n being an exponent size
in bits specified by the key exchange scheme, and b is the least integer not
less than a fractional number a/v, whereby an arbitrary n bit number is divided
into h blocks of size a and each divided block can be further subdivided into v
smaller blocks of size b;

(b) precomputing and storing, in a plurality of memory locations G[j][f]
indexed by j and f, 0.ltoreq.j.ltoreq.v-1 and 0.ltoreq.f.ltoreq.2.sup.h -1,
within a computing device, a plurality of values

(g.sub.h-1.sup.e.sbsp.h-1 g.sub.h-2.sup.e.sbsp.h-2 . . . g.sub.1.sup.e.sbsp.1
g.sub.0.sup.e.sbsp.0).sup.2.spsp.jb

where g.sub.i =g.sup.2.spsp.ia for 0.ltoreq.i.ltoreq.h-1, g being a fixed
signal unique to the key exchange scheme, and f=e.sub.h-1 e.sub.h-2 . . .
e.sub.1 e.sub.0 with e.sub.i =0 or 1, wherein the index f corresponds to the
decimal value of the binary representation e.sub.h-1 e.sub.h-2 . . . e.sub.1
e.sub.0 and ranges from 0 to 2.sup.h -1;

(c) exponentiating the signal g by a first randomly generated digital signal R
of size n bits to obtain a second digital signal x=g.sup.R, wherein the
exponentiating comprises:

dividing said exponent R into h blocks R.sub.i of size a for i=0, 1, . . . ,
h-1;

subdividing each of the blocks R.sub.i into predetermined v sub-blocks
R.sub.i,j of size b for j=0, 1, . . . , v-1; and

computing the second signal x=g.sup.R, using the stored values from the memory
locations G[j][I.sub.j,k ], as ##EQU35## where I.sub.j,k denotes an h-bit
number formed by the k-th bits from the sub-blocks indexed by j in the second
subscript, the sub-blocks including R.sub.h-1,j, . . . , R.sub.1,j, R.sub.0,j ;

(d) transmitting x to a second computer;

at the second computer:

(e) randomly generating a third value S;

(f) computing a fourth signal y=g.sup.S according to steps (a)-(c);

(g) sending the fourth signal y to the first computer;

(h) at the first computer, computing a common key K, using the fourth signal y,
the first signal R, as K=y.sup.R =g.sup.SR ; and

(i) at the second computer, computing the common key K, using the second signal
x, the third signal S, as K=x.sup.S =g.sup.RS.

8. A cryptographic system where a first signal is transformed into a second
signal in a manner infeasible to invert, the system comprising:

an exponentiator for exponentiating a first digital input signal g by a second
digital input signal R, wherein g is a fixed signal unique to a cryptographic
system and R is a randomly generated digital signal of n bits, n being an
integer specified by the cryptographic system, to obtain a third digital signal
g.sup.R, wherein the exponentiator further comprises:

a means for determining integers h and v, and calculating a and b such that a
is the least integer not less than a fractional number n/h and b is the least
integer not less than a fractional number a/v, whereby an arbitrary n bit
number can be divided into h blocks of size a and each divided block can be
further subdivided into v smaller blocks of size b;

a means for precomputing and storing, in a plurality of memory locations
G[j][f] indexed by j and f, 0.ltoreq.f.ltoreq.2.sup.h -1, within a computing
device, a plurality of values

(g.sub.h-1.sup.e.sbsp.h-1 g.sub.h-2.sup.e.sbsp.h-2 . . . g.sub.1.sup.e.sbsp.1
g.sub.0.sup.e.sbsp.0).sup.2.spsp.jb

where g.sub.i =g.sup.2.spsp.ia for 0.ltoreq.i.ltoreq.h-1, and f=e.sub.h-1
e.sub.h-2 . . . e.sub.1 e.sub.0 with e.sub.i =0 or 1, wherein the index f
corresponds to the decimal value of the binary representation e.sub.h-1
e.sub.h-2 . . . e.sub.1 e.sub.0 and ranges from 0 to 2.sup.h -1;

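a divider for dividing the exponent R into h blocks R.sub.i of size a for i=0,
1, . . . , h-1 and subdividing each of the blocks R.sub.i into v sub-blocks
R.sub.i,j of size b for j=0, 1, . . . , v-1; and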

a means for computing the second signal x=g.sup.R, using the stored values from
the memory locations G[j][I.sub.j,k ], as ##EQU36## where I.sub.j,k denotes an
h-bit number formed by taking the k-th bits from the sub-blocks indexed by j in
the second subscript, the sub-blocks including R.sub.h-1,j, . . . , R.sub.1,j,
R.sub.0,j ; and a means for using the second signal x in encrypting or
decrypting data.
-------------------------------------------------------------------------------
                                  Description                                  
-------------------------------------------------------------------------------


BACKGROUND OF THE INVENTION

The present invention relates to a method for speeding up modular
exponentiation for a fixed base element in a public-key cryptosystem, and more
particularly, to an exponentiation method using a fixed base element-dependent
pre-computation table.

Since Diffie and Hellman introduced the concept of a public-key cryptosystem in
1976, many cryptographic protocols have been developed based on the public-key
cryptographic method. A public-key cryptosystem utilizes the property that it
is computationally infeasible to derive the matching secret key from the public
key, even though the public key is openly published, as long as the underlying
mathematical problem remains intractable.

Two typical examples of such mathematical problems are the discrete logarithm
problem and the integer factorization problem. However, cryptosystems based on
these problems perform relatively poorly compared to conventional (symmetric)
cryptosystems. Therefore, the development of algorithms for reducing the amount
of computation has been one of the important research fields in modern
cryptology.

The most frequently used operation required in a public key cryptosystem is an
exponentiation in a finite group. Typical groups include a multiplicative group
in an integer ring or finite field, and an additive group on points of an
elliptic curve defined in a finite field. In general, exponentiation implies
computing X.sup.R for two random elements X and R in a given group. However, in
many cryptographic protocols it is often necessary to compute g.sup.R for a
random exponent R and a fixed base element g. For example, many protocols have
been developed for authentication and digital signatures based on the
difficulty of solving the discrete logarithm problem.

The problem of speeding up exponentiation in a given group (usually Z.sub.N,
where N is a large prime number or the product of two large prime numbers) is
very important for the efficient implementation of most public-key
cryptosystems. Hereinafter, for the sake of convenience, it is assumed that the
computation is performed over Z.sub.N, and thus multiplication denotes
multiplication mod N. However, the method proposed in the present invention can
be adapted for any group. Throughout the following explanation, g denotes a
fixed element in Z.sub.N and R denotes an n-bit random exponent over
[0, 2.sup.n). In addition, .vertline.S.vertline. denotes the bit-length of S
for an integer S or the cardinality of S for a set S, .left brkt-top.x.right
brkt-top. denotes the smallest integer not less than x (e.g., .left
brkt-top.1.29.right brkt-top.=2), and .left brkt-bot.x.right brkt-bot. denotes
the greatest integer not greater than x (e.g., .left brkt-bot.1.29.right
brkt-bot.=1).

A typical method for exponentiation is to use the binary algorithm, also known
as the square-and-multiply method. For a 512-bit modulus and exponent, this
method requires 766 multiplications on average and 1022 in the worst case. The
signed binary algorithm can reduce the required number of multiplications to
around 682 on average and 768 in the worst case.
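
The counts quoted for the binary algorithm can be checked with a short sketch.
The following Python fragment is an illustrative left-to-right
square-and-multiply routine and is not taken from the patent; the function name
binary_pow and the multiplication counter are assumptions introduced here. For
an n-bit exponent it performs n-1 squarings plus one multiplication for each
nonzero bit below the leading bit.

```python
def binary_pow(g, R, N):
    """Left-to-right square-and-multiply.

    Returns (g**R mod N, number of modular multiplications used),
    counting squarings as multiplications.
    """
    if R == 0:
        return 1 % N, 0
    bits = bin(R)[2:]          # most significant bit first
    result = g % N             # the leading 1 bit costs nothing
    count = 0
    for bit in bits[1:]:
        result = result * result % N   # one squaring per remaining bit
        count += 1
        if bit == "1":
            result = result * g % N    # extra multiplication for a set bit
            count += 1
    return result, count


if __name__ == "__main__":
    N = (1 << 127) - 1                 # illustrative modulus only
    worst = (1 << 512) - 1             # all 512 bits set
    print(binary_pow(3, worst, N)[1])  # worst-case count for 512 bits
```

With all 512 exponent bits set, the routine uses 511 squarings and 511
multiplications, i.e., the 1022 quoted above. For a random 512-bit exponent,
about half of the 511 lower bits are set, which is consistent with the average
of roughly 766.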

On the other hand, using a moderate storage capacity for intermediate values,
the performance can be considerably improved. For example, Knuth's five-window
algorithm (see "The Art of Computer Programming," Vol. 2, Seminumerical
Algorithms, by D. E. Knuth, 1981) performs exponentiation in about 609
multiplications on average, including the on-line pre-computation operation of
sixteen multiplications.
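
As an illustration of how a window method trades storage for multiplications,
the following Python sketch implements a sliding-window exponentiation over odd
powers. It assumes that "five-window" refers to a window width of five bits,
which is an interpretation rather than something stated in the text; the
function name sliding_window_pow and the parameter w are hypothetical. With
w=5, the on-line table costs one squaring plus fifteen multiplications, which
matches the sixteen multiplications mentioned above.

```python
def sliding_window_pow(g, R, N, w=5):
    """Sliding-window exponentiation using a table of odd powers of g."""
    if R == 0:
        return 1 % N
    g2 = g * g % N
    odd = [g % N]                          # odd[m] = g**(2*m + 1) mod N
    for _ in range((1 << (w - 1)) - 1):
        odd.append(odd[-1] * g2 % N)
    bits = bin(R)[2:]                      # most significant bit first
    acc = 1
    i = 0
    while i < len(bits):
        if bits[i] == "0":
            acc = acc * acc % N            # a zero bit costs one squaring
            i += 1
            continue
        j = min(i + w, len(bits))
        while bits[j - 1] == "0":          # shrink so the window ends in a 1
            j -= 1
        window = int(bits[i:j], 2)         # odd value below 2**w
        for _ in range(j - i):
            acc = acc * acc % N
        acc = acc * odd[window >> 1] % N   # one multiplication per window
        i = j
    return acc
```

The design choice in this sketch is to store only odd powers, so that a table
of sixteen entries covers every five-bit window that ends in a one bit.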

The fastest algorithm known for exponentiation is the windowing method based on
addition chains, where bigger windows, for example, ten in size, are used and
more storage capacity for intermediate values is needed. Though finding the
shortest addition chain is an NP-complete problem, it is reported that, by
applying heuristics, an addition chain having a length of around 605 can be
computed.

These general methods can be used for any cryptosystem requiring exponentiation
such as the RSA (see "A Method for Obtaining Digital Signatures and Public-key
Cryptosystems," in Communications of the ACM, by R. L. Rivest, A. Shamir and L.
Adleman, 21(2), pp. 120-126, 1978) and El Gamal (see "A Public Key Cryptosystem
and a Signature Scheme Based on the Discrete Logarithm," in IEEE Transactions
on Information Theory, by T. El Gamal, 31(4), pp. 469-472, 1985) systems.
However, in many cryptographic protocols based on the discrete logarithm
problem, it is necessary to compute g.sup.R for a fixed base g but for a
randomly chosen exponent R. Therefore, one can construct a pre-computation
table depending only on the fixed base g, which can then be used to speed up
the evaluation of g.sup.R for any random exponent R. As will be seen later,
such a pre-computation technique substantially reduces the number of
multiplications required for exponentiation.

SUMMARY OF THE INVENTION

It is an object of the present invention to provide a method for exponentiation
in a public-key cryptosystem wherein exponentiation can be performed at high
speed using a fixed base element-dependent pre-computation table.

To accomplish the above object, the method for transforming a first signal into
a second signal in a manner infeasible to invert, comprises:

exponentiating a first digital input signal g by a second digital input signal
R, wherein g is a fixed signal unique to a cryptographic system and R is a
randomly generated digital signal of n bits, n being an integer specified by
said cryptographic system, to obtain a third digital signal g.sup.R, wherein
the exponentiating comprises:

(a) determining integers h and v, and calculating a and b such that a is the
least integer not less than the fractional number n/h and b is the least
integer not less than the fractional number a/v, whereby an arbitrary n bit
number can be divided into h blocks of size a and each divided block can be
further subdivided into v smaller blocks of size b;

(b) precomputing and storing, in a plurality of memory locations G[j][f]
indexed by j and f, 0.ltoreq.j.ltoreq.v-1 and 0.ltoreq.f.ltoreq.2.sup.h -1,
within a computing device, a plurality of values

(g.sub.h-1.sup.e.sbsp.h-1 g.sub.h-2.sup.e.sbsp.h-2 . . . g.sub.1.sup.e.sbsp.1
g.sub.0.sup.e.sbsp.0).sup.2.spsp.jb

where g.sub.i =g.sup.2.spsp.ia for 0.ltoreq.i.ltoreq.h-1, and f=e.sub.h-1
e.sub.h-2 . . . e.sub.1 e.sub.0 with e.sub.i =0 or 1, that is the index f
corresponds to the decimal value of the binary representation e.sub.h-1
e.sub.h-2 . . . e.sub.1 e.sub.0 and thus can take values from 0 to 2.sup.h -1;

(c) dividing the exponent R into predetermined h blocks R.sub.i of size a for i
=0, 1, . . . , h-1 and subdividing each of the blocks R.sub.i into
predetermined v sub-blocks R.sub.i,j of size b for j=0, 1, . . . , v-1; and

(d) computing the second signal x=g.sup.R, using the stored values from the
memory locations G[j][I.sub.j,k ], as ##EQU1## where I.sub.j,k denotes an h-bit
number formed by taking the k-th bits from the sub-blocks indexed by j in the
second subscript, i.e., R.sub.h-1,j, . . . , R.sub.1,j, R.sub.0,j.
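
Steps (a) through (d) can be sketched in Python as follows. This is a minimal
illustration under stated assumptions, not the patented implementation: the
function names ceil_div, precompute_table and fixed_base_pow are hypothetical,
arithmetic is taken mod N, and the evaluation loop is written from the
definitions of G[j][f] and I.sub.j,k given above (square the accumulator once
per bit position k, then multiply in one table entry per column j). Bit
positions that fall beyond the a bits of a block are skipped, which handles the
case where a is not a multiple of b.

```python
def ceil_div(x, y):
    """Least integer not less than x / y."""
    return -(-x // y)


def precompute_table(g, n, h, v, N):
    """Step (b): G[j][f] = (prod of g_i over set bits i of f) ** (2**(j*b))."""
    a = ceil_div(n, h)
    b = ceil_div(a, v)
    g_i = [pow(g, 1 << (i * a), N) for i in range(h)]  # g_i = g**(2**(i*a))
    G = [[1 % N] * (1 << h) for _ in range(v)]
    for f in range(1, 1 << h):
        low = f & -f                                   # lowest set bit of f
        G[0][f] = G[0][f ^ low] * g_i[low.bit_length() - 1] % N
    for j in range(1, v):
        for f in range(1, 1 << h):
            G[j][f] = pow(G[j - 1][f], 1 << b, N)      # raise to 2**b
    return G, a, b


def fixed_base_pow(G, a, b, R, N):
    """Steps (c)-(d): evaluate g**R mod N from the stored table."""
    v = len(G)
    h = len(G[0]).bit_length() - 1                     # table width is 2**h
    assert 0 <= R < (1 << (h * a))
    acc = 1
    for k in range(b - 1, -1, -1):
        acc = acc * acc % N                            # one squaring per k
        for j in range(v):
            pos = j * b + k                            # bit offset inside R_i
            if pos >= a:
                continue                               # beyond the a-bit block
            index = 0                                  # this is I_{j,k}
            for i in range(h):
                index |= ((R >> (i * a + pos)) & 1) << i
            if index:
                acc = acc * G[j][index] % N
    return acc


if __name__ == "__main__":
    N = (1 << 127) - 1                                 # illustrative modulus
    G, a, b = precompute_table(3, 160, 8, 2, N)
    R = 0x1234567890ABCDEF1234567890ABCDEF12345678
    print(fixed_base_pow(G, a, b, R, N) == pow(3, R, N))
```

In this sketch the on-line work is at most b-1 useful squarings and one table
multiplication per (j, k) pair, while the table holds v rows of 2.sup.h entries
(the entry for f=0 is the identity and need not be stored in practice).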

BRIEF DESCRIPTION OF THE DRAWINGS

The above objects and advantages of the present invention will become more
apparent by describing in detail a preferred embodiment thereof with reference
to the attached drawings in which:

FIG. 1 shows a method of dividing and arranging an arbitrary n-bit exponent;

FIG. 2 is a table for the pre-computed values related to FIG. 1;

FIG. 3 is an example of the method of FIG. 1 for a 32-bit exponent;

FIG. 4 is a table for the pre-computed values related to FIG. 3;

FIG. 5 shows two different configurations having about the same storage
requirement in the pre-computed table;

FIGS. 6A and 6B are tables for explaining the performance of the Brickell,
Gordon, McCurley and Wilson method;

FIGS. 7A and 7B are tables for explaining the performance of the method
according to the present invention;

FIG. 8 is a table showing the number of multiplications required for signature
generation and verification in various signature schemes;

FIG. 9 is a table showing the number of multiplications required for parallel
processing; and

FIG. 10 is a block diagram of a cryptographic system utilizing an improved
method of exponentiation in accordance with an embodiment of the present
invention.

DETAILED DESCRIPTION OF THE INVENTION

First, the Brickell, Gordon, McCurley and Wilson method (hereinafter, the BGMW
method), a pre-computation method for speeding up the computation of g.sup.R
proposed by Brickell, Gordon, McCurley and Wilson at Eurocrypt '92, will be
briefly described, and then embodiments of the present invention will be
described in detail. The performance of each method will be compared via the
accompanying drawings.

In the BGMW method, the basic strategy is to represent an exponent R in base b
as R=d.sub.t-1 b.sup.t-1 + . . . +d.sub.1 b+d.sub.0, where 0.ltoreq.d.sub.i <b,
and to pre-compute all powers g.sub.i =g.sup.b.spsp.i for 0.ltoreq.i<t. Then,
g.sup.R can be computed by the following equation. ##EQU2##

Using a basic digit set for base b, the basic scheme can be extended so that
the computation time can be further decreased while the storage capacity
required is increased accordingly.

The BGMW method will now be described in more detail.

A set of integers D is called a basic digit set for base b if any integer can
be represented in base b using digits from the set D. A set M of multipliers
and a parameter h can then be chosen so that the set D(M,h) defined by the
following equation (1) is a basic digit set for base b.

D(M,h)={mk.vertline.m.epsilon.M, 0.ltoreq.k.ltoreq.h} (1)

Then, an n-bit exponent R can be represented as ##EQU3## where d.sub.i =m.sub.i
k.sub.i .epsilon.D(M,h). With this representation of R, g.sup.R can be computed
by the following equation (3). ##EQU4## where ##EQU5##

Therefore, the powers g.sup.m.sbsp.i.sup.b.spsp.i for all i<t and m.epsilon.M
need to be pre-computed and stored. Then, g.sup.R can be computed in, at most,
t+(h-2) multiplications using about t.vertline.M.vertline. pre-computed values,
by the following algorithm. ##EQU6##

It is easily seen that the number of multiplications performed by the above
algorithm is t+(h-2) in the worst case (t-h multiplications for computing
products of the form ##EQU7## for w=1, . . . , h and 2h-2 multiplications for
completing the for-loop) and t(b-1)/b+(h-2) on average since for a randomly
chosen exponent, t/b digits are expected to be zero.
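
For the simplest digit set (M={1}, h=b-1), the evaluation described above can
be sketched in Python as follows. This is an illustrative sketch only; the
function names bgmw_table and bgmw_pow are hypothetical, and the loop is
written from the general idea of accumulating, for each digit value d from h
down to 1, the product of all g.sub.i whose digit equals d. It should not be
read as the exact listing of the BGMW paper.

```python
def bgmw_table(g, b, t, N):
    """Pre-compute g_i = g**(b**i) mod N for 0 <= i < t."""
    table = [g % N]
    for _ in range(1, t):
        table.append(pow(table[-1], b, N))
    return table


def bgmw_pow(table, R, b, N):
    """Evaluate g**R mod N from the base-b digits of R and the table."""
    digits = []                       # digits[i] = d_i, least significant first
    while R:
        R, d = divmod(R, b)
        digits.append(d)
    assert len(digits) <= len(table)
    A = 1                             # running result
    B = 1                             # product of g_i with d_i >= current d
    for d in range(b - 1, 0, -1):
        for i, d_i in enumerate(digits):
            if d_i == d:
                B = B * table[i] % N
        A = A * B % N                 # g_i ends up multiplied in d_i times
    return A


if __name__ == "__main__":
    N = (1 << 127) - 1                # illustrative modulus only
    table = bgmw_table(3, 32, 103, N)
    R = (1 << 512) - 12345
    print(bgmw_pow(table, R, 32, N) == pow(3, R, N))
```

Each g.sub.i is multiplied into B exactly once and B is multiplied into A once
per digit value, which is where a bound of the form t+(h-2) comes from once the
trivial multiplications by 1 are not counted.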

The most obvious example for D is the base b number system (M={1}, h=b-1, t
=.left brkt-top.log.sub.b (2.sup.n -1).right brkt-top.). For a 512-bit
exponent, the choice of b=26 minimizes the expected number of multiplications.
This basic scheme requires 127.8 multiplications on average, 132 in the worst
case, and storage for 109 pre-computed values. A more convenient choice of base
would be b=32, since the digits for the exponent R can then be computed without
radix conversion by extracting five bits at a time. With this base, the
required number of multiplications is increased only by one for the average
case and remains unchanged for the worst case.
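
The figures quoted for b=26 and b=32 follow directly from the formulas
t+(h-2) and t(b-1)/b+(h-2) with h=b-1. The following Python sketch recomputes
them; the function name bgmw_basic_cost is an assumption introduced for this
illustration.

```python
def bgmw_basic_cost(n, b):
    """Return (t, average, worst) for the basic scheme M={1}, h=b-1."""
    # t is the least integer with b**t >= 2**n - 1, i.e. ceil(log_b(2**n - 1)).
    t, power = 0, 1
    while power < (1 << n) - 1:
        power *= b
        t += 1
    h = b - 1
    worst = t + (h - 2)
    average = t * (b - 1) / b + (h - 2)
    return t, average, worst


if __name__ == "__main__":
    for base in (26, 32):
        t, average, worst = bgmw_basic_cost(512, base)
        print(base, t, round(average, 2), worst)
```

For n=512 this gives t=109 with an average of about 127.8 and a worst case of
132 for b=26, and t=103 with an average of about 128.8 and the same worst case
of 132 for b=32.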

Brickell et al. also presented several schemes using other number systems to
decrease the number of multiplications required--of course using more storage
for pre-computed values. An extreme example is to choose the set M as follows:

M.sub.2 ={m.vertline.1.ltoreq.m<b, w.sub.2 (m)=0 mod 2}

where w.sub.p (m) is the exponent of the highest power of p dividing m (e.g.,
w.sub.2 (40)=3). Then, for 1.ltoreq.d.sub.i <b, it can be seen that d.sub.i =m
or d.sub.i =2m for some m.epsilon.M.sub.2 (i.e., h=2). In this manner, with the
storage for .vertline.M.sub.2 .vertline..left brkt-top.log.sub.b (2.sup.n
-1).right brkt-top. values, g.sup.R can be computed in t(b-1)/b multiplications
on average and t multiplications in the worst case. For example, for b=256
(t=64 and .vertline.M.sub.2 .vertline.=170), an average of 63.75
multiplications with 10,880 pre-computed values can be achieved. Two tables
presented by Brickell et al. are shown in FIGS. 6A and 6B.
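
The numbers in the b=256 example can be reproduced with a few lines of Python.
The helper names two_adic_valuation and multiplier_set_m2 are assumptions made
for this sketch; the set itself follows the definition of M.sub.2 given above.

```python
def two_adic_valuation(m):
    """Exponent of the highest power of 2 dividing m (m >= 1)."""
    w = 0
    while m % 2 == 0:
        m //= 2
        w += 1
    return w


def multiplier_set_m2(b):
    """All m with 1 <= m < b whose 2-adic valuation is even."""
    return [m for m in range(1, b) if two_adic_valuation(m) % 2 == 0]


if __name__ == "__main__":
    b, t = 256, 64
    M2 = multiplier_set_m2(b)
    print(len(M2), len(M2) * t, t * (b - 1) / b)
    # Every nonzero digit is m or 2m for some m in M2, so h = 2 suffices.
    members = set(M2)
    print(all(d in members or (d % 2 == 0 and d // 2 in members)
              for d in range(1, b)))
```

The set has 170 elements, so 170 x 64 = 10,880 values are stored, and the
average of t(b-1)/b evaluates to 63.75.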

Subsequently, the exponentiation method for computing g.sup.R according to the
present invention will be described.

The method according to the present invention is a generalization of the simple
observation that if an n-bit exponent R is divided into two equal blocks (i.e.,
R=2.sup.k R.sub.1 +R.sub.0, .vertline.R.sub.1 .vertline.=.vertline.R.sub.0
.vertline.=k=n/2, assuming n is even for simplicity) and g.sub.1 =
g.sup.2.spsp.k is pre-computed and stored, then the number of multiplications
required to compute g.sup.R can be reduced by almost one half by computing
g.sup.R as g.sup.R =g.sub.1.sup.R.sbsp.1 g.sup.R.sbsp.0. This computation can
be done by applying the binary exponentiation method to both k-bit exponents
simultaneously, in 2(k-1) multiplications in the worst case and in 7/4(k-1)
multiplications on average. By comparison, the number of multiplications
required to compute g.sup.R is 2(n-1) in the worst case and 3/2(n-1) on average
when it is simply computed without precomputation of g.sub.1 =g.sup.2.spsp.k.
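
The two-block observation can be illustrated with the following Python sketch.
It is a hedged example rather than the patent's own procedure: the function
name two_block_pow is hypothetical, and the sketch assumes that the product
g g.sub.1 is stored alongside g.sub.1 so that each bit position needs at most
one multiplication after the squaring.

```python
def two_block_pow(g, R, n, N):
    """Compute g**R mod N as g1**R1 * g**R0 with both halves scanned together."""
    assert n % 2 == 0 and 0 <= R < (1 << n)
    k = n // 2
    g1 = pow(g, 1 << k, N)                     # pre-computed once: g**(2**k)
    # table[(bit of R1) * 2 + (bit of R0)]; the last entry is an assumption
    table = [1, g % N, g1, g * g1 % N]
    R1, R0 = R >> k, R & ((1 << k) - 1)
    acc = 1
    for pos in range(k - 1, -1, -1):
        acc = acc * acc % N                    # one squaring per bit position
        idx = (((R1 >> pos) & 1) << 1) | ((R0 >> pos) & 1)
        if idx:                                # skipped for a 00 bit pair
            acc = acc * table[idx] % N
    return acc


if __name__ == "__main__":
    N = (1 << 127) - 1                         # illustrative modulus only
    R = int("9" * 150)                         # fits in 512 bits
    print(two_block_pow(3, R, 512, N) == pow(3, R, N))
```

A pair of bits is nonzero with probability 3/4 for a random exponent, which is
consistent with the average of about 7/4(k-1) multiplications stated above.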

The method according to the present invention is now described in more detail.

First, let R be an arbitrary n-bit exponent for which g.sup.R is to be
computed. The objective of the invention is to reduce the on-line computational
load to compute g.sup.R by utilizing a plurality of values precomputed from the
fixed base g and stored in a plurality of memory locations in a computing
device. Thus the method for computing g.sup.R according to the present
invention mainly comprises two steps. In the first step, a plurality of values
depending only upon the fixed base g are precomputed and stored in a memory
device. This step is done only once for a given base g and the bit-length n of
an exponent, which are unique to a specific cryptographic system. In the second
step, for a given n-bit exponent R in the cryptographic system, the value for
g.sup.R is computed on-line using the plurality of values prestored in the
first step.

The mathematical background of the present invention is first explained to show
how to obtain a plurality of prestored values to speed up the evaluation of
g.sup.R. First, suppose that the exponent R is divided into hv blocks of fixed
size and then arranged into an h.times.v two-dimensional array as shown in FIG.
1, where the integer variables h and v determine the storage requirement and
performance of the present invention. That is, R is divided into h blocks
R.sub.i (0.ltoreq.i.ltoreq.h-1) of size a=.left brkt-top.n/h.right brkt-top.
bits, and then each R.sub.i is subdivided into v smaller blocks R.sub.i,j
(0.ltoreq.j.ltoreq.v-1) of size b=.left brkt-top.a/v.right brkt-top. bits. Then
the divisions of R into large blocks R.sub.i can be expressed by ##EQU8## and
the further subdivisions of each value of R.sub.i into smaller blocks R.sub.i,j
can be expressed by ##EQU9##

In the expressions (4) and (4'), the value for R.sub.i corresponds to the i-th
row and the value for R.sub.i,j corresponds to the (i,j)-th element of the
h.times.v matrix in FIG. 1.
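The partitioning of R into the h.times.v array can be sketched as below. The helper names are illustrative; the block sizes a and b follow the ceilings defined above, and the sketch assumes R is held as a non-negative Python integer of at most n bits.

```python
def partition(R, n, h, v):
    """Split an n-bit exponent R into an h x v array of blocks R_{i,j}.

    Returns (a, b, blocks) where blocks[i][j] = R_{i,j}.
    """
    a = -(-n // h)                      # a = ceil(n / h), bits per row R_i
    b = -(-a // v)                      # b = ceil(a / v), bits per block R_{i,j}
    blocks = []
    for i in range(h):
        Ri = (R >> (i * a)) & ((1 << a) - 1)
        # masking R_i to a bits makes the last block of each row shorter when v*b > a
        blocks.append([(Ri >> (j * b)) & ((1 << b) - 1) for j in range(v)])
    return a, b, blocks

def reassemble(blocks, a, b):
    """Inverse of partition: R = sum over i, j of R_{i,j} * 2^(i*a + j*b)."""
    return sum(Rij << (i * a + j * b)
               for i, row in enumerate(blocks)
               for j, Rij in enumerate(row))

R = 0b01100010101001111001000101011011
a, b, blocks = partition(R, 32, 4, 3)
print(a, b, blocks[0])                  # 8 3 [3, 3, 1]
```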

Let g.sub.0 =g and define g.sub.i as g.sub.i =g.sub.i-1.sup.2.spsp.a =
g.sup.2.spsp.ia for 0<i<h.

Then, using equations (4) and (4'), g.sup.R can be expressed by ##EQU10##

Let R.sub.i =(e.sub.i,a-1 . . . e.sub.i,1 e.sub.i,0).sub.2 be the binary
representation of R.sub.i (0.ltoreq.i<h). Then R.sub.i,j (0.ltoreq.j<v) is
represented in binary form as follows. ##EQU11##

Therefore, the expression (5) can be rewritten as follows: ##EQU12##

This expression (6) is the basis for the construction of a pre-computation
table in the present invention.

Now it is explained how to compute a plurality of values to be prestored for
fast evaluation of g.sup.R using the equation (6). The idea is to pre-compute
and store all possible combinations of the product of the g.sub.i for
0.ltoreq.i<h and their 2.sup.jb -th powers for 0.ltoreq.j<v, so that the
pre-computation table can be used to evaluate any value of g.sup.R for a given
number R of size n. Let G[j][f] be the value stored in the (j,f)-th memory
location of a two-dimensional array named G in a computing device. The values
for G[0][f] for 1.ltoreq.f<2.sup.h are computed as

G[0][f]=g.sub.h-1.sup.e.sbsp.h-1 g.sub.h-2.sup.e.sbsp.h-2 . . .
g.sub.1.sup.e.sbsp.1 g.sub.0.sup.e.sbsp.0 (6.5),

where f represents the decimal value for the binary number e.sub.h-1 . . .
e.sub.1 e.sub.0 obtained by concatenating the exponent bits. The equation (6.5)
illustrates an expression for all of the precomputed values stored in column 0
in memory as illustrated by the rightmost column G[0] in FIG. 2. Next, the
2.sup.jb -th powers of G[0][f] for 0<j<v, which are the values stored in column
j in memory in FIG. 2, are obtained recursively from Equation 6.5 as ##EQU13##

Once the values computed according to the equations (6.5) and (7) are stored in
memory locations G[j][f] for 0.ltoreq.j<v and 1.ltoreq.f<2.sup.h, the equation
(6) can be evaluated as follows: ##EQU14##

Here, I.sub.j,k =(e.sub.h-1,bj+k . . . e.sub.1,bj+k e.sub.0,bj+k).sub.2, which
corresponds to the value for the k-th bit column in the j-th block column in
FIG. 1 and G[j][I.sub.j,k ] represents the prestored value located at column j,
row I.sub.j,k in FIG. 2.

Now, it is straightforward to compute g.sup.R using equation (8) by the
extended square-and-multiply method as illustrated in the following algorithm
executable by a computer. ##EQU15##
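Since the algorithm itself appears only as a figure, the following Python sketch shows one way the pre-computation of G[j][f] and the extended square-and-multiply evaluation of equation (8) could be realised. It is an illustration under stated assumptions, not a transcription of algorithm (9): the function names and the modulus argument p are introduced here, and G[j][0] is taken to be 1.

```python
def precompute(g, p, n, h, v):
    """Build the table G[j][f] for an h x v configuration (done once per base g)."""
    a = -(-n // h)                               # ceil(n / h)
    b = -(-a // v)                               # ceil(a / v)
    gi = [pow(g, 1 << (i * a), p) for i in range(h)]    # g_i = g^(2^(i*a))
    col0 = [1] * (1 << h)                        # G[0][0] = 1
    for f in range(1, 1 << h):
        low = f & -f                             # lowest set bit of f
        col0[f] = (col0[f ^ low] * gi[low.bit_length() - 1]) % p
    G = [col0]
    for j in range(1, v):                        # G[j][f] = G[j-1][f]^(2^b)
        G.append([pow(x, 1 << b, p) for x in G[-1]])
    return a, b, G

def fixed_base_exp(R, p, h, v, table):
    """Evaluate g**R mod p from the pre-computed table."""
    a, b, G = table
    Z = 1
    for k in range(b - 1, -1, -1):
        Z = (Z * Z) % p                          # squaring step
        for j in range(v - 1, -1, -1):
            pos = b * j + k
            if pos >= a:                         # last block column may be short
                continue
            I = 0                                # I_{j,k}: bit 'pos' of every row R_i
            for i in range(h - 1, -1, -1):
                I = (I << 1) | ((R >> (i * a + pos)) & 1)
            if I:
                Z = (Z * G[j][I]) % p
    return Z
```

A caller would run `precompute` once at system initialization and then call `fixed_base_exp` for each new exponent R.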

Next, the number of multiplications required to compute g.sup.R by the
algorithm (9) is counted. It should be noted that the (v-1)-th blocks in FIG. 1
may not be full of b bits. In fact, they are (bv-a) bits short of b, i.e., of
(a-b(v-1))-bit size. Thus, the number of terms to be multiplied together in the
inner for-loop in the algorithm (9) is v for the first bv-a iterations and v+1
for the remaining (b-bv)+a iterations. Therefore, the total number of
multiplications required is at most v(bv-a)+(v+1).times.(b-bv+a)-2=a+b-2 in the
worst case. Since it can be assumed that the probability of I.sub.j,k being zero
is 1/2.sup.h on average and there are a occurrences of I.sub.j,k in the
algorithm, the expected number of multiplications is given by a(2.sup.h
-1)/2.sup.h +(b-2). Of course, this performance is achieved with storage for
v(2.sup.h -1) pre-computed values.
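The three formulas of this paragraph can be evaluated with a small helper. The sketch below is illustrative (the function name is hypothetical) and uses exact fractions to avoid rounding; the 5.times.5 configuration for a 512-bit exponent is used as a check.

```python
from fractions import Fraction

def hv_cost(n, h, v):
    """Worst-case multiplications, average multiplications and storage for h x v."""
    a = -(-n // h)                                   # ceil(n / h)
    b = -(-a // v)                                   # ceil(a / v)
    worst = a + b - 2
    average = Fraction(a * (2**h - 1), 2**h) + (b - 2)
    storage = v * (2**h - 1)                         # number of pre-computed values
    return worst, average, storage

worst, average, storage = hv_cost(512, 5, 5)
print(worst, float(average), storage)                # 122 118.78125 155
```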

The aforementioned exponentiation method according to the present invention
will be described using an example for more detailed explanation. In this
example, suppose that for a given base element g unique to a cryptographic
system, it is required to evaluate g.sup.R with random 32-bit number R in the
cryptographic system. Also, suppose that, as an example, a 4.times.3
configuration is selected for building a pre-computation table (i.e., n=32, h=4
and v=3). Then, the values for a and b are calculated by

a=.left brkt-top.(n/h).right brkt-top.=.left brkt-top.(32/4).right brkt-top.=8

b=.left brkt-top.(a/v).right brkt-top.=.left brkt-top.(8/3).right brkt-top.=3

To build a pre-computation table, let g.sub.0 =g and define g.sub.i as g.sub.i
=g.sub.i-1.sup.2.spsp.8 =g.sup.2.spsp.8i for i=1, 2 and 3 as explained before.
That is, g.sub.0 =g, g.sub.1 =g.sup.256, g.sub.2 =g.sup.256.spsp.2 and g.sub.3
=g.sup.256.spsp.3. At system initialization stage, once and for all, the
following values are pre-computed and stored in memory locations G[j][f] within
a computing device for 1.ltoreq.f.ltoreq.15 and 0.ltoreq.j<3: ##EQU16## where
the index f is equal to the decimal value of (e.sub.3 e.sub.2 e.sub.1
e.sub.0).sub.2.

For example, the values for G[0][f] for 1.ltoreq.f.ltoreq.15 are computed as
follows (note that G[0][0] is always equal to 1):

                  TABLE 1
    ______________________________________
     ##STR1##
    ______________________________________


Also, each value for G[j][f] for j=1, 2 is computed by raising each value for
G[0][f] to the 2.sup.3j -th power.

Now, suppose that the cryptographic system needs to evaluate g.sup.R for a 32-bit
number R given by

R=01100010101001111001000101011011.sub.(2).

This exponent R is first partitioned into a 4.times.3 matrix form as shown in
FIG. 3. That is, ##EQU17##

Then, using the equation (8), g.sup.R can be computed by ##EQU18## For each j
and k, the value for I.sub.j,k is obtained by taking the k-th bit column in the
j-th column of sub-blocks as illustrated in FIG. 3. That is,

I.sub.0,0 =e.sub.3,0 e.sub.2,0 e.sub.1,0 e.sub.0,0 =0111.sub.(2) =7

I.sub.0,1 =e.sub.3,1 e.sub.2,1 e.sub.1,1 e.sub.0,1 =1101.sub.(2) =13

I.sub.0,2 =e.sub.3,2 e.sub.2,2 e.sub.1,2 e.sub.0,2 =0100.sub.(2) =4

I.sub.1,0 =e.sub.3,3 e.sub.2,3 e.sub.1,3 e.sub.0,3 =0001.sub.(2) =1

I.sub.1,1 =e.sub.3,4 e.sub.2,4 e.sub.1,4 e.sub.0,4 =0011.sub.(2) =3

I.sub.1,2 =e.sub.3,5 e.sub.2,5 e.sub.1,5 e.sub.0,5 =1100.sub.(2) =12

I.sub.2,0 =e.sub.3,6 e.sub.2,6 e.sub.1,6 e.sub.0,6 =1001.sub.(2) =9

I.sub.2,1 =e.sub.3,7 e.sub.2,7 e.sub.1,7 e.sub.0,7 =0110.sub.(2) =6

For notational convenience, let ##EQU19## Here each value of Z.sub.k for k=
0,1,2 is computed as follows: ##EQU20##

Therefore, the value for g.sup.R can be obtained from the values Z.sub.0,
Z.sub.1 and Z.sub.2 as follows: ##EQU21##
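The index values I.sub.j,k listed in this example can be reproduced mechanically from R. The short Python sketch below does so for the 4.times.3 configuration; the helper name is illustrative.

```python
R = 0b01100010101001111001000101011011
n, h, v = 32, 4, 3
a = -(-n // h)                      # 8
b = -(-a // v)                      # 3

def column_index(R, j, k):
    """I_{j,k} = (e_{h-1,bj+k} ... e_{1,bj+k} e_{0,bj+k}) read as a binary number."""
    pos = b * j + k
    return sum(((R >> (i * a + pos)) & 1) << i for i in range(h))

# Only bit positions below a exist, so the last block column has two entries.
indices = {(j, k): column_index(R, j, k)
           for j in range(v) for k in range(b) if b * j + k < a}
for (j, k), value in sorted(indices.items()):
    print(f"I_{j},{k} = {value}")
```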

As described above, the exponentiation method using the pre-computed values
stored in a plurality of memory locations has been explained in detail by way
of a small-size example. Of course, the bit size of the exponent R may be much
larger in practical applications.

It has been assumed that the exponent R is partitioned into hv blocks of almost
equal size and that these hv blocks are arranged in an h.times.v rectangular
shape. In most cases, such partitions and arrangements yield better performance
than others for a given storage capacity, but sometimes this may not be the
case. For example, consider two configurations shown in FIG. 5, where a 512-bit
exponent is partitioned and arranged in two different ways.

The first 5.times.5 configuration corresponds to the case having been analyzed
above and results in the performance of 118.78 multiplications on average (122
in the worst case) with storage for 155 values. On the other hand, with the
second 5.times.1.vertline.6.times.2 configuration, the exponentiation can be
done in 117.13 multiplications on average (119 in the worst case). Therefore,
it can be seen that the second configuration is the better choice.

For the configuration of type h.sub.1 .times.v.sub.1 .vertline.h.sub.2
.times.v.sub.2, a general formula for the worst-case and average-case
performance can be easily derived. Let b.sub.1 and b.sub.2 be the size of
partitioned blocks in h.sub.1 .times.v.sub.1 and h.sub.2 .times.v.sub.2,
respectively. For better performance, b.sub.2 must be greater than or equal to
b.sub.1 and can be obtained in the same way as b in the h.times.v
configuration. Thus, b.sub.1 and b.sub.2 can be obtained as follows:

b.sub.2 =.left brkt-top.n/(h.sub.1 v.sub.1 +h.sub.2 v.sub.2).right brkt-top.

b.sub.1 =.left brkt-top.(n-b.sub.2 h.sub.2 v.sub.2)/h.sub.1 v.sub.1 .right
brkt-top.

Now, the worst-case number of multiplications required for this configuration
can be directly obtained from the formula for the h.times.v configuration by
replacing a with b.sub.1 v.sub.1 +b.sub.2 v.sub.2 and b with b.sub.2,
respectively. This results in b.sub.1 v.sub.1 +b.sub.2 (v.sub.2 +1)-2
multiplications in the worst case. Similarly, the average number of
multiplications can be obtained as follows: ##EQU22##

This performance can be achieved with the storage for v.sub.1 (2.sup.h.sbsp.1
-1)+v.sub.2 (2.sup.h.sbsp.2 -1) pre-computed values. It can be easily seen that
no configurations other than the two types h.times.v and h.sub.1 .times.v.sub.1
.vertline.h.sub.2 .times.v.sub.2 (where h.sub.2 =h.sub.1 +1) yield better
performance for a given storage capacity.
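The block sizes, worst-case count and storage for a mixed configuration follow directly from the expressions above. The sketch below evaluates them for the 5.times.1.vertline.6.times.2 example; the function name is hypothetical, and the average-case formula is deliberately left out because it is given only in a figure.

```python
def mixed_cost(n, h1, v1, h2, v2):
    """Block sizes, worst-case multiplications and storage for h1 x v1 | h2 x v2."""
    b2 = -(-n // (h1 * v1 + h2 * v2))                 # ceil(n / (h1 v1 + h2 v2))
    b1 = -(-(n - b2 * h2 * v2) // (h1 * v1))          # ceil((n - b2 h2 v2) / (h1 v1))
    worst = b1 * v1 + b2 * (v2 + 1) - 2
    storage = v1 * (2**h1 - 1) + v2 * (2**h2 - 1)     # number of pre-computed values
    return b1, b2, worst, storage

print(mixed_cost(512, 5, 1, 6, 2))                    # (28, 31, 119, 157)
```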

FIGS. 7A and 7B illustrate the performance of the present invention and show a
configuration of an exponent R and the storage capacity for pre-computed values
and the number of multiplications for the worst case and the average case.
Here, FIG. 7A is a table for a 160-bit exponent, and FIG. 7B is a table for a
512-bit exponent.

In order to compare the performance of the BGMW method and the present
invention, with respect to the respective 160-bit and 512-bit exponents, the
number of multiplications and storage requirements by the BGMW method are also
shown in FIGS. 6A and 6B. FIG. 6A is a table for a 160-bit exponent, and FIG.
6B is a table for a 512-bit exponent.

The invented method for exponentiation can also be used to evaluate
multiple-term exponentiation, which is required in some cryptographic
applications such as probabilistic batch verification of digital signatures
(see "Can DSA be improved? Complexity trade-offs with the digital signature
standard," in Advances in Cryptology-Eurocrypt'94, by D. Naccache, D. M'Raihi,
S. Vaudenay and D. Raphaeli, pp.77-85). For example, suppose that it is
required to compute ##EQU23## for given values of x.sub.i and K.sub.i, where
the bit-length of each K.sub.i is assumed to be t and each x.sub.i varies in
each computation (i.e., not fixed, contrary to g in the previous embodiment).

If each exponential term is computed using the binary method, this computation
requires 1.5 m(t-1) multiplications on average and 2 m(t-1) in the worst case.
However, the computation time can be substantially reduced by applying the
invented method. The process to evaluate z is the same as before, except that
each of g.sub.i in the previous embodiment is replaced by each of x.sub.i.
Thus, in this application, it suffices to arrange the m exponentials into
h.times.v configuration for appropriate values of h and v, on-line precompute
products of all possible combinations among h values of x.sub.i in each column
of h blocks and then apply the extended binary exponentiation method described
before (see algorithm (9)). Then it can be easily seen that the total number of
multiplications required is (v+1)t+(2.sup.h -h-1)v-2 in the worst case and vt
(2.sup.h -1)/2.sup.h +t+(2.sup.h -h-1)v-2 on average. Note here that (2.sup.h
-h-1)v accounts for the number of multiplications required for on-line
precomputation.

For more explicit explanation, let us take an example with m=12, t=160 and
4.times.3 configuration. Then the above method requires 671 multiplications in
the worst case and 641 on average. This can be compared to the case of
evaluating each exponentiation using the binary method requiring 3816
multiplications in the worst case and 2862 on average. Thus, in this example the
invented method can achieve more than a four-fold improvement over the binary
method for the average case.
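The counts in this example can be reproduced from the formulas given above. The following Python sketch is illustrative only; the function names are hypothetical.

```python
from fractions import Fraction

def multi_exp_cost(t, h, v):
    """(worst, average) multiplications for m = h*v terms with t-bit exponents."""
    online = (2**h - h - 1) * v                       # on-line precomputation
    worst = (v + 1) * t + online - 2
    average = Fraction(v * t * (2**h - 1), 2**h) + t + online - 2
    return worst, average

def binary_cost(m, t):
    """(worst, average) when each of the m terms uses the binary method."""
    return 2 * m * (t - 1), 1.5 * m * (t - 1)

worst, average = multi_exp_cost(160, 4, 3)            # m = 12 terms
print(worst, float(average), binary_cost(12, 160))    # 671 641.0 (3816, 2862.0)
```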

Another advantage of the present invention is its efficiency in computing
exponentiation of the form g.sup.R y.sup.E, where y is not fixed and the size
of E is assumed to be much less than the size of R, which is the case for the
verification of Schnorr's identification and signature scheme (see "Efficient
signature generation by smart cards," in J. Cryptology, by C. P. Schnorr, 4(3),
pp 161-174, 1991) or its variants, e.g., Brickell-McCurley's scheme (see "An
Interactive Identification Scheme Based On Discrete Logarithms and Factoring",
by E. F. Brickell and K. S. McCurley, J. Cryptology 5(1), pp 29-39, 1992) and
Okamoto's scheme (see "Provably Secure and Practical Identification Schemes and
Corresponding Signature Schemes", by T. Okamoto, Proc. Crypto'92, 1992).

In all these schemes, along with a few modular multiplications, the prover (or
the signer) needs to compute g.sup.R for a random R, which can be efficiently
performed by the method described above. On the other hand, to validate the
prover's identity or to verify the signature, the verifier needs to perform the
computation of the form g.sup.R y.sup.E where y corresponds to the public key
of the prover (or the signer) and thus varies in each run of the protocol. The
size of E typically lies between 20 and 40 bits in identification schemes. The
performance of the proposed method for computing g.sup.R y.sup.E will be
investigated below.

Let t be the size of E. It is clear that if t.ltoreq.b, then g.sup.R y.sup.E
can be computed in a+b+(t-2) multiplications in the worst case and a(2.sup.h
-1)/2.sup.h +b+0.5 t-2 on average. In the case of t>b, the computation can
either proceed as above or be done after partitioning E into smaller blocks.
The first option yields the performance of a+2 t-2 multiplications in the
worst case and a(2.sup.h -1)/2.sup.h +1.5 t-2 multiplications on average.
However, if t is much larger than b, the performance can be further improved by
dividing E into smaller blocks.

Thus, for more general formula, suppose that E is partitioned into u blocks of
almost equal size. (It is considered that the whole configuration for computing
g.sup.R y.sup.E is u.times.1.vertline.h.times.v.)

Let c be the bit-length of the partitioned blocks (i.e., c=.left brkt-top.t/
u.right brkt-top.). Then, y.sup.2.spsp.kc for k=1, 2, . . . , u-1, and each
product of their possible combinations has to be first computed, which all
together requires c(u-1)+2.sup.u -u-1 multiplications. For the range of t of
interest, e.g., up to t=80, u will be at most three.

Now, if c.ltoreq.b, then at most c additional multiplications are sufficient in
the worst case and c(2.sup.u -1)/2.sup.u are sufficient on average. Therefore,
the total number of multiplications required in this case is a+b+uc+(2.sup.u
-u-3) in the worst case and a(2.sup.h -1)/2.sup.h +b+c(u2.sup.u -1)/2.sup.u
+2.sup.u -u-3 on average.

Similarly, for the case of c>b, it can be easily shown that the number of
multiplications is a+(u+1)c+2.sup.u -u-3 in the worst case and a(2.sup.h -1)/
2.sup.h +c((u+1)2.sup.u -1)/2.sup.u +2.sup.u -u-3 on average.
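The two cases c.ltoreq.b and c>b can be collected into one helper, as in the following sketch. The function names are hypothetical; with u=1 the expressions reduce to the unpartitioned formulas given earlier for t.ltoreq.b and t>b. The sample calls assume a 160-bit exponent R and the 4.times.2 configuration.

```python
from fractions import Fraction

def _sizes(n, h, v, t, u):
    a = -(-n // h)                       # ceil(n / h)
    b = -(-a // v)                       # ceil(a / v)
    c = -(-t // u)                       # ceil(t / u), block size of E
    return a, b, c

def gRyE_worst(n, h, v, t, u):
    """Worst-case multiplications for g^R y^E with E split into u blocks."""
    a, b, c = _sizes(n, h, v, t, u)
    online = 2**u - u - 3
    if c <= b:
        return a + b + u * c + online
    return a + (u + 1) * c + online

def gRyE_average(n, h, v, t, u):
    """Average multiplications for g^R y^E with E split into u blocks."""
    a, b, c = _sizes(n, h, v, t, u)
    base = Fraction(a * (2**h - 1), 2**h)
    online = 2**u - u - 3
    if c <= b:
        return base + b + Fraction(c * (u * 2**u - 1), 2**u) + online
    return base + Fraction(c * ((u + 1) * 2**u - 1), 2**u) + online

print(float(gRyE_average(160, 4, 2, 30, 1)))   # 80.5
print(float(gRyE_average(160, 4, 2, 80, 3)))   # 144.125
```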

With the invented exponentiation method, the Schnorr-like identification and/or
signature schemes can be made more practical for smart card implementations.
For example, with 160-bit exponents and t=30, the verification condition can be
checked in 80.5 multiplications on average, if 1,920 bytes of storage are
available (for a 4.times.2 configuration). Similarly, a signature with t=80 can
be verified in 144.13 multiplications on average using the same storage
capacity. This is a considerable improvement with only a very small storage
capacity, compared with the binary method requiring 246.5 multiplications for t
=30 and 259.0 multiplications for t=80 on average.

Moreover, identification or signature verifications are usually performed in
much more powerful terminals capable of being equipped with a large amount of
memory. In such an environment, the 8.times.2 configuration, for example, may
be adopted. Then identity verifications can be done in 60.2 multiplications on
average for t=30 and signature verifications can be done in 126.6
multiplications on average for t=80, using about 32 Kbytes of storage.

Next, it will be explained that a small amount of additional communication can
further reduce the number of multiplications for computing g.sup.R y.sup.E. That
is, the verifier can save the on-line computational load for preparing y.sub.k
=y.sup.2.spsp.kc for k=1, 2, . . . , u-1, if they are pre-computed and stored
by the signer (or the prover) and then transmitted to the verifier together
with other data.

For example, for the signature scheme with t=80, if the signer sends two
additional values y.sub.1 and y.sub.2, where y.sub.1 =y.sup.2.spsp.27 and
y.sub.2 =y.sub.1.sup.2.spsp.27, together with a signature for message, then the
signature verification can be done in 90.13 multiplications on average with the
4.times.2 configuration. Therefore, 54 multiplications can be saved with a
small increase in communication. This corresponds to about a
3-fold improvement on average over the binary method which requires 259
multiplications on average.

It can be seen that the BGMW method is less efficient for the computation of
the form g.sup.R y.sup.E in either case considered above. That is, in case of
no additional communication, if the exponents are represented in non-binary
power base, more computations are needed in performing the on-line
pre-computation required for y.sup.E. When additional communication is allowed,
more pre-computed values must be transmitted due to the use of a small base.

The above method of combining pre-computation and additional communication can
be used to speed up the verification of the digital signature standard (DSS) as
well. In DSS, the computation of the type g.sup.R y.sup.E should be performed
with .vertline.R.vertline.=.vertline.E.vertline.=160, and therefore, without
additional communication, no advantage can be gained with pre-computation.

However, if the signer sends three additional blocks y.sub.1, y.sub.2 and
y.sub.3, where y.sub.i =y.sub.i-1.sup.2.spsp.40 for i=1, 2, 3 (with y.sub.0 =y),
and if the verifier adopts the 4.times.2 configuration, then the signature can
be verified
in 124 multiplications on average. This is more than a 2-fold improvement over
the binary method which requires 279 multiplications on average, with only
1,920 bytes of storage and 192 bytes of additional communication (for a 512-bit
modulus).
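How a verifier might use the transmitted powers y.sub.k can be sketched as follows. This Python fragment covers only the y.sup.E factor and computes it separately rather than interleaved with g.sup.R; the function name and the modulus in the usage lines are assumptions made for illustration. With the powers supplied by the signer, the c(u-1) squarings needed to prepare them (54 for c=27, u=3) are no longer performed by the verifier.

```python
def exp_with_sent_powers(y_powers, E, c, p):
    """Compute y**E mod p given y_powers[k] = y^(2^(k*c)) for k = 0..u-1.

    E is assumed to have at most u*c bits.
    """
    u = len(y_powers)
    table = [1] * (1 << u)                   # products of all subsets of the y_k
    for f in range(1, 1 << u):
        low = f & -f                         # lowest set bit of f
        table[f] = (table[f ^ low] * y_powers[low.bit_length() - 1]) % p
    Z = 1
    for k in range(c - 1, -1, -1):
        Z = (Z * Z) % p
        I = 0                                # bit k of every c-bit block of E
        for i in range(u - 1, -1, -1):
            I = (I << 1) | ((E >> (i * c + k)) & 1)
        if I:
            Z = (Z * table[I]) % p
    return Z

# Usage with t = 80, u = 3, c = 27 (toy modulus, chosen only for the demonstration).
p, y, c = 2**61 - 1, 7, 27
sent = [pow(y, 1 << (k * c), p) for k in range(3)]   # y, y_1, y_2 as the signer would send
E = (1 << 79) | 0x123456789ABCDEF
assert exp_with_sent_powers(sent, E, c, p) == pow(y, E, p)
```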

FIG. 8 shows the number of multiplications required for signature generation
and verification in three signature schemes (Schnorr, DSS and
Brickell-McCurley), under the assumption that the signer sends three additional
pre-computed values for the public key together with a signature, as mentioned
above. Here, only the number of multiplications for exponentiation operations
is taken into account, which disregards some necessary operations such as
reduction mod q and multiplicative inverse mod q where q is a prime of about
160 bits.

Two configurations of 4.times.2 and 8.times.2 are taken as examples, since the
former is suitable for smart card applications and the latter for more general
applications with a relatively large storage capacity available. For
comparison, the performance of the binary method is also presented.

Another embodiment of the present invention is to do the exponentiation g.sup.R
in parallel with multiple processors. For example, suppose that v processors
are available. Then, one can assign the j-th processor to compute the j-th
column of the h.times.v configuration (see FIG. 1). That is, the j-th processor
can be assigned to compute ##EQU24## in the expression of ##EQU25## where the
pre-computed values are the same as before but each processor only stores in
its local memory 2.sup.h -1 pre-computed values needed for its computation.
Then the computation of each processor can be completed in at most 2(b-1)
multiplications. Thereafter, .left brkt-top.log.sub.2 v.right brkt-top.
multiplications are needed in addition to produce the final result. Therefore,
the total number of multiplications required is at most 2(b-1)+.left
brkt-top.log.sub.2 v.right brkt-top..

FIG. 9 shows the number of multiplications required for parallel processing in
the case of 160/512 bit exponents, according to the number of processors
(denoted by np) and the storage needed per processor (denoted by sp). As shown
in FIG. 9, with only a small number of processors, the performance can be
greatly improved. For example, for a 512-bit exponent, the computation of
g.sup.R can be done in 32 multiplications with four processors if each
processor stores 255 pre-computed values in its local memory (about 16 Kbytes).
With more processors, say sixteen, the exponentiation can be done in ten
multiplications with the same storage capacity.

Meanwhile, in the above, only the case of v processors being available for an
h.times.v configuration was considered for the sake of convenient explanation.
However, an h.times.v configuration can also be used with fewer processors.
For example, if p processors are available for h.times.v configuration, each
processor can be assigned to compute w=.left brkt-top.v/p.right brkt-top.
columns in the h.times.v configuration. In this case, each processor should
store w(2.sup.h -1) pre-computed values in its local memory. Then, it is easy
to see that the number of required multiplications is given by (w+1)b+.left
brkt-top.log.sub.2 p.right brkt-top.-2. Practically, in most cases, it is more
advantageous to assign many columns to each processor in a time-storage
tradeoff than one column assignment considered in FIG. 9.
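The multiplication count for p processors can be evaluated as in the sketch below, which also reproduces the two 512-bit figures quoted for four and sixteen processors. The function name and the parameter name `procs` are illustrative.

```python
def parallel_cost(n, h, v, procs):
    """(multiplications, stored values per processor) for h x v on procs processors."""
    a = -(-n // h)                           # ceil(n / h)
    b = -(-a // v)                           # ceil(a / v)
    w = -(-v // procs)                       # columns per processor, ceil(v / procs)
    combine = (procs - 1).bit_length()       # ceil(log2(procs)) for procs >= 1
    return (w + 1) * b + combine - 2, w * (2**h - 1)

print(parallel_cost(512, 8, 4, 4))           # (32, 255)
print(parallel_cost(512, 8, 16, 16))         # (10, 255)
```

With one column per processor (w=1) the expression reduces to 2(b-1) plus the combining multiplications, in agreement with the earlier formula.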

As described above, the present invention provides a new method for fast
exponentiation with pre-computation. The invented method is very simple and
easy to implement systematically, but achieves better performance than the
BGMW method.

Also, the method according to the present invention is flexibly applicable to
various computing environments due to its wide range of time-storage tradeoffs.
In particular, the computation by smart cards can be substantially sped up
with only a very small storage capacity. The present invention can also speed
up the computation of the form g.sup.R y.sup.E with y variable. This can
greatly improve the practicality of the Schnorr-type identification and
signature scheme, since the verifier as well as the prover (signer) can gain
great computational advantage with a moderate storage capacity.

Finally, the parallel processing according to the present invention may be
useful in high performance servers with multiple processors.

The present invention provides a method for reducing the amount of computation
to evaluate g.sup.R for a fixed g and a random R by storing a plurality of
pre-computed values in memory within a computing device. The computation of
g.sup.R is an essential operation in all discrete logarithm-based cryptographic
systems, such as the Schnorr identification scheme, the Digital Signature
Standard (DSS) and Diffie-Hellman key agreement or exchange scheme.

For example, FIG. 10 is a block diagram of a cryptographic system constructed
in accordance with a preferred embodiment of the present invention. The
cryptographic system comprises computers 10 and 12, having network interfaces
or modems 14 and 18, respectively, for communicating with each other over a
communications network 16. The computer 10 includes a CPU 102, a memory 104,
and an input/output port 106 and the computer 12 includes a CPU 122, a memory
124 and an input/output port 126. CPUs 102 and 122 perform various mathematical
processes. Memories 104 and 124 store values utilized in the various
mathematical processes, in particular pre-computed values needed for
exponentiation in the present invention. Also, input/output ports 106 and 126
allow connection to the modems 14 and 18.

The present invention may be used with the system of FIG. 10 in a variety of
ways to transmit coded data between the computer 10 and the computer 12 through
the network 16.

In the Schnorr identification scheme, suppose that user A wants to prove its
identity to user B. As system parameters, two primes p and q, and base g are
made public such that q.vertline.p-1, g.noteq.1, g.sup.q =1 mod p. The prover A
possesses a secret key x .epsilon.[0,q) and publishes the corresponding public
key y=g.sup.x mod p, and wants to prove to user B that he knows the secret key
x. The procedure for identification is as follows:

    ______________________________________
    user A                            user B
    ______________________________________
    1)    randomly selects k .epsilon. [0, q)
          and computes r = g.sup.k mod p
                       ##STR2##       randomly selects e .epsilon. [0, 2.sup.t)
                                      for some integer t
    2)    computes s = k - xe mod q
                       ##STR3##
    3)
                       ##STR4##       accepts proof of identity if
                                      g.sup.s y.sup.e = r mod p
    ______________________________________


In the above example, the exponentiation method according to the present
invention can be used in calculating g.sup.k in 1) and g.sup.s y.sup.e in 3).
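One run of the identification procedure may be sketched in Python as follows. The parameters p=23, q=11, g=2 are toy values chosen here only so that q divides p-1 and g.sup.q =1 mod p; they offer no security, the function names are illustrative, and the built-in `pow` stands in for the fixed-base method.

```python
import secrets

P, Q, G = 23, 11, 2              # toy parameters (assumed): Q | P-1 and G^Q = 1 mod P

def keygen():
    x = secrets.randbelow(Q)                 # secret key x in [0, q)
    return x, pow(G, x, P)                   # public key y = g^x mod p

def commit():
    k = secrets.randbelow(Q)                 # step 1: random k in [0, q)
    return k, pow(G, k, P)                   # r = g^k mod p

def challenge(t):
    return secrets.randbelow(1 << t)         # e in [0, 2^t)

def respond(k, x, e):
    return (k - x * e) % Q                   # step 2: s = k - xe mod q

def verify(y, r, e, s):
    return (pow(G, s, P) * pow(y, e, P)) % P == r    # step 3: g^s y^e = r mod p

x, y = keygen()
k, r = commit()
e = challenge(3)
s = respond(k, x, e)
print(verify(y, r, e, s))                    # True
```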

In the Digital Signature Standard (DSS), there are three public system
parameters p, q and g: primes p, q such that q.vertline.p-1, and base g such
that g.noteq.1 and g.sup.q =1 mod p. Each signer A selects its secret key x
.epsilon.[0,q) and publishes the public key y=g.sup.x mod p. The signer A can
generate a signature for a message m using its secret key and sends the message
including the signature to the verifier B. Then the verifier B can check the
validity of the signature using A's public key. The signature generation and
verification procedures are described below, where H denotes the secure hash
algorithm (SHA).

Signer A generates signature (r,s) for message m and sends (r,s) to Verifier B:

1) randomly selects k .epsilon.[0,q) and computes r=(g.sup.k mod p) mod q

2) computes s=k.sup.-1 (H(m)+rx) mod q

Verifier B verifies the signature (r,s) by checking that
(g.sup.s.spsp.-1.sup.H(m) y.sup.s.spsp.-1.sup.r mod p) mod q=r.

In the above example, exponentiation methods according to embodiments of the
present invention can be used to calculate g.sup.k and g.sup.s.spsp.-1.sup.H(m)
y.sup.s.spsp.-1.sup.r.
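The signature generation and verification steps can be sketched in Python as below. The parameters are toy values assumed for illustration (p=23, q=11, g=2), SHA-1 from `hashlib` plays the role of H, and the sketch additionally rejects r=0 or s=0, which the description does not discuss. The modular inverse via `pow(k, -1, Q)` requires Python 3.8 or later.

```python
import hashlib

P, Q, G = 23, 11, 2              # toy parameters (assumed): Q | P-1 and G^Q = 1 mod P

def H(m: bytes) -> int:
    """Secure hash algorithm (SHA-1) interpreted as an integer."""
    return int.from_bytes(hashlib.sha1(m).digest(), "big")

def sign(m: bytes, x: int, k: int):
    """Return (r, s) for message m, secret key x and per-message value k in [1, q)."""
    r = pow(G, k, P) % Q                               # r = (g^k mod p) mod q
    s = (pow(k, -1, Q) * (H(m) + r * x)) % Q           # s = k^-1 (H(m) + r x) mod q
    return r, s

def verify(m: bytes, y: int, r: int, s: int) -> bool:
    if not (0 < r < Q and 0 < s < Q):                  # degenerate values rejected
        return False
    w = pow(s, -1, Q)                                  # s^-1 mod q
    u1, u2 = (w * H(m)) % Q, (w * r) % Q
    return (pow(G, u1, P) * pow(y, u2, P)) % P % Q == r

x = 7                                                  # secret key (example value)
y = pow(G, x, P)                                       # public key y = g^x mod p
r, s = sign(b"message", x, 3)
print(verify(b"message", y, r, s))
```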

In the Diffie-Hellman key exchange scheme, user A and user B agree upon a
common secret key K.sub.AB through communications over a public channel. Here,
p, q and g are public parameters defined as before.

1) user A randomly selects R.sub.A .epsilon.[0,q), computes and sends X.sub.A =
g.sup.R.sbsp.A mod p to user B;

2) user B randomly selects R.sub.B .epsilon.[0,q), computes and sends X.sub.B =
g.sup.R.sbsp.B mod p to user A;

3) user A computes K.sub.AB =X.sub.B.sup.R.sbsp.A mod p=
g.sup.R.sbsp.B.sup.R.sbsp.A mod p;

4) user B computes K.sub.AB =X.sub.A.sup.R.sbsp.B mod p=
g.sup.R.sbsp.A.sup.R.sbsp.B mod p.

In the above example, exponentiation methods according to embodiments of the
present invention can be used to calculate

g.sup.R.sbsp.A, g.sup.R.sbsp.B.
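The four steps of the key exchange may be sketched in Python as follows, again with toy parameters p=23, q=11, g=2 that are assumed here purely for illustration. The two fixed-base exponentiations of steps 1) and 2) are the ones to which the pre-computation applies; the built-in `pow` is used in their place.

```python
import secrets

P, Q, G = 23, 11, 2              # toy parameters (assumed): Q | P-1 and G^Q = 1 mod P

def dh_public(secret: int) -> int:
    """Fixed-base step: X = g^R mod p."""
    return pow(G, secret, P)

def dh_shared(peer_public: int, secret: int) -> int:
    """Variable-base step: K = X_peer^R mod p."""
    return pow(peer_public, secret, P)

R_A = secrets.randbelow(Q)       # 1) user A selects R_A and sends X_A
X_A = dh_public(R_A)
R_B = secrets.randbelow(Q)       # 2) user B selects R_B and sends X_B
X_B = dh_public(R_B)
K_A = dh_shared(X_B, R_A)        # 3) user A computes K_AB
K_B = dh_shared(X_A, R_B)        # 4) user B computes K_AB
print(K_A == K_B)                # True
```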

                                   * * * * *                                   