Showing 1 to 28 of 28 matching Articles
Results per page:
Export (CSV)
By
Aranha, Diego F.; FuentesCastañeda, Laura; Knapp, Edward; Menezes, Alfred; RodríguezHenríquez, Francisco
Show all (5)
Post to Citeulike
14 Citations
We implement asymmetric pairings derived from KachisaSchaeferScott (KSS), BarretoNaehrig (BN), and BarretoLynnScott (BLS) elliptic curves at the 192bit security level. Somewhat surprisingly, we find pairings derived from BLS curves with embedding degree 12 to be the fastest for our serial as well as our parallel implementations. Our serial implementations provide a factor3 speedup over the previous stateoftheart, demonstrating that pairing computation at the 192bit security level is not as expensive as previously thought. We also present a general framework for deriving a Weiltype pairing that is wellsuited for computing a single pairing on a multiprocessor machine.
more …
By
Aranha, Diego F.; Knapp, Edward; Menezes, Alfred; RodríguezHenríquez, Francisco
Show all (4)
Post to Citeulike
4 Citations
In the past year, the speed record for pairing implementations on desktopclass machines has been broken several times. The speed records for asymmetric pairings were set on a single processor. In this paper, we describe our parallel implementation of the optimal ate pairing over BarretoNaehrig (BN) curves that is about 1.23 times faster using two cores of an Intel Core i5 or Core i7 machine, and 1.45 times faster using 4 cores of the Core i7 than the stateoftheart implementation on a single core. We instantiate Hess’s general Weil pairing construction and introduce a new optimal Weil pairing tailored for parallel execution. Our experimental results suggest that the new Weil pairing is 1.25 times faster than the optimal ate pairing on 8core extensions of the aforementioned machines. Finally, we combine previous techniques for parallelizing the eta pairing on a supersingular elliptic curve with embedding degree 4, and achieve an estimated 1.24fold speedup on an 8core extension of an Intel Core i7 over the previous best technique.
more …
By
Adj, Gora; Menezes, Alfred; Oliveira, Thomaz; RodríguezHenríquez, Francisco
Show all (4)
Post to Citeulike
2 Citations
We show that a Magma implementation of Joux’s
$$L[1/4+o(1)]$$
algorithm can be used to compute discrete logarithms in the 1303bit finite field
$${\mathbb F}_{3^{6 \cdot 137}}$$
and the 1551bit finite field
$${\mathbb F}_{3^{6 \cdot 163}}$$
with very modest computational resources. Our
$${\mathbb F}_{3^{6 \cdot 137}}$$
implementation was the first to illustrate the effectiveness of Joux’s algorithm for computing discrete logarithms in smallcharacteristic finite fields that are not Kummer or twistedKummer extensions.
more …
By
Beuchat, JeanLuc; Brisebarre, Nicolas; Detrey, Jérémie; Okamoto, Eiji; RodríguezHenríquez, Francisco
Show all (5)
Post to Citeulike
8 Citations
In this article we propose a study of the modified Tate pairing in characteristics two and three. Starting from the η_{T} pairing introduced by Barreto et al. [1], we detail various algorithmic improvements in the case of characteristic two. As far as characteristic three is concerned, we refer to the survey by Beuchat et al. [5]. We then show how to get back to the modified Tate pairing at almost no extra cost. Finally, we explore the tradeoffs involved in the hardware implementation of this pairing for both characteristics two and three. From our experiments, characteristic three appears to have a slight advantage over characteristic two.
more …
By
FuentesCastañeda, Laura; Knapp, Edward; RodríguezHenríquez, Francisco
Post to Citeulike
7 Citations
An asymmetric pairing
$e\colon{\mathbb{G}}_2\times{\mathbb{G}}_1\to{\mathbb{G}}_T$
is considered such that
${\mathbb{G}}_1=E({\mathbb F}_p)[r]$
and
${\mathbb{G}}_2=\tilde E({\mathbb F}_{p^{k/d}})[r]$
, where k is the embedding degree of the elliptic curve
$E/{\mathbb F}_p$
, r is a large prime divisor of
$\# E({\mathbb F}_p)$
, and
$\tilde E$
is the degreed twist of E over
${\mathbb F}_{p^{k/d}}$
with
$r \mid \tilde E ({\mathbb F}_{p^{k/d}} )$
. Hashing to
${\mathbb{G}}_1$
is considered easy, while hashing to
${\mathbb{G}}_2$
is done by selecting a random point Q in
$\tilde E({\mathbb F}_{p^{k/d}})$
and computing the hash value cQ, where c·r is the order of
$\tilde E({\mathbb F}_{p^{k/d}})$
. We show that for a large class of curves, one can hash to
${\mathbb{G}}_2$
in
$\textup{O}(1/\varphi (k)\log c)$
time, as compared with the previously fastestknown
$\textup{O}(\log p)$
. In the case of BN curves, we are able to double the speed of hashing to
${\mathbb{G}}_2$
. For higherembeddingdegree curves, the results can be more dramatic. We also show how to reduce the cost of the finalexponentiation step in a pairing calculation by a fixed number of field multiplications.
more …
By
Adj, Gora; Menezes, Alfred; Oliveira, Thomaz; RodríguezHenríquez, Francisco
Show all (4)
Post to Citeulike
3 Citations
In 2013, Joux, and then Barbulescu, Gaudry, Joux and Thomé, presented new algorithms for computing discrete logarithms in finite fields of small and medium characteristic. We show that these new algorithms render the finite field
${\mathbb{F}}_{3^{6 \cdot 509}} = {\mathbb{F}}_{3^{3054}}$
weak for discrete logarithm cryptography in the sense that discrete logarithms in this field can be computed significantly faster than with the previous fastest algorithms. Our concrete analysis shows that the supersingular elliptic curve over
${\mathbb{F}}_{3^{509}}$
with embedding degree 6 that had been considered for implementing pairingbased cryptosystems at the 128bit security level in fact provides only a significantly lower level of security. Our work provides a convenient framework and tools for performing a concrete analysis of the new discrete logarithm algorithms and their variants.
more …
By
Taverne, Jonathan; FazHernández, Armando; Aranha, Diego F.; RodríguezHenríquez, Francisco; Hankerson, Darrel; López, Julio
Show all (6)
Post to Citeulike
6 Citations
The availability of a new carryless multiplication instruction in the latest Intel desktop processors significantly accelerates multiplication in binary fields and hence presents the opportunity for reevaluating algorithms for binary field arithmetic and scalar multiplication over elliptic curves. We describe how to best employ this instruction in field multiplication and the effect on performance of doubling and halving operations. Alternate strategies for implementing inversion and halftrace are examined to restore most of their competitiveness relative to the new multiplier. These improvements in field arithmetic are complemented by a study on serial and parallel approaches for Koblitz and random curves, where parallelization strategies are implemented and compared. The contributions are illustrated with experimental results improving the stateoftheart performance of halving and doublingbased scalar multiplication on NIST curves at the 112 and 192bit security levels, and a new speed record for sidechannel resistant scalar multiplication in a random curve at the 128bit security level.
more …
By
Oliveira, Thomaz; López, Julio; Aranha, Diego F.; RodríguezHenríquez, Francisco
Show all (4)
Post to Citeulike
7 Citations
In this work we present the λcoordinates, a new system for representing points in binary elliptic curves. We also provide efficient elliptic curve operations based on the new representation and timing results of our software implementation over the field
$\mathbb{F}_{2^{254}}$
. As a result, we improve speed records for protected/unprotected single/multicore software implementations of randompoint elliptic curve scalar multiplication at the 128bit security level. When implemented on a Sandy Bridge 3.4GHz Intel Xeon processor, our software is able to compute a single/multicore unprotected scalar multiplication in 72,300 and 47,900 clock cycles, respectively; and a protected singlecore scalar multiplication in 114,800 cycles. These numbers improve by around 2% on the newer Core i7 2.8GHz Ivy Bridge platform.
more …
By
MancillasLópez, Cuauhtemoc; Chakraborty, Debrup; RodríguezHenríquez, Francisco
Post to Citeulike
Tweakable enciphering schemes are a certain type of blockcipher mode of operation which provide security in the sense of a strong pseudorandom permutation. It has been proposed that these types of modes are suitable for inplace disk encryption. Currently there are many proposals available for these schemes. EME is one of the efficient candidate of this category. EME2 is a derivative of EME which is currently one of the candidates of a draft standard for wide block modes by the IEEE working group on storage security. We show some weakness of these two modes assuming that some side channel information is available.
more …
By
Beuchat, JeanLuc; GonzálezDíaz, Jorge E.; Mitsunari, Shigeo; Okamoto, Eiji; RodríguezHenríquez, Francisco; Teruya, Tadanori
Show all (6)
Post to Citeulike
27 Citations
This paper describes the design of a fast software library for the computation of the optimal ate pairing on a Barreto–Naehrig elliptic curve. Our library is able to compute the optimal ate pairing over a 254bit prime field
$\mathbb{F}_{p}$
, in just 2.33 million of clock cycles on a single core of an Intel Core i7 2.8GHz processor, which implies that the pairing computation takes 0.832msec. We are able to achieve this performance by a careful implementation of the base field arithmetic through the usage of the customary Montgomery multiplier for prime fields. The prime field is constructed via the Barreto–Naehrig polynomial parametrization of the prime p given as, p = 36t^{4} + 36t^{3} + 24t^{2} + 6t + 1, with t = 2^{62} − 2^{54} + 2^{44}. This selection of t allows us to obtain important savings for both the Miller loop as well as the final exponentiation steps of the optimal ate pairing.
more …
By
Aranha, Diego F.; FazHernández, Armando; López, Julio; RodríguezHenríquez, Francisco
Show all (4)
Post to Citeulike
8 Citations
We design a stateoftheart software implementation of field and elliptic curve arithmetic in standard Koblitz curves at the 128bit security level. Field arithmetic is carefully crafted by using the best formulae and implementation strategies available, and the increasingly common native support to binary field arithmetic in modern desktop computing platforms. The ith power of the Frobenius automorphism on Koblitz curves is exploited to obtain new and faster interleaved versions of the wellknown τNAF scalar multiplication algorithm. The usage of the
$\tau^{\lfloor m/3 \rfloor}$
and
$\tau^{\lfloor m/4 \rfloor}$
maps are employed to create analogues of the 3and 4dimensional GLV decompositions and in general, the
$\lfloor m/s \rfloor$
th power of the Frobenius automorphism is applied as an analogue of an sdimensional GLV decomposition. The effectiveness of these techniques is illustrated by timing the scalar multiplication operation for fixed, random and multiple points. In particular, our library is able to compute a random point scalar multiplication in just below 10^{5} clock cycles, which sets a new speed record across all curves with or without endomorphisms defined over binary or prime fields. The results of our optimized implementation suggest a tradeoff between speed, compliance with the published standards and sidechannel protection. Finally, we estimate the performance of curvebased cryptographic protocols instantiated using the proposed techniques and compare our results to related work.
more …
By
Beuchat, JeanLuc; Detrey, Jérémie; Estibals, Nicolas; Okamoto, Eiji; RodríguezHenríquez, Francisco
Show all (5)
Post to Citeulike
1 Citations
This paper is devoted to the design of fast parallel accelerators for the cryptographic Tate pairing in characteristic three over supersingular elliptic curves. We propose here a novel hardware implementation of Miller’s loop based on a pipelined KaratsubaOfman multiplier. Thanks to a careful selection of algorithms for computing the tower field arithmetic associated to the Tate pairing, we manage to keep the pipeline busy. We also describe the strategies we considered to design our parallel multiplier. They are included in a VHDL code generator allowing for the exploration of a wide range of operators. Then, we outline the architecture of a coprocessor for the Tate pairing over
$\mathbb{F}_{3^m}$
. However, a final exponentiation is still needed to obtain a unique value, which is desirable in most of the cryptographic protocols. We supplement our pairing accelerator with a coprocessor responsible for this task. An improved exponentiation algorithm allows us to save hardware resources.
According to our placeandroute results on Xilinx FPGAs, our design improves both the computation time and the areatime tradeoff compared to previoulsy published coprocessors.
more …
By
Oliveira, Thomaz; Aranha, Diego F.; López, Julio; RodríguezHenríquez, Francisco
Show all (4)
Post to Citeulike
6 Citations
In this paper we introduce new methods for computing constanttime variablebase point multiplications over the GalbraithLinScott (GLS) and the Koblitz families of elliptic curves. Using a lefttoright doubleandadd and a righttoleft halveandadd Montgomery ladder over a GLS curve, we present some of the fastest timings yet reported in the literature for point multiplication. In addition, we combine these two procedures to compute a multicore protected scalar multiplication. Furthermore, we designed a novel regular
$$\tau $$
adic scalar expansion for Koblitz curves. As a result, using the regular recoding approach, we set the speed record for a singlecore constanttime point multiplication on standardized binary elliptic curves at the
$$128$$
bit security level.
more …
By
Beuchat, JeanLuc; LópezTrejo, Emmanuel; MartínezRamos, Luis; Mitsunari, Shigeo; RodríguezHenríquez, Francisco
Show all (5)
Post to Citeulike
15 Citations
This paper describes the design of a fast multicore library for the cryptographic Tate pairing over supersingular elliptic curves. For the computation of the reduced modified Tate pairing over
$\mathbb{F}_{3^{509}}$
, we report calculation times of just 2.94 ms and 1.87 ms on the Intel Core2 and Intel Core i7 architectures, respectively. We also try to answer one important design question that arises: how many cores should be utilized for a given application?
more …
By
Sánchez, Ana Helena; RodríguezHenríquez, Francisco
Post to Citeulike
7 Citations
In 2011, Waters presented a ciphertextpolicy attribute based encryption protocol that uses bilinear pairings to provide control access mechanisms, where the set of user’s attributes is specified by means of a linear secret sharing scheme. Some of the applications foreseen for this protocol lie in the context of mobile devices such a smartphones and tablets, which in a majority of instances are powered by an ARM processor supporting the NEON vector set of instructions. In this paper we present the design of a software cryptographic library that implements a 127bit security level attributebased encryption scheme over mobile devices equipped with a 1.4GHz Exynos 4 CortexA9 processor and a developing board that hosts a 1.7 GHz Exynos 5 CortexA15 processor. For the latter platform and taking advantage of the inherent parallelism of the NEON vector instructions, our library computes a single optimal pairing over a BarretoNaehrig curve approximately 2 times faster than the best timings previously reported on ARM platforms at this level of security. Further, using a 6attribute access formula our library is able to encrypt/decrypt a text/ciphertext in less than 7.5mS and 15.67mS, respectively.
more …
By
Adj, Gora; CervantesVázquez, Daniel; ChiDomínguez, JesúsJavier; Menezes, Alfred; RodríguezHenríquez, Francisco
Show all (5)
Post to Citeulike
The security of the JaoDe Feo Supersingular Isogeny DiffieHellman (SIDH) key agreement scheme is based on the intractability of the Computational Supersingular Isogeny (CSSI) problem—computing
$${\mathbb F}_{p^2}$$
rational isogenies of degrees
$$2^e$$
and
$$3^e$$
between certain supersingular elliptic curves defined over
$${\mathbb F}_{p^2}$$
. The classical meetinthemiddle attack on CSSI has an expected running time of
$$O(p^{1/4})$$
, but also has
$$O(p^{1/4})$$
storage requirements. In this paper, we demonstrate that the van OorschotWiener golden collision finding algorithm has a lower cost (but higher running time) for solving CSSI, and thus should be used instead of the meetinthemiddle attack to assess the security of SIDH against classical attacks. The smaller parameter p brings significantly improved performance for SIDH.
more …
By
Taverne, Jonathan; FazHernández, Armando; Aranha, Diego F.; RodríguezHenríquez, Francisco; Hankerson, Darrel; López, Julio
Show all (6)
Post to Citeulike
13 Citations
The availability of a new carryless multiplication instruction in the latest Intel desktop processors significantly accelerates multiplication in binary fields and hence presents the opportunity for reevaluating algorithms for binary field arithmetic and scalar multiplication over elliptic curves. We describe how to best employ this instruction in field multiplication and the effect on performance of doubling and halving operations. Alternate strategies for implementing inversion and halftrace are examined to restore most of their competitiveness relative to the new multiplier. These improvements in field arithmetic are complemented by a study on serial and parallel approaches for Koblitz and random curves, where parallelization strategies are implemented and compared. The contributions are illustrated with experimental results improving the stateoftheart performance of halving and doublingbased scalar multiplication on NIST curves at the 112 and 192bit security levels and a new speed record for sidechannelresistant scalar multiplication in a random curve at the 128bit security level. The algorithms presented in this work were implemented on Westmere and Sandy Bridge processors, the latest generation Intel microarchitectures.
more …
By
Adj, Gora; CanalesMartínez, Isaac; RiveraZamarripa, Luis; RodríguezHenríquez, Francisco
Show all (4)
Post to Citeulike
The problem of determining whether a polynomial defined over a finite field ring is smooth or not with respect to a given degree, is the most intensive arithmetic operation of the socalled descent phase of indexcalculus algorithms. In this paper, we present an analysis and efficient implementation of Coppersmith’s smoothness test for polynomials defined over finite fields with characteristic three. As a case study, we review the best strategies for obtaining a fast field and polynomial arithmetic for polynomials defined over the ring
$$F_q[X],$$
with
$$q=3^6,$$
and report the timings achieved by our library when computing the smoothness test applied to polynomials of several degrees defined in that ring. This software library was recently used in Adj et al. (Cryptology 2016.
http://eprint.iacr.org/2016/914
), as a building block for achieving a record computation of discrete logarithms over the 4841bit field
$${{\mathbb {F}}}_{3^{6\cdot 509}}$$
.
more …
By
Oliveira, Thomaz; López, Julio; RodríguezHenríquez, Francisco
Post to Citeulike
3 Citations
In this work, we retake an old idea presented by Koblitz in his landmark paper [21], where he suggested the possibility of defining anomalous elliptic curves over the base field
$$\mathbb {F}_4$$
. We present a careful implementation of the base and quadratic field arithmetic required for computing the scalar multiplication operation in such curves. In order to achieve a fast reduction procedure, we adopted a redundant trinomial strategy that embeds elements of the field
$$\mathbb {F}_{4^{m}},$$
with m a prime number, into a ring of higher order defined by an almost irreducible trinomial. We also report a number of techniques that allow us to take full advantage of the native vector instructions of highend microprocessors. Our software library achieves the fastest timings reported for the computation of the timingprotected scalar multiplication on Koblitz curves, and competitive timings with respect to the speed records established recently in the computation of the scalar multiplication over prime fields.
more …
By
Oliveira, Thomaz; López, Julio; Hışıl, Hüseyin; FazHernández, Armando; RodríguezHenríquez, Francisco
Show all (5)
Post to Citeulike
In the RFC 7748 memorandum, the Internet Research Task Force specified a Montgomeryladder scalar multiplication function based on two recently adopted elliptic curves, “curve25519” and “curve448”. The purpose of this function is to support the DiffieHellman key exchange algorithm that will be included in the forthcoming version of the Transport Layer Security cryptographic protocol. In this paper, we describe a ladder variant that permits to accelerate the fixedpoint multiplication function inherent to the DiffieHellman key pair generation phase. Our proposal combines a righttoleft version of the Montgomery ladder along with the precomputation of constant values directly derived from the basepoint and its multiples. To our knowledge, this is the first proposal of a Montgomery ladder procedure for prime elliptic curves that admits the extensive use of precomputation. In exchange of very modest memory resources and a small extra programming effort, the proposed ladder obtains significant speedups for software implementations. Moreover, our proposal fully complies with the RFC 7748 specification. A software implementation of the X25519 and X448 functions using our precomputable ladder yields an acceleration factor of roughly 1.20, and 1.25 when implemented on the Haswell and the Skylake microarchitectures, respectively.
more …
By
LópezTrejo, Emmanuel; RodríguezHenríquez, Francisco; DíazPérez, Arturo
Post to Citeulike
8 Citations
Due to the exponential growth of wireless and mobile applications, security has become a paramount design aspect. New techniques have been proposed for replacing the broken Wired Equivalent Privacy (WEP) protocol, which arguably is the most widely security tool used up to now in wireless environments. Under this scenario, AES in CCM (Counter with CBCMAC) mode has been included in the IEEE 802.11i wireless standard as a promising alternative to the compromised WEP protocol. In this contribution, we present an FPGA implementation of the CCM mode of operation using AES as its block cipher. Our design achieves a throughput of 1.05 Gbits/Sec with reasonable area requirements.
more …
By
Saqib, Nazar A.; RodríguezHenríquez, Francisco; DíazPérez, Arturo
Post to Citeulike
2 Citations
In this paper we present a singlechip FPGA full encryptor/decryptor core design of the AES algorithm. Our design performs all of them, encryption, decryption and key scheduling processes. High performance timing figures are obtained through the use of a pipelined architecture. Moreover, several modifications to the conventional AES algorithm’s formulations have been introduced, thus allowing us to obtain a significant reduction in the total number of computations and the path delay associated to them. Particularly, for the implementation of the most costly step of AES, multiplicative inverse in GF(2^{8}), two approaches were considered. The first approach uses precomputed values stored in a lookup table giving fast execution times of the algorithm at the price of memory requirements. Our second approach computes multiplicative inverse by using composite field techniques, yielding a reduction in the memory requirements at the cost of an increment in the execution time. The obtained results indicate that both designs are competitive with the fastest complete AES singlechip FGPA core implementations reported to date. Our first approach requires up to 11.8% less CLB slices, 21.5% less BRAMs and yields up to 18.5% higher throughput than the fastest comparable implementation reported in literature.
more …
By
CruzCortés, Nareli; OchoaJiménez, Eduardo; RiveraZamarripa, Luis; RodríguezHenríquez, Francisco
Show all (4)
Post to Citeulike
The implementation of the RSA private operation tends to be expensive since its computationally complexity is cubic with respect to the bitsize of its private key. As a consequence, considerable effort has been put into optimizing this operation. In this work, we present a parallel implementation of the RSA private operation using the Single Instruction Multiple Thread (SIMT) threading model of Graphics Processor Unit (GPU) platforms. The underlying modular arithmetic is performed by means of the Residue Number System (RNS) representation. By combining these two approaches, we present a GPU software library that achieves highspeed timings for the RSA private operation when using 1024, 2048 and 3072bit secret keys.
more …
By
RodríguezHenríquez, Francisco; MoralesLuna, Guillermo; Saqib, Nazar A.; CruzCortés, Nareli
Show all (4)
Post to Citeulike
20 Citations
In this contribution, we derive a novel parallel formulation of the standard Itoh–Tsujii algorithm for multiplicative inverse computation over the field GF(2^{m}). The main building blocks used by our algorithm are: field multiplication, field squaring and field square root operators. It achieves its best performance when using a special class of irreducible trinomials, namely, P(x) = x^{m} + x^{k} + 1, with m and k odd numbers and when implemented in hardware platforms. Under these conditions, our experimental results show that our parallel version of the Itoh–Tsujii algorithm yields a speedup of about 30% when compared with the standard version of it. Implemented in a Virtex 3200E FPGA device, our design is able to compute multiplicative inversion over GF(2^{193}) after 20 clock cycles in about 0.94 μS.
more …
By
Oliveira, Thomaz; López, Julio; Aranha, Diego F.; RodríguezHenríquez, Francisco
Show all (4)
Post to Citeulike
17 Citations
In this work, we present new arithmetic formulas for a projective version of the affine point representation
$$(x,x+y/x),$$
for
$$x\ne 0,$$
which leads to an efficient computation of the scalar multiplication operation over binary elliptic curves. A software implementation of our formulas applied to a binary Galbraith–Lin–Scott elliptic curve defined over the field
$$\mathbb {F}_{2^{254}}$$
allows us to achieve speed records for protected/unprotected single/multicore randompoint elliptic curve scalar multiplication at the 127bit security level. When executed on a Sandy Bridge 3.4 GHz Intel Xeon processor, our software is able to compute a single/multicore unprotected scalar multiplication in 69,500 and 47,900 clock cycles, respectively, and a protected singlecore scalar multiplication in 114,800 cycles. These numbers are improved by around 2 and 46 % on the newer Ivy Bridge and Haswell platforms, respectively, achieving in the latter a protected randompoint scalar multiplication in 60,000 clock cycles.
more …
By
Oliveira, Thomaz; López, Julio; RodríguezHenríquez, Francisco
Post to Citeulike
2 Citations
In this survey paper, we present a careful analysis of the Montgomery ladder procedure applied to the computation of the constanttime point multiplication operation on elliptic curves defined over binary extension fields. We give a general view of the main improvements and formula derivations that several researchers have contributed across the years, since the publication of Peter Lawrence Montgomery seminal work in 1987. We also report a fast software implementation of the Montgomery ladder applied on a Galbraith–Lin–Scott (GLS) binary elliptic curve that offers a security level close to 128 bits. Using our software, we can execute the ephemeral Diffie–Hellman protocol in just 95,702 clock cycles when implemented on an Intel Skylake machine running at 4 GHz.
more …
By
Oliveira, Thomaz; López, Julio; CervantesVázquez, Daniel; RodríguezHenríquez, Francisco
Show all (4)
Post to Citeulike
In this work, we retake an old idea that Koblitz presented in his landmark paper (Koblitz, in: Proceedings of CRYPTO 1991. LNCS, vol 576, Springer, Berlin, pp 279–287, 1991), where he suggested the possibility of defining anomalous elliptic curves over the base field
$${\mathbb {F}}_4$$
. We present a careful implementation of the base and quadratic field arithmetic required for computing the scalar multiplication operation in such curves. We also introduce two ordinary Koblitzlike elliptic curves defined over
$${\mathbb {F}}_4$$
that are equipped with efficient endomorphisms. To the best of our knowledge, these endomorphisms have not been reported before. In order to achieve a fast reduction procedure, we adopted a redundant trinomial strategy that embeds elements of the field
$${\mathbb {F}}_{4^{m}},$$
with m a prime number, into a ring of higher order defined by an almost irreducible trinomial. We also suggest a number of techniques that allow us to take full advantage of the native vector instructions of highend microprocessors. Our software library achieves the fastest timings reported for the computation of the timingprotected scalar multiplication on Koblitz curves, and competitive timings with respect to the speed records established recently in the computation of the scalar multiplication over binary and prime fields.
more …
