Exploiting GPUs to speed up cryptanalysis and machine learning

Cianfriglia, Marco

Please use this identifier to cite or link to this item: http://hdl.handle.net/2307/40404

Title:	Exploiting GPUs to speed up cryptanalysis and machine learning
Authors:	Cianfriglia, Marco
Advisor:	BERNASCHI, MASSIMO
Keywords:	DATA DRIVEN MACHINE LEARNING CUBE ATTACK GPU CRYPTANALYSIS
Issue Date:	13-Apr-2018
Publisher:	Università degli studi Roma Tre
Abstract:	In the past cryptography was mainly used by military organizations and governments for secure communication. However today almost everyone uses it everywhere, even if most people do not know it. Cryptography is used in many different scenarios; we use it when we login to a secure website or to our laptops, or when we digitally sign a document; cryptocurrencies like Bitcoin rely on it [1]; we use cryptography even when we chat with friends. As different situations may have different requirements and constraints, there exist various kind of ciphers to answer different needs. As an example, where resources are limited and speed is critical, we usually rely on Stream Ciphers. They are symmetric key ciphers that try to mimic the properties of the perfectly secure cipher One Time Pad (OTP) [2], by building a mechanism that is able to produce a very long pseudorandom keystream that should look like a truly random sequence of bits. This keystream is then xored with the message we want to encrypt. The keystream is serially generated starting from an initial random seed (the key and the IV) that is used to initialize one or more Feedback Shift Registers (FSR) and sometimes some Finite State Machines (FSM) [3, 4]. A stream cipher is said to be secure if any passive attacker can not learn anything about the stream cipher internal state and the key by looking at the keystream bits. Furthermore, the period1 of the cipher should be long enough to avoid reusing the same keystream for more than one encryption. In literature [5] there exist many cryptanalysis techniques that focus on testing the security of Stream ciphers, like Correlation attacks [6], Fast Correlation attacks [7], Algebraic attacks[8] and, the Cube attack [9]. The latter, presented in 2009 by Dinur and Shamir, is a very fascinating technique that can be applied virtually to any cipher since its unique requirement is black-box access to the target cipher. The idea is to combine offline exhaustive searches over selected tweakable public/IV bits (the sides of the ”cube”) with an online key-recovery phase. Cubes computation grows exponentially with the dimension of the cubes, so many approaches have been proposed in the literature to optimize this process [9, 10, 11, 12]; although, none of them fully investigates the potentiality of Graphics Processing Units (GPUs). Since the early times of their usage as general purpose computing devices, GPUs have been considered ideal candidates for cryptanalysis tasks as the intensive computations usually required for cryptanalysis can benefit of the high level of parallelism and the computational power available on GPUs. Indeed, many works exist in the literature as evidence that they can be successfully used [13, 14, 15]. In order to leverage GPUs, it is mandatory to carefully design and implement the applications by taking into account their peculiar characteristics. In the first part of the present thesis work, the first Cube Attack Framework tailored on GPUs [16] is presented. It has been designed and implemented as an optimized solution that is able to perform all the steps needed by the attack; all the design choices are discussed, detailing their respective advantages and difficulties. The framework can virtually support any cipher as the only requirement is development of a GPU version of the target cipher. It has been successfully validated on two different stream ciphers, Trivium [17] and Grain-128 [18]. Even though the attack against Trivium has been ran with only a few preliminarily sets of Initialization Vector (IV) bits (i.e. the cubes) - specifically selected to both validate the code and compare the obtained results with the literature- the findings improve the state-ofthe-art for attacks against reduced-round version of Trivium; here there are presented the first full-key recovery for Trivium up to 781 initialization rounds without brute-force, and the first ever linear equation binding only key bits yields after 800 initialization rounds. Moreover, thanks to the framework, few new candidate linear equations in key bits for both Trivium and Grain-128, respectively after 800 initialization rounds and after 160 initialization rounds, have been discovered; the detailed description and analysis of these results are going to released soon in a work that is currently being finalized. Furthermore, the presented implementation allows for exhaustively assigning values to (subsets of) public variables with negligible additional costs. This approach allows to potentially weakening the assumption of a completely tweakable IV that has been done in previous works [9]. Eventually, the evaluations of the framework on the computational speedup with respect to a CPU-parallel benchmark, the performance dependence on system parameters and GPU architectures (Nvidia Kepler vs Nvidia Pascal), and the scalability of the proposed solution on Multi-GPU systems [19] have been reported. By exhibiting the benefits of a complete GPU-tailored implementations of the cube attack, this thesis work provides novel and strong elements in support of the general feasibility of the attack, thus paving the way for future work in the area. In the second part of this dissertation, an automatic optimization solution for datadriven applications is presented. This solution has been developed by the canditate while he worked at Dividiti L.t.d., as research intern. A new framework for data-driven adaptive libraries has been designed and developed. It generates a pluggable runtime system based on Machine Learning that is able to choose for the current input the ’optimal’ choice, according to a selected metrics, among a set known best choices for other inputs. In order to build the prediction model, the framework relies on the scikit-learn library [20] to generate several Decision Tree Classifiers [21]. Three datasets have been collected on two different architectures, Nvidia Pascal and Arm Mali Midgard and they have been used as training data to build the prediction models. The proposed framework automatically extracts the decision rules from the models generating the corresponding source code for the runtime system. This auto-generated code is then responsible to pick up the configuration according to the model rules. Moreover, a proof of concept based on the open-source OpenCL library CLBlast [22] has been proposed. The proof of concept shows how the proposed framework can be used to optimize matrix multiplication routines for unpredictable matrix sizes. Matrix multiplication is a core routine for many problems ranging from Machine Learning [23], simulation and cryptography. The runtime system generated by the proposed framework provides a speed up respect to the standard CLBlast up to 3× on Nvidia Pascal (Tesla P100) and 2.5× on ARM Midgard (Mali-T860). Moreover the presented framework can generate an optimized runtime system for any target library with a limited effort. References [1] S. Nakamoto, “Bitcoin: A peer-to-peer electronic cash system,” 03 2009. [2] J. Katz and Y. Lindell, Introduction to Modern Cryptography (Chapman & Hall/Crc Cryptography and Network Security Series). Chapman & Hall/CRC, 2007. [3] ETSI/SAGE, “Specification of the 3GPP Confidentiality and Integrity Algorithms UEA2 & UIA2. Document 2: SNOW 3G specification, version 1.1.” http://www.3gpp.org/ftp/, September 2006. [4] ETSI/SAGE, “Specification of the 3GPP Confidentiality and Integrity Algorithms128-EEA3 & 128EIA3; Document 2: ZUC specification.” http://www.3gpp.org/ftp/, June 2011. [5] A. Joux, Algorithmic Cryptanalysis. Chapman & Hall/CRC, 1st ed., 2009. [6] T. Siegenthaler, “Decrypting a Class of Stream Ciphers Using Ciphertext Only,” IEEE Trans. Comput., vol. 34, pp. 81–85, Jan. 1985. [7] W. Meier and O. Staffelbach, “Fast correlation attacks on certain stream ciphers,” Journal of Cryptology, vol. 1, pp. 159–176, Oct 1989. [8] N. Courtois and W. Meier, “Algebraic attacks on stream ciphers with linear feedback,” in Advances in Cryptology - EUROCRYPT 2003, International Conference on the Theory and Applications of Cryptographic Techniques, Warsaw, Poland, May 4-8, 2003, Proceedings, pp. 345–359, 2003. [9] I. Dinur and A. Shamir, “Cube attacks on tweakable black box polynomials,” in Advances in CryptologyEUROCRYPT 2009, pp. 278–299, Springer, 2009. [10] P.-A. Fouque and T. Vannet, Improving Key Recovery to 784 and 799 Rounds of Trivium Using Optimized Cube Attacks, pp. 502–517. Berlin, Heidelberg: Springer Berlin Heidelberg, 2014. [11] J. Aumasson, I. Dinur, L. Henzen, W. Meier, and A. Shamir, “Efficient FPGA implementations of highdimensional cube testers on the stream cipher grain-128,” IACR Cryptology ePrint Archive, vol. 2009, p. 218, 2009. [12] I. Dinur, T. G¨uneysu, C. Paar, A. Shamir, and R. Zimmermann, “An experimentally verified attack on full Grain-128 using dedicated reconfigurable hardware,” in Proceedings of the 17th International Conference on The Theory and Application of Cryptology and Information Security, ASIACRYPT’11, (Berlin, Heidelberg), pp. 327–343, Springer-Verlag, 2011. [13] R. Szerwinski and T. G¨uneysu, Exploiting the Power of GPUs for Asymmetric Cryptography, pp. 79–99. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008. [14] M. Bisson, Combining Algortihms and Technologies to Speedup Computing Intensive Applications. PhD thesis, Department of Computer Science - Sapienza University of Rome, 2010. [15] M. Bernaschi, M. Bisson, E. Gabrielli, and S. Tacconi, “An Architecture for Distributed Dictionary Attacks to Cryptosystems,” JCP, vol. 4, no. 5, pp. 378–386, 2009. [16] M. Cianfriglia, S. Guarino, M. Bernaschi, F. Lombardi, and M. Pedicini, A Novel GPU-Based Implementation of the Cube Attack, pp. 184–207. Cham: Springer International Publishing, 2017. [17] C. De Canni`ere, Trivium: A Stream Cipher Construction Inspired by Block Cipher Design Principles, pp. 171–186. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006. [18] M. Hell, T. Johansson, A. Maximov, and W. Meier, “A Stream Cipher Proposal: Grain-128,” in 2006 IEEE International Symposium on Information Theory, pp. 1614–1618, July 2006. [19] M. Cianfriglia and S. Guarino, “Cryptanalysis on GPUs with the Cube Attack: Design, Optimization and Performances Gains,” in 2017 International Conference on High Performance Computing Simulation (HPCS), pp. 753–760, July 2017. [20] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine Learning in Python,” J. Mach. Learn. Res., vol. 12, pp. 2825– 2830, Nov. 2011. [21] T. M. Mitchell, Machine Learning. New York, NY, USA: McGraw-Hill, Inc., 1 ed., 1997. [22] C. Nugteren, “CLBlast: A Tuned OpenCL BLAS library,” CoRR, vol. abs/1705.05249, 2017. [23] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–9, June 2015.
URI:	http://hdl.handle.net/2307/40404
Access Rights:	info:eu-repo/semantics/openAccess
Appears in Collections:	Dipartimento di Matematica e Fisica T - Tesi di dottorato

Files in This Item:

File	Description	Size	Format
thesis.pdf		1.13 MB	Adobe PDF	View/Open

Show full item record Recommend this item

Page view(s)

262

checked on Nov 20, 2024

Download(s)

55

checked on Nov 20, 2024

Google Scholar^TM

Check

Files in This Item:

Page view(s)

Download(s)

Google ScholarTM

Google Scholar^TM