Modern-day applications rely heavily on data, and it goes without saying that protecting this data is vital for developers and companies.
Encryption, encoding, hashing and obfuscation are a few data security terms that are often used almost synonymously. In reality, they are quite different. Put simply, the terms can be understood in the following way:
- Encryption: When you take a piece of information / data and rewrite it in a way that can’t be understood without a key, therefore keeping the original data a “secret”. To retrieve this information, you use the key to run the same method in reverse and “decrypt” it.
- Encoding: When you represent the same piece of information in a different format. For example, when you convert an ASCII string into a Base64 string, the information stays the same but is written with a different set of characters (Base64 uses only 64 of them), causing the two strings to look different. No key is involved, so anyone can reverse it.
- Hashing: When you use mathematical algorithms / functions, which can’t be reversed, to generate a string of a fixed length for any piece of information that you input. The same input always gives the same output, and for a good hash function it is practically infeasible to find two inputs that give the same output.
- Obfuscation: When you scramble your code / information so that it isn’t easy to copy, understand or change, even though it remains functional within the system.
To understand it further, we can use the following example of a given string and how it behaves in each of these processes.
Let’s take the string “Hello Yuv, the world is boring now”. To understand the difference between encryption, hashing, encoding and obfuscation, I have created a simple example using JavaScript (Node.js).
Each of the examples follows the same structure:
- The main code is written inside a function, and the function requires a string as an input parameter to run
- The function is exported from the file by default
- The check (require.main === module) tests whether the file has been run directly rather than imported by another file. If so, the code inside the conditional block is executed. In these examples, it simply calls the function
- The standard string is saved in a file called vars.js, and this string is used inside the module whenever the module is run as an individual script
Encoding
Every piece of information is stored in binary. To represent that information, systems use a number of agreed formats. One of the best-known is the ASCII (American Standard Code for Information Interchange) encoding format. The idea behind encoding is to make sure that the system receiving the information supports the format and can display that information exactly as it was sent.
In the code example, we will convert a string from UTF-8 (UCS Transformation Format 8) to Base64, and then back to UTF-8.
Since Node.js has built-in methods for encoding / decoding in Base64, we don’t need to import any extra modules.
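A minimal sketch of that round trip, assuming Node's global Buffer class is used for the conversion (the article's own module may do it differently). The helper names toBase64 and fromBase64 are made up for illustration.

```javascript
const text = 'Hello Yuv, the world is boring now';

// UTF-8 string -> bytes -> Base64 text.
function toBase64(input) {
  return Buffer.from(input, 'utf8').toString('base64');
}

// Base64 text -> bytes -> UTF-8 string.
function fromBase64(encoded) {
  return Buffer.from(encoded, 'base64').toString('utf8');
}

const encoded = toBase64(text);
console.log(encoded);                      // Base64 representation of the string
console.log(fromBase64(encoded) === text); // true: nothing was lost or hidden
```

No key is involved at any point, which is why encoding should never be mistaken for a way to keep data secret.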
Obfuscation
Obfuscation is similar to encoding in that the same information is still present in the system, except that it’s like a game of scramble. The system understands the scrambled version perfectly well and runs it anyway.
The comment enclosed in the multi-line comment is the actual code that was written. After performing a simple obfuscation, the same piece of code is transformed into something incomprehensible at first glance. Some real nerds might be able to understand the code after a lot of head-scratching and head-banging, but in general, this type of code is only meant to be understood by the machine.
Note: The results of obfuscation vary with the type of code you are writing and what it’s supposed to do. It is generally not advised, as it might cause breaking changes throughout the codebase.
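To give a feel for the idea, here is a tiny hand-written illustration. It is not the output of any obfuscation tool and not the article's example; the names greet, _0x1a and _0x2b are invented. Both functions do the same thing, but the second hides its intent behind meaningless names and a hex-escaped string.

```javascript
// Readable version.
function greet(name) {
  return 'Hello ' + name;
}

// Hand-obfuscated equivalent: the literal 'Hello ' is written with hex
// escapes and tucked into a lookup array, and every name is meaningless.
const _0x1a = ['\x48\x65\x6c\x6c\x6f\x20'];
const _0x2b = function (_0x3c) {
  return _0x1a[0x0] + _0x3c;
};

// Both versions behave identically as far as the machine is concerned.
console.log(greet('Yuv') === _0x2b('Yuv')); // true
```

Nothing here is secret: anyone patient enough can read the second version, which is what separates obfuscation from encryption.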
Encryption
The simplest way to understand encryption is through the example of a lock and key. You write something, put it in a box and lock it. You hold the key, and your friend holds an identical one. You send the box to your friend, who opens the box with the key and reads your letter. The box encrypts the letter and makes sure that the letter can only be viewed by those who possess the key to open the box. It’s possible that one of you loses the key, and some other person uses it to open the box and access the contents. Technically, nothing has gone wrong, as the system treats whoever holds the key as the intended recipient. Ethically, however, it is wrong, since in this cycle of communication the third person was an intruder.
In practice, particularly in code, encryption is more involved. You can encrypt your data using various standards and protocols, generate key pairs in various formats and communicate securely within the application ecosystem.
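The shared-key idea from the lock-and-key analogy can be sketched with Node's built-in crypto module. This is only an illustration under assumptions of my own: the choice of AES-256-GCM and the helper names lockBox and openBox are not from the article.

```javascript
const crypto = require('crypto');

// The identical key that both you and your friend hold: 32 random bytes.
const sharedKey = crypto.randomBytes(32);

// Lock the letter in the box (encrypt with the shared key).
function lockBox(plaintext, key) {
  const iv = crypto.randomBytes(12); // must be fresh for every message
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  return { iv, ciphertext, tag: cipher.getAuthTag() };
}

// Open the box (decrypt with the same key); throws if the key is wrong
// or the contents were tampered with.
function openBox(box, key) {
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, box.iv);
  decipher.setAuthTag(box.tag);
  return Buffer.concat([decipher.update(box.ciphertext), decipher.final()]).toString('utf8');
}

const box = lockBox('Hello Yuv, the world is boring now', sharedKey);
console.log(openBox(box, sharedKey));
```

As in the analogy, anyone who gets hold of sharedKey can open the box, so the security of the scheme rests entirely on keeping that key private.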
Node.js doesn’t expose encryption methods globally, but it ships with a built-in module for the purpose, so no external library is needed. This module is crypto, which contains a number of classes and methods for performing symmetric and asymmetric encryption, as well as hashing. In this example, we are performing key-based asymmetric encryption. We start by generating a private-public key pair with a given algorithm name and key options. We then encrypt the string with either the public or the private key; in this example, we encrypt it with the private key, which is the operation that signing is built on. To decrypt the result, we use the corresponding public key, which has to be part of the same key pair. A different public key will result in a broken decryption, leading to an error being thrown. Keep in mind that anyone holding the public key can read a message encrypted this way, so it proves who wrote the message rather than keeping it secret; to keep a message secret, you would encrypt with the recipient’s public key so that only their private key can decrypt it. To extend the earlier analogy, picture a lock with two different keys, where whatever one key locks can only be opened by the other.
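The flow described above might look roughly like this. It is a sketch, not the article's code: the article does not name the algorithm, so RSA with a 2048-bit modulus is assumed here, and the helper names encryptWithPrivateKey and decryptWithPublicKey are hypothetical.

```javascript
const crypto = require('crypto');

// Generate a private-public key pair (assumed: RSA, 2048-bit modulus).
const { publicKey, privateKey } = crypto.generateKeyPairSync('rsa', {
  modulusLength: 2048,
});

// Encrypt with the private key; Base64 makes the bytes printable.
function encryptWithPrivateKey(text, key) {
  return crypto.privateEncrypt(key, Buffer.from(text, 'utf8')).toString('base64');
}

// Decrypt with the public key from the same pair.
function decryptWithPublicKey(encoded, key) {
  return crypto.publicDecrypt(key, Buffer.from(encoded, 'base64')).toString('utf8');
}

const encrypted = encryptWithPrivateKey('Hello Yuv, the world is boring now', privateKey);
console.log(decryptWithPublicKey(encrypted, publicKey));

// A public key from a different pair is expected to fail with an error.
const otherPair = crypto.generateKeyPairSync('rsa', { modulusLength: 2048 });
try {
  decryptWithPublicKey(encrypted, otherPair.publicKey);
} catch (err) {
  console.log('Decryption failed with a mismatched public key');
}
```

RSA can only process short inputs directly (a couple of hundred bytes at this key size), so real systems usually combine it with a symmetric cipher for the actual payload.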
Hashing
Hashing can be compared to shredding a piece of coloured paper. If you see two strands of shredded paper, you can tell that they might belong to the same original sheet, though the actual data is now lost amongst other strands. Similarly, hashing reduces your data to a fixed-length value, which means nothing by itself, cannot be reversed, and is for all practical purposes unique to your data.
Node.js uses the crypto module to perform encryption / hashing related functions. We will be using HMAC (Hash-based Message Authentication Code) for hashing our string. First, an instance of the HMAC object has to be created by providing the hash function name, as well as a key for creating the hash. Using the update() method, we can add the string which needs hashing. The digest() method allows us to retrieve the hash in the requested encoding. Hex encoding is the usual choice; however, developers are free to explore other formats.
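Those three steps can be sketched as follows. The hash function (SHA-256), the key value and the helper name hmacHex are assumptions made for the illustration; the article does not state which ones its example uses.

```javascript
const crypto = require('crypto');

// Create the HMAC object, feed it the string, read the result as hex.
function hmacHex(message, secret) {
  return crypto
    .createHmac('sha256', secret) // hash function name + key
    .update(message)              // the string to be hashed
    .digest('hex');               // output encoding
}

const message = 'Hello Yuv, the world is boring now';
const digest = hmacHex(message, 'a-hypothetical-secret');

console.log(digest);        // hex string
console.log(digest.length); // 64: SHA-256 yields 32 bytes, two hex characters each
```

Running it again with the same message and key gives the same digest, while changing either one gives a completely different value, and there is no function that turns the digest back into the message.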
Note
In the code repository, the main entry point file (index.js) behaves a little differently from the other modules. Its main objective is to display all of the modules at once, each with its source code and corresponding output. The principle of exporting the function from each module comes in handy here, as every module is simply imported as a function and executed as expected. The file is displayed here for reference.
All in all, it is vital to build secure applications, no matter what their scale.
I have compiled a GitHub Gist which contains some resources to get started with data security at a beginner level.