Cryptography Explained: From Caesar Cipher to AES and TLS

Cryptography is the mathematical foundation of all digital security. Every HTTPS connection, every encrypted disk, every password stored in a database relies on cryptographic algorithms. Yet most developers and security professionals treat cryptography as a black box — they know to use AES or TLS but have only a vague understanding of how or why it works. This guide demystifies cryptography from historical ciphers to modern algorithms, with real examples of where it fails.

From Caesar to XOR: The Foundations

Julius Caesar used a substitution cipher: shift every letter by 3, so A becomes D and B becomes E. It is trivially broken. There are only 25 possible shifts to try, and frequency analysis finds the key even faster: in English, E is the most common letter, so the most common ciphertext letter is probably E shifted by the key. Modern cryptography evolved specifically to eliminate this kind of statistical pattern.

# Caesar cipher in Python
def caesar_encrypt(text, shift):
    result = ""
    for char in text:
        if char.isalpha():
            shifted = ord(char) + shift
            if char.isupper():
                result += chr((shifted - 65) % 26 + 65)
            else:
                result += chr((shifted - 97) % 26 + 97)
        else:
            result += char
    return result
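
The frequency-analysis attack can be sketched in a few lines. This is a minimal illustration rather than a general cryptanalysis tool: the helper names `caesar_shift` and `crack_caesar` are hypothetical, and the sketch assumes English plaintext long enough that E really is the most common letter.

```python
from collections import Counter

def caesar_shift(text: str, shift: int) -> str:
    """Shift ASCII letters by `shift` positions, leaving other characters alone."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - 97 + shift) % 26 + 97))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - 65 + shift) % 26 + 65))
        else:
            out.append(ch)
    return "".join(out)

def crack_caesar(ciphertext: str) -> int:
    """Guess the shift by assuming the most common ciphertext letter stands for 'e'."""
    letters = [ch for ch in ciphertext.lower() if "a" <= ch <= "z"]
    most_common = Counter(letters).most_common(1)[0][0]
    return (ord(most_common) - ord("e")) % 26

# Sample sentence chosen (as an assumption) so that 'e' dominates the letter counts.
plaintext = "meet me here when the evening bell eventually rings"
ciphertext = caesar_shift(plaintext, 3)
guessed = crack_caesar(ciphertext)
recovered = caesar_shift(ciphertext, -guessed)
```

On short or unusual text the single-letter guess can be wrong, in which case trying all 25 shifts and picking the readable result is the simpler attack.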

# XOR cipher: the simplest symmetric cipher
# XOR key with each byte -- basis of stream ciphers
# Critical property: A XOR B XOR B = A (reversible)
def xor_cipher(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# XOR weakness: if key is short and message is long, frequency analysis breaks it
# If same key is reused for two messages: msg1 XOR msg2 = cipher1 XOR cipher2
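
The key-reuse weakness can be shown directly. The sketch below uses two made-up 16-byte messages and a hypothetical helper `xor_bytes`. It illustrates that XORing the two ciphertexts cancels the key entirely, even when the key itself is perfectly random.

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))

msg1 = b"attack at dawn!!"
msg2 = b"retreat at noon!"
key = secrets.token_bytes(len(msg1))  # random key as long as the messages

cipher1 = xor_bytes(msg1, key)
cipher2 = xor_bytes(msg2, key)  # the mistake: the same key is used twice

# The key cancels out: the attacker learns msg1 XOR msg2 from the ciphertexts alone.
leak = xor_bytes(cipher1, cipher2)

# Knowing (or guessing) one message then reveals the other.
recovered_msg2 = xor_bytes(leak, msg1)
```

The same cancellation is why stream ciphers and counter-based modes must never reuse a nonce under the same key.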

Symmetric Encryption: AES

AES (Advanced Encryption Standard) is the workhorse of modern symmetric encryption. The same key encrypts and decrypts. It operates on 128-bit blocks with 128, 192, or 256-bit keys. Understanding the different modes of operation is critical — the wrong mode choice in code can completely undermine AES security.
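
Because AES works on fixed 128-bit blocks, block modes such as ECB and CBC need the plaintext padded to a multiple of 16 bytes (GCM does not). PKCS#7 is the usual scheme. The sketch below shows how it works; the function names are hypothetical, and in real code the padding helper shipped with your crypto library is the safer choice.

```python
BLOCK_SIZE = 16  # AES block size in bytes (128 bits)

def pkcs7_pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Append N bytes of value N so the length becomes a multiple of block_size."""
    pad_len = block_size - (len(data) % block_size)
    return data + bytes([pad_len]) * pad_len

def pkcs7_unpad(padded: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Validate and strip PKCS#7 padding."""
    if not padded or len(padded) % block_size != 0:
        raise ValueError("invalid padded length")
    pad_len = padded[-1]
    if pad_len < 1 or pad_len > block_size:
        raise ValueError("invalid padding")
    if padded[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("invalid padding")
    return padded[:-pad_len]

padded = pkcs7_pad(b"hello")          # 5 data bytes + 11 padding bytes
full_block = pkcs7_pad(b"A" * 16)     # an exact multiple still gains a whole block
```

A message that already fills a block gets a full extra block of padding, which is what lets the receiver strip padding unambiguously.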

# ECB Mode (Electronic Code Book) -- NEVER USE THIS
# Problem: identical plaintext blocks produce identical ciphertext blocks
# This is why the "ECB penguin" demo works:
# An image encrypted with AES-ECB still visually reveals the outline of the penguin
# because identical pixel blocks encrypt to identical ciphertext blocks

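import os
from Crypto.Cipher import AES          # PyCryptodome
from Crypto.Util.Padding import pad

key = os.urandom(32)                   # 256-bit key; load from a key store in real code
plaintext = b"example message"

# WRONG - ECB reveals patterns:
cipher = AES.new(key, AES.MODE_ECB)

# BETTER - CBC (Cipher Block Chaining) with a random IV:
iv = os.urandom(16)  # ALWAYS use a fresh, unpredictable IV for every message
cipher = AES.new(key, AES.MODE_CBC, iv)
ciphertext = cipher.encrypt(pad(plaintext, AES.block_size))
# Store the IV with the ciphertext (it is not secret, but must be random and never reused)
# CBC alone does not detect tampering; pair it with a MAC or prefer GCM

# BEST - GCM mode (authenticated encryption):
nonce = os.urandom(12)  # must never repeat under the same key
cipher = AES.new(key, AES.MODE_GCM, nonce=nonce)
ciphertext, tag = cipher.encrypt_and_digest(plaintext)
# The tag authenticates the ciphertext, so tampering is detected on decryption
# Store the nonce and tag alongside the ciphertext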
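
The ECB pattern leak is easy to detect from ciphertext alone: count how many 16-byte blocks repeat. The sketch below uses only the standard library. The function name `count_repeated_blocks` is hypothetical, and the sample byte strings are synthetic stand-ins for ciphertext, not real AES output.

```python
from collections import Counter

def count_repeated_blocks(ciphertext: bytes, block_size: int = 16) -> int:
    """Return how many full blocks are duplicates of another block."""
    blocks = [ciphertext[i:i + block_size]
              for i in range(0, len(ciphertext) - block_size + 1, block_size)]
    counts = Counter(blocks)
    return sum(n - 1 for n in counts.values() if n > 1)

# Synthetic example: the first and third blocks are identical, as ECB would
# produce for two identical plaintext blocks.
block_a = bytes(range(16))
block_b = bytes(range(16, 32))
ecb_like = block_a + block_b + block_a

# Synthetic example with three distinct blocks, as a chained or randomized mode
# would produce with overwhelming probability.
distinct = bytes(range(48))

ecb_score = count_repeated_blocks(ecb_like)
distinct_score = count_repeated_blocks(distinct)
```

A nonzero score on a long ciphertext is a strong hint that ECB (or a reused IV) is in play, which is exactly the signal the penguin image makes visible.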

Asymmetric Encryption: RSA and Why Key Size Matters

RSA uses a mathematically linked key pair: a public key that anyone can use to encrypt, and a private key that only you hold to decrypt. Security relies on the difficulty of factoring the product of two large primes. A 512-bit RSA key can be factored in hours on rented cloud hardware. A 2048-bit key is currently considered safe; 4096-bit is paranoid-safe, at a noticeable cost in speed.
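
To see why factoring matters, here is textbook RSA with deliberately tiny primes. This is a sketch for intuition only: the parameters are toy values, there is no padding, and the helper `factor_by_trial_division` is hypothetical. Real RSA uses primes of 1024 bits or more plus a padding scheme such as OAEP.

```python
from math import gcd

# Toy parameters: far too small to be secure, chosen so the numbers stay readable.
p, q = 61, 53
n = p * q                  # public modulus: 3233
phi = (p - 1) * (q - 1)    # 3120, computable only if you know p and q
e = 17                     # public exponent, must be coprime with phi
assert gcd(e, phi) == 1
d = pow(e, -1, phi)        # private exponent: modular inverse of e (Python 3.8+)

message = 65
ciphertext = pow(message, e, n)    # encrypt with the public key (e, n)
decrypted = pow(ciphertext, d, n)  # decrypt with the private key (d, n)

def factor_by_trial_division(modulus):
    """Recover the two prime factors of a small odd modulus by brute force."""
    candidate = 3
    while modulus % candidate != 0:
        candidate += 2
    return candidate, modulus // candidate

# An attacker who can factor n can rebuild the private exponent from public data.
fp, fq = factor_by_trial_division(n)
recovered_d = pow(e, -1, (fp - 1) * (fq - 1))
```

Trial division finishes instantly here. The entire security of a real key rests on that same factoring step being infeasible at 2048 bits.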

# Generate RSA key pair (use at least 2048 bits)
openssl genrsa -out private.pem 2048
openssl rsa -in private.pem -pubout -out public.pem

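# Encrypt file with recipient's public key
# (rsautl is deprecated since OpenSSL 3.0; pkeyutl with OAEP padding replaces it)
openssl pkeyutl -encrypt -pubin -inkey public.pem -pkeyopt rsa_padding_mode:oaep -in message.txt -out message.enc

# Decrypt with private key
openssl pkeyutl -decrypt -inkey private.pem -pkeyopt rsa_padding_mode:oaep -in message.enc -out decrypted.txt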

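# RSA is slow and can only encrypt data smaller than its key size -- never use it for large data directly
# Instead: encrypt a random AES session key with RSA, use AES for the data (hybrid encryption)
# Older TLS cipher suites exchanged keys this way; TLS 1.3 instead agrees on the session key
# with ECDHE and uses the certificate's key only to sign the handshake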

Hashing: One-Way Functions and How They Fail

# MD5 -- BROKEN for security purposes (collision attacks exist)
# SHA-1 -- BROKEN for security purposes (collision demonstrated 2017)
# SHA-256 -- SAFE for data integrity
# SHA-3 -- SAFE, different design (Keccak sponge construction)
# bcrypt/scrypt/Argon2 -- REQUIRED for password storage (slow by design)

# Why a slow password hash (bcrypt, scrypt, Argon2), not SHA-256?
# SHA-256 of "password123" takes about a microsecond on a CPU
# Argon2 of "password123" takes about 0.3 seconds (configurable)
# An attacker with a GPU can try SHA-256 at roughly 10 billion guesses/second
# At 0.3 seconds per hash, Argon2 holds each attacker core to ~3 guesses/second,
# and its memory cost blunts GPU speedups -- game changer

import bcrypt
# Hash a password
hashed = bcrypt.hashpw(b"mypassword", bcrypt.gensalt(rounds=12))

# Verify (checkpw re-hashes with the stored salt and compares in constant time,
# which prevents timing attacks; never compare hashes with ==)
is_valid = bcrypt.checkpw(b"mypassword", hashed)
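
The same slow-by-design idea is available in Python's standard library through `hashlib.scrypt`. The sketch below assumes a Python build linked against OpenSSL 1.1 or later. The cost parameters are illustrative rather than a tuned recommendation, and the function names are hypothetical.

```python
import hashlib
import hmac
import os

# Illustrative cost parameters (about 16 MiB of memory per hash), not tuned advice.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32, "maxmem": 64 * 1024 * 1024}

def hash_password(password: str):
    """Return (salt, derived_key) for a new password."""
    salt = os.urandom(16)  # unique per password, stored alongside the hash
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, key

def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(key, expected_key)

salt, stored = hash_password("mypassword")
ok = verify_password("mypassword", salt, stored)
rejected = verify_password("not-the-password", salt, stored)
```

Raising `n` increases both time and memory per guess, which is the knob that makes large-scale cracking expensive.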

# Length extension attack on SHA-256 (CTF/real-world relevance):
# If MAC = SHA256(secret || message), an attacker who knows the MAC and the secret's length
# can forge a valid MAC for message || padding || extra_data without knowing the secret
# Fix: use a real HMAC: HMAC = SHA256((key XOR opad) || SHA256((key XOR ipad) || message))
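
The HMAC formula can be checked against Python's `hmac` module. This sketch builds HMAC-SHA256 by hand purely for illustration. The key and message are made-up values, `hmac_sha256_manual` is a hypothetical helper, and production code should call `hmac.new` directly.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 processes input in 64-byte blocks

def hmac_sha256_manual(key: bytes, message: bytes) -> bytes:
    """Compute HMAC-SHA256 step by step."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()   # long keys are hashed first
    key = key.ljust(BLOCK_SIZE, b"\x00")     # then zero-padded to the block size
    inner_pad = bytes(b ^ 0x36 for b in key)  # key XOR ipad
    outer_pad = bytes(b ^ 0x5C for b in key)  # key XOR opad
    inner = hashlib.sha256(inner_pad + message).digest()
    return hashlib.sha256(outer_pad + inner).digest()

key = b"server-side secret"
message = b"user=alice&role=user"

manual_tag = hmac_sha256_manual(key, message)
library_tag = hmac.new(key, message, hashlib.sha256).digest()

# The naive construction that is vulnerable to length extension, for contrast.
naive_tag = hashlib.sha256(key + message).digest()

# Always compare tags in constant time.
tags_match = hmac.compare_digest(manual_tag, library_tag)
```

The outer hash is what defeats length extension: the attacker never sees the inner digest, so there is no hash state to extend.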

TLS: How HTTPS Actually Works

Every HTTPS connection performs a TLS handshake. Understanding this handshake explains why certificate pinning matters, why expired certificates cause outages, and why cipher suite choices affect security:

  1. Client sends: supported TLS versions and cipher suites
  2. Server responds: chosen cipher suite + its certificate (public key), which the client validates against its trusted CAs
  3. Key exchange: client and server agree on a session key (using ECDHE for forward secrecy)
  4. All further communication encrypted with AES-GCM using the session key
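
The key agreement in step 3 can be sketched with classic finite-field Diffie-Hellman. This is a stand-in for intuition only: real TLS uses elliptic-curve groups such as X25519 and a proper key derivation schedule, while the parameters `P` and `G` below are toy values and the function names are hypothetical.

```python
import hashlib
import secrets

# Toy public parameters: NOT secure, chosen only to keep the example short.
P = 2**127 - 1  # a Mersenne prime
G = 5

def generate_keypair():
    """Return (private, public) where public = G^private mod P."""
    private = secrets.randbelow(P - 3) + 2   # random value in [2, P-2]
    return private, pow(G, private, P)

def derive_session_key(own_private: int, peer_public: int) -> bytes:
    """Combine our private value with the peer's public value, then hash."""
    shared = pow(peer_public, own_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()

# Each side generates a fresh (ephemeral) key pair for this connection only.
client_private, client_public = generate_keypair()
server_private, server_public = generate_keypair()

# Only the public values cross the network; both sides derive the same key.
client_key = derive_session_key(client_private, server_public)
server_key = derive_session_key(server_private, client_public)
```

Because the private values are discarded after the handshake, recording the traffic and later stealing the server's certificate key does not reveal the session key. That property is forward secrecy.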
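
# Analyze TLS configuration of a server
# (</dev/null closes stdin so s_client exits after the handshake instead of waiting for input)
openssl s_client -connect plainlysec.com:443 -showcerts </dev/null 2>/dev/null | head -40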

# Check for weak cipher suites with testssl.sh
./testssl.sh plainlysec.com

# What to look for:
# - TLS 1.3 or 1.2 only (disable TLS 1.0 and 1.1)
# - ECDHE key exchange (forward secrecy)
# - AES-GCM or ChaCha20-Poly1305 (authenticated encryption)
# - No RC4, DES, 3DES (broken or deprecated ciphers)
# - Certificate uses SHA-256 signature (not SHA-1)
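
The first items on this checklist can also be checked from Python's standard `ssl` module. The sketch below builds a client context that refuses TLS 1.0 and 1.1, then reports what a server actually negotiates. The function names are hypothetical, and `describe_connection` needs network access, so it is defined here but not called.

```python
import socket
import ssl

def make_strict_context() -> ssl.SSLContext:
    """Client context that verifies certificates and refuses TLS 1.0/1.1."""
    context = ssl.create_default_context()   # certificate and hostname checks on
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context

def describe_connection(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Connect to host and report the negotiated protocol version and cipher suite."""
    context = make_strict_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            name, _protocol, bits = tls.cipher()
            return {"version": tls.version(), "cipher": name, "bits": bits}

context = make_strict_context()
```

Calling `describe_connection("plainlysec.com")` against a live server returns the negotiated version and cipher suite name. A handshake failure with this context suggests the server only offers protocol versions that should be disabled.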

Cryptography does not fail because the mathematics breaks — it fails because developers misuse it. The wrong mode, the wrong hash function for passwords, the hardcoded key, the reused nonce — these implementation errors are the real threat. Understanding the principles behind the algorithms helps you recognize when code is making a dangerous choice.