HARD Lab · Strings · 18

271. Encode and Decode Strings

Serialise an array of strings into a single string and losslessly recover it — the trick is length-prefixing each word as <len>#<str>, making the encoding unambiguous even when strings contain any character.

MediumDesignLength PrefixTypeScript

PROBLEM What we're solving

Design an encoder that turns a list of strings into a single string, and a decoder that perfectly recovers the original list — no external schema, no length info passed separately. The encode/decode pair must survive any input, including empty strings and strings that themselves contain the delimiter character.

Example: ["neet", "code", "love", "you"] → some single string → back to ["neet", "code", "love", "you"].

With length-prefix encoding the wire form is "4#neet4#code4#love3#you" — completely unambiguous to parse.

KEY IDEA Tell the decoder how long each string is before writing it

Insight → a plain delimiter like | breaks the moment a string contains that character. Instead, prefix every string with its length followed by a fixed separator (#). The decoder reads the number, skips the #, then slices exactly that many characters — no guessing, no ambiguity, no matter what the string contains.

RECIPE Encode: prefix every string; Decode: read length, slice, advance

  • Encode — for each string s: append ${s.length}#${s} to the output. The length is a variable-width decimal, # is the fixed boundary marker. No separator between chunks — each chunk is already self-delimiting.
  • Decode — pointer i = 0; loop while i < s.length: scan forward from i to find the # (at position j). Parse the integer in s[i..j]. Slice s[j+1 .. j+1+len]. Advance i = j + 1 + len.
  • Why indexOf('#', i) is safe: the search starts at i, where we know the length digits begin. The very first #we hit from there is always the delimiter — even if the string's content later contains #, those are already past the boundary we just located.
Classic confusion →people try a single-character delimiter (comma, pipe, even null-byte) and then wonder whether strings can contain that character. They can — and escaping it adds complexity. Length-prefixing sidesteps all of this: the decoder never scans the string's content for a boundary; it counts chars.

COST Complexity & alternatives

Delimiter + escape
O(n·k)
Must scan every char to escape; fragile.
Length-prefix
O(n·k)
Same time but zero fragility; O(1) extra space per string.

Both are O(n · k) where n = number of strings and k = average length, but length-prefix never touches each character more than once per direction. Total space is O(n · k) for the output, plus O(log k) bytes of overhead per string for the decimal prefix — negligible in practice.

Pattern transfer → length-prefixing is the backbone of binary serialisation formats: Protocol Buffers use varint-prefixed byte strings, netstringis exactly this scheme, and HTTP "chunked transfer encoding" prefixes each chunk's byte count. Anywhere you need to delimit variable-length fields in a flat stream, reach for a length header.

RUN IT Encode with length-prefix, decode by counting

step 0 / 19
STARTEncoding 4 strings. Each string is prefixed as <len>#<str> so the decoder always knows exactly how many chars to read.
State
encode
phase
i
token
length prefix being written / pointer positionstring content / extracted wordcomplete / donedecode phase
slowfast

TYPESCRIPT The solution, annotated

encodeDecodeStrings.ts
class Codec {
  encode(strs: string[]): string {
    // Prefix each string with "<length>#" so the decoder knows
    // exactly how many characters to consume — no delimiter ambiguity.
    return strs.map(s => `${s.length}#${s}`).join('');
  }

  decode(s: string): string[] {
    const result: string[] = [];
    let i = 0;
    while (i < s.length) {
      // Find the '#' that marks end of the length number.
      const j = s.indexOf('#', i);
      const len = parseInt(s.slice(i, j), 10);
      // Slice exactly 'len' chars after the '#'.
      result.push(s.slice(j + 1, j + 1 + len));
      i = j + 1 + len;
    }
    return result;
  }
}

Reading it block by block

Encode — one-liner map. strs.map(s => `${s.length}#${s}`).join('') writes the decimal length, a literal #, then the string content for every element, all concatenated with no separator between chunks — each chunk is self-delimiting.
Decode — pointer loop. Start i = 0. Each iteration: indexOf('#', i) finds the boundary at j.parseInt(s.slice(i, j)) reads the length. Because we start the search at i (the first digit of the number), we always land on the correct # regardless of what the string content contains.
Slice and advance. s.slice(j + 1, j + 1 + len) extracts exactly len characters after the #. Setting i = j + 1 + len moves the pointer to the first digit of the next chunk. The loop terminates when i reaches the end of the encoded string.
Complexity → O(n · k) time for both encode and decode — each character is visited exactly once. Space is O(n · k) for the output string; no auxiliary structures beyond the pointer.

INTERVIEWFollow-ups they'll ask

  • "What if a string is empty?" Works perfectly — an empty string encodes as 0# and the decoder reads length 0, slices zero characters, and pushes "".
  • "Can the length prefix itself grow large?" For a string of length 10 000 the prefix is "10000#" — five decimal digits. You could switch to a fixed 4-byte binary int (like most binary protocols do) to keep overhead constant, but that makes the wire format non-printable.
  • "Why not just use JSON.stringify?" You could, and JSON.stringify / JSON.parse works. The point of this problem is to understand how delimited serialisation actually works under the hood — JSON itself uses length-known string encoding internally.
  • "What's the brute force?"Join with a rare delimiter and hope strings never contain it. It's wrong in the general case — a test with "a|b" breaks a |-delimited scheme.
  • "How does this relate to real protocols?"Netstrings, Protocol Buffers wire format, Redis's RESP protocol — all use length-prefixed byte strings. Chunked HTTP transfer encoding does the same with hex-length prefixes.

MNEMONIC The one-liner

"Write the length, write a hash, write the word — decode by counting, never by scanning for delimiters."

TRIGGERS When you see ___ → reach for ___

"encode array of strings into one string"length-prefix: `${len}#${str}` per element
delimiter would break if string contains that charswitch to length-prefix — content is irrelevant
decode: find boundary in encoded streamindexOf("#", ptr) then parseInt, then slice(j+1, j+1+len)
serialize variable-length fields (any language)length header + payload (netstring / protobuf pattern)

SKELETON The reusable shape

skeleton.ts
class Codec {
  encode(strs: string[]): string {
    return strs.map(s => `${s.length}#${s}`).join('');
  }

  decode(s: string): string[] {
    const result: string[] = [];
    let i = 0;
    while (i < s.length) {
      const j = s.indexOf('#', i);
      const len = parseInt(s.slice(i, j), 10);
      result.push(s.slice(j + 1, j + 1 + len));
      i = j + 1 + len;
    }
    return result;
  }
}

FLASHCARDS Tap to flip

Why does a plain delimiter (e.g. "|") fail?
A string containing that delimiter character would be split incorrectly during decode.
tap to flip
1 / 7
Answer to reveal explanations. No penalty for retries.Score: 0 / 7
QUESTION 01
What does encode(["neet", "code"]) produce?
QUESTION 02
Why can't you just join with a delimiter character like "|"?
QUESTION 03
During decode, you are at pointer i = 7 in the string "4#neet3#abc". What do you do first?
QUESTION 04
What is the time complexity of encode and decode?
QUESTION 05
How is the empty string "" encoded?
QUESTION 06
A string in the list is "5#hello" (it literally contains #). How does the decoder handle it without corruption?
QUESTION 07
Which real-world protocol most directly mirrors this scheme?