Serialise an array of strings into a single string and losslessly recover it — the trick is length-prefixing each word as <len>#<str>, making the encoding unambiguous even when strings contain any character.
Design an encoder that turns a list of strings into a single string, and a decoder that perfectly recovers the original list — no external schema, no length info passed separately. The encode/decode pair must survive any input, including empty strings and strings that themselves contain the delimiter character.
Example: ["neet", "code", "love", "you"] → some single string → back to ["neet", "code", "love", "you"].
With length-prefix encoding the wire form is "4#neet4#code4#love3#you" — completely unambiguous to parse.
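One way to check that the wire form parses unambiguously is to trace the pointer. The sketch below uses a hypothetical helper, `chunkOffsets`, that follows the same decoding loop. For each chunk it records the start index, the position of its `#`, the parsed length, and the payload. The second wire string is the encoding of `["3#x", "", "12"]`, chosen because its content contains both `#` and digits.

```typescript
// Hypothetical helper for illustration: walk a length-prefixed wire string
// and record, for each chunk, where it starts, where its '#' sits,
// the parsed length, and the payload that length selects.
interface Chunk {
  start: number;   // index of the first length digit
  hash: number;    // index of the '#' that ends the length
  len: number;     // parsed decimal length
  payload: string; // exactly `len` characters after the '#'
}

function chunkOffsets(wire: string): Chunk[] {
  const chunks: Chunk[] = [];
  let i = 0;
  while (i < wire.length) {
    const j = wire.indexOf('#', i);
    const len = parseInt(wire.slice(i, j), 10);
    chunks.push({ start: i, hash: j, len, payload: wire.slice(j + 1, j + 1 + len) });
    i = j + 1 + len; // jump straight to the next chunk's first digit
  }
  return chunks;
}

for (const wire of ['4#neet4#code4#love3#you', '3#3#x0#2#12']) {
  for (const c of chunkOffsets(wire)) {
    console.log(`start=${c.start} hash=${c.hash} len=${c.len} payload="${c.payload}"`);
  }
}
// start=0 hash=1 len=4 payload="neet"
// start=6 hash=7 len=4 payload="code"
// start=12 hash=13 len=4 payload="love"
// start=18 hash=19 len=3 payload="you"
// start=0 hash=1 len=3 payload="3#x"
// start=5 hash=6 len=0 payload=""
// start=7 hash=8 len=2 payload="12"
```

In the second trace, the `#` inside `"3#x"` (index 3) is never searched. The pointer jumps from 0 straight to 5.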
A naive delimiter such as | breaks the moment a string contains that character. Instead, prefix every string with its length followed by a fixed separator (#). The decoder reads the number, skips the #, then slices exactly that many characters. There is no guessing and no ambiguity, no matter what the string contains.

Encode: for each string s, append `${s.length}#${s}` to the output. The length is a variable-width decimal, and # is the fixed boundary marker. No separator is needed between chunks, because each chunk is already self-delimiting.

Decode: set i = 0 and loop while i < s.length. Scan forward from i to find the # (at position j). Parse the integer `s.slice(i, j)` as len, then take `s.slice(j + 1, j + 1 + len)`. Finally, advance i = j + 1 + len.

Using `indexOf('#', i)` is safe because the search starts at i, where the length digits begin. Digits never include #, so the first # found from there is always the delimiter. Any # inside the string's content lies past that boundary. It is consumed as part of the len-character slice and is never searched.

Complexity: encoding and decoding each take O(n · k) time, where n is the number of strings and k is their average length. Each character is written once when encoding and sliced once when decoding, with no escaping pass. The output needs O(n · k) space, plus O(log k) characters of overhead per string for the decimal prefix and its #. That overhead is negligible in practice.
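To make the delimiter failure concrete, here is a small sketch that contrasts a naive |-join with the length prefix. The helpers `naiveEncode`, `naiveDecode`, and `prefixed` are hypothetical names used only for illustration.

```typescript
// Hypothetical naive scheme, shown only for contrast: join and split on '|'.
function naiveEncode(strs: string[]): string {
  return strs.join('|');
}

function naiveDecode(s: string): string[] {
  return s.split('|');
}

// Length-prefix encoder as described in the text, inlined so this sketch is self-contained.
function prefixed(strs: string[]): string {
  return strs.map(s => `${s.length}#${s}`).join('');
}

// Two different inputs collapse to the same naive wire form "a|b",
// so no decoder can recover which one was sent.
console.log(naiveEncode(['a|b']) === naiveEncode(['a', 'b'])); // true
console.log(naiveDecode(naiveEncode(['a|b'])).length);         // 2, but the input had 1 string

// With length prefixes the two inputs stay distinct.
console.log(prefixed(['a|b']));    // 3#a|b
console.log(prefixed(['a', 'b'])); // 1#a1#b
```

The naive scheme also cannot tell `[]` from `[""]`, because both join to the empty string. The length-prefixed form encodes them as `""` and `"0#"` respectively.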
Real protocols use the same idea. The netstring format is essentially this scheme, written as `<len>:<str>,` with a colon and a trailing comma. HTTP chunked transfer encoding prefixes each chunk with its byte count in hexadecimal. Anywhere you need to delimit variable-length fields in a flat stream, reach for a length header.

```typescript
class Codec {
  encode(strs: string[]): string {
    // Prefix each string with "<length>#" so the decoder knows
    // exactly how many characters to consume: no delimiter ambiguity.
    return strs.map(s => `${s.length}#${s}`).join('');
  }

  decode(s: string): string[] {
    const result: string[] = [];
    let i = 0;
    while (i < s.length) {
      // Find the '#' that marks the end of the length number.
      const j = s.indexOf('#', i);
      const len = parseInt(s.slice(i, j), 10);
      // Slice exactly `len` chars after the '#'.
      result.push(s.slice(j + 1, j + 1 + len));
      i = j + 1 + len;
    }
    return result;
  }
}
```

Encode: ``strs.map(s => `${s.length}#${s}`).join('')`` writes the decimal length, a literal #, then the string content for every element. The chunks are concatenated with no separator between them, because each chunk is self-delimiting.

Decode: start at i = 0. On each iteration, `indexOf('#', i)` finds the boundary at j, and `parseInt(s.slice(i, j), 10)` reads the length. The search starts at i, the first digit of the number, so it always lands on the correct # regardless of what the string content contains. Then `s.slice(j + 1, j + 1 + len)` extracts exactly len characters after the #. Setting `i = j + 1 + len` moves the pointer to the first digit of the next chunk. The loop terminates when i reaches the end of the encoded string.

Empty strings: "" encodes as 0#. The decoder reads length 0, slices zero characters, and pushes "".

Long strings: the prefix grows only logarithmically. A 10,000-character string is prefixed with "10000#", which is five decimal digits. You could switch to a fixed 4-byte binary integer (as many binary protocols do) to keep the overhead constant, but that makes the wire format non-printable.

Why not just use JSON? `JSON.stringify` / `JSON.parse` works. The point of this problem is to understand how delimited serialisation works under the hood. JSON takes the other classic route: instead of recording lengths, it wraps each string in quotes and escapes any quote or backslash inside it.

Why not a plain delimiter? Any delimiter character can appear in the data, so "a|b" breaks a |-delimited scheme.

| Signal | Technique |
|---|---|
| "encode array of strings into one string" | length-prefix: `${len}#${str}` per element |
| delimiter would break if string contains that char | switch to length-prefix — content is irrelevant |
| decode: find boundary in encoded stream | indexOf("#", ptr) then parseInt, then slice(j+1, j+1+len) |
| serialize variable-length fields (any language) | length header + payload (netstring / protobuf pattern) |
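The edge-case note about switching to a fixed 4-byte binary length can be sketched as follows. This is a minimal sketch under stated assumptions, not a reference format.

- The names `encodeBinary` and `decodeBinary` are hypothetical.
- Headers are unsigned 32-bit big-endian integers, written with the standard `DataView`.
- Lengths count UTF-8 bytes produced by `TextEncoder`, not UTF-16 code units as `s.length` does.
- Malformed input is not validated.

```typescript
// Sketch: length-prefixing with a fixed 4-byte big-endian header instead of
// a decimal "<len>#" prefix. Hypothetical names; assumes every string's UTF-8
// byte length fits in an unsigned 32-bit integer and that decodeBinary only
// receives output from encodeBinary (no validation of malformed input).

function encodeBinary(strs: string[]): Uint8Array {
  const encoder = new TextEncoder();
  // Headers count UTF-8 bytes, not UTF-16 code units as s.length does.
  const payloads = strs.map(s => encoder.encode(s));
  const total = payloads.reduce((sum, p) => sum + 4 + p.length, 0);
  const out = new Uint8Array(total);
  const view = new DataView(out.buffer);
  let offset = 0;
  for (const p of payloads) {
    view.setUint32(offset, p.length); // big-endian unless littleEndian is passed
    out.set(p, offset + 4);           // payload follows its 4-byte header
    offset += 4 + p.length;
  }
  return out;
}

function decodeBinary(bytes: Uint8Array): string[] {
  const decoder = new TextDecoder();
  // Respect byteOffset in case `bytes` is a view into a larger buffer.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const result: string[] = [];
  let offset = 0;
  while (offset < bytes.length) {
    const len = view.getUint32(offset); // fixed-width header: no scanning for '#'
    result.push(decoder.decode(bytes.subarray(offset + 4, offset + 4 + len)));
    offset += 4 + len;
  }
  return result;
}

const packed = encodeBinary(['neet', '', 'héllo#5']);
console.log(packed.length);        // 24: three 4-byte headers + 4 + 0 + 8 payload bytes
console.log(decodeBinary(packed)); // round-trips to the original array
```

Every string now costs exactly 4 header bytes, whether it is empty or 10,000 characters long. The price is a wire format you can no longer read by eye.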
Quick check:

- What does `encode(["neet", "code"])` produce?
- Suppose the pointer is at `i = 7` in the string "4#neet3#abc". What do you do first?
- One input string is "5#hello" (it literally contains #). How does the decoder handle it without corruption?