Generate Parentheses: Backtracking, DP Intuition, and Pitfalls
O(4^n / sqrt(n))O(n)Write-up
Problem Summary
Youâre given n pairs of parentheses and asked to generate every string of length 2n that forms a valid parenthesis expression. Valid means parentheses are balanced and properly nested. The output should contain all unique strings, order doesnât matter.
Constraints are small but produce large outputs. n is in [1, 8]. The number of answers for n grows as the nth Catalan number, which is roughly 4^n / (n^(3/2) sqrt(pi)). Even for n = 8, there are 1430 answers.
My Approach
I like to build this family of strings using the classic Catalan recurrence. Any valid string on n pairs can be decomposed uniquely into an opening parenthesis, then a valid string on i pairs, then a closing parenthesis, then a valid string on the remaining nâ1âi pairs. If we call the set of strings on k pairs G(k), the constructive recurrence is:
- For i from 0 to nâ1, for each left in G(i) and right in G(nâ1âi), produce "(" + left + ")" + right.
This is the same structural decomposition as the standard Catalan recurrence used for counting. Here, we use it to generate. It guarantees correctness and avoids duplicates by construction. With memoization, we avoid recomputing the same sets.
Why pick this over a naĂŻve recursive concatenation of smaller answers with a set to deduplicate? Two reasons stand out.
- Uniqueness is built in. There is exactly one way a valid stringâs first matching pair wraps an inside part and follows with an outside part, so no duplicates need to be removed via a set.
- Performance is better. We donât waste time and memory forming duplicate strings and then hashing them into a set. We also avoid repeated subproblem work via memoization.
Thereâs an equally good approach that many people find more intuitive: backtracking with two counters, open and close, tracking how many parentheses you can still place while never allowing more ')' than '(' at any prefix. Both methods end up with the same asymptotic complexity dominated by the output size, and both are easy to implement.
Why This Works
A correctness argument falls out of the grammar of balanced parentheses.
- Every non-empty valid string starts with '('. Let that '(' match its corresponding ')'. Everything between them must be a valid string, call it inside. Everything after that closing ')' must also be a valid string, call it outside. The total pairs split as n = 1 + i + j, with i pairs inside and j pairs outside, where j = nâ1âi.
- This decomposition is unique. The first pair to close is determined by the position where the running balance returns to zero for the first time. That gives a canonical split into inside and outside.
- The recurrence G(n) = union over i in [0, nâ1] of "(" + G(i) + ")" + G(nâ1âi) therefore generates all valid strings exactly once each.
Induction on n completes the proof. The base is G(0) = {""}. Assuming G(k) correct and duplicate-free for k < n, then G(n) formed by wrapping every left in G(i) and following it with every right in G(nâ1âi) yields exactly the set of all valid strings of size n, without duplicates, because each valid stringâs unique inside/outside split is represented once in this construction.
The backtracking approach has a different but equally simple invariant. You build strings left to right while maintaining counts open_used and close_used. You may add '(' if open_used < n. You may add ')' if close_used < open_used. Those two rules guarantee every prefix remains valid, and when the string reaches length 2n you have a valid solution. This search explores exactly the Catalan-shaped tree.
Code Walkthrough
Below is the provided solution, which mixes recursive generation with a set for deduplication and a map intended for memoization.
C++
1 class Solution { 2 public: 3 // Dictionary (map) to cache the results for already computed n 4 unordered_map<int, vector<string>> dp; 5 6 // Function to generate well-formed parentheses combinations 7 vector<string> generateParenthesis(int n) { 8 // Base case: only one way to arrange one pair of parentheses 9 if (n == 1) return {"()"}; 10 11 // Unordered set to avoid duplicate combinations 12 unordered_set<string> s; 13 14 // Recursive call to get combinations for n-1 15 auto child = generateParenthesis(n-1); 16 // Wrap each child combination with parentheses 17 for (auto & c : child) { 18 s.insert("(" + c + ")"); 19 } 20 21 // Generate combinations by splitting into pairs 22 for (int i = 1; i < n; ++i) { 23 // Get combinations for the left and right subproblems 24 auto left = generateParenthesis(i); 25 auto right = generateParenthesis(n-i); 26 27 // Form new combinations by concatenating left and right pairs 28 for (auto &l : left) { 29 for (auto &r : right) { 30 s.insert(l + r); 31 } 32 } 33 } 34 35 // Cache the result of the computed n combinations 36 dp[n] = vector<string>(s.begin(), s.end()); 37 return dp[n]; // Return resulting combinations 38 } 39 };
What itâs doing
- It tries to assemble G(n) from smaller answers in two ways: wrapping every element of G(nâ1) and also concatenating G(i) with G(nâi) for all i in [1, nâ1]. The set s removes duplicates.
- It stores the final vector in dp[n], so future calls might reuse it.
Where it can be improved
- Missing memoization lookups. The dp map is populated but never used to short-circuit. That causes a lot of recomputation.
- Duplicate generation. The construction "wrap G(nâ1)" plus "concatenate G(i) with G(nâi)" doesnât match the unique Catalan split, so many strings are formed multiple times and then filtered by the set.
- Base cases. Generalizing to G(0) = {""} is cleaner and makes the recurrence uniform. The current base of n == 1 works under the given constraints but requires special treatment for n == 0 if ever needed.
- Complexity claims. Time is dominated by the number of outputs, which is Catalan C_n. But with unnecessary recomputation and deduplication via a set, practical performance suffers compared to a clean Catalan DP or backtracking.
A refined DP implementation using the Catalan split is below. It memoizes results for every k and never constructs duplicates. This is the exact generative version of the Catalan recurrence.
C++
1 class Solution { 2 public: 3 vector<vector<string>> memo; // memo[k] holds all strings with k pairs 4 5 vector<string> generateParenthesis(int n) { 6 memo.assign(n + 1, {}); 7 memo[0] = {""}; // empty string is the only valid string with 0 pairs 8 for (int pairs = 1; pairs <= n; ++pairs) { 9 vector<string> cur; 10 for (int i = 0; i < pairs; ++i) { 11 // inside has i pairs, outside has pairs-1-i pairs 12 for (const string& inside : memo[i]) { 13 for (const string& outside : memo[pairs - 1 - i]) { 14 cur.push_back("(" + inside + ")" + outside); 15 } 16 } 17 } 18 memo[pairs] = move(cur); 19 } 20 return memo[n]; 21 } 22 };
Key points in this refined version
- The base case is memo[0] = {""}. That makes the inner loop produce "()" when i = 0 and outside = "".
- The double loop follows the exact Catalan split. For each i, it wraps an inside value then appends an outside value.
- No sets are needed. The construction is unique, so there are no duplicates by design.
- It runs in O(C_n * n) time and O(C_n * n) space for results. Each string has length 2n, so writing them dominates.
If you prefer the backtracking approach, here is a clean version with a small state and strong invariant checks.
C++
1 class Solution { 2 public: 3 vector<string> res; 4 5 void dfs(string& cur, int open_used, int close_used, int n) { 6 if (open_used == n && close_used == n) { 7 res.push_back(cur); 8 return; 9 } 10 if (open_used < n) { 11 cur.push_back('('); 12 dfs(cur, open_used + 1, close_used, n); 13 cur.pop_back(); 14 } 15 if (close_used < open_used) { 16 cur.push_back(')'); 17 dfs(cur, open_used, close_used + 1, n); 18 cur.pop_back(); 19 } 20 } 21 22 vector<string> generateParenthesis(int n) { 23 string cur; 24 cur.reserve(2 * n); 25 dfs(cur, 0, 0, n); 26 return res; 27 } 28 };
Highlights of the backtracking version
- The rule close_used < open_used ensures every prefix is valid, so no pruning via sets is required.
- The call tree has size proportional to the Catalan number. Memory usage is small beyond the output itself.
- Reserving 2n space in cur reduces reallocations while constructing strings.
Alternative Approaches
There are a few viable ways to solve this, each with trade-offs.
-
Backtracking with prefix validity checks. This is often the simplest conceptually. Build strings by adding '(' when you still can and ')' only when it wonât make the prefix invalid. Pick this when you want concise code and predictable performance that scales with the number of outputs. It also avoids the overhead of storing all subproblems like a DP might if you only need a one-pass generator.
-
Catalan DP generation (constructive recurrence). This is my default when I want to be explicit about the structure and sometimes when I need deterministic grouping or ordering. It avoids duplicates by construction and often reads clearly when youâre thinking in terms of the inside/outside decomposition.
-
Counting via Catalan numbers. If the task was only to count how many valid strings exist for a given n, you can compute the nth Catalan number using dynamic programming on counts or via the closed-form with care for integer overflow. This doesnât generate strings but is useful for reasoning about complexity and for variations that only need the count.
-
Iterative BFS-style generation. You can maintain states by (open_used, close_used, string) in a queue and expand level by level. It mirrors the backtracking search but uses an explicit queue instead of recursion. Use it when recursion depth might be a concern or when you want to process in increasing length order for streaming.
-
Grammar-based generators and yield-once streams. If you need to stream results lazily without storing them all at once, a generator that yields as soon as it forms a full string is effective. Backtracking naturally supports this; DP can be adapted with coroutines but is more awkward.
Pitfalls and Edge Cases
This problem looks friendly, but a few patterns can derail performance or correctness.
-
Missing memoization checks. If you attempt a DP and never check the cache before recursing, youâll recompute the same sets repeatedly. Add an early return for cached values or build bottom-up.
-
Duplicate generation with a set for cleanup. If your construction can create the same string through multiple paths, a set will fix correctness but at a performance cost. Worst case, you build and hash the same long string many times before deduplication. Prefer a generation rule thatâs unique by design.
-
Incorrect split logic. The correct Catalan recurrence is "(" + G(i) + ")" + G(nâ1âi) over i in [0, nâ1]. Replacing that with arbitrary concatenations G(i) + G(nâi) produces invalid strings as well as duplicates, unless you further filter, which slows it down.
-
Skipping the n == 0 base. While this code doesnât require n == 0 because constraints start at 1, many variants or unit tests pass n == 0. Setting G(0) to the empty string smooths the recurrence and reduces special cases.
-
Output-size blind spots. The number of outputs is large, even at n = 8. Any asymptotic analysis should acknowledge that O(C_n * n) is the true lower bound because you must at least write every character of every output.
-
String concatenation overhead. Repeatedly creating new strings with operator+ is fine at this scale, but for bigger problems you might want to use a single mutable buffer per recursion path and push/pop characters, as in the backtracking version.
-
Ordering assumptions. LeetCode doesnât require a specific order for the results. If you rely on a set, your output order may look arbitrary. If you or a test harness expects a particular order, choose backtracking or a deterministic DP loop order.
-
Stack depth. Recursion depth can reach 2n in backtracking and n in DP. On typical platforms n <= 8 is safe, but itâs good practice to be aware of limits if you repurpose the technique.
Tricky inputs to think through
- n = 1. The only answer is ["()"]. Your base cases should yield this without extra work.
- n = 2. The answers are ["()()", "(())"]. A wrong split that uses G(1) + G(1) without wrapping will create duplicates and invalid sequences.
- n = 3. Confirm you get exactly 5 answers and none are duplicates.
Further Thinking
Here are several related problems and variations that use the same ideas.
-
Unique Binary Search Trees. Count or generate all BSTs with values 1..n. Hint: identical Catalan recurrence with left subtree size i and right subtree size nâ1âi.
-
Remove Invalid Parentheses. Given a string with parentheses and letters, remove the minimal number of invalid parentheses to make it valid and return all results. Hint: backtracking or BFS over deletion count while maintaining prefix validity.
-
Generate Parentheses with Multiple Types. Suppose you have n pairs of each type '()', '[]', and '{}'. Generate all well-formed strings. Hint: extend the backtracking state to track remaining counts per type and a stack to ensure correct nesting.
-
Dyck Paths of Length 2n. Generate all sequences of 'U' and 'D' steps that never drop below zero and end at zero. Hint: map '(' to 'U' and ')' to 'D'; itâs the same Catalan object.
-
Balanced Binary Words. Generate all binary strings of length 2n where every prefix has ones at least as many as zeros, and the total ones equal the total zeros. Hint: reinterpret '(' as '1' and ')' as '0'; reuse the same backtracking guard.
-
Catalan Counting With Modulo. Compute C_n modulo a prime for larger n. Hint: use DP on counts with modular arithmetic or factorials with modular inverses.
-
Serialize and Deserialize All Valid Parenthesis Trees. View each valid string as a rooted ordered tree. Generate the trees and map them to strings. Hint: "(" corresponds to descending to a child; ")" corresponds to stepping back up.
-
Generate Expressions With Constraints. Build expressions with parentheses where operator arity or precedence imposes structure. Hint: use grammar-based generation and ensure composing subexpressions follows feasibility rules similar to the inside/outside split.
The shared theme across these problems is a structure defined by a recursive grammar or a simple prefix invariant. Once you see the invariant, either a constructive DP or a disciplined backtracking search gives a clean, efficient solution.
Conditions
- 1 <= n <= 8
Examples
Example 1
- input: n = 3
- output: ["((()))","(()())","(())()","()(())","()()()"] <strong class="
Example 2
- input: n = 1
- output: ["()"]
Solution
1 class Solution { 2 public: 3 // Memoization map to store previously computed results for different n 4 unordered_map<int, vector<string>> dp; 5 6 // Main function to generate parentheses combinations for n pairs 7 vector<string> generateParenthesis(int n) { 8 // Base case: for 1 pair, the only combination is "()" 9 if (n == 1) return {"()"}; 10 11 // Set to store unique combinations of parentheses 12 unordered_set<string> s; 13 14 // Recursive call to generate combinations for (n-1) pairs and wrap all combinations with parentheses 15 auto child = generateParenthesis(n-1); 16 for (auto & c:child) { 17 s.insert("(" + c + ")"); // Wrap each child combination with outer parentheses 18 } 19 20 // Loop through different partitions of pairs to create various combinations 21 for (int i = 1; i < n; ++i) { 22 // Generate left and right parts by dividing pairs 23 auto left = generateParenthesis(i); 24 auto right = generateParenthesis(n-i); 25 26 // Combine left and right parts in all possible manners 27 for (auto &l:left) { 28 for(auto &r:right) { 29 s.insert(l + r); // Concatenate left and right combinations 30 } 31 } 32 } 33 34 // Store result in dp for memoization 35 dp[n] = vector<string>(s.begin(), s.end()); 36 return dp[n]; // Return the final unique combinations for n pairs 37 } 38 };
Key Points
- âUtilizes recursive backtracking to explore all combinations of parentheses.
- âEmploys memoization (unordered_map) to store results for previously computed n, reducing redundant calculations.
- âCombines smaller solutions to build larger solutions, illustrating a divide and conquer strategy.
Recommended Approaches
O(4^n / sqrt(n))This problem can be solved efficiently using a backtracking approach where you build the string of parentheses incrementally and only add valid combinations. This effectively reduces the search space from all possible combinations to only valid ones.
O(4^n / sqrt(n))Similar to the current approach, you can use a DP table where dp[i] stores the result for 'i' pairs. This allows bottom-up computation using known subproblem results, which can simplify the code by reducing recursive calls.
O(n) for computation, but generating combinations still takes O(4^n / sqrt(n)).This mathematical approach gives the number of well-formed parentheses combinations directly using the nth Catalan number. You can compute Catalan numbers recursively or using a closed formula, bypassing the need to generate combinations.
O(4^n / sqrt(n))An iterative method can also be used where you build the combinations by maintaining a stack structure to simulate the backtracking process. This can be more memory-efficient in some cases.
O(4^n / sqrt(n))By manipulating string indices to represent opening and closing parentheses, you can attempt to fill in combinations without using additional data structures, although this could make the code more complex and less readable.
Key Idea for Similar Problems
Use the Catalan structure to build all valid parentheses strings. A DP formulation composes "(" + left + ")" + right over splits i in [0, n-1], and a backtracking alternative prunes invalid prefixes by tracking how many '(' and ')' have been placed. Both run in output-sensitive time O(Cn * n), with memory dominated by the size of the returned list.
Related Topics
- âdp
- âbacktracking