Data Structures Interview Questions - Computer Science | Skill-Lync Resources

Only 42 Seats Left!

Data Structures Interview Questions

Arrays, linked lists, trees, graphs, hash tables, and advanced data structures

50 Questions
15 Easy
20 Medium
15 Hard
Arrays & Strings Linked Lists Stacks & Queues Trees & Binary Trees Graphs Hash Tables Heaps & Priority Queues Advanced Structures
1

What is the difference between an array and a linked list?

Easy

Arrays store elements in contiguous memory locations with O(1) random access but O(n) insertion/deletion, while linked lists store elements in non-contiguous nodes connected by pointers with O(n) access but O(1) insertion/deletion at known positions. Arrays are better for frequent access operations, while linked lists excel when frequent insertions and deletions are needed.

Subtopic: Arrays & Strings
Relevant for: Software EngineerBackend DeveloperFull Stack Developer
View full answer
2

What is a stack and what are its main operations?

Easy

A stack is a linear data structure following Last-In-First-Out (LIFO) principle where elements are added and removed from the same end called the top. Main operations are push (add element to top), pop (remove top element), peek/top (view top element without removing), and isEmpty (check if stack is empty). All these operations have O(1) time complexity.

Subtopic: Stacks & Queues
Relevant for: Software EngineerBackend DeveloperApplication Developer
View full answer
3

What is a queue and how does it differ from a stack?

Easy

A queue is a linear data structure following First-In-First-Out (FIFO) principle where elements are added at the rear and removed from the front. Unlike a stack (LIFO), a queue processes elements in the order they were added. Main operations are enqueue (add to rear), dequeue (remove from front), front (view first element), and isEmpty, all with O(1) complexity.

Subtopic: Stacks & Queues
Relevant for: Software EngineerBackend DeveloperSystems Developer
View full answer
4

What is a hash table and how does it work?

Easy

A hash table is a data structure that maps keys to values using a hash function to compute an index into an array of buckets or slots. The hash function converts the key into an array index, providing average O(1) time complexity for insertion, deletion, and lookup. Hash tables handle collisions using techniques like chaining (linked lists at each bucket) or open addressing (probing for next available slot).

Subtopic: Hash Tables
Relevant for: Software EngineerBackend DeveloperFull Stack Developer
View full answer
5

What is a binary tree and what are its properties?

Easy

A binary tree is a hierarchical data structure where each node has at most two children, referred to as left and right child. Key properties include: the topmost node is the root, nodes with no children are leaves, the height is the longest path from root to leaf, and a binary tree with n nodes has n-1 edges. Binary trees are used for hierarchical data representation, searching, and expression evaluation.

Subtopic: Trees & Binary Trees
Relevant for: Software EngineerBackend DeveloperAlgorithm Developer
View full answer
Get IIT Jammu PG Certification
IIT Certified

Get IIT Jammu PG Certification

Master these concepts with 175+ hours of industry projects and hands-on training.

6

What is the difference between singly and doubly linked lists?

Easy

A singly linked list has nodes with data and a pointer to the next node only, allowing traversal in one direction. A doubly linked list has nodes with pointers to both next and previous nodes, enabling bidirectional traversal. Doubly linked lists use more memory but allow O(1) deletion when you have a reference to the node, while singly linked lists require O(n) to find the previous node.

Subtopic: Linked Lists
Relevant for: Software EngineerBackend DeveloperSystems Developer
View full answer
7

What is a Binary Search Tree (BST) and what property does it maintain?

Easy

A Binary Search Tree is a binary tree where for each node, all values in its left subtree are smaller and all values in its right subtree are larger (BST property). This ordering enables efficient searching, insertion, and deletion with O(log n) average time complexity. An in-order traversal of a BST yields elements in sorted order, making it useful for maintaining sorted data dynamically.

Subtopic: Trees & Binary Trees
Relevant for: Software EngineerBackend DeveloperAlgorithm Developer
View full answer
8

What is a graph data structure and what are its components?

Easy

A graph is a non-linear data structure consisting of vertices (nodes) and edges that connect pairs of vertices. Graphs can be directed (edges have direction) or undirected, weighted (edges have values) or unweighted. Components include vertices (V), edges (E), and optionally weights. Graphs represent relationships like social networks, maps, and dependencies, with common representations being adjacency matrix and adjacency list.

Subtopic: Graphs
Relevant for: Software EngineerBackend DeveloperData Engineer
View full answer
9

What is a heap and what are its types?

Easy

A heap is a complete binary tree that satisfies the heap property: in a max-heap, each parent is greater than or equal to its children, and in a min-heap, each parent is less than or equal to its children. Heaps provide O(1) access to the maximum/minimum element and O(log n) insertion and deletion. They are commonly implemented using arrays and are the basis for heap sort and priority queues.

Subtopic: Heaps & Priority Queues
Relevant for: Software EngineerBackend DeveloperAlgorithm Developer
View full answer
10

What are the time complexities for common array operations?

Easy

For arrays: Access by index is O(1), Search (unsorted) is O(n), Search (sorted with binary search) is O(log n), Insertion at end (with space) is O(1), Insertion at beginning/middle is O(n) due to shifting, Deletion is O(n) due to shifting. Arrays excel at random access but are costly for insertions and deletions that require element shifting.

Subtopic: Arrays & Strings
Relevant for: Software EngineerBackend DeveloperFull Stack Developer
View full answer
11

What are the different types of tree traversal?

Easy

Tree traversals visit all nodes systematically: In-order (left, root, right) gives sorted order for BST, Pre-order (root, left, right) is used for creating tree copies, Post-order (left, right, root) is used for deleting trees, and Level-order (breadth-first) visits nodes level by level using a queue. The first three are depth-first traversals using recursion or stack, while level-order uses a queue.

Subtopic: Trees & Binary Trees
Relevant for: Software EngineerBackend DeveloperAlgorithm Developer
View full answer
12

What is a hash collision and how can it be handled?

Easy

A hash collision occurs when two different keys produce the same hash value, mapping to the same index in the hash table. Common handling methods are: Chaining (storing collided elements in a linked list at that index), Open Addressing with Linear Probing (finding next empty slot), Quadratic Probing (using quadratic function to find slots), and Double Hashing (using second hash function). Good hash functions minimize collisions.

Subtopic: Hash Tables
Relevant for: Software EngineerBackend DeveloperSystems Developer
View full answer
13

What is a circular queue and why is it useful?

Easy

A circular queue is a linear data structure that connects the last position back to the first position, forming a circle. Unlike regular queues that waste space after dequeue operations, circular queues reuse freed spaces by wrapping around. This makes them memory-efficient and ideal for buffering (like CPU scheduling, IO buffers) where fixed-size storage with continuous reuse is needed.

Subtopic: Stacks & Queues
Relevant for: Software EngineerSystems DeveloperEmbedded Developer
View full answer
14

What is the difference between adjacency matrix and adjacency list for graphs?

Easy

An adjacency matrix is a 2D array where matrix[i][j] indicates an edge between vertices i and j, using O(V^2) space with O(1) edge lookup but slow for sparse graphs. An adjacency list uses an array of lists where each vertex stores its neighbors, using O(V+E) space with O(degree) edge lookup, efficient for sparse graphs. Matrix suits dense graphs; list suits sparse graphs and most real-world scenarios.

Subtopic: Graphs
Relevant for: Software EngineerBackend DeveloperAlgorithm Developer
View full answer
15

What is a priority queue and how does it differ from a regular queue?

Easy

A priority queue is an abstract data type where each element has a priority, and elements with higher priority are served before elements with lower priority regardless of their insertion order. Unlike regular queues (FIFO), priority queues process based on priority. They are typically implemented using heaps for O(log n) insertion and O(1) peek of highest priority. Common uses include task scheduling, Dijkstra's algorithm, and event-driven simulation.

Subtopic: Heaps & Priority Queues
Relevant for: Software EngineerBackend DeveloperSystems Developer
View full answer
3,000+ Engineers Placed at Top Companies
Placements

3,000+ Engineers Placed at Top Companies

Join Bosch, Tata Motors, L&T, Mahindra and 500+ hiring partners.

16

How would you implement an LRU (Least Recently Used) cache?

Medium

An LRU cache can be efficiently implemented using a combination of a hash map and a doubly linked list. The hash map provides O(1) access to cache entries, while the doubly linked list maintains the order of usage with most recently used at the head. On access, move the node to the head; on insertion when full, remove the tail node. This gives O(1) for both get and put operations.

Subtopic: Advanced Structures
Relevant for: Software EngineerBackend DeveloperSystems Developer
View full answer
17

How do you detect a cycle in a linked list?

Medium

The optimal approach is Floyd's Cycle Detection (tortoise and hare) algorithm using two pointers: slow moves one step at a time, fast moves two steps. If there's a cycle, they will eventually meet; if fast reaches null, there's no cycle. Time complexity is O(n), space is O(1). To find the cycle start, after detection, move one pointer to head and advance both by one until they meet again.

Subtopic: Linked Lists
Relevant for: Software EngineerBackend DeveloperAlgorithm Developer
View full answer
18

How do you check if a binary tree is balanced?

Medium

A balanced binary tree has the height difference between left and right subtrees of every node at most 1. The optimal O(n) solution uses a bottom-up approach: recursively compute height while checking balance, returning -1 if imbalanced. For each node, get heights of left and right subtrees; if either is -1 or their difference exceeds 1, return -1; otherwise return max(left, right) + 1.

Subtopic: Trees & Binary Trees
Relevant for: Software EngineerBackend DeveloperAlgorithm Developer
View full answer
19

Compare BFS and DFS for graph traversal. When would you use each?

Medium

BFS (Breadth-First Search) explores neighbors level by level using a queue, optimal for finding shortest path in unweighted graphs and level-order traversal. DFS (Depth-First Search) explores as deep as possible before backtracking using recursion/stack, better for topological sorting, cycle detection, and path finding. BFS uses O(V) space for the queue; DFS uses O(h) stack space where h is the maximum depth.

Subtopic: Graphs
Relevant for: Software EngineerBackend DeveloperAlgorithm Developer
View full answer
20

How would you implement a stack using two queues?

Medium

Use two queues q1 and q2. For push: add element to q2, then move all elements from q1 to q2, then swap q1 and q2. For pop: simply dequeue from q1. This makes push O(n) and pop O(1). Alternative approach: make push O(1) and pop O(n) by adding to q1 directly, and for pop, move all except last element to q2, dequeue last, then swap.

Subtopic: Stacks & Queues
Relevant for: Software EngineerBackend DeveloperApplication Developer
View full answer
21

How do you find the Lowest Common Ancestor (LCA) in a binary tree?

Medium

For a general binary tree, use recursion: if current node is null or matches either target, return it. Recursively find LCA in left and right subtrees. If both return non-null, current node is LCA. If only one returns non-null, return that. Time complexity is O(n). For BST, leverage ordering: if both targets are smaller, go left; if both larger, go right; otherwise current node is LCA with O(h) time.

Subtopic: Trees & Binary Trees
Relevant for: Software EngineerBackend DeveloperAlgorithm Developer
View full answer
22

What is the load factor in a hash table and why is it important?

Medium

Load factor is the ratio of number of entries to the number of buckets (n/capacity). It measures how full the hash table is. A higher load factor means more collisions and slower operations; typical threshold is 0.75. When exceeded, the table is resized (usually doubled) and all elements are rehashed. Balancing load factor is crucial: too low wastes memory, too high degrades performance from O(1) toward O(n).

Subtopic: Hash Tables
Relevant for: Software EngineerBackend DeveloperSystems Developer
View full answer
23

How do you reverse a linked list iteratively and recursively?

Medium

Iteratively: use three pointers (prev, curr, next). In each step, save curr.next, point curr.next to prev, move prev to curr, move curr to saved next. Continue until curr is null; prev is the new head. Recursively: base case returns head when null or single node. Recursively reverse rest, then set head.next.next = head and head.next = null. Both are O(n) time; iterative is O(1) space, recursive is O(n) stack space.

Subtopic: Linked Lists
Relevant for: Software EngineerBackend DeveloperAlgorithm Developer
View full answer
24

How do you validate if a binary tree is a valid BST?

Medium

Use recursion with min/max range validation. Start with range (-infinity, infinity). For each node, check if its value is within the valid range. Recursively validate left subtree with (min, node.value) and right subtree with (node.value, max). If any node violates its range, return false. Alternative: in-order traversal should produce strictly increasing sequence. Both approaches are O(n) time and O(h) space.

Subtopic: Trees & Binary Trees
Relevant for: Software EngineerBackend DeveloperAlgorithm Developer
View full answer
25

What is topological sorting and how do you implement it?

Medium

Topological sort orders vertices of a directed acyclic graph (DAG) such that for every edge u->v, u comes before v. Two approaches: Kahn's algorithm uses BFS with in-degree tracking, repeatedly removing zero in-degree vertices. DFS-based approach recursively visits all descendants before adding current node to result (then reverse). Both are O(V+E). Used for task scheduling, build systems, and course prerequisites.

Subtopic: Graphs
Relevant for: Software EngineerBackend DeveloperSystems Developer
View full answer
🎯 3,000+ Engineers Placed
Sponsored
Harshal Sukenkar

Harshal

Fiat Chrysler

Abhishek

Abhishek

TATA ELXSI

Srinithin

Srinithin

Xitadel

Ranjith

Ranjith

Core Automotive

Gaurav Jadhav

Gaurav

Automotive Company

Bino K Biju

Bino

Design Firm

Aseem Shrivastava

Aseem

EV Company

Puneet

Puneet

Automotive Company

Vishal Kumar

Vishal

EV Startup

26

Explain the heapify operation and its time complexity.

Medium

Heapify maintains heap property by moving a node to its correct position. Heapify-down (sift-down) compares parent with children and swaps with larger (max-heap) or smaller (min-heap) child, repeating until valid. Single heapify is O(log n). Building a heap from array uses bottom-up heapify on all non-leaf nodes, achieving O(n) total time (not O(n log n)) because most nodes are near the bottom with small heights.

Subtopic: Heaps & Priority Queues
Relevant for: Software EngineerBackend DeveloperAlgorithm Developer
View full answer
27

What is a Trie and when would you use it?

Medium

A Trie (prefix tree) is a tree-like structure for storing strings where each node represents a character, and paths from root represent prefixes. Each node has up to 26 children (for lowercase letters) and a flag indicating word end. Operations (insert, search, prefix search) are O(m) where m is string length. Ideal for autocomplete, spell checkers, IP routing, and when many strings share common prefixes.

Subtopic: Advanced Structures
Relevant for: Software EngineerBackend DeveloperSearch Engineer
View full answer
28

How do you detect a cycle in a directed graph?

Medium

Use DFS with three states: unvisited, visiting (in current path), and visited. For each node, mark as visiting, recurse on neighbors; if you encounter a visiting node, there's a cycle. After processing all neighbors, mark as visited. Alternative: Kahn's topological sort - if not all nodes are processed (remaining nodes have cycles), a cycle exists. Both are O(V+E). For undirected graphs, track parent during DFS.

Subtopic: Graphs
Relevant for: Software EngineerBackend DeveloperAlgorithm Developer
View full answer
29

How do you serialize and deserialize a binary tree?

Medium

Serialization converts tree to string; deserialization reconstructs it. Use pre-order traversal with null markers. Serialize: recursively write node value (or 'null' for null nodes), separated by delimiters. Deserialize: split string into tokens, use recursion - take next token, if 'null' return null, else create node with value and recursively build left and right children. Both operations are O(n) time and space.

Subtopic: Trees & Binary Trees
Relevant for: Software EngineerBackend DeveloperDistributed Systems Engineer
View full answer
30

How do you implement a queue using two stacks?

Medium

Use two stacks: inbox for enqueue, outbox for dequeue. Enqueue pushes to inbox (O(1)). Dequeue pops from outbox; if outbox is empty, transfer all elements from inbox (reversing order) then pop. Each element is moved at most twice, giving amortized O(1) per operation. This approach is useful when stack operations are the only primitives available or for certain distributed scenarios.

Subtopic: Stacks & Queues
Relevant for: Software EngineerBackend DeveloperApplication Developer
View full answer
31

How do you find the kth smallest element in a BST?

Medium

Use in-order traversal which visits BST nodes in sorted order. Iterative approach with stack: go left as far as possible while pushing nodes, pop and decrement k, if k is 0 return current value, else go right. O(H + k) time where H is height. Alternative: augment BST nodes with subtree sizes for O(H) time by comparing k with left subtree size to decide direction.

Subtopic: Trees & Binary Trees
Relevant for: Software EngineerBackend DeveloperAlgorithm Developer
View full answer
32

Explain the Rabin-Karp algorithm for string matching.

Medium

Rabin-Karp uses rolling hash to find pattern in text. Compute hash of pattern and first window of text. Slide window, updating hash in O(1) using rolling hash: remove contribution of outgoing char, add incoming char. When hashes match, verify with character comparison (to handle hash collisions). Average O(n+m) time but O(nm) worst case. Efficient for multiple pattern search.

Subtopic: Arrays & Strings
Relevant for: Software EngineerBackend DeveloperAlgorithm Developer
View full answer
33

Design a stack that supports push, pop, and getMin in O(1) time.

Medium

Use two approaches: (1) Two stacks - main stack for elements, auxiliary stack for minimums. Push to both when element <= current min. Pop from both when popping the minimum. (2) Single stack with pairs storing (value, currentMin). (3) Without extra space: store encoded value = 2*value - minAtThatTime when value is new minimum; decode when popping. All achieve O(1) operations.

Subtopic: Stacks & Queues
Relevant for: Software EngineerBackend DeveloperSystems Developer
View full answer
34

What is Union-Find (Disjoint Set Union) and its optimizations?

Medium

Union-Find tracks elements partitioned into disjoint sets, supporting union (merge sets) and find (determine set membership). Basic implementation uses parent array with tree structure. Optimizations: Path Compression (during find, make all nodes point directly to root) and Union by Rank (attach smaller tree under root of larger tree). Together, they achieve near-O(1) amortized time - specifically O(alpha(n)) where alpha is inverse Ackermann function.

Subtopic: Advanced Structures
Relevant for: Software EngineerAlgorithm DeveloperCompetitive Programmer
View full answer
35

How do you find the diameter of a binary tree?

Medium

Diameter is the longest path between any two nodes (number of edges). For each node, the longest path through it is left_height + right_height. Use DFS to compute height while tracking maximum diameter seen. At each node, calculate current diameter as left_height + right_height, update global maximum, return 1 + max(left_height, right_height). Time complexity is O(n), visiting each node once.

Subtopic: Trees & Binary Trees
Relevant for: Software EngineerBackend DeveloperAlgorithm Developer
View full answer
36

Explain the Skip List data structure and its probabilistic properties.

Hard

A Skip List is a probabilistic data structure using multiple layers of sorted linked lists for O(log n) average search, insert, and delete. Each element has a random height (usually geometric distribution with p=0.5). Higher layers act as express lanes, enabling binary search-like behavior without rebalancing. Space is O(n) expected. Skip lists are simpler to implement than balanced trees and used in Redis sorted sets and LevelDB.

Subtopic: Advanced Structures
Relevant for: Senior Software EngineerSystems ArchitectDatabase Developer
View full answer
37

Explain Red-Black tree properties and why they guarantee O(log n) operations.

Hard

Red-Black trees are self-balancing BSTs with five properties: nodes are red or black, root is black, leaves (NIL) are black, red nodes have black children, and all paths from node to descendant leaves have equal black nodes. These properties ensure the longest path is at most twice the shortest (alternating red-black vs all black), guaranteeing height <= 2log(n+1). Rebalancing uses rotations and recoloring after insert/delete.

Subtopic: Trees & Binary Trees
Relevant for: Senior Software EngineerSystems DeveloperDatabase Developer
View full answer
38

Why are B-trees used in databases instead of binary trees?

Hard

B-trees are optimized for disk-based storage where I/O is the bottleneck. They have large branching factor (hundreds of children per node) matching disk block size, minimizing tree height and disk reads. A B-tree with 1000-way branching and millions of keys has height 2-3 (only 2-3 disk reads). They maintain balance automatically, support range queries efficiently, and keep data sorted. B+ trees, a variant storing data only in leaves, are most common in databases.

Subtopic: Advanced Structures
Relevant for: Database DeveloperStorage Systems EngineerBackend Architect
View full answer
39

Explain Bloom filters, their false positive rate, and applications.

Hard

Bloom filters are space-efficient probabilistic structures testing set membership. They use a bit array with multiple hash functions. Insert: set bits at hash positions. Query: check all hash positions - if all set, probably present (possible false positive); if any unset, definitely absent (no false negatives). False positive rate is approximately (1-e^(-kn/m))^k where k=hash functions, n=elements, m=bits. Used in caching, network routers, and databases to avoid expensive lookups.

Subtopic: Advanced Structures
Relevant for: Senior Software EngineerSystems ArchitectDatabase Developer
View full answer
40

What is a Segment Tree and how does it support range queries?

Hard

Segment trees store interval information for array range queries (sum, min, max) and updates in O(log n). Each node represents an interval; root is [0,n-1], children split the interval in half. Build is O(n), query combines relevant segments in O(log n), and point update propagates up in O(log n). Lazy propagation enables O(log n) range updates by deferring updates to children until needed. Used in competitive programming and databases.

Subtopic: Advanced Structures
Relevant for: Algorithm DeveloperCompetitive ProgrammerSoftware Engineer
View full answer
41

Compare AVL trees and Red-Black trees. When would you prefer one over the other?

Hard

AVL trees are more strictly balanced (height difference <= 1) than Red-Black trees (longest path <= 2x shortest), making AVL faster for lookups. However, Red-Black trees require fewer rotations for insertions/deletions (max 2 vs potentially O(log n) for AVL). Choose AVL for lookup-heavy workloads (databases) and Red-Black for modification-heavy workloads. Red-Black trees are used in C++ STL map/set, Java TreeMap; AVL in some database indices.

Subtopic: Trees & Binary Trees
Relevant for: Senior Software EngineerSystems DeveloperLanguage/Library Developer
View full answer
42

How do you find strongly connected components in a directed graph?

Hard

Kosaraju's algorithm: (1) DFS on original graph, pushing finished vertices to stack. (2) Transpose the graph. (3) Pop vertices from stack, DFS on transposed graph - each DFS tree is an SCC. O(V+E) time. Tarjan's algorithm uses single DFS with low-link values, identifying SCCs when a node's low-link equals its discovery time. Both find maximal sets where every vertex is reachable from every other vertex within the set.

Subtopic: Graphs
Relevant for: Senior Software EngineerAlgorithm DeveloperSystems Architect
View full answer
43

Explain consistent hashing and its advantages for distributed systems.

Hard

Consistent hashing maps both keys and servers to a circular hash space (ring). Keys are assigned to the next server clockwise. When a server joins/leaves, only K/n keys need remapping (K=keys, n=servers) versus all keys in traditional hashing. Virtual nodes (multiple positions per server) improve load balance. Used in distributed caches (Memcached), databases (DynamoDB, Cassandra), and CDNs for scalable data distribution.

Subtopic: Hash Tables
Relevant for: Senior Software EngineerDistributed Systems EngineerSystems Architect
View full answer
44

Explain Binary Indexed Trees (Fenwick Trees) and their applications.

Hard

Fenwick Trees support prefix sum queries and point updates in O(log n) with O(n) space, simpler than segment trees. Each index i stores sum of elements in range determined by lowest set bit of i. Update propagates to indices i + (i & -i); query sums indices i - (i & -i). Can be extended for range updates with point queries, or range updates with range queries using two BITs. Used for cumulative frequency tables and inversion counting.

Subtopic: Advanced Structures
Relevant for: Algorithm DeveloperCompetitive ProgrammerSoftware Engineer
View full answer
45

What are suffix arrays and how are they used in string processing?

Hard

Suffix arrays store lexicographically sorted indices of all suffixes of a string in O(n) space (vs O(n^2) for actual suffixes). Construction can be O(n log n) or O(n) with SA-IS algorithm. Combined with LCP (Longest Common Prefix) array, they enable O(m log n) pattern matching, finding longest repeated substring in O(n), and various string problems. They're a space-efficient alternative to suffix trees, used in bioinformatics and text indexing.

Subtopic: Arrays & Strings
Relevant for: Algorithm DeveloperBioinformatics EngineerSearch Engineer
View full answer
46

What is a Treap and how does it combine BST and heap properties?

Hard

A Treap is a randomized BST where each node has a key (BST property) and a random priority (heap property). Keys maintain BST ordering; priorities maintain max-heap ordering with parents having higher priority than children. This random structure provides expected O(log n) height without explicit balancing. Operations use tree rotations to maintain heap property after BST operations. Treaps support split and merge operations naturally, useful for implementing rope data structures.

Subtopic: Advanced Structures
Relevant for: Algorithm DeveloperSoftware EngineerCompetitive Programmer
View full answer
47

Explain LSM Trees and why they are used in write-optimized databases.

Hard

Log-Structured Merge Trees optimize write-heavy workloads by converting random writes to sequential writes. Writes go to an in-memory buffer (memtable); when full, it flushes to disk as sorted immutable file (SSTable). Background compaction merges SSTables. Reads check memtable then SSTables (aided by bloom filters). Writes are O(1) amortized, reads are slower than B-trees but acceptable with optimizations. Used in LevelDB, RocksDB, Cassandra, and HBase.

Subtopic: Advanced Structures
Relevant for: Database DeveloperStorage Systems EngineerBackend Architect
View full answer
48

What is a Count-Min Sketch and when would you use it?

Hard

Count-Min Sketch is a probabilistic data structure for frequency estimation of items in a stream using sub-linear space. It uses a 2D array with d hash functions (rows) and w counters (columns). Insert: increment counters at hash positions in each row. Query: return minimum across all rows (minimum reduces over-counting from collisions). Space is O(w*d), error is O(n/w) with probability 1-delta for d=log(1/delta). Used for network traffic monitoring, NLP word frequencies, and heavy hitters detection.

Subtopic: Advanced Structures
Relevant for: Data EngineerSystems ArchitectML Engineer
View full answer
49

Explain persistent/immutable data structures and their benefits.

Hard

Persistent data structures preserve previous versions after modifications, enabling immutability. Implemented via structural sharing - new versions share unchanged parts with old versions (copy-on-write). For example, persistent trees create new path from root to modified node, sharing other nodes. Benefits: thread safety without locks, easy undo/redo, functional programming compatibility, debugging with time travel. Clojure's data structures and React's immutable state management use this concept. Space overhead is often O(log n) per operation.

Subtopic: Advanced Structures
Relevant for: Senior Software EngineerFunctional ProgrammerSystems Developer
View full answer
50

Discuss lock-free data structures and their implementation challenges.

Hard

Lock-free data structures allow concurrent access without traditional locks, using atomic operations like CAS (Compare-And-Swap). Challenges include: ABA problem (value changes A->B->A between read and CAS, resolved with version numbers or hazard pointers), memory management (safe reclamation without garbage collection), and proving correctness (linearizability). Examples: Michael-Scott queue, Harris's linked list. Benefits: no deadlocks, better scalability, progress guarantees. Used in high-performance systems like Java's ConcurrentLinkedQueue.

Subtopic: Advanced Structures
Relevant for: Systems EngineerConcurrency ExpertPerformance Engineer
View full answer