Question 1

What is the difference between an array and a linked list?

Accepted Answer

Arrays store elements in contiguous memory locations with O(1) random access but O(n) insertion/deletion, while linked lists store elements in non-contiguous nodes connected by pointers with O(n) access but O(1) insertion/deletion at known positions. Arrays are better for frequent access operations, while linked lists excel when frequent insertions and deletions are needed.

Question 2

What is a stack and what are its main operations?

Accepted Answer

A stack is a linear data structure following Last-In-First-Out (LIFO) principle where elements are added and removed from the same end called the top. Main operations are push (add element to top), pop (remove top element), peek/top (view top element without removing), and isEmpty (check if stack is empty). All these operations have O(1) time complexity.

Question 3

What is a queue and how does it differ from a stack?

Accepted Answer

A queue is a linear data structure following First-In-First-Out (FIFO) principle where elements are added at the rear and removed from the front. Unlike a stack (LIFO), a queue processes elements in the order they were added. Main operations are enqueue (add to rear), dequeue (remove from front), front (view first element), and isEmpty, all with O(1) complexity.

Question 4

What is a hash table and how does it work?

Accepted Answer

A hash table is a data structure that maps keys to values using a hash function to compute an index into an array of buckets or slots. The hash function converts the key into an array index, providing average O(1) time complexity for insertion, deletion, and lookup. Hash tables handle collisions using techniques like chaining (linked lists at each bucket) or open addressing (probing for next available slot).

Question 5

What is a binary tree and what are its properties?

Accepted Answer

A binary tree is a hierarchical data structure where each node has at most two children, referred to as left and right child. Key properties include: the topmost node is the root, nodes with no children are leaves, the height is the longest path from root to leaf, and a binary tree with n nodes has n-1 edges. Binary trees are used for hierarchical data representation, searching, and expression evaluation.

Question 6

What is the difference between singly and doubly linked lists?

Accepted Answer

A singly linked list has nodes with data and a pointer to the next node only, allowing traversal in one direction. A doubly linked list has nodes with pointers to both next and previous nodes, enabling bidirectional traversal. Doubly linked lists use more memory but allow O(1) deletion when you have a reference to the node, while singly linked lists require O(n) to find the previous node.

Question 7

What is a Binary Search Tree (BST) and what property does it maintain?

Accepted Answer

A Binary Search Tree is a binary tree where for each node, all values in its left subtree are smaller and all values in its right subtree are larger (BST property). This ordering enables efficient searching, insertion, and deletion with O(log n) average time complexity. An in-order traversal of a BST yields elements in sorted order, making it useful for maintaining sorted data dynamically.

Question 8

What is a graph data structure and what are its components?

Accepted Answer

A graph is a non-linear data structure consisting of vertices (nodes) and edges that connect pairs of vertices. Graphs can be directed (edges have direction) or undirected, weighted (edges have values) or unweighted. Components include vertices (V), edges (E), and optionally weights. Graphs represent relationships like social networks, maps, and dependencies, with common representations being adjacency matrix and adjacency list.

Question 9

What is a heap and what are its types?

Accepted Answer

A heap is a complete binary tree that satisfies the heap property: in a max-heap, each parent is greater than or equal to its children, and in a min-heap, each parent is less than or equal to its children. Heaps provide O(1) access to the maximum/minimum element and O(log n) insertion and deletion. They are commonly implemented using arrays and are the basis for heap sort and priority queues.

Question 10

What are the time complexities for common array operations?

Accepted Answer

For arrays: Access by index is O(1), Search (unsorted) is O(n), Search (sorted with binary search) is O(log n), Insertion at end (with space) is O(1), Insertion at beginning/middle is O(n) due to shifting, Deletion is O(n) due to shifting. Arrays excel at random access but are costly for insertions and deletions that require element shifting.

Question 11

What are the different types of tree traversal?

Accepted Answer

Tree traversals visit all nodes systematically: In-order (left, root, right) gives sorted order for BST, Pre-order (root, left, right) is used for creating tree copies, Post-order (left, right, root) is used for deleting trees, and Level-order (breadth-first) visits nodes level by level using a queue. The first three are depth-first traversals using recursion or stack, while level-order uses a queue.

Question 12

What is a hash collision and how can it be handled?

Accepted Answer

A hash collision occurs when two different keys produce the same hash value, mapping to the same index in the hash table. Common handling methods are: Chaining (storing collided elements in a linked list at that index), Open Addressing with Linear Probing (finding next empty slot), Quadratic Probing (using quadratic function to find slots), and Double Hashing (using second hash function). Good hash functions minimize collisions.

Question 13

What is a circular queue and why is it useful?

Accepted Answer

A circular queue is a linear data structure that connects the last position back to the first position, forming a circle. Unlike regular queues that waste space after dequeue operations, circular queues reuse freed spaces by wrapping around. This makes them memory-efficient and ideal for buffering (like CPU scheduling, IO buffers) where fixed-size storage with continuous reuse is needed.

Question 14

What is the difference between adjacency matrix and adjacency list for graphs?

Accepted Answer

An adjacency matrix is a 2D array where matrix[i][j] indicates an edge between vertices i and j, using O(V^2) space with O(1) edge lookup but slow for sparse graphs. An adjacency list uses an array of lists where each vertex stores its neighbors, using O(V+E) space with O(degree) edge lookup, efficient for sparse graphs. Matrix suits dense graphs; list suits sparse graphs and most real-world scenarios.

Question 15

What is a priority queue and how does it differ from a regular queue?

Accepted Answer

A priority queue is an abstract data type where each element has a priority, and elements with higher priority are served before elements with lower priority regardless of their insertion order. Unlike regular queues (FIFO), priority queues process based on priority. They are typically implemented using heaps for O(log n) insertion and O(1) peek of highest priority. Common uses include task scheduling, Dijkstra's algorithm, and event-driven simulation.

Question 16

How would you implement an LRU (Least Recently Used) cache?

Accepted Answer

An LRU cache can be efficiently implemented using a combination of a hash map and a doubly linked list. The hash map provides O(1) access to cache entries, while the doubly linked list maintains the order of usage with most recently used at the head. On access, move the node to the head; on insertion when full, remove the tail node. This gives O(1) for both get and put operations.

Question 17

How do you detect a cycle in a linked list?

Accepted Answer

The optimal approach is Floyd's Cycle Detection (tortoise and hare) algorithm using two pointers: slow moves one step at a time, fast moves two steps. If there's a cycle, they will eventually meet; if fast reaches null, there's no cycle. Time complexity is O(n), space is O(1). To find the cycle start, after detection, move one pointer to head and advance both by one until they meet again.

Question 18

How do you check if a binary tree is balanced?

Accepted Answer

A balanced binary tree has the height difference between left and right subtrees of every node at most 1. The optimal O(n) solution uses a bottom-up approach: recursively compute height while checking balance, returning -1 if imbalanced. For each node, get heights of left and right subtrees; if either is -1 or their difference exceeds 1, return -1; otherwise return max(left, right) + 1.

Question 19

Compare BFS and DFS for graph traversal. When would you use each?

Accepted Answer

BFS (Breadth-First Search) explores neighbors level by level using a queue, optimal for finding shortest path in unweighted graphs and level-order traversal. DFS (Depth-First Search) explores as deep as possible before backtracking using recursion/stack, better for topological sorting, cycle detection, and path finding. BFS uses O(V) space for the queue; DFS uses O(h) stack space where h is the maximum depth.

Question 20

How would you implement a stack using two queues?

Accepted Answer

Use two queues q1 and q2. For push: add element to q2, then move all elements from q1 to q2, then swap q1 and q2. For pop: simply dequeue from q1. This makes push O(n) and pop O(1). Alternative approach: make push O(1) and pop O(n) by adding to q1 directly, and for pop, move all except last element to q2, dequeue last, then swap.

Question 21

How do you find the Lowest Common Ancestor (LCA) in a binary tree?

Accepted Answer

For a general binary tree, use recursion: if current node is null or matches either target, return it. Recursively find LCA in left and right subtrees. If both return non-null, current node is LCA. If only one returns non-null, return that. Time complexity is O(n). For BST, leverage ordering: if both targets are smaller, go left; if both larger, go right; otherwise current node is LCA with O(h) time.

Question 22

What is the load factor in a hash table and why is it important?

Accepted Answer

Load factor is the ratio of number of entries to the number of buckets (n/capacity). It measures how full the hash table is. A higher load factor means more collisions and slower operations; typical threshold is 0.75. When exceeded, the table is resized (usually doubled) and all elements are rehashed. Balancing load factor is crucial: too low wastes memory, too high degrades performance from O(1) toward O(n).

Question 23

How do you reverse a linked list iteratively and recursively?

Accepted Answer

Iteratively: use three pointers (prev, curr, next). In each step, save curr.next, point curr.next to prev, move prev to curr, move curr to saved next. Continue until curr is null; prev is the new head. Recursively: base case returns head when null or single node. Recursively reverse rest, then set head.next.next = head and head.next = null. Both are O(n) time; iterative is O(1) space, recursive is O(n) stack space.

Question 24

How do you validate if a binary tree is a valid BST?

Accepted Answer

Use recursion with min/max range validation. Start with range (-infinity, infinity). For each node, check if its value is within the valid range. Recursively validate left subtree with (min, node.value) and right subtree with (node.value, max). If any node violates its range, return false. Alternative: in-order traversal should produce strictly increasing sequence. Both approaches are O(n) time and O(h) space.

Question 25

What is topological sorting and how do you implement it?

Accepted Answer

Topological sort orders vertices of a directed acyclic graph (DAG) such that for every edge u->v, u comes before v. Two approaches: Kahn's algorithm uses BFS with in-degree tracking, repeatedly removing zero in-degree vertices. DFS-based approach recursively visits all descendants before adding current node to result (then reverse). Both are O(V+E). Used for task scheduling, build systems, and course prerequisites.

Question 26

Explain the heapify operation and its time complexity.

Accepted Answer

Heapify maintains heap property by moving a node to its correct position. Heapify-down (sift-down) compares parent with children and swaps with larger (max-heap) or smaller (min-heap) child, repeating until valid. Single heapify is O(log n). Building a heap from array uses bottom-up heapify on all non-leaf nodes, achieving O(n) total time (not O(n log n)) because most nodes are near the bottom with small heights.

Question 27

What is a Trie and when would you use it?

Accepted Answer

A Trie (prefix tree) is a tree-like structure for storing strings where each node represents a character, and paths from root represent prefixes. Each node has up to 26 children (for lowercase letters) and a flag indicating word end. Operations (insert, search, prefix search) are O(m) where m is string length. Ideal for autocomplete, spell checkers, IP routing, and when many strings share common prefixes.

Question 28

How do you detect a cycle in a directed graph?

Accepted Answer

Use DFS with three states: unvisited, visiting (in current path), and visited. For each node, mark as visiting, recurse on neighbors; if you encounter a visiting node, there's a cycle. After processing all neighbors, mark as visited. Alternative: Kahn's topological sort - if not all nodes are processed (remaining nodes have cycles), a cycle exists. Both are O(V+E). For undirected graphs, track parent during DFS.

Question 29

How do you serialize and deserialize a binary tree?

Accepted Answer

Serialization converts tree to string; deserialization reconstructs it. Use pre-order traversal with null markers. Serialize: recursively write node value (or 'null' for null nodes), separated by delimiters. Deserialize: split string into tokens, use recursion - take next token, if 'null' return null, else create node with value and recursively build left and right children. Both operations are O(n) time and space.

Question 30

How do you implement a queue using two stacks?

Accepted Answer

Use two stacks: inbox for enqueue, outbox for dequeue. Enqueue pushes to inbox (O(1)). Dequeue pops from outbox; if outbox is empty, transfer all elements from inbox (reversing order) then pop. Each element is moved at most twice, giving amortized O(1) per operation. This approach is useful when stack operations are the only primitives available or for certain distributed scenarios.

Question 31

How do you find the kth smallest element in a BST?

Accepted Answer

Use in-order traversal which visits BST nodes in sorted order. Iterative approach with stack: go left as far as possible while pushing nodes, pop and decrement k, if k is 0 return current value, else go right. O(H + k) time where H is height. Alternative: augment BST nodes with subtree sizes for O(H) time by comparing k with left subtree size to decide direction.

Question 32

Explain the Rabin-Karp algorithm for string matching.

Accepted Answer

Rabin-Karp uses rolling hash to find pattern in text. Compute hash of pattern and first window of text. Slide window, updating hash in O(1) using rolling hash: remove contribution of outgoing char, add incoming char. When hashes match, verify with character comparison (to handle hash collisions). Average O(n+m) time but O(nm) worst case. Efficient for multiple pattern search.

Question 33

Design a stack that supports push, pop, and getMin in O(1) time.

Accepted Answer

Use two approaches: (1) Two stacks - main stack for elements, auxiliary stack for minimums. Push to both when element <= current min. Pop from both when popping the minimum. (2) Single stack with pairs storing (value, currentMin). (3) Without extra space: store encoded value = 2*value - minAtThatTime when value is new minimum; decode when popping. All achieve O(1) operations.

Question 34

What is Union-Find (Disjoint Set Union) and its optimizations?

Accepted Answer

Union-Find tracks elements partitioned into disjoint sets, supporting union (merge sets) and find (determine set membership). Basic implementation uses parent array with tree structure. Optimizations: Path Compression (during find, make all nodes point directly to root) and Union by Rank (attach smaller tree under root of larger tree). Together, they achieve near-O(1) amortized time - specifically O(alpha(n)) where alpha is inverse Ackermann function.

Question 35

How do you find the diameter of a binary tree?

Accepted Answer

Diameter is the longest path between any two nodes (number of edges). For each node, the longest path through it is left_height + right_height. Use DFS to compute height while tracking maximum diameter seen. At each node, calculate current diameter as left_height + right_height, update global maximum, return 1 + max(left_height, right_height). Time complexity is O(n), visiting each node once.

Question 36

Explain the Skip List data structure and its probabilistic properties.

Accepted Answer

A Skip List is a probabilistic data structure using multiple layers of sorted linked lists for O(log n) average search, insert, and delete. Each element has a random height (usually geometric distribution with p=0.5). Higher layers act as express lanes, enabling binary search-like behavior without rebalancing. Space is O(n) expected. Skip lists are simpler to implement than balanced trees and used in Redis sorted sets and LevelDB.

Question 37

Explain Red-Black tree properties and why they guarantee O(log n) operations.

Accepted Answer

Red-Black trees are self-balancing BSTs with five properties: nodes are red or black, root is black, leaves (NIL) are black, red nodes have black children, and all paths from node to descendant leaves have equal black nodes. These properties ensure the longest path is at most twice the shortest (alternating red-black vs all black), guaranteeing height <= 2log(n+1). Rebalancing uses rotations and recoloring after insert/delete.

Question 38

Why are B-trees used in databases instead of binary trees?

Accepted Answer

B-trees are optimized for disk-based storage where I/O is the bottleneck. They have large branching factor (hundreds of children per node) matching disk block size, minimizing tree height and disk reads. A B-tree with 1000-way branching and millions of keys has height 2-3 (only 2-3 disk reads). They maintain balance automatically, support range queries efficiently, and keep data sorted. B+ trees, a variant storing data only in leaves, are most common in databases.

Question 39

Explain Bloom filters, their false positive rate, and applications.

Accepted Answer

Bloom filters are space-efficient probabilistic structures testing set membership. They use a bit array with multiple hash functions. Insert: set bits at hash positions. Query: check all hash positions - if all set, probably present (possible false positive); if any unset, definitely absent (no false negatives). False positive rate is approximately (1-e^(-kn/m))^k where k=hash functions, n=elements, m=bits. Used in caching, network routers, and databases to avoid expensive lookups.

Question 40

What is a Segment Tree and how does it support range queries?

Accepted Answer

Segment trees store interval information for array range queries (sum, min, max) and updates in O(log n). Each node represents an interval; root is [0,n-1], children split the interval in half. Build is O(n), query combines relevant segments in O(log n), and point update propagates up in O(log n). Lazy propagation enables O(log n) range updates by deferring updates to children until needed. Used in competitive programming and databases.

Question 41

Compare AVL trees and Red-Black trees. When would you prefer one over the other?

Accepted Answer

AVL trees are more strictly balanced (height difference <= 1) than Red-Black trees (longest path <= 2x shortest), making AVL faster for lookups. However, Red-Black trees require fewer rotations for insertions/deletions (max 2 vs potentially O(log n) for AVL). Choose AVL for lookup-heavy workloads (databases) and Red-Black for modification-heavy workloads. Red-Black trees are used in C++ STL map/set, Java TreeMap; AVL in some database indices.

Question 42

How do you find strongly connected components in a directed graph?

Accepted Answer

Kosaraju's algorithm: (1) DFS on original graph, pushing finished vertices to stack. (2) Transpose the graph. (3) Pop vertices from stack, DFS on transposed graph - each DFS tree is an SCC. O(V+E) time. Tarjan's algorithm uses single DFS with low-link values, identifying SCCs when a node's low-link equals its discovery time. Both find maximal sets where every vertex is reachable from every other vertex within the set.

Question 43

Explain consistent hashing and its advantages for distributed systems.

Accepted Answer

Consistent hashing maps both keys and servers to a circular hash space (ring). Keys are assigned to the next server clockwise. When a server joins/leaves, only K/n keys need remapping (K=keys, n=servers) versus all keys in traditional hashing. Virtual nodes (multiple positions per server) improve load balance. Used in distributed caches (Memcached), databases (DynamoDB, Cassandra), and CDNs for scalable data distribution.

Question 44

Explain Binary Indexed Trees (Fenwick Trees) and their applications.

Accepted Answer

Fenwick Trees support prefix sum queries and point updates in O(log n) with O(n) space, simpler than segment trees. Each index i stores sum of elements in range determined by lowest set bit of i. Update propagates to indices i + (i & -i); query sums indices i - (i & -i). Can be extended for range updates with point queries, or range updates with range queries using two BITs. Used for cumulative frequency tables and inversion counting.

Question 45

What are suffix arrays and how are they used in string processing?

Accepted Answer

Suffix arrays store lexicographically sorted indices of all suffixes of a string in O(n) space (vs O(n^2) for actual suffixes). Construction can be O(n log n) or O(n) with SA-IS algorithm. Combined with LCP (Longest Common Prefix) array, they enable O(m log n) pattern matching, finding longest repeated substring in O(n), and various string problems. They're a space-efficient alternative to suffix trees, used in bioinformatics and text indexing.

Question 46

What is a Treap and how does it combine BST and heap properties?

Accepted Answer

A Treap is a randomized BST where each node has a key (BST property) and a random priority (heap property). Keys maintain BST ordering; priorities maintain max-heap ordering with parents having higher priority than children. This random structure provides expected O(log n) height without explicit balancing. Operations use tree rotations to maintain heap property after BST operations. Treaps support split and merge operations naturally, useful for implementing rope data structures.

Question 47

Explain LSM Trees and why they are used in write-optimized databases.

Accepted Answer

Log-Structured Merge Trees optimize write-heavy workloads by converting random writes to sequential writes. Writes go to an in-memory buffer (memtable); when full, it flushes to disk as sorted immutable file (SSTable). Background compaction merges SSTables. Reads check memtable then SSTables (aided by bloom filters). Writes are O(1) amortized, reads are slower than B-trees but acceptable with optimizations. Used in LevelDB, RocksDB, Cassandra, and HBase.

Question 48

What is a Count-Min Sketch and when would you use it?

Accepted Answer

Count-Min Sketch is a probabilistic data structure for frequency estimation of items in a stream using sub-linear space. It uses a 2D array with d hash functions (rows) and w counters (columns). Insert: increment counters at hash positions in each row. Query: return minimum across all rows (minimum reduces over-counting from collisions). Space is O(w*d), error is O(n/w) with probability 1-delta for d=log(1/delta). Used for network traffic monitoring, NLP word frequencies, and heavy hitters detection.

Question 49

Explain persistent/immutable data structures and their benefits.

Accepted Answer

Persistent data structures preserve previous versions after modifications, enabling immutability. Implemented via structural sharing - new versions share unchanged parts with old versions (copy-on-write). For example, persistent trees create new path from root to modified node, sharing other nodes. Benefits: thread safety without locks, easy undo/redo, functional programming compatibility, debugging with time travel. Clojure's data structures and React's immutable state management use this concept. Space overhead is often O(log n) per operation.

Question 50

Discuss lock-free data structures and their implementation challenges.

Accepted Answer

Lock-free data structures allow concurrent access without traditional locks, using atomic operations like CAS (Compare-And-Swap). Challenges include: ABA problem (value changes A->B->A between read and CAS, resolved with version numbers or hazard pointers), memory management (safe reclamation without garbage collection), and proving correctness (linearizability). Examples: Michael-Scott queue, Harris's linked list. Benefits: no deadlocks, better scalability, progress guarantees. Used in high-performance systems like Java's ConcurrentLinkedQueue.

Data Structures Interview Questions