Leetcode # 28. Find the Index of the First Occurrence in a String
- 2023.07.17
- Hash Function Knuth-Morris-Pratt (KMP) Algorithm LeetCode Rabin-Karp Algorithm String-Searching Algorithm
https://leetcode.com/problems/find-the-index-of-the-first-occurrence-in-a-string/
First Solution
Time Complexity: O(len(haystack) + len(needle))
Space Complexity: O(len(haystack))
(The input and output generally do not count towards the space complexity.)
class Solution: def strStr(self, haystack: str, needle: str) -> int: start_points = [] for i, c in enumerate(haystack): if c == needle[0] and i + len(needle) <= len(haystack): start_points.append(i) for i in start_points: is_needle = True for j in range(1, len(needle)): if haystack[i + j] != needle[j]: is_needle =False break if is_needle: return i return -1
Sliding Window
Time Complexity: O(len(haystack) + len(needle))
Space Complexity: O(len(haystack)) O(1)
(The input and output generally do not count towards the space complexity.)
改善:
- 不再使用多餘的記憶體去存 window 的 start 處
- 不再使用額外的 boolean 變數去判斷 sliding window 是否符合條件
- 注意!如此的話,若 needle 的 characters 檢查迴圈仍從 1 開始
當 len(needle) 為 1 時會誤判
- 注意!如此的話,若 needle 的 characters 檢查迴圈仍從 1 開始
class Solution: def strStr(self, haystack: str, needle: str) -> int: len_needle_minus_1 = len(needle) - 1 for i in range(len(haystack) - len(needle) + 1): for j in range(len(needle)): if haystack[i + j] != needle[j]: break if j == len_needle_minus_1: return i return -1
Rabin-Karp Algorithm
Algorithm
def rabin_karp(s, sub): hsub = hash(sub) hs = hash(s[:len(sub)]) for i in range(len(s)): if hs == hsub: if s[i:i + len(s)] = sub: return i next_i = i + 1 hs = hash(s[next_i: next_i + len(s)]) return -1
Hash Function
$$ H[i] = s[i]b^{m – 1} + s[i + 1]b^{m-2} + … + s[i + m – 1]b^0 $$
H[i]: The hash value of the m-substring of s starting from index i
b: base/radix
m: len(sub)
\begin{align}
H[i + 1] & = s[i + 1]b^{m – 1} + s[i + 2]b^{m-2} + … + s[i + m]b^0 \\
& = H[i] \cdot b – s[i]b^m + s[i + m]
\end{align}
Solution
Single Hash
Time Complexity: O(len(haystack) * len(needle))
Space Complexity: O(1)
(The input and output generally do not count towards the space complexity.)
class Solution: def strStr(self, haystack: str, needle: str) -> int: # CONSTANTS RADIX = 26 MOD = 1_000_000_033 # 1,000,000,033 is the 4th smallest 10-digit prime. # Underscores are using for # grouping decimal numbers by thousands. MAX_WEIGHT = 1 # (x * y) % z = ((x % z) * (y % z)) % z # (b ** m) % z = (b ** (m - 1) * b) % z # = (((b ** (m - 1)) % z) * (b % z)) % z # = (((b ** (m - 1)) % z) * b) % z for _ in range(len(needle)): MAX_WEIGHT = (MAX_WEIGHT * RADIX) % MOD # b ** m def hash_value(string): result = 0 factor = 1 for i in range(len(needle) - 1, -1, -1): result += ((ord(string[i])- 97) * factor) % MOD factor = (factor * RADIX) % MOD return result % MOD hash_needle = hash_value(needle) for i in range(len(haystack) - len(needle) + 1): if i == 0: hash_haystack = hash_value(haystack) else: # H[i + 1] = H[i] * b - s[i] * (b ** m) + s[i + m] # ((x % z) + (y % z)) = (x + y) % z hash_haystack = ((hash_haystack * RADIX) % MOD - ((ord(haystack[i - 1]) - 97) * MAX_WEIGHT) % MOD + (ord(haystack[i + len(needle) - 1]) - 97) + MOD) % MOD if hash_needle == hash_haystack: for j in range(len(needle)): if needle[j] != haystack[i + j]: break if j == len(needle) - 1: return i return -1
Double Hash
haystack 的第 i 個字元會有 H1[i], H2[i] 兩個 hash 值
- hash_1(m_str): MOD = 2 ** 31, a → 1, b → 2, …, z → 26
- hash_2(m_str): MOD = 10 ** 9 + 33, a → 0, b → 1, …, z → 25
當其與 needle 的 h1, h2 相同時
便認定其為 needle 的起點
省去只使用一個 hash 值時
每次都要比對 haystack[i:i +len(needle)] 的 O(len(needle)) 時間
- Time Complexity:
O(len(haystack) + len(needle))
< O(2 * len(haystack))
= O(len(haystack))
- Space Complexity: O(1)
(使用 H[i + 1] = H[i] * b – s[i] * (b ** m) + s[i + m] 計算
而不需要額外空間)
(The input and output generally do not count towards the space complexity.)
略
Knuth-Morris-Pratt (KMP) Algorithm
- s: The text to be searched needle
- w: The word sought
j | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
w[j] | a | b | c | d | a | b | d |
t[j] | -1 | 0 | 0 | 0 | 0 | 1 | 2 |
When w[j] != s[i + j] but w[0:j] == s [i:i + j], then backtrack j to t[j]
s | c | a | b | c | d | a | b | c | d | a | b | d |
w | a | b | c | d | a | b | d |
- t[0] is always -1 to show it’s the first character of the substring
- t[1] is always 0, because the only place you can backtrack at the second character is the first character
Time Complexity: O(len(needle) + len(haystack)) < O(2 * len(haystack)) = O(len(haystack))
Space Complexity: O(len(needle))
(The input and output generally do not count towards the space complexity.)
class Solution: def strStr(self, haystack: str, needle: str) -> int: """KMP Search args: haystack (s): The text to be searched needle (w): The word sought """ # t[j] := When s[i + j] != w[j], # then backtrack j to t[j] def kmp_table(_w): _t = [-1] + [0] * (len(_w) - 1) # _t[0] is always -1 # _t[1] is always 0 # So _i start from 2 _i, _j = 2, 0 while _i < len(_w): # First case: The substring continues if _w[_i - 1] == _w[_j]: _t[_i] = _j + 1 _i += 1 _j += 1 # Second case: The substring doesn't continues, # but _j is not -1 or 0, # that means we can fall back elif _j > 0: _j = _t[_j] # Third case: We have run out of candidates # Note _j == 0 else: _t[_i] = 0 _i += 1 return _t i, j = 0, 0 # Pointers for s and w t = kmp_table(needle) # O(len(needle)) # O(len(haystack)) while i + j < len(haystack): if needle[j] == haystack[i + j]: j += 1 if j == len(needle): return i else: # Backtrack i = i + j - t[j] if j > 0: j = t[j] return -1
Last Updated on 2023/08/16 by A1go