Leetcode # 28. Find the Index of the First Occurrence in a String
- 2023.07.17
- Hash Function Knuth-Morris-Pratt (KMP) Algorithm LeetCode Rabin-Karp Algorithm String-Searching Algorithm
https://leetcode.com/problems/find-the-index-of-the-first-occurrence-in-a-string/
First Solution
Time Complexity: O(len(haystack) + len(needle))
Space Complexity: O(len(haystack))
(The input and output generally do not count towards the space complexity.)
class Solution:
def strStr(self, haystack: str, needle: str) -> int:
start_points = []
for i, c in enumerate(haystack):
if c == needle[0] and i + len(needle) <= len(haystack):
start_points.append(i)
for i in start_points:
is_needle = True
for j in range(1, len(needle)):
if haystack[i + j] != needle[j]:
is_needle =False
break
if is_needle:
return i
return -1
Sliding Window
Time Complexity: O(len(haystack) + len(needle))
Space Complexity: O(len(haystack)) O(1)
(The input and output generally do not count towards the space complexity.)
改善:
- 不再使用多餘的記憶體去存 window 的 start 處
- 不再使用額外的 boolean 變數去判斷 sliding window 是否符合條件
- 注意!如此的話,若 needle 的 characters 檢查迴圈仍從 1 開始
當 len(needle) 為 1 時會誤判
- 注意!如此的話,若 needle 的 characters 檢查迴圈仍從 1 開始
class Solution:
def strStr(self, haystack: str, needle: str) -> int:
len_needle_minus_1 = len(needle) - 1
for i in range(len(haystack) - len(needle) + 1):
for j in range(len(needle)):
if haystack[i + j] != needle[j]:
break
if j == len_needle_minus_1:
return i
return -1
Rabin-Karp Algorithm
Algorithm
def rabin_karp(s, sub):
hsub = hash(sub)
hs = hash(s[:len(sub)])
for i in range(len(s)):
if hs == hsub:
if s[i:i + len(s)] = sub:
return i
next_i = i + 1
hs = hash(s[next_i: next_i + len(s)])
return -1
Hash Function
$$ H[i] = s[i]b^{m – 1} + s[i + 1]b^{m-2} + … + s[i + m – 1]b^0 $$
H[i]: The hash value of the m-substring of s starting from index i
b: base/radix
m: len(sub)
\begin{align}
H[i + 1] & = s[i + 1]b^{m – 1} + s[i + 2]b^{m-2} + … + s[i + m]b^0 \\
& = H[i] \cdot b – s[i]b^m + s[i + m]
\end{align}
Solution
Single Hash
Time Complexity: O(len(haystack) * len(needle))
Space Complexity: O(1)
(The input and output generally do not count towards the space complexity.)
class Solution:
def strStr(self, haystack: str, needle: str) -> int:
# CONSTANTS
RADIX = 26
MOD = 1_000_000_033 # 1,000,000,033 is the 4th smallest 10-digit prime.
# Underscores are using for
# grouping decimal numbers by thousands.
MAX_WEIGHT = 1
# (x * y) % z = ((x % z) * (y % z)) % z
# (b ** m) % z = (b ** (m - 1) * b) % z
# = (((b ** (m - 1)) % z) * (b % z)) % z
# = (((b ** (m - 1)) % z) * b) % z
for _ in range(len(needle)):
MAX_WEIGHT = (MAX_WEIGHT * RADIX) % MOD # b ** m
def hash_value(string):
result = 0
factor = 1
for i in range(len(needle) - 1, -1, -1):
result += ((ord(string[i])- 97) * factor) % MOD
factor = (factor * RADIX) % MOD
return result % MOD
hash_needle = hash_value(needle)
for i in range(len(haystack) - len(needle) + 1):
if i == 0:
hash_haystack = hash_value(haystack)
else:
# H[i + 1] = H[i] * b - s[i] * (b ** m) + s[i + m]
# ((x % z) + (y % z)) = (x + y) % z
hash_haystack = ((hash_haystack * RADIX) % MOD
- ((ord(haystack[i - 1]) - 97) * MAX_WEIGHT) % MOD
+ (ord(haystack[i + len(needle) - 1]) - 97)
+ MOD) % MOD
if hash_needle == hash_haystack:
for j in range(len(needle)):
if needle[j] != haystack[i + j]:
break
if j == len(needle) - 1:
return i
return -1
Double Hash
haystack 的第 i 個字元會有 H1[i], H2[i] 兩個 hash 值
- hash_1(m_str): MOD = 2 ** 31, a → 1, b → 2, …, z → 26
- hash_2(m_str): MOD = 10 ** 9 + 33, a → 0, b → 1, …, z → 25
當其與 needle 的 h1, h2 相同時
便認定其為 needle 的起點
省去只使用一個 hash 值時
每次都要比對 haystack[i:i +len(needle)] 的 O(len(needle)) 時間
- Time Complexity:
O(len(haystack) + len(needle))
< O(2 * len(haystack))

= O(len(haystack)) - Space Complexity: O(1)
(使用 H[i + 1] = H[i] * b – s[i] * (b ** m) + s[i + m] 計算
而不需要額外空間)
(The input and output generally do not count towards the space complexity.)
略
Knuth-Morris-Pratt (KMP) Algorithm
- s: The text to be searched needle
- w: The word sought
| j | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
| w[j] | a | b | c | d | a | b | d |
| t[j] | -1 | 0 | 0 | 0 | 0 | 1 | 2 |
When w[j] != s[i + j] but w[0:j] == s [i:i + j], then backtrack j to t[j]
| s | c | a | b | c | d | a | b | c | d | a | b | d |
| w | a | b | c | d | a | b | d |
- t[0] is always -1 to show it’s the first character of the substring
- t[1] is always 0, because the only place you can backtrack at the second character is the first character
Time Complexity: O(len(needle) + len(haystack)) < O(2 * len(haystack)) = O(len(haystack))
Space Complexity: O(len(needle))
(The input and output generally do not count towards the space complexity.)
class Solution:
def strStr(self, haystack: str, needle: str) -> int:
"""KMP Search
args:
haystack (s): The text to be searched
needle (w): The word sought
"""
# t[j] := When s[i + j] != w[j],
# then backtrack j to t[j]
def kmp_table(_w):
_t = [-1] + [0] * (len(_w) - 1)
# _t[0] is always -1
# _t[1] is always 0
# So _i start from 2
_i, _j = 2, 0
while _i < len(_w):
# First case: The substring continues
if _w[_i - 1] == _w[_j]:
_t[_i] = _j + 1
_i += 1
_j += 1
# Second case: The substring doesn't continues,
# but _j is not -1 or 0,
# that means we can fall back
elif _j > 0:
_j = _t[_j]
# Third case: We have run out of candidates
# Note _j == 0
else:
_t[_i] = 0
_i += 1
return _t
i, j = 0, 0 # Pointers for s and w
t = kmp_table(needle) # O(len(needle))
# O(len(haystack))
while i + j < len(haystack):
if needle[j] == haystack[i + j]:
j += 1
if j == len(needle):
return i
else: # Backtrack
i = i + j - t[j]
if j > 0:
j = t[j]
return -1
Last Updated on 2023/08/16 by A1go