Leetcode # 28. Find the Index of the First Occurrence in a String

https://leetcode.com/problems/find-the-index-of-the-first-occurrence-in-a-string/

First Solution

Time Complexity: O(len(haystack) + len(needle))
Space Complexity: O(len(haystack))
(The input and output generally do not count towards the space complexity.)

 

class Solution:
  def strStr(self, haystack: str, needle: str) -> int:
    start_points = []
    for i, c in enumerate(haystack):
      if c == needle[0] and i + len(needle) <= len(haystack):
          start_points.append(i)

    for i in start_points:
      is_needle = True
      for j in range(1, len(needle)):
        if haystack[i + j] != needle[j]:
          is_needle =False
          break
      if is_needle:
        return i

    return -1

Sliding Window

Time Complexity: O(len(haystack) + len(needle))
Space Complexity: O(len(haystack)) O(1)
(The input and output generally do not count towards the space complexity.)

改善:

  • 不再使用多餘的記憶體去存 window 的 start 處
  • 不再使用額外的 boolean 變數去判斷 sliding window 是否符合條件
    • 注意!如此的話,若 needle 的 characters 檢查迴圈仍從 1 開始
      當 len(needle) 為 1 時會誤判

 

class Solution:
  def strStr(self, haystack: str, needle: str) -> int:
    len_needle_minus_1 = len(needle) - 1
    for i in range(len(haystack) - len(needle) + 1):
      for j in range(len(needle)):
        if haystack[i + j] != needle[j]:
          break
        if j == len_needle_minus_1:
          return i
    return -1

Rabin-Karp Algorithm

Algorithm

 

def rabin_karp(s, sub):
  hsub = hash(sub)
  hs   = hash(s[:len(sub)])
  for i in range(len(s)):
    if hs == hsub:
      if s[i:i + len(s)] = sub:
        return i
    next_i = i + 1
    hs = hash(s[next_i: next_i + len(s)])
  return -1

Hash Function

$$ H[i] = s[i]b^{m – 1} + s[i + 1]b^{m-2} + … + s[i + m – 1]b^0 $$

H[i]: The hash value of the m-substring of s starting from index i
b: base/radix
m: len(sub)

\begin{align}
H[i + 1] & = s[i + 1]b^{m – 1} + s[i + 2]b^{m-2} + … + s[i + m]b^0 \\
& =  H[i] \cdot b – s[i]b^m + s[i + m]
\end{align}

Solution

Single Hash

Time Complexity: O(len(haystack) * len(needle))
Space Complexity: O(1)
(The input and output generally do not count towards the space complexity.)

 

class Solution:
  def strStr(self, haystack: str, needle: str) -> int:
    # CONSTANTS
    RADIX = 26
    MOD = 1_000_000_033 # 1,000,000,033 is the 4th smallest 10-digit prime.
                        # Underscores are using for 
                        #   grouping decimal numbers by thousands.
    MAX_WEIGHT = 1

    # (x * y) % z = ((x % z) * (y % z)) % z
    # (b ** m) % z = (b ** (m - 1) * b) % z
    #  = (((b ** (m - 1)) % z) * (b % z)) % z
    #  = (((b ** (m - 1)) % z) * b) % z
    for _ in range(len(needle)):
      MAX_WEIGHT = (MAX_WEIGHT * RADIX) % MOD # b ** m

    def hash_value(string):
      result = 0
      factor = 1
      for i in range(len(needle) - 1, -1, -1):
        result += ((ord(string[i])- 97) * factor) % MOD
        factor = (factor * RADIX) % MOD
      return result % MOD

    hash_needle = hash_value(needle)

    for i in range(len(haystack) - len(needle) + 1):
      if i == 0:
        hash_haystack = hash_value(haystack)
      else:
        # H[i + 1] = H[i] * b - s[i] * (b ** m) + s[i + m]
        # ((x % z) + (y % z)) = (x + y) % z
        hash_haystack = ((hash_haystack * RADIX) % MOD
                         - ((ord(haystack[i - 1]) - 97) * MAX_WEIGHT) % MOD
                         + (ord(haystack[i + len(needle) - 1]) - 97)
                         + MOD) % MOD

      if hash_needle == hash_haystack:
        for j in range(len(needle)):
          if needle[j] != haystack[i + j]:
            break
          if j == len(needle) - 1:
            return i

    return -1

Double Hash

haystack 的第 i 個字元會有 H1[i], H2[i] 兩個 hash 值

  • hash_1(m_str): MOD = 2 ** 31, a → 1, b → 2, …, z → 26
  • hash_2(m_str): MOD = 10 ** 9 + 33, a → 0, b → 1, …, z → 25

當其與 needle 的 h1, h2 相同時
便認定其為 needle 的起點
省去只使用一個 hash 值時
每次都要比對 haystack[i:i +len(needle)] 的 O(len(needle)) 時間 

  • Time Complexity:
    O(len(haystack) + len(needle))
    < O(2 * len(haystack))
    = O(len(haystack))
  • Space Complexity: O(1)
    (使用 H[i + 1] = H[i] * b – s[i] * (b ** m) + s[i + m] 計算
    而不需要額外空間)

(The input and output generally do not count towards the space complexity.)

 

Knuth-Morris-Pratt (KMP) Algorithm

  • s: The text to be searched needle
  • w: The word sought
j 0 1 2 3 4 5 6
w[j] a b c d a b d
t[j] -1 0 0 0 0 1 2

When w[j] != s[i + j] but w[0:j] == s [i:i + j], then backtrack j to t[j]

s c a b c d a b c d a b d
w   a b c d a b d        
  • t[0] is always -1 to show it’s the first character of the substring
  • t[1] is always 0, because the only place you can backtrack at the second character is the first character

 

Time Complexity: O(len(needle) + len(haystack)) < O(2 * len(haystack)) = O(len(haystack))
Space Complexity: O(len(needle))
(The input and output generally do not count towards the space complexity.)

 

class Solution:
  def strStr(self, haystack: str, needle: str) -> int:
    """KMP Search

    args:
      haystack (s): The text to be searched
      needle (w): The word sought
    """

    # t[j] := When s[i + j] != w[j], 
    #         then backtrack j to t[j]
    def kmp_table(_w):
      _t = [-1] + [0] * (len(_w) - 1)

      # _t[0] is always -1
      # _t[1] is always 0
      # So _i start from 2
      _i, _j = 2, 0
      while _i < len(_w):
        # First case: The substring continues
        if _w[_i - 1] == _w[_j]:
          _t[_i] = _j + 1
          _i += 1
          _j += 1
        # Second case: The substring doesn't continues, 
        #              but _j is not -1 or 0, 
        #              that means we can fall back
        elif _j > 0:
          _j = _t[_j]
        # Third case: We have run out of candidates
        #             Note _j == 0
        else:
          _t[_i] = 0
          _i += 1

      return _t

    i, j = 0, 0 # Pointers for s and w
    t = kmp_table(needle) # O(len(needle))

    # O(len(haystack))
    while i + j < len(haystack):
      if needle[j] == haystack[i + j]:
        j += 1
        if j == len(needle):
          return i
      else: # Backtrack
        i = i + j - t[j]
        if j > 0:
          j = t[j]

    return -1

Last Updated on 2023/08/16 by A1go

目錄
Bitnami