The suffix array is a fundamental data structure used in text processing and string matching algorithms. It is an array of all the suffixes of a given string, sorted in lexicographical order.
The suffix array can be used to efficiently find all occurrences of a given pattern in a string, as well as to find the longest common substring of two strings.
A suffix array is a sorted array of all the suffixes of a given string. The suffixes are sorted in lexicographical order.
The suffix array can be used to efficiently find all occurrences of a given pattern in a string, as well as to find the longest common substring of two strings.
The suffix array can be constructed in O(n log n) time using the Suffix Sort algorithm.
The following code example shows how to construct a suffix array in Java using the Suffix Sort algorithm.
public static int[] suffixArray(String s) {
int n = s.length();
int[] suffixes = new int[n];
for (int i = 0; i < n; i++) {
suffixes[i] = i;
}
Arrays.sort(suffixes, (a, b) -> s.substring(a).compareTo(s.substring(b)));
return suffixes;
}
The following code example shows how to use the suffix array to find all occurrences of a given pattern in a string.
public static List<Integer> findOccurrences(String s, String pattern) {
int n = s.length();
int m = pattern.length();
int[] suffixes = suffixArray(s);
int l = 0;
int r = n - 1;
while (l <= r) {
int mid = l + (r - l) / 2;
int res = s.substring(suffixes[mid]).compareTo(pattern);
if (res == 0) {
// Found a match!
// Now, we need to find the first and last occurrences
// of the pattern in the suffix array.
int first = mid;
int last = mid;
while (first > 0 && s.substring(suffixes[first - 1]).compareTo(pattern) == 0) {
first--;
}
while (last < n - 1 && s.substring(suffixes[last + 1]).compareTo(pattern) == 0) {
last++;
}
// Now, we have the first and last occurrences of the pattern.
// We can return all the occurrences in the range [first, last].
List<Integer> occurrences = new ArrayList<>();
for (int i = first; i <= last; i++) {
occurrences.add(suffixes[i]);
}
return occurrences;
} else if (res < 0) {
l = mid + 1;
} else {
r = mid - 1;
}
}
// Pattern not found
return Collections.emptyList();
}
The following code example shows how to use the suffix array to find the longest common substring of two strings.
public static String longestCommonSubstring(String s1, String s2) {
int n1 = s1.length();
int n2 = s2.length();
int[] suffixes1 = suffixArray(s1);
int[] suffixes2 = suffixArray(s2);
int l = 0;
int r = Math.min(n1, n2) - 1;
String longestCommonSubstring = "";
while (l <= r) {
int mid = l + (r - l) / 2;
String substring1 = s1.substring(suffixes1[mid]);
String substring2 = s2.substring(suffixes2[mid]);
int len = commonPrefix(substring1, substring2);
if (len > longestCommonSubstring.length()) {
longestCommonSubstring = substring1.substring(0, len);
}
if (substring1.compareTo(substring2) < 0) {
l = mid + 1;
} else {
r = mid - 1;
}
}
return longestCommonSubstring;
}
private static int commonPrefix(String s1, String s2) {
int n = Math.min(s1.length(), s2.length());
for (int i = 0; i < n; i++) {
if (s1.charAt(i) != s2.charAt(i)) {
return i;
}
}
return n;
}
Implement the Suffix Sort algorithm in your chosen programming language.
Use the suffix array to find all occurrences of a given pattern in a string.
Use the suffix array to find the longest common substring of two strings.