Exploring Different Ways to Split Strings in Java

Introduction

Splitting strings is a fundamental operation in Java programming, commonly used in data processing, text manipulation, and parsing structured content like CSV files or logs. Whether you need to break down user input, process API responses, or extract values from a large text, Java provides multiple ways to efficiently split strings based on a delimiter.

Each method has its own strengths and weaknesses, affecting performance, flexibility, and ease of use. String.split() is the most direct option, while StringTokenizer and Scanner hand back tokens one at a time, which gives more control over how each token is consumed. Java Streams add a functional way to filter or transform the resulting tokens.

Code Example: Java String Splitting

StringSplit.java
package string;

import java.util.*;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class StringSplit {

    // Split using String.split(); Pattern.quote() makes the delimiter match literally rather than as a regex.
    public static String[] splitM1(String input, String delimiter) {
        if (input == null || delimiter == null) return new String[0];

        return input.split(Pattern.quote(delimiter));
    }

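    // Split using StringTokenizer. Each character of the delimiter string acts as a
    // separate delimiter, and empty tokens are never returned.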
    public static String[] splitM2(String input, String delimiter) {
        if (input == null || delimiter == null) return new String[0];

        StringTokenizer tokenizer = new StringTokenizer(input, delimiter);
        List<String> list = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            list.add(tokenizer.nextToken());
        }

        return list.toArray(new String[0]); // Convert an ArrayList of String to a String Array
    }

    // Split using String methods: indexOf and substring. (Manual approach)
    // Empty tokens are skipped, so leading, trailing, and repeated delimiters are ignored.
    public static String[] splitM3(String input, String delimiter) {
        if (input == null || delimiter == null) return new String[0];
        // An empty delimiter would never advance the search, so treat the input as one token.
        if (delimiter.isEmpty()) return input.isEmpty() ? new String[0] : new String[]{input};

        List<String> list = new ArrayList<>();
        int start = 0; // Start of the current token
        int index;     // Position of the next delimiter
        while ((index = input.indexOf(delimiter, start)) >= 0) {
            if (index > start) {
                list.add(input.substring(start, index));
            }
            start = index + delimiter.length(); // Continue after the delimiter
        }
        if (start < input.length()) {
            list.add(input.substring(start)); // Remaining text after the last delimiter
        }

        return list.toArray(new String[0]); // Convert an ArrayList of String to a String Array
    }

    // Split using Pattern split method with regex.
    public static String[] splitM4(String input, String delimiter) {
        if (input == null || delimiter == null) return new String[0];

        return Pattern.compile(Pattern.quote(delimiter)).split(input);
    }

    // Split using Scanner and Pattern quote.
    public static String[] splitM5(String input, String delimiter) {
        if (input == null || delimiter == null) return new String[0];

        List<String> list = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(Pattern.quote(delimiter));
            while (scanner.hasNext()) {
                String element = scanner.next();
                if (!element.isEmpty()) {
                    list.add(element);
                }
            }
        }

        return list.toArray(new String[0]); // Convert an ArrayList of String to a String Array
    }

    // Split using String.split() and drop empty tokens with the Streams API.
    public static String[] splitM6(String input, String delimiter) {
        if (input == null || delimiter == null) return new String[0];

        return Stream.of(input.split(Pattern.quote(delimiter)))
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String input = "apple,banana,orange";
        String delimiter = ",";

        System.out.println("Method splitM1 " + Arrays.toString(StringSplit.splitM1(input, delimiter)));
        System.out.println("Method splitM2 " + Arrays.toString(StringSplit.splitM2(input, delimiter)));
        System.out.println("Method splitM3 " + Arrays.toString(StringSplit.splitM3(input, delimiter)));
        System.out.println("Method splitM4 " + Arrays.toString(StringSplit.splitM4(input, delimiter)));
        System.out.println("Method splitM5 " + Arrays.toString(StringSplit.splitM5(input, delimiter)));
        System.out.println("Method splitM6 " + Arrays.toString(StringSplit.splitM6(input, delimiter)));
    }
}

Output

Method splitM1 [apple, banana, orange]
Method splitM2 [apple, banana, orange]
Method splitM3 [apple, banana, orange]
Method splitM4 [apple, banana, orange]
Method splitM5 [apple, banana, orange]
Method splitM6 [apple, banana, orange]

Overview of the Code

    Using String.split() Method

    The simplest way to split a string is by using the built-in split() method of the String class. This method takes a regular expression as a delimiter and returns an array of substrings.

    Explanation

    • Pattern.quote(delimiter): Escapes special characters in the delimiter to ensure correct splitting.
    • Returns an array of the substrings between delimiters. Empty strings between consecutive delimiters are kept, but trailing empty strings are dropped.
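
    The effect of Pattern.quote() and the handling of empty strings are easiest to see on small inputs. The following is a minimal sketch; the class name SplitQuoteDemo and the helper splitLiteral are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitQuoteDemo {

    // Hypothetical helper: splits on the delimiter taken literally, as splitM1 does.
    public static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String input = "a.b.c";

        // Unquoted, "." is a regex that matches every character, so every piece
        // is empty and split() drops them all as trailing empty strings.
        System.out.println(Arrays.toString(input.split(".")));            // []

        // Quoted, "." is matched literally.
        System.out.println(Arrays.toString(splitLiteral(input, ".")));    // [a, b, c]

        // Interior empty strings are kept; trailing ones are dropped by default.
        System.out.println(Arrays.toString(splitLiteral("a,,b,,", ","))); // [a, , b]

        // A negative limit keeps trailing empty strings as well.
        System.out.println(Arrays.toString("a,,b,,".split(",", -1)));     // [a, , b, , ]
    }
}
```

    The two-argument form split(regex, limit) is part of the String class; passing a negative limit is the usual way to preserve trailing empty fields, for example in CSV-like rows.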

    Using StringTokenizer

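    StringTokenizer is a legacy class that is retained for compatibility; its own documentation recommends String.split() or the java.util.regex package for new code. It tokenizes a string based on a set of delimiter characters.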

    Explanation

    • StringTokenizer iterates over tokens split by the delimiter.
    • Tokens are stored in an ArrayList and then converted to an array.
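    • Note: StringTokenizer does not support regular expressions. Each character of the delimiter string is treated as a separate delimiter, and empty tokens are never returned.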
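
    Because StringTokenizer works on a set of delimiter characters rather than a delimiter string, multi-character delimiters can give surprising results. The sketch below illustrates this; TokenizerDemo and tokens are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerDemo {

    // Hypothetical helper: collects all tokens into a list.
    public static List<String> tokens(String input, String delimiters) {
        List<String> result = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(input, delimiters);
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        // Consecutive and trailing delimiters never produce empty tokens.
        System.out.println(tokens("a,,b,", ","));             // [a, b]

        // "::" is not a two-character delimiter; it is the character set {':'}.
        System.out.println(tokens("a::b:c", "::"));           // [a, b, c]

        // ", " means "comma or space", so "New York" is split as well.
        System.out.println(tokens("New York, Paris", ", "));  // [New, York, Paris]
    }
}
```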

    Using indexOf() and substring() (Manual Approach)

    For full control over the splitting process, you can manually extract substrings using indexOf() and substring().

    Explanation

    • Uses indexOf() to find the next occurrence of the delimiter.
    • Extracts substrings using substring().
    • Avoids the regular-expression engine, which can make it faster than split() on large strings; benchmark with your own data before relying on this.
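
    One reason to write the loop by hand is to choose exactly how empty tokens are treated. As a sketch of that idea, the variant below keeps every empty token, unlike splitM3, which skips them. The names ManualSplitDemo and splitKeepingEmpty are hypothetical, and rejecting an empty delimiter is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class ManualSplitDemo {

    // Hypothetical helper: splits on a literal, non-empty delimiter and keeps empty tokens.
    public static List<String> splitKeepingEmpty(String input, String delimiter) {
        if (delimiter.isEmpty()) {
            // An empty delimiter would never advance the search position.
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> tokens = new ArrayList<>();
        int start = 0; // Start of the current token
        int index;     // Position of the next delimiter
        while ((index = input.indexOf(delimiter, start)) >= 0) {
            tokens.add(input.substring(start, index));
            start = index + delimiter.length();
        }
        tokens.add(input.substring(start)); // Text after the last delimiter, possibly empty
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(splitKeepingEmpty("a,,b,", ","));    // [a, , b, ]
        System.out.println(splitKeepingEmpty("k=>v=>w", "=>")); // [k, v, w]
    }
}
```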

    Using Pattern.split()

    The Pattern class provides a way to split strings using compiled regular expressions.

    Explanation

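    • Pattern.compile(Pattern.quote(delimiter)): Compiles the quoted delimiter into a reusable Pattern.
    • This is a good alternative to String.split() when the same delimiter is applied to many inputs, provided the compiled Pattern is stored and reused. splitM4 compiles the pattern on every call, so on its own it gains nothing over splitM1.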
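
    To benefit from compilation, the Pattern is typically held in a constant and shared across calls. A minimal sketch, assuming semicolon-separated lines; CompiledPatternDemo, SEMICOLON, and parseLines are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class CompiledPatternDemo {

    // Compiled once and reused for every line. Pattern instances are immutable
    // and safe to share between threads.
    private static final Pattern SEMICOLON = Pattern.compile(Pattern.quote(";"));

    // Hypothetical helper: splits each line with the shared pattern.
    public static List<String[]> parseLines(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(SEMICOLON.split(line));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("id;name;city", "1;Ada;London");
        for (String[] row : parseLines(lines)) {
            System.out.println(Arrays.toString(row));
        }
        // Prints:
        // [id, name, city]
        // [1, Ada, London]
    }
}
```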

    Using Scanner

    Java’s Scanner class provides another way to tokenize a string.

    Explanation

    • Uses Scanner to iterate through tokens separated by the delimiter.
    • More flexible than StringTokenizer since it allows regex-based delimiters.
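
    Scanner can also combine a regex delimiter with typed reads such as nextInt(). The sketch below assumes comma-separated integers with optional whitespace around each comma; ScannerDemo and readInts are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerDemo {

    // Hypothetical helper: reads integers separated by commas,
    // allowing optional whitespace around each comma.
    public static List<Integer> readInts(String input) {
        List<Integer> values = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            // A regex delimiter: a comma with any surrounding whitespace.
            scanner.useDelimiter("\\s*,\\s*");
            // Stops at the first token that is not an integer.
            while (scanner.hasNextInt()) {
                values.add(scanner.nextInt());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(readInts("10, 20 ,30")); // [10, 20, 30]
    }
}
```

    Note that this loop ends silently at the first non-integer token, so real input handling would need to decide whether to skip or report such tokens.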

    Using Java Streams

    The Java 8 Streams API offers a functional approach to string splitting.

    Explanation

    • Uses Stream.of() to create a stream from the array returned by split().
    • Filters out empty strings.
    • Useful when combined with other stream operations.
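
    As an illustration of combining the split with further stream operations, the sketch below uses Pattern.splitAsStream() (available since Java 8), which produces the stream directly instead of going through an intermediate array. StreamSplitDemo, COMMA, and cleanTokens are hypothetical names.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class StreamSplitDemo {

    private static final Pattern COMMA = Pattern.compile(",");

    // Hypothetical helper: splits, trims, drops empty tokens, and removes duplicates.
    public static List<String> cleanTokens(String input) {
        return COMMA.splitAsStream(input)
                .map(String::trim)          // Remove surrounding whitespace
                .filter(s -> !s.isEmpty())  // Drop tokens that are empty after trimming
                .distinct()                 // Keep the first occurrence of each token
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(cleanTokens(" apple, ,banana,,apple ")); // [apple, banana]
    }
}
```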

Conclusion

Each method of splitting strings in Java has its own advantages:

  • splitM1 and splitM4 – split() and Pattern.split() are easy to use and handle regex.
  • splitM2 – StringTokenizer is lightweight but is a legacy class, lacks regex support, and treats each delimiter character separately.
  • splitM3 – indexOf() and substring() provide a manual approach with more control.
  • splitM5 – Scanner is useful when processing structured input.
  • splitM6 – Streams offer a functional, concise approach.

Choosing the right method depends on the use case, performance needs, and complexity of the delimiter.