Removing duplicates from a String in Java




I am trying to iterate through a string in order to remove the duplicate characters.



For example the String aabbccdef should become abcdef
and the String abcdabcd should become abcd





Here is what I have so far:


public class test {
    public static void main(String[] args) {
        String input = new String("abbc");
        String output = new String();

        for (int i = 0; i < input.length(); i++) {
            for (int j = 0; j < output.length(); j++) {
                if (input.charAt(i) != output.charAt(j)) {
                    output = output + input.charAt(i);
                }
            }
        }

        System.out.println(output);
    }
}






What is the best way to do this?





Do you just want to 'collapse' repeating characters, or remove duplicates entirely? That is, should "abba" result in "aba" or "ab"?
– Alistair A. Israel
Aug 10 '11 at 9:17




33 Answers



Convert the string to an array of char, and store it in a LinkedHashSet. That will preserve your ordering, and remove duplicates. Something like:




String string = "aabbccdefatafaz";

char[] chars = string.toCharArray();
// A LinkedHashSet drops duplicates and keeps insertion order
Set<Character> charSet = new LinkedHashSet<Character>();
for (char c : chars) {
    charSet.add(c);
}

StringBuilder sb = new StringBuilder();
for (Character character : charSet) {
    sb.append(character);
}
System.out.println(sb.toString());





I guess I can't really avoid StringBuilder or an array list...oh well, thanks
– Ricco
Feb 14 '11 at 5:44





@Rico: You can also do this manually (like creating an array of the right length, then putting all non-duplicates in it, then creating a string of this), but it is simply more work this way, and a StringBuilder is really made to construct Strings.
– Paŭlo Ebermann
Feb 14 '11 at 11:12





This will also remove the second 'f', which may or may not be what the OP wants.
– Alistair A. Israel
Aug 10 '11 at 9:16





Using new StringBuilder(charSet.size()) will optimize this slightly to avoid resizing the StringBuilder.
– Simon Forsberg
Apr 25 '15 at 9:05







Not using StringBuilder at all is even better. See my answer.
– Andreas
Jan 11 '16 at 21:30





I would use a LinkedHashSet. It removes duplicates (because it is a Set) and maintains the order (because it is backed by a linked list). This is kind of a dirty solution; there might be an even better way.


String s = "aabbccdef";
Set<Character> set = new LinkedHashSet<Character>();
for (char c : s.toCharArray()) {
    set.add(Character.valueOf(c));
}





It's not returning a String though.
– realPK
Jul 24 '16 at 2:12



Try this simple solution:


public String removeDuplicates(String input) {
    String result = "";
    for (int i = 0; i < input.length(); i++) {
        if (!result.contains(String.valueOf(input.charAt(i)))) {
            result += String.valueOf(input.charAt(i));
        }
    }
    return result;
}





Good answer, but each time += gets run, the entire string is destroyed and re-copied resulting in unnecessary inefficiency. Also testing for the length() of the string on every iteration of the loop introduces inefficiency. The length of the loop doesn't change so you don't have to check it on every character.
– Eric Leschinski
Apr 24 '15 at 18:31
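
One way to avoid the repeated copying this comment describes is to accumulate the result in a StringBuilder instead of using +=. The following is only a sketch: the class name BuilderDedup is made up for illustration, and the indexOf lookup is still a linear scan over the result built so far.

```java
final class BuilderDedup {

    // Keeps the first occurrence of each character, in input order.
    static String removeDuplicates(String input) {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            String ch = String.valueOf(input.charAt(i));
            // indexOf returns -1 when the character is not in the result yet
            if (result.indexOf(ch) < 0) {
                result.append(ch);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeDuplicates("aabbccdef"));
        System.out.println(removeDuplicates("abcdabcd"));
    }
}
```

Appending to a StringBuilder grows one buffer in place, whereas each += on a String allocates a new String.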





Create a StringWriter. Run through the original string using charAt(i) in a for loop. Maintain a variable of char type keeping the last charAt value. If you iterate and the charAt value equals what is stored in that variable, don't add to the StringWriter. Finally, use the StringWriter.toString() method and get a string, and do what you need with it.
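
The steps above could be sketched roughly as follows. The class and method names (WriterCollapse, collapse) are hypothetical. Note that, as described, this only skips a character equal to the one immediately before it, so aabbccdef becomes abcdef but abcdabcd comes back unchanged.

```java
import java.io.StringWriter;

final class WriterCollapse {

    // Drops a character only when it equals the previous character.
    static String collapse(String input) {
        StringWriter writer = new StringWriter();
        boolean hasLast = false; // false until the first character is read
        char last = 0;
        for (int i = 0; i < input.length(); i++) {
            char current = input.charAt(i);
            if (!hasLast || current != last) {
                writer.write(current);
            }
            last = current;
            hasLast = true;
        }
        return writer.toString();
    }

    public static void main(String[] args) {
        System.out.println(collapse("aabbccdef"));
        System.out.println(collapse("abcdabcd"));
    }
}
```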





I tried something like that, but not StringWriter.toString(). The first loop would iterate through the input string and if that character did not exist in the result string then append it...but it didn't work.
– Ricco
Feb 14 '11 at 5:35




Using Stream makes it easy.


import java.util.Arrays;
import java.util.stream.Collectors;

public class MyClass {

    public static String removeDuplicates(String myString) {
        return Arrays.asList(myString.split("")).stream().distinct().collect(Collectors.joining());
    }
}




Here is some more documentation about Stream and all you can do with it:
https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html



The 'description' part is very instructive about the benefits of Streams.


String input = "AAAB";

String output = "";
for (int index = 0; index < input.length(); index++) {
    // Compare each character with the next one, wrapping around at the end
    if (input.charAt(index % input.length()) != input
            .charAt((index + 1) % input.length())) {
        output += input.charAt(index);
    }
}

System.out.println(output);



However, you can't use it if all the characters in the input are the same, or if the input is empty!





This will not work on the examples you asked about in Remove duplicate in a string without using arrays
– Xavi López
Dec 13 '12 at 19:10



Code to remove the duplicate characters in a string without using any additional buffer. NOTE: One or two additional variables are fine. An extra array is not:


import java.util.*;
public class Main {
    public static char[] removeDupes(char[] arr) {
        if (arr == null || arr.length < 2) {
            return arr;
        }
        int len = arr.length;
        int tail = 1;
        for (int x = 1; x < len; x++) {
            int y;
            for (y = 0; y < tail; y++) {
                if (arr[x] == arr[y]) break;
            }
            if (y == tail) {
                arr[tail] = arr[x];
                tail++;
            }
        }
        return Arrays.copyOfRange(arr, 0, tail);
    }

    public static char[] bigArr(int len) {
        char[] arr = new char[len];
        Random r = new Random();
        // NOTE: the start of this literal was lost in extraction. Only the trailing
        // ;:',.<>/?`~ comes from the source; the letters and digits are an assumption.
        String alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789;:',.<>/?`~";

        for (int x = 0; x < len; x++) {
            arr[x] = alphabet.charAt(r.nextInt(alphabet.length()));
        }

        return arr;
    }

    public static void main(String[] args) {
        String result = new String(removeDupes(new char[]{'a', 'b', 'c', 'd', 'a'}));
        assert "abcd".equals(result) : "abcda should return abcd but it returns: " + result;

        result = new String(removeDupes(new char[]{'a', 'a', 'a', 'a'}));
        assert "a".equals(result) : "aaaa should return a but it returns: " + result;

        result = new String(removeDupes(new char[]{'a', 'b', 'c', 'a'}));
        assert "abc".equals(result) : "abca should return abc but it returns: " + result;

        result = new String(removeDupes(new char[]{'a', 'a', 'b', 'b'}));
        assert "ab".equals(result) : "aabb should return ab but it returns: " + result;

        result = new String(removeDupes(new char[]{'a'}));
        assert "a".equals(result) : "a should return a but it returns: " + result;

        result = new String(removeDupes(new char[]{'a', 'b', 'b', 'a'}));
        assert "ab".equals(result) : "abba should return ab but it returns: " + result;

        char[] arr = bigArr(5000000);
        long startTime = System.nanoTime();
        System.out.println("2: " + new String(removeDupes(arr)));
        long endTime = System.nanoTime();
        long duration = (endTime - startTime);
        System.out.println("Program took: " + duration + " nanoseconds");
        System.out.println("Program took: " + duration/1000000000 + " seconds");
    }
}





How to read and talk about the above code:



Explain how this code works:



The first part of the array passed in is used as the repository for the unique characters that are ultimately returned. At the beginning of the function, that repository is the characters between index 0 and tail, where tail starts at 1, so it holds only the first character.



We define the variable y outside of the inner loop because we need its value after the loop ends. The inner loop looks for the first position in the repository that already holds the character at index x. When a duplicate is found, the loop breaks early, so y == tail is false and nothing is added to the repository.



When the character at index x is not yet in the repository, we copy it to the end of the repository at index tail and increment tail.



At the end, we return the array between indices 0 and tail, which is shorter than or equal in length to the original array.



Talking points exercise for coder interviews:



Will the program behave differently if you change the y++ to ++y? Why or why not.



Does the array copy at the end represent another 'N' pass through the entire array making runtime complexity O(n*n) instead of O(n) ? Why or why not.



Can you replace the double equals comparing primitive characters with a .equals? Why or why not?



Can this method be changed in order to do the replacements "by reference" instead of as it is now, "by value"? Why or why not?



Can you increase the efficiency of this algorithm by sorting the repository of unique values at the beginning of 'arr'? Under which circumstances would it be more efficient?


public class RemoveRepeated4rmString {

    public static void main(String[] args) {
        String s = "harikrishna";
        String s2 = "";
        for (int i = 0; i < s.length(); i++) {
            boolean found = false;
            for (int j = 0; j < s2.length(); j++) {
                if (s.charAt(i) == s2.charAt(j)) {
                    found = true;
                    break; //don't need to iterate further
                }
            }
            if (found == false) {
                s2 = s2.concat(String.valueOf(s.charAt(i)));
            }
        }
        System.out.println(s2);
    }
}




Here is an improvement to the answer by Dave.



It uses HashSet instead of the slightly more costly LinkedHashSet, and reuses the chars buffer for the result, eliminating the need for a StringBuilder.




String string = "aabbccdefatafaz";

char[] chars = string.toCharArray();
Set<Character> present = new HashSet<>();
int len = 0;
for (char c : chars) {
    // add() returns true only the first time a character is seen
    if (present.add(c)) {
        chars[len++] = c;
    }
}
System.out.println(new String(chars, 0, len));   // abcdeftz



To me it looks like everyone is trying way too hard to accomplish this task. All we care about is keeping one copy of each letter when it repeats. And because we only care whether characters repeat one after the other, the nested loops become unnecessary: you can simply compare position n to position n + 1. Because this only copies a character when the next one is different, the last character needs special handling: you can either append white space to the end of the original string, or just copy the last character of the string to your result.



String removeDuplicate(String s) {
    String result = "";

    for (int i = 0; i < s.length(); i++) {
        if (i + 1 < s.length() && s.charAt(i) != s.charAt(i + 1)) {
            result = result + s.charAt(i);
        }
        // Always copy the last character
        if (i + 1 == s.length()) {
            result = result + s.charAt(i);
        }
    }

    return result;
}






I just realized his second example shows that it does remove duplicates even if they don't follow one another. So this solution is incorrect for what he/she is trying to accomplish.
– Chris
Feb 5 at 15:55



You can't. You can create a new String that has duplicates removed. Why aren't you using StringBuilder (or StringBuffer, presumably)?



You can run through the string and store the unique characters in a char array, keeping track of how many unique characters you've seen. Then you can create a new String using the String(char[], int, int) constructor.
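
A minimal sketch of that idea, assuming "duplicates" means any repeated character rather than only adjacent ones. The names ArrayDedup, unique and count are illustrative, not from the answer.

```java
final class ArrayDedup {

    // Collects first occurrences into a char array, then builds the String once.
    static String removeDuplicates(String input) {
        char[] unique = new char[input.length()];
        int count = 0; // number of unique characters stored so far
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            boolean seen = false;
            for (int j = 0; j < count; j++) {
                if (unique[j] == c) {
                    seen = true;
                    break;
                }
            }
            if (!seen) {
                unique[count++] = c;
            }
        }
        // String(char[] value, int offset, int count) copies only the used prefix
        return new String(unique, 0, count);
    }

    public static void main(String[] args) {
        System.out.println(removeDuplicates("abcab"));
    }
}
```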





Also, the problem is a little ambiguous—does “duplicates” mean adjacent repetitions? (In other words, what should happen with abcab?)





Okay Guys, I have found a better way to do this


public static void alpha(char[] finalname) {
    if (finalname == null) {
        return;
    }

    if (finalname.length < 2) {
        return;
    }

    // Mark the first of two equal neighbours with the NUL character
    char empty = '\000';
    for (int i = 0; i < finalname.length - 1; i++) {
        if (finalname[i] == finalname[i + 1]) {
            finalname[i] = empty;
        }
    }

    String alphaname = String.valueOf(finalname);
    alphaname = alphaname.replace("\000", "");
    System.out.println(alphaname);
}







This code makes two mistakes. First: it only replaces consecutive duplicates. It fails to compress abcabc to abc because inside your loop you are only testing adjacent indices in the array. Second: you are passing a char[] by reference, and to change the array by reference you have to destroy it and re-create it, forcing its lifetime to exist only in this particular method. You'll have to return the variable, which makes a clone of the entire thing, one of which needs to be garbage collected.
– Eric Leschinski
Apr 24 '15 at 18:44







Yea I realized it later haha thanks for pointing it out
– Tahalil morsilin
Apr 21 '17 at 18:22



Old-school way (how we wrote such tasks in Apple ][ Basic, adapted to Java):


int i, j;
StringBuffer str = new StringBuffer();
Scanner in = new Scanner(System.in);
System.out.print("Enter string: ");
str.append(in.nextLine());

for (i = 0; i < str.length() - 1; i++) {
    for (j = i + 1; j < str.length(); j++) {
        if (str.charAt(i) == str.charAt(j)) {
            str.deleteCharAt(j);
            j--; // re-check index j: the next character has shifted into it
        }
    }
}

System.out.println("Removed non-unique symbols: " + str);





This answer is right, but it has a runtime complexity of O(n * n * n ). Each time you call str.length, you are stepping the entire array. Since an algorithm can be designed to do this in O(n) runtime complexity without using additional memory, this answer will get you in trouble if I see you put this sort of thing in production. This is the generic easy-to-understand answer given by programmers who write very VERY slow running code. It's a good exercise in understanding runtime complexity.
– Eric Leschinski
Apr 24 '15 at 18:13







O(n^2) bad complexity
– nagendra547
Aug 24 '17 at 6:40



Here is another logic I'd like to share. You start comparing from midway of the string length and go backward.



Test with:
input = "azxxzy";
output = "ay";


String removeMidway(String input) {
    // cnt = cnt + 1;  // call counter; its declaration is not shown in this snippet
    StringBuilder str = new StringBuilder(input);
    int midlen = str.length() / 2;
    for (int i = midlen - 1; i > 0; i--) {
        for (int j = midlen; j < str.length() - 1; j++) {
            if (str.charAt(i) == str.charAt(j)) {
                str.delete(i, j + 1);
                midlen = str.length() / 2;
                System.out.println("i=" + i + ",j=" + j + ",len=" + str.length() + ",midlen=" + midlen + ", after deleted = " + str);
            }
        }
    }
    return str.toString();
}



This is another approach


void remove_duplicate (char* str, int len) {
    unsigned int index = 0;  /* bit mask of the letters seen so far */
    int c = 0;               /* read position */
    int i = 0;               /* write position */
    while (c < len) {
        /* this is just an example; more checks can be added for
           capital letters, spaces and special chars */

        int pos = str[c] - 'a';
        if ((index & (1<<pos)) == 0) {
            str[i++] = str[c];
            index |= (1<<pos);
        }
        c++;
    }
    str[i] = 0;
}



Another possible solution, in case a string is an ASCII string, is to maintain an array of 256 boolean elements to denote ASCII character appearance in a string. If a character appeared for the first time, we keep it and append to the result. Otherwise just skip it.


public String removeDuplicates(String input) {
    boolean[] chars = new boolean[256];
    StringBuilder resultStringBuilder = new StringBuilder();
    for (char c : input.toCharArray()) {
        if (!chars[c]) {
            resultStringBuilder.append(c);
            chars[c] = true;
        }
    }
    return resultStringBuilder.toString();
}



This approach will also work with a Unicode string. You just need to increase the size of chars: 65,536 entries cover every possible Java char value.
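
As a rough sketch of the enlarged table, the version below sizes the array from Character.MAX_VALUE so any char can be used as an index. The class name WideTableDedup is hypothetical. One assumption to be aware of: characters outside the Basic Multilingual Plane are stored as two chars (a surrogate pair), and this sketch treats each half separately.

```java
final class WideTableDedup {

    // One flag per possible char value (0 .. 65535).
    static String removeDuplicates(String input) {
        boolean[] seen = new boolean[Character.MAX_VALUE + 1];
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (!seen[c]) {
                seen[c] = true;
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeDuplicates("aabbccdef"));
    }
}
```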





Solution using JDK7:


public static String removeDuplicateChars(final String str) {
    if (str == null || str.isEmpty()) {
        return str;
    }

    final char[] chArray = str.toCharArray();
    final Set<Character> set = new LinkedHashSet<>();
    for (char c : chArray) {
        set.add(c);
    }

    final StringBuilder sb = new StringBuilder();
    for (Character character : set) {
        sb.append(character);
    }
    return sb.toString();
}


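public static void main(String[] a) {
    String name = "Madan";
    System.out.println(name);
    StringBuilder sb = new StringBuilder(name);
    // Loop over sb itself so the indices stay valid while characters are deleted
    for (int i = 0; i < sb.length(); i++) {
        for (int j = i + 1; j < sb.length(); j++) {
            if (sb.charAt(i) == sb.charAt(j)) {
                sb.deleteCharAt(j);
                j--; // the next character has shifted into index j
            }
        }
    }
    System.out.println("After deletion :" + sb + "");
}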






Good to give some code, but it should come with some explanation to point the changes and why it is the solution of the OP's question.
– рüффп
Oct 6 '16 at 18:16


String str = "eamparuthik@gmail.com";
char[] c = str.toCharArray();
String op = "";

for (int i = 0; i <= c.length - 1; i++) {
    if (!op.contains(c[i] + "")) {
        op = op + c[i];
    }
}
System.out.println(op);





Whilst this code snippet is welcome, and may provide some help, it would be greatly improved if it included an explanation of how and why this solves the problem. Remember that you are answering the question for readers in the future, not just the person asking now! Please edit your answer to add explanation, and give an indication of what limitations and assumptions apply.
– Toby Speight
Mar 16 '17 at 15:04


public static String removeDuplicateChar(String str) {
    char[] charArray = str.toCharArray();
    StringBuilder stringBuilder = new StringBuilder();
    for (int i = 0; i < charArray.length; i++) {
        int index = stringBuilder.toString().indexOf(charArray[i]);
        if (index <= -1) {
            stringBuilder.append(charArray[i]);
        }
    }
    return stringBuilder.toString();
}


import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class RemoveDuplicacy {

    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        System.out.print("Enter any word : ");
        String s = br.readLine();
        int l = s.length();
        char ch;
        String ans = " ";

        for (int i = 0; i < l; i++) {
            ch = s.charAt(i);
            if (ch != ' ') {
                ans = ans + ch;
            }
            s = s.replace(ch, ' '); // Replace all occurrences of the current character with a space
        }
        System.out.println("Word after removing duplicate characters : " + ans);
    }
}




import java.util.Scanner;

public class dublicate {
    public static void main(String... a) {
        System.out.print("Enter the String");
        Scanner Sc = new Scanner(System.in);
        String st = Sc.nextLine();
        StringBuilder sb = new StringBuilder();
        boolean[] bc = new boolean[256]; // assumes every character code is below 256
        for (int i = 0; i < st.length(); i++) {
            int index = st.charAt(i);
            if (bc[index] == false) {
                sb.append(st.charAt(i));
                bc[index] = true;
            }
        }
        System.out.print(sb.toString());
    }
}






Whilst this code snippet is welcome, and may provide some help, it would be greatly improved if it included an explanation of how and why this solves the problem. Remember that you are answering the question for readers in the future, not just the person asking now! Please edit your answer to add explanation, and give an indication of what limitations and assumptions apply. (Thanks @Toby Speight for this message)
– Adonis
Aug 4 '17 at 18:47



public static void main(String[] args) {
    int i, j;
    StringBuffer str = new StringBuffer();
    Scanner in = new Scanner(System.in);
    System.out.print("Enter string: ");

    str.append(in.nextLine());

    for (i = 0; i < str.length() - 1; i++) {
        // Start at i + 1 so a character is never compared with itself
        for (j = i + 1; j < str.length(); j++) {
            if (str.charAt(i) == str.charAt(j)) {
                str.deleteCharAt(j);
                j--; // re-check index j after the deletion
            }
        }
    }
    System.out.println("Removed String: " + str);
}





Please, do not only give code, explain what was wrong and how this code solves the problem.
– Alexandre Fenyo
Aug 24 '17 at 3:30



This is an improvement on the solution suggested by @Dave. Here, I implement it in a single loop.



Let's reuse the return value of the set.add(T item) method and append the character to the StringBuilder at the same time if add succeeds.



This is just O(n). No need to loop again.


String string = "aabbccdefatafaz";

char[] chars = string.toCharArray();
StringBuilder sb = new StringBuilder();
Set<Character> charSet = new LinkedHashSet<Character>();
for (char c : chars) {
    if (charSet.add(c)) {
        sb.append(c);
    }
}

System.out.println(sb.toString());   // abcdeftz



A simple solution is to iterate through the given string and put each unique character into another string (in this case, a variable result) if that string doesn't already contain the character. Finally, return result as the output.



Below is a working and tested code snippet for removing duplicate characters from the given string. Each contains call scans result, so the running time is O(n * k) for n input characters and k distinct characters; it is O(n) only when the alphabet is bounded.


private static String removeDuplicate(String s) {
    String result = "";
    for (int i = 0; i < s.length(); i++) {
        char ch = s.charAt(i);
        if (!result.contains("" + ch)) {
            result += "" + ch;
        }
    }
    return result;
}



If the input is madam then output will be mad.

If the input is anagram then output will be angrm



Hope this helps.

Thanks



For simplicity, I have used hard-coded input; one can also read the input with the Scanner class.


public class KillDuplicateCharInString {
    public static void main(String[] args) {
        String str = "aaaabccdde ";
        char[] arr = str.toCharArray();
        int n = arr.length;
        String finalStr = "";
        for (int i = 0; i < n; i++) {
            if (i == n - 1) {
                finalStr += arr[i];
                break;
            }
            if (arr[i] == arr[i + 1]) {
                continue;
            }
            else {
                finalStr += arr[i];
            }
        }
        System.out.println(finalStr);
    }
}






public static void main(String[] args) {
    Scanner sc = new Scanner(System.in);
    String s = sc.next();
    String str = "";
    char c;
    for (int i = 0; i < s.length(); i++) {
        c = s.charAt(i);
        str = str + c;
        // Blank out every occurrence of c so later copies show up as spaces
        s = s.replace(c, ' ');
        if (i == s.length() - 1) {
            System.out.println(str.replaceAll("\\s", ""));
        }
    }
}







Give some explanation about your solution and how it solves the problem.
– Vaibhav Desai
Jan 7 at 9:00


package com.st.removeduplicate;
public class RemoveDuplicate {
    public static void main(String[] args) {
        String str1 = "shushil", str2 = "";
        for (int i = 0; i <= str1.length() - 1; i++) {
            int count = 0;
            for (int j = 0; j <= i; j++) {
                if (str1.charAt(i) == str1.charAt(j)) {
                    count++;
                    if (count > 1) {
                        break;
                    }
                }
            }
            if (count == 1) {
                str2 = str2 + str1.charAt(i);
            }
        }
        System.out.println(str2);
    }
}






package com.core.interview.client;

import java.util.LinkedHashSet;
import java.util.Scanner;
import java.util.Set;

public class RemoveDuplicateFromString {

    public static String DupRemoveFromString(String str) {
        char[] c1 = str.toCharArray();
        Set<Character> charSet = new LinkedHashSet<Character>();
        for (char c : c1) {
            charSet.add(c);
        }

        StringBuffer sb = new StringBuffer();
        for (Character c2 : charSet) {
            sb.append(c2);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("Enter Your String: ");
        Scanner sc = new Scanner(System.in);
        String str = sc.nextLine();
        System.out.println(DupRemoveFromString(str));
    }
}





Hope this will help.


public void RemoveDuplicates() {
    String s = "Hello World!";
    int l = s.length();
    char ch;
    String result = "";
    for (int i = 0; i < l; i++) {
        ch = s.charAt(i);
        if (ch != ' ') {
            result = result + ch;
        }
        // Replace all occurrences of the current character with a space
        s = s.replace(ch, ' ');
    }
    System.out.println("After removing duplicate characters : " + result);
}





