Extract duplicate objects from a List in Java 8

Question

This code removes duplicates from the original list, but I want to extract the duplicates from the original list instead, without removing them (the package name is just left over from another project):

Given:

A Person POJO:

package at.mavila.learn.kafka.kafkaexercises;

import org.apache.commons.lang3.builder.ToStringBuilder;

public class Person {

private final Long id;
private final String firstName;
private final String secondName;


private Person(final Builder builder) {
    this.id = builder.id;
    this.firstName = builder.firstName;
    this.secondName = builder.secondName;
}


public Long getId() {
    return id;
}

public String getFirstName() {
    return firstName;
}

public String getSecondName() {
    return secondName;
}

public static class Builder {

    private Long id;
    private String firstName;
    private String secondName;

    public Builder id(final Long builder) {
        this.id = builder;
        return this;
    }

    public Builder firstName(final String first) {
        this.firstName = first;
        return this;
    }

    public Builder secondName(final String second) {
        this.secondName = second;
        return this;
    }

    public Person build() {
        return new Person(this);
    }


}

@Override
public String toString() {
    return new ToStringBuilder(this)
            .append("id", id)
            .append("firstName", firstName)
            .append("secondName", secondName)
            .toString();
}
}

The duplicate extraction code.

Notice that here we filter by the id and the first name to retrieve a new list. I saw this code somewhere else; it is not mine:

package at.mavila.learn.kafka.kafkaexercises;

import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

import static java.util.Objects.isNull;

public final class DuplicatePersonFilter {


private DuplicatePersonFilter() {
    //No instances of this class
}

public static List<Person> getDuplicates(final List<Person> personList) {

   return personList
           .stream()
           .filter(duplicateByKey(Person::getId))
           .filter(duplicateByKey(Person::getFirstName))
           .collect(Collectors.toList());

}

private static <T> Predicate<T> duplicateByKey(final Function<? super T, Object> keyExtractor) {
    Map<Object,Boolean> seen = new ConcurrentHashMap<>();
    return t -> isNull(seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE));

}

}

The test code. If you run this test case, you will get [alex, lolita, elpidio, romualdo].

I would expect to get [romualdo, otroRomualdo] instead, as the duplicates extracted by id and firstName:

package at.mavila.learn.kafka.kafkaexercises;


import org.junit.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.List;

import static org.junit.Assert.*;

public class DuplicatePersonFilterTest {

private static final Logger LOGGER = LoggerFactory.getLogger(DuplicatePersonFilterTest.class);



@Test
public void testList(){

    Person alex = new Person.Builder().id(1L).firstName("alex").secondName("salgado").build();
    Person lolita = new Person.Builder().id(2L).firstName("lolita").secondName("llanero").build();
    Person elpidio = new Person.Builder().id(3L).firstName("elpidio").secondName("ramirez").build();
    Person romualdo = new Person.Builder().id(4L).firstName("romualdo").secondName("gomez").build();
    Person otroRomualdo = new Person.Builder().id(4L).firstName("romualdo").secondName("perez").build();


    List<Person> personList = new ArrayList<>();

    personList.add(alex);
    personList.add(lolita);
    personList.add(elpidio);
    personList.add(romualdo);
    personList.add(otroRomualdo);

    final List<Person> duplicates = DuplicatePersonFilter.getDuplicates(personList);

    LOGGER.info("Duplicates: {}",duplicates);

}

}
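The result above follows from how duplicateByKey works: putIfAbsent returns null only the first time a key is seen, so the predicate keeps first occurrences and drops the repeats. The sketch below illustrates this with plain strings instead of Person. The class name SeenPredicateDemo and its method names are hypothetical, and a HashSet stands in for the ConcurrentHashMap. It also shows that simply negating the predicate yields only the later occurrences, not every member of a duplicated group.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical demo class; not part of the original project.
final class SeenPredicateDemo {

    // True only the first time a key is encountered (same idea as duplicateByKey).
    static <T> Predicate<T> firstOccurrence(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = new HashSet<>();
        return t -> seen.add(keyExtractor.apply(t));
    }

    // Keeps the first element for each key: this is de-duplication.
    static List<String> firstOccurrences(List<String> names) {
        Predicate<String> first = firstOccurrence(s -> s);
        return names.stream().filter(first).collect(Collectors.toList());
    }

    // Negating the predicate keeps only the second and later elements for each key.
    static List<String> laterOccurrences(List<String> names) {
        Predicate<String> first = firstOccurrence(s -> s);
        return names.stream().filter(first.negate()).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("alex", "lolita", "elpidio", "romualdo", "romualdo");
        System.out.println(firstOccurrences(names)); // [alex, lolita, elpidio, romualdo]
        System.out.println(laterOccurrences(names)); // [romualdo]
    }
}
```

Because the predicate is stateful, which element counts as the "first" one is only well defined on a sequential stream.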

At work I was able to get the desired result with a Comparator, a TreeMap and an ArrayList, but that meant creating a list, filtering it, and then passing the filter again over the newly created list. The code looks bloated (and is probably inefficient).

Does anyone have a better idea for how to extract the duplicates rather than remove them?

Thanks in advance.

Update

Thanks, everyone, for your answers.

Here is the same approach using uniqueAttributes. Despite its name, removeDuplicates returns the duplicated entries, that is, every group with more than one element:

// Requires java.util.Collection, a static import of Collectors.groupingBy,
// and org.apache.commons.lang3.StringUtils.
public static List<Person> removeDuplicates(List<Person> personList) {
    return getDuplicatesMap(personList).values().stream()
            .filter(duplicates -> duplicates.size() > 1) // keep only groups with repeats
            .flatMap(Collection::stream)
            .collect(Collectors.toList());
}

private static Map<String, List<Person>> getDuplicatesMap(List<Person> personList) {
    return personList.stream().collect(groupingBy(DuplicatePersonFilter::uniqueAttributes));
}

private static String uniqueAttributes(Person person) {
    if (Objects.isNull(person)) {
        return StringUtils.EMPTY;
    }
    // The separator keeps keys such as (1, "2x") and (12, "x") distinct.
    return person.getId() + ":" + person.getFirstName();
}
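As a variation on the grouping approach above, the following self-contained sketch uses a list of the two attributes as the grouping key instead of a concatenated string, and a LinkedHashMap so that groups come out in encounter order. The class GroupByCompositeKeyDemo and its simplified Person are hypothetical stand-ins for the classes in the question, not the author's code.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; Person is a simplified stand-in for the POJO in the question.
final class GroupByCompositeKeyDemo {

    static final class Person {
        final Long id;
        final String firstName;
        final String secondName;

        Person(Long id, String firstName, String secondName) {
            this.id = id;
            this.firstName = firstName;
            this.secondName = secondName;
        }

        @Override
        public String toString() {
            return firstName + " " + secondName;
        }
    }

    // Returns every person whose (id, firstName) pair occurs more than once.
    static List<Person> extractDuplicates(List<Person> people) {
        return people.stream()
                .collect(Collectors.groupingBy(
                        p -> Arrays.<Object>asList(p.id, p.firstName), // lists compare element by element
                        LinkedHashMap::new,                             // preserves encounter order of groups
                        Collectors.toList()))
                .values().stream()
                .filter(group -> group.size() > 1)
                .flatMap(Collection::stream)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person(1L, "alex", "salgado"),
                new Person(2L, "lolita", "llanero"),
                new Person(3L, "elpidio", "ramirez"),
                new Person(4L, "romualdo", "gomez"),
                new Person(4L, "romualdo", "perez"));
        System.out.println(extractDuplicates(people)); // [romualdo gomez, romualdo perez]
    }
}
```

A list key avoids having to choose a separator character and does not require Person to override equals or hashCode.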

Update 2

The answer provided by @brett-ryan is also correct:

public static List<Person> extractDuplicatesWithIdentityCountingV2(final List<Person> personList){

        List<Person> duplicates = personList.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .entrySet().stream()
                .filter(n -> n.getValue() > 1)
                .flatMap(n -> nCopies(n.getValue().intValue(), n.getKey()).stream())
                .collect(toList());

        return duplicates;

    }

Edit

The code above can be found at:

https://gitlab.com/totopoloco/marco_utilities/-/tree/master/duplicates_exercises

See:

Usage: https://gitlab.com/totopoloco/marco_utilities/-/blob/master/duplicates_exercises/src/test/java/at/mavila/exercises/duplicates/lists/DuplicatePersonFilterTest.java

Implementation: https://gitlab.com/totopoloco/marco_utilities/-/blob/master/duplicates_exercises/src/main/java/at/mavila/exercises/duplicates/lists/DuplicatePersonFilter.java

Recommended answer

If you can implement equals and hashCode on Person, you can use a counting downstream collector with groupingBy to get the distinct elements that have been duplicated.

List<Person> duplicates = personList.stream()
  .collect(groupingBy(identity(), counting()))
  .entrySet().stream()
  .filter(n -> n.getValue() > 1)
  .map(n -> n.getKey())
  .collect(toList());
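For completeness, here is a self-contained sketch of what that could look like end to end. It assumes that two people are equal when their id and firstName match, which is the criterion from the question. The class CountingDuplicatesDemo and its simplified Person are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical demo class; Person is a simplified stand-in for the POJO in the question.
final class CountingDuplicatesDemo {

    static final class Person {
        final Long id;
        final String firstName;
        final String secondName;

        Person(Long id, String firstName, String secondName) {
            this.id = id;
            this.firstName = firstName;
            this.secondName = secondName;
        }

        // Assumption: equality is defined by id and firstName only.
        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof Person)) {
                return false;
            }
            Person other = (Person) o;
            return Objects.equals(id, other.id) && Objects.equals(firstName, other.firstName);
        }

        @Override
        public int hashCode() {
            return Objects.hash(id, firstName);
        }
    }

    // One representative per group of equal people that occurs more than once.
    static List<Person> distinctDuplicates(List<Person> people) {
        return people.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .entrySet().stream()
                .filter(e -> e.getValue() > 1)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person(1L, "alex", "salgado"),
                new Person(4L, "romualdo", "gomez"),
                new Person(4L, "romualdo", "perez"));
        List<Person> duplicates = distinctDuplicates(people);
        System.out.println(duplicates.size() + " " + duplicates.get(0).firstName); // 1 romualdo
    }
}
```

Since secondName is not part of equals here, "romualdo gomez" and "romualdo perez" fall into the same group and only one of them is returned.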

If you would like to keep a list in which the repeated elements appear as many times as they occurred, you can expand the result back out using Collections.nCopies. This ensures that repeated elements are grouped together.

List<Person> duplicates = personList.stream()
    .collect(groupingBy(identity(), counting()))
    .entrySet().stream()
    .filter(n -> n.getValue() > 1)
    .flatMap(n -> nCopies(n.getValue().intValue(), n.getKey()).stream())
    .collect(toList());
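One detail worth knowing about the nCopies variant: it repeats the map key, so the expanded list holds the same instance several times rather than the original, distinct objects. The sketch below demonstrates this under the assumption that equality covers only id and firstName, and contrasts it with grouping into lists, which retains the original instances. The class NCopiesCaveatDemo and its simplified Person are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical demo class; Person is a simplified stand-in for the POJO in the question.
final class NCopiesCaveatDemo {

    static final class Person {
        final Long id;
        final String firstName;
        final String secondName;

        Person(Long id, String firstName, String secondName) {
            this.id = id;
            this.firstName = firstName;
            this.secondName = secondName;
        }

        // Assumption: equality is defined by id and firstName only.
        @Override
        public boolean equals(Object o) {
            return o instanceof Person
                    && Objects.equals(id, ((Person) o).id)
                    && Objects.equals(firstName, ((Person) o).firstName);
        }

        @Override
        public int hashCode() {
            return Objects.hash(id, firstName);
        }
    }

    // Counting plus nCopies: the same key instance is repeated n times.
    static List<Person> viaNCopies(List<Person> people) {
        return people.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .entrySet().stream()
                .filter(e -> e.getValue() > 1)
                .flatMap(e -> Collections.nCopies(e.getValue().intValue(), e.getKey()).stream())
                .collect(Collectors.toList());
    }

    // Grouping into lists: every original instance is retained.
    static List<Person> viaGrouping(List<Person> people) {
        return people.stream()
                .collect(Collectors.groupingBy(Function.identity()))
                .values().stream()
                .filter(group -> group.size() > 1)
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }

    static List<String> secondNames(List<Person> people) {
        return people.stream().map(p -> p.secondName).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person(1L, "alex", "salgado"),
                new Person(4L, "romualdo", "gomez"),
                new Person(4L, "romualdo", "perez"));
        System.out.println(secondNames(viaNCopies(people)));  // [gomez, gomez]
        System.out.println(secondNames(viaGrouping(people))); // [gomez, perez]
    }
}
```

If equals covers every field, the two variants are indistinguishable by content; the difference only matters when equal objects can still differ in other attributes, as in the question.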
