
gurongrong

RPA member No. 180
Python basics, other experience • 2 replies • 1.1K views • 2019-05-07 15:37:05

Picking out the duplicate elements in a list

Given the list:

a = [1, 2, 3, 2, 1, 5, 6, 5, 5, 5]

Approach 1:

import collections

print([item for item, count in collections.Counter(a).items() if count > 1])

# [1, 2, 5]

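However, Counter is not very efficient here: it tallies every occurrence when we only need to know whether an element repeats. Approach 2: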

seen = set()        # values encountered so far
duplicated = set()  # values encountered more than once
for x in a:
    if x not in seen:
        seen.add(x)
    else:
        duplicated.add(x)
print(duplicated)
# {1, 2, 5}

Compared with approach 1, this version is less intuitive.
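The efficiency difference between the Counter version and the seen-set version can be measured rather than assumed. The following is only a rough sketch using the standard timeit module; the function names dups_counter and dups_seen_set are made up for this illustration, and the timings will vary with the machine, the Python version, and the input size.

```python
import collections
import timeit

a = [1, 2, 3, 2, 1, 5, 6, 5, 5, 5]

def dups_counter(seq):
    """Approach 1: tally every element, keep those counted more than once."""
    return {item for item, count in collections.Counter(seq).items() if count > 1}

def dups_seen_set(seq):
    """Approach 2: remember what has been seen, record anything seen again."""
    seen = set()
    duplicated = set()
    for x in seq:
        if x in seen:
            duplicated.add(x)
        else:
            seen.add(x)
    return duplicated

# Both functions make a single pass over the list; only the constant
# factors differ, so measure on data that resembles your real input.
for fn in (dups_counter, dups_seen_set):
    elapsed = timeit.timeit(lambda: fn(a), number=10_000)
    print(f"{fn.__name__}: {elapsed:.4f} s for 10,000 runs")
```

On a ten-element list the gap is likely to be small either way, so readability may be the better deciding factor.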
My own version:

a = [1, 2, 3, 2, 1, 5, 6, 5, 5, 5]
b = set(a)
for each_b in b:
    count = 0
    for each_a in a:
        if each_b == each_a:
            count += 1
    print(each_b, ":", count)

Output:
1 : 2
2 : 2
3 : 1
5 : 4
6 : 1
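The nested loops above compare every distinct value against every element of the list. As a hedged alternative, the same tally can be built in a single pass with a plain dict; the name counts is introduced here only for illustration.

```python
a = [1, 2, 3, 2, 1, 5, 6, 5, 5, 5]

# One pass: look up the running count for each item and add one.
counts = {}
for item in a:
    counts[item] = counts.get(item, 0) + 1

# dicts keep insertion order (Python 3.7+), so values print in the
# order they first appear in the list.
for value, count in counts.items():
    print(value, ":", count)
```

Filtering this dict for counts greater than 1 gives the duplicated values 1, 2 and 5.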

Another way to collect the duplicates:

duplicated = set()
for i in range(len(a)):
    if a[i] in a[i+1:]:
        duplicated.add(a[i])
print(duplicated)
# {1, 2, 5}
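Two things are worth noting about the slice test: a[i+1:] copies and rescans the rest of the list for every element, and a set does not promise any particular order for its result. If the order in which duplicates show up matters, one possible sketch is below; duplicates_in_order is a hypothetical helper name, not something from the original post.

```python
def duplicates_in_order(seq):
    """Return each duplicated value once, in the order its first repeat appears."""
    seen = set()      # values encountered at least once
    reported = set()  # duplicated values already added to the result
    result = []
    for x in seq:
        if x in seen and x not in reported:
            result.append(x)
            reported.add(x)
        seen.add(x)
    return result

a = [1, 2, 3, 2, 1, 5, 6, 5, 5, 5]
print(duplicates_in_order(a))
# [2, 1, 5]
```

Here 2 comes first because its second occurrence (index 3) appears before the second occurrence of 1 (index 4).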

The following function also lists the indices of the duplicated elements:

from collections import defaultdict

a = [1, 2, 3, 2, 1, 5, 6, 5, 5, 5]
source = a

def list_duplicates(seq):
    # Map each value to the list of indices where it occurs.
    tally = defaultdict(list)
    for i, item in enumerate(seq):
        tally[item].append(i)
    # Yield only the values that occur more than once.
    return ((key, locs) for key, locs in tally.items()
            if len(locs) > 1)

for dup in sorted(list_duplicates(source)):
    print(dup)

Output:
(1, [0, 4])
(2, [1, 3])
(5, [5, 7, 8, 9])
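Since list_duplicates returns a generator of (value, indices) pairs, the result can be collected into a dict and, as one possible use, the recorded indices can drive the actual removal of the repeats. This is a sketch under that assumption; the names positions, extra and deduped are introduced here for illustration only.

```python
from collections import defaultdict

def list_duplicates(seq):
    # Same function as above, repeated so this snippet runs on its own.
    tally = defaultdict(list)
    for i, item in enumerate(seq):
        tally[item].append(i)
    return ((key, locs) for key, locs in tally.items() if len(locs) > 1)

a = [1, 2, 3, 2, 1, 5, 6, 5, 5, 5]

# Collect the generator into a dict: value -> indices where it occurs.
positions = dict(list_duplicates(a))
print(positions)
# {1: [0, 4], 2: [1, 3], 5: [5, 7, 8, 9]}

# Every index after the first occurrence of a duplicated value is a repeat.
extra = {i for locs in positions.values() for i in locs[1:]}

# Keep only the elements whose index is not a repeat.
deduped = [x for i, x in enumerate(a) if i not in extra]
print(deduped)
# [1, 2, 3, 5, 6]
```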