pandas DataFrame: creating a new column from the values of other columns
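This mainly relies on the DataFrame.apply function.
With axis=0 (the default), the function is called once per column and receives that column as a Series;
with axis=1, the function is called once per row and receives that row as a Series, which is what is needed to compute a new column from other columns.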
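The difference between the two axis settings can be sketched on a small hypothetical frame. The names `scores`, `per_column`, and `per_row` are illustrative assumptions and are not part of the example that follows.

```python
import pandas as pd

# Hypothetical two-row frame used only for this illustration
scores = pd.DataFrame({"math": [99, 65], "English": [85, 74]})

# axis=0 (default): the function receives each column as a Series,
# so the result is indexed by column name
per_column = scores.apply(lambda col: col.max(), axis=0)

# axis=1: the function receives each row as a Series,
# so the result is indexed by row label
per_row = scores.apply(lambda row: row["math"] + row["English"], axis=1)

print(per_column)
print(per_row)
```

Because the axis=1 result has one value per row, it can be assigned directly as a new column.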
The example below flags students whose math and English scores are both above 75: the new column test is set to 1 for them and 0 otherwise.
import pandas as pd

data = {
    'name': ["one", "two", "three", "four", "five", "six", "seven"],
    'math': [99, 65, 78, 43, 88, 75, 36],
    'English': [85, 74, 92, 76, 86, 36, 72]
}
frame = pd.DataFrame(data, columns=['name', 'math', 'English'])


# Flag students whose math and English scores are both above 75
def function(a, b):
    if a > 75 and b > 75:
        return 1
    else:
        return 0


print(frame)
# Both forms of column access work
# frame['test'] = frame.apply(lambda x: function(x.math, x.English), axis=1)
frame['test'] = frame.apply(lambda x: function(x["math"], x["English"]), axis=1)
print(frame)
The output is as follows:
    name  math  English
0    one    99       85
1    two    65       74
2  three    78       92
3   four    43       76
4   five    88       86
5    six    75       36
6  seven    36       72
    name  math  English  test
0    one    99       85     1
1    two    65       74     0
2  three    78       92     1
3   four    43       76     0
4   five    88       86     1
5    six    75       36     0
6  seven    36       72     0
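For a condition this simple, the same column can probably also be built without apply by combining boolean masks. The sketch below is an alternative, not the approach used above; the column name `test_vec` is a hypothetical name chosen only to avoid overwriting test.

```python
import pandas as pd

data = {
    'name': ["one", "two", "three", "four", "five", "six", "seven"],
    'math': [99, 65, 78, 43, 88, 75, 36],
    'English': [85, 74, 92, 76, 86, 36, 72]
}
frame = pd.DataFrame(data, columns=['name', 'math', 'English'])

# Each comparison yields a boolean Series; & combines them row by row
mask = (frame["math"] > 75) & (frame["English"] > 75)

# Convert True/False to 1/0 to match the apply-based result
frame["test_vec"] = mask.astype(int)
print(frame)
```

On large frames this form usually avoids the per-row Python function call that apply with axis=1 makes, while apply remains the more flexible choice when the logic does not reduce to column-wise operations.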
Series also has an apply function, which calls the function on each value of a single column. For example:
import pandas as pd

data = {
    'name': ["one", "two", "three", "four", "five", "six", "seven"],
    'math': [99, 65, 78, 43, 88, 75, 36],
    'English': [85, 74, 92, 76, 86, 36, 72]
}
frame = pd.DataFrame(data, columns=['name', 'math', 'English'])
print(frame)
# Check whether the math score is a pass (60 or above)
frame['test'] = frame.math.apply(lambda x: 1 if x >= 60 else 0)
print(frame)
The output is as follows:
    name  math  English
0    one    99       85
1    two    65       74
2  three    78       92
3   four    43       76
4   five    88       86
5    six    75       36
6  seven    36       72
    name  math  English  test
0    one    99       85     1
1    two    65       74     1
2  three    78       92     1
3   four    43       76     0
4   five    88       86     1
5    six    75       36     1
6  seven    36       72     0
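Series.apply also forwards extra positional arguments to the function through its args parameter, which avoids hard-coding the pass mark inside a lambda. A minimal sketch, assuming a hypothetical helper `is_pass` and hypothetical variable names `math_scores` and `passed`:

```python
import pandas as pd

# Same math scores as in the example above
math_scores = pd.Series([99, 65, 78, 43, 88, 75, 36], name="math")


def is_pass(score, threshold):
    """Hypothetical helper: return 1 if score reaches threshold, else 0."""
    return 1 if score >= threshold else 0


# Each value is passed as the first argument; args supplies the rest
passed = math_scores.apply(is_pass, args=(60,))
print(passed.tolist())
```

Changing the pass mark then only requires a different value in args rather than a new function.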