pandas if conditions

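The Python pandas module is a powerful library for processing DataFrame data. This post walks through several common, concrete scenarios to show how it is used. Everything shown here can also be done in Excel, but from the point of view of someone who knows Python, I find this approach more efficient and more fun than Excel.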

1. Numeric conditions

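The task here is fairly simple: generate a new column from an existing one. For example, given a column holding the numbers 1 to 10, values greater than 4 are marked False and values less than or equal to 4 are marked True. The general syntax is:
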
df.loc[df['column name'] condition, 'new column name'] = 'value if condition is met'
Complete example code:

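from pandas import DataFrame
# Build a one-column DataFrame holding the numbers 1 to 10.
numbers = {'set_of_numbers': [1,2,3,4,5,6,7,8,9,10]}
df = DataFrame(numbers,columns=['set_of_numbers'])
# Rows with a value <= 4 get the string 'True'; all other rows get the string 'False'.
df.loc[df['set_of_numbers'] <= 4, 'equal_or_lower_than_4?'] = 'True'
df.loc[df['set_of_numbers'] > 4, 'equal_or_lower_than_4?'] = 'False'
print(df)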
The output is as follows:



<img src="https://www.361way.com/wp-content/uploads/2020/02/pandas-if-number.png" width="350" height="174" title="pandas-if-number" alt="pandas-if-number" />
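
Note that the example above stores the text 'True' and 'False', not real booleans. If actual boolean values are preferred, one possible variation (a sketch, not part of the original example) is to assign the comparison result directly. The column name is reused from above:

```python
import pandas as pd

df = pd.DataFrame({'set_of_numbers': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})

# The comparison itself returns a boolean Series, so no second statement is needed.
df['equal_or_lower_than_4?'] = df['set_of_numbers'] <= 4

print(df)
print(df['equal_or_lower_than_4?'].dtype)
```

A boolean column can then be used directly as a filter, for example df[df['equal_or_lower_than_4?']].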

2. Numeric conditions with lambda

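The example above is easy to follow: it assigns different values by testing the column several times. The downside is that every condition needs its own statement. Is there a way to do it in one line? Of course: use a lambda (anonymous function), as follows:
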
df['new column name'] = df['column name'].apply(lambda x: 'value if condition is met' if x condition else 'value if condition is not met')
Complete test code:

from pandas import DataFrame
numbers = {'set_of_numbers': [1,2,3,4,5,6,7,8,9,10]}
df = DataFrame(numbers,columns=['set_of_numbers'])
# apply() calls the lambda once per value and collects the results into the new column.
df['equal_or_lower_than_4?'] = df['set_of_numbers'].apply(lambda x: 'True' if x <= 4 else 'False')
print(df)
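The output of this code is the same as in the first approach. As for efficiency, no difference is visible on such a small dataset. You can generate a large list with a few hundred thousand items and try both approaches to compare.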
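
To try the comparison suggested above, a rough timing sketch could look like the following. The row count N_ROWS and the helper names with_loc, with_apply and timed are all made up for illustration. Timings vary by machine and pandas version, so no numbers are claimed here.

```python
import time

import pandas as pd

# Hypothetical size; the text only suggests "a few hundred thousand" rows.
N_ROWS = 300_000
COLUMN = 'equal_or_lower_than_4?'


def with_loc(df):
    """First approach: two .loc assignments on boolean masks."""
    out = df.copy()
    out.loc[out['set_of_numbers'] <= 4, COLUMN] = 'True'
    out.loc[out['set_of_numbers'] > 4, COLUMN] = 'False'
    return out


def with_apply(df):
    """Second approach: one apply() call with a lambda."""
    out = df.copy()
    out[COLUMN] = out['set_of_numbers'].apply(
        lambda x: 'True' if x <= 4 else 'False')
    return out


def timed(func, df):
    """Run func(df) once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(df)
    return result, time.perf_counter() - start


# Cycle through 1..10 so both branches of the condition are exercised.
big = pd.DataFrame({'set_of_numbers': [i % 10 + 1 for i in range(N_ROWS)]})

loc_result, loc_seconds = timed(with_loc, big)
apply_result, apply_seconds = timed(with_apply, big)

print(f'.loc:   {loc_seconds:.4f} s')
print(f'.apply: {apply_seconds:.4f} s')

# Sanity check: both approaches should label every row identically.
same = bool((loc_result[COLUMN] == apply_result[COLUMN]).all())
print('identical labels:', same)
```

A single run is noisy; for a fairer comparison, repeat each call several times, for instance with the standard timeit module.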

3. String conditions

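The requirement here is also simple: given a list of names, find Bill. Rows where the name is Bill show Match, and rows with any other name show Mismatch. This is a variation on the first example. The test code is as follows:
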
from pandas import DataFrame
names = {'First_name': ['Jon','Bill','Maria','Emma']}
df = DataFrame(names,columns=['First_name'])
# The string comparison is exact and case-sensitive.
df.loc[df['First_name'] == 'Bill', 'name_match'] = 'Match'
df.loc[df['First_name'] != 'Bill', 'name_match'] = 'Mismatch'
print(df)
# Output after running the code:
  First_name name_match
0        Jon   Mismatch
1       Bill      Match
2      Maria   Mismatch
3       Emma   Mismatch
The same problem can also be solved with a lambda:

from pandas import DataFrame
names = {'First_name': ['Jon','Bill','Maria','Emma']}
df = DataFrame(names,columns=['First_name'])
# One pass over the column: 'Match' for Bill, 'Mismatch' for everyone else.
df['name_match'] = df['First_name'].apply(lambda x: 'Match' if x == 'Bill' else 'Mismatch')
print(df)

4. Matching multiple strings

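Continuing with the example from section 3, suppose I want to find both Bill and Emma. This calls for combining conditions with or and and, which are written as | and & when filtering a pandas column, with each comparison wrapped in parentheses. The code is as follows:
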
from pandas import DataFrame
names = {'First_name': ['Jon','Bill','Maria','Emma']}
df = DataFrame(names,columns=['First_name'])
# | means "or" and & means "and"; the parentheses around each comparison are required.
df.loc[(df['First_name'] == 'Bill') | (df['First_name'] == 'Emma'), 'name_match'] = 'Match'
df.loc[(df['First_name'] != 'Bill') & (df['First_name'] != 'Emma'), 'name_match'] = 'Mismatch'
print(df)
# The output is as follows:
  First_name name_match
0        Jon   Mismatch
1       Bill      Match
2      Maria   Mismatch
3       Emma      Match
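
As an aside, chaining | conditions becomes unwieldy as the list of names grows. One alternative, swapped in here rather than taken from the original example, is Series.isin combined with ~ to negate the mask. The variable names wanted and is_match are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'First_name': ['Jon', 'Bill', 'Maria', 'Emma']})

# Names to flag; extend this list instead of adding more | conditions.
wanted = ['Bill', 'Emma']

# isin() returns a boolean Series: True where the name is in the list.
is_match = df['First_name'].isin(wanted)

df.loc[is_match, 'name_match'] = 'Match'
df.loc[~is_match, 'name_match'] = 'Mismatch'  # ~ inverts the boolean mask

print(df)
```

This labels the rows the same way as the | and & version above.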

5. Replacing matched values

For example, suppose I want to change every 5 in a column to 555 and every 0 to 999. The code is as follows:

from pandas import DataFrame
numbers = {'set_of_numbers': [1,2,3,4,5,6,7,8,9,10,0,0]}
df = DataFrame(numbers,columns=['set_of_numbers'])
print(df)
# Using the same column as both the condition and the target overwrites the matching values in place.
df.loc[df['set_of_numbers'] == 0, 'set_of_numbers'] = 999
df.loc[df['set_of_numbers'] == 5, 'set_of_numbers'] = 555
print(df)

The result is as follows:

<img src="https://www.361way.com/wp-content/uploads/2020/02/pandas-number-change.png" width="402" height="273" title="pandas-number-change" alt="pandas-number-change" />



Similarly, we can replace NaN values in pandas with 0. The code is as follows:

from pandas import DataFrame
import numpy as np
numbers = {'set_of_numbers': [1,2,3,4,5,6,7,8,9,10,np.nan,np.nan]}
df = DataFrame(numbers,columns=['set_of_numbers'])
print(df)
# isnull() is True for NaN entries, so only those rows are set to 0.
df.loc[df['set_of_numbers'].isnull(), 'set_of_numbers'] = 0
print(df)
<img src="https://www.361way.com/wp-content/uploads/2020/02/pandas-nan-zero.png" width="418" height="268" title="pandas-nan-zero" alt="pandas-nan-zero" />
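
For this particular case pandas also offers Series.fillna, which may read more directly than the .loc form. This is a minimal sketch with a shortened, made-up list of numbers, not the original example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'set_of_numbers': [1, 2, 3, np.nan, np.nan]})

# fillna(0) returns a new Series with every NaN replaced by 0.
# The column stays floating-point because it originally held NaN.
df['set_of_numbers'] = df['set_of_numbers'].fillna(0)

print(df)
```

The .loc form is still the more general tool, since it accepts any condition, not only missing values.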
