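When working with pandas, lambda functions make many data-processing operations short and convenient. In pandas, lambda is most often combined with df.assign and df.apply. df.assign is typically used to add or modify columns. apply calls a function on one-dimensional data: on a DataFrame it can work row by row or column by column, and on a Series it works element by element. Examples of lambda used on its own: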
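lambda: the inputs are the values passed to its parameter list (here x), and the output is the value of the expression computed from them.
For example: lambda x, y: x * y  # takes x and y and returns their product
lambda x: x[-2:]  # when x is a string, returns its last two characters
lambda x: func(x)  # takes x and returns the result of calling some function func on it
lambda x: '%.2f' % x  # formats a number as a string with two decimal places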
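The following minimal sketch runs the lambdas listed above outside pandas. Binding a lambda to a name is done here only for demonstration; PEP 8 recommends `def` for named functions. The names `multiply`, `last_two`, `apply_func`, `two_decimals` and `multiply_def` are illustrative, and `len` is an arbitrary stand-in for the unspecified `func`. Note that `'%.2f' % x` returns a string, so it is meant for display rather than numeric rounding.

```python
# The lambdas from the list above, bound to names only so they can be called here
# (in pandas code they are normally passed inline to apply/assign).
multiply = lambda x, y: x * y          # product of x and y
last_two = lambda x: x[-2:]            # last two characters of a string
apply_func = lambda x: len(x)          # "call some function on x"; len is just a stand-in
two_decimals = lambda x: '%.2f' % x    # number formatted as a string with two decimals


# An ordinary function equivalent to `multiply`, for comparison
def multiply_def(x, y):
    return x * y


print(multiply(3, 4))            # 12
print(last_two('2023-05-17'))    # 17
print(apply_func('pandas'))      # 6
print(two_decimals(3.14159))     # 3.14
print(multiply(3, 4) == multiply_def(3, 4))  # True
```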
Combined with pandas, keeping only the last two characters of each value in a column looks like this:
df['time'] = df['time'].apply(lambda x: x[-2:])
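Here is a self-contained sketch of the same operation. It assumes a hypothetical `time` column of date strings, and the sample values are made up. It also shows the vectorized `.str[-2:]` accessor, which gives the same result here. Unlike `x[-2:]`, the accessor returns NaN for a missing value instead of raising a `TypeError`.

```python
import pandas as pd

# Hypothetical 'time' column stored as strings (sample data for illustration)
df = pd.DataFrame({'time': ['2023-05-17', '2023-05-18', '2023-05-19']})

# Element-wise lambda via Series.apply: keep the last two characters
df['day'] = df['time'].apply(lambda x: x[-2:])

# Equivalent vectorized string accessor
df['day_str'] = df['time'].str[-2:]

print(df)
print(df['day'].tolist() == df['day_str'].tolist())  # True
```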
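1. Adding a column to a DataFrame with lambda
df.assign creates new columns or modifies existing ones. It returns a new DataFrame. When the value given for a column is a callable, such as a lambda, assign calls it with the DataFrame itself.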
# importing pandas library
import pandas as pd
# creating and initializing a list
values= [['Rohan',455],['Elvish',250],['Deepak',495],
['Soni',400],['Radhika',350],['Vansh',450]]
# creating a pandas dataframe
df = pd.DataFrame(values,columns=['Name','Total_Marks'])
# Applying lambda function to find
# percentage of 'Total_Marks' column
# using df.assign()
df = df.assign(Percentage = lambda x: (x['Total_Marks'] /500 * 100))
# displaying the data frame
df
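This short sketch shows how `df.assign` treats a lambda, reusing a few of the names and marks from the example above. The lambda is called with the DataFrame being assigned to. `assign` then returns a new DataFrame rather than changing `df` in place, which is why the example above reassigns the result to `df`.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Rohan', 'Elvish', 'Deepak'],
                   'Total_Marks': [455, 250, 495]})

# assign does not modify df in place: it returns a new DataFrame.
# The lambda is called with that DataFrame, so x['Total_Marks'] is a whole column.
out = df.assign(Percentage=lambda x: x['Total_Marks'] / 500 * 100)

print('Percentage' in df.columns)   # False: df itself is unchanged
print('Percentage' in out.columns)  # True
print(out)
```

Passing a callable instead of `df['Total_Marks'] / 500 * 100` makes little difference here. It matters in method chains, where the intermediate DataFrame has no variable name of its own.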
2. Operating on multiple columns with lambda
# importing pandas library
import pandas as pd
# creating and initializing a nested list
values_list = [[15, 2.5, 100], [20, 4.5, 50], [25, 5.2, 80],
[45, 5.8, 48], [40, 6.3, 70], [41, 6.4, 90],
[51, 2.3, 111]]
# creating a pandas dataframe
df = pd.DataFrame(values_list, columns=['Field_1', 'Field_2', 'Field_3'])
# Applying lambda function to find
# the product of 3 columns using
# df.assign()
df = df.assign(Product=lambda x: (x['Field_1'] * x['Field_2'] * x['Field_3']))
# printing dataframe
df
The last column, Product, is the product of the first three columns.
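For a plain row-wise product, pandas also provides `DataFrame.prod(axis=1)`. This sketch uses the first three rows of the data above and checks that `prod` matches the `assign` + lambda version. Which one to use is mostly a matter of readability.

```python
import pandas as pd

values_list = [[15, 2.5, 100], [20, 4.5, 50], [25, 5.2, 80]]
df = pd.DataFrame(values_list, columns=['Field_1', 'Field_2', 'Field_3'])

# Product via assign + lambda, as in the example above
with_lambda = df.assign(Product=lambda x: x['Field_1'] * x['Field_2'] * x['Field_3'])

# Same product via DataFrame.prod along each row (axis=1)
with_prod = df.assign(Product=df[['Field_1', 'Field_2', 'Field_3']].prod(axis=1))

print(with_lambda)
# Compare with a small tolerance, since float multiplication order may differ
print(((with_lambda['Product'] - with_prod['Product']).abs() < 1e-9).all())  # True
```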
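3. Applying a function to a single row with apply
Here only the row that meets a condition (index label 'd') is squared, using lambda together with apply. axis=1 applies the function to each row. The default, axis=0, applies it to each column.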
# importing pandas and numpy libraries
import pandas as pd
import numpy as np
# creating and initializing a nested list
values_list = [[15, 2.5, 100], [20, 4.5, 50], [25, 5.2, 80],
[45, 5.8, 48], [40, 6.3, 70], [41, 6.4, 90],
[51, 2.3, 111]]
# creating a pandas dataframe
df = pd.DataFrame(values_list, columns=['Field_1', 'Field_2', 'Field_3'],
index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
# Apply function numpy.square() to square
# the values of one row only i.e. row
# with index name 'd'
df = df.apply(lambda x: np.square(x) if x.name == 'd' else x, axis=1)
# printing dataframe
df
Output:
Field_1 Field_2 Field_3
----------------------------------------
a 15.0 2.50 100.0
b 20.0 4.50 50.0
c 25.0 5.20 80.0
d 2025.0 33.64 2304.0
e 40.0 6.30 70.0
f 41.0 6.40 90.0
g 51.0 2.30 111.0
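The `x.name` test works because of what `apply` passes to the lambda. This small sketch prints `x.name` under both axis settings, using a two-row DataFrame made up for illustration. It also shows that with `axis=1`, each row arrives as a single Series with one dtype, so integer columns are upcast to float64. That explains why the output above shows 15.0 rather than 15.

```python
import pandas as pd

df = pd.DataFrame([[15, 2.5], [20, 4.5]],
                  columns=['Field_1', 'Field_2'], index=['a', 'b'])

# axis=0 (default): the lambda receives each COLUMN, so x.name is a column label
col_names = df.apply(lambda x: x.name, axis=0)

# axis=1: the lambda receives each ROW, so x.name is a row (index) label
row_names = df.apply(lambda x: x.name, axis=1)

# Each row is one Series; int64 and float64 values are combined into float64
row_dtypes = df.apply(lambda x: str(x.dtype), axis=1)

print(col_names.tolist())   # ['Field_1', 'Field_2']
print(row_names.tolist())   # ['a', 'b']
print(row_dtypes.tolist())  # ['float64', 'float64']
```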
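4. Operating on multiple rows with lambda
apply calls the lambda once per row. Testing the row label against a list therefore lets you operate on several rows:
# importing pandas and numpy libraries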
import pandas as pd
import numpy as np
# creating and initializing a nested list
values_list = [[1.5, 2.5, 10.0], [2.0, 4.5, 5.0], [2.5, 5.2, 8.0],
[4.5, 5.8, 4.8], [4.0, 6.3, 70], [4.1, 6.4, 9.0],
[5.1, 2.3, 11.1]]
# creating a pandas dataframe
df = pd.DataFrame(values_list, columns=['Field_1', 'Field_2', 'Field_3'],
index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
# Apply function numpy.square() to square
# the values of 2 rows only i.e. with row
# index name 'b' and 'f' only
df = df.apply(lambda x: np.square(x) if x.name in ['b', 'f'] else x, axis=1)
# printing dataframe
df
Output:
Field_1 Field_2 Field_3
-----------------------------------------
a 1.50 2.50 10.0
b 4.00 20.25 25.0
c 2.50 5.20 8.0
d 4.50 5.80 4.8
e 4.00 6.30 70.0
f 16.81 40.96 81.0
g 5.10 2.30 11.1
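When the rows to change are known by label, you can get the same result without `apply`. Select the rows with `.loc` and square them in one vectorized step. The following sketch uses the data above and checks that both approaches agree.

```python
import numpy as np
import pandas as pd

values_list = [[1.5, 2.5, 10.0], [2.0, 4.5, 5.0], [2.5, 5.2, 8.0],
               [4.5, 5.8, 4.8], [4.0, 6.3, 70], [4.1, 6.4, 9.0],
               [5.1, 2.3, 11.1]]
idx = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
cols = ['Field_1', 'Field_2', 'Field_3']

# Row-wise apply, as in the example above
via_apply = pd.DataFrame(values_list, columns=cols, index=idx)
via_apply = via_apply.apply(lambda x: np.square(x) if x.name in ['b', 'f'] else x, axis=1)

# Alternative without apply: select rows b and f by label and square them at once
via_loc = pd.DataFrame(values_list, columns=cols, index=idx)
via_loc.loc[['b', 'f']] = np.square(via_loc.loc[['b', 'f']])

print(np.allclose(via_apply.to_numpy(), via_loc.to_numpy()))  # True
```

`apply` with `axis=1` runs Python code once per row, so for large DataFrames the `.loc` version is typically faster. The `apply` form is more flexible when the condition depends on the row's values rather than its label.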
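5. Operating on multiple rows and columns at once
Here df.apply and df.assign are used together. apply squares the selected rows, then assign adds a column computed from several columns:
# importing pandas and numpy libraries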
import pandas as pd
import numpy as np
# creating and initializing a nested list
values_list = [[1.5, 2.5, 10.0], [2.0, 4.5, 5.0], [2.5, 5.2, 8.0],
[4.5, 5.8, 4.8], [4.0, 6.3, 70], [4.1, 6.4, 9.0],
[5.1, 2.3, 11.1]]
# creating a pandas dataframe
df = pd.DataFrame(values_list, columns=['Field_1', 'Field_2', 'Field_3'],
index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
# Apply function numpy.square() to square
# the values of 2 rows only i.e. with row
# index name 'b' and 'f' only
df = df.apply(lambda x: np.square(x) if x.name in ['b', 'f'] else x, axis=1)
# Applying lambda function to find product of 3 columns
# i.e 'Field_1', 'Field_2' and 'Field_3'
df = df.assign(Product=lambda x: (x['Field_1'] * x['Field_2'] * x['Field_3']))
# printing dataframe
df
Output:
Field_1 Field_2 Field_3 Product
-------------------------------------------
a 1.50 2.50 10.0 37.5000
b 4.00 20.25 25.0 2025.0000
c 2.50 5.20 8.0 104.0000
d 4.50 5.80 4.8 125.2800
e 4.00 6.30 70.0 1764.0000
f 16.81 40.96 81.0 55771.5456
g 5.10 2.30 11.1 130.2030
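Because `assign` accepts callables, the `apply` and `assign` steps above can also be written as one method chain. In the chain, the lambda receives the intermediate, already-squared DataFrame. This sketch runs on two rows of the data. It also relies on a pandas behavior: in pandas 0.23 and later, keyword arguments to `assign` are evaluated in order, so a later lambda can use a column created by an earlier one. `Product_x2` is a hypothetical extra column added to show this.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.5, 2.5, 10.0], [2.0, 4.5, 5.0]],
                  columns=['Field_1', 'Field_2', 'Field_3'], index=['a', 'b'])

# One method chain: square row 'b', then add columns.
# Inside assign, x is the intermediate (already squared) DataFrame, not the original df.
out = (
    df.apply(lambda x: np.square(x) if x.name == 'b' else x, axis=1)
      .assign(Product=lambda x: x['Field_1'] * x['Field_2'] * x['Field_3'],
              # evaluated after Product, so this lambda can read the new column
              Product_x2=lambda x: x['Product'] * 2)
)
print(out)
```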
6. A worked apply example
Finally, here is an example that computes a human comfort index from wind speed, air temperature and relative humidity:
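The data file 57582.csv is not available here, so the following sketch tries the same formula on two made-up rows. These values are hypothetical, not the article's data. The column names are the article's: 平均气温 is mean air temperature, 平均相对湿度 is mean relative humidity and 2M风速 is wind speed at 2 m. The units (°C, %, m/s) are assumptions. The sketch computes the index row by row with `apply` and, for comparison, in vectorized form with NumPy on whole columns. The `舒适指数_vec` column name is an invented label.

```python
import math

import numpy as np
import pandas as pd


def get_CHB(T, RH, S):
    # Same formula as in the article; assumed units: T in °C, RH in %, S in m/s
    return round(1.8*T - 0.55*(1.8*T - 26)*(1 - RH/100) - 3.2*math.sqrt(S) + 32, 1)


# Hypothetical sample rows, made up for illustration
data = pd.DataFrame({'平均气温': [25.0, 10.0],
                     '平均相对湿度': [60.0, 80.0],
                     '2M风速': [4.0, 1.0]})

# Row by row: axis=1 passes each row to the lambda as a Series
data['舒适指数'] = data.apply(
    lambda x: get_CHB(x['平均气温'], x['平均相对湿度'], x['2M风速']), axis=1)

# Vectorized: the same formula applied to whole columns at once
T, RH, S = data['平均气温'], data['平均相对湿度'], data['2M风速']
data['舒适指数_vec'] = (1.8*T - 0.55*(1.8*T - 26)*(1 - RH/100)
                        - 3.2*np.sqrt(S) + 32).round(1)

print(data)
```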
import pandas as pd
import numpy as np
import math
path='D:\\data\\57582.csv'  # file path
data=pd.read_csv(path,index_col=0,encoding='gbk')  # decode with gbk because the file contains Chinese text
# comfort-index formula; the result is rounded to 1 decimal place
def get_CHB(T,RH,S):
    return round(1.8*T-0.55*(1.8*T-26)*(1-RH/100)-3.2*math.sqrt(S)+32,1)
# add a comfort-index (CHB) column and fill it with the computed values
data['舒适指数']=data.apply(lambda x:get_CHB(x['平均气温'],x['平均相对湿度'],x['2M风速']),axis=1)
# print the result
print(data)
# save the result
data.to_csv('D:\\CHB.csv',encoding='gbk')
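The code combines apply with lambda. With axis=1, the x passed to the lambda is one row of data at a time (a Series), not the whole DataFrame. From each row, x['平均气温'] (mean air temperature), x['平均相对湿度'] (mean relative humidity) and x['2M风速'] (wind speed at 2 m) are passed to the custom function get_CHB as T, RH and S. axis=1 is required so that the function runs once per row, combining that row's column values. The result is: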