Note: This article was translated by VeryToolz from "Pyspark - Filter dataframe based on multiple conditions". Unless otherwise stated, the code and images are copyright of the original author kumar_satyam. Distribution and use of this translation must follow the "Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)" license.
You can use `where` on your DataFrame:

```python
df.where("(col1 = 'FALSE' AND col2 = 'Approved') OR col1 <> 'FALSE'")
```
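The same AND/OR logic can be sketched in pandas with boolean masks; this is a minimal illustration, and the `col1`/`col2` names and sample values are made up to mirror the condition above:

```python
import pandas as pd

# Hypothetical sample data mirroring the col1/col2 condition above.
df = pd.DataFrame({
    'col1': ['FALSE', 'FALSE', 'TRUE'],
    'col2': ['Approved', 'Rejected', 'Rejected'],
})

# (col1 == 'FALSE' AND col2 == 'Approved') OR col1 != 'FALSE'
mask = ((df['col1'] == 'FALSE') & (df['col2'] == 'Approved')) | (df['col1'] != 'FALSE')
result = df[mask]
print(result)
```

Note that pandas requires `&`/`|` (with parentheses around each condition) rather than the SQL keywords `AND`/`OR`.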
```python
print(df2)

# Use a not-in filter with multiple columns.
list_values = ["Spark", "Pandas", 1000]
df2 = df[~df[['Courses', 'Discount']].isin(list_values).any(axis=1)]
print(df2)

# Filter on the Courses and Duration columns.
list_values = ["PySpark", '30days']
df2 = df[~df[['Courses', 'Duration']].isin(list_values).any(axis=1)]
print(df2)
```
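A self-contained run of the not-in pattern, using a small illustrative DataFrame (the column names follow the snippet above; the data itself is made up):

```python
import pandas as pd

# Illustrative data with the Courses/Duration/Discount columns used above.
df = pd.DataFrame({
    'Courses': ['Spark', 'PySpark', 'Hadoop', 'Python'],
    'Duration': ['30days', '50days', '30days', None],
    'Discount': [1000, 2300, 1000, 1200],
})

# Drop any row where Courses or Discount matches one of the listed values.
list_values = ["Spark", "Pandas", 1000]
df2 = df[~df[['Courses', 'Discount']].isin(list_values).any(axis=1)]
print(df2)
```

`isin` yields a boolean DataFrame, `any(axis=1)` collapses it to one flag per row, and `~` inverts it to keep only the non-matching rows.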
```python
# 1  PySpark  25000  50days  2300
# 2   Hadoop  23000  30days  1000
# 3   Python  24000    None  1200
# 4   Pandas  26000     NaN  2500
```

## Pandas Filter by Multiple Columns

In pandas, as in any table-like structure, we often need to filter rows based on multiple conditions spanning multiple columns.
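For instance, two conditions on different columns can be combined with `&` (AND) or `|` (OR). This is a sketch on made-up data resembling the frame printed above; the `Fee` column name is an assumption:

```python
import pandas as pd

# Illustrative data; 'Fee' is an assumed column name.
df = pd.DataFrame({
    'Courses': ['PySpark', 'Hadoop', 'Python', 'Pandas'],
    'Fee': [25000, 23000, 24000, 26000],
    'Discount': [2300, 1000, 1200, 2500],
})

# Keep rows where Fee >= 24000 AND Discount > 2000.
df2 = df[(df['Fee'] >= 24000) & (df['Discount'] > 2000)]
print(df2)
```

Each condition must be parenthesized, since `&` and `|` bind more tightly than comparison operators in Python.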
```python
# Filter a pandas DataFrame by matching against multiple columns.
df2 = df.apply(lambda row: row.str.contains('Spark|Python', na=False), axis=1)

# Join multiple search terms into one pattern.
terms = ['Spark', 'PySpark']
df2 = df[df['Courses'].str.contains('|'.join(terms))]
```
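A runnable check of the joined-terms pattern (illustrative data; `na=False` is added here so that missing values count as non-matches rather than raising or propagating `NaN`):

```python
import pandas as pd

# Made-up course names, including a missing value.
df = pd.DataFrame({'Courses': ['Spark', 'PySpark', 'Hadoop', None]})

# Match rows whose Courses contains any of the terms (regex OR).
terms = ['Spark', 'PySpark']
df2 = df[df['Courses'].str.contains('|'.join(terms), na=False)]
print(df2)
```

Because `str.contains` does substring (regex) matching, 'PySpark' also matches the 'Spark' term here; use `^...$` anchors or `isin` for exact matches.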