from pyspark.sql import functions as F
from pyspark.sql.types import StringType

# Return the first string of length 2 in the array, or None if there is none.
def getTwoDigits(arr):
    for x in arr:
        if len(x) == 2:
            return x
    return None

extractTwoDigits_udf = F.udf(getTwoDigits, StringType())
df = df.withColumn("twoDigits", extractTwoDigits_udf(F.col("array_with_strings")))
df.show(truncate=False)
# +---+---+
# |id |...
I would like to check whether the items in my list appear in the strings in my column, and to know which of them do. Let's say I have a PySpark DataFrame containing id and description with 25M rows, like this: And I have a list of strings like this: ...
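One possible way to approach this is sketched below. The sample data and the list are not shown above, so everything here rests on assumptions: the helper names `find_matches` and `add_matches_column` and the output column `matches` are hypothetical, and "is in the string" is taken to mean a case-sensitive substring match.

```python
from typing import List, Optional


def find_matches(description: Optional[str], keywords: List[str]) -> List[str]:
    """Return the keywords that occur as substrings of description."""
    # Assumption: a null description simply has no matches.
    if description is None:
        return []
    # Case-sensitive substring test, preserving the order of the keyword list.
    return [kw for kw in keywords if kw in description]


def add_matches_column(df, keywords, text_col="description", out_col="matches"):
    """Attach an array column listing which keywords each row contains.

    The column names are assumptions based on the question's wording.
    """
    # Imported here so find_matches can be exercised without a Spark install.
    from pyspark.sql import functions as F
    from pyspark.sql.types import ArrayType, StringType

    # Wrap the pure-Python helper in a UDF that returns an array of strings.
    matches_udf = F.udf(
        lambda text: find_matches(text, keywords),
        ArrayType(StringType()),
    )
    return df.withColumn(out_col, matches_udf(F.col(text_col)))
```

Calling `add_matches_column(df, my_list)` would then give each row an array of the list items found in its description; an empty array means none matched. With 25M rows the Python UDF overhead may be noticeable, so an expression built from native column functions such as `Column.contains` could be worth comparing against it.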