You can use PySpark, but you should check how it performs in your scenario. I have added a comment at each step to explain the logic.
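Before the Spark snippet, here is a minimal plain-Python sketch of the idea its comments describe: counting distinct values including null in a single pass. It is a stand-in for illustration, not the PySpark code itself. The function name `nunique_including_nulls` and the sample data are hypothetical. The reason for the workaround is that Spark's distinct count skips nulls, so null has to be accounted for separately and then added back as at most one extra distinct value.

```python
from typing import Hashable, Iterable, Optional


def nunique_including_nulls(values: Iterable[Optional[Hashable]]) -> int:
    """Count distinct values in one pass, treating None as a value of its own.

    Hypothetical helper for illustration only; it mirrors the two checks
    described in the snippet's comments rather than any Spark API.
    """
    distinct_non_null = set()  # check 1: distinct values, nulls excluded
    null_count = 0             # check 2: how many nulls were seen

    for value in values:
        if value is None:
            null_count += 1
        else:
            distinct_non_null.add(value)

    # Null adds at most one distinct value, however many times it occurs.
    return len(distinct_non_null) + (1 if null_count > 0 else 0)


sample = ["a", "b", None, "a", None]
print(nunique_including_nulls(sample))
```

In Spark both quantities would be computed as aggregates in the same pass over the column, which is what makes the approach a single scan rather than two.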
"""scol = self.spark.column# Here we check:# 1. the distinct count without nulls and count without nulls for non-null values# 2. count null values and see if null is a distinct value.## This workaround is in order to calculate the distinct count including nulls in# single pass. No...