Assuming we can join the two datasets on id, I don't think a UDF is needed. This can be done using an inner join, array, array_remove, and so on...
A cross join, also known as a Cartesian join, is a join operation that produces the Cartesian product of two DataFrames in PySpark. It pairs each row from the first DataFrame with every row from the second DataFrame, generating a DataFrame with a total number of rows equal to the product ...
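To make the row-count behavior concrete, here is a minimal sketch in plain Python rather than Spark. It models each DataFrame as a list of tuples; the sample rows and the `cross_join` helper are hypothetical names introduced only for illustration, not part of any library.

```python
from itertools import product

# Hypothetical sample rows standing in for two small DataFrames.
left_rows = [("a", 1), ("b", 2), ("c", 3)]
right_rows = [("x",), ("y",)]


def cross_join(left, right):
    """Pair every row of `left` with every row of `right` (Cartesian product)."""
    return [l + r for l, r in product(left, right)]


joined = cross_join(left_rows, right_rows)

for row in joined:
    print(row)

# The number of result rows is the product of the two input row counts.
print(len(joined) == len(left_rows) * len(right_rows))
```

In PySpark itself the same operation is available as `left_df.crossJoin(right_df)`. Because the output grows multiplicatively, it is usually worth checking both input sizes before running it on large data.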