Bellman expectation equation: $V_\pi(s) = \mathbb{E}_\pi[G_t \mid S_t = s] = \mathbb{E}_\pi[R_{t+1} + \gamma G_{t+1} \mid S_t = s] = \mathbb{E}_\pi[R_t \ldots$
The derivation relies on the law of total expectation, also called the law of iterated expectations.
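As a rough numerical illustration, the sketch below evaluates a fixed policy by repeatedly applying the standard recursive form of the Bellman expectation equation, $V_\pi(s) = \mathbb{E}_\pi[R_{t+1} \mid S_t = s] + \gamma \sum_{s'} P(s' \mid s)\, V_\pi(s')$, where the law of total expectation is what lets the expectation of $G_{t+1}$ be taken state by state over $S_{t+1}$. The two-state process, the transition matrix `P`, the expected rewards `R`, and the discount `GAMMA` are all made-up values chosen for illustration, not taken from the text.

```python
# Hypothetical 2-state Markov reward process induced by a fixed policy pi.
# All numbers below are illustrative assumptions.
GAMMA = 0.9
P = [[0.5, 0.5],
     [0.2, 0.8]]   # P[s][s2] = Pr(S_{t+1} = s2 | S_t = s) under pi
R = [1.0, 0.0]     # R[s] = E_pi[R_{t+1} | S_t = s]


def bellman_backup(V):
    """Apply the Bellman expectation operator once to the value list V."""
    n = len(V)
    return [R[s] + GAMMA * sum(P[s][s2] * V[s2] for s2 in range(n))
            for s in range(n)]


def evaluate_policy(tol=1e-12, max_iters=10_000):
    """Iterate the backup from V = 0 until successive values differ by < tol."""
    V = [0.0] * len(R)
    for _ in range(max_iters):
        V_new = bellman_backup(V)
        if max(abs(a - b) for a, b in zip(V, V_new)) < tol:
            return V_new
        V = V_new
    return V


V = evaluate_policy()
# At the fixed point, V should be (numerically) unchanged by one more backup.
residual = max(abs(v - b) for v, b in zip(V, bellman_backup(V)))
print(V, residual)
```

Because the backup is a contraction for `GAMMA < 1`, the iteration converges from any starting point; the final `residual` is a quick check that the returned values satisfy the equation up to the chosen tolerance.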