Data was loaded into pandas DataFrames and an initial exploration was conducted. The next phase assessed data quality, covering several aspects: detection of missing values, detection of duplicated values, detection of outliers, and finally inconsistencies between the datasets....
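The quality checks listed above can be sketched roughly as follows. This is a minimal illustration, not the original author's code: the function name `quality_report`, the example columns, and the choice of the 1.5 × IQR rule for outliers are all assumptions made for the example.

```python
import pandas as pd


def quality_report(df: pd.DataFrame, iqr_factor: float = 1.5) -> dict:
    """Summarise missing values, duplicated rows, and IQR-based outliers.

    Hypothetical helper; the IQR rule is one common outlier heuristic,
    assumed here because the source does not name a method.
    """
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # A value is flagged when it falls outside [q1 - k*iqr, q3 + k*iqr].
    outlier_mask = (numeric < q1 - iqr_factor * iqr) | (numeric > q3 + iqr_factor * iqr)
    return {
        "missing_per_column": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        "outliers_per_column": outlier_mask.sum().to_dict(),
    }


# Made-up data: one missing city, one repeated row, one extreme age.
example = pd.DataFrame({
    "age": [25, 30, 30, 28, 27, 200],
    "city": ["Oslo", "Rome", "Rome", None, "Lima", "Oslo"],
})
report = quality_report(example)
print(report)
```

Returning plain dictionaries keeps the report easy to log or compare across datasets; a real pipeline would likely tune the outlier rule per column.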
Parameters
---
raw_data_frame: pandas.DataFrame
    The data frame containing the data to check.
allow_nan: bool, optional (default=False)
    If True, allows NaN values in the data. Otherwise, an error is raised.
"""
if (raw_data_frame.dtypes == 'object').values.any():
    # scikit-multi...
Check the consistency of the same columns between two different tables by merging the tables on the provided keys. (This can be useful for comparing a training set with a test set, or sample tables from two different snapshot dates.) 'key': same as data_compare for key type ...
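The merge-based consistency check described above might look something like the sketch below. It is written with plain pandas and does not reproduce the library's own function or signature: `compare_columns`, its parameters, and the sample tables are hypothetical names chosen for illustration.

```python
import pandas as pd


def compare_columns(left: pd.DataFrame, right: pd.DataFrame, key, columns) -> dict:
    """Merge two tables on `key` and count disagreeing rows per column.

    Hypothetical helper. Assumes every name in `columns` exists in both
    tables, so the merge applies the suffixes to it.
    """
    merged = left.merge(right, on=key, how="inner", suffixes=("_left", "_right"))
    mismatches = {}
    for col in columns:
        a = merged[f"{col}_left"]
        b = merged[f"{col}_right"]
        # NaN != NaN evaluates to True, so rows missing on both sides
        # are excluded from the mismatch count.
        both_missing = a.isna() & b.isna()
        mismatches[col] = int(((a != b) & ~both_missing).sum())
    return mismatches


# Made-up snapshots of the same table taken on two dates.
snapshot_a = pd.DataFrame({"id": [1, 2, 3], "salary": [100.0, 200.0, None], "dept": ["a", "b", "c"]})
snapshot_b = pd.DataFrame({"id": [1, 2, 3], "salary": [100.0, 250.0, None], "dept": ["a", "b", "x"]})
result = compare_columns(snapshot_a, snapshot_b, key="id", columns=["salary", "dept"])
print(result)
```

An inner merge only compares keys present in both tables; checking for keys that appear in just one table would need an outer merge, which this sketch leaves out.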
Feature Engineering tab: Engineered features such as salary_Ratio1 appear as columns in the Excel sheet. A value of 1 means the feature was engineered in that particular experiment, and 0 means it was absent. Modelling tab: This tab tracks all the variables used in the code. Say variable precision was co...