Clustering-based undersampling with random over sampling examples and support vector machine for imbalanced classification of breast cancer diagnosis
Keywords: Breast cancer diagnosis; class-imbalance problem; sample selection
To overcome the two-class imbalanced classification problem existing in the diagnosis of breast cancer,...
Keywords: ensemble classifiers; healthcare-associated infections; ICU infections; imbalanced data; machine learning; oversampling; undersampling
1. Introduction
Healthcare-associated infections (HAI) are one of the major problems of health systems in many countries due to their direct impact on ...
The empirical results on the benchmark datasets showed that the recommended SC-FSM framework performed better than previous under-sampling techniques. Statistical tests were used for more detailed analysis. The statistical analysis results showed the superiority of the Koczy fuzzy similarity ...
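...sampling algorithm (random under-sampling algorithm, RUS algorithm) was applied as a preprocessing step to reduce the degree of imbalance in the dataset. Because this experiment is an imbalanced data classification experiment, the classification accuracy metric used for conventional algorithms cannot fully reflect classifier performance. To evaluate classifier performance effectively on imbalanced data classification problems, this paper uses ...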
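The random under-sampling (RUS) step mentioned in the snippet above can be sketched roughly as follows. This is a minimal illustration, not the cited paper's implementation: the function name `random_under_sample`, the `seed` parameter, and the choice to shrink every class to the size of the smallest one are all assumptions made for the example.

```python
import random
from collections import defaultdict

def random_under_sample(samples, labels, seed=0):
    """Randomly drop samples from the larger classes until every class
    has as many samples as the smallest class (an assumed target ratio)."""
    rng = random.Random(seed)

    # Group the samples by class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    # Every class is reduced to the size of the smallest class.
    target = min(len(xs) for xs in by_class.values())

    out_x, out_y = [], []
    for y, xs in by_class.items():
        kept = rng.sample(xs, target)  # sampling without replacement
        out_x.extend(kept)
        out_y.extend([y] * target)
    return out_x, out_y

# Toy data: 8 majority samples (label 0) and 2 minority samples (label 1).
samples = list(range(10))
labels = [0] * 8 + [1] * 2
balanced_x, balanced_y = random_under_sample(samples, labels)
```

After the call, both classes contain two samples. The minority samples are all kept, while the majority samples that survive depend on the seed.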
Clustering-based undersampling in class-imbalanced data - ScienceDirect
Class imbalance is often a problem in various real-world data sets, where one class (i.e. the minority class) contains a small number of data points and th...
Wei-Chao Lin, Chih-Fong, ... - Information Sciences
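One way to read "clustering-based undersampling" is sketched below, under stated assumptions: the majority class is clustered with k-means, the number of clusters is set to the number of minority samples, and each cluster center is replaced by its nearest real majority point. The snippet above does not confirm these details, and the helper names (`kmeans`, `cluster_undersample`) are hypothetical.

```python
import random

def _dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd-style k-means; returns the k cluster centers."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: _dist2(p, centers[i]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group.
        for i, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def cluster_undersample(majority, minority, seed=0):
    """Reduce the majority class to (at most) len(minority) representatives."""
    centers = kmeans(majority, len(minority), seed=seed)
    # Use the real majority point closest to each center as its representative.
    return [min(majority, key=lambda p: _dist2(p, c)) for c in centers]

# Toy data: two well-separated majority blobs and two minority points.
majority = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
minority = [(5, 5), (5, 6)]
reduced = cluster_undersample(majority, minority)
```

On this toy data the reduced majority set holds one representative per blob, so the training set becomes balanced at two points per class. Two centers can in principle map to the same real point, in which case the result contains a duplicate.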
namely Active Learning using a Clustering-based Sampling (ALCS), is proposed to simultaneously consider the representativeness and informativeness of samples using no prior label information. A density-based clustering approach is employed to explore the cluster structure from the data without requiring ex...
In addition, the dataset was maintained at a constant outcome ratio through stratified sampling for analysis, but this may not necessarily be the case with a new dataset. Nonetheless, a neural network-based cluster model was first applied to stroke patients from a real-world dataset. Second, ...
The area under the curve (AUC) of each ROC curve was used to gauge performances. To examine the frequency of amplification and deletions for subgroups of samples or populations and evaluate the sensitivity of our CNA-calling method, we further combined the ρ′ data to create ρ′′ by ...
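For reference, the AUC of a ROC curve equals the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, with ties counted as one half. The sketch below computes it that way. It is a generic illustration and not the procedure of the study quoted above; the function name `roc_auc` and the 0/1 label encoding are assumptions.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    labels: iterable of 0 (negative) or 1 (positive)
    scores: iterable of classifier scores, higher means "more positive"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

# Three of the four positive/negative pairs are ordered correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

The pairwise form is quadratic in the number of samples, which is fine for small examples. A rank-based computation is the usual choice for large datasets.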
In this step, the optimization targets the data base of the FCM model and is based on the sampling data, as in Eq. (3). The MATLAB Fuzzy Logic Toolbox (genfis) is used to generate the FIS of the FCM model. To improve the precision as well as reduce the loss of interpretability,...
Furthermore, the relatively large spread of values in these three measures shows that, except for SilhoutteOD, all methods have a low resilience concerning the type of data under analysis even when using their best possible parameter configurations. Overall, both the clustering-based and the non-...