Project: K-Means Clustering Algorithm for Customer Segmentation (Parallelizing K-Means: Serial, OpenMP, and CUDA Approaches) (Programs that use parallel computing concepts to speed up the execution tim
Hong, "An Efficient k-Means Algorithm on CUDA," Proc. IEEE Int'l Symp. Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW '11), pp. 1740-1749, 2011. Jiadong Wu, Bo Hong. "An Efficient k-means Algorithm on CUDA". 2011 IEEE International Parallel & Distributed Processing Symposium ...
This software is derived from Professor Wei-keng Liao's parallel k-means clustering code obtained on November 21, 2010 from http://users.eecs.northwestern.edu/~wkliao/Kmeans/index.html (http://users.eecs.northwestern.edu/~wkliao/Kmeans/simple_kmeans.tar.gz). With his permission, I am publi...
The fast training implementation contains two parts: the K-means algorithm for model initialization and the EM algorithm for parameter estimation. Furthermore, this fast training method has been applied to language GMM training. The experimental results show that language model training using a GPU is about 26 times ...
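The two-part structure described here (K-means for initialization, then EM for parameter estimation) can be sketched on the CPU as follows. This is a minimal serial illustration for one-dimensional data, not the GPU implementation the snippet refers to. The function names `kmeans_1d` and `fit_gmm_1d`, the evenly spaced center seeding, the shared initial variance, and the variance floor `min_var` are all assumptions made for this example.

```python
import math


def kmeans_1d(xs, k, iters=20):
    """Lloyd's k-means on scalars (hypothetical helper for this sketch)."""
    s = sorted(xs)
    # Assumption: seed the centers with evenly spaced order statistics.
    centers = [s[(2 * j + 1) * len(s) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda c: (x - centers[c]) ** 2)
            clusters[nearest].append(x)
        # An empty cluster keeps its previous center.
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return centers


def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(xs, k, em_iters=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture: k-means initialization, then EM."""
    n = len(xs)
    # Part 1: k-means supplies the initial component means.
    means = kmeans_1d(xs, k)
    overall_mean = sum(xs) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in xs) / n, min_var)
    variances = [overall_var] * k   # assumption: shared starting variance
    weights = [1.0 / k] * k         # assumption: uniform starting weights

    # Part 2: EM refines weights, means, and variances.
    for _ in range(em_iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            p = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pj / total for pj in p])
        # M-step: re-estimate parameters from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave a dead component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, min_var)
    return weights, means, variances
```

Starting EM from K-means centers rather than random values typically places each component near a real cluster, so fewer EM iterations are spent recovering from a poor start.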
Algorithm Cascading: this method is in effect an application of algorithm cascading, that is, combining a sequential algorithm with a parallel one.
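Algorithm cascading is often illustrated with a sum reduction: each worker first adds up its own slice of the input sequentially, and only the resulting partial sums go through a parallel tree reduction. The sketch below mimics that pattern on the CPU; the name `cascaded_sum` and the `num_workers` parameter are hypothetical, and the "workers" here run one after another rather than as real GPU threads.

```python
def cascaded_sum(values, num_workers):
    """Sum `values` in two phases: sequential per-worker sums, then a tree reduction."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")

    # Phase 1 (sequential part): worker w adds elements w, w + num_workers, ...
    # On a GPU each worker would be one thread; here they simply run in turn.
    partial = [sum(values[w::num_workers]) for w in range(num_workers)]

    # Phase 2 (parallel part): pairwise tree reduction over the partial sums.
    # The additions within one pass are independent of each other, so a
    # parallel machine could perform each pass in a single step.
    stride = 1
    while stride < len(partial):
        for i in range(0, len(partial) - stride, 2 * stride):
            partial[i] += partial[i + stride]
        stride *= 2
    return partial[0]
```

The tree phase then needs only about log2(`num_workers`) passes regardless of the input length, while the bulk of the additions happen in the sequential phase with no synchronization between workers.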
enum CUmemDecompressAlgorithm Bitmasks for CU_DEVICE_ATTRIBUTE_MEM_DECOMPRESS_ALGORITHM_MASK. Functions CUresult cuArray3DCreate ( CUarray* pHandle, const CUDA_ARRAY3D_DESCRIPTOR* pAllocateArray ) Creates a 3D CUDA array. CUresult cuArray3DGetDescriptor ( CUDA_ARRAY3D_DESCRIPTOR* pArrayDescri...
void AddTraceMe(absl::string_view traceme_name, const void* chunk_ptr, int64_t req_bytes, int64_t alloc_bytes) TF_EXCLUSIVE_LOCKS_REQUIRED(lock_);
// A ChunkHandle is an index into the chunks_ vector in BFCAllocator
// kInvalidChunkHandle means an invalid chunk
typedef size_t ChunkHa...
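Entry: Highly Parallel K-means Clustering Algorithm. Author: 菅立恒. School: Graduate University of the Chinese Academy of Sciences. Detailed review: achieved a 35x speedup (Tesla C1060 compared with a 4-core Intel CPU). The program is highly optimized and makes full use of the GPU's various optimization strategies. The documentation is complete and clear, and the testing is thorough. It was also compared against other publicly available implementations and performed better. Open category. Entry: GPU-Based High-Performance Embedded Application Implementation and Performance Optimization. Author: 王晨...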
38.3.2 Seismic Migration Using the SRMIP Algorithm In the case of the CGGVeritas algorithm, called SRMIP, that we want to develop using CUDA, the wave propagation is performed using a finite-difference algorithm applied in the frequency domain. ...
Its occurrence should therefore be minimized by mapping the algorithm to the CUDA programming model in such a way that the computations that require inter-thread communication are performed within a single thread block as much as possible. This second case is less optimal than the first, because it adds extra...