Hi! (I was originally going to file this issue on mlx-examples, but I figured performance is more relevant to this repository.) I have had the chance to do some comparative benchmarking of PyTorch (MPS) and MLX for training and inference...
A straightforward implementation of Mamba in PyTorch with a simple parallel scan, which offers a major speedup over a sequential implementation because the parallel scan parallelizes over the time dimension (see the sketch below). It combines readability with good training performance. Few ot...
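For context, the recurrence being scanned here is typically the diagonal linear form h_t = a_t ⊙ h_{t-1} + b_t, which is associative and therefore admits an O(log T) parallel scan. Below is a minimal PyTorch sketch of a Hillis–Steele-style scan, assuming h_0 = 0 and inputs of shape (batch, T, d); the repository's actual implementation may differ in detail:

```python
import torch

def parallel_scan(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Inclusive scan of h_t = a_t * h_{t-1} + b_t (with h_0 = 0) along dim=1.

    a, b: (batch, T, d). Runs in O(log T) passes instead of a length-T loop,
    and every pass is fully parallel over the time dimension.
    """
    a, b = a.clone(), b.clone()
    T = a.shape[1]
    offset = 1
    while offset < T:
        # Combine each step with the one `offset` positions earlier:
        # (a_prev, b_prev) followed by (a_cur, b_cur)
        #   -> (a_prev * a_cur, a_cur * b_prev + b_cur)
        # b must be updated first, since its update reads the current a.
        b[:, offset:] = b[:, offset:] + a[:, offset:] * b[:, :-offset]
        a[:, offset:] = a[:, offset:] * a[:, :-offset]
        offset *= 2
    return b

# Sanity check against the sequential recurrence.
if __name__ == "__main__":
    a = torch.rand(2, 8, 4)
    b = torch.randn(2, 8, 4)
    h, hs = torch.zeros(2, 4), []
    for t in range(8):
        h = a[:, t] * h + b[:, t]
        hs.append(h)
    assert torch.allclose(parallel_scan(a, b), torch.stack(hs, dim=1), atol=1e-5)
```

Each of the ⌈log2 T⌉ passes touches all timesteps at once, which is where the speedup over a sequential loop comes from.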
MLX has higher-level packages like mlx.nn and mlx.optimizers with APIs that closely follow PyTorch to simplify building more complex models. Composable function transformations: MLX supports composable function transformations for automatic differentiation, automatic vectorization, and computation graph ...
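As a quick illustration of those transformations, here is a toy example (my own, not taken verbatim from MLX's docs) composing `mx.grad` and `mx.vmap`:

```python
import mlx.core as mx

# A toy least-squares loss; mx.grad differentiates w.r.t. the
# first argument by default.
def loss_fn(w, x, y):
    return mx.mean((x @ w - y) ** 2)

w = mx.zeros((3,))
x = mx.random.normal((8, 3))
y = mx.random.normal((8,))

g = mx.grad(loss_fn)(w, x, y)  # gradient of the loss w.r.t. w

# mx.vmap vectorizes a per-example function over the leading batch axis.
per_example = mx.vmap(lambda xi, yi: (xi @ w - yi) ** 2)
losses = per_example(x, y)

mx.eval(g, losses)  # MLX is lazy; eval forces the graph to actually run
print(g, losses.shape)
```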
Any chance you can provide Apple Silicon support with MLX checkpoints, especially for the smaller models? On Qwen2 (the non-VL model), MLX is supported and it works really quickly! Unfortunately, without MLX support for Qwen2-VL, it's incredibly slow just trying to use the PyTorch MPS backend. ...
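For reference, this is roughly how the text-only Qwen2 path works today via the `mlx-lm` package; the checkpoint name is just an example of a converted model from the mlx-community hub, the argument names follow the mlx-lm README at the time of writing, and nothing equivalent exists here for Qwen2-VL:

```python
# pip install mlx-lm
from mlx_lm import load, generate

# Example mlx-community checkpoint; substitute any converted Qwen2 model.
model, tokenizer = load("mlx-community/Qwen2-7B-Instruct-4bit")
text = generate(model, tokenizer, prompt="Hello", max_tokens=64, verbose=True)
print(text)
```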
MLX CPU performance overtakes PyTorch CPU at batch sizes > 256, and PyTorch's GPU backend finally beats the CPU at extremely large batch sizes of 8192+, matching MLX CPU's performance at 16384. Conclusions: Training performance with both MLX and PyTorch on the M1 CPU is virtually indistinguisha...
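For anyone wanting to reproduce the shape of these curves, a rough sweep along the following lines should do; a single matmul stands in for a training step, and the shapes, iteration counts, and batch sizes are my assumptions rather than the exact benchmark behind the numbers above:

```python
import time
import mlx.core as mx
import torch

def time_torch(device, batch, dim=1024, iters=50):
    x = torch.randn(batch, dim, device=device)
    w = torch.randn(dim, dim, device=device)
    if device == "mps":
        torch.mps.synchronize()  # MPS dispatch is async; sync before timing
    t0 = time.perf_counter()
    for _ in range(iters):
        _ = x @ w
    if device == "mps":
        torch.mps.synchronize()
    return (time.perf_counter() - t0) / iters

def time_mlx(batch, dim=1024, iters=50):
    # MLX defaults to the GPU; call mx.set_default_device(mx.cpu)
    # first to reproduce the CPU-vs-CPU comparison.
    x = mx.random.normal((batch, dim))
    w = mx.random.normal((dim, dim))
    t0 = time.perf_counter()
    for _ in range(iters):
        y = x @ w
        mx.eval(y)  # force lazy evaluation so we time real work
    return (time.perf_counter() - t0) / iters

for batch in (64, 256, 1024, 4096, 16384):
    print(batch, time_torch("cpu", batch), time_mlx(batch))
```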
PyTorch backend. At the time of writing this comparison, convolutions are still among the least-optimized operations in MLX. Despite that, MLX still achieves **~40% higher throughput** than PyTorch at a batch size of 16, and ~15% higher when comparing each framework at its optimal batch size. Notably, ...
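A minimal convolution-specific check could look like the sketch below (illustrative shapes only, with batch size 16 to mirror the comparison point above). One layout detail worth knowing: MLX's `mx.conv2d` expects NHWC inputs and `(out, kh, kw, in)` weights, while PyTorch defaults to NCHW:

```python
import time
import mlx.core as mx
import torch

B, H, W, Cin, Cout, iters = 16, 64, 64, 32, 64, 20

# MLX: NHWC input, (C_out, KH, KW, C_in) weight.
x_mx = mx.random.normal((B, H, W, Cin))
w_mx = mx.random.normal((Cout, 3, 3, Cin))
t0 = time.perf_counter()
for _ in range(iters):
    mx.eval(mx.conv2d(x_mx, w_mx, padding=1))
mlx_t = (time.perf_counter() - t0) / iters

# PyTorch: NCHW input.
conv = torch.nn.Conv2d(Cin, Cout, 3, padding=1)
x_pt = torch.randn(B, Cin, H, W)
with torch.no_grad():
    t0 = time.perf_counter()
    for _ in range(iters):
        conv(x_pt)
    pt_t = (time.perf_counter() - t0) / iters

print(f"MLX {mlx_t * 1e3:.2f} ms  PyTorch {pt_t * 1e3:.2f} ms")
```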
You can see that the MLX community's pinned issue is precisely an explanation of why they don't just add a backend to PyTorch: Why not implement this i...
MLX's design philosophy is simple, drawing on frameworks such as NumPy, PyTorch, Jax, and ArrayFire. Its key features include: * Familiar APIs: MLX ...
By combining the best features of mature frameworks such as NumPy, PyTorch, Jax, and ArrayFire, MLX gives developers a powerful and...