This bug was remarkable since the result was not gibberish but maximally bad output. The authors were asleep during the training process, so the problem was noticed only once training had finished. A mechanism such as Toyota’s Andon cord(opens in a new window) could have prevented this,...
Now I wouldn't say I have full confidence that the PyTorch script is maximally tuned, but the following observations can be made. PyTorch seems to be taking a lot more memory (this run is ~80GB), while llm.c is at 57GB (29% improvement). Memory is important because it allows you to...
Now I wouldn't say I have full confidence that the PyTorch script is maximally tuned, but the following observations can be made. PyTorch seems to be taking a lot more memory (this run is ~80GB), while llm.c is at 57GB (29% improvement). Memory is important because it allows you to...