Monday, March 11, 2024

New top story on Hacker News: Who uses Google TPUs for inference in production?

19 by arthurdelerue | 2 comments on Hacker News.
I am really puzzled by TPUs. I've been reading everywhere that TPUs are powerful and a great alternative to NVIDIA. I have been playing with TPUs for a couple of months now, and to be honest I don't understand how people can use them in production for inference:

- almost no resources online showing how to run modern generative models like Mistral, Yi 34B, etc. on TPUs
- poor compatibility between JAX and PyTorch
- very hard to understand the memory consumption of the TPU chips (no nvidia-smi equivalent; see the sketch below)
- rotating IP addresses on TPU VMs
- almost impossible to get my hands on a TPU v5

Is it only me? Or did I miss something? I totally understand that TPUs can be useful for training, though.
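
On the memory point: the closest substitute for nvidia-smi I know of is JAX's per-device Device.memory_stats(). A minimal sketch, assuming a TPU VM with jax[tpu] installed; the exact stat keys vary by backend and JAX version, so treat them as illustrative:

    # Poll per-chip memory counters on a TPU VM via JAX.
    # memory_stats() returns a dict on TPU/GPU backends; keys such as
    # "bytes_in_use" / "bytes_limit" may differ across JAX versions.
    import jax

    for dev in jax.devices():
        stats = dev.memory_stats() or {}
        used = stats.get("bytes_in_use", 0)
        limit = stats.get("bytes_limit", 0)
        peak = stats.get("peak_bytes_in_use", 0)
        print(
            f"{dev.device_kind} {dev.id}: "
            f"{used / 2**30:.2f} GiB in use / {limit / 2**30:.2f} GiB total, "
            f"peak {peak / 2**30:.2f} GiB"
        )

Running something like this periodically (or from a small sidecar process) gives a rough nvidia-smi-style view of per-chip usage while a model is serving.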
