Checking online for machine learning hardware for the SMR-lab.
Another advantage of using multiple GPUs, even if you do not parallelize algorithms, is that you can run multiple algorithms or experiments separately, one on each GPU. Efficient hyperparameter search is the most common use of multiple GPUs. You gain no speedup for a single run, but you get faster information about the performance of different hyperparameter settings or different network architectures. This is also very useful for novices, as you can quickly gain insight and experience in training an unfamiliar deep learning architecture.
Using multiple GPUs in this way is usually more useful than running a single network on multiple GPUs via data parallelism. Keep this in mind when you buy multiple GPUs: qualities that matter for parallelism, such as the number of PCIe lanes, are not that important.
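One common way to run one experiment per GPU is to pin each process to a single device with `CUDA_VISIBLE_DEVICES`. The sketch below assumes a hypothetical training script `train.py` with an `--lr` flag; it only builds the commands and environments (in practice you would pass each pair to `subprocess.Popen`).

```python
import os

# Hypothetical sketch: one hyperparameter setting per GPU.
# "train.py" and "--lr" are placeholders for your own training script.
def launch_per_gpu(learning_rates):
    """Build one (command, environment) pair per GPU.

    GPU i only sees itself because CUDA_VISIBLE_DEVICES=str(i),
    so each training process uses exactly one device.
    """
    jobs = []
    for gpu_id, lr in enumerate(learning_rates):
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
        cmd = ["python", "train.py", "--lr", str(lr)]
        jobs.append((cmd, env))  # in practice: subprocess.Popen(cmd, env=env)
    return jobs
```

Each process then behaves as if the machine had a single GPU, so no data-parallel code is needed at all.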
On the other hand, NVIDIA now has a policy that allows CUDA use in data centers only on Tesla GPUs, not on GTX or RTX cards.
If we look at performance measures of the Tensor-Core-enabled V100 versus the TPUv2, we find that both systems have nearly the same performance on ResNet-50 [source is lost, not on Wayback Machine]. However, the Google TPU is more cost-efficient.
Note that to get the benefit of Tensor Cores you should use 16-bit data and weights; avoid using 32-bit with RTX cards!
Normalized performance/cost numbers for convolutional networks (CNN), recurrent networks (RNN) and transformers. Higher is better. An RTX 2060 is more than 5 times more cost-efficient than a Tesla V100. The RNN numbers refer to biLSTM performance for short sequences of length <100. Benchmarking was done using PyTorch 1.0.1 and CUDA 10.
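The "normalized performance/cost" metric can be reproduced roughly as follows. The throughput and price numbers below are made-up placeholders for illustration, not the benchmark's actual measurements; only the normalization formula is the point.

```python
# Hypothetical numbers for illustration only (not the blog's measurements):
# relative throughput on some benchmark, and an assumed price in USD.
cards = {
    "RTX 2060":   {"throughput": 1.0, "price": 349},
    "Tesla V100": {"throughput": 2.4, "price": 8999},
}

def perf_per_dollar(cards, baseline="RTX 2060"):
    """Normalize throughput/price so the baseline card scores 1.0."""
    base = cards[baseline]["throughput"] / cards[baseline]["price"]
    return {name: (c["throughput"] / c["price"]) / base
            for name, c in cards.items()}
```

With numbers like these, the V100 is faster in absolute terms but scores far below 1.0 once price is factored in, which is the shape of the result the chart describes.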
So 8 GB of memory used with 16-bit precision is about equivalent in capacity to 12 GB used with 32-bit precision.
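The raw per-element saving behind this claim is easy to check: float16 takes half the bytes of float32 (the practical gain is less than 2x because some buffers stay in 32-bit, hence "about 12 GB" rather than 16 GB). A minimal illustration with NumPy:

```python
import numpy as np

# Per-parameter storage cost: float16 is half the bytes of float32.
n_params = 1_000_000
fp32_bytes = n_params * np.dtype(np.float32).itemsize  # 4 bytes each
fp16_bytes = n_params * np.dtype(np.float16).itemsize  # 2 bytes each
```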
Comparison between RTX 2060 (& Super version), RTX 2070 (& Super version), RTX 2080 Super:
CPU vs GPU from 2018
Test of my GPU vs an RTX 2060
List of supported GPUs
Please do not spend your time figuring out hardware combinations and server installations.
It will take much longer than you think and is not worth the money relative to the usage; you can set up a cloud server for ML training in minutes.
Yes, but the problems are:
- Can’t rely on students turning VPS off after use
- No easy way to let school pay for the (recurring) cost of VPS
With our own hardware we don't have those problems.
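For the first problem, one possible mitigation (a sketch, not a recommendation) is a cron job that shuts the machine down once all GPUs have been idle. The parsing is separated from the shell calls so it can be checked without a GPU; the idle threshold and grace period are arbitrary assumptions.

```python
import subprocess

def all_idle(smi_output, threshold=5):
    """Return True if every GPU utilization percentage is at or below threshold.

    smi_output: one percentage per line, as produced by
    `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits`.
    """
    utils = [int(line) for line in smi_output.strip().splitlines()]
    return all(u <= threshold for u in utils)

def check_and_shutdown():
    """Intended to run from cron; powers off if all GPUs are idle."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True).stdout
    if all_idle(out):
        # 10-minute grace period so an active user can cancel the shutdown.
        subprocess.run(["sudo", "shutdown", "-h", "+10"])
```

This only addresses forgotten sessions; the recurring-billing problem with the school is organizational and has no technical fix.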
Wow this is cool man. Missed this when I quickly scanned your post first time 'round