How do we roughly estimate whether the GPU memory is sufficient by the model parameters?
I. Understand the parameters of large language models
1.1 Model parameter unit
Terms such as “10b”, “13b” and “70b” usually refer to the number of parameters of large neural network models. “b” stands for “billion”, that is, billion. Represents the number of parameters in the model, and each parameter is used to store information such as the weight and deviation of the model.
For example:
“10b” means that the model has about 10 billion parameters.
“13b” means that the model has about 13 billion parameters.
“70b” means that the model has about 70 billion parameters.