INDEX
Explanations
model versions and numbers
Mentions of specific language-model names, versions, or size identifiers (e.g., model names with suffixes like "-13B", "1.5", or "16K").
Negative Logits
让你  0.28
fraught  0.27
yıldır  0.27
常见的  0.27
गाई  0.26
неред  0.26
弟  0.26
subordination  0.26
Scientology  0.26
Harry  0.26
Positive Logits
icin  0.31
version  0.30
eight  0.28
版本  0.28
II  0.27
modello  0.27
ursprünglich  0.27
training  0.27
Version  0.27
optimized  0.26
Activations Density 0.185%
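
The density figure above is reported without a definition. A minimal sketch of one common reading, assuming density means the fraction of token positions on which the feature's activation is above zero: the function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical and not taken from this page, so the dashboard's actual computation may differ.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all. This is an illustrative definition only.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 2000 token positions, 4 of which fire.
toy = [0.0] * 1996 + [0.7, 1.2, 0.3, 2.5]
density = activation_density(toy)
print(f"{density:.3%}")  # prints "0.200%"
```

Under this reading, a density of 0.185% would mean the feature is active on roughly 1 in every 540 token positions.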