INDEX
Explanations
expressions of high praise or quality
NEGATIVE LOGITS
greatness   -0.16
ست          -0.14
abi         -0.14
plevel      -0.14
            -0.14
theless     -0.14
elta        -0.14
elight      -0.13
unner       -0.13
emens       -0.13
POSITIVE LOGITS
s           0.30
-grand      0.29
sword       0.22
dane        0.19
lest        0.17
(est        0.17
deal        0.17
coat        0.17
ÏĤ          0.17
fully       0.17
Activations Density: 0.048%