INDEX
Explanations
questions punctuated with a question mark
Negative Logits
enen     -0.16
¢        -0.15
eness    -0.14
ibase    -0.14
éłĵ      -0.14
quez     -0.14
äl       -0.14
ãĥ§      -0.14
_DF      -0.13
ãĤ¥      -0.13
Positive Logits
ariat        0.17
ÑģоÑĢ        0.16
Affected     0.16
bia          0.16
704          0.15
rama         0.15
lic          0.15
Insensitive  0.15
basket       0.14
dock         0.14
Activations Density 0.033%