INDEX
Explanations
instances of the word "for" in various contexts
Negative Logits
lse     -0.20
immers  -0.15
angu    -0.15
dn      -0.15
dent    -0.14
rawer   -0.14
okoj    -0.14
iza     -0.14
ENSOR   -0.13
dle     -0.13
Positive Logits
fun       0.18
ató       0.18
hire      0.18
FUN       0.17
sembling  0.16
_fun      0.15
pleasure  0.15
plier     0.15
Fun       0.15
============================================================================↵  0.14
Activations Density: 0.101%