INDEX
Explanations
expressions relating to sensation, perception, or feelings
Negative Logits
ains    -0.17
stery   -0.16
oser    -0.15
lund    -0.15
achten  -0.15
zsche   -0.15
engu    -0.15
Leon    -0.15
leon    -0.15
otte    -0.14
Positive Logits
lessly  0.23
less    0.17
igne    0.17
wick    0.16
ual     0.16
115     0.15
ertino  0.15
commun  0.15
esh     0.14
apore   0.14
Activations Density: 0.022%