Explanations
negative descriptors related to intelligence or foolishness
Negative Logits
istrovství   -0.17
yang         -0.16
幅           -0.16
TypeDef      -0.15
odge         -0.15
bjerg        -0.14
typed        -0.14
_eof         -0.14
imate        -0.14
banks        -0.14
Positive Logits
arton        0.23
ass          0.22
visor        0.20
est          0.19
asses        0.19
dumb         0.18
bell         0.18
kop          0.17
assed        0.16
foon         0.16
Activations Density 0.016%