Explanations
negative terms indicating lack or absence
Negative Logits
Token        Logit
myſelf       -0.82
itſelf       -0.80
raiſ         -0.75
Reſ          -0.75
pleaſure     -0.73
houſe        -0.72
المعيارى     -0.71
ſte          -0.70
purpoſe      -0.70
opérés       -0.69
Positive Logits
Token        Logit
no            0.83
no            0.80
any           0.69
any           0.63
NO            0.62
gno           0.60
NO            0.59
semantics     0.56
Any           0.55
keinem        0.54
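Logit lists like the two above are commonly produced by taking a latent's decoder direction, projecting it onto each token's unembedding vector, and reading off the lowest and highest scores. This page does not say how its lists were computed, so treat the following as a minimal sketch under that assumption. The helper `top_logits` and the toy vectors are hypothetical and do not reproduce the values listed.

```python
from typing import Dict, List, Sequence, Tuple


def top_logits(
    decoder_dir: Sequence[float],
    unembed: Dict[str, Sequence[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most negative, k most positive) (token, score) pairs.

    Hypothetical helper: a token's score is the dot product of the
    latent's decoder direction with that token's unembedding vector.
    """
    scores = [
        (token, sum(d * u for d, u in zip(decoder_dir, vec)))
        for token, vec in unembed.items()
    ]
    negative = sorted(scores, key=lambda pair: pair[1])[:k]
    positive = sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]
    return negative, positive


# Toy 2-dimensional example with invented numbers.
decoder = [1.0, 0.0]
unembed = {
    "no": [0.8, 0.1],
    "any": [0.6, -0.2],
    "myſelf": [-0.8, 0.3],
    "houſe": [-0.7, 0.0],
}
neg, pos = top_logits(decoder, unembed, k=2)
print([t for t, _ in neg])  # ['myſelf', 'houſe']
print([t for t, _ in pos])  # ['no', 'any']
```

Duplicate entries such as the two "no" rows are most likely distinct vocabulary items (for example, with and without a leading space) whose whitespace was lost when the page was exported; a dictionary keyed on the raw token strings, as in the sketch, would keep them apart.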
Activation density: 0.254%
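Activation density is usually read as the share of token positions on which the latent fires, so 0.254% means roughly 1 token in 400. The page does not state the firing threshold or the dataset it measured, so the sketch below assumes "fires" means an activation strictly above zero. The function name `activation_density` and the toy data are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: a latent "fires" when its activation is > threshold (default 0).
    """
    values = list(activations)
    if not values:
        raise ValueError("need at least one activation")
    active = sum(1 for a in values if a > threshold)
    return active / len(values)


# Toy data: 254 firing positions out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(0, 254 * 10, 10):
    acts[i] = 1.5
print(f"{activation_density(acts):.3%}")  # prints 0.254%
```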