INDEX
Explanations
phrases indicating negation
phrases emphasizing limitations or negations
Negative Logits
ahime       -0.71
Presence    -0.71
iership     -0.68
lycer       -0.65
mosp        -0.65
ideon       -0.64
yss         -0.60
Basics      -0.59
igree       -0.59
Metatron    -0.59
Positive Logits
xious       1.07
except      0.86
avail       0.85
oses        0.83
ct          0.82
obs         0.79
xus         0.78
AH          0.78
vell        0.77
longer      0.77
Activations Density: 0.052%