INDEX
Explanations
negative words or phrases indicating denial or absence
Negative Logits
nya       -0.16
noon      -0.16
patrick   -0.16
Various   -0.15
ucci      -0.15
rape      -0.15
mente     -0.15
rick      -0.15
empo      -0.15
nek       -0.14
Positive Logits
thin      0.35
-one      0.35
longer    0.34
xious     0.33
things    0.30
isy       0.28
one       0.27
ël        0.26
pe        0.26
BODY      0.25
Activations Density: 0.100%