INDEX
Explanations
phrases containing the words "no one"
the repetition of the word "one."
Negative Logits
Token          Logit
anga           -0.63
prem           -0.62
xton           -0.60
ypes           -0.59
efully         -0.59
gur            -0.58
KING           -0.56
utterstock     -0.56
ahime          -0.56
Territories    -0.55
Positive Logits
Token          Logit
else           1.13
whatsoever     0.90
dime           0.89
bothered       0.81
ody            0.76
rieve          0.72
cared          0.72
imaginable     0.72
else           0.71
answ           0.71
Activations Density: 0.031%
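
The density figure is usually read as the fraction of sampled token positions on which the feature has a nonzero activation. The sketch below assumes that definition. The function name `activation_density`, the threshold parameter, and the synthetic activations are hypothetical illustrations, not taken from the dashboard or its data.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example: 100,000 token positions, 31 of which activate.
acts = [0.0] * 100_000
for i in range(31):
    acts[i * 3000] = 1.0  # arbitrary nonzero activation value

density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.031% corresponds to roughly 3 active positions per 10,000 tokens, which fits a feature tied to a narrow pattern such as phrases containing "no one".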