INDEX
Explanations
the word "all."
the occurrence of the word "all."
Negative Logits
rouse        -0.70
orio         -0.62
ogram        -0.62
krit         -0.57
ocative      -0.57
osen         -0.57
SPONSORED    -0.56
agogue       -0.55
kamp         -0.55
oice         -0.55
Positive Logits
all          2.63
ALL          2.06
all          1.67
everything   1.64
All          1.55
All          1.49
every        1.39
everyone     1.37
everybody    1.31
EVERY        1.24
Activations Density: 0.155%