INDEX
Explanations
phrases indicating potential actions or outcomes
conditional statements and hypothetical situations
Negative Logits
Named        -0.66
noticed      -0.63
Cosponsors   -0.57
TIM          -0.57
Xuan         -0.56
Gladiator    -0.56
psy          -0.55
named        -0.55
eyed         -0.55
standing     -0.55
Positive Logits
require      1.11
ideally      1.10
be           1.06
entail       1.05
imply        1.05
suffice      0.99
allow        0.99
likely       0.97
involve      0.97
presumably   0.96
Activations Density: 0.154%