Explanations
words related to action or behavior
instances of the word "act" and its variations, indicating a focus on actions or behaviors
Negative Logits
Token      Logit
ickets     -0.69
nic        -0.66
yip        -0.65
ciating    -0.64
fer        -0.62
antha      -0.60
ixels      -0.60
xual       -0.59
burn       -0.59
Teg        -0.59
Positive Logits
Token          Logit
uate           1.13
uated          1.08
uary           1.04
accordingly    0.98
decisively     0.92
differently    0.92
inic           0.90
upon           0.90
impuls         0.88
appropriately  0.87
Activations Density 0.053%