INDEX
Explanations
adjectives expressing confusion, frustration, or uncertainty
emotional responses related to confusion or frustration
Negative Logits
waivers     -0.66
bye         -0.64
deviations  -0.63
GP          -0.62
llan        -0.59
Policies    -0.58
approved    -0.57
payers      -0.57
eatures     -0.57
rule        -0.57
Positive Logits
ingly       1.60
ating       0.99
eful        0.98
ening       0.96
ifying      0.92
enment      0.89
ruciating   0.89
ing         0.88
risome      0.86
esc         0.85
Activations Density 0.084%