INDEX
Explanations
phrases related to making decisions and taking action
expressions of frustration or critique regarding societal issues and responsibilities
Negative Logits
sole               -0.79
soType             -0.75
Attribution        -0.70
ledged             -0.69
iban               -0.66
hailed             -0.65
lor                -0.65
ithe               -0.65
indistinguishable  -0.64
applaud            -0.64
Positive Logits
Otherwise          1.10
Luckily            1.07
Ideally            1.02
Fortunately        0.93
preferably         0.92
Thankfully         0.86
Otherwise          0.79
lest               0.78
roman              0.74
Normally           0.70
Activations Density: 0.668%