INDEX
Explanations
words related to emotional reactions or states such as "shocked", "heartened", "gratified", "humbled"
expressions of strong emotions and reactions
Negative Logits
occurrence     -0.76
divergence     -0.73
pecul          -0.73
contribution   -0.71
arcs           -0.69
legitimacy     -0.68
inacc          -0.66
similarity     -0.66
transitions    -0.66
interchange    -0.66
Positive Logits
raged          1.21
alysed         1.20
azed           1.19
rified         1.17
cerned         1.16
ivated         1.15
iliated        1.12
racted         1.12
informed       1.11
employed       1.10
Activations Density 0.283%
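
The source does not say how the density figure is computed. A common reading, assumed here, is that it is the fraction of sampled token positions on which the feature's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature fires at all. The source does not confirm this definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1,000 token positions, of which 3 have a nonzero activation.
sample = [0.0] * 997 + [0.4, 1.2, 0.05]
density = activation_density(sample)
print(f"{density:.3%}")  # formats the fraction as a percentage
```

Under this reading, a density of 0.283% would mean the feature fires on roughly 3 of every 1,000 sampled tokens, which is consistent with a sparse feature tied to a narrow class of words.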