INDEX
Explanations
phrases related to thoughts, actions, or feelings
instances of strong visual or emotional states and actions
Negative Logits
pora          -0.79
successfully  -0.76
inducing      -0.76
grave         -0.75
etheus        -0.74
afort         -0.73
egal          -0.72
olor          -0.71
tie           -0.69
robe          -0.68
Positive Logits
plain         0.83
ifiable       0.81
coincidence   0.73
cks           0.70
darn          0.68
Facts         0.67
curiosity     0.67
luck          0.64
scratch       0.63
shrug         0.62
Activations Density: 0.311%