Explanations
words related to personal or societal actions or behaviors
references to "actions" and their implications or consequences
Negative Logits
bid           -0.89
anus          -0.75
ļéĨĴ          -0.70
skinned       -0.69
vu            -0.65
weights       -0.63
Dwell         -0.63
comma         -0.62
ickets        -0.61
inately       -0.61
Positive Logits
uate           1.01
ives           0.94
actions        0.92
terday         0.91
uits           0.90
hops           0.88
uations        0.87
undertaken     0.86
ivism          0.86
ACTIONS        0.86
Activations Density 0.060%