INDEX
Explanations
phrases related to actions, especially focusing on the impact on others
phrases concerning personal or collective actions and their implications
Negative Logits
bid       -0.76
mbuds     -0.66
inately   -0.66
AES       -0.65
anus      -0.63
ixed      -0.61
skinned   -0.61
owship    -0.61
Dise      -0.61
BLE       -0.60
Positive Logits
ACTIONS   1.02
uations   1.00
actions   0.98
uate      0.92
hops      0.91
uation    0.89
uated     0.87
ives      0.85
uits      0.83
ivism     0.82
Activation Density: 0.040%