Explanations
phrases related to actions and behavior
references to actions and their accountability or consequences
Negative Logits
bid          -0.80
antha        -0.71
inately      -0.66
alg          -0.66
ļéĨĴ         -0.65
skinned      -0.65
ARC          -0.64
fortable     -0.63
ANC          -0.63
Flavoring    -0.63
Positive Logits
undertaken    1.02
actions       0.92
uate          0.90
toward        0.90
towards       0.89
ACTIONS       0.85
perpetrated   0.82
betray        0.80
impuls        0.79
taken         0.78
Activations Density: 0.085%