Explanations
Phrases describing actions that populations or groups take to express dissatisfaction or to take control
Negative Logits
^(@)       -0.73
ſelves     -0.72
Reſ        -0.65
$_"        -0.64
IBLIO      -0.64
itſelf     -0.64
leſs       -0.62
Majefty    -0.60
ſelf       -0.60
ENEFITS    -0.60
Positive Logits
taking     0.99
Taking     0.98
Taking     0.95
taken      0.95
TAKEN      0.86
take       0.84
Take       0.84
taken      0.82
takes      0.81
taking     0.80
Activation Density: 0.224%