Explanations
verbs or phrases related to control, influence, or power
actions and phrases associated with negative consequences or situations
Negative Logits
ortment          -0.70
oğ               -0.67
◼                -0.67
ciplinary        -0.66
last             -0.61
alian            -0.61
tu               -0.60
utive            -0.59
eele             -0.58
Approximately    -0.57
Positive Logits
nor              1.57
anymore          1.48
anybody          1.46
anything         1.34
anyone           1.32
any              1.03
ANY              0.92
slightest        0.91
whatsoever       0.88
anywhere         0.88
Activations Density 0.277%