Explanations
instances where control or empowerment is emphasized in various contexts
Negative Logits
uario           -0.17
iasi            -0.16
ings            -0.15
로드            -0.14
代              -0.14
レット          -0.14
addCriterion    -0.14
_PRIORITY       -0.14
Shock           -0.13
ollapse         -0.13
Positive Logits
/control        0.26
control         0.24
(control        0.22
destiny         0.21
.Control        0.20
-control        0.20
CONTROL         0.20
Control         0.20
control         0.19
controlled      0.19
Activations Density: 0.059%