INDEX
Explanations
phrases related to control and dominance
Negative Logits
ucky          -0.15
thần          -0.15
wick          -0.15
sey           -0.15
sid           -0.15
Fav           -0.14
uels          -0.14
HDR           -0.14
hud           -0.14
кав           -0.14
Positive Logits
control       0.20
/control      0.20
(control      0.17
control       0.17
minds         0.17
egra          0.16
proceedings   0.16
destiny       0.16
ehir          0.16
destin        0.15
Activations Density: 0.071%