Explanations
phrases related to power dynamics, control, and dominance
expressions of power dynamics and control
Negative Logits
ÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤ  -0.80
uum  -0.80
Ô  -0.80
aunder  -0.78
UE  -0.78
ãĥ¼ãĥĨãĤ£  -0.75
ô  -0.74
ÃĥÃĤ  -0.73
illary  -0.72
////////////////  -0.72
Positive Logits
lord  1.07
hang  1.04
loading  0.99
drive  0.98
tones  0.90
lander  0.88
lords  0.86
stay  0.83
sea  0.83
rule  0.82
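Dashboards like this one typically produce both lists by giving every vocabulary token a score for the feature and keeping the extremes at each end. Here is a minimal sketch of that ranking step. It assumes the per-token scores already exist; one common source is projecting the feature's decoder direction through the model's unembedding, but the page does not say how its scores were computed. The helper name top_and_bottom is hypothetical, and the dictionary contains only a few of the values listed above.

```python
# Hypothetical helper: rank per-token logit-effect scores and keep both ends.
# ASSUMPTION: the scores are already computed (e.g. decoder direction @ unembedding);
# this sketch only shows the ranking, not how the dashboard obtained the numbers.
def top_and_bottom(
    effects: dict[str, float], k: int
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    # Highest scores first: the tokens the feature pushes up the most.
    top = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    # Lowest scores first: the tokens the feature pushes down the most.
    bottom = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return top, bottom


# A small subset of the scores shown on this page, used only for illustration.
effects = {
    "lord": 1.07,
    "hang": 1.04,
    "loading": 0.99,
    "uum": -0.80,
    "aunder": -0.78,
    "illary": -0.72,
}
top, bottom = top_and_bottom(effects, k=2)
print(top)     # [('lord', 1.07), ('hang', 1.04)]
print(bottom)  # [('uum', -0.8), ('aunder', -0.78)]
```

A full vocabulary would have tens of thousands of entries. In that case a partial sort such as heapq.nlargest / heapq.nsmallest avoids sorting everything, but the result is the same.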
Activation density: 0.050%
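Activation density most likely means the share of token positions on which the feature fires. Read that way, 0.050% works out to about 5 firing positions per 10,000 tokens, which makes this a rarely active feature. Below is a sketch under that assumption. It uses a hypothetical activation_density helper and counts any activation above zero as "firing"; the dashboard's real threshold and token sample are not stated.

```python
# Hypothetical helper. ASSUMPTION: density = fraction of token positions whose
# activation exceeds a threshold (0.0 here); the page does not define it exactly.
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature's activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Illustrative run: the feature fires on 5 of 10,000 token positions.
acts = [0.0] * 10_000
for i in (12, 480, 3001, 7777, 9999):
    acts[i] = 1.5
density = activation_density(acts)
print(f"{density:.3%}")  # 0.050%
```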