INDEX
Explanations
words related to power and control
phrases related to control and power dynamics
Negative Logits
aunder  -0.81
uum  -0.79
Finally  -0.77
illary  -0.77
ÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤ  -0.77
teasp  -0.75
oshenko  -0.71
Lastly  -0.70
ydia  -0.70
////////////////  -0.70
Positive Logits
drive  1.01
loading  0.96
lord  0.93
hang  0.87
rule  0.82
reaching  0.82
haul  0.81
tones  0.80
stay  0.78
lander  0.78
Activations Density 0.056%
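As a rough illustration of how a density figure like the one above can be read, the sketch below assumes that activation density means the fraction of sampled token positions on which the feature fires above zero. The page does not state that definition, so treat it as an assumption. The function name, the threshold parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled positions where the feature
    is active; the source page does not spell out its exact definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 6 of which activate the feature.
sample = [0.0] * 9994 + [0.4, 1.2, 0.7, 2.5, 0.1, 3.3]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.060%
```

Under this reading, a density of 0.056% would correspond to the feature firing on roughly 56 out of every 100,000 sampled tokens.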