Explanations
phrases related to manipulation or coercion
phrases indicating coercion or manipulation
Negative Logits
thora           -0.83
DragonMagazine  -0.74
geist           -0.72
netflix         -0.68
tm              -0.67
mr              -0.66
ILA             -0.64
enery           -0.63
engers          -0.63
die             -0.63
Positive Logits
submission      1.29
believing       1.23
accepting       1.10
agreeing        1.06
becoming        1.03
committing      1.02
behaving        1.01
joining         0.97
adopting        0.97
signing         0.96
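The following is a minimal sketch of how ranked logit lists like the two above are commonly produced on SAE feature dashboards. It assumes, without confirmation from this page, that each score is the feature's decoder direction projected through the model's unembedding matrix. The function names `logit_effects` and `top_and_bottom` are hypothetical. The toy matrices are invented for illustration and are not taken from the real model.

```python
# Hypothetical sketch: assumes logit scores = decoder direction . unembedding.
# All names and toy values here are illustrative, not the dashboard's real code.

def logit_effects(decoder_dir, unembed):
    """Project a feature's decoder direction through the unembedding.

    decoder_dir: list of d_model floats (the feature's output direction).
    unembed: d_model rows, each a list of vocab_size floats (W_U).
    Returns one score per vocabulary token.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(scores, tokens, k):
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1])
    positive = ranked[-k:][::-1]  # largest first
    negative = ranked[:k]         # most negative first
    return positive, negative


if __name__ == "__main__":
    tokens = ["a", "b", "c"]
    decoder_dir = [1.0, -1.0]                      # toy d_model = 2
    unembed = [[2.0, 0.0, 1.0], [0.0, 1.0, 3.0]]   # toy vocab of 3 tokens
    scores = logit_effects(decoder_dir, unembed)
    print(top_and_bottom(scores, tokens, 1))
```

In this toy case the scores are 2.0, -1.0 and -2.0. Token "a" therefore heads the positive list and token "c" heads the negative list.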
Activation Density: 0.058%
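Activation density is usually read as the fraction of sampled token positions on which the feature fires. The sketch below computes it that way. It assumes "fires" means an activation strictly above a threshold of 0.0, which this page does not state. The function name `activation_density` is hypothetical. At 0.058%, the feature would be active on roughly 58 of every 100,000 tokens.

```python
# Hypothetical sketch: assumes density = share of positions with activation > threshold.

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Toy activations for 4 token positions; only one position fires.
    density = activation_density([0.0, 0.0, 1.2, 0.0])
    print(f"{density:.3%}")  # formatted as a percentage, like the dashboard stat
```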