INDEX
Explanations
phrases related to coercion or manipulation
phrases that describe manipulation or coercion
Negative Logits
thora        -0.68
tm           -0.67
hetically    -0.64
day          -0.62
cu           -0.62
rike         -0.62
fred         -0.61
entimes      -0.60
ener         -0.60
idates       -0.59
Positive Logits
believing    1.27
submission   1.23
agreeing     1.08
accepting    1.07
buying       1.00
submitting   0.99
adopting     0.98
thinking     0.97
abandoning   0.96
committing   0.96
Activations Density 0.077%