INDEX
Explanations
verbs related to causing or forcing actions, often with negative consequences
Negative Logits
yd          -0.70
chool       -0.64
Birthday    -0.62
ta          -0.59
submitted   -0.59
Basics      -0.59
at          -0.58
styled      -0.58
bid         -0.58
supplied    -0.57
Positive Logits
havoc       1.05
morale      0.89
untold      0.88
confusion   0.84
us          0.80
resentment  0.79
headaches   0.76
viewers     0.75
friction    0.74
irre        0.74
Activations Density: 5.308%