INDEX
Explanations
instances of intimidation and manipulation in various contexts
Head Attr Weights
0:0.03
1:0.02
2:0.05
3:0.06
4:0.13
5:0.03
6:0.04
7:0.36
8:0.03
9:0.03
10:0.09
11:0.09
Negative Logits
variable:-1.45
rh:-1.35
Somewhere:-1.31
fixed:-1.31
salvage:-1.28
album:-1.27
amy:-1.27
Imaging:-1.27
ゴン:-1.23
miracle:-1.21
Positive Logits
opponents:1.72
foes:1.70
passers:1.65
subordinates:1.64
intimidated:1.58
challengers:1.57
adversaries:1.54
superiors:1.53
bullies:1.48
merciless:1.43
Activations Density 0.003%