Explanations
words indicating an action or movement
Head Attr Weights
0:0.08
1:0.07
2:0.07
3:0.06
4:0.09
5:0.08
6:0.08
7:0.10
8:0.08
9:0.07
10:0.07
11:0.08
Negative Logits
implementation  -2.39
seminars        -2.37
promotion       -2.32
rollout         -2.30
monet           -2.28
ministries      -2.27
Advertising     -2.24
referees        -2.19
claimants       -2.16
implement       -2.11
Positive Logits
akable    2.94
ciating   2.63
acid      2.54
ascript   2.54
udder     2.46
ebted     2.44
eely      2.36
Brave     2.35
zx        2.33
Luckily   2.27
Activations Density 0.000%