INDEX
Explanations
phrases indicating action or change in context
Head Attr Weights
0:0.02
1:0.01
2:0.08
3:0.08
4:0.21
5:0.03
6:0.03
7:0.26
8:0.03
9:0.05
10:0.09
11:0.05
Negative Logits
BuyableInstoreAndOnline  -1.45
quartered                -1.44
currently                -1.44
disabled                 -1.43
aligned                  -1.43
eligible                 -1.40
neutral                  -1.40
placed                   -1.39
irable                   -1.39
icted                    -1.39
Positive Logits
herd          1.42
reins         1.33
wedge         1.31
youthful      1.27
defenses      1.25
cynicism      1.21
sarc          1.19
partisans     1.19
expectations  1.18
defences      1.18
Activations Density 0.163%