INDEX
Explanations
phrases and contexts related to surprise and unexpected events
Head Attribution Weights
0:0.02
1:0.01
2:0.07
3:0.05
4:0.16
5:0.03
6:0.05
7:0.35
8:0.03
9:0.04
10:0.06
11:0.07
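
As a minimal sketch (not part of the dashboard itself), the head weights listed above can be loaded into a mapping to rank the heads and to check how much of the total the displayed values account for. The variable names are hypothetical, and the values are copied as displayed, so they are rounded to two decimals.

```python
# Head weights copied from the list above (head index -> weight).
# Values are as displayed on the page, i.e. rounded to two decimals.
head_attr_weights = {
    0: 0.02, 1: 0.01, 2: 0.07, 3: 0.05, 4: 0.16, 5: 0.03,
    6: 0.05, 7: 0.35, 8: 0.03, 9: 0.04, 10: 0.06, 11: 0.07,
}

# Head with the largest weight.
top_head = max(head_attr_weights, key=head_attr_weights.get)

# Sum of the displayed weights; because of rounding it need not equal exactly 1.
total = sum(head_attr_weights.values())

# Heads sorted from largest to smallest weight.
ranked = sorted(head_attr_weights.items(), key=lambda kv: kv[1], reverse=True)

print(f"top head: {top_head} (weight {head_attr_weights[top_head]:.2f})")
print(f"sum of displayed weights: {total:.2f}")
print("top three:", ranked[:3])
```

With the values shown, head 7 has the largest weight (0.35), followed by head 4 (0.16).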
Negative Logits
agall              -1.66
inventoryQuantity  -1.55
spir               -1.49
oneliness          -1.45
rompt              -1.42
sidx               -1.41
onomous            -1.40
VOL                -1.39
taboola            -1.37
acebook            -1.35
Positive Logits
buffs              1.40
Lumin              1.36
computation        1.35
guests             1.34
Beng               1.30
Gö                 1.30
HH                 1.29
forthcoming        1.28
Rocks              1.27
Benson             1.22
Activations Density 0.002%