INDEX
Explanations
comparative phrases that highlight differences between entities or concepts
Head Attr Weights
0:0.02
1:0.10
2:0.06
3:0.02
4:0.01
5:0.03
6:0.07
7:0.07
8:0.15
9:0.27
10:0.07
11:0.07
Negative Logits
shoots         -1.16
lies           -1.15
basis          -1.14
}}             -1.13
Bruins         -1.13
trainer        -1.10
uncertainties  -1.09
foundation     -1.06
spoke          -1.04
Italian        -1.04
Positive Logits
perty          1.76
SPONSORED      1.57
outer          1.41
emis           1.39
ubis           1.32
jriwal         1.30
icket          1.27
ottest         1.26
indal          1.26
oute           1.24
Activations Density 0.066%