INDEX
Explanations
phrases indicating actions or decisions in political contexts
Head Attr Weights
0:0.07
1:0.03
2:0.05
3:0.04
4:0.05
5:0.04
6:0.28
7:0.04
8:0.04
9:0.25
10:0.03
11:0.03
Negative Logits
Tan: -4.35
Titanic: -4.21
ahime: -4.00
Tud: -3.92
anu: -3.78
Cind: -3.73
Tan: -3.63
Dia: -3.59
BuyableInstoreAndOnline: -3.56
Ashton: -3.53
Positive Logits
Gro: 10.83
Gro: 10.17
gro: 8.62
gro: 6.51
Grove: 5.76
grocer: 4.58
GR: 4.55
RO: 4.50
GR: 4.39
groove: 4.38
Activations Density: 0.005%