INDEX
Explanations
phrases indicating temporal events and specific occurrences
Head Attr Weights
0:0.01
1:0.01
2:0.10
3:0.06
4:0.14
5:0.02
6:0.04
7:0.36
8:0.03
9:0.03
10:0.07
11:0.06
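The weights above can be read programmatically. The following is a minimal sketch, assuming each line is a `head:weight` pair; the names `raw` and `parse_head_weights` are hypothetical and not part of any dashboard API. It finds the head with the largest attribution weight and the sum of the listed (rounded) weights.

```python
# Head attribution weights copied from the listing above.
# Assumption: each line has the form "head_index:weight".
raw = """0:0.01
1:0.01
2:0.10
3:0.06
4:0.14
5:0.02
6:0.04
7:0.36
8:0.03
9:0.03
10:0.07
11:0.06"""


def parse_head_weights(text):
    """Parse 'head:weight' lines into a dict mapping head index to weight."""
    weights = {}
    for line in text.splitlines():
        head, value = line.split(":")
        weights[int(head)] = float(value)
    return weights


weights = parse_head_weights(raw)
top_head = max(weights, key=weights.get)  # head with the largest weight
total = sum(weights.values())             # sum of the displayed weights

print(top_head, weights[top_head], round(total, 2))
```

With the values listed, head 7 carries the largest weight (0.36). The displayed weights sum to 0.93 rather than 1, which may simply reflect rounding to two decimals.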
Negative Logits
administ:-1.71
ealous:-1.64
lled:-1.56
lawy:-1.56
staking:-1.54
persuaded:-1.52
oath:-1.48
abst:-1.48
administ:-1.48
topic:-1.46
Positive Logits
Thumbnails:1.75
Vision:1.60
opia:1.58
Ther:1.57
Refugees:1.55
Exhibit:1.52
Crate:1.52
Glass:1.44
Pulse:1.42
Rot:1.41
Activations Density 0.016%