INDEX
Explanations
words related to historical events, political discussions, and policy activities, with a focus on specific details and narratives
phrases indicating contrast or exceptions in contexts
Head Attr Weights
0:0.23
1:0.03
2:0.06
3:0.13
4:0.03
5:0.10
6:0.07
7:0.02
8:0.06
9:0.11
10:0.08
11:0.02
Negative Logits
Blend: -1.14
Yep: -1.12
dear: -1.08
Vampire: -1.08
Pick: -1.04
Wiz: -1.03
Yep: -1.02
Mats: -1.02
guard: -1.00
Picks: -1.00
Positive Logits
xual: 1.38
agara: 1.34
olson: 1.33
glim: 1.28
ihara: 1.25
acknowled: 1.24
anecd: 1.24
glers: 1.17
lihood: 1.12
ulton: 1.12
Activations Density 0.117%