INDEX
Explanations
references to North Korea
Head Attribution Weights
Head 0: 0.08
Head 1: 0.08
Head 2: 0.08
Head 3: 0.09
Head 4: 0.09
Head 5: 0.08
Head 6: 0.08
Head 7: 0.08
Head 8: 0.07
Head 9: 0.08
Head 10: 0.08
Head 11: 0.06
Negative Logits
dwarves        -3.17
["             -2.92
performers     -2.80
contestants    -2.79
entrants       -2.76
casters        -2.71
strikers       -2.65
comparable     -2.65
Ghostbusters   -2.65
portrayal      -2.58
Positive Logits
iece            3.55
apeake          3.16
grain           3.03
dow             2.94
rils            2.94
enges           2.92
hum             2.87
ignt            2.81
ruce            2.79
dn              2.78
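Dashboards of this kind commonly build the positive and negative logit lists by projecting a feature's decoder direction through the model's unembedding matrix and keeping the largest and smallest entries. The sketch below assumes that procedure. It is not taken from this page, and the helper `top_logits`, the toy vocabulary, and the toy matrices are all hypothetical.

```python
from typing import List, Sequence, Tuple

def top_logits(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],  # assumed shape: [d_model][vocab_size]
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: (most positive, most negative) logits for one feature.

    Assumes logit[v] = sum_i decoder_dir[i] * unembed[i][v].
    """
    d_model = len(decoder_dir)
    if len(unembed) != d_model:
        raise ValueError("unembed must have one row per model dimension")
    # Project the feature direction onto every vocabulary token.
    logits = [
        sum(decoder_dir[i] * unembed[i][v] for i in range(d_model))
        for v in range(len(vocab))
    ]
    pairs = list(zip(vocab, logits))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative

# Toy example with made-up tokens and weights (d_model = 2, vocab_size = 4).
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
decoder_dir = [1.0, -0.5]
unembed = [
    [2.0, -1.0, 1.5, -2.0],
    [-1.0, 2.0, 0.0, 1.0],
]
pos, neg = top_logits(decoder_dir, unembed, vocab, k=2)
print(pos)  # [('tok_a', 2.5), ('tok_c', 1.5)]
print(neg)  # [('tok_d', -2.5), ('tok_b', -2.0)]
```

In a real model the same projection would run over the full vocabulary, typically with a matrix library rather than nested Python loops.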
Activation Density: 0.000%
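Suppose activation density means the percentage of sampled tokens on which the feature's activation is above zero. That definition is an assumption, not something this page states. Under it, a figure shown to three decimals as 0.000% means fewer than about 0.0005% of sampled tokens activated the feature. The hypothetical helper `activation_density` below computes the figure under that assumed definition.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total

# One firing token out of a million sampled tokens (made-up numbers).
acts = [0.0] * 999_999 + [3.2]
density = activation_density(acts)
print(f"{density:.3f}%")  # prints "0.000%"
```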