INDEX
Explanations
references to political achievements and their perceived significance
Head Attr Weights
0:0.03
1:0.01
2:0.11
3:0.27
4:0.11
5:0.04
6:0.05
7:0.06
8:0.06
9:0.05
10:0.08
11:0.09
Negative Logits
????????    -1.69
indo        -1.66
)!          -1.65
)?          -1.64
YOUR        -1.63
;)          -1.63
eh          -1.60
luaj        -1.56
goddamn     -1.54
王          -1.48
Positive Logits
outright    1.36
ailable     1.35
excluding   1.35
occupancy   1.33
overt       1.31
atories     1.29
verages     1.28
excluding   1.28
auna        1.28
rouch       1.28
Activations Density: 0.036%