INDEX
Explanations
mentions of user interface components
Head Attr Weights
0:0.07
1:0.07
2:0.08
3:0.09
4:0.08
5:0.08
6:0.09
7:0.08
8:0.08
9:0.09
10:0.06
11:0.08
Negative Logits
unpop         -2.87
Samoa         -2.75
moratorium    -2.62
Ã             -2.58
––            -2.58
Angola        -2.53
governors     -2.47
shortages     -2.43
……………………      -2.43
�             -2.42
Positive Logits
Towards       2.62
hyde          2.59
DonaldTrump   2.59
oward         2.58
afort         2.58
hedon         2.58
Hate          2.49
Dangerous     2.49
Dock          2.46
Digest        2.44
Activations Density 0.000%