Explanations
references to religious themes and censorship issues
Head Attr Weights
0:0.23
1:0.06
2:0.04
3:0.09
4:0.08
5:0.02
6:0.10
7:0.11
8:0.03
9:0.04
10:0.09
11:0.06
Negative Logits
Bowser       -3.76
ovych        -3.57
Snake        -3.41
Sounders     -3.39
Seattle      -3.30
Snake        -3.23
Mario        -3.22
streetcar    -3.20
Leafs        -3.02
Mario        -2.99
Positive Logits
blasphemy    8.14
blasp        6.79
hemy         5.04
clerics      4.44
lyn          4.41
Pakistan     4.28
Koran        4.23
Quran        4.09
Muslims      4.06
Prophet      4.06
Activations Density: 0.002%