INDEX
Explanations
references to Hitler, specifically in a comparative context
Head Attr Weights
0:0.05
1:0.05
2:0.16
3:0.08
4:0.12
5:0.06
6:0.05
7:0.05
8:0.11
9:0.07
10:0.08
11:0.08
Negative Logits
jri          -1.64
Offensive    -1.63
Conspiracy   -1.52
POLIT        -1.47
rocked       -1.45
Alam         -1.40
lis          -1.39
Consortium   -1.37
Pix          -1.36
Solid        -1.35
Positive Logits
vernight     1.79
places       1.68
Introduced   1.64
prints       1.57
elsh         1.50
bars         1.49
rely         1.46
meter        1.45
gencies      1.41
valued       1.41
Activations Density: 0.000%