Explanations
references to specific groups or individuals related to societal issues
Head Attr Weights
0:0.02
1:0.02
2:0.08
3:0.11
4:0.41
5:0.03
6:0.05
7:0.05
8:0.03
9:0.04
10:0.06
11:0.04
Negative Logits
rout         -1.84
ゴン         -1.57
Cheong       -1.52
insign       -1.44
ted          -1.42
consolation  -1.41
obin         -1.38
*.           -1.37
fixme        -1.34
Abyssal      -1.34
Positive Logits
ographers  1.87
esters     1.87
ammers     1.75
ographer   1.72
writers    1.68
ists       1.67
rafted     1.66
achers     1.65
nai        1.61
elected    1.61
Activations Density: 0.020%