Explanations
references to nationalism and related ideologies
Negative Logits
Token         Logit
Oracle        -0.68
OME           -0.65
neau          -0.65
EVA           -0.64
ĸļ            -0.63
lift          -0.62
nesota        -0.61
Thumbnails    -0.61
err           -0.60
Tree          -0.60
Positive Logits
Token         Logit
sentiments    0.74
ferv          0.70
pride         0.69
sentiment     0.63
nationalist   0.63
rhetoric      0.61
popul         0.61
nationalism   0.61
vis           0.59
zeal          0.59
Activations Density: 8.502%