INDEX
Explanations
adjectives used in hierarchical or confrontational settings
references to political and ideological affiliations
Negative Logits
vana       -0.83
ndra       -0.78
akeru      -0.75
nyder      -0.70
*/(        -0.70
aceae      -0.68
inki       -0.68
agascar    -0.67
gger       -0.64
ocating    -0.63
Positive Logits
INAL       0.79
-+-+       0.78
Shogun     0.71
Ĩ          0.68
rontal     0.67
Kod        0.65
Ö¼         0.64
Wad        0.64
Bottom     0.64
Rect       0.63
Activations Density: 0.376%