INDEX
Explanations
words related to classification or categorization
words related to declarations or classifications of political entities
Negative Logits
κ         -0.79
awaru     -0.75
è£ı       -0.69
tremend   -0.69
Democr    -0.67
ALD       -0.65
BIP       -0.65
BALL      -0.63
sell      -0.63
DEM       -0.62
Positive Logits
osures    1.10
osing     1.03
ength     0.99
avier     0.98
oser      0.97
ipper     0.96
uster     0.92
othes     0.92
ojure     0.89
iff       0.89
Activations Density: 0.021%