Explanations
mentions of different political parties and organizations
mentions of entities or groups that are abbreviated as "AN"
Negative Logits
lda        -0.79
ãĤ¡        -0.76
Cind       -0.76
ï¸ı        -0.72
odore      -0.68
Wink       -0.68
uably      -0.68
ments      -0.67
Kem        -0.64
guiIcon    -0.64
Positive Logits
alyst      1.05
igans      1.01
OVA        0.97
ufact      0.95
NER        0.94
NING       0.93
thood      0.93
ARCH       0.91
pour       0.89
TRY        0.89
Activations Density 0.017%
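
The page does not define how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled tokens on which the feature's activation exceeds zero; the function name `activation_density`, the `threshold` parameter, and the sample sizes are hypothetical and used only for illustration:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires at all. The source page does not state its definition.
    """
    if len(activations) == 0:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Illustrative only: a feature firing on 17 of 100,000 tokens
# has a density of 0.00017, which displays as 0.017%.
sample = [1.0] * 17 + [0.0] * (100_000 - 17)
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that assumption, a density this low would mean the feature is silent on the vast majority of tokens.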