INDEX
Explanations
references to political parties and their activities
Negative Logits
worsh    -0.15
quet     -0.14
Client   -0.14
ä¸ģ      -0.14
δÏĮ      -0.14
ovic     -0.14
æ²       -0.13
acher    -0.13
Batch    -0.13
urch     -0.13
Positive Logits
party    0.45
Party    0.41
party    0.39
Party    0.37
PARTY    0.36
_party   0.33
.party   0.32
åħļ      0.31
-party   0.29
黨       0.27
Activations Density 0.002%