INDEX
Explanations
phrases related to political figures and establishments
references to political affiliations and social structures within communities
Negative Logits
acly    -0.67
ãĥ¼ãĥĨ    -0.66
ãĤ¶    -0.62
Bul    -0.62
Pg    -0.60
ãĤ¯    -0.59
ãĤ¦ãĤ¹    -0.59
Incre    -0.59
é¾įå    -0.59
ãĤ¤    -0.57
Positive Logits
rejoice    0.99
recognise    0.88
unite    0.88
agrees    0.83
dared    0.78
reacted    0.77
accuse    0.76
teamed    0.76
raided    0.76
allege    0.75
Activations Density: 0.467%