INDEX
Explanations
mentions of political ideologies and leaders
Negative Logits
eret          -0.67
hett          -0.56
hiba          -0.56
Downloadha    -0.55
afety         -0.55
ecause        -0.55
jri           -0.54
atche         -0.54
ogs           -0.54
accompan      -0.54
Positive Logits
Centauri      0.68
autical       0.66
omaly         0.65
wered         0.62
Ü             0.59
axis          0.57
Catalyst      0.57
âĶĢâĶĢâĶĢâĶĢ  0.57
enment        0.53
ALLY          0.52
Activations Density: 6.150%