Explanations
References to social and economic policies, especially those that involve wealth distribution, and to political constructs that influence public perception.
Negative Logits
.springboot    -0.16
.zh            -0.15
doz            -0.14
roz            -0.13
Cumhurbaş      -0.13
kvinnor        -0.13
ossal          -0.13
idak           -0.13
urst           -0.13
素             -0.13
Positive Logits
mas            0.27
via            0.25
via            0.23
through        0.22
aim            0.21
masked         0.21
by             0.21
abet           0.20
clo            0.20
aimed          0.20
Activations Density 0.358%