Explanations
names and mentions of political figures, particularly those associated with significant events or discussions
Negative Logits
onth      -0.18
ãĤ¥       -0.15
inspace   -0.15
ters      -0.14
byt       -0.14
landers   -0.14
ality     -0.14
anela     -0.14
cut       -0.13
çľł       -0.13
Positive Logits
uzzer     0.17
åºľ       0.16
Та        0.15
.lift     0.14
Ø         0.14
;/        0.14
dob       0.13
Wid       0.13
ouncer    0.13
igua      0.13
Activations Density 0.003%