INDEX
Explanations
complex interactions between people and societal issues
Negative Logits
Token     Logit
奪        -0.16
ά         -0.15
pios      -0.15
itorio    -0.15
Kill      -0.14
ubu       -0.14
seedu     -0.14
yla       -0.14
itori     -0.14
860       -0.14
Positive Logits
Token     Logit
被        0.42
遭        0.37
被        0.36
being     0.35
受到      0.33
bị        0.32
受        0.31
zosta     0.30
being     0.29
receive   0.28
Activations Density 0.277%
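
Dashboards like this one often display token strings in byte-level BPE form, so a token such as 受 can appear as `åıĹ` and bị as `bá»ĭ`. The sketch below shows one way to map such display strings back to readable text. It assumes the GPT-2 style byte-to-character table, which the source does not confirm; the function names `bytes_to_unicode` and `decode_token` are chosen here for illustration.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Build a GPT-2 style table mapping each byte value to one printable character.

    Assumption: the dashboard's tokenizer uses this byte-level convention.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    # Printable bytes map to themselves.
    table = {b: chr(b) for b in printable}
    # Remaining bytes are shifted into the range starting at U+0100, in byte order.
    shift = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + shift)
            shift += 1
    return table


# Inverse table: display character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(display: str) -> str:
    """Turn a byte-level display string back into UTF-8 text.

    Strings containing characters outside the table (for example, tokens that
    are already decoded) are returned unchanged.
    """
    if any(ch not in CHAR_TO_BYTE for ch in display):
        return display
    raw = bytes(CHAR_TO_BYTE[ch] for ch in display)
    # Incomplete byte sequences are shown with the Unicode replacement character.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for shown in ["bá»ĭ", "åıĹ", "éģŃ", "being"]:
        print(shown, "->", decode_token(shown))
```

Tokens that are plain ASCII pass through unchanged, so the same function can be applied to every row of a logit list without special-casing.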