INDEX
Explanations
expressions related to societal and political issues
Negative Logits
ikel          -0.17
amacare       -0.17
usters        -0.16
ーチ          -0.15
erson         -0.15
137           -0.15
afone         -0.14
personally    -0.14
strup         -0.14
hazi          -0.14
Positive Logits
-wide         0.24
itself        0.22
wide          0.20
collectively  0.19
wide          0.17
Brace         0.16
collective    0.16
기관          0.16
braces        0.16
consensus     0.16
Activations Density: 0.146%