INDEX
Explanations
words related to public response and social issues
Negative Logits
uncio      -0.15
Interrupt  -0.14
ingleton   -0.13
omor       -0.13
yen        -0.13
numbered   -0.13
))*(       -0.13
unkt       -0.13
λι         -0.13
ơ          -0.12
Positive Logits
push        0.31
fur         0.28
resistance  0.26
push        0.25
hue         0.24
opp         0.24
vit         0.24
criticism   0.23
blow        0.23
calls       0.22
Activations Density 0.142%