Explanations
expressions related to criticism and political discourse
Negative Logits
etch      -0.16
bie       -0.16
знаÑĩ     -0.15
ẹn        -0.15
indi      -0.15
punt      -0.14
ussy      -0.14
293       -0.14
rolley    -0.14
mand      -0.13
Positive Logits
nexus     0.18
discrim   0.16
unc       0.16
æŃ        0.15
ftime     0.15
.hh       0.15
_UL       0.15
isko      0.14
embr      0.14
VIP       0.14
Activations Density: 0.412%