Explanations
sentences related to political and social issues
phrases that express resistance or refusal to cooperate with authority
Negative Logits
Token           Logit
sequently       -0.69
oche            -0.67
inguished       -0.65
nown            -0.63
respectively    -0.63
including       -0.62
aran            -0.59
staking         -0.59
çͰ              -0.59
yssey           -0.58
Positive Logits
Token           Logit
crappy          0.97
shitty          0.92
crap            0.91
stuff           0.84
somebody        0.78
shit            0.74
dudes           0.74
oneself         0.73
passively       0.72
mediocre        0.72
Activations Density 1.451%