INDEX
Explanations
expressions related to sociopolitical opinions and discourse
Contradictory or disagreeable statements
I do not want
Negative Logits
autorytatywna    -0.77
հղումներ         -0.70
rungsseite       -0.67
rrggbb           -0.66
Picchu           -0.65
Попис            -0.65
Autoritní        -0.65
bcryptjs         -0.64
нгред            -0.62
propOrder        -0.62
Positive Logits
I          0.38
           0.35
blind      0.35
who        0.31
blinded    0.30
racist     0.29
this       0.29
his        0.28
comment    0.28
em         0.28
Activations Density 0.510%