INDEX
Explanations
phrases expressing skepticism or criticism of societal norms and practices
Negative Logits
ochen     -0.15
Succ      -0.15
zek       -0.14
styl      -0.14
zon       -0.14
ctors     -0.14
å°¾       -0.14
обÑĢаÐ    -0.14
.nlm      -0.13
itra      -0.13
Positive Logits
mere      0.17
alone     0.15
mere      0.15
èĢĮ       0.14
Hol       0.14
ingt      0.14
KF        0.14
nor       0.14
umont     0.14
anni      0.14
Activations Density 0.261%