INDEX
Explanations
instances of passive voice construction or mentions of admiration
Negative Logits
apg      -0.17
REAM     -0.16
incinn   -0.16
orman    -0.16
reat     -0.14
reater   -0.14
ohon     -0.14
odom     -0.14
iane     -0.14
amarin   -0.14
Positive Logits
kaz      0.16
kö       0.15
нка      0.14
ers      0.14
Blank    0.14
bio      0.14
laden    0.14
uku      0.14
iar      0.14
iy       0.13
Activations Density 0.051%