INDEX
Explanations
statements about the importance of moral and physical character
Negative Logits
famously    -0.17
duk         -0.15
uci         -0.15
uur         -0.15
favor       -0.15
IColor      -0.14
abay        -0.14
à¤ľà¤¯      -0.14
basically   -0.14
icari       -0.14

Positive Logits
addict      0.18
intr        0.18
fancy       0.18
essay       0.17
shrink      0.17
consent     0.16
sacr        0.15
hourly      0.15
docs        0.15
sha         0.15
Activations Density: 0.289%