INDEX
Explanations
words associated with value judgments and assessments
Negative Logits
Token        Logit
º            -0.15
stein        -0.15
imat         -0.15
untas        -0.15
.selenium    -0.15
quet         -0.15
éĨ           -0.15
çģŃ          -0.15
iment        -0.15
wart         -0.14
Positive Logits
Token        Logit
495          0.17
oucher       0.16
ongoose      0.15
ango         0.15
rello        0.15
ENTA         0.14
_ARCH        0.14
PNG          0.14
libertine    0.14
ollywood     0.14
Activations Density: 0.001%