Explanations
words related to criticism or negative judgment
terms that convey negative judgments or criticisms
Negative Logits
vals           -0.71
chance         -0.68
quart          -0.65
semble         -0.65
rollers        -0.63
interrupted    -0.62
eon            -0.61
ngth           -0.61
Cups           -0.60
asio           -0.60
Positive Logits
underest       0.88
enough         0.85
underestimate  0.79
overest        0.74
hypocr         0.73
because        0.72
uably          0.71
grounds        0.71
hypocrisy      0.70
exagger        0.70
Activations Density 0.165%