INDEX
Explanations
terms related to criticism or negative sentiment towards a particular situation or action
Negative Logits
eers      -0.77
Corpus    -0.73
eer       -0.69
ISM       -0.67
SHIP      -0.67
yip       -0.66
Nun       -0.65
eering    -0.65
Pose      -0.64
law       -0.63
Positive Logits
agons     1.17
inking    1.16
ags       1.12
unk       1.11
unks      1.10
udge      1.10
ifts      1.10
inks      1.09
iller     1.07
aining    1.05
Activations Density: 0.072%