INDEX
Explanations
words related to negative actions or emotions
complex themes related to human emotions and societal issues
Negative Logits
Token          Logit
arnaev         -0.72
eatures        -0.68
bag            -0.63
version        -0.61
Achievements   -0.57
uscript        -0.56
SAY            -0.55
Tracks         -0.55
Cases          -0.54
LOT            -0.54
Positive Logits
Token          Logit
lessness       1.20
fulness        1.07
iness          0.94
liness         0.92
thood          0.91
ulence         0.85
smanship       0.83
ism            0.80
ality          0.79
ishment        0.79
Activations Density: 0.412%