INDEX
Explanations
words related to negative emotions or sentiments like boredom, resentment, and distrust
words associated with boredom or lack of engagement
Negative Logits
Ĥİ        -0.81
esville   -0.80
utable    -0.77
utes      -0.70
zilla     -0.66
ortium    -0.65
nect      -0.65
asel      -0.65
esar      -0.65
uting     -0.63
Positive Logits
MpServer  0.87
skirts    0.80
tsky      0.74
igger     0.73
dit       0.71
phia      0.69
hole      0.69
APH       0.67
oxin      0.64
rums      0.64
Activations Density 0.051%