Explanations
expressions of disappointment or lack of engagement with media
Negative Logits
ンズ      -0.15
šli      -0.13
iteur    -0.13
foy      -0.13
hev      -0.13
aira     -0.13
zimmer   -0.13
sworth   -0.13
zie      -0.13
ฺ        -0.13
Positive Logits
åł       0.14
711      0.14
aven     0.13
Edwin    0.13
cient    0.13
лер      0.13
fol      0.13
ffen     0.13
diği     0.13
omor     0.12
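Dashboards of this kind commonly build the Negative Logits and Positive Logits lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the k lowest and k highest entries. The sketch below assumes that procedure (it is not confirmed by this page). All names (`logit_effects`, `top_and_bottom`, `decoder_dir`, `W_U`) are hypothetical and are shown with toy data. It uses plain Python lists so it needs no dependencies.

```python
def logit_effects(decoder_dir, W_U):
    """Hypothetical: direct effect of one feature on every vocabulary logit.

    decoder_dir: list of d_model floats (the feature's decoder row).
    W_U: d_model rows, each a list of d_vocab floats (assumed unembedding).
    Returns a list of d_vocab floats, i.e. decoder_dir @ W_U.
    """
    d_model, d_vocab = len(decoder_dir), len(W_U[0])
    return [sum(decoder_dir[j] * W_U[j][t] for j in range(d_model))
            for t in range(d_vocab)]


def top_and_bottom(effects, vocab, k):
    """Return the k most negative and k most positive (token, effect) pairs."""
    order = sorted(range(len(effects)), key=lambda i: effects[i])
    negative = [(vocab[i], effects[i]) for i in order[:k]]
    # Highest effect first, matching how the positive list is usually shown.
    positive = [(vocab[i], effects[i]) for i in reversed(order[-k:])]
    return negative, positive


# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
effects = logit_effects([1.0, 0.0], [[0.1, -0.2, 0.3, 0.0],
                                     [5.0, 5.0, 5.0, 5.0]])
neg, pos = top_and_bottom(effects, ["a", "b", "c", "d"], k=2)
print(neg)  # [('b', -0.2), ('d', 0.0)]
print(pos)  # [('c', 0.3), ('a', 0.1)]
```

Real dashboards may also fold in the final layer norm or drop special tokens. Those details are outside what this page states.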
Activation Density: 0.305%
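Activation density is usually read as the share of token positions on which the feature fires. A density of 0.305% means roughly 3 activating tokens per 1,000. The minimal sketch below assumes "fires" means an activation strictly above a threshold of 0 over some sample of tokens. The function name and threshold are illustrative assumptions, not this dashboard's documented method.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical: fraction of token positions whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# 3 firing positions out of 1,000 gives a density of 0.3%.
sample = [0.0] * 997 + [1.2, 0.5, 0.1]
print(f"{activation_density(sample):.3%}")
```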