INDEX
Explanations
words related to emotional states and attitudes, particularly those that convey unexpectedness or contrasts
Negative Logits
raq            -0.17
prise          -0.15
izzo           -0.14
enet           -0.14
sufficiently   -0.14
ascar          -0.14
oq             -0.14
ностью         -0.14
tuk            -0.14
vailability    -0.13
Positive Logits
sounding       0.48
-looking       0.42
looking        0.33
looking        0.32
-fe            0.32
seeming        0.30
-se            0.29
istic          0.27
ish            0.27
ly             0.27
Activations Density 0.170%