INDEX
Explanations
words related to emotions and feelings expressed in literary contexts
Negative Logits
u           -0.44
at          -0.35
ar          -0.33
w           -0.33
on          -0.32
f           -0.29
ÙĪ          -0.29
k           -0.28
l           -0.28
id          -0.28
Positive Logits
eel         0.21
æĺ¯æĪij      0.18
edor        0.16
ity         0.16
otherwise   0.16
eriod       0.15
neau        0.15
erin        0.15
oise        0.15
OTHERWISE   0.15
Activations Density 0.549%