INDEX
Explanations
adjectives describing extremity or intensity
Negative Logits
os       -0.17
iolet    -0.17
ory      -0.17
aje      -0.15
ager     -0.15
044      -0.15
AGER     -0.15
uffle    -0.15
LEASE    -0.14
zon      -0.14
Positive Logits
eyse     0.19
sworth   0.17
veis     0.16
isay     0.15
esis     0.15
duto     0.14
psz      0.14
cko      0.14
ghan     0.14
ieder    0.14
Activations Density 0.001%