Explanations
words indicating creation or production
Negative Logits
GIN      -0.17
puted    -0.16
же       -0.16
ugin     -0.15
inkle    -0.14
ayın     -0.14
wyn      -0.14
rome     -0.14
ateÅŁ    -0.14
rado     -0.14
Positive Logits
it       0.20
ALLED    0.14
#ad      0.14
zew      0.14
sense    0.13
enberg   0.13
iert     0.13
riad     0.13
amura    0.13
ossa     0.13
Activations Density 0.067%