Explanations
Words that refer to specific letters of the alphabet, especially 'G' and its variants
Negative Logits
Token       Logit
tk          -0.21
ен          -0.20
ett         -0.20
rid         -0.19
ui          -0.19
et          -0.17
ettle       -0.17
uh          -0.17
ame         -0.17
NU          -0.17
Positive Logits
Token       Logit
opher       0.22
rooms       0.18
rafted      0.18
localized   0.18
libc        0.18
nosis       0.18
ener        0.18
KD          0.17
azing       0.17
urus        0.17
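The page does not say how these logit lists are produced. A common approach in sparse-autoencoder dashboards, assumed here, is to project the feature's decoder direction onto the unembedding matrix (one dot product per vocabulary token) and report the k most negative and most positive results. Below is a minimal Python sketch of that assumption. `logit_effects` and `top_and_bottom_tokens` are hypothetical helpers, and the toy inputs are illustrative, not real model weights.

```python
# Hypothetical sketch: rank vocabulary tokens by a feature's direct logit effect.
# Assumption: each token's effect is the dot product of the feature's decoder
# direction with that token's unembedding vector (not stated on this page).

def logit_effects(decoder_dir, unembed):
    """unembed: one row per vocabulary token, each of length d_model."""
    return [sum(d * u for d, u in zip(decoder_dir, row)) for row in unembed]


def top_and_bottom_tokens(vocab, effects, k=10):
    """Return (bottom_k, top_k) lists of (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    bottom = ranked[:k]          # most negative effects, most negative first
    top = ranked[::-1][:k]       # most positive effects, most positive first
    return bottom, top


# Toy usage with made-up numbers shaped like the tables above.
vocab = ["opher", "tk", "rooms", "ен"]
effects = [0.22, -0.21, 0.18, -0.20]
bottom, top = top_and_bottom_tokens(vocab, effects, k=2)
print(bottom)  # [('tk', -0.21), ('ен', -0.2)]
print(top)     # [('opher', 0.22), ('rooms', 0.18)]
```

Sorting once and slicing both ends keeps the two lists consistent with each other. For a full vocabulary, a partial selection such as `heapq.nsmallest`/`heapq.nlargest` would avoid sorting every token.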
Activation density: 0.162%
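Activation density is presumably the share of tokens in some evaluation sample on which the feature fires. The page does not state the sample or the firing threshold. The minimal sketch below assumes "fires" means an activation strictly above zero. `activation_density` is a hypothetical helper.

```python
# Hypothetical sketch: activation density as the percentage of tokens on which
# a feature is active. Assumption: "active" means activation > threshold (0.0).

def activation_density(activations, threshold=0.0):
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy usage: one active token out of 1,000 gives a density of 0.1%.
sample = [0.0] * 999 + [1.3]
print(activation_density(sample))
```

Under this reading, a density of 0.162% means the feature fires on roughly 1.6 of every 1,000 tokens, which fits the narrow, letter-specific explanation above.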