INDEX
Explanations
roman numerals
words related to literary works or novels
Negative Logits
Token       Logit
̶           -0.72
gypt        -0.70
Methodist   -0.65
Lever       -0.64
Wi          -0.63
Zombie      -0.61
visor       -0.61
Lund        -0.60
Camb        -0.60
Hyder       -0.60
Positive Logits
Token            Logit
urnal            0.90
onyms            0.79
theless          0.76
xious            0.74
TPPStreamerBot   0.72
agon             0.68
whatsoever       0.67
Else             0.67
ODY              0.65
seekers          0.64
Activations Density 0.044%