Explanations
proper nouns relating to famous people or specific scenarios
Negative Logits
Token       Logit
pim         -0.60
REDACTED    -0.59
notch       -0.55
Metatron    -0.54
reconc      -0.54
ACTED       -0.53
tumble      -0.53
Spoiler     -0.52
clown       -0.52
WB          -0.52
Positive Logits
Token       Logit
enei        0.80
ilian       0.78
chel        0.77
eli         0.77
ili         0.76
abyte       0.76
eto         0.76
esh         0.75
emon        0.74
iber        0.74
Activations Density 0.908%