INDEX
Explanations
references to a character named Sam in various contexts
Negative Logits
geois     -0.17
unner     -0.16
eking     -0.16
eken      -0.15
#__       -0.15
emer      -0.15
eyn       -0.14
evin      -0.14
ebi       -0.14
esion     -0.14

Positive Logits
uel       0.34
son       0.33
uels      0.32
urai      0.30
plings    0.27
plers     0.27
uele      0.26
UEL       0.25
SON       0.23
oa        0.23
Activations Density: 0.016%