INDEX
Explanations
numerical values embedded within words
occurrences of the substring "od"
Negative Logits
Ago       -0.74
riott     -0.66
Citadel   -0.64
silence   -0.64
PAL       -0.63
Quinn     -0.63
Scand     -0.62
Ń·        -0.61
Falcon    -0.61
Crest     -0.60
Positive Logits
od        1.34
odon      1.21
yssey     1.13
iamond    1.10
amn       1.08
sworth    1.06
wig       1.05
opter     1.04
unn       0.99
ata       0.99
Activations Density: 0.009%
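
The second explanation above (occurrences of the substring "od") can be loosely compared against the positive-logit tokens in this listing. The sketch below is only a rough sanity check: the token list is copied from this page, the helper name `tokens_containing` is hypothetical, and positive logits describe which output tokens the feature promotes, which is not necessarily the same as what activates it.

```python
# Top positive-logit tokens, copied from the listing on this page.
positive_tokens = [
    "od", "odon", "yssey", "iamond", "amn",
    "sworth", "wig", "opter", "unn", "ata",
]


def tokens_containing(tokens, substring):
    """Return the tokens that contain `substring` (hypothetical helper)."""
    return [t for t in tokens if substring in t]


matches = tokens_containing(positive_tokens, "od")
print(matches)  # ['od', 'odon']
print(f"{len(matches)} of {len(positive_tokens)} tokens contain 'od'")
```

Only two of the ten listed tokens contain "od" literally, so the explanation appears to describe the feature's inputs more than the tokens it promotes.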