INDEX
Explanations
proper nouns related to famous personalities, places, or entities
occurrences of the substring "ze" in various contexts
Negative Logits
glim        -0.83
ials        -0.77
ially       -0.74
paced       -0.70
ainment     -0.69
istrate     -0.67
fman        -0.67
ancial      -0.66
cutting     -0.66
selling     -0.64
Positive Logits
lda         1.32
ppelin      1.26
zinski      0.99
ppe         0.92
zza         0.90
itsch       0.88
uble        0.87
ppy         0.87
hov         0.86
eker        0.86
Activations Density: 0.017%