Explanations
the word "Perry" with varying activations
mentions of the name "Perry."
Negative Logits
Token               Logit
Carbuncle           -0.75
udic                -0.74
76561               -0.69
ravings             -0.67
azeera              -0.67
hent                -0.67
################    -0.65
atorial             -0.64
########            -0.62
Malays              -0.62
Positive Logits
Token               Logit
sburg                0.93
man                  0.83
ball                 0.81
ns                   0.80
shire                0.75
burgh                0.74
mons                 0.73
bard                 0.73
balls                0.70
utical               0.70
Activations Density: 0.012%