INDEX
Explanations
mentions of ponies
references to the "My Little Pony" franchise and its characters
Negative Logits
yer       -0.79
enance    -0.69
WINDOWS   -0.68
Las       -0.64
Judgment  -0.64
HER       -0.63
INST      -0.62
scape     -0.62
ENA       -0.61
Jul       -0.60
Positive Logits
pony      1.08
ponies    1.02
Pony      0.96
Sparkle   0.93
tail      0.85
atron     0.81
bley      0.80
tremend   0.78
hyde      0.77
suscept   0.75
Activations Density 0.015%