Explanations
mentions of the color purple
references to the color purple
Negative Logits
anooga      -0.88
iary        -0.87
rina        -0.83
inez        -0.83
elong       -0.83
tern        -0.82
rium        -0.78
eller       -0.78
emouth      -0.77
icles       -0.77
Positive Logits
veyard       0.74
cles         0.72
orescence    0.69
Afgh         0.69
ppelin       0.69
ACTED        0.68
itialized    0.68
prime        0.65
SHIP         0.64
naire        0.64
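The two ranked lists can be reproduced with a short sketch. It assumes, which is a common convention for learned-feature dashboards but is not stated on this page, that each value is the dot product of the feature's output direction with one token's unembedding vector, and that the lists are simply the most negative and most positive of those values. The names `logit_effects`, `top_logit_effects`, `direction` and `unembed` are hypothetical, and the toy numbers stand in for real model weights.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    direction: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> Dict[str, float]:
    """Dot the feature's output direction with each token's unembedding column.

    Hypothetical layout: `unembed` has one row per model dimension and one
    column per vocabulary token, so unembed[i][j] is dimension i of token j.
    """
    return {
        token: sum(direction[i] * unembed[i][j] for i in range(len(direction)))
        for j, token in enumerate(vocab)
    }


def top_logit_effects(
    effects: Dict[str, float], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1])  # ascending by value
    most_negative = ranked[:k]
    most_positive = ranked[::-1][:k]  # descending by value
    return most_negative, most_positive


# Usage with a few values copied from the lists above.
effects = {"anooga": -0.88, "iary": -0.87, "veyard": 0.74, "cles": 0.72, "prime": 0.65}
neg, pos = top_logit_effects(effects, k=2)
print(neg)  # [('anooga', -0.88), ('iary', -0.87)]
print(pos)  # [('veyard', 0.74), ('cles', 0.72)]
```

If the vocabulary has fewer than 2k tokens, the two lists overlap. A real dashboard would rank the full vocabulary, which makes that impossible in practice.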
Activation Density: 0.050%
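A density of 0.050% means the feature is active on roughly 1 token in 2,000 (0.050% = 0.0005 = 1/2000). The sketch below shows the calculation. It assumes density is the percentage of tokens in some evaluation sample whose activation exceeds zero, but the page states neither the threshold nor the sample. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens on which the feature fires above `threshold`.

    Assumption: "active" means strictly greater than the threshold (default 0).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# One firing token out of 2,000 gives the density reported above.
acts = [0.0] * 1999 + [3.2]
print(f"{activation_density(acts):.3f}%")  # prints 0.050%
```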