INDEX
Explanations
No Explanations Found
Negative Logits
orno        -0.74
elsen       -0.72
artney      -0.71
istries     -0.71
ousand      -0.69
irez        -0.65
cleanup     -0.64
volunteer   -0.64
amnesty     -0.63
asions      -0.61
Positive Logits
scar         0.79
-+-+         0.73
bid          0.69
Sph          0.68
morrow       0.66
scrib        0.65
Oswald       0.65
vest         0.65
condem       0.65
Fraz         0.64
Activation Density: 0.000%
No Known Activations
This feature has no known activations.