Explanations
No Explanations Found
Negative Logits
Token    Logit
ascus    -0.77
anamo    -0.74
irez     -0.72
aston    -0.70
etus     -0.70
bane     -0.69
ars      -0.65
abwe     -0.65
amaz     -0.63
atin     -0.63
Positive Logits
Token    Logit
DERR     0.67
gamer    0.65
hyde     0.63
llah     0.62
hiro     0.61
lapse    0.60
RAFT     0.58
pree     0.58
memory   0.58
Gupta    0.57
Activations Density: 0.000%
This feature has no known activations.