INDEX
Explanations
No Explanations Found
Negative Logits
Token       Value
undo        -0.84
ilo         -0.76
cum         -0.74
catentry    -0.70
fect        -0.69
estyles     -0.67
gypt        -0.67
chery       -0.66
cill        -0.66
ong         -0.65
Positive Logits
Token       Value
ulic        0.71
Mechdragon  0.63
OPA         0.63
Attacks     0.62
theless     0.61
incumbent   0.61
Shogun      0.61
exerted     0.59
HB          0.59
Pats        0.59
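Lists like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the k most negative and k most positive entries. This page does not state how its values were computed, so the sketch below is an assumption. The helper names `logit_effects` and `top_and_bottom`, along with the toy decoder row, unembedding matrix, and five-token vocabulary, are hypothetical and for illustration only.

```python
# Minimal sketch (assumption): logit effect of one feature = decoder_row @ unembed.
# All names and numbers here are hypothetical toy values, not taken from this page.

def logit_effects(decoder_row, unembed, vocab):
    """Return (token, value) pairs for decoder_row @ unembed, one per vocab entry."""
    effects = []
    for j, token in enumerate(vocab):
        # Dot product of the decoder direction with column j of the unembedding.
        value = sum(decoder_row[i] * unembed[i][j] for i in range(len(decoder_row)))
        effects.append((token, value))
    return effects


def top_and_bottom(effects, k):
    """Return the k most positive and k most negative (token, value) pairs."""
    ranked = sorted(effects, key=lambda pair: pair[1])
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom


# Hypothetical toy model: d_model = 3, vocabulary of 5 tokens.
decoder_row = [1.0, -0.5, 0.0]
unembed = [
    [0.2, -0.1, 0.5, 0.0, -0.3],
    [0.4, 0.2, -0.2, 0.1, 0.6],
    [0.9, 0.9, 0.9, 0.9, 0.9],
]
vocab = ["a", "b", "c", "d", "e"]

top, bottom = top_and_bottom(logit_effects(decoder_row, unembed, vocab), k=2)
print("Positive:", top)
print("Negative:", bottom)
```

In a real model the unembedding would have tens of thousands of columns, so a vectorized matrix product would replace the Python loop. The ranking step stays the same.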
Activation Density: 0.000%
No Known Activations
This feature has no known activations.
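Activation density is usually read as the fraction of sampled token positions on which the feature's activation is above zero. Under that assumption (the page does not define the metric), a density of 0.000% agrees with the "no known activations" note: the feature never fired on the sampled text. The sketch below uses a hypothetical helper name, `activation_density`, and made-up activation values.

```python
# Minimal sketch (assumption): density = fraction of positions with activation > 0.
# The helper name and the sample values below are hypothetical.

def activation_density(activations):
    """Fraction of sampled token positions where the feature's activation is > 0."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > 0)
    return fired / len(activations)


never_fires = [0.0, 0.0, 0.0, 0.0]
sometimes_fires = [0.0, 1.2, 0.0, 3.4]

print(f"{activation_density(never_fires):.3%}")      # formatted like the dashboard value
print(f"{activation_density(sometimes_fires):.3%}")
```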