Explanations
No Explanations Found
Negative Logits
Orig        -0.68
Mah         -0.68
ownt        -0.68
ieth        -0.68
Moe         -0.67
resc        -0.64
ablishment  -0.64
Yus         -0.61
reneg       -0.60
olor        -0.59
Positive Logits
aptic       0.93
desktop     0.71
shore       0.69
squ         0.68
given       0.67
fitting     0.67
ateurs      0.67
IED         0.65
batch       0.65
gotten      0.63
Activations Density 0.000%
No Known Activations
This feature has no known activations.