Explanations
No Explanations Found
Negative Logits
neighb       -0.72
bowl         -0.69
thur         -0.69
estones      -0.67
confir       -0.65
flask        -0.65
feather      -0.65
iah          -0.64
helicop      -0.64
skelet       -0.63
Positive Logits
ļéĨĴ         0.86
osterone     0.85
oute         0.84
ption        0.78
Constructed  0.77
othal        0.67
ptive        0.67
MAD          0.66
Madness      0.66
ulator       0.65
Activations Density 0.000%
No Known Activations
This feature has no known activations.