INDEX
Explanations
No Explanations Found
Negative Logits
slab     -0.63
LM       -0.62
Īè       -0.62
cp       -0.61
premise  -0.60
thw      -0.59
baker    -0.59
outing   -0.58
chim     -0.58
Penguin  -0.58
Positive Logits
cone     0.77
ait      0.75
aments   0.75
isco     0.73
upe      0.73
icter    0.72
ivities  0.71
Ur       0.70
lie      0.70
eret     0.69
Activations Density: 0.000%
No Known Activations
This feature has no known activations.