Explanations
No Explanations Found
Negative Logits
Token         Logit
izoph         -0.75
flies         -0.73
pads          -0.67
livest        -0.67
colourful     -0.65
ophe          -0.65
butterflies   -0.63
artwork       -0.62
laure         -0.62
ellery        -0.62
Positive Logits
Token         Logit
yet           0.91
yet           0.84
Hack          0.76
Yet           0.75
ãĤ°           0.74
Critical      0.73
Xi            0.72
bound         0.71
MENTS         0.70
Shift         0.68
Activations Density: 0.000%
No Known Activations
This feature has no known activations.