INDEX
Explanations
No Explanations Found
Negative Logits
agonist    -0.89
dullah     -0.73
jriwal     -0.70
amous      -0.67
taker      -0.65
ayson      -0.64
gat        -0.63
gger       -0.63
din        -0.63
culosis    -0.62
Positive Logits
Decoder    0.68
merce      0.65
OND        0.64
FANTASY    0.64
MAT        0.61
COURT      0.61
Represent  0.61
verages    0.61
FORM       0.61
Designs    0.61
Activations Density 0.000%
No Known Activations
This feature has no known activations.