Explanations
Uncommon words
This neuron fires on descriptive appearance modifiers: words that characterize surface quality or visual style (e.g., glimmering, film-like, handsome).
Negative Logits
Token       Logit
chewing     -0.07
kans        -0.07
900         -0.06
personas    -0.06
untary      -0.06
iou         -0.06
mak         -0.06
hog         -0.06
ucci        -0.06
ang         -0.06
Positive Logits
Token          Logit
`='$           0.08
getUsername    0.07
"?>↵           0.06
trieve         0.06
_Request       0.06
MEP            0.06
SOEVER         0.06
.Flat          0.06
مذ             0.06
irresist       0.06
Activations Density: 0.588%