Explanations
other remaining
The neuron reliably fires on the word “other,” flagging mentions of “other” categories or items.
Negative Logits
Token        Logit
Hop          -0.07
volunteer    -0.07
ising        -0.07
Poison       -0.06
Woman        -0.06
[level       -0.06
USART        -0.06
safety       -0.06
Governor     -0.06
recipe       -0.06
Positive Logits
Token        Logit
zboží        0.07
exas         0.07
с            0.06
’é           0.06
дина         0.06
bilinen      0.06
contours     0.06
ngr          0.06
شرقی         0.06
برنامه       0.06
0.06
Activations Density 0.017%
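
The density figure above can be read as the share of token positions on which the neuron is active. The sketch below illustrates that reading only. It assumes "active" means an activation strictly above a threshold of 0, which the page does not state. The function name `activation_density` and the sample data are hypothetical and are not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: a position counts as "active" when its activation is strictly
    greater than `threshold`. The dashboard does not say which cutoff it uses.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up example data: 10,000 token positions, 2 of them active.
sample = [0.0] * 9998 + [1.3, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.020%"
```

Under this reading, a density of 0.017% would correspond to roughly 17 active positions per 100,000 tokens.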