INDEX
Explanations
This neuron activates strongly on phrases that describe a category with a superlative, such as “most common”.
Negative Logits
Papers
-0.07
painting
-0.07
160
-0.07
ables
-0.06
/Runtime
-0.06
�
-0.06
village
-0.06
Removal
-0.06
Sessions
-0.06
ledger
-0.06
Positive Logits
Estimated
0.06
("{}
0.06
ported
0.06
},{↵
0.06
буд
0.06
зн
0.06
('$
0.06
.ERR
0.05
ंपर
0.05
.ST
0.05
Activations Density 0.058%