INDEX
Explanations
possibility
The neuron fires on numeric measurement tokens (e.g. values with units or technical specs).
Negative Logits
aku      -0.07
easy     -0.07
rack     -0.06
grace    -0.06
runner   -0.06
arts     -0.06
book     -0.06
says     -0.06
Thanks   -0.06
-week    -0.06
Positive Logits
potentially    0.09
anyl           0.08
possibly       0.08
�              0.07
hypothetical   0.07
entlich        0.07
baskı          0.07
ほ             0.07
اص             0.07
(($            0.07
Activations Density 0.012%
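
The page does not say how the density figure is computed. As a rough sketch, assuming it is the share of sampled tokens on which the neuron's activation is above zero, it could be reproduced as follows. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of sampled tokens on which the
    neuron is active. The source page does not confirm this definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 3 active tokens out of 25,000 sampled tokens.
sample = [0.0] * 24_997 + [0.4, 1.2, 0.05]
density = activation_density(sample)
print(f"Activations Density {density:.3f}%")
```

Under that definition, a density of 0.012% corresponds to roughly 12 active tokens per 100,000 sampled.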