INDEX
Explanations
the main thing this neuron does is find past-tense verbs.
New Auto-Interp
Negative Logits
ával
-0.07
Monitoring
-0.06
números
-0.06
قرارد
-0.06
Credits
-0.06
Makes
-0.06
необходим
-0.06
méně
-0.06
Jane
-0.06
تمر
-0.06
POSITIVE LOGITS
reject
0.06
Flexible
0.06
)↵↵
0.06
силу
0.06
_compress
0.06
#↵↵
0.06
scri
0.06
DD
0.06
�
0.06
disagree
0.06
Activations Density 0.044%