Explanations
This neuron activates specifically on the word “latter” when it is used to refer back to a preceding item.
Negative Logits
_manage    -0.08
GP    -0.07
.Do    -0.07
analyzes    -0.07
ูนย    -0.07
tgl    -0.07
UGE    -0.07
Ana    -0.07
bags    -0.07
لیگ    -0.07
Positive Logits
latter    0.13
atter    0.07
بوده    0.07
typo    0.07
Τ    0.07
Reaper    0.07
der    0.06
Latter    0.06
GLUT    0.06
згод    0.06
Activations Density 0.006%