INDEX
Explanations
This neuron activates on the word “terms,” especially when it appears in the phrase “in terms of.”
Negative Logits
Face          -0.07
NumberOf      -0.07
deaths        -0.06
/T            -0.06
"L            -0.06
(round        -0.06
Interaction   -0.06
"S            -0.06
_WORD         -0.06
Meat          -0.06
Positive Logits
cede          0.07
acock         0.07
elaide        0.07
пері          0.07
plá           0.06
slog          0.06
ecess         0.06
(ident        0.06
الكه          0.06
スレ          0.06
Activations Density 0.030%
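
As a rough illustration of how a density figure like the one above could be derived, the sketch below assumes that "density" means the fraction of token positions at which the neuron's activation exceeds zero. That definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all assumptions made for illustration; none of them come from the source page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density is defined as (active positions) / (all positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the neuron fires on 3 of 10,000 token positions.
sample = [0.0] * 9997 + [1.2, 0.4, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.030%
```

Under this assumed definition, a density of 0.030% corresponds to roughly 3 active positions per 10,000 tokens, which fits a neuron tied to one specific word.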