Explanations
The neuron activates on mentions of close personal relations—specifically “friends” and “family.”
Negative Logits
Token            Logit
Investigators    -0.06
розпов           -0.06
';';             -0.06
економ           -0.06
istinguished     -0.06
degree           -0.06
ゆ               -0.06
تق               -0.06
aturally         -0.06
appropriately    -0.06
Positive Logits
Token        Logit
нужно        0.07
harga        0.07
essenger     0.06
return       0.06
она          0.06
aku          0.06
�재          0.06
_DOWN        0.06
ona          0.06
el           0.06
Activations Density 0.006%
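
To make the density figure concrete, here is a minimal sketch of how such a number could be computed. It assumes that "activation density" means the fraction of sampled tokens on which the neuron fires above a threshold. The page itself does not state its definition, sample size, or threshold, so the function name `activation_density`, the zero threshold, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density = (# tokens where the neuron fires) / (# tokens sampled).
    The dashboard's exact definition is not given in the source.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 6 firing tokens out of 100,000 sampled tokens.
sample = [0.0] * 99_994 + [1.2, 0.4, 3.1, 0.9, 2.2, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.006% would correspond to roughly 6 firing tokens per 100,000, which fits a neuron that responds only to a narrow set of words such as "friends" and "family."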