INDEX
Explanations
Data tables
The neuron fires specifically on Russian section-header words for pros and cons (e.g. «преимущества», "advantages", and «недостатки», "disadvantages").
Negative Logits
벨  -0.07
ties  -0.06
帮  -0.06
[missing]  -0.06
princ  -0.06
_hand  -0.06
_tem  -0.06
Wilson  -0.06
colabor  -0.06
다음  -0.06
Positive Logits
bestos  0.07
hư  0.07
энерг  0.06
paj  0.06
freshmen  0.06
_record  0.06
كرة  0.06
secretary  0.06
(cart  0.06
Kurum  0.06
Activations Density: 0.032%
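
For readers unfamiliar with the density figure, the following is a minimal sketch of how such a value could be computed, assuming that density means the fraction of sampled tokens on which the neuron's activation exceeds a threshold. That definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all illustrative assumptions and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the neuron is
    active at all. The dashboard may use a different threshold or sample.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 tokens, 32 of which activate the neuron.
sample = [0.0] * 99_968 + [1.5] * 32
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.032%"
```

Under this assumption, a density of 0.032% corresponds to roughly 32 active tokens per 100,000, which would be consistent with a neuron that responds only to a narrow class of words.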