Explanations
The neuron selectively activates on occurrences of the word “where.”
Negative Logits
Token       Logit
-period     -0.07
.Add        -0.06
Late        -0.06
اکی         -0.06
-volume     -0.06
(job        -0.06
яд          -0.06
unit        -0.06
желуд       -0.06
料無料      -0.06
Positive Logits
Token        Logit
where        0.11
donde        0.08
hvor         0.07
Westminster  0.07
where        0.07
liament      0.06
protestors   0.06
steam        0.06
emplates     0.06
WHERE        0.06
Activations Density: 0.037%
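
As a rough illustration of what a density figure like this summarizes, the sketch below computes the fraction of tokens on which a neuron is active. It assumes that "active" means an activation strictly above zero, which this page does not state. The function name `activation_density`, the `threshold` parameter, and the toy activations are hypothetical and are not taken from this neuron's data.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is strictly
    greater than `threshold` (zero by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (not from the page): 10,000 tokens, 4 of which activate the neuron.
toy = [0.0] * 9996 + [0.8, 1.2, 0.3, 2.1]
density = activation_density(toy)
print(f"{density:.3%}")  # prints 0.040%
```

Under this reading, a density of 0.037% would correspond to the neuron firing on roughly 37 out of every 100,000 tokens, which fits a neuron tied to a single word.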