Explanations
paradise
The neuron activates on occurrences of the word “Paradise” (i.e. the token sequence spelling out “Paradise”).
Negative Logits
elekt         -0.07
姉            -0.07
_ke           -0.06
вважа         -0.06
ugador        -0.06
bow           -0.06
ecs           -0.06
electrodes    -0.06
_Control      -0.06
thro          -0.06
Positive Logits
Paradise      0.14
paradise      0.12
Eden          0.09
oasis         0.09
haven         0.09
Oasis         0.08
istence       0.08
공지          0.07
abee          0.07
ره            0.07
Activations Density 0.007%
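
The page does not say how the density figure above is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the neuron's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and only illustrate the arithmetic:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the neuron
    fires at all. The source page does not confirm this definition.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical toy data: the neuron fires on 7 of 100,000 token positions.
toy = [0.0] * 99_993 + [1.5] * 7
density = activation_density(toy)
print(f"{density:.3%}")  # prints 0.007%
```

Under this reading, a density of 0.007% corresponds to roughly 7 firing positions per 100,000 tokens, which fits a neuron tied to a single uncommon word.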