Explanations
This neuron fires on named entities, especially film titles, actor and character names, and other proper nouns.
Negative Logits
Token        Logit
_pw          -0.06
吃           -0.06
�            -0.06
_VISIBLE     -0.06
Store        -0.06
damages      -0.06
.Priority    -0.06
nuclear      -0.06
등록         -0.06
medio        -0.06
Positive Logits
Token        Logit
situation    0.06
legally      0.06
books        0.06
parad        0.06
纸           0.06
ινή          0.06
ským         0.06
>alert       0.06
BEEN         0.06
ανδ          0.06
Activations Density: 0.015%