Explanations
This neuron detects mentions of movie theaters and cinema-related terms (e.g., “cinema,” “theater,” and the names of movie-theater chains).
Negative Logits
-max         -0.07
904          -0.07
Dreams       -0.06
drink        -0.06
Samurai      -0.06
strongly     -0.06
่ว           -0.06
isChecked    -0.06
Flame        -0.06
infield      -0.06
Positive Logits
&↵           0.07
getattr      0.07
.axes        0.06
诊           0.06
าศ           0.06
ослав        0.06
アル         0.06
Ngb          0.06
...",↵       0.06
]?.          0.06
Activations Density 0.020%
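
To give a sense of scale for the density figure, here is a minimal sketch that assumes density means the percentage of tokens on which the neuron's activation is above zero. That definition, the function name activation_density, and the sample data are all assumptions for illustration and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" is read here as the share of active tokens;
    the dashboard does not state its exact definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: the neuron fires on 2 tokens out of 10,000.
acts = [0.0] * 9998 + [1.3, 0.7]
print(f"{activation_density(acts):.3f}%")
```

Under this reading, a density of 0.020% corresponds to roughly 2 active tokens per 10,000, which would fit a neuron that responds only to a narrow topic.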