Explanations
The neuron fires on descriptive, feature-oriented words, such as “elements,” “gameplay,” “style,” “atmosphere,” and similar nouns that describe attributes of films or games.
Negative Logits
mouseout       -0.07
uk             -0.06
Dept           -0.06
.one           -0.06
conte          -0.06
topLeft        -0.06
Sachs          -0.06
عبارت          -0.06
fortunately    -0.06
ques           -0.06
Positive Logits
\F             0.07
savory         0.07
anxious        0.06
�              0.06
ovy            0.06
نفر            0.06
hük            0.06
RTLU           0.06
_dbg           0.06
vå             0.06
Activations Density 0.017%
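
As a rough illustration of how a density figure like the one above could be read, the sketch below assumes that "activation density" means the percentage of sampled tokens on which the neuron's activation exceeds a threshold. That definition, the function name `activation_density`, the threshold of zero, and the sample data are all assumptions made for illustration; none of them come from the page itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: density is the share of sampled tokens on which the
    neuron is active. The source page does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 17 active tokens out of 100,000 sampled tokens.
sample = [1.0] * 17 + [0.0] * 99_983
density = activation_density(sample)
print(f"{density:.3f}%")  # prints "0.017%"
```

Under this reading, a density of 0.017% would correspond to the neuron being active on roughly 17 of every 100,000 sampled tokens.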