Explanations
The neuron activates on mentions of “prize” or “prizes” (and related discussion of prize‐based rewards or competitions).
Negative Logits
Token       Logit
XS          -0.07
("")↵       -0.07
PLL         -0.06
todos       -0.06
hart        -0.06
Haram       -0.06
Beats       -0.06
приз        -0.06
контра      -0.06
하세요      -0.06
Positive Logits
Token       Logit
.Nav        0.07
size        0.07
scp         0.06
mime        0.06
replic      0.06
satur       0.06
cartesian   0.06
Wit         0.06
ublic       0.06
rồi         0.06
Activations Density 0.038%
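The density figure can be read as the share of sampled tokens on which the neuron fires. The sketch below is a minimal illustration, assuming density means the fraction of tokens whose activation is above zero. The function name `activation_density`, the zero threshold, and the sample values are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    neuron fires at all; the dashboard may use a different definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: 10,000 tokens, 4 of which activate the neuron.
sample = [0.0] * 9996 + [1.2, 0.4, 3.1, 0.9]
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low would fit a neuron that responds to a narrow set of tokens, such as the "prize" mentions described in the explanation.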