INDEX
Explanations
The neuron activates on occurrences of the word “casino” (in its various tokenized forms).
Negative Logits
equipment    -0.08
extend       -0.07
blue         -0.07
Thumb        -0.06
OLED         -0.06
extends      -0.06
nerd         -0.06
、大          -0.06
ful          -0.06
bent         -0.06
Positive Logits
Casino       0.10
casino       0.08
carnival     0.08
INO          0.07
casual       0.07
ysi          0.07
Casinos      0.07
"?           0.07
Cave         0.07
ково         0.07
Activations Density 0.002%