Explanations
This neuron detects occurrences of the word “fun.”
Negative Logits
Token    Logit
очі      -0.07
зрост    -0.07
loyd     -0.07
serir    -0.07
zie      -0.07
BILL     -0.07
woord    -0.07
heavy    -0.07
CTR      -0.06
overd    -0.06
Positive Logits
Token    Logit
fun      0.17
Fun      0.16
Fun      0.11
fun      0.10
FUN      0.09
FUN      0.09
grátis   0.08
fun      0.07
Fut      0.07
나       0.07
Activations Density: 0.017%