Explanations
The neuron’s primary function is to detect occurrences of the word “host.”
Negative Logits
Token        Logit
清           -0.07
Coul         -0.07
falls        -0.07
glm          -0.06
Pearl        -0.06
preliminary  -0.06
Rum          -0.06
Linear       -0.06
fallen       -0.06
alu          -0.06
Positive Logits
Token        Logit
host         0.15
Host         0.15
host         0.14
Host         0.13
HOST         0.12
HOST         0.12
host         0.11
hosts        0.11
.Host        0.11
hosting      0.10
Activations Density 0.021%
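
As a rough illustration of how a density figure of this kind could be computed, the sketch below assumes that density means the fraction of sampled tokens on which the neuron's activation exceeds a threshold. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    neuron fires; the dashboard's exact definition may differ.
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical data: 10,000 sampled tokens, 2 of which activate the neuron.
sample = [0.0] * 9998 + [3.2, 1.7]
print(f"{activation_density(sample):.3%}")  # prints 0.020%
```

Under this assumption, a density of 0.021% would correspond to the neuron firing on roughly 2 of every 10,000 sampled tokens, which fits a neuron that responds to one specific word.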