Explanations
The neuron fires on occurrences of the word “immobile,” i.e. it detects mentions of immobility.
Negative Logits
Token     Logit
Feng      -0.07
提示      -0.07
Sug       -0.07
cribed    -0.07
chalk     -0.07
sudo      -0.07
cous      -0.07
twist     -0.06
candid    -0.06
Best      -0.06
Positive Logits
Token     Logit
Imm       0.10
Imm       0.10
immobil   0.10
imm       0.09
immersed  0.08
IMM       0.08
immense   0.07
ime       0.07
imm       0.07
imary     0.07
Activations Density 0.013%
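
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed, assuming it means the fraction of token positions on which the neuron's activation is above zero. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from this page or from any particular tool.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the neuron is active.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the neuron is active on 13 of 100,000 token positions.
sample = [1.0] * 13 + [0.0] * 99_987
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.013%
```

Under this reading, a density of 0.013% would correspond to the neuron being active on roughly 13 of every 100,000 tokens.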