Explanations
The neuron activates on quantifier and scope words that indicate broad or approximate coverage, e.g. “world,” “nearly,” “any,” “mix.”
Negative Logits
SHR         -0.07
ฟ           -0.07
rehe        -0.06
softmax     -0.06
deposited   -0.06
Mature      -0.06
련          -0.06
Golf        -0.06
/create     -0.06
mem         -0.06
Positive Logits
puedes      0.07
come        0.06
convert     0.06
NK          0.06
konk        0.06
íc          0.06
кредит      0.06
você        0.06
elő         0.06
鎮          0.06
Activation Density: 0.145%
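
For readers unfamiliar with the density figure, the following is a minimal sketch of one common way such a number is computed: the fraction of token positions on which the neuron's activation exceeds a threshold. The function name `activation_density`, the threshold of zero, and the toy data are all assumptions made for illustration; the page does not state how its value was actually derived.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the neuron fires.
    The source page does not define the term, so treat this as illustrative.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical toy data: 2000 token positions, 3 of which have a nonzero activation.
toy_activations = [0.0] * 1997 + [0.8, 1.2, 0.3]
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Under this reading, a density of 0.145% would mean the neuron fires on roughly one or two tokens out of every thousand, which fits a neuron described as responding to a narrow set of words.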