Explanations
This neuron activates on the word “love” (especially in statements expressing or asking about love).
Negative Logits
耗  -0.07
орож  -0.07
orning  -0.06
|;↵  -0.06
ationship  -0.06
sharing  -0.06
induce  -0.06
disbelief  -0.06
fiction  -0.06
compact  -0.06
Positive Logits
EUR  0.07
strav  0.07
Love  0.06
ecedor  0.06
ε  0.06
รอง  0.06
\Object  0.06
etleri  0.06
aken  0.06
İstanbul  0.06
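Lists like the two above are usually produced by projecting a neuron's output weights through the model's unembedding matrix and keeping the most positive and most negative entries. The sketch below assumes that the neuron adds `activation * w_out` to the residual stream and ignores the final LayerNorm. The helper names `logit_effects` and `top_logits` are hypothetical, and the weights are toy numbers, not this model's.

```python
from typing import List, Sequence, Tuple


# Hypothetical helper: the direct effect of one neuron on every output logit,
# assuming the neuron writes `activation * w_out` into the residual stream and
# ignoring the final LayerNorm.
def logit_effects(w_out: Sequence[float], W_U: Sequence[Sequence[float]]) -> List[float]:
    """Return w_out @ W_U, where W_U has shape (d_model, vocab_size)."""
    if len(w_out) != len(W_U):
        raise ValueError("w_out and W_U must share the d_model dimension")
    vocab_size = len(W_U[0])
    return [
        sum(w_out[d] * W_U[d][t] for d in range(len(w_out)))
        for t in range(vocab_size)
    ]


def top_logits(
    effects: Sequence[float], tokens: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most-promoted and the k most-suppressed tokens."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative effect first
    positive = ranked[::-1][:k]  # most positive effect first
    return positive, negative


if __name__ == "__main__":
    # Toy numbers (not real model weights): d_model = 2, four-token vocabulary.
    tokens = ["Love", "EUR", "sharing", "fiction"]
    w_out = [1.0, -0.5]
    W_U = [
        [0.05, 0.04, -0.02, 0.01],
        [-0.02, -0.06, 0.08, 0.10],
    ]
    effects = logit_effects(w_out, W_U)
    positive, negative = top_logits(effects, tokens, k=2)
    print("Positive:", [(t, round(v, 2)) for t, v in positive])
    print("Negative:", [(t, round(v, 2)) for t, v in negative])
```

Because this is a direct-path measure, a token can appear in these lists without being semantically related to the neuron's activating contexts. That may explain why only "Love" among the positive logits matches the explanation.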
Activation Density: 0.035%
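An activation density of 0.035% means the neuron fires on roughly 35 of every 100,000 tokens. A minimal sketch of the calculation follows. It assumes that "firing" means an activation above a threshold of 0.0; the page does not state its exact rule, and the name `activation_density` is hypothetical.

```python
from typing import Sequence


# Assumption: a token position counts as "active" when the neuron's activation
# exceeds `threshold` (0.0 by default); the dashboard's exact rule is not stated.
def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the neuron's activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Synthetic data: the neuron fires at 35 positions out of 100,000 tokens.
    acts = [0.0] * 100_000
    for i in range(0, 35 * 1000, 1000):
        acts[i] = 1.0
    density = activation_density(acts)
    print(f"{density:.3%}")  # prints 0.035%
```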