    Explanations

    This neuron activates on the word “love” (especially in statements expressing or asking about love).

    Negative Logits
    -0.07
    орож
    -0.07
    orning
    -0.06
    |;↵
    -0.06
    ationship
    -0.06
     sharing
    -0.06
     induce
    -0.06
     disbelief
    -0.06
    fiction
    -0.06
     compact
    -0.06
    Positive Logits
    EUR
    0.07
     strav
    0.07
     Love
    0.06
    ecedor
    0.06
    ε
    0.06
    รอง
    0.06
    \Object
    0.06
    etleri
    0.06
    aken
    0.06
     İstanbul
    0.06
    Activation Density 0.035%

    No Known Activations