INDEX
    Explanations

    social interactions

    Negative Logits
     fos        -0.06
    wner        -0.06
    	Output     -0.06
    _details    -0.06
    _then       -0.06
    enarios     -0.06
    /assert     -0.06
    ictim       -0.06
    ,error      -0.06
     "'");↵     -0.06
    Positive Logits
     پیچ        0.07
    WORD        0.07
    animal      0.07
     stone      0.07
    animals     0.06
     berry      0.06
    oren        0.06
    apl         0.06
     onchange   0.06
     quantum    0.06
    Act Density 0.000%

    No Known Activations