INDEX
    Explanations
    Negative Logits
    Token         Logit
    "skin"        -0.07
    ".tree"       -0.06
    " knight"     -0.06
    " Men"        -0.06
    " MEN"        -0.06
    ".signal"     -0.06
    "SEN"         -0.06
    "issa"        -0.06
    " بخشی"       -0.06
    " McCain"     -0.06
    Positive Logits
    Token          Logit
    " elaborate"   0.15
    " elabor"      0.12
    " melod"       0.08
                   0.08
    " δημο"        0.08
    " Elvis"       0.07
    " Lego"        0.07
    "RO"           0.07
    "la"           0.07
    " arab"        0.07
    Activation Density: 0.003%

    No Known Activations