    NEGATIVE LOGITS
    "Fully"      -0.08
    "Iso"        -0.07
    "lod"        -0.07
    ".im"        -0.07
    " менее"     -0.06
    " wishlist"  -0.06
    " india"     -0.06
    "Secret"     -0.06
    "ila"        -0.06
    " UF"        -0.06
    POSITIVE LOGITS
    " drowned"   0.12
    " drowning"  0.12
    " drown"     0.11
    "ROWN"       0.08
    " Brown"     0.06
    " Hans"      0.06
    "Plain"      0.06
    "خدام"       0.06
    " baptized"  0.06
    "ーデ"       0.06
    Activation density: 0.001%

    No known activations.