INDEX
    Explanations
    Negative Logits
    Token                Logit
    " Infos"             -0.08
    " uomo"              -0.07
    " homosex"           -0.06
    " людям"             -0.06
    " TEXT"              -0.06
    "getObject"          -0.06
    "ondere"             -0.06
    "orientation"        -0.06
    "-strokes"           -0.06
    " دارم"              -0.06

    Positive Logits
    Token                Logit
    " repeal"             0.13
    " repealed"           0.10
    " dismant"            0.08
    " deline"             0.07
    " dismantle"          0.07
    "ané"                 0.07
    "-------------</"     0.07
    " debunk"             0.06
    "mong"                0.06
    "makta"               0.06

    Activation Density: 0.001%

    No Known Activations