INDEX
    Explanations
    Negative Logits
    " jeopard"        -0.08
    " overweight"     -0.08
    "-threat"         -0.08
    " banned"         -0.08
    " BMI"            -0.08
    " ban"            -0.08
    " Ban"            -0.08
    " threatens"      -0.07
    " wat"            -0.07
    " stereotypes"    -0.07
    Positive Logits
    "guid"            0.09
    " Μέ"             0.09
    " Guid"           0.09
    ".Guid"           0.09
    (missing)         0.08
    " માર્ગ"           0.08
    " യൂണ"            0.08
    " μον"            0.08
    " guid"           0.08
    " Κ"              0.08
    Act Density 0.018%

    No Known Activations