INDEX
    Explanations
    Negative Logits
    "emb"          -0.07
    "виж"          -0.07
    "ений"         -0.07
    " підготов"    -0.07
    " monstr"      -0.07
    " Making"      -0.07
    (blank)        -0.07
    "wil"          -0.07
    (blank)        -0.07
    "νό"           -0.07
    Positive Logits
    ";k"           0.06
    "*b"           0.06
    "Armor"        0.06
    "TimeZone"     0.06
    "Category"     0.06
    " Gala"        0.06
    " Sev"         0.06
    "_region"      0.06
    " insult"      0.06
    " whatsapp"    0.05
    Activation Density: 0.001%

    No Known Activations