INDEX
    Explanations
    Negative Logits
     touched        -0.06
     pots           -0.06
     PRICE          -0.06
     Gio            -0.06
     compensate     -0.05
     близько        -0.05
    breadcrumbs     -0.05
    ffffff          -0.05
     richest        -0.05
    Ctx             -0.05
    Positive Logits
     warfare        0.37
     Warfare        0.31
    fare            0.16
     борь           0.08
     toilet         0.07
     yemek          0.07
                    0.07
    ยนตร            0.07
     sexuales       0.07
     wear           0.06
    Act Density 0.001%

    No Known Activations