INDEX
    Explanations
    Negative Logits
    Token               Logit
    " Uber"             -0.07
    " пораж"            -0.06
    " П"                -0.06
    " Bard"             -0.06
    "Wifi"              -0.06
    " lowering"         -0.06
    " Loft"             -0.06
    " Healthcare"       -0.06
    "Wow"               -0.06
    " VIP"              -0.06
    Positive Logits
    Token               Logit
    " pronunciation"    0.07
    "_lib"              0.06
    (no token shown)    0.06
    "(condition"        0.06
    "errar"             0.06
    "_hist"             0.06
    " debugging"        0.06
    ")set"              0.06
    "racial"            0.06
    " horse"            0.06
    Activation Density: 0.015%

    No Known Activations