    Negative Logits
    Token               Logit
    " TestCase"         -0.07
    " vertical"         -0.07
    " jealous"          -0.07
    "accepted"          -0.06
    " Uzbek"            -0.06
    "стра"              -0.06
    " geopolitical"     -0.06
    " bubbles"          -0.06
    " largely"          -0.06
    "STRUCTIONS"        -0.06
    Positive Logits
    Token               Logit
    "fty"               0.10
    "infinity"          0.08
    " Infinity"         0.07
    " plea"             0.07
    " infinity"         0.07
    "inf"               0.07
    "Infinity"          0.06
    "�ん"               0.06
    "resher"            0.06
    " контролю"         0.06
    Activation Density: 0.002%

    No Known Activations