    Negative Logits
    Token            Logit
    " multiline"     -0.06
    " води"          -0.06
    " microsoft"     -0.06
    " Finder"        -0.06
    " Twitch"        -0.06
    "ubern"          -0.06
    " sublicense"    -0.06
    "-users"         -0.06
    "houette"        -0.06
                     -0.06
    Positive Logits
    Token            Logit
    "/f"             0.08
    " obrov"         0.07
    " fark"          0.07
    " dag"           0.07
    "\Exceptions"    0.06
                     0.06
    " arom"          0.06
    "ヴィ"           0.06
    " Dak"           0.06
    "ακ"             0.06
    Activation Density: 0.003%

    No Known Activations