INDEX
    Explanations

    Avoid colons descriptions in brackets

    Negative Logits
    Token                 Logit
    "................"    -0.08
    " vikt"               -0.07
    " نمود"               -0.07
    "º"                   -0.07
    " supers"             -0.07
    "ANO"                 -0.07
    (blank)               -0.07
    "arek"                -0.07
    "БО"                  -0.07
    (blank)               -0.07
    Positive Logits
    Token                 Logit
    " özel"               0.08
    " alcohol"            0.07
    " bathrooms"          0.07
    (blank)               0.07
    "につ"                0.07
    " briefing"           0.07
    " keb"                0.07
    "ce"                  0.07
    (blank)               0.07
    "歌词"                0.07
    Activation Density 0.005%

    No Known Activations