    NEGATIVE LOGITS
     پاد         -0.07
    itat         -0.07
    STRUCTION    -0.06
    цій          -0.06
    ­ing          -0.06
    ститу        -0.06
    Dam          -0.06
    \model       -0.06
     lingu       -0.06
    road         -0.06
POSITIVE LOGITS
     anime        0.14
    Anime         0.12
     Anime        0.12
     manga        0.08
    "strings      0.07
     če           0.07
    anime         0.07
     Manga        0.07
     zprac        0.06
     literature   0.06
    Activation Density: 0.003%

    No Known Activations