    Negative Logits
     ува
    -0.06
     jednotlivých
    -0.06
     aggregated
    -0.06
    nač
    -0.06
    emplates
    -0.06
     Commons
    -0.06
     weekends
    -0.06
    abilia
    -0.06
     autre
    -0.06
    poč
    -0.06
    Positive Logits
    _DH
    0.07
     militia
    0.06
    0.06
     girl
    0.06
     çeşitli
    0.06
    _YELLOW
    0.06
     mineral
    0.06
    policy
    0.06
     missionary
    0.06
    λία
    0.06
    Act Density 0.001%

    No Known Activations