    Negative Logits
    Token             Logit
    " surprisingly"   -0.07
    " Lowest"         -0.07
    " Toyota"         -0.06
    " benz"           -0.06
    " довольно"       -0.06
    " tinder"         -0.06
    "uft"             -0.06
    " sistem"         -0.06
    "mium"            -0.06
    " Joi"            -0.06
    Positive Logits
    Token             Logit
    " rawData"        0.06
    " casualty"       0.06
    "比赛"            0.06
    "URRED"           0.06
    " dark"           0.06
    "_INITIALIZ"      0.06
    " الات"           0.06
    "virt"            0.06
    "990"             0.06
    "्दर"             0.06
    Activation Density: 0.001%

    No Known Activations