    Explanations

    past tense verbs

    Negative Logits
     insistence     -0.07
    кур             -0.06
     manslaughter   -0.06
    とい              -0.06
     proper         -0.06
    -metal          -0.06
    існо            -0.06
                    -0.06
     البته          -0.06
    elleicht        -0.06
    Positive Logits
    vented          0.07
    ôle             0.07
     socket         0.07
     группы         0.07
    _Local          0.06
     explained      0.06
    unched          0.06
     pushed         0.06
    φέ              0.06
    ierte           0.06
    Act Density 0.014%

    No Known Activations