    Explanations

    Sentence endings

    Negative Logits
    Token                 Logit
    " deserve"            -0.08
    " waarschijnlijk"     -0.08
    " probably"           -0.07
    " believe"            -0.07
    " ferme"              -0.07
    " descend"            -0.07
    " encontra"           -0.07
    " oversee"            -0.07
    " thumbnail"          -0.07
    " treat"              -0.07

    Positive Logits
    Token                 Logit
    "IPO"                 0.09
    " бывают"             0.08
    "halten"              0.08
    "Funcion"             0.08
    " عج"                 0.08
    "(bound"              0.08
    "nier"                0.08
    " illustrates"        0.08
    " 그렇"                0.08
    "084"                 0.08

    Activation Density: 0.056%

    No Known Activations