    Explanations

    references to individuals and their actions or states

    Negative Logits
    Token          Logit
    "seamnă"       -0.75
    " bave"        -0.53
    "Jegyzetek"    -0.52
    " Silly"       -0.51
    "activa"       -0.47
    "pett"         -0.47
    "<bos>"        -0.47
    " ift"         -0.47
    "Vou"          -0.45
    "ulent"        -0.45

    Positive Logits
    Token          Logit
    " himself"     1.94
    "himself"      1.65
    " his"         1.48
    " Himself"     1.44
    "his"          1.33
    "His"          1.33
    " His"         1.21
    " He"          1.09
    "He"           1.08
    " he"          1.07
    Activation Density: 0.441%

    No Known Activations