INDEX
    Explanations

    actions leading to outcomes

    Negative Logits
        " seseorang"    0.97
                        0.94
        "icing"         0.93
        "ผม"            0.93
        "the"           0.88
        "orting"        0.88
                        0.86
        "oration"       0.84
        "あなたは"      0.83
        "iding"         0.83
    Positive Logits
        " נ"            1.19
        " might"        1.17
        " may"          1.12
        " Με"           1.07
        " μο"           1.07
        " κα"           1.07
        " א"            1.06
        " בת"           1.06
        " tends"        1.05
        " must"         1.03
    Activation Density: 0.005%

    No Known Activations