INDEX
    Explanations

    Players declare actions

    Negative Logits
    "NaN"              0.46
    (blank token)      0.42
    "ッフ"             0.40
    " परवानगी"          0.40
    " અનુ"             0.40
    " Unauthorized"    0.40
    "Unauthorized"     0.40
    (blank token)      0.40
    " ներ"             0.39
    " สิน"             0.39
    Positive Logits
    " synonymous"      0.48
    " cinta"           0.44
    "스의"             0.44
    " carrots"         0.43
    " sadde"           0.43
    "↵↵"               0.42
    " aldığı"          0.42
    " sette"           0.42
    "apal"             0.41
    " otto"            0.41
    Act Density 0.006%

    No Known Activations