INDEX
    Explanations

    phrases indicating confirmation or negation in a context of prior actions or states

    NEGATIVE LOGITS
    Token               Logit
    "InitVars"          -0.75
    " schi"             -0.68
    " gillar"           -0.68
    "NamedQueries"      -0.66
    " skär"             -0.65
    " scolaires"        -0.65
    "zeczytaj"          -0.64
    " nemlig"           -0.64
    " rød"              -0.63
    " convaincre"       -0.61

    POSITIVE LOGITS
    Token               Logit
    " sudah"            1.13
    " Sudah"            1.04
    " đã"               0.98
    " уже"              0.98
    "Sudah"             0.96
    (missing)           0.94
    " telah"            0.92
    " já"               0.92
    " Уже"              0.91
    "Уже"               0.86

    Activation Density: 0.137%

    No Known Activations