INDEX
    Explanations

    prior, most, first, each, recent, some

    Negative Logits
    Token           Value
    " sehingga"     0.41
    " sodass"       0.40
    "}."            0.39
    " gdyż"         0.39
    " ذریع"         0.38
    " waardoor"     0.38
    " betrayal"     0.37
    " samt"         0.36
    "罢了"          0.36
    " zodat"        0.35
    Positive Logits
    Token           Value
    "،"             0.52
    (token missing) 0.50
    (token missing) 0.47
    "*,"            0.45
    "$,"            0.44
    " ،"            0.43
    ","             0.43
    "^{+},"         0.42
    ",\\"           0.41
    "¹,"            0.41
    Act Density 0.111%

    No Known Activations