INDEX
    Explanations

    phrases expressing acceptance or permission

    Negative Logits
    maal        -0.19
    ilee        -0.15
    yll         -0.15
    polator     -0.15
    ILE         -0.15
    kop         -0.14
    ilet        -0.14
     beauty     -0.14
    hower       -0.14
    út          -0.14
    Positive Logits
    aby         0.18
    apy         0.15
    ordova      0.14
    /wait       0.14
    à¥įयप       0.13
    ably        0.13
    rum         0.13
    pez         0.13
    asper       0.13
    ola         0.13
    Activation Density: 0.034%

    No Known Activations