INDEX
    Explanations

    expressions indicating past experiences or actions

    Negative Logits
    ابي
    -0.19
    oví
    -0.17
    unas
    -0.15
    aris
    -0.15
    ói
    -0.14
    emouth
    -0.14
    anner
    -0.14
    しの
    -0.14
    oppel
    -0.14
    ечно
    -0.14
    Positive Logits
     times
    0.17
     since
    0.15
     Lis
    0.14
    imes
    0.14
    .times
    0.14
    quez
    0.14
    _Float
    0.14
     thus
    0.14
     occasion
    0.14
     mist
    0.14
    Act Density 0.309%

    No Known Activations