    Explanations

    phrases that express personal reflections or subjective opinions

    Negative Logits
     -*-\r\n
    -0.15
    orthand
    -0.14
    larım
    -0.12
    _Valid
    -0.12
    надлеж
    -0.12
    _Two
    -0.12
    reglo
    -0.11
    ulması
    -0.11
    lepší
    -0.11
    样的
    -0.11
    Positive Logits
     one
    1.20
     یکی
    0.74
     uno
    0.72
    之一
    0.69
     eines
    0.68
    one
    0.67
     одного
    0.66
     salah
    0.65
    _one
    0.63
     одной
    0.62
    Activation Density 1.184%

    No Known Activations