INDEX
    Explanations

    affirmative statements or confirmations

    Negative Logits
    "InjectAttribute"   -0.75
    "ViewFeatures"      -0.70
    "Démographie"       -0.70
    "multer"            -0.69
    "enumii"            -0.68
    "Хьажоргаш"         -0.68
    " Muc"              -0.66
    "mbggenerated"      -0.64
    " ostavi"           -0.64
    "protoimpl"         -0.64
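    Positive Logits
    "Это"               0.73   (Russian, "this is")
    " Это"              0.72   (Russian, "this is")
    " jde"              0.68   (Czech, "goes"; "jde o" = "it is about")
    "agissait"          0.58   (French, as in "il s'agissait de", "it was about")
    " дело"             0.58   (Russian, "matter, affair")
    " bukan"            0.53   (Indonesian/Malay, "not")
    "Це"                0.53   (Ukrainian, "this is")
    " tratta"           0.52   (Italian, as in "si tratta di", "it is about")
    " primarily"        0.52
    "したのは"          0.51   (Japanese, "the one that did")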
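    This page does not state how the two logit lists are produced. For sparse-autoencoder features, a common approach is to multiply the feature's decoder direction by the model's unembedding matrix and then read off the tokens with the largest and smallest results. That method is an assumption here, not something this page confirms. The sketch below is a minimal, stdlib-only Python version of it. It ignores any final layer norm. `direct_logit_effects` and `top_logits` are hypothetical helpers, and the toy matrices are invented for illustration.

```python
from collections.abc import Sequence


def direct_logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
) -> list[float]:
    """Project a feature's decoder direction through the unembedding.

    Assumption: decoder_dir has length d_model and unembed is a
    d_model x vocab matrix (list of rows). Any final layer norm is ignored.
    Returns one value per vocabulary token.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[d] * unembed[d][v] for d in range(len(decoder_dir)))
        for v in range(vocab_size)
    ]


def top_logits(
    effects: Sequence[float],
    tokens: Sequence[str],
    k: int,
    largest: bool = True,
) -> list[tuple[str, float]]:
    """Return the k tokens with the largest (or smallest) effect, rounded to 2 dp."""
    order = sorted(range(len(effects)), key=lambda v: effects[v], reverse=largest)
    return [(tokens[v], round(effects[v], 2)) for v in order[:k]]


# Toy example: d_model = 2, vocabulary of 4 tokens (all values invented).
decoder_dir = [1.0, -0.5]
unembed = [
    [0.2, -0.4, 0.6, 0.0],
    [0.4, 0.2, -0.2, 0.8],
]
tokens = ["a", " b", "c", "d"]
effects = direct_logit_effects(decoder_dir, unembed)
print("positive:", top_logits(effects, tokens, k=2))  # positive: [('c', 0.7), ('a', 0.0)]
print("negative:", top_logits(effects, tokens, k=2, largest=False))  # negative: [(' b', -0.5), ('d', -0.4)]
```

    Tokens are kept as strings with any leading space intact, matching the quoted entries in the lists above.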
    Activation Density: 0.073%
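    The density figure is presumably the percentage of sampled token positions on which the feature is active. The page states neither the sampling corpus nor the activation threshold, so both are assumptions in the sketch below. `activation_density` is a hypothetical helper, and it counts a position as active when its activation exceeds the threshold (0 by default).

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where a feature's activation exceeds threshold.

    Assumption: a position counts as "active" when activation > threshold.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# A density of 0.073% corresponds to about 73 firing positions per 100,000 tokens.
acts = [0.0] * 99_927 + [1.2] * 73
print(f"{activation_density(acts):.3f}%")  # prints 0.073%
```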

    No Known Activations