    Explanations

    phrases indicating uncertainty or questioning established norms and expectations

    Negative Logits
    Token             Logit
    " not"            -0.23
    "oint"            -0.16
    "ارج"             -0.15
    "แล"              -0.15
    " no"             -0.15
    "not"             -0.15
    " ikke"           -0.15
    "ไม"              -0.15
    " не"             -0.15
    "combe"           -0.14

    Positive Logits
    Token             Logit
    " anymore"         0.53
    " necessarily"     0.34
    " any"             0.30
    " nor"             0.29
    " anywhere"        0.27
    " anything"        0.26
    " slightest"       0.25
    "任何"             0.25
    " yet"             0.24
    "nor"              0.23
    Act Density 0.543%
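
The page does not define the density figure. A common reading, assumed here, is the fraction of sampled token positions on which the feature's activation is above zero. The sketch below shows that calculation; the function name `activation_density`, the threshold, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all. The source page does not confirm this definition.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 token positions, 5 of them active.
sample = [0.0] * 995 + [0.7, 1.2, 0.3, 2.1, 0.9]
density = activation_density(sample)
print(f"Act Density {density:.3%}")
```

Under this reading, a density of 0.543% would mean the feature fires on roughly 5 of every 1000 sampled tokens.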

    No Known Activations