    Explanations

    phrases related to the potential for positive or effective outcomes when guidelines or support are in place

    Negative Logits
    Token          Logit
    " safest"      -0.19
    "uros"         -0.17
    "jen"          -0.14
    " безопас"     -0.13
    "ACY"          -0.13
    " safer"       -0.13
    "arest"        -0.13
    "ングル"       -0.13
    " Cove"        -0.13
    "wig"          -0.13
    Positive Logits
    Token          Logit
    " properly"    0.42
    "proper"       0.40
    " proper"      0.39
    " Proper"      0.37
    " correctly"   0.33
    " правильно"   0.29
    " richtig"     0.28
    "正确"         0.28
    " đúng"        0.25
    " correct"     0.23
    Act Density 0.391%

    No Known Activations