INDEX
    Explanations

    affirmations or strong agreements in statements

    Negative Logits
    Token       Logit
    idy         -0.15
    оÑĩек       -0.15
    _trait      -0.15
    vla         -0.14
    gaard       -0.14
    queda       -0.13
    .ef         -0.13
     trou       -0.13
    ảo          -0.13
    ocrates     -0.13
    Positive Logits
    Token       Logit
    um          0.16
    rost        0.15
    ief         0.15
    ected       0.14
    fest        0.14
    oup         0.14
    ely         0.14
    å¯Į         0.14
    ose         0.14
    atest       0.14
    Activation Density: 0.025%

    No Known Activations