    Explanations

    affirmations and expressions of agreement

    Negative Logits
    Token        Logit
    "елич"       -0.16
    "arella"     -0.15
    "ushman"     -0.15
    "ocop"       -0.14
    " Bounty"    -0.14
    "vič"        -0.14
    "ルク"       -0.14
    "ount"       -0.14
    "olest"      -0.14
    "éal"        -0.13
    Positive Logits
    Token        Logit
    " correct"   0.59
    " Correct"   0.45
    " yes"       0.45
    "Correct"    0.45
    " right"     0.44
    "correct"    0.43
    "right"      0.39
    " đúng"      0.39
    " Yes"       0.38
    "yes"        0.36
    Activation Density: 0.232%

    No Known Activations
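
Token strings in logit lists of this kind are sometimes displayed in a byte-level BPE rendering, where each raw byte is shown as a stand-in character (for example `Ġ` for a leading space). The sketch below shows one way to map such a string back to readable text. It assumes a GPT-2-style byte-to-character table, which is an assumption here because the tokenizer behind this dashboard is not stated. The helper names `build_char_to_byte` and `decode_token` are hypothetical.

```python
def build_char_to_byte():
    """Inverse of the GPT-2-style byte-to-character table (assumed tokenizer)."""
    # Printable bytes are displayed as themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(0xA1, 0xAD))
          + list(range(0xAE, 0x100)))
    cs = bs[:]
    n = 0
    # Every remaining byte is displayed as the code point 256 + n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {chr(c): b for b, c in zip(bs, cs)}


CHAR_TO_BYTE = build_char_to_byte()


def decode_token(rendered):
    """Convert a byte-level token rendering back to text."""
    raw = bytearray()
    for ch in rendered:
        if ch in CHAR_TO_BYTE:
            raw.append(CHAR_TO_BYTE[ch])
        else:
            # Character outside the table: keep its own UTF-8 bytes.
            raw.extend(ch.encode("utf-8"))
    # Invalid byte sequences become U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")


for rendered in ["Ġyes", "viÄį", "ãĥ«ãĤ¯"]:
    print(repr(rendered), "->", repr(decode_token(rendered)))
```

A string that is already partly decoded can come out with replacement characters, because an accented Latin-1 letter such as `ú` is indistinguishable from the stand-in for a raw byte. Results for such mixed strings should be checked by hand.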