INDEX
    Explanations

    expressions of agreement and appreciation in discussions

    Negative Logits
    "хьтан"            -0.57
    "ConstraintMaker"  -0.53
    "argest"           -0.52
    " arbitrarily"     -0.50
    " otomatig"        -0.48
    " مشين"            -0.47
    " Designer"        -0.47
    "timeter"          -0.46
    "нгред"            -0.46
    " DELAY"           -0.46

    Positive Logits
    " truth"           0.65
    " valid"           0.64
    " truths"          0.63
    "truth"            0.59
    " insightful"      0.54
    "Truth"            0.52
    " agree"           0.51
    " verdades"        0.51
    "valid"            0.51
    "真理"             0.51
    Activation Density: 0.382%

    No Known Activations