INDEX
    Explanations

    expressions of agreement or validation in conversation

    Negative Logits
    Token         Value
    "elles"       -0.15
    "ropp"        -0.15
    "είο"         -0.14
    "upy"         -0.14
    "ophone"      -0.14
    "opis"        -0.13
    "hud"         -0.13
    "γο"          -0.13
    "_managed"    -0.13
    " STATS"      -0.13
    Positive Logits
    Token         Value
    " correct"    0.60
    " spot"       0.50
    "spot"        0.47
    "correct"     0.46
    " Correct"    0.43
    " Spot"       0.42
    "Correct"     0.41
    "Spot"        0.40
    " accurate"   0.40
    "-spot"       0.38
    Activation Density: 0.200%

    No Known Activations